Not quite everywhere. When you have enough traffic, the load spikes from putting cache generation in the user path become seriously painful, so a separate process is often responsible instead.
(Imagine 5,000 threads all deciding they want exactly the same data at exactly the same time, then trying to write it to exactly the same location. Now imagine 50,000 more threads trying to do exactly the same thing because of the delays caused by the first set. Now imagine your website is down and your mobile phone is ringing.)
Yes, the thundering herd problem. The site may be briefly less responsive, but if the traffic is all hitting a single piece of content, then as long as one request goes through, the content ends up in cache and the load immediately drops.
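The usual way to guarantee only one of them goes through is a dogpile lock. Here's a minimal sketch of the idea, using memcached's atomic add() as the lock; it assumes pymemcache, and render_page() and the key names are hypothetical:

    import time
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def get_or_render(key, render_page, ttl=300, lock_ttl=30):
        value = cache.get(key)
        if value is not None:
            return value  # cache hit: the common case, nothing to render

        # add() is atomic: exactly one caller wins the lock and renders,
        # no matter how many threads arrive at the same time.
        if cache.add(key + ":lock", b"1", expire=lock_ttl, noreply=False):
            try:
                value = render_page()
                cache.set(key, value, expire=ttl)
                return value
            finally:
                cache.delete(key + ":lock")

        # Everyone else polls briefly for the winner to fill the cache
        # instead of piling onto the backend.
        for _ in range(50):
            time.sleep(0.1)
            value = cache.get(key)
            if value is not None:
                return value
        return render_page()  # winner died or timed out; render it ourselves

The lock_ttl matters here: if the process holding the lock dies mid-render, the lock expires on its own instead of wedging that key forever.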
The bigger problem is when your entire cache is cold (e.g. memcached was restarted) and there's a ton of traffic to lots of different content. A single piece of content shouldn't be that crippling unless it's stupidly slow to render and cache.
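The cold-cache case is where the "separate process" from the first comment earns its keep: a warmer that walks the hot content and pre-renders it before user traffic does. A rough sketch, again assuming pymemcache; hot_keys and render_page() are hypothetical:

    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    # Pre-render hot content outside the user path, e.g. after a
    # memcached restart, so user requests only ever see cache hits.
    def warm_cache(hot_keys, render_page, ttl=300):
        for key in hot_keys:
            if cache.get(key) is None:
                # A lost race with user traffic just writes the same
                # content twice, which is harmless.
                cache.set(key, render_page(key), expire=ttl)

Run it at deploy time or from cron; the same loop, throttled to N keys per second, doubles as a steady-state refresher that keeps regeneration load predictable instead of spiky.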