Cache Stampede: The Invisible Killer of Backend Systems

A cache stampede happens when many requests hit the backend at the same time because a cached item expired or is missing. Instead of serving from cache, all requests try to recompute or fetch the data simultaneously, overwhelming the backend.

Imagine a high-traffic news website. It's Tuesday morning, and a major breaking news story drops. Millions of users are hitting the site simultaneously. To handle the load, the article is cached in Redis for 60 seconds.

Everything runs smoothly until... tick-tock... the 60th second passes. The cache entry expires.

Suddenly, those millions of simultaneous requests find an empty cache. Every single request falls through to the database or the heavy computation logic to "recompute" the data and repopulate the cache. This is a Cache Stampede, also known as the Thundering Herd Problem.

Cache Stampede: Solving the Thundering Herd Problem

Battle-Tested Solutions

The good news? Cache stampedes are preventable. Over the years, engineers at companies like Facebook, Netflix, and Amazon have developed several battle-tested strategies to mitigate or entirely eliminate this problem. Here are five proven approaches, from simple quick fixes to sophisticated probabilistic algorithms.

1. TTL Jitter (Randomized Expiration)

One common cause of stampedes is "Synchronized Expiration." If you cache 1,000 products at once with a 1-hour TTL, they will all expire at the exact same second one hour later.

The Solution: Add a small random "jitter" to the expiration time. This spreads the load over a wider window.

Example: Instead of setting a hard EXPIRY = 3600, use a range:

```javascript
const baseTTL = 3600 // 1 hour
const jitter = Math.floor(Math.random() * 300) // 0-300 seconds (up to 5 minutes)
// Keys will expire between 1:00:00 and 1:05:00
redis.set(key, value, "EX", baseTTL + jitter)
```

Real-World Usage

CDNs like Cloudflare and AWS CloudFront use TTL jitter to prevent synchronized cache expiration across their edge servers. Social media platforms like Twitter and Instagram apply jitter to user feed caches, ensuring that millions of users don't trigger simultaneous cache refreshes when their timelines expire.

2. Mutex / Cache Locking

When a cache miss occurs, the system ensures that only one request is allowed to fetch the data from the source. Others must wait or serve stale data.

The Solution: Use a distributed lock (like Redis SETNX).

Example:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function getData(key) {
  let val = await redis.get(key)
  if (!val) {
    // Try to acquire a lock that auto-expires after 5 seconds
    const lockAcquired = await redis.set(`lock:${key}`, "1", "NX", "EX", 5)

    if (lockAcquired) {
      try {
        val = await fetchFromDB(key)
        await redis.set(key, val, "EX", 3600)
      } finally {
        // Release the lock even if the fetch fails, so other
        // requests aren't blocked for the full 5 seconds.
        await redis.del(`lock:${key}`)
      }
    } else {
      // Someone else is already fetching.
      // Wait 100ms and retry the cache.
      await sleep(100)
      return getData(key)
    }
  }
  return val
}
```

Real-World Usage

E-commerce giants like Amazon and Shopify use mutex locks for high-value cache keys such as product pages during flash sales. When thousands of users try to view a popular product simultaneously, only one request fetches from the database while others wait for the cache to populate. Financial trading platforms also rely on mutex locks to prevent race conditions when updating cached market data.

3. Stale-While-Revalidate (SWR)

SWR decouples the expiration from the eviction. It stores data with a "soft" expiry and a "hard" expiry.

The Solution: If a request arrives after the "soft" expiry, return the stale data immediately but trigger an asynchronous background task to refresh the cache.

Example: Imagine a leaderboard.

If a user hits the cache at second 65 (past the soft expiry), they get the data from second 0 instantly, and the system kicks off a background refresh so the next request sees fresh data.
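The pattern can be sketched with an in-memory cache. The TTL values and the `fetchFresh` loader below are illustrative assumptions, not a specific library's API:

```javascript
const cache = new Map() // key -> { value, storedAt }

const SOFT_TTL_MS = 60_000  // after this, data is "stale" but still servable
const HARD_TTL_MS = 300_000 // after this, data must be refetched synchronously

const refreshing = new Set() // keys with an in-flight background refresh

async function getWithSWR(key, fetchFresh) {
  const entry = cache.get(key)
  const age = entry ? Date.now() - entry.storedAt : Infinity

  if (entry && age < SOFT_TTL_MS) {
    return entry.value // fresh: serve directly
  }

  if (entry && age < HARD_TTL_MS) {
    // Stale but usable: serve it immediately, refresh in the background.
    if (!refreshing.has(key)) {
      refreshing.add(key)
      fetchFresh(key)
        .then((value) => cache.set(key, { value, storedAt: Date.now() }))
        .finally(() => refreshing.delete(key))
    }
    return entry.value
  }

  // Missing or past the hard expiry: the caller must wait for a real fetch.
  const value = await fetchFresh(key)
  cache.set(key, { value, storedAt: Date.now() })
  return value
}
```

The `refreshing` set ensures only one background refresh runs per key, which combines SWR with the mutex idea from the previous section.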

Real-World Usage

Vercel (the creators of the SWR library) uses this pattern extensively in their Next.js framework for ISR (Incremental Static Regeneration). GitHub employs SWR for their dashboard and repository pages, serving slightly stale data instantly while refreshing in the background. News sites like The New York Times use SWR for article pages, prioritizing speed over perfect freshness.

4. Probabilistic Early Recompute (XFetch)

This is the most advanced solution. It uses a mathematical formula to decide whether to recompute the data before it expires, based on how long the recomputation usually takes.

The Solution: As the expiration time gets closer, the probability that a request will trigger an early refresh increases.

Example: If recomputing a heavy report takes 2 seconds and the TTL is 60 seconds, the chance of an early recompute stays near zero for most of the window, then rises sharply in the final seconds before expiry, so a single request refreshes the cache before the herd arrives.
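A minimal sketch of the early-expiration check, following the formula from the XFetch paper ("Optimal Probabilistic Cache Stampede Prevention", VLDB 2015). Parameter names mirror the paper; beta = 1.0 is the recommended default:

```javascript
// XFetch: recompute early with a probability that rises as expiry approaches.
// nowMs/expiryMs are timestamps; deltaMs is how long a recompute takes.
function shouldRecomputeEarly(nowMs, expiryMs, deltaMs, beta = 1.0) {
  // Math.log of a value in (0, 1] is <= 0, so subtracting it pushes "now"
  // forward by a random amount proportional to the recompute cost.
  // (Using 1 - Math.random() avoids Math.log(0) = -Infinity.)
  return nowMs - deltaMs * beta * Math.log(1 - Math.random()) >= expiryMs
}
```

With `deltaMs = 2000` and a 60-second TTL, refreshes start triggering with rising probability in roughly the last few seconds before expiry, and only the request that "wins" the random draw pays the recompute cost.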

Real-World Usage

Probabilistic early recomputation (popularized by the XFetch algorithm, published at VLDB 2015) is used at massive scale. Meta applies the idea to expensive, hot cache entries such as News Feed rankings and friend suggestions, where recomputation is costly but freshness is critical. High-traffic systems like LinkedIn and Reddit have adopted similar probabilistic approaches for their recommendation engines.

5. Cache Warming

Sometimes the best defense is a good offense. Don't wait for a user to trigger a cache miss.

The Solution: Proactively populate the cache using background workers or triggers.

Example: If you're launching a marketing campaign at 10:00 AM:

  1. At 9:55 AM, a cron job runs.
  2. It fetches all "hero" products and campaign data.
  3. It primes the cache globally. When 10:00 AM hits, the first million users find a "warm" cache waiting for them.
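The cron job above can be sketched as follows. The product loader and in-memory `Map` stand in for the real database and Redis, and every name here is illustrative:

```javascript
const cache = new Map()

// Stand-in for the database query that loads the campaign's hero products.
async function fetchHeroProducts() {
  return [
    { id: "p1", name: "Hero Product 1" },
    { id: "p2", name: "Hero Product 2" },
  ]
}

async function warmCache() {
  const products = await fetchHeroProducts()
  for (const product of products) {
    // Prime each entry before the campaign starts so the first
    // wave of users at 10:00 AM never sees a cold cache.
    cache.set(`product:${product.id}`, JSON.stringify(product))
  }
  return products.length
}

// In production this would be triggered by a scheduler at 9:55 AM, e.g. cron:
//   55 9 * * *  node warm-cache.js
```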
Real-World Usage

Netflix pre-warms their CDN caches before major releases (like new seasons of popular shows) by pushing content to edge servers hours in advance. E-commerce sites like Amazon Prime Day and Alibaba's Singles Day use cache warming to preload product catalogs and pricing data before the sale begins. News organizations warm caches before scheduled events like elections or major sports events.


Comparison of Solutions

| Solution | Complexity | Best For | Tradeoff |
|---|---|---|---|
| TTL Jitter | Low | Many keys expiring together | Doesn't protect a single hot key |
| Mutex / Cache Locking | Medium | Expensive recomputes of a hot key | Waiting requests add latency |
| Stale-While-Revalidate | Medium | Read-heavy data that tolerates staleness | Users may see slightly stale data |
| Probabilistic Early Recompute | High | Very expensive recomputes at massive scale | Requires tuning (recompute cost, beta) |
| Cache Warming | Low | Predictable traffic spikes | Only works when demand is foreseeable |

Summary

Cache Stampedes can take down even the most robust systems if left unhandled. While TTL Jitter is a great quick fix, high-scale systems often combine Mutex Locking with SWR or Cache Warming to ensure the backend stays protected even during the most intense traffic storms.


Takeaway: Don't let your cache be a single point of failure. Plan for the moment it expires.