Cache Stampede: The Invisible Killer of Backend Systems

A cache stampede happens when many requests hit the backend at the same time because a cached item expired or is missing. Instead of serving from cache, all requests try to recompute or fetch the data simultaneously, overwhelming the backend.

Imagine a high-traffic news website. It's Tuesday morning, and a major breaking news story drops. Millions of users are hitting the site simultaneously. To handle the load, the article is cached in Redis for 60 seconds.

Everything runs smoothly until... tick-tock... the 60th second passes. The cache entry expires.

Suddenly, those millions of simultaneous requests find an empty cache. Every single request falls through to the database or the heavy computation logic to "recompute" the data and repopulate the cache. This is a Cache Stampede, also known as the Thundering Herd Problem.

Cache Stampede: Solving the Thundering Herd Problem

Battle-Tested Solutions

The good news? Cache stampedes are preventable. Over the years, engineers at companies like Facebook, Netflix, and Amazon have developed several battle-tested strategies to mitigate or entirely eliminate this problem. Here are five proven approaches, from simple quick fixes to sophisticated probabilistic algorithms.

1. TTL Jitter (Randomized Expiration)

One common cause of stampedes is "Synchronized Expiration." If you cache 1,000 products at once with a 1-hour TTL, they will all expire at the exact same second one hour later.

The Solution: Add a small random "jitter" to the expiration time. This spreads the load over a wider window.

Example: Instead of setting a hard EXPIRY = 3600, use a range:

```javascript
const baseTTL = 3600 // 1 hour
const jitter = Math.floor(Math.random() * 300) // 0-300 seconds (up to 5 minutes)
// Keys will expire between 1:00:00 and 1:05:00
redis.set(key, value, "EX", baseTTL + jitter)
```

Real-World Usage

CDNs like Cloudflare and AWS CloudFront use TTL jitter to prevent synchronized cache expiration across their edge servers. Social media platforms like Twitter and Instagram apply jitter to user feed caches, ensuring that millions of users don't trigger simultaneous cache refreshes when their timelines expire.

2. Mutex / Cache Locking

When a cache miss occurs, the system ensures that only one request is allowed to fetch the data from the source. Others must wait or serve stale data.

The Solution: Use a distributed lock (like Redis SETNX).

Example:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function getData(key) {
  let val = await redis.get(key)
  if (!val) {
    // Try to acquire a lock that auto-expires after 5 seconds
    const lockAcquired = await redis.set(`lock:${key}`, "1", "NX", "EX", 5)

    if (lockAcquired) {
      try {
        val = await fetchFromDB(key)
        await redis.set(key, val, "EX", 3600)
      } finally {
        // Release the lock even if the fetch fails, so other
        // requests aren't blocked for the full 5 seconds.
        await redis.del(`lock:${key}`)
      }
    } else {
      // Someone else is already fetching.
      // Wait 100ms and retry the cache.
      await sleep(100)
      return getData(key)
    }
  }
  return val
}
```

Real-World Usage

E-commerce giants like Amazon and Shopify use mutex locks for high-value cache keys such as product pages during flash sales. When thousands of users try to view a popular product simultaneously, only one request fetches from the database while others wait for the cache to populate. Financial trading platforms also rely on mutex locks to prevent race conditions when updating cached market data.

3. Stale-While-Revalidate (SWR)

SWR decouples the expiration from the eviction. It stores data with a "soft" expiry and a "hard" expiry.

The Solution: If a request arrives after the "soft" expiry, return the stale data immediately but trigger an asynchronous background task to refresh the cache.

Example: Imagine a leaderboard.

If a user hits the cache at second 65 (past the soft expiry), they get the data from second 0 instantly, and the system kicks off a background refresh so the next request sees fresh data.
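The pattern can be sketched with an in-memory cache. The TTL values and the `fetchFresh` loader below are illustrative assumptions, not a specific library's API:

```javascript
const cache = new Map() // key -> { value, storedAt }

const SOFT_TTL_MS = 60_000  // after this, data is "stale" but still servable
const HARD_TTL_MS = 300_000 // after this, data must be refetched synchronously

const refreshing = new Set() // keys with an in-flight background refresh

async function getWithSWR(key, fetchFresh) {
  const entry = cache.get(key)
  const age = entry ? Date.now() - entry.storedAt : Infinity

  if (entry && age < SOFT_TTL_MS) {
    return entry.value // fresh: serve directly
  }

  if (entry && age < HARD_TTL_MS) {
    // Stale but usable: serve it immediately, refresh in the background.
    if (!refreshing.has(key)) {
      refreshing.add(key)
      fetchFresh(key)
        .then((value) => cache.set(key, { value, storedAt: Date.now() }))
        .finally(() => refreshing.delete(key))
    }
    return entry.value
  }

  // Missing or past the hard expiry: the caller must wait for a real fetch.
  const value = await fetchFresh(key)
  cache.set(key, { value, storedAt: Date.now() })
  return value
}
```

The `refreshing` set ensures only one background refresh runs per key, which combines SWR with the mutex idea from the previous section.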

Real-World Usage

Vercel (the creators of the SWR library) uses this pattern extensively in their Next.js framework for ISR (Incremental Static Regeneration). GitHub employs SWR for their dashboard and repository pages, serving slightly stale data instantly while refreshing in the background. News sites like The New York Times use SWR for article pages, prioritizing speed over perfect freshness.

4. Probabilistic Early Recompute (XFetch)

This is the most advanced solution. It uses a mathematical formula to decide whether to recompute the data before it expires, based on how long the recomputation usually takes.

The Solution: As the expiration time gets closer, the probability that a request will trigger an early refresh increases.

Example: If recomputing a heavy report takes 2 seconds and the TTL is 60 seconds, the chance of an early recompute stays near zero for most of the window, then rises sharply in the final seconds before expiry, so a single request refreshes the cache before the herd arrives.
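A minimal sketch of the early-expiration check, following the formula from the XFetch paper ("Optimal Probabilistic Cache Stampede Prevention", VLDB 2015). Parameter names mirror the paper; beta = 1.0 is the recommended default:

```javascript
// XFetch: recompute early with a probability that rises as expiry approaches.
// nowMs/expiryMs are timestamps; deltaMs is how long a recompute takes.
function shouldRecomputeEarly(nowMs, expiryMs, deltaMs, beta = 1.0) {
  // Math.log of a value in (0, 1] is <= 0, so subtracting it pushes "now"
  // forward by a random amount proportional to the recompute cost.
  // (Using 1 - Math.random() avoids Math.log(0) = -Infinity.)
  return nowMs - deltaMs * beta * Math.log(1 - Math.random()) >= expiryMs
}
```

With `deltaMs = 2000` and a 60-second TTL, refreshes start triggering with rising probability in roughly the last few seconds before expiry, and only the request that "wins" the random draw pays the recompute cost.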

Real-World Usage

Probabilistic early recomputation (popularized by the XFetch algorithm, published at VLDB 2015) is used at massive scale. Meta applies the idea to expensive, hot cache entries such as News Feed rankings and friend suggestions, where recomputation is costly but freshness is critical. High-traffic systems like LinkedIn and Reddit have adopted similar probabilistic approaches for their recommendation engines.

5. Cache Warming

Sometimes the best defense is a good offense. Don't wait for a user to trigger a cache miss.

The Solution: Proactively populate the cache using background workers or triggers.

Example: If you're launching a marketing campaign at 10:00 AM:

  1. At 9:55 AM, a cron job runs.
  2. It fetches all "hero" products and campaign data.
  3. It primes the cache globally. When 10:00 AM hits, the first million users find a "warm" cache waiting for them.
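The cron job above can be sketched as follows. The product loader and in-memory `Map` stand in for the real database and Redis, and every name here is illustrative:

```javascript
const cache = new Map()

// Stand-in for the database query that loads the campaign's hero products.
async function fetchHeroProducts() {
  return [
    { id: "p1", name: "Hero Product 1" },
    { id: "p2", name: "Hero Product 2" },
  ]
}

async function warmCache() {
  const products = await fetchHeroProducts()
  for (const product of products) {
    // Prime each entry before the campaign starts so the first
    // wave of users at 10:00 AM never sees a cold cache.
    cache.set(`product:${product.id}`, JSON.stringify(product))
  }
  return products.length
}

// In production this would be triggered by a scheduler at 9:55 AM, e.g. cron:
//   55 9 * * *  node warm-cache.js
```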
Real-World Usage

Netflix pre-warms their CDN caches before major releases (like new seasons of popular shows) by pushing content to edge servers hours in advance. E-commerce sites like Amazon Prime Day and Alibaba's Singles Day use cache warming to preload product catalogs and pricing data before the sale begins. News organizations warm caches before scheduled events like elections or major sports events.


Comparison of Solutions

| Solution | Complexity | Best For | Tradeoff |
|---|---|---|---|
| TTL Jitter | Low | Many keys expiring together | Doesn't protect a single hot key |
| Mutex / Cache Locking | Medium | Expensive recomputes of a hot key | Waiting requests add latency |
| Stale-While-Revalidate | Medium | Read-heavy data that tolerates staleness | Users may see slightly stale data |
| Probabilistic Early Recompute | High | Very expensive recomputes at massive scale | Requires tuning (recompute cost, beta) |
| Cache Warming | Low | Predictable traffic spikes | Only works when demand is foreseeable |

Summary

Cache Stampedes can take down even the most robust systems if left unhandled. While TTL Jitter is a great quick fix, high-scale systems often combine Mutex Locking with SWR or Cache Warming to ensure the backend stays protected even during the most intense traffic storms.


Takeaway: Don't let your cache be a single point of failure. Plan for the moment it expires.