ZERONE
Infrastructure · 2026-04-18 · 6 min read

A cache without a lock is a thundering herd: 14 endpoints, 8 workers, one dead database

Cache expiry is the one moment when every parallel worker gets expensive at the same time. Without a per-key lock, each refresh sends your full request load down the slowest path at once.

Symptoms: PostgreSQL primary with load spikes of 15–17 on a 5-minute cadence. Backend timeouts. Monitoring showed brief 502 bursts, then back to 200. Every time the spike faded, nothing suspicious was visible on the backend.

First guess: long-running connections. No. Second guess: a particular batch job. Also no.

The truth was in the cache.

What was happening

A central monitor endpoint returned an aggregate — COUNT(*) FILTER (WHERE quality_score >= 80) over 1.4 M rows. Query time: roughly 450 ms under normal load. Cached for 5 minutes in Redis. Every 5 minutes the cache expired.

The backend ran with eight uvicorn workers. In the split second after cache expiry, requests hit all eight workers simultaneously. Each worker checked the cache, found it empty, and fired the aggregation in parallel. Eight concurrent COUNT(*) FILTER queries on the same table → buffer thrashing, lock contention, disk-I/O jam. PostgreSQL buckles.

The first worker to finish writes the result to the cache. The other seven throw theirs away. Eight queries for one answer.
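The race is easy to reproduce in miniature. A single-process sketch (an in-memory dict standing in for Redis, a short sleep standing in for the ~450 ms aggregate; all names hypothetical) shows every concurrent caller missing the cache and firing the query:

```python
import asyncio

CACHE: dict[str, int] = {}
query_runs = 0

async def run_expensive_query() -> int:
    # Stand-in for the expensive aggregate. The await yields control,
    # so every concurrent caller gets past the cache check first.
    global query_runs
    query_runs += 1
    await asyncio.sleep(0.01)
    return 1_234_567

async def get_quality_count_naive() -> int:
    cached = CACHE.get("quality_count")
    if cached is not None:
        return cached
    # Cache miss: nothing stops concurrent callers from racing here.
    count = await run_expensive_query()
    CACHE["quality_count"] = count
    return count

async def main() -> int:
    # Eight concurrent callers, like eight workers right after expiry.
    await asyncio.gather(*(get_quality_count_naive() for _ in range(8)))
    return query_runs

runs = asyncio.run(main())
print(runs)  # 8 -- one answer, eight executions of the expensive query
```

One answer, eight executions: exactly the 8:1 waste seen in production.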

Why 14 endpoints instead of one

The audit turned up 14 endpoints with the same pattern. All of them used a simple @cache(ttl=300) decorator. None had a lock.

The implementation looked harmless:

async def get_quality_count():
    cached = await redis.get("quality_count")
    if cached is not None:
        return int(cached)
    # Cache miss → ALL workers race to fill
    count = await run_expensive_query()
    await redis.set("quality_count", count, ex=300)
    return count

On a dev machine with one worker: perfect. In production with eight workers and a tightly cadenced traffic pattern: catastrophe.

The fix: one lock per key, two levels

from redis.asyncio import Redis
import asyncio
from contextlib import asynccontextmanager

_worker_locks: dict[str, asyncio.Lock] = {}

@asynccontextmanager
async def cache_lock(key: str, redis: Redis, ttl: int = 30):
    # Short-circuit inside the same worker
    local = _worker_locks.setdefault(key, asyncio.Lock())
    async with local:
        # Distributed lock across workers
        got = await redis.set(f"lock:{key}", "1", ex=ttl, nx=True)
        if got:
            try:
                yield True
            finally:
                await redis.delete(f"lock:{key}")
        else:
            # Another worker is rebuilding the cache. Wait briefly
            # then read it back.
            await asyncio.sleep(0.1)
            yield False

async def get_quality_count():
    cached = await redis.get("quality_count")
    if cached is not None:
        return int(cached)

    async with cache_lock("quality_count", redis) as have_lock:
        # Re-check: another coroutine or worker may have refilled
        # the cache while we waited for the lock.
        cached = await redis.get("quality_count")
        if cached is not None:
            return int(cached)
        if have_lock:
            count = await run_expensive_query()
            await redis.set("quality_count", count, ex=300)
            return count
        # Lock holder is still rebuilding; fall back to a bounded query.
        return await run_expensive_query(timeout=2.0)

Two levels are intentional:

  1. asyncio.Lock per worker — prevents a single worker from firing the same query multiple times if several coroutines coincide.
  2. Redis lock globally — prevents N workers from all rebuilding the cache at the same time.

Outcome

After the rollout on three critical endpoints:

  • DB CPU load stable at 53 % (previously peaking at 91 %).
  • Query P99 on the monitor endpoint dropped from 1.8 s to 210 ms.
  • Zero backend timeouts in the following week.

What we learned

  • A cache without a lock is a time bomb that detonates exactly once per TTL interval. The shorter the TTL, the more often.
  • The decorator approach (@cache(ttl=300)) is invisible in single-worker dev environments and fails only under load.
  • Every fix should trigger an audit: if a pattern is wrong on one endpoint, it's probably wrong on several. Find them all.

The core principle: "If it's expensive and can happen in parallel, you need a lock per key, not per request."
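Since all 14 endpoints went through the same @cache(ttl=300) decorator, the fix can live in the decorator too. A minimal sketch of a hypothetical locked_cache replacement (per-worker level only; the distributed Redis lock would slot into the same critical section):

```python
import asyncio
import functools
import time

_cache: dict[str, tuple[float, object]] = {}  # key -> (expires_at, value)
_locks: dict[str, asyncio.Lock] = {}

def locked_cache(key: str, ttl: float = 300.0):
    # Hypothetical drop-in for the bare @cache(ttl=300): same shape,
    # but the rebuild runs under a per-key lock with a re-check.
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            hit = _cache.get(key)
            if hit is not None and hit[0] > time.monotonic():
                return hit[1]
            async with _locks.setdefault(key, asyncio.Lock()):
                hit = _cache.get(key)  # re-check under the lock
                if hit is not None and hit[0] > time.monotonic():
                    return hit[1]
                value = await fn(*args, **kwargs)
                _cache[key] = (time.monotonic() + ttl, value)
                return value
        return wrapper
    return decorator

calls = 0

@locked_cache("quality_count", ttl=300)
async def get_quality_count() -> int:
    global calls
    calls += 1
    await asyncio.sleep(0.01)  # stand-in for the expensive aggregate
    return 1_234_567

async def main() -> list[int]:
    return await asyncio.gather(*(get_quality_count() for _ in range(8)))

results = asyncio.run(main())
print(calls)  # 1 -- the lock dedupes all eight concurrent callers
```

One decorator swap instead of 14 hand-edited endpoints: the lock is per key, not per request.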

Fighting a similar fire?

We've likely seen something close. Get in touch.
