Cache-Aside (Lazy Loading) | Microservices Course

Problem

Reading from the store on every request is slow and expensive. The same rows get fetched over and over, the database does redundant work, and read latency tracks disk and query time even when the data hasn't changed since the last request. You want a fast in-memory layer in front of the store, but you don't want to load everything into it up front, since most keys are never read and the working set shifts over time. You need a layer that holds only what's actually being asked for and fills itself on demand.

Solution

Put a cache in front of the store and let the application drive it. On a read, check the cache first. On a hit, return the cached value. On a miss, read from the store, write the value back into the cache, and return it. The cache fills lazily, one key at a time, in response to real reads, so it ends up holding the working set rather than the whole dataset.

The application owns the logic; the cache itself knows nothing about the store and never talks to it. The store stays the system of record, and the cache is a disposable copy that can be flushed and rebuilt by the next round of reads. Each entry carries a TTL that bounds its age, and an eviction policy such as LRU caps total size, so the cache stays within a fixed memory budget and discards keys that have gone cold.
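To make the two bounds concrete, here is a minimal in-process sketch of a cache limited by both entry count (LRU) and age (TTL). The class name TTLLRUCache, the max_entries parameter, and the injectable clock are illustrative assumptions, not part of any particular cache product; real caches such as Redis or Memcached implement these limits internally.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Toy cache bounded by entry count (LRU) and entry age (TTL)."""

    def __init__(self, max_entries, clock=time.monotonic):
        self.max_entries = max_entries
        self.clock = clock              # injectable so expiry can be simulated
        self._data = OrderedDict()      # key -> (value, expires_at), coldest first

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None                 # miss
        value, expires_at = item
        if self.clock() >= expires_at:
            del self._data[key]         # expired: drop it and report a miss
            return None
        self._data.move_to_end(key)     # mark as most recently used
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self.clock() + ttl)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict the least recently used key

    def delete(self, key):
        self._data.pop(key, None)
```

Because get returns None for a miss, this sketch cannot distinguish a cached None from an absent key; a production cache would use a sentinel or a separate membership check.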

Writes are where the staleness comes from. The cache and the store can disagree the moment the store changes, so a write has to either update the cached entry or, more commonly, delete it so the next read re-populates from the store. Deleting on write (sometimes called invalidation) is the usual default because it's simpler to reason about than keeping two copies in sync.

This is distinct from read-through and write-through, where the cache itself sits inline and loads from or writes to the store on your behalf. In cache-aside the store is never hidden behind the cache; the application reads and writes both directly.
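The difference in who calls the store can be sketched side by side. Both ReadThroughCache and cache_aside_get are hypothetical names for illustration, and plain dicts stand in for the cache and the store.

```python
class ReadThroughCache:
    """Read-through: the cache owns the loader and calls the store itself."""

    def __init__(self, loader):
        self._loader = loader   # the cache knows how to reach the store
        self._data = {}

    def get(self, key):
        if key not in self._data:
            self._data[key] = self._loader(key)   # the cache performs the fill
        return self._data[key]


def cache_aside_get(cache, store, key):
    """Cache-aside: the caller holds both handles and performs the fill."""
    if key in cache:
        return cache[key]       # hit
    value = store[key]          # the application reads the store directly
    cache[key] = value          # and writes the result into the cache itself
    return value
```

In the read-through version the caller never sees the store; in the cache-aside version the cache is a passive container and all of the policy lives in application code.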

Tradeoffs

Property	Effect
Read latency	Hits served from memory; the main reason to do this
Memory footprint	Only requested keys are cached, bounded by TTL and eviction; the working set, not the whole dataset
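Resilience	A cache outage degrades to direct store reads rather than failing, provided the application treats cache errors as misses and the store can absorb the extra load; the store is never hidden behind the cache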
Cold start	An empty or just-flushed cache sends every read through to the store until it refills, a load spike the store must absorb
Staleness	Cache and store diverge after a write until the entry expires or is invalidated; reads can be stale for up to the TTL
Write path	The application must invalidate or update on every write; a missed invalidation leaves a stale entry until its TTL expires, or indefinitely if the entry has no TTL
Thundering herd	Many concurrent misses on the same hot key can stampede the store at once unless you coalesce the fills
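The thundering-herd row mentions coalescing fills. One way to do that inside a single process is a "single-flight" helper: the first caller to miss on a key runs the load, and concurrent callers for the same key wait for that result instead of hitting the store. The sketch below is a minimal version under that assumption; SingleFlight and get are hypothetical names, and a dict stands in for both cache and store. It does not coordinate across processes or hosts.

```python
import threading


class SingleFlight:
    """Coalesce concurrent fills: at most one loader call per key at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (done_event, result_box)

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, box = entry
        if leader:
            try:
                box["value"] = loader(key)      # only the leader touches the store
            except Exception as exc:
                box["error"] = exc              # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()                      # wake everyone waiting on this key
        else:
            done.wait()                         # follower: wait for the leader
        if "error" in box:
            raise box["error"]
        return box["value"]


def get(key, cache, store, flight):
    value = cache.get(key)
    if value is not None:
        return value                            # hit

    def fill(k):
        loaded = store[k]                       # miss: one read for all waiters
        cache[k] = loaded
        return loaded

    return flight.do(key, fill)
```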

Implementations

Minimal pseudocode

def get(key):
    val = cache.get(key)
    if val is not None:
        return val                 # hit
    val = db.read(key)             # miss: fall through to the store
    if val is not None:            # a cached None would look like a miss anyway
        cache.set(key, val, ttl=300)   # populate for next time
    return val

def put(key, val):
    db.write(key, val)
    cache.delete(key)              # invalidate; next get re-populates

Redis in front of SQL

The common production setup: Redis holds serialized rows or query results keyed by id, with a per-key TTL and an eviction policy (maxmemory-policy, e.g. allkeys-lru) capping memory. The application reads Redis, falls through to Postgres or MySQL on a miss, and SETs the result back with an expiry. Writes go to SQL and then DEL the key. To blunt the thundering herd on hot keys, fills are coalesced behind a short lock so only one request rebuilds a missing key while others wait. Docs: https://redis.io/docs/latest/develop/use-cases/caching/.
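The read, fill, and invalidate steps above, including the short lock, might look roughly like the following. This is a sketch under assumptions, not a reference implementation: r is expected to behave like a redis-py redis.Redis client (get, set with nx and ex, delete), while get_user, update_user, load_from_sql, write_to_sql, the user:<id> key format, and the timing constants are all made up for illustration.

```python
import json
import time

TTL_SECONDS = 300    # assumed expiry for cached rows
LOCK_SECONDS = 5     # assumed upper bound on one rebuild


def get_user(r, load_from_sql, user_id, retries=50, pause=0.05):
    key = f"user:{user_id}"
    lock_key = f"lock:{key}"
    for _ in range(retries):
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)            # hit
        # SET with NX and EX: only one caller wins, and the lock expires
        # on its own if that caller dies mid-rebuild.
        if r.set(lock_key, "1", nx=True, ex=LOCK_SECONDS):
            try:
                row = load_from_sql(user_id)     # miss: read the store once
                r.set(key, json.dumps(row), ex=TTL_SECONDS)
                return row
            finally:
                # Simplification: a stricter version would store a unique token
                # and delete the lock only if it still holds that token.
                r.delete(lock_key)
        time.sleep(pause)                        # another caller is filling
    return load_from_sql(user_id)                # gave up waiting: go direct


def update_user(r, write_to_sql, user_id, row):
    write_to_sql(user_id, row)                   # the store is written first
    r.delete(f"user:{user_id}")                  # then the cached copy is dropped
```

Waiters poll rather than block, which keeps the sketch short; the final fallback to the store means a stuck lock delays a request instead of failing it.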

Memcached in front of SQL

Memcached is the older and simpler of the two, a pure in-memory key/value store with no persistence. It is used the same way: get the key, on a miss read SQL and set the value back with a TTL, and delete the key on write. Memcached has no built-in fall-through to a store, which is exactly the point of cache-aside, because the application does it. Its slab allocator and LRU eviction keep it within a fixed memory ceiling. This is the pattern Facebook scaled to enormous read volume in front of MySQL. Docs: https://docs.memcached.org/.
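Since Memcached holds nothing durable, an unreachable node can reasonably be treated like a miss. The sketch below shows that degraded path alongside the usual get, set, and delete. It assumes mc behaves like a pymemcache Client (get, set with an expire argument, delete); get_product, update_product, load_from_sql, write_to_sql, and the product:<id> key format are hypothetical. The broad except is a shortcut for the sketch and would normally be narrowed to the client's own error types.

```python
import json
import logging

log = logging.getLogger("cache")
EXPIRE_SECONDS = 300   # assumed TTL for cached rows


def get_product(mc, load_from_sql, product_id):
    key = f"product:{product_id}"
    try:
        cached = mc.get(key)
    except Exception:   # sketch only: narrow this to your client's errors
        log.warning("cache read failed for %s; reading the store", key)
        return load_from_sql(product_id)         # degrade to a direct read
    if cached is not None:
        return json.loads(cached)                # hit
    row = load_from_sql(product_id)              # miss
    try:
        mc.set(key, json.dumps(row).encode("utf-8"), expire=EXPIRE_SECONDS)
    except Exception:   # a failed fill is not fatal; the next read retries it
        log.warning("cache write failed for %s", key)
    return row


def update_product(mc, write_to_sql, product_id, row):
    write_to_sql(product_id, row)
    # Errors here are left to propagate: a delete that silently fails would
    # leave a stale entry in place until its TTL runs out.
    mc.delete(f"product:{product_id}")
```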