Eviction Policies | Microservices Course

Problem

A cache holds far less than the store behind it, so once it's full, admitting a new entry means dropping a resident one. Which one you drop sets the hit rate, and the hit rate is the entire point of the cache: a few points of swing translates into large changes in load on the store and in tail latency, since every miss pays the full back-end cost. The decision is a prediction problem with no future information. You're guessing which resident entry is least likely to be read again, from the access history alone, under a tight budget for the bookkeeping (a cache that spends a kilobyte of metadata per entry to make perfect decisions has defeated its own purpose).

There are actually two decisions hiding here. Eviction picks which resident entry leaves when space is needed. Admission decides whether a newly requested object should enter the cache at all. Most policies only do the first and admit everything; the second turns out to matter a great deal in settings where many objects are requested exactly once.

Solution

Approximate "least likely to be reused" from what you've observed. The usable signals are recency (recently used implies soon again) and frequency (often used implies often again), and the policies differ in which signal they trust and how cheaply they track it.

LRU drops the least-recently-used entry. It's O(1) with tiny per-entry state and does well when recency predicts reuse, which covers a lot of traffic. Its failure mode is a scan: a one-time pass over many cold items pushes the genuinely hot set out the back.

LFU drops the least-frequently-used entry, which suits a stable, skewed hot set. Without an aging mechanism, an item that was popular long ago lingers forever, so practical LFU decays its counts over time. It carries more bookkeeping than LRU and ignores recency.

ARC adapts between the two. It keeps one list for entries seen once and another for entries seen more than once, plus non-resident "ghost" lists recording recently evicted keys; a hit on a ghost list shifts the target size toward whichever signal is currently paying off, so it self-tunes and resists scans. The costs are real: the ghost lists roughly double the metadata, and the algorithm is patented by IBM, which kept it out of much open-source software.

TinyLFU and Window-TinyLFU separate admission from eviction. A compact frequency sketch (a count-min sketch, halved periodically so old counts decay) estimates how often any key has been seen, including keys not currently resident. When a new candidate would evict a victim, the candidate is admitted only if its estimated frequency exceeds the victim's. Window-TinyLFU puts a small LRU window in front to catch short recency bursts before the frequency filter judges them, feeding a larger segmented-LRU main region. This reaches hit rates close to the theoretical optimum at a couple of bytes per entry.

Belady's optimal (MIN) evicts the entry whose next use is furthest in the future. It needs the future, so it can't be implemented; it exists as the upper bound the others are measured against.

Admission control is the CDN-scale lesson that admitting every object on first miss is a mistake. A large share of CDN objects are requested exactly once, and admitting those one-hit-wonders churns the genuinely popular set out of a small cache. So the edge decides whether to admit at all, by frequency (admit only after N sightings) or by size (don't let one huge object evict many small popular ones). On a hot-object cache, getting admission right moves the hit rate more than any eviction tweak.

The common thread: cheap policies ride one signal, the strong ones combine recency and frequency and keep some memory of keys beyond what's resident (ghost lists in ARC, the frequency sketch in TinyLFU), and a small-cache CDN adds an admission gate because the cost of admitting the wrong object is steep.

Tradeoffs

Policy	Effect
LRU	O(1) with tiny metadata, strong on recency-skewed traffic; collapses under a scan and ignores frequency
LFU	Holds a stable hot set; without aging, once-popular items overstay, and it ignores recency at higher bookkeeping cost
ARC	Adapts recency vs frequency automatically and resists scans; ghost lists roughly double metadata, and the IBM patent limited where it could be used
W-TinyLFU	Near-optimal hit rate at about two bytes per entry; more moving parts, the sketch must be aged, and the window needs tuning
Admission control	Avoids polluting a small cache with one-hit-wonders, a large hit-rate gain on CDN traffic; a wrong rejection costs a hit and the threshold needs tuning to the traffic
Belady / MIN	Optimal, but unimplementable; useful only as the benchmark

Implementations

Minimal pseudocode

# LRU: promote on access, evict from the tail
def get(key):
    if key in cache:
        order.move_to_front(key)          # now most-recently-used
        return cache[key]

def evict_lru():
    drop(order.tail())                    # least-recently-used leaves

# TinyLFU admission: let the newcomer in only if it beats the victim
def admit(candidate, victim, sketch):
    return sketch.estimate(candidate) > sketch.estimate(victim)
# sketch is a count-min sketch, halved periodically so old counts decay

Redis LRU / LFU

Redis caps memory with maxmemory and chooses victims by maxmemory-policy (allkeys-lru, allkeys-lfu, volatile-*, noeviction, and so on). It does not keep a true LRU list, which would cost too much memory; instead each object carries a small clock field, and on eviction Redis samples maxmemory-samples random keys (default 5, raise toward 10 for accuracy) and drops the oldest from the sample, maintaining a candidate pool across rounds. LFU, added in 4.0, replaces the clock with a logarithmic (Morris) counter plus a decay so old popularity fades, tuned by lfu-log-factor and lfu-decay-time. Docs: https://redis.io/docs/latest/develop/reference/eviction/.

Caffeine (Window-TinyLFU)

The Java caching library uses W-TinyLFU as its default. A small admission-window LRU feeds a large segmented-LRU main region, and a TinyLFU frequency sketch decides, on each eviction, whether the entry leaving the window should replace the main region's victim. The sketch is aged by periodically halving its counters so the cache tracks shifting popularity, and the window size self-adjusts by hill climbing on the observed hit rate. The result lands within roughly a few percent of Belady's optimal on real-world traces while adding about two bytes of overhead per entry. Design notes: https://github.com/ben-manes/caffeine/wiki/Efficiency.

CDN admission control (AdaptSize)

On a CDN's hot-object memory cache the decisive question is admission, not eviction, because many objects are evicted before they're ever requested a second time, so admitting everything wastes the small cache on one-hit-wonders. AdaptSize is a size-aware admission policy built on Varnish that continuously retunes its admission threshold with a Markov model of the traffic, and its authors report substantially higher object hit ratios than stock Varnish across production traces. The simpler production technique is frequency-based admission: only cache an object after it has been seen N times, often using a Bloom or counting filter to track the sightings. Paper: https://www.usenix.org/conference/nsdi17/technical-sessions/presentation/berger.