Write-Through / Write-Behind | Microservices Course

Problem

Cache-aside puts the cache in front of reads but leaves writes going straight to the store, with the cache invalidated afterward. That keeps writes durable but does nothing for write latency or write volume: every write pays the full store round-trip, and a burst of writes to the same key hits the store once per write even though only the last value matters. When the write path is hot, the store becomes the bottleneck, and there's no layer absorbing or smoothing the load the way the read cache absorbs reads.

The question is what role the cache plays on a write. You can make the cache part of the write path, but then you have to choose what "written" means: written to the store as well, or only written to the cache with the store catching up later. That choice is the whole pattern.

Solution

Route writes through the cache instead of around it. Two variants, differing only in when the store is updated.

Write-through writes the cache and the store together, synchronously, and acknowledges only after both succeed. Once a write is acknowledged, the cache and the store agree, so a read served from the cache is current. The cost is that every write pays the store's latency; the cache does nothing to speed up writes and adds a little overhead of its own.

Write-behind (also called write-back) writes the cache, acknowledges immediately, and updates the store later in the background. Writes return at memory speed. The flush can coalesce repeated writes to the same key into a single store write and batch many keys into one store round-trip, so the number of writes reaching the store drops sharply. The cost is a window in which the new value exists only in the cache; a crash in that window loses it unless the cache itself is durable or backed by a log.

In both, the cache is inline on the write path and the store is no longer written directly by the application. Reads are served from the cache, which is authoritative for the keys it holds. Write-behind's durability gap is the same problem the WAL piece solved: write a cheap sequential log entry synchronously for durability, and let the expensive update to the main structure happen lazily afterward.
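A minimal sketch of a log-backed write-behind cache, to make the durability argument concrete. Everything here is an assumption for illustration: the class name LoggedWriteBehind, the one-JSON-record-per-line log format, and the plain dict standing in for the slow store. Each write is appended and fsynced to the log before it is acknowledged, and a restart replays the log to rebuild the un-flushed entries.

```python
import json
import os
import tempfile


class LoggedWriteBehind:
    """Write-behind cache whose un-flushed writes survive a crash via a log."""

    def __init__(self, log_path, store):
        self.log_path = log_path
        self.store = store              # a dict standing in for the slow store
        self.cache = {}
        self.dirty = {}                 # key -> latest value not yet in the store
        self._replay()
        self.log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild cache and dirty set from log records left by a previous run."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                try:
                    key, val = json.loads(line)
                except json.JSONDecodeError:
                    break               # torn final record: it was never acknowledged
                self.cache[key] = val
                self.dirty[key] = val

    def put(self, key, val):
        # Cheap sequential append, forced to disk before the ack.
        self.log.write(json.dumps([key, val]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.cache[key] = val
        self.dirty[key] = val

    def flush(self):
        # Expensive store update, done lazily. A crash between the store write
        # and the truncation only causes an idempotent re-apply on replay.
        self.store.update(self.dirty)
        self.dirty.clear()
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")  # truncate the log


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
store = {}

first = LoggedWriteBehind(log_path, store)
first.put("a", 1)
first.put("a", 2)
first.log.close()                       # simulate a crash before any flush
store_before_recovery = dict(store)     # the store never saw the writes

recovered = LoggedWriteBehind(log_path, store)   # restart replays the log
recovered.flush()
```

After the simulated crash the store is still empty, yet the restarted instance recovers the latest value from the log and flushes it. The log costs one small sequential write per acknowledged write, so write-behind with a log is slower than pure in-memory write-behind but still avoids the store round-trip.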

Tradeoffs

Property	Effect
Write latency	Write-through pays the full store latency on every write; write-behind returns at cache speed
Consistency	Write-through keeps cache and store in lockstep, so cached reads are never stale; write-behind lets them diverge until the flush
Durability	Write-through loses nothing on crash; write-behind loses any un-flushed writes unless the cache is durable or log-backed
Store throughput	Write-behind coalesces repeated writes per key and batches across keys, cutting store write volume; write-through does neither
Read latency	Hot keys served from cache in both; this is the same win as cache-aside
Failure handling	Write-through fails the write when the store is down; write-behind absorbs the outage into its queue but widens the loss window and must apply back-pressure when the queue fills
Complexity	Write-through is straightforward; write-behind needs a flush queue with ordering, retry, and back-pressure, and is the harder of the two to get right
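
The failure-handling and complexity rows can be sketched in a few lines. This is a single-threaded illustration under stated assumptions: FlakyStore, StoreDown, Backpressure, and BoundedWriteBehind are hypothetical names, and the bound is placed on the number of distinct dirty keys rather than on a real queue. A failed flush keeps its entries for retry, and a write that would exceed the bound is rejected instead of silently widening the loss window.

```python
class StoreDown(Exception):
    pass


class Backpressure(Exception):
    pass


class FlakyStore:
    """Stand-in store that can be switched off to simulate an outage."""

    def __init__(self):
        self.rows = {}
        self.up = True

    def write_many(self, items):
        if not self.up:
            raise StoreDown()
        self.rows.update(items)


class BoundedWriteBehind:
    def __init__(self, store, max_dirty):
        self.store = store
        self.max_dirty = max_dirty
        self.cache = {}
        self.dirty = {}                 # key -> latest un-flushed value

    def put(self, key, val):
        # Reject new keys once the backlog is full; updates to a key that is
        # already dirty coalesce and do not grow the backlog.
        if key not in self.dirty and len(self.dirty) >= self.max_dirty:
            raise Backpressure("flush backlog full; retry later")
        self.cache[key] = val
        self.dirty[key] = val

    def flush(self):
        """Try to write the backlog; return True if the store accepted it."""
        if not self.dirty:
            return True
        try:
            self.store.write_many(dict(self.dirty))
        except StoreDown:
            return False                # keep the entries and retry later
        self.dirty.clear()
        return True


store = FlakyStore()
wb = BoundedWriteBehind(store, max_dirty=2)

store.up = False                        # outage begins
wb.put("k0", 1)
wb.put("k1", 1)
flushed_during_outage = wb.flush()      # fails, backlog retained

try:
    wb.put("k2", 1)                     # third distinct key exceeds the bound
    rejected = False
except Backpressure:
    rejected = True

wb.put("k0", 2)                         # existing dirty key still accepted
store.up = True                         # outage ends
flushed_after_recovery = wb.flush()
```

Keeping only the last value per key sidesteps ordering within a key. Ordering across keys, concurrent writers, and retry scheduling are left out here and are where most of the real complexity sits.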

Implementations

Minimal pseudocode

# write-through: store and cache written together, ack after both
def write_through(key, val):
    db.write(key, val)            # synchronous; a failure here fails the write
    cache.set(key, val)           # cache updated only once the store has accepted
    return ack

# write-behind: ack after the cache, store updated later in batches
def write_behind(key, val):
    cache.set(key, val)
    queue.enqueue(key, val)       # ack immediately; flush is deferred
    return ack

def flush(queue, db):             # runs in the background
    batch = queue.drain()         # coalesces repeats (last value per key), groups many keys
    db.write_many(batch)          # one store round-trip for the batch
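
For readers who want to run the idea, here is one possible in-memory rendering of the pseudocode. CountingStore, WriteThroughCache, and WriteBehindCache are hypothetical names, the store is a dict that counts round-trips, and flush is called by hand rather than by a background task. This sketch writes the store before the cache in the write-through path, so a failed store write leaves the cache untouched.

```python
class CountingStore:
    """Stand-in for the backing store; counts round-trips."""

    def __init__(self):
        self.rows = {}
        self.round_trips = 0

    def write(self, key, val):
        self.round_trips += 1
        self.rows[key] = val

    def write_many(self, items):
        self.round_trips += 1
        self.rows.update(items)


class WriteThroughCache:
    def __init__(self, store):
        self.store = store
        self.data = {}

    def put(self, key, val):
        self.store.write(key, val)      # synchronous store write comes first
        self.data[key] = val            # cache updated once the store accepted


class WriteBehindCache:
    def __init__(self, store):
        self.store = store
        self.data = {}
        self.dirty = {}                 # key -> latest un-flushed value

    def put(self, key, val):
        self.data[key] = val
        self.dirty[key] = val           # overwriting coalesces repeated writes

    def flush(self):
        if not self.dirty:
            return
        # Note: a real flusher would put the batch back if write_many failed.
        batch, self.dirty = self.dirty, {}
        self.store.write_many(batch)    # one round-trip for every dirty key


through_store = CountingStore()
through = WriteThroughCache(through_store)
behind_store = CountingStore()
behind = WriteBehindCache(behind_store)

for i in range(100):
    through.put("counter", i)
    behind.put("counter", i)
behind.put("other", "x")
behind.flush()

print(through_store.round_trips, behind_store.round_trips)
```

With 100 writes to one key, the write-through store sees 100 round-trips, while the write-behind store sees a single batched round-trip carrying only the final value of each key.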

DynamoDB Accelerator (DAX)

A write-through cache in front of DynamoDB. A write goes through DAX to the underlying table: DAX writes the item to DynamoDB first and then updates its own cache, and the call succeeds only after the DynamoDB write has succeeded, so subsequent reads through DAX see the value just written. You get sub-millisecond cached reads without a staleness gap on the write path, at the cost of every write carrying DynamoDB's write latency. Docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.concepts.html.

Database buffer pools

The buffer pool (InnoDB's, Postgres's shared buffers) is a write-behind cache over the data files. A modified page is changed in memory, marked dirty, and acknowledged before it reaches disk; background writers and checkpoints flush dirty pages to the data files later, coalescing many changes to a page into fewer disk writes. The loss window is closed by the write-ahead log: the redo record is forced to disk at commit, so a crash replays the log to recover changes the data files hadn't received yet. Write-behind for the pages, made durable by a synchronous log. InnoDB: https://dev.mysql.com/doc/refman/8.0/en/innodb-buffer-pool.html.
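
A toy model of that mechanism, not a description of how InnoDB or Postgres implement it. TinyBufferPool is a hypothetical name, the "data files" and the "redo log" are ordinary Python containers assumed to survive a crash, and the checkpoint simply clears the log, where a real engine would advance a checkpoint position instead.

```python
class TinyBufferPool:
    def __init__(self, data_files, redo_log):
        self.data_files = data_files    # page_id -> {row_id: value}; "on disk"
        self.redo_log = redo_log        # list of (page_id, row_id, value); "on disk"
        self.pages = {}                 # in-memory page copies, lost on crash
        self.dirty = set()              # page ids changed since the last checkpoint
        self.page_writes = 0

    def _load(self, page_id):
        if page_id not in self.pages:
            self.pages[page_id] = dict(self.data_files.get(page_id, {}))
        return self.pages[page_id]

    def commit(self, page_id, row_id, value):
        self.redo_log.append((page_id, row_id, value))  # redo record first
        self._load(page_id)[row_id] = value             # then change the page in memory
        self.dirty.add(page_id)                         # data file is now behind

    def checkpoint(self):
        # One data-file write per dirty page, however many rows changed in it.
        for page_id in self.dirty:
            self.data_files[page_id] = dict(self.pages[page_id])
            self.page_writes += 1
        self.dirty.clear()
        self.redo_log.clear()

    def recover(self):
        # Replay redo records over whatever the data files contain.
        for page_id, row_id, value in self.redo_log:
            self._load(page_id)[row_id] = value
            self.dirty.add(page_id)


data_files, redo_log = {}, []
pool = TinyBufferPool(data_files, redo_log)
for row in range(50):
    pool.commit(page_id=0, row_id=row, value="v1")
pool.checkpoint()                       # 50 row changes, one page write
pool.commit(page_id=0, row_id=7, value="v2")    # committed, not yet checkpointed

# Crash: the in-memory pool is gone; data_files and redo_log remain.
restarted = TinyBufferPool(data_files, redo_log)
restarted.recover()
```

The data file still holds the old value for row 7 after the crash, and replaying the single remaining redo record brings the restarted pool back to the committed state.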

CDN origins (push model)

Most CDN caching is read-through: an edge pulls from the origin on a miss, so, like cache-aside, the cache is populated lazily on first read. A push CDN inverts the write path: you upload content to the CDN and it propagates to the edges, so the edge is populated on write rather than on first read. That makes it write-through-shaped: the write reaches the serving layer before any client asks for it, which suits large assets you know will be requested and want warm everywhere ahead of time. Cloudflare's caching documentation, which covers the pull model: https://developers.cloudflare.com/cache/.
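
The difference between the two population models can be shown with a small simulation. It does not use any CDN's API: Edge and push are hypothetical names, and the origin is a dict.

```python
class Edge:
    """An edge cache that pulls from the origin on a miss."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.cache:      # pull model: populate on first read
            self.origin_fetches += 1
            self.cache[path] = self.origin[path]
        return self.cache[path]


def push(origin, edges, path, body):
    """Push model: the upload populates every edge before any client asks."""
    origin[path] = body
    for edge in edges:
        edge.cache[path] = body


origin = {"/pulled.bin": b"a"}
edges = [Edge(origin) for _ in range(3)]
push(origin, edges, "/pushed.bin", b"b")

for edge in edges:
    edge.get("/pulled.bin")             # first request misses and hits the origin
    edge.get("/pushed.bin")             # already warm, no origin fetch

total_origin_fetches = sum(edge.origin_fetches for edge in edges)
```

Each edge fetches the pulled asset from the origin once, while the pushed asset never causes an origin fetch, at the price of copying it to every edge whether or not it is requested there.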