Read Replicas & Replication Lag | Microservices Course

Problem

The leader-follower piece showed that adding followers scales reads, since any of them can serve a query. The catch is what the same piece flagged about asynchronous replication: a follower trails the leader, so a write committed on the leader may not yet have reached a given replica. Route a read to a lagging replica and it returns stale data, which surfaces as concrete, user-visible anomalies rather than an abstract inconsistency.

Three are common. A user edits their profile and immediately reads it back from a replica that hasn't caught up, sees the old value, and concludes the save failed (a read-your-writes violation). Two successive reads land on replicas with different lag, so the second read returns an older state than the first and data appears to vanish or move backward in time (a monotonic reads violation). Across partitions, a reader can see an effect before its cause when the relevant writes replicate at different speeds (a consistent-prefix violation).
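
The first two anomalies can be reproduced with a toy model. The sketch below is illustrative only: `Replica`, `catch_up`, and the sample log are hypothetical names, and a real replica applies a WAL or binlog stream rather than a Python list.

```python
class Replica:
    """Toy replica that has applied only a prefix of the leader's log."""

    def __init__(self, name):
        self.name = name
        self.applied = 0      # number of log entries applied so far
        self.data = {}

    def catch_up(self, log, upto):
        # Apply log entries in order until `upto` entries have been applied.
        for key, value in log[self.applied:upto]:
            self.data[key] = value
        self.applied = max(self.applied, upto)

    def get(self, key):
        return self.data.get(key)


# Leader log: two committed writes to the same profile field.
log = [("name", "Ann"), ("name", "Anna")]

fresh, stale = Replica("fresh"), Replica("stale")
fresh.catch_up(log, 2)   # has both writes
stale.catch_up(log, 1)   # one write behind

# Read-your-writes violation: the user just wrote "Anna" but reads "Ann".
read_after_write = stale.get("name")

# Monotonic reads violation: the first read hits `fresh`, the second hits
# `stale`, so the second read returns an older value than the first.
first, second = fresh.get("name"), stale.get("name")
```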

Lag is also not a small constant you can wave away. It spikes under write bursts, when a replica restarts and replays a backlog, during long transactions, and on network trouble, so any design that assumes "the replica is basically current" breaks exactly when load is highest. And a replica that has fallen far behind is a liability on failover: promoting it loses every acknowledged write it never received, the durability cost the leader-follower piece attributed to asynchronous replication.

Solution

Choose the guarantee per read and route accordingly, spending only as much as each read needs. The techniques run from cheap and weak to expensive and strong.

Most reads tolerate seconds of staleness, a timeline or a search result among them, and those go to replicas freely, which is the entire reason replicas exist. For the reads that can't tolerate it, the simplest fix is to send them to the leader: after a user writes, route that user's reads to the leader for a short window so they always see their own change. It is one line of routing logic, but it pushes load back onto the leader and undoes the scaling if you apply it broadly.
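
A minimal sketch of that leader-pinning rule, assuming an in-memory map keyed by user and a pin window chosen to exceed typical lag. `PinningRouter`, `record_write`, and `target_for_read` are hypothetical names; a multi-instance service would need to keep the timestamp somewhere shared, such as a cookie or a cache.

```python
import time


class PinningRouter:
    """Routes a user's reads to the leader for `pin_seconds` after they write."""

    def __init__(self, pin_seconds=5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock            # injectable so the window can be tested
        self.last_write = {}          # user_id -> time of most recent write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target_for_read(self, user_id):
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "leader"           # recent writer: must see own change
        return "replica"              # everyone else: staleness acceptable
```

The window is a guess about lag, not a guarantee: if a replica is further behind than `pin_seconds`, the user can still read stale data once the pin expires.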

The scalable version uses a position token. When the leader commits a write it returns its log position, the same offset the WAL and replication-stream pieces relied on, exposed as an LSN in Postgres or a GTID in MySQL. The client carries that token and presents it on the next read, and the router picks a replica that has already applied up to at least that position, waits briefly for one to catch up, or falls back to the leader if none has. This gives read-your-writes without funneling everything through the leader. Pinning a session to one replica (sticky routing) handles monotonic reads, since a user who always reads from the same replica never goes backward, as long as that replica stays healthy and in rotation. Bounded staleness goes further by having a replica refuse or delay a read until its lag is under a threshold, and a lag-aware pool simply removes a replica from rotation once it falls too far behind.
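
Sticky routing can be sketched as a stable hash from session to replica. This is one possible approach, not a prescribed one: `sticky_replica` and the replica names are hypothetical, and `hashlib` is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib


def sticky_replica(session_id, replica_names):
    """Map a session to one replica, stably across processes and restarts."""
    if not replica_names:
        raise ValueError("no replicas available")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replica_names)
    return replica_names[index]
```

The modulo mapping reshuffles many sessions whenever the replica list changes; consistent hashing would reduce that churn, and carrying a position token alongside the session covers the reads that do get remapped.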

Tradeoffs

Property	Effect
Read scaling	Add replicas to serve reads cheaply: the main reason to do this
Staleness	Async lag means replica reads can be behind, surfacing as read-your-writes and monotonic-read anomalies
Read-your-writes cost	Leader-pinning is trivial but puts read load back on the leader; position tokens keep the scaling but need plumbing through the client
Lag variability	Lag spikes under load and on restart, so any bounded-staleness assumption fails when it matters most
Routing complexity	Per-read consistency requires a router that knows each replica's applied position
Failover durability	Promoting a lagging replica loses acknowledged writes; synchronous replication avoids it by paying write latency

Implementations

Minimal pseudocode

import random

# write returns the leader's log position so the client can read its own writes
def write(leader, change):
    leader.apply(change)
    return leader.commit_position()        # LSN (Postgres) / GTID (MySQL)

# read: serve from a replica caught up to `since`, else fall back to the leader.
# Positions are assumed to be totally ordered (true for an LSN; a GTID set
# needs a subset check instead of >=).
def read(leader, replicas, query, since=None):
    if since is None:                                  # staleness acceptable
        pool = replicas or [leader]                    # no replicas: use leader
        return random.choice(pool).query(query)
    for r in replicas:
        if r.applied_position() >= since:              # has the user's write
            return r.query(query)
    return leader.query(query)                          # none caught up yet

# keep laggards out of the read pool
def read_pool(replicas, max_lag):
    return [r for r in replicas if r.lag_seconds() <= max_lag]
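
The Solution section also mentions waiting briefly for a replica to catch up before falling back, which the pseudocode above leaves out. A sketch of that variant follows, assuming polling with a deadline. `read_with_wait` and `FakeNode` are hypothetical; `FakeNode` only simulates replay progress so the example runs on its own.

```python
import time


class FakeNode:
    """Hypothetical stand-in for a database node, for illustration only."""

    def __init__(self, name, position, step=0):
        self.name, self.position, self.step = name, position, step

    def applied_position(self):
        self.position += self.step     # simulate replay progress per poll
        return self.position

    def query(self, query):
        return (self.name, query)


def read_with_wait(leader, replicas, query, since, timeout=0.2, poll=0.01,
                   clock=time.monotonic, sleep=time.sleep):
    """Serve from a caught-up replica, waiting up to `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        for r in replicas:
            if r.applied_position() >= since:
                return r.query(query)  # this replica has the write
        if clock() >= deadline:
            return leader.query(query) # nobody caught up in time
        sleep(poll)
```

Keeping `timeout` short bounds the latency a read can pay; past that point the leader absorbs the read, which is the same fallback as before.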

RDBMS read replicas (Postgres / MySQL)

Streaming replication ships the leader's WAL in Postgres or binlog in MySQL to read replicas, and applications offload SELECTs to them. Both expose a position the client can reason about: Postgres reports a standby's replay LSN through pg_last_wal_replay_lsn(), which a session can check or poll until the replica reaches a given LSN, while MySQL tracks GTIDs and offers WAIT_FOR_EXECUTED_GTID_SET to block until a replica has executed a specific set of transactions. Managed offerings (RDS and Aurora read replicas, Cloud SQL) package this with a replica-lag metric you can alert on, and read-your-writes is handled by routing recent writers to the primary or waiting on a position. Postgres docs: https://www.postgresql.org/docs/current/hot-standby.html.
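
On the Postgres side, an LSN is printed as two hexadecimal numbers separated by a slash, so comparing tokens as strings gives wrong answers. The sketch below shows one way a router might compare them numerically; `parse_lsn`, `replica_has`, and the two constant names are hypothetical, while `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()` are the Postgres functions that would supply the values.

```python
def parse_lsn(text):
    """Convert a Postgres LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replica_has(replay_lsn, token_lsn):
    """True if the replica has replayed at least up to the client's token."""
    return parse_lsn(replay_lsn) >= parse_lsn(token_lsn)


# Queries that would supply the two values: the first runs on the primary
# after the write commits, the second runs on the replica before the read.
PRIMARY_POSITION_SQL = "SELECT pg_current_wal_lsn()"
REPLICA_POSITION_SQL = "SELECT pg_last_wal_replay_lsn()"
```

The comparison can also be done inside the database, since the pg_lsn type supports ordering operators; doing it in the router avoids a round trip when replica positions are already being collected for lag monitoring.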

PlanetScale / Vitess

Vitess, which PlanetScale is built on, routes reads by tablet type: querying keyspace@primary gives read-after-write consistency from the primary, while keyspace@replica load-balances across replicas for eventually consistent reads meant for OLTP traffic (and @rdonly for analytics). This lets the same application target strong or stale reads per query through the connection or a USE statement. For the in-between case, Vitess can route a read to a replica only once that replica has executed at least a given GTID, providing causal consistency while still offloading the read from the primary. Docs: https://vitess.io/docs/reference/features/tablet-types/.
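
A sketch of how an application might pick the target per read. The `keyspace@type` form is the one described above; the helper names and the `commerce` keyspace in the usage line are hypothetical, and how the target is passed (connection database name or USE statement) depends on the client.

```python
TABLET_TYPES = ("primary", "replica", "rdonly")


def vitess_target(keyspace, tablet_type):
    """Build a 'keyspace@type' target string."""
    if tablet_type not in TABLET_TYPES:
        raise ValueError(f"unknown tablet type: {tablet_type!r}")
    return f"{keyspace}@{tablet_type}"


def target_for(keyspace, needs_own_writes=False, analytics=False):
    """Choose the weakest tablet type that still satisfies the read."""
    if needs_own_writes:
        return vitess_target(keyspace, "primary")   # read-after-write
    if analytics:
        return vitess_target(keyspace, "rdonly")    # long-running scans
    return vitess_target(keyspace, "replica")       # eventually consistent
```

For example, `target_for("commerce", needs_own_writes=True)` yields `commerce@primary`, while the default yields `commerce@replica`.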