Problem
A single node handles every read and write, so reads compete with writes for the same CPU, memory, and disk. A read-heavy workload pushes that one machine to its limit, and you can't relieve it by spreading reads elsewhere because nowhere else holds the data. If the machine fails, the data is unreachable until it recovers, and any writes not yet flushed to durable storage are lost. There is no second copy to serve traffic in the meantime or to take over in its place.
Solution
Designate one node the leader and route every write to it. The leader records each change to a log and ships that log to one or more followers, which apply the changes in the same order and converge to the same state. Reads can then be served by any node, so directing them to followers offloads the leader and lets you scale read capacity by adding more followers. The replication stream is usually the write-ahead log itself or a logical derivative of it, the same sequential log the WAL piece relied on, reused here as the channel between nodes.
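The log-shipping idea can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular database does it: the `Leader` and `Follower` classes, the `catch_up` method, and the use of a list index as the log offset are all assumptions made for the example.

```python
class Leader:
    def __init__(self):
        self.state = {}
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.log.append((key, value))  # record the change in the log
        self.state[key] = value        # then apply it locally


class Follower:
    def __init__(self):
        self.state = {}
        self.offset = 0  # how many log entries have been applied so far

    def catch_up(self, log):
        # Apply only the entries not yet seen, in the leader's order.
        for key, value in log[self.offset:]:
            self.state[key] = value
        self.offset = len(log)


leader = Leader()
fast, slow = Follower(), Follower()

leader.write("x", 1)
fast.catch_up(leader.log)
slow.catch_up(leader.log)

leader.write("x", 2)
leader.write("y", 3)
fast.catch_up(leader.log)  # slow has not received the last two changes yet

stale_read = slow.state.get("x")  # slow still holds the older value of x
slow.catch_up(leader.log)         # after catching up, it matches the leader
```

Because every follower applies the same entries in the same order, any follower that has reached the end of the log holds the same state as the leader; one that is behind simply holds an older prefix of it.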
Whether the leader waits for followers before confirming a write is the central knob. Synchronous replication has the leader block until a follower acknowledges, so a confirmed write survives the leader's death, but a slow or stalled follower stalls every write behind it. Asynchronous replication confirms immediately and lets followers catch up on their own, which keeps the write path fast but means followers lag and a leader crash can lose any writes that hadn't propagated yet. Semi-synchronous is the middle setting: one follower acknowledges synchronously, the rest trail asynchronously.
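The three settings differ only in how many acknowledgements the leader needs before it confirms. The sketch below makes that explicit under simplifying assumptions: `Mode`, `Follower`, `required_acks`, and `write` are hypothetical names, the synchronous mode is taken to mean "wait for every follower", and a stalled follower is modelled as one that never acknowledges (a real leader would block or time out instead of returning `False`).

```python
from enum import Enum


class Mode(Enum):
    ASYNC = "async"
    SEMI_SYNC = "semi-sync"
    SYNC = "sync"


class Follower:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.log = []

    def replicate(self, change):
        """Return True if the change was received and acknowledged."""
        if not self.reachable:
            return False
        self.log.append(change)
        return True


def required_acks(mode, follower_count):
    if mode is Mode.ASYNC:
        return 0                       # confirm without waiting
    if mode is Mode.SEMI_SYNC:
        return min(1, follower_count)  # one follower must acknowledge
    return follower_count              # assumption: sync waits for all


def write(leader_log, followers, change, mode):
    """Return True if the write can be confirmed to the client."""
    leader_log.append(change)
    acks = sum(f.replicate(change) for f in followers)
    return acks >= required_acks(mode, len(followers))


healthy, stalled = Follower(), Follower(reachable=False)
leader_log = []
results = {
    mode: write(leader_log, [healthy, stalled], f"w-{mode.value}", mode)
    for mode in Mode
}
```

With one stalled follower, the async and semi-sync writes are confirmed while the sync write is not, and the stalled follower's log stays empty: any write confirmed without its acknowledgement exists only on the leader and the healthy follower.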
Failover is where the design earns its difficulty. When the leader dies you must detect that it died (a timeout, but choosing the length trades false promotions against longer downtime), pick the follower with the most up-to-date log, redirect writes to it, and reconfigure the recovered old leader as a follower so it stops accepting writes. Two failures here are expensive: split-brain, where two nodes both believe they lead and accept conflicting writes, and lost writes, where an async failover promotes a follower that never received the leader's last few writes. Production systems lean on a separate coordination component to make the promotion decision rather than letting nodes decide for themselves.
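The failover steps can be sketched as follows. This is an illustration under stated assumptions, not a production procedure: the `Replica` fields, the helper names, and the heartbeat timestamps are hypothetical, and the epoch counter is one common way to keep a stale leader from accepting writes, not something the text above prescribes.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    log_offset: int        # last log position this node has applied
    last_heartbeat: float  # time the node was last heard from
    role: str = "follower"
    epoch: int = 0         # leadership generation this node belongs to


def is_suspected_dead(node, now, timeout):
    # Shorter timeouts detect failures sooner but risk false promotions.
    return now - node.last_heartbeat > timeout


def promote(followers, current_epoch):
    # Pick the most caught-up follower and start a new epoch.
    new_leader = max(followers, key=lambda f: f.log_offset)
    new_leader.role = "leader"
    new_leader.epoch = current_epoch + 1
    return new_leader


def accepts_writes(node, cluster_epoch):
    # A node that still thinks it leads an older epoch is fenced off.
    return node.role == "leader" and node.epoch == cluster_epoch


def rejoin_as_follower(old_leader, new_leader):
    old_leader.role = "follower"
    old_leader.epoch = new_leader.epoch


old = Replica("a", log_offset=120, last_heartbeat=0.0, role="leader", epoch=1)
b = Replica("b", log_offset=118, last_heartbeat=9.5)
c = Replica("c", log_offset=119, last_heartbeat=9.8)

suspected = is_suspected_dead(old, now=10.0, timeout=5.0)
new = promote([b, c], current_epoch=old.epoch)
lost_writes = old.log_offset - new.log_offset  # never reached any follower
old_still_accepts = accepts_writes(old, cluster_epoch=new.epoch)
rejoin_as_follower(old, new)
```

Here the old leader had applied one more entry than the best follower, so an asynchronous failover loses that write. The epoch check shows why the promotion decision usually lives in a coordination component: something has to own the current epoch that every node is checked against.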
Tradeoffs
| Property | Effect |
|---|---|
| Read scaling | Add followers to serve more reads: the main reason to do this |
| Write throughput | Bottlenecked by the single leader; every write serializes through one node, and followers don't help |
| Replication lag | Async followers can serve stale data; a client reading from a follower may not see its own recent write |
| Durability | With synchronous replication, confirmed writes survive leader loss; async can drop the most recent writes |
| Failover | Automatic promotion is complex and error-prone (split-brain, lost writes, timeout tuning): the hard part |
| Read consistency | Read from the leader for fresh data, from followers for scale; you pick per query |
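Per-query routing can also address the read-your-own-writes problem from the replication lag row. The sketch below assumes the client remembers the log offset of its last write and that each follower exposes how far it has applied; `Node`, `applied_offset`, and `route_read` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    applied_offset: int  # how far into the leader's log this node has applied


def route_read(leader, followers, min_offset=0):
    """Prefer a follower that has applied at least min_offset, else the leader."""
    for follower in followers:
        if follower.applied_offset >= min_offset:
            return follower
    return leader  # no follower is fresh enough for this query


leader = Node("leader", applied_offset=9)
followers = [Node("f1", applied_offset=5), Node("f2", applied_offset=8)]

any_read = route_read(leader, followers)                    # staleness is fine
after_write_7 = route_read(leader, followers, min_offset=7)
after_write_9 = route_read(leader, followers, min_offset=9)
```

A read with no freshness requirement goes to the first follower; a client that just wrote at offset 7 is sent to a follower that has applied it; and a client that wrote at offset 9 falls back to the leader because no follower has caught up yet.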
Implementations
Minimal pseudocode
# leader: apply the write, log it, propagate to followers
def write(leader, change):
    leader.apply(change)
    leader.log.append(change)  # the replication stream
    for f in leader.followers:
        f.send(change)  # async: don't wait
        # sync variant: await f.ack() before returning

# follower: apply changes in the leader's order
def follow(follower, stream):
    for change in stream:
        follower.apply(change)  # same order as the leader
        follower.ack()

# read: route by freshness need
def read(query, need_fresh):
    node = leader if need_fresh else pick_follower()
    return node.query(query)

# failover: promote the most caught-up follower
def promote(followers):
    new_leader = max(followers, key=lambda f: f.log_offset)
    redirect_writes_to(new_leader)
    return new_leader
PostgreSQL / MySQL replication
PostgreSQL streaming replication ships WAL records from the primary to standby servers, which replay them to stay current. Standbys are asynchronous by default; on the primary, synchronous_standby_names selects which standbys, and how many, must acknowledge before a commit returns, and synchronous_commit sets how far that acknowledgement must go (written, flushed to disk, or applied on the standby). MySQL replicates by streaming the binary log (binlog) of committed changes to replicas, which apply them by statement, by row, or a mix of the two. Neither ships automatic failover as part of this replication setup; promotion is handled by external tooling (Patroni for Postgres, Orchestrator for MySQL). Postgres docs: https://www.postgresql.org/docs/current/warm-standby.html.
Redis primary–replica
Replicas connect to a primary and receive an asynchronous stream of the commands the primary executed, applying them to mirror its dataset; replicas serve reads while writes stay on the primary. Failure detection and promotion come from Redis Sentinel, a separate set of processes that monitor the primary, agree it is down, and promote a replica. Docs: https://redis.io/docs/management/replication/.
MongoDB replica sets
A replica set is one primary and several secondaries that replicate the primary's operation log (oplog). When the primary becomes unreachable, the members hold an election using a Raft-derived protocol and one secondary becomes the new primary automatically, which is the main thing that distinguishes it from the databases above: failover is built in rather than bolted on. Docs: https://www.mongodb.com/docs/manual/replication/.