Two-Phase & Three-Phase Commit

Problem

A transaction touches data on several independent participants, separate shards, databases, or services, and must be atomic across all of them: every participant commits or every participant aborts, with no partial outcome, even if a participant crashes or the network fails partway through. Letting each participant commit on its own invites exactly the failure a transaction exists to prevent, money debited on one node and never credited on the other.

This is a different problem from agreeing on an ordered log. Here the participants are fixed (the ones this transaction touched) and the decision is a single bit: commit or abort. The job is to get all of them to reach and honor that one decision despite failures.

Solution

Two-phase commit. In phase one a coordinator asks every participant to prepare: do the work, durably record it in a state from which it can still go either way, hold its locks, and vote. A yes vote is a binding promise that the participant will commit if told to and will not unilaterally abort. In phase two, if every participant voted yes the coordinator durably logs commit and instructs everyone to commit; if any voted no or timed out, it logs abort and instructs everyone to abort. The coordinator's durable decision is the single source of truth that makes the outcome atomic.

The weakness that shapes everything after 2PC is blocking. A participant that voted yes is uncertain: bound to the coordinator's eventual decision but unable to act alone. If the coordinator crashes after collecting votes but before broadcasting the decision, every prepared participant stalls, holding its locks, until the coordinator recovers and replays its log. The coordinator is both a single point of failure and a single point of blocking.

Three-phase commit inserts a pre-commit phase so a participant can infer the outcome from its own state and time out toward a decision rather than wait forever. It removes blocking only under a synchronous, partition-free network; during a real partition it can let two sides decide differently, so production almost never uses it, because it trades a safety property for a liveness one that isn't acceptable.

The approaches that scale remove the single blocking coordinator instead. One path replicates the commit decision through consensus so a coordinator crash strands no one: Spanner runs 2PC where each participant and the coordinator are themselves Paxos groups, layered on TrueTime, so any individual node failure is just a group failover. The other path abandons distributed atomic commit altogether and uses sagas: a chain of local transactions, each paired with a compensating transaction that undoes it. A saga holds no global locks and never blocks, at the cost of isolation, since intermediate states are visible and a failure is unwound by compensation rather than rolled back.

Tradeoffs

PropertyEffect
Atomicity2PC delivers all-or-nothing across participants, the reason to use it
BlockingA coordinator crash after voting strands prepared participants holding locks, the defining flaw
Lock durationParticipants hold locks across the full round-trip, hurting throughput and tail latency
Single point of failureThe coordinator's durable log is on the critical path for both commit and recovery
3PCNon-blocking only under synchrony with no partitions, and unsafe under partition, so rarely deployed
Consensus-backed commitReplicating the decision via Paxos or Raft removes blocking at the cost of extra latency
SagasNo blocking and no global locks, but no isolation and application-level compensation logic

Implementations

Minimal pseudocode (2PC)

# coordinator
def commit_txn(participants, txn):
for p in participants: # phase 1: prepare + vote
if p.prepare(txn) != YES:
broadcast(participants, ABORT); return ABORT
log.write(COMMIT) # durable decision point
for p in participants: # phase 2: deliver, retry until acked
p.commit(txn)
return COMMIT
# participant
def prepare(txn):
do_work(txn); persist(PREPARED) # can still commit OR abort
return YES # binding: now awaits the decision
def commit(txn):
apply(txn); persist(COMMITTED); release_locks()

The line that creates the blocking problem is return YES after persist(PREPARED): the participant has given up the right to decide on its own, so if the coordinator goes silent it can only wait.

XA transactions

The X/Open XA standard defines the interface between a transaction manager and resource managers (databases, message brokers) that implement 2PC, exposed in Java through JTA and XAResource. It's broadly supported across engines like Oracle, PostgreSQL, and MySQL, and it's also where teams meet 2PC's operational cost firsthand: a coordinator failure leaves in-doubt transactions holding locks that sometimes require manual resolution.

Distributed RDBMS

Sharded and distributed relational databases use 2PC internally to commit a transaction that spans nodes. PostgreSQL exposes the primitives directly with PREPARE TRANSACTION and COMMIT PREPARED, which write the prepared state to disk so the transaction survives a restart and can be completed or rolled back during recovery, the same prepare-then-decide shape as the protocol above.

Spanner-style and sagas

Spanner performs 2PC across the Paxos groups that own each affected key range, so every participant is a replicated state machine and a node crash becomes a failover rather than a stall; CockroachDB takes a similar approach over its Raft ranges. Where even that coordination is too costly, microservice architectures replace atomic commit with sagas, orchestrated by tools such as Temporal or Camunda or run as event choreography, accepting eventual consistency and writing explicit compensating actions in exchange for no distributed locks and no blocking coordinator.