Problem
You want only one node at a time to act on a resource: write to a file, serve as the primary, run a scheduled job. A plain lock fails in two ways. If the holder crashes while holding it, the lock is stuck forever, because nothing can distinguish a crashed holder from a slow one. And even with a timeout, the holder can be paused past that timeout without noticing (a stop-the-world garbage collection, a hypervisor freeze, a slow network round-trip); during the pause the lock expires and another node legitimately acquires it. When the paused node resumes, it still believes it holds the lock and writes to the resource. Two nodes now act as the exclusive owner, and their conflicting writes can corrupt the resource.
Checking "is my lock still valid?" right before writing doesn't save you, because the check and the write straddle the pause: valid at the check, expired by the write.
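The race can be reproduced deterministically with a manually advanced clock. The sketch below is illustrative only: `FakeClock`, `TimeoutLock`, and the node names are hypothetical, and the "resource" is a plain list that accepts every write, which is exactly the missing protection.

```python
class FakeClock:
    """Manually advanced clock, so the pause is deterministic."""
    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def advance(self, seconds):
        self.t += seconds


class TimeoutLock:
    """A lock with a timeout but no fencing (hypothetical, for illustration)."""
    def __init__(self, clock, ttl):
        self.clock = clock
        self.ttl = ttl
        self.holder = None
        self.expiry = 0.0

    def acquire(self, node):
        # Deny only while the current holder's timeout has not passed.
        if self.holder is not None and self.clock.now() < self.expiry:
            return False
        self.holder = node
        self.expiry = self.clock.now() + self.ttl
        return True

    def is_valid(self, node):
        return self.holder == node and self.clock.now() < self.expiry


clock = FakeClock()
lock = TimeoutLock(clock, ttl=10.0)
log = []  # the unprotected resource: it accepts every write

lock.acquire("A")
check_passed = lock.is_valid("A")  # A checks at t=0: the lock is valid
clock.advance(15.0)                # A is paused for 15s; the lock expired at t=10
lock.acquire("B")                  # B legitimately takes over
log.append("B")
if check_passed:                   # A resumes and acts on its stale check
    log.append("A")

print(log)  # ['B', 'A']: both nodes wrote as the "exclusive" owner
```

Re-checking `is_valid` immediately before `log.append("A")` only narrows the window; a pause can still land between that check and the write.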
Solution
Two mechanisms, layered.
A lease is a lock with an expiry. The grantor gives exclusive rights for a bounded interval, the holder must renew before it lapses, and if the holder crashes or goes silent, the lease simply expires and the resource frees up with no human intervention. That fixes the stuck-forever failure. To stay safe against clock skew, the holder treats its lease as expiring slightly earlier than the grantor does, so it stops acting before the grantor would consider the lease available to anyone else.
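A minimal sketch of the two views of one lease: the grantor's expiry and the holder's earlier, conservative one. The names (`Grantor`, `holder_may_act`) and the `TTL` and `SKEW` values are assumptions for illustration. Time is passed in explicitly so the example is deterministic; a real holder would measure against its own clock, typically from the moment it sent the request.

```python
from dataclasses import dataclass
from typing import Optional

TTL = 10.0   # lease length as the grantor sees it (assumed value)
SKEW = 2.0   # margin the holder subtracts for drift and delay (assumed value)


@dataclass(frozen=True)
class Lease:
    holder: str
    expiry: float  # the grantor's view of when the lease lapses


class Grantor:
    def __init__(self):
        self.lease: Optional[Lease] = None

    def acquire(self, client: str, now: float) -> Optional[Lease]:
        """Grant or renew; deny while another client's lease is unexpired."""
        current = self.lease
        if current is not None and now < current.expiry and current.holder != client:
            return None
        self.lease = Lease(client, now + TTL)
        return self.lease


def holder_may_act(lease: Lease, now: float) -> bool:
    # The holder stops SKEW seconds before the grantor would free the lease.
    return now < lease.expiry - SKEW


grantor = Grantor()
lease = grantor.acquire("A", now=0.0)         # grantor expiry: t=10
lease = grantor.acquire("A", now=5.0)         # renewal pushes expiry to t=15
print(holder_may_act(lease, now=12.0))        # True: inside the safe window
print(holder_may_act(lease, now=13.5))        # False: the holder stops at t=13
print(grantor.acquire("B", now=14.0))         # None: grantor honors A until t=15
print(grantor.acquire("B", now=15.0).holder)  # B: A never renewed, the lease lapsed
```

Between t=13 and t=15 nobody acts on the resource. That gap is the price of the safety margin.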
A lease alone doesn't fix the pause, so add fencing tokens. Each grant carries a monotonically increasing number from the lock authority. Every operation on the protected resource includes its token, the resource records the highest token it has accepted, and it rejects any operation carrying a lower one. When the paused old holder wakes and writes with its now-stale, smaller token, the resource refuses it, because a newer holder has already written with a higher token. The part that makes this work is that the resource itself enforces the token. A lock service can hand out exclusivity, but only the downstream resource can stop a zombie that has stopped talking to the lock service entirely.
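The zombie scenario can be walked through in a few lines. This is a sketch under stated assumptions: `LockAuthority` and `FencedStore` are hypothetical names, lease expiry is not modeled (only the token ordering is), and a real resource would have to persist `max_token_seen` and perform the check and the write atomically.

```python
import itertools


class LockAuthority:
    """Issues a strictly increasing token with every grant (hypothetical)."""
    def __init__(self):
        self._tokens = itertools.count(1)

    def grant(self) -> int:
        return next(self._tokens)


class FencedStore:
    """The protected resource: it enforces the fence itself."""
    def __init__(self):
        self.max_token_seen = 0
        self.data = []

    def write(self, value, token) -> bool:
        if token < self.max_token_seen:
            return False                 # stale token: reject the zombie's write
        self.max_token_seen = token
        self.data.append(value)
        return True


authority = LockAuthority()
store = FencedStore()

token_a = authority.grant()              # A gets token 1, then is paused
token_b = authority.grant()              # A's lease lapses; B gets token 2
print(store.write("from B", token_b))    # True
print(store.write("from A", token_a))    # False: 1 < 2, the late write is fenced off
print(store.data)                        # ['from B']
```

If A's write had arrived before B wrote anything, the store would have accepted it, since no higher token had been seen yet. The fence only bites once the newer holder has touched the resource.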
The pause still happens. Fencing doesn't prevent the race; it makes the late write harmless.
Tradeoffs
| Property | Effect |
|---|---|
| Automatic recovery | Lease expiry frees a dead holder's lock with no operator action; this is the reason to prefer leases over plain locks |
| Failover vs stability | A short lease recovers fast but renews often and falsely expires under transient slowness; a long lease is stable but recovers slowly |
| Clock dependence | Leases assume bounded drift, so the holder must expire conservatively to stay safe |
| Resource cooperation | Fencing requires the protected resource to validate and persist the highest token; without that, leases alone are unsafe for correctness-critical writes |
| Token source | The monotonic counter usually comes from a consensus log (a zxid or Raft index), tying the lock to a consensus system |
| Limited scope | Fencing stops the damage, not the underlying pause or the duplicated belief in ownership |
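The failover-versus-stability row can be made concrete with rough arithmetic. The model below is an assumption, not something the pattern prescribes: the holder renews at a fixed fraction of the lease length, a holder that dies just after renewing blocks others for a full lease, and a renewal may run late by whatever remains of the lease before it falsely expires. The function name and its parameters are hypothetical.

```python
def lease_tradeoffs(ttl, renew_fraction=0.5):
    """Rough per-lease numbers under the simple model described above."""
    renew_interval = ttl * renew_fraction
    return {
        # A holder that crashes right after renewing blocks others for a full TTL.
        "worst_case_failover_s": ttl,
        # Steady-state renewal traffic generated by one lease.
        "renewals_per_minute": 60.0 / renew_interval,
        # How late a renewal can be before the lease falsely expires.
        "tolerated_stall_s": ttl - renew_interval,
    }


for ttl in (2.0, 30.0):
    print(ttl, lease_tradeoffs(ttl))
```

With these assumptions a 2-second lease recovers within 2 seconds but renews 60 times a minute and tolerates only a 1-second stall, while a 30-second lease renews 4 times a minute and tolerates a 15-second stall at the cost of up to 30 seconds of unavailability.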
Implementations
Minimal pseudocode
    # lock service: grant a lease with a monotonic token
    def acquire(client):
        if held and not expired(): return DENIED
        token = next_token()  # monotonic, e.g. consensus index
        grant(client, expiry=now() + TTL, token=token)
        return (LEASE, expiry, token)

    # holder: renew before expiry, carry the token on every write
    def use(resource, data, token, expiry):
        if now() > expiry - SKEW: renew_or_stop()  # treat the lease as expiring early
        resource.write(data, token)

    # resource: reject any write carrying a stale token ← the safety check
    def write(data, token):
        if token < self.max_token_seen: return REJECTED  # fence off the zombie
        self.max_token_seen = token
        apply(data)
The resource-side `token < max_token_seen` check is the line that provides safety; the lease and renewal around it are about availability and recovery, not correctness.
GFS chunk leases
The GFS master grants a lease, around sixty seconds and renewable through heartbeats, to one replica of a chunk, designating it the primary that serializes all mutations to that chunk for the lease's duration. When a primary fails or is partitioned away, its authority lapses once the lease expires, and the master can then safely grant a fresh lease to another replica without ever hearing from the old primary. Each chunk also carries a version number that the master increments whenever it grants a new lease, so a replica that missed updates shows an older version and is detected as stale and garbage-collected, which is the fencing role applied to replicas.
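The version-number mechanism can be sketched as a toy model. This is not GFS code: `Master`, `Replica`, and the chunk id are hypothetical, and the sketch only captures the idea that a new lease bumps the version on the replicas the master can reach, so an unreachable replica is left holding an older number.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.versions = {}   # chunk id -> version this replica holds


class Master:
    def __init__(self):
        self.latest = {}     # chunk id -> latest version handed out

    def grant_lease(self, chunk_id, reachable):
        """Bump the chunk version and inform only the reachable replicas."""
        version = self.latest.get(chunk_id, 0) + 1
        self.latest[chunk_id] = version
        for replica in reachable:
            replica.versions[chunk_id] = version
        return reachable[0]  # the primary; picking the first is arbitrary here

    def stale(self, chunk_id, replicas):
        """Names of replicas whose version lags the master's."""
        latest = self.latest[chunk_id]
        return [r.name for r in replicas if r.versions.get(chunk_id, 0) < latest]


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
master = Master()
master.grant_lease("chunk-7", [r1, r2, r3])   # version 1 on all three
master.grant_lease("chunk-7", [r1, r2])       # r3 unreachable: it stays at version 1
print(master.stale("chunk-7", [r1, r2, r3]))  # ['r3']
```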
ZooKeeper locks
The standard ZooKeeper lock recipe creates an ephemeral sequential znode; the client with the lowest sequence number holds the lock, and because the node is ephemeral it vanishes when the client's session times out, releasing the lock automatically in the way a lease does. The monotonic sequence number, or the znode's zxid, serves as the fencing token: the holder passes it to the protected resource, which rejects operations stamped with an older value, so ZooKeeper supplies exclusivity and the token while the resource supplies the actual fence.
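The recipe's moving parts can be imitated in memory. The class below is a stand-in, not the ZooKeeper client API: `LockDir` and its method names are hypothetical, and it models only the two properties the recipe relies on, namely that sequence numbers never repeat and that a session's ephemeral nodes disappear with the session.

```python
class LockDir:
    """In-memory stand-in for the children of a lock znode (not the ZooKeeper API)."""
    def __init__(self):
        self._next_seq = 0
        self.children = {}   # sequence number -> owning session

    def create_ephemeral_sequential(self, session):
        seq = self._next_seq
        self._next_seq += 1  # monotonic: a number is never handed out twice
        self.children[seq] = session
        return seq

    def session_expired(self, session):
        # Ephemeral nodes vanish together with the session that created them.
        self.children = {s: o for s, o in self.children.items() if o != session}

    def holder(self):
        """Sequence number of the current lock holder, or None if unlocked."""
        return min(self.children) if self.children else None


lock = LockDir()
a = lock.create_ephemeral_sequential("session-A")
b = lock.create_ephemeral_sequential("session-B")
print(lock.holder() == a)            # True: lowest sequence number holds the lock
lock.session_expired("session-A")    # A's session times out; its node disappears
print(lock.holder() == b)            # True: B now holds the lock
print(b > a)                         # True: B's number fences off A's stale writes
```

Passing `a` and `b` to the protected resource as fencing tokens is what turns this from mutual exclusion into a safe lock; the lock directory alone cannot stop a client whose session has expired from writing.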
The classic fencing pattern
The textbook design, popularized by Martin Kleppmann's critique of fencing-less locks, has the lock service issue an incrementing token with every grant while the storage service tracks the highest token it has accepted and rejects writes bearing an older one. This is what defeats the garbage-collection-pause zombie: a held lock backed only by mutual exclusion is unsafe for correctness-critical data, because a paused holder can resume and write after losing the lock, and only a monotonic token enforced at the resource turns that stale write into a rejected one.