Problem
A client sends a request and the response never arrives because of a timeout, a dropped connection, or a crash. The client cannot tell which of two things happened: the server processed the request and the reply was lost, or the server never received the request at all. The move that favors availability is to retry, but if the operation has a side effect (charging fifty dollars, creating an order, transferring funds) and the server did process the original, the retry applies that effect a second time, for example by double-charging the customer.
Duplicates aren't a rare edge case here; they're built into the architecture. Queues redeliver, load balancers re-route, and sagas re-run a step after a crash, every one of them resending a request that may already have taken effect. Over an unreliable network you can't eliminate duplicates at the transport layer, so the application has to be correct when the same request arrives twice.
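To make the failure concrete, here is a small simulation. It assumes a hypothetical `Server` that applies a charge and then loses its first reply; the class and function names are illustrative, not taken from any real system.

```python
class LostReply(Exception):
    """Raised when the server's reply never reaches the client."""


class Server:
    def __init__(self):
        self.charges = []            # every charge the server has applied
        self.drop_next_reply = True  # simulate one lost response

    def charge(self, account, amount):
        self.charges.append((account, amount))  # the side effect happens first
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise LostReply()                   # reply lost after the effect
        return "ok"


def naive_charge(server, account, amount, attempts=3):
    """Retry on failure, with nothing that tells the server it is a repeat."""
    for _ in range(attempts):
        try:
            return server.charge(account, amount)
        except LostReply:
            continue
    raise RuntimeError("gave up")


server = Server()
naive_charge(server, "acct-1", 50)
print(len(server.charges))  # 2: the customer was charged twice
```

The client did nothing unusual here. It retried after a failure it could not interpret, and the server had no way to recognize the second call as a repeat.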
Solution
Make repeating an operation produce the same result as performing it once. Some operations are naturally idempotent: assigning an absolute value, deleting by id, setting a status to shipped. Repeating those lands the same state, so a retry is harmless. The operations that hurt are the ones that create, charge, or increment, and for those the client attaches a unique idempotency key to the request and reuses that same key across the request's retries. The server records each key alongside the outcome it produced: the first time it sees a key it executes the side effect and stores the response, and on any later request with the same key it skips execution and returns the stored response.
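The client side of this can be sketched as follows, assuming a hypothetical `FlakyServer` that stores one response per key and loses its first reply. The point to notice is that the key is generated once per logical operation, outside the retry loop. This single-threaded sketch ignores concurrency, which the atomicity requirement in this section addresses.

```python
import uuid


class FlakyServer:
    """Hypothetical server that dedupes by key and loses its first reply."""

    def __init__(self):
        self.responses = {}      # idempotency key -> stored response
        self.charges = []        # side effects actually applied
        self.replies_to_drop = 1

    def charge(self, key, account, amount):
        if key not in self.responses:             # first time this key is seen
            self.charges.append((account, amount))
            self.responses[key] = {"charge_id": len(self.charges), "amount": amount}
        if self.replies_to_drop > 0:
            self.replies_to_drop -= 1
            raise ConnectionError("reply lost")
        return self.responses[key]                # same response on every replay


def charge_with_retries(server, account, amount, attempts=3):
    key = str(uuid.uuid4())   # generated once, outside the retry loop
    for _ in range(attempts):
        try:
            return server.charge(key, account, amount)  # same key on every attempt
        except ConnectionError:
            continue
    raise RuntimeError("gave up")


server = FlakyServer()
receipt = charge_with_retries(server, "acct-1", 50)
print(len(server.charges), receipt)  # 1 {'charge_id': 1, 'amount': 50}
```

Generating the key inside the loop would defeat the scheme, because each attempt would then look like a new operation to the server.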
This pairing, at-least-once delivery from the client and at-most-once execution per key on the server, gives effectively-once side effects, which is as close to exactly-once as a real network allows. True exactly-once delivery is impossible when either the request or its acknowledgment can vanish, so production systems stop chasing it and dedupe instead.
Correctness hinges on a detail: the dedupe lookup and the side effect must commit together atomically. Otherwise two concurrent retries both pass the "haven't seen this key" check and both execute. Enforce it with a unique constraint or a conditional write on the key so exactly one request wins the claim and the rest wait for or read the stored result. Around that core, cache the response including errors so every retry is identical, mark a key in-progress while it executes, scope keys per endpoint and account, expire them after a bounded window, and reject a key reused with a different request body so a client bug can't alias two different operations onto one key.
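One way to get the atomic claim is a unique constraint inside the same transaction as the side effect. The sketch below uses Python's built-in `sqlite3` module and assumes the side effect is a row in the same database; the table names, the `fingerprint` helper, and the response shape are illustrative assumptions.

```python
import hashlib
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys ("
           " key TEXT PRIMARY KEY, body_hash TEXT NOT NULL, response TEXT)")
db.execute("CREATE TABLE charges (account TEXT, amount INTEGER)")
db.commit()


def fingerprint(body):
    """Stable hash of the request body, used to detect key reuse."""
    canonical = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


def charge(key, body):
    try:
        with db:  # one transaction: commits on success, rolls back on error
            # The PRIMARY KEY makes this insert the atomic claim on the key.
            db.execute("INSERT INTO idempotency_keys (key, body_hash) VALUES (?, ?)",
                       (key, fingerprint(body)))
            db.execute("INSERT INTO charges VALUES (?, ?)",
                       (body["account"], body["amount"]))
            response = json.dumps({"status": "charged", "amount": body["amount"]})
            db.execute("UPDATE idempotency_keys SET response = ? WHERE key = ?",
                       (response, key))
        return json.loads(response)
    except sqlite3.IntegrityError:
        # The key was already claimed: replay the stored result or reject.
        body_hash, stored = db.execute(
            "SELECT body_hash, response FROM idempotency_keys WHERE key = ?",
            (key,)).fetchone()
        if body_hash != fingerprint(body):
            raise ValueError("key reused with a different body")
        return json.loads(stored)


first = charge("k1", {"account": "acct-1", "amount": 50})
again = charge("k1", {"account": "acct-1", "amount": 50})
count = db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
print(first == again, count)  # True 1
```

Because the key row, the charge row, and the stored response commit together, a crash mid-request leaves none of them behind. When the side effect lives outside the database, such as a call to another service, this single-transaction shortcut does not apply and the in-progress marker described above carries more of the load.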
Tradeoffs
| Property | Effect |
|---|---|
| Safe retries | Effectively-once side effects, which is what makes saga steps, queue redelivery, and quorum retries correct |
| Storage and lookup | Every mutating request now reads and writes a key record, needing a store with atomic conditional writes and expiry |
| Atomicity | The dedupe check and the side effect must be transactional together, adding locking and contention on hot keys |
| Key management | Clients must generate a stable key per operation and not reuse it across different requests; servers must detect body mismatch |
| Bounded window | Keys expire, so a retry after the window re-executes; the window must exceed the longest realistic retry horizon |
| Coverage | Only operations carrying a key are deduped; effects without one, such as a sent email or an external call, need their own idempotency |
Implementations
Minimal pseudocode

    def handle(request):
        key = request.idempotency_key
        # atomic claim: only one concurrent request wins the insert
        if not store.try_insert(key, status=IN_PROGRESS, body_hash=hash(request)):
            existing = store.get(key)
            if existing.body_hash != hash(request):
                return error(422, "key reused with a different body")
            if existing.status == DONE:
                return existing.response  # replay the stored result
            return error(409, "request in progress")  # retry shortly
        result = do_side_effect(request)  # executed at most once
        store.update(key, status=DONE, response=result)
        return result

The load-bearing line is try_insert: it's the atomic guard that makes concurrent retries pick a single winner, and storing the result lets every later retry replay the same response.
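A runnable version of the same flow, as a sketch: the `Store` below is an in-memory stand-in whose lock plays the role of the database's conditional write, and `do_side_effect`, the status codes as tuples, and the body hashing are assumptions made for the example.

```python
import threading
from dataclasses import dataclass, replace

IN_PROGRESS, DONE = "in_progress", "done"


@dataclass
class Record:
    status: str
    body_hash: int
    response: object = None


class Store:
    """In-memory stand-in for a store with an atomic conditional insert."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def try_insert(self, key, status, body_hash):
        with self._lock:  # check and insert happen as one step
            if key in self._records:
                return False
            self._records[key] = Record(status, body_hash)
            return True

    def get(self, key):
        with self._lock:
            return replace(self._records[key])  # return a consistent copy

    def update(self, key, status, response):
        with self._lock:
            record = self._records[key]
            record.status, record.response = status, response


store = Store()
executions = []  # records each time the side effect actually runs


def do_side_effect(body):
    executions.append(body)
    return {"charged": body["amount"]}


def handle(key, body):
    # hash() is only stable within one process, which is enough for this demo.
    body_hash = hash(tuple(sorted(body.items())))
    if not store.try_insert(key, IN_PROGRESS, body_hash):
        existing = store.get(key)
        if existing.body_hash != body_hash:
            return 422, "key reused with a different body"
        if existing.status == DONE:
            return 200, existing.response  # replay the stored result
        return 409, "request in progress"
    result = do_side_effect(body)          # runs for the single winner only
    store.update(key, DONE, result)
    return 200, result


# Eight concurrent "retries" of the same request.
results = []
threads = [threading.Thread(target=lambda: results.append(handle("k1", {"amount": 50})))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(executions))  # 1
```

Each thread gets either the stored 200 response or a 409 telling it to retry, and the side effect runs once. A production handler also needs a policy for a crash between the claim and the update, for example a timeout after which an in-progress record can be reclaimed; that is left out here.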
Stripe idempotency keys
Stripe accepts an Idempotency-Key header on mutating requests, saves the first response keyed to it (including error responses), and returns that stored response for any retry made while the key is retained, which is at least 24 hours, so a network blip never charges a customer twice. A request that arrives while another with the same key is still in flight is rejected with a conflict error rather than executed a second time, and reusing a key with different parameters returns an error, which stops a client bug from collapsing two distinct charges onto one key. It's the design many other APIs have copied.
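On the wire this is just a header. The sketch below builds, but does not send, a request to Stripe's PaymentIntents endpoint using only the standard library; the helper name and the placeholder API key are illustrative, and a real integration would more likely use Stripe's official client library.

```python
import urllib.parse
import urllib.request
import uuid


def build_payment_request(api_key, amount_cents, currency, idempotency_key):
    """Build (but do not send) a POST that creates a PaymentIntent."""
    body = urllib.parse.urlencode({"amount": amount_cents, "currency": currency})
    return urllib.request.Request(
        "https://api.stripe.com/v1/payment_intents",
        data=body.encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Idempotency-Key": idempotency_key,
        },
    )


key = str(uuid.uuid4())  # one key per logical payment, reused on every retry
first = build_payment_request("sk_test_placeholder", 5000, "usd", key)
retry = build_payment_request("sk_test_placeholder", 5000, "usd", key)

# urllib normalizes header names to "Idempotency-key"; HTTP header names
# are case-insensitive, so the server sees the same header.
print(first.get_header("Idempotency-key") == retry.get_header("Idempotency-key"))  # True
```

The retry must carry the same body as well as the same key, since a changed body under an existing key is rejected.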
AWS client tokens
Many AWS APIs take a ClientToken (also called an idempotency token) so that an SDK's automatic retries don't create duplicate resources; calling EC2 RunInstances twice with the same token launches one set of instances, not two. This is distinct from the x-amzn-RequestId returned in responses, which identifies a call for tracing and support rather than deduplicating it. SQS FIFO queues apply the same idea to messaging through a MessageDeduplicationId that drops repeats within a five-minute window.
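The windowed behavior is easy to model locally. The class below is a toy illustration of time-bounded deduplication, not AWS's implementation; `DedupWindow` and its explicit `now` parameter are assumptions that keep the example deterministic.

```python
class DedupWindow:
    """Remembers ids for a fixed window and reports repeats inside it."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._first_seen = {}  # dedup id -> time it was first accepted

    def accept(self, dedup_id, now):
        """Return True if the message should be delivered, False if a repeat."""
        # Forget ids whose window has passed.
        self._first_seen = {i: t for i, t in self._first_seen.items()
                            if now - t < self.window_seconds}
        if dedup_id in self._first_seen:
            return False
        self._first_seen[dedup_id] = now
        return True


window = DedupWindow()
delivered = [
    window.accept("order-42", now=0),    # first send
    window.accept("order-42", now=120),  # retry two minutes later
    window.accept("order-42", now=301),  # after the window: treated as new
]
print(delivered)  # [True, False, True]
```

The last call shows the cost of a bounded window: a retry that arrives after expiry is indistinguishable from a new message, so the window has to outlast the sender's retry horizon.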
Effectively-once payment systems
Payment processors and ledgers lean on a unique transaction or reference id so a retried transfer posts exactly once. The id deduplicates the write, the posting itself is idempotent, and an append-only ledger handles genuine corrections by adding a reversing entry rather than mutating the original, so the combination of at-least-once delivery and id-based dedupe yields a single net effect even when the same instruction is delivered several times. Periodic reconciliation against the source of truth catches anything that slipped through.
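A minimal sketch of that combination, assuming an in-memory `Ledger` with hypothetical method names and a derived `:reversal` id convention chosen for the example:

```python
class Ledger:
    """Append-only ledger that dedupes postings by reference id."""

    def __init__(self):
        self.entries = []   # (reference_id, account, amount); never edited
        self._seen = set()  # reference ids already posted

    def post(self, reference_id, account, amount):
        """Append the entry once; return False if the id was already posted."""
        if reference_id in self._seen:
            return False
        self._seen.add(reference_id)
        self.entries.append((reference_id, account, amount))
        return True

    def reverse(self, reference_id):
        """Correct an entry by appending its negation under a derived id."""
        for ref, account, amount in self.entries:
            if ref == reference_id:
                return self.post(f"{reference_id}:reversal", account, -amount)
        raise KeyError(reference_id)

    def balance(self, account):
        return sum(amount for _, acct, amount in self.entries if acct == account)


ledger = Ledger()
ledger.post("tx-1001", "alice", -50)
ledger.post("tx-1001", "alice", -50)  # redelivered instruction: ignored
print(ledger.balance("alice"))        # -50

ledger.reverse("tx-1001")             # correction appended, original untouched
ledger.reverse("tx-1001")             # a retried reversal is deduped as well
print(ledger.balance("alice"), len(ledger.entries))  # 0 2
```

Giving the reversal its own deterministic id means corrections get the same retry safety as the original posting.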