Append-Only Event Storage | Microservices Course

Problem

Storing only the current value, a mutable cell you overwrite on each change, discards everything about how that value came to be. Once you've overwritten a balance or a status, you can't say what it was last week, what sequence of changes produced it, or why any of them happened, and if a bug corrupted the value there's nothing to reconstruct from. Auditing, debugging, point-in-time queries, and reprocessing all need information that in-place mutation has already destroyed. Some domains, finance and compliance among them, are required to keep the full history rather than just the latest figure, and overwriting in place makes that impossible.

Solution

Don't store current state; store the ordered, immutable sequence of events that produced it. Each event is a fact that happened ("deposited 50", "address changed to X"), appended to a log and never modified or deleted. Current state is then a fold: replay the events through a reducer to derive the value, state = reduce(apply, events, initial). The log is the system of record, and any current-state view is a derived, disposable projection that can be thrown away and rebuilt by replaying.
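As a minimal sketch of that fold, assume a toy account whose events are `(kind, amount)` tuples. The event names and the `apply` reducer are illustrative and not taken from any particular library.

```python
from functools import reduce

# Hypothetical event shape: (kind, amount). Tuples are immutable, like the facts they record.
events = [
    ("deposited", 50),
    ("deposited", 30),
    ("withdrew", 20),
]

def apply(balance, event):
    """Reducer: take the state so far and one event, return the next state."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrew":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind!r}")

def current_balance(events, initial=0):
    # state = reduce(apply, events, initial)
    return reduce(apply, events, initial)

print(current_balance(events))  # 60
```

The balance is never stored anywhere; it exists only as the result of the fold, so it can be recomputed from the events at any time.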

Several things follow. You get a complete audit trail by construction, and time travel for free, since folding only the events up to time T yields the state as of T. A bug in a projection is repaired by fixing the code and replaying, because the source events are untouched. You can derive an entirely new read model, shaped for a new query, from the same events at any time, which is why this pairs naturally with CQRS (writes append events, reads are served by projections) and with the denormalized read models from the earlier piece. And because writes are append-only and events are immutable, there's no in-place contention and the write path is sequential, the same property the WAL piece relied on.
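Two of these consequences, point-in-time queries and deriving a new read model later, can be sketched with one generic `project` helper. The event layout `(timestamp, account, kind, amount)` and all function names are assumptions made for the example.

```python
from functools import reduce

# Hypothetical events: (timestamp, account, kind, amount), already in log order.
log = [
    (1, "alice", "deposited", 50),
    (2, "bob", "deposited", 10),
    (3, "alice", "withdrew", 20),
    (4, "alice", "deposited", 5),
]

def apply_balance(balances, event):
    """Projection 1: balance per account."""
    _, account, kind, amount = event
    delta = amount if kind == "deposited" else -amount
    return {**balances, account: balances.get(account, 0) + delta}

def apply_deposit_count(counts, event):
    """Projection 2: a read model added later, built from the same events."""
    _, account, kind, _ = event
    if kind != "deposited":
        return counts
    return {**counts, account: counts.get(account, 0) + 1}

def project(reducer, events, as_of=None):
    """Fold the log through a reducer, optionally stopping at time as_of (inclusive)."""
    if as_of is not None:
        events = [e for e in events if e[0] <= as_of]
    return reduce(reducer, events, {})

print(project(apply_balance, log))            # {'alice': 35, 'bob': 10}
print(project(apply_balance, log, as_of=2))   # {'alice': 50, 'bob': 10}
print(project(apply_deposit_count, log))      # {'alice': 2, 'bob': 1}
```

Fixing a buggy projection is the same operation: change the reducer and call `project` again over the untouched log.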

Correcting a mistake works the way an accounting ledger does: you never erase a posted fact, you append a compensating event that reverses or adjusts it.

The implementations below come in two flavors. A full event log keeps every event forever (EventStoreDB, ledgers, Datomic), so the fold runs over complete history. A compacted log eventually retains only the latest event per key (Kafka log compaction), so the log becomes a bounded, recoverable snapshot of current state per key rather than a full history, trading auditability for bounded storage.

Tradeoffs

Property	Effect
History and audit	Complete by construction, and tamper-evident if entries are also hash-chained or signed: the main reason to do this
Temporal queries	Any past state is a fold up to that point
Projections	Read models are derived and rebuildable; fix code and replay
Replay cost	Folding from the beginning is expensive, so you persist periodic snapshots and replay only events after them, the WAL checkpoint idea again
Storage growth	A full log grows without bound; compaction caps it only by dropping history
Event versioning	Events live forever, so old event shapes must stay readable (upcasting), which is a real long-term burden
Consistency	Projections lag the log; reads are eventually consistent
Corrections	You can't edit a wrong fact, only append a correcting one
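The event-versioning row is the least obvious of these, so here is a rough sketch of upcasting. It assumes events are dicts carrying a `version` field and that version 2 added a `currency` field; both the shapes and the `"USD"` default are invented for illustration.

```python
# Hypothetical stored events: version 1 had no currency, version 2 added one.
stored = [
    {"type": "deposited", "version": 1, "amount": 50},
    {"type": "deposited", "version": 2, "amount": 30, "currency": "USD"},
]

def upcast_v1_to_v2(event):
    """Return a v2-shaped copy; the stored v1 event is left untouched."""
    return {**event, "version": 2, "currency": "USD"}  # assumed default for old events

UPCASTERS = {1: upcast_v1_to_v2}   # version -> function producing the next version
LATEST_VERSION = 2

def upcast(event):
    """Apply upcasters in sequence until the event has the latest shape."""
    while event["version"] < LATEST_VERSION:
        event = UPCASTERS[event["version"]](event)
    return event

def apply(totals, event):
    """Reducer that only needs to understand the latest event shape."""
    currency = event["currency"]
    return {**totals, currency: totals.get(currency, 0) + event["amount"]}

totals = {}
for raw in stored:
    totals = apply(totals, upcast(raw))

print(totals)  # {'USD': 80}
```

The burden the table mentions is visible here: every new event version adds another upcaster that has to be kept for as long as old events remain in the log.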

Implementations

Minimal pseudocode

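from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class ReversingEvent:
    of: Any                                # id of the event being compensated
    fix: Any                               # the adjustment to apply

# write: append an immutable fact, never mutate
def append(log, event):
    log.append(event)                      # sequential and final; a real store also persists durably

# read: current state is the fold over events since the last snapshot
def current_state(log, reduce_fn, snapshot):
    state, offset = snapshot               # (state, offset), or (initial, 0) if none saved
    for event in log[offset:]:             # log is a plain list here
        state = reduce_fn(state, event)
    return state

# correct: append a compensating event that refers to the original
def correct(log, event_id, fix):
    append(log, ReversingEvent(of=event_id, fix=fix))   # not an edit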

Kafka log compaction

A compacted topic guarantees that at least the most recent record for each key is retained, while a background process garbage-collects older records sharing that key, so the log becomes a durable, replayable changelog whose surviving records are the current per-key state. This is how Kafka Streams backs a KTable or a state store: rebuild state by replaying the compacted topic. A record with a null value is a tombstone that removes the key. Compaction contrasts with ordinary time-based retention, which keeps full history for a window. Docs: https://kafka.apache.org/documentation/#compaction.
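The effect of compaction can be modelled in a few lines. This is a plain-Python sketch of the end result only, not of Kafka's segment-based cleaner or its client API; `compact` and `materialize` are hypothetical helpers, and a `None` value stands in for a tombstone.

```python
def compact(records):
    """Keep only the latest record per key, preserving the order of survivors.

    records: list of (key, value) in log order; a value of None is a tombstone.
    """
    last_index = {key: i for i, (key, _) in enumerate(records)}
    return [rec for i, rec in enumerate(records) if last_index[rec[0]] == i]

def materialize(records):
    """Rebuild per-key state by replaying a (possibly compacted) changelog."""
    table = {}
    for key, value in records:
        if value is None:
            table.pop(key, None)   # tombstone: the key is deleted
        else:
            table[key] = value
    return table

changelog = [
    ("user-1", "addr A"),
    ("user-2", "addr B"),
    ("user-1", "addr C"),
    ("user-2", None),        # tombstone for user-2
    ("user-3", "addr D"),
]

compacted = compact(changelog)
print(compacted)
# [('user-1', 'addr C'), ('user-2', None), ('user-3', 'addr D')]
print(materialize(changelog) == materialize(compacted))  # True
```

Replaying the compacted log yields the same table as replaying the full one, which is the property that makes it usable as a state-store changelog. What is lost is the intermediate history: "addr A" can no longer be recovered.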

EventStoreDB

A database built specifically for event sourcing. Events are appended to per-aggregate streams and are immutable. Each append can carry the stream version the writer expects, and the write is rejected if the stream has moved past it, which is how optimistic concurrency detects conflicting writes. Built-in projections derive read models from the streams, and subscriptions push events to downstream consumers as they're appended. Docs: https://developers.eventstore.com/.
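The expected-version check can be illustrated with an in-memory stand-in. This is not the EventStoreDB client API; `InMemoryStream`, `WrongExpectedVersion`, and the convention that an empty stream has version -1 are all assumptions of the sketch.

```python
class WrongExpectedVersion(Exception):
    """Raised when the stream has moved on since the writer last read it."""

class InMemoryStream:
    """Toy stand-in for one per-aggregate stream."""

    def __init__(self):
        self._events = []

    @property
    def version(self):
        # Version of the last event, or -1 for an empty stream (a convention chosen here).
        return len(self._events) - 1

    def append(self, event, expected_version):
        if expected_version != self.version:
            raise WrongExpectedVersion(
                f"expected {expected_version}, stream is at {self.version}"
            )
        self._events.append(event)
        return self.version

    def read(self):
        return tuple(self._events)   # callers get an immutable view

stream = InMemoryStream()
stream.append({"type": "deposited", "amount": 50}, expected_version=-1)

# Two writers both read the stream at version 0, then try to append.
seen_by_a = seen_by_b = stream.version
stream.append({"type": "withdrew", "amount": 20}, expected_version=seen_by_a)
try:
    stream.append({"type": "withdrew", "amount": 40}, expected_version=seen_by_b)
    conflict = False
except WrongExpectedVersion:
    conflict = True    # writer B must re-read, re-decide, and retry

print(conflict, stream.version)  # True 1
```

The second writer's decision was based on a balance of 50, which no longer holds after the first withdrawal, so rejecting the append forces it to re-evaluate against the current events.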

Accounting ledgers

The original event-sourced design, predating computers. Double-entry bookkeeping is strictly append-only: a posted entry is never altered or deleted, mistakes are handled by posting reversing or adjusting entries, and an account balance is the fold (sum) over all entries. Auditability is the entire purpose, which is why purpose-built financial ledgers like TigerBeetle adopt the same append-only event model directly. TigerBeetle: https://docs.tigerbeetle.com/.
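A small sketch of the ledger rules described above, assuming each entry debits one account and credits another by the same amount. The entry shape and the `post`, `reverse`, and `balances` helpers are invented for the example and do not reflect TigerBeetle's API.

```python
from collections import defaultdict

# Hypothetical entry shape: (entry_id, debit_account, credit_account, amount).
ledger = []

def post(entry_id, debit, credit, amount):
    """Append an entry; posted entries are never changed or removed."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    ledger.append((entry_id, debit, credit, amount))

def reverse(entry_id, new_id):
    """Correct a mistake by posting the mirror image of an earlier entry."""
    _, debit, credit, amount = next(e for e in ledger if e[0] == entry_id)
    post(new_id, debit=credit, credit=debit, amount=amount)

def balances():
    """Balance per account is a sum over all entries (debits minus credits)."""
    totals = defaultdict(int)
    for _, debit, credit, amount in ledger:
        totals[debit] += amount
        totals[credit] -= amount
    return dict(totals)

post(1, debit="cash", credit="revenue", amount=100)
post(2, debit="cash", credit="revenue", amount=70)   # posted in error
reverse(2, new_id=3)                                  # undo entry 2
post(4, debit="cash", credit="revenue", amount=7)    # the intended amount

print(balances())      # {'cash': 107, 'revenue': -107}
print(len(ledger))     # 4: the mistake and its reversal both remain visible
```

Because every entry moves the same amount out of one account and into another, the balances always sum to zero, and an auditor can see both the error and its correction.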

Datomic

As noted in the copy-on-write piece, Datomic stores immutable datoms stamped with transaction time and only ever accumulates them. Reading the database "as of" a time is a fold filtered to that point, and nothing is overwritten, so the database value at any past instant is recoverable directly. Docs: https://docs.datomic.com/.
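To make the "as of" fold concrete, here is a simplified model, not Datomic's API. It assumes a datom is a tuple of entity, attribute, value, transaction id, and an added/retracted flag, uses small integers as transaction ids, and treats every attribute as single-valued.

```python
from collections import namedtuple

# Simplified model of a datom: entity, attribute, value, transaction id, added flag.
Datom = namedtuple("Datom", "e a v tx added")

# Accumulated in transaction order; nothing is ever removed from this list.
datoms = [
    Datom("acct-1", "status", "open", 1, True),
    Datom("acct-1", "owner", "alice", 1, True),
    Datom("acct-1", "status", "open", 5, False),    # retraction of the old value
    Datom("acct-1", "status", "frozen", 5, True),
]

def as_of(datoms, tx):
    """Fold only the datoms with transaction id <= tx into an entity view."""
    facts = set()
    for d in datoms:
        if d.tx > tx:
            continue
        if d.added:
            facts.add((d.e, d.a, d.v))
        else:
            facts.discard((d.e, d.a, d.v))
    view = {}
    for e, a, v in facts:
        view.setdefault(e, {})[a] = v
    return view

print(as_of(datoms, 1)["acct-1"]["status"])  # open
print(as_of(datoms, 5)["acct-1"]["status"])  # frozen
```

The change of status is recorded as a retraction plus an assertion in a later transaction, so the earlier view stays reachable simply by folding fewer datoms.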