Schema-On-Read vs Schema-On-Write | Microservices Course

Problem

Data has structure, types, required fields, relationships, but you have to decide when that structure is checked and who absorbs the cost. Check it at write time and everything that lands is validated and conformed, which guarantees integrity but forces you to know the schema before you can store anything and rejects or stalls whatever doesn't fit. Defer the check to read time and you can dump arbitrary bytes immediately, but every reader has to interpret them, with no guarantee that the data parses or that two readers agree on what it means. One dataset usually serves many consumers with different needs, and its shape often changes faster than you can migrate it. The real question is where to put the enforcement boundary.

Solution

The two approaches are opposite ends of one axis.

Schema-on-write defines the schema first and makes the storage system validate at ingest: types, NOT NULL, foreign keys, check constraints. Anything nonconforming is rejected, so data on disk is always well-formed and self-describing. This is the classic relational database and the loaded data warehouse, and it pairs with ETL (transform, then load).

Schema-on-read stores raw files as-is, JSON, CSV, logs, Parquet, with no enforcement at ingest, and imposes a schema only when a query reads, by telling the engine how to interpret the bytes. This is the data lake, and it pairs with ELT (load raw, transform in the query). Worth noting that self-describing formats like Parquet and Avro embed a schema per file, so in practice schema-on-read usually means "a schema per file, reconciled at query time," not the total absence of one.

Dimension	Schema-on-write	Schema-on-read
Integrity	Guaranteed at ingest	None until a reader trips on bad data
Ingest	Slower, can reject	Immediate, accepts anything
Flexibility	Must model before storing	Capture now, decide structure later
Query cost	Lower (data pre-conformed)	Higher (parse and validate per read)
Evolution	Migrations (ALTER, backfill)	Reinterpret without rewriting data
Who owns correctness	The database	Every reader
Failure mode	Rejected writes	Data swamp

The negotiated middle is the lakehouse table format: Iceberg, Delta Lake, and Hudi sit on top of open files in object storage, so you keep the cheap, engine-agnostic, decoupled storage of a lake, but add a metadata and catalog layer that supplies a managed, enforced, safely-evolvable schema plus ACID transactions, snapshots, and statistics for pruning. It earns the "negotiated" label by deliberately choosing a point between the poles: schema is enforced and evolvable as in schema-on-write, while the underlying storage stays open files many engines can read. Under the hood these reuse ideas from the earlier pieces, immutable data files committed by a pointer swap (copy-on-write), a transaction log that's a WAL, and per-file column statistics for skipping (columnar).

Implementations

Minimal pseudocode

# schema-on-write: rejected at the boundary
def insert(table, row):
    for col, spec in table.schema.items():
        if not spec.accepts(row.get(col)):       # type/null/constraint check
            raise IntegrityError(col)             # write fails now
    table.append(conform(row, table.schema))

# schema-on-read: bytes stored blind, schema applied at query
def write_raw(path, blob):
    object_store.put(path, blob)                  # always succeeds

def query(path, schema, predicate):
    for f in object_store.list(path):
        for raw in f:
            try:
                row = schema.parse(raw)           # interpretation happens here
            except ParseError:
                continue                          # bad data surfaces only now
            if predicate(row):
                yield row

S3 + Athena / Spark (schema-on-read)

Files live in an S3 bucket as Parquet, JSON, or CSV, and a "table" is just a schema definition registered in a catalog (AWS Glue or a Hive metastore) that points at a path. Athena, Spark, or Trino apply that schema when the query runs. Writing malformed objects to the bucket succeeds regardless, and the mismatch only shows up as a failed query or null columns later. The schema is metadata about the bytes, not a constraint on them. Docs: https://docs.aws.amazon.com/athena/latest/ug/what-is.html.

PostgreSQL (schema-on-write)

CREATE TABLE fixes columns, types, and constraints; an INSERT that violates them is rejected outright, so stored data is always valid against the declared schema. Changing the shape means ALTER TABLE and, often, a backfill. As an escape hatch for genuinely semi-structured fields, a jsonb column lets you store flexible documents inside an otherwise strict schema-on-write table. Docs: https://www.postgresql.org/docs/current/ddl.html.

Iceberg / Delta Lake (the middle)

A table format is a metadata layer over Parquet files in object storage. Delta Lake keeps a transaction log (the _delta_log directory of JSON commits plus periodic checkpoints) that records every change, giving ACID writes, schema enforcement and evolution, and time travel to any past version. Iceberg keeps a tree of snapshot and manifest files tracked through a catalog, giving the same guarantees plus hidden partitioning and per-file statistics for pruning. Either way many engines (Spark, Trino, Flink, and others) read and write the same governed table over open files. Iceberg: https://iceberg.apache.org/spec/. Delta: https://github.com/delta-io/delta/blob/master/PROTOCOL.md.