Secondary Indexes

Problem

A query that filters on anything other than the primary key has no way to find matching rows except to read the whole table, which is O(n). The primary-key index only accelerates lookups by primary key, so "find the user with this email" or "find orders where status = shipped" degrades linearly as the table grows. Real workloads have many such access patterns, and a full scan per query doesn't survive scale.

Solution

Build a secondary index: a separate searchable structure keyed on the attribute you query, mapping each index key to the location(s) of matching rows (their primary key or a physical pointer). A lookup on that attribute drops from O(n) to O(log n) for a B-tree or near O(1) for a hash. An index is a derived, redundant structure, it duplicates the indexed column plus a pointer, must be kept in sync with the base data on every write, and can be dropped and rebuilt from the table at any time.

The structure follows the query shape. A B-tree serves equality, range, and ordering on scalar values and is the default. A hash index serves equality only. An inverted index maps each term to the list of documents containing it, for full-text search and for membership in arrays or JSON. A GiST-style generalized tree handles types a single linear key can't order, ranges, geometry, nearest-neighbor.

The distributed question is where the index lives relative to the data, and it splits into local and global.

A local (partition-local) secondary index has each partition index only its own rows. A write updates just the local partition's index with no cross-node coordination, so writes stay cheap. But a query on the indexed attribute that isn't pinned to one partition has to ask every partition and merge the results (scatter-gather), because matches could be anywhere.

A global secondary index is itself partitioned by the indexed attribute, independent of how the base table is partitioned. A read on that attribute goes to one index partition, no scatter-gather. But a write to a row may have to update an index partition sitting on a different node than the row, so writes turn into distributed, fan-out operations that are usually applied asynchronously and are therefore eventually consistent.

That local-versus-global choice is the write-cost-versus-read-cost trade that ran through the LSM, columnar, and denormalization pieces, now expressed as where the index partitions sit.

Tradeoffs

The general cost ledger of any index:

PropertyEffect
Read speedupIndexed predicates go O(n) → O(log n)/O(1); also enables cheap ordering and range scans
Index-only scansA covering index answers a query without touching the table at all
Write taxEvery insert, update, or delete updates every index covering the changed columns; more indexes means slower writes and more log volume
Update sensitivityChanging an indexed column touches the index even when the row stays in place
SpaceEach index costs storage proportional to its columns plus pointers, sometimes rivaling the base table
MaintenanceIndexes fragment and need rebuilds/VACUUM; on LSM-backed stores their upkeep rides the same compaction cost
SelectivityIndexes help selective predicates; on low-cardinality columns the planner often ignores them, so the index is pure cost
Hot spotsA global index on a monotonic or low-cardinality key concentrates load on a few partitions

And the local-versus-global comparison, which is the sharper trade:

DimensionLocal indexGlobal index
WriteCheap, single-partition, coordinated with the rowDistributed fan-out to other partitions
Read on indexed attrScatter-gather across all partitionsTargeted to one index partition
ConsistencyCan be kept strongly consistent with the rowUsually asynchronous, so eventually consistent
Read latencyBounded by the slowest partition; worsens as partition count growsFlat, independent of partition count
Best whenWrites dominate, or queries already carry the partition keyReads dominate and span partitions

Two second-order effects matter at scale. Scatter-gather reads make tail latency the real metric: a query that fans out to a hundred partitions waits on the slowest of the hundred, so p99 degrades as you shard further. And asynchronous global indexes break read-your-writes, because a row can be visible by primary key before its index entry has propagated.

Implementations

Minimal pseudocode

# maintain on write: update every index covering the changed columns
def put(table, row):
table.store(row)
for idx in table.indexes:
if idx.covers(row):
idx.insert(idx.key(row), row.pk) # the write tax
def lookup(idx, key):
return [table.fetch(pk) for pk in idx.get(key)] # O(log n) into idx
# local index: must ask every partition, then merge (scatter-gather)
def query_local(attr, value, partitions):
hits = []
for p in partitions:
hits += p.local_index[attr].get(value)
return hits # latency = slowest p
# global index: routed to the one partition that owns this value
def query_global(attr, value, global_idx):
p = global_idx.partition_for(value)
return p.get(value) # one hop

PostgreSQL GIN and GiST

Beyond the default B-tree and hash, Postgres ships two extensible index types for data a scalar key can't handle. GIN (Generalized Inverted Index) is an inverted index for multi-valued columns, array containment, jsonb key/value lookups, and full-text search over tsvector; lookups are fast, but updates are heavier, so it batches new entries in a pending list (fastupdate) and is best for relatively static data. GiST (Generalized Search Tree) is a balanced-tree framework for types where ordering isn't linear, ranges, geometry and PostGIS, nearest-neighbor (KNN) search, supporting overlap, containment, and distance operators. Both are local to the table or its partitions. Docs: https://www.postgresql.org/docs/current/gin.html and https://www.postgresql.org/docs/current/gist.html.

DynamoDB LSI and GSI

The canonical local/global pair. A Local Secondary Index shares the base table's partition key but adds an alternate sort key, so it lives inside the same partition, can be read strongly consistent, but must be defined at table creation and counts against that partition's 10 GB item-collection limit. A Global Secondary Index has its own partition and sort key, is partitioned independently of the base table, is eventually consistent, is provisioned and throttled separately, and can be added at any time. Choosing between them is exactly the write-locality versus read-locality decision above. Docs: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/SecondaryIndexes.html.

Elasticsearch inverted indexes

Documents are tokenized into terms, and the inverted index maps each term to a postings list of document IDs with positions and frequencies, which is what makes full-text search and term filtering fast. It's built on Lucene segments, which are immutable and append-only and merged in the background, the same immutable-file-plus-compaction shape as an LSM tree. Each shard holds a local index, so a query scatter-gathers across shards and merges and scores the results, putting it squarely on the local-index side of the trade. Docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html.