booklore

Seven Databases in Seven Weeks

A Guide to Modern Databases and the NoSQL Movement

reading path: overview → analysis → narration


overview

Seven Databases in Seven Weeks

A hands-on comparative guide to seven modern database systems and the philosophy of polyglot persistence.

Overview

"The best tool for every job is not the same tool." — Perkins, Redmond, and Wilson

Seven Databases in Seven Weeks (2nd ed., 2022, Pragmatic Bookshelf) is a practical, structured tour of seven database systems, each explored over the course of a week. Authors Luc Perkins, Eric Redmond, and Jim Wilson—practitioners with extensive production database experience—guide readers through installation, core data structures, query languages, consistency semantics, and real-world deployment considerations.

The central thesis of the book is polyglot persistence: the idea that no single database system is optimal for every use case, and that modern applications benefit from deliberately selecting the right database for each data domain. Rather than advocating for or against any particular category, the book treats each database as a tool with explicit trade-offs.

What You'll Learn

  • How to model data for relational, document, graph, columnar, key-value, and managed cloud databases
  • The theoretical underpinnings of distributed data systems: CAP theorem, PACELC, BASE vs ACID, and consistency models
  • Practical query patterns in SQL, Gremlin, CQL, Cypher, and database-native APIs
  • When to introduce a second or third database into an existing architecture
  • Production considerations: indexing strategies, sharding, replication, and failure modes

Seven Databases Covered

| Chapters | Database | Category | Key Innovation |
|----------|----------|----------|----------------|
| 1–2 | PostgreSQL | Relational | ACID, SQL, joins, advanced types |
| 3–4 | HBase | Distributed columnar | Wide-row, HDFS-native, random real-time reads |
| 5–6 | MongoDB | Document | Flexible schemas, hierarchical nesting, rich queries |
| 7–8 | Neo4j | Graph | Native graph traversal, Cypher, connected queries |
| 9–10 | Cassandra | Wide-column | Tunable consistency, multi-DC replication, commit log |
| 11–12 | DynamoDB | Cloud-managed key-value | Fully managed, IAM-integrated, serverless |
| 13–14 | Redis | In-memory key-value | Sub-millisecond latency, rich data types |

About the Authors

  • Luc Perkins: Developer Advocate, Confluent (formerly at MongoDB, Basho). Data infrastructure, streaming, NoSQL.
  • Eric Redmond: Senior engineer with deep distributed systems background; writer and speaker on database internals.
  • Jim Wilson: Software engineer and educator; database practitioner focused on practical application design.

Reading This Guide

Each file serves a distinct purpose:

  • 01-content.mdx: Structured walkthrough of all seven databases with exercises and key takeaways.
  • 02-analysis.mdx: Framework for comparing databases, trade-off analysis, and decision guidance.
  • 03-narration.mdx: Audio-friendly narration of the book's core ideas and conclusion.

Generated for BookAtlas. ISBN 9781680502543 · Pragmatic Bookshelf · 2022.


content map

Part I: Relational Foundations — PostgreSQL

Context and Why It's Here

The book begins with PostgreSQL, the most mature and technically advanced open-source relational database. Choosing it first grounds every subsequent NoSQL exploration in a trusted baseline: ACID transactions, SQL, and normalized schemas. Only by understanding what relational databases excel at can readers appreciate where NoSQL alternatives actually win.

PostgreSQL sets the bar for the "consistency" end of the CAP spectrum.

Core Concepts Covered

Relational Model Refresher

  • Tables, rows, columns, primary keys, foreign keys
  • Normal forms (1NF through BCNF) as a design tool, not dogma
  • SQL as a declarative query language

PostgreSQL-Specific Power Features

  • JSON/JSONB support, blurring the NoSQL/relational divide
  • Array types, hstore (key-value within a row)
  • Full-text search built in
  • Window functions and CTEs for complex analytics
  • Extensions (PostGIS, pgcrypto)

ACID Guarantees in Depth

  • Atomicity: transactions are all-or-nothing
  • Consistency: constraints and invariants enforced
  • Isolation levels: read committed (the default) through serializable; PostgreSQL accepts READ UNCOMMITTED but treats it as read committed
  • Durability: WAL (Write-Ahead Log)
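The all-or-nothing behavior of a transaction can be sketched in a few lines. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, an assumption made so it runs without a server. The accounts table and the transfer helper are hypothetical and do not come from the book.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects a negative balance here, which
            # rolls back the credit applied just above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # violates the CHECK, rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, so an unchanged balance for bob afterward is the evidence that the partial work was undone.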

Key Exercises

Install PostgreSQL, create a library database, and load initial data.sql. Practice joins across authors, books, and publishers.

Write window-function queries to rank books by publish date per publisher. Use CTEs to build a recursive author co-authorship graph inside SQL.
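The two exercises above can be sketched roughly as follows. The example again uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; window functions need SQLite 3.25 or newer). The books and coauthor tables and their sample rows are hypothetical, and the co-authorship edges are treated as directed for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, publisher TEXT, published INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("A", "Acme", 2001),
        ("B", "Acme", 1999),
        ("C", "Birch", 2010),
        ("D", "Birch", 2015),
        ("E", "Acme", 2005),
    ],
)

# Window function: rank each publisher's books by publish year.
ranked = conn.execute(
    """
    SELECT publisher, title,
           RANK() OVER (PARTITION BY publisher ORDER BY published) AS pub_rank
    FROM books
    ORDER BY publisher, pub_rank
    """
).fetchall()

conn.execute("CREATE TABLE coauthor (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO coauthor VALUES (?, ?)",
    [("ann", "bo"), ("bo", "cy"), ("cy", "di")],
)

# Recursive CTE: walk co-authorship edges outward from one author.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(name, depth) AS (
        SELECT 'ann', 0
        UNION
        SELECT c.b, r.depth + 1
        FROM coauthor c JOIN reach r ON c.a = r.name
    )
    SELECT name, depth FROM reach ORDER BY depth
    """
).fetchall()

print(ranked)
print(reachable)
```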

Takeaways

PostgreSQL is not the "default" database—it is the reference point. Every subsequent chapter asks: what does this database give up compared to PostgreSQL, and what does it gain?


Part II: Distributed Columnar — HBase

Context

HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable paper. Running on top of HDFS, it serves as the bridge from relational databases to the distributed NoSQL world.

Core Concepts Covered

Architecture

  • HRegion: the unit of distribution and load balancing
  • HStore (MemStore + StoreFiles), one per column family within each region
  • HDFS as the backing store (Write-Ahead Log, block replication)
  • Zookeeper for cluster coordination

Data Model: Wide Rows

  • Row keys (sorted lexicographically — a critical design decision)
  • Column families (physical storage grouping)
  • Qualifiers (dynamic, sparse columns)
  • Timestamps (built-in versioning)
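Why lexicographic row-key ordering matters can be shown with a toy model. The sketch below is plain Python, not the HBase client API: SortedTable, row_key, and the "sensor#reversed-timestamp" key layout are all hypothetical illustrations of one common design, in which newer readings sort first so that a prefix scan returns the latest values.

```python
import bisect

MAX_TS = 10**13  # larger than any millisecond epoch timestamp used here

def row_key(sensor_id: str, ts_ms: int) -> str:
    """Compose '<sensor>#<reversed timestamp>' so newest readings sort first."""
    return f"{sensor_id}#{MAX_TS - ts_ms:013d}"

class SortedTable:
    """Toy stand-in for a region: row keys are kept in lexicographic order."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix, limit=None):
        """Seek to the first key >= prefix, then read forward while it matches."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self._rows[key]))
            if limit is not None and len(out) >= limit:
                break
        return out

table = SortedTable()
for ts, temp in [
    (1_700_000_000_000, 20.5),
    (1_700_000_060_000, 20.9),
    (1_700_000_120_000, 21.4),
]:
    table.put(row_key("sensor-7", ts), temp)
table.put(row_key("sensor-8", 1_700_000_000_000), 18.0)

latest_two = [value for _, value in table.scan_prefix("sensor-7#", limit=2)]
print(latest_two)
```

Changing the key layout (for example, putting the timestamp first) would turn the same "latest readings for one sensor" request into a scan over every sensor's rows, which is the sense in which the row key is the access pattern.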

Consistency Caveats

HBase provides row-level atomicity and strong consistency for single-row operations. Multi-row or cross-region scans are not atomic. The book emphasizes that HBase is not eventually consistent in the CAP sense for single-row reads—it lies toward the CP side.

Key Exercises

Design a wide-row schema for time-series IoT sensor data. Run HBase in standalone (single-node) mode. Insert and scan 100,000 rows, then compare sequential scans with random single-row reads, the access pattern that most real-world HBase deployments depend on.

Model a social-network follower graph in HBase wide rows. Discuss why this is an anti-pattern and how a graph database handles the same problem differently.

Takeaways

HBase teaches that row-key design is everything. Change the row key and you change the entire access pattern. It also surfaces the first real CAP tension: does the system optimize for availability or consistency under partition?


Part III: Document Databases — MongoDB

Context

MongoDB is the most widely adopted document database. Its BSON document model maps naturally to application object graphs, reducing the object-relational impedance mismatch that ORM-based relational applications often struggle with.

Core Concepts Covered

Document Model

  • BSON: binary JSON with extended types (ObjectId, Date, binary)
  • Embedded documents vs. references
  • Schema flexibility: no enforced schema by default (schema-on-read), with optional schema-on-write via validation rules

CRUD and Query Language

  • find, findOne, projection, operators ($gt, $in, $regex)
  • Update operators ($set, $inc, $push, $addToSet)
  • Atomicity at the document level — multi-document transactions require explicit handling

Indexing Strategies

  • Single-field, compound, multikey (arrays), geospatial
  • Text indexes, TTL indexes
  • Index intersection (MongoDB 2.6+)

Consistency and Replication

  • Primary-secondary replication (replica sets)
  • Read preference: primary, secondary, nearest
  • Write concern: w: 1 (acknowledged), w: "majority", j: true (journaled to disk)
  • Tunable consistency via read/write concern combinations

Key Exercises

Model a blog with embedded comments. Then model a many-to-many relationship between posts and tags using references. Compare query complexity in both.
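The difference between the two modeling styles can be sketched with plain Python dictionaries standing in for BSON documents. No MongoDB driver is used, and the sample posts, tags, and helper functions are hypothetical. The point is only to show where the extra lookup appears once references are introduced.

```python
# Embedded: comments live inside the post document, so one read returns everything.
post_embedded = {
    "_id": 1,
    "title": "Polyglot persistence",
    "comments": [
        {"author": "ann", "text": "Nice overview"},
        {"author": "bo", "text": "What about graphs?"},
    ],
}

# Referenced: posts hold tag ids; tags are separate documents shared by many posts.
tags = {
    10: {"_id": 10, "name": "nosql"},
    11: {"_id": 11, "name": "sql"},
}
posts = [
    {"_id": 1, "title": "Polyglot persistence", "tag_ids": [10, 11]},
    {"_id": 2, "title": "Window functions", "tag_ids": [11]},
]

def comments_for(post):
    """Embedded data needs no second lookup."""
    return [c["text"] for c in post["comments"]]

def tag_names(post):
    """References need one extra lookup per id (an application-side join)."""
    return [tags[t]["name"] for t in post["tag_ids"]]

def posts_with_tag(name):
    """Many-to-many query: resolve the tag id first, then filter posts."""
    tag_id = next(t["_id"] for t in tags.values() if t["name"] == name)
    return [p["title"] for p in posts if tag_id in p["tag_ids"]]

print(comments_for(post_embedded))
print(tag_names(posts[0]))
print(posts_with_tag("sql"))
```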

Perform aggregation pipeline queries: $match, $group, $sort, $lookup (join-like), $unwind. Measure query performance with explain("executionStats").

Takeaways

MongoDB trade-off: developer velocity and schema flexibility at the cost of transactional scope and referential integrity guarantees. It is AP-oriented (availability + partition tolerance) with tunable consistency knobs.


Part IV: Graph Databases — Neo4j

Context

Neo4j is the leading native graph database. After spending two weeks in document- and column-oriented models, the book pivots to a fundamentally different paradigm: relationships as first-class citizens, navigated through graph traversal rather than joins or map-reduce.

Core Concepts Covered

Property Graph Model

  • Nodes: entities with labels and properties
  • Relationships: directed, typed, with properties, connecting exactly two nodes
  • Labels: semantic grouping of node types
  • The graph is the schema — schema flexibility here means adding new node/relationship types

Cypher Query Language

  • Pattern-matching syntax inspired by ASCII art: (person:Author)-[:WROTE]->(book:Book)
  • MATCH, WHERE, RETURN, CREATE, MERGE
  • Variable-length path patterns: -[:WROTE*2..4]->

Why Graph Queries Are Different

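  • A relational query that follows relationships pays for an extra join (typically an index lookup) at each level of depth; a native graph store follows stored pointers (index-free adjacency), so each hop costs roughly the same regardless of total graph size. In both models the work still grows with the number of paths explored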
  • Recommendations, fraud detection, knowledge graphs, and network analysis are natural graph problems
  • Schema-on-read via labels, but performance requires thoughtful indexing

Key Exercises

Build a movie recommendation graph: (Person)-[:ACTED_IN|DIRECTED]->(Movie) with genres. Write Cypher to find "collaborators of collaborators."

Model a dependency/citation graph. Write a shortest-path Cypher query to find the citation distance between two papers.
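Both exercises come down to traversal over adjacency lists, which is roughly what a graph engine does natively. The sketch below is plain Python rather than Cypher or a Neo4j driver. The graph dictionary and its names are hypothetical sample data, and breadth-first search is used as one straightforward way to compute an unweighted shortest path.

```python
from collections import deque

# Undirected collaboration graph as adjacency sets (hypothetical sample data).
graph = {
    "ann": {"bo", "cy"},
    "bo": {"ann", "di"},
    "cy": {"ann", "di"},
    "di": {"bo", "cy", "ed"},
    "ed": {"di"},
}

def collaborators_of_collaborators(person):
    """People exactly two hops away: neighbors of neighbors, minus the obvious."""
    direct = graph[person]
    second = set()
    for friend in direct:
        second |= graph[friend]
    return second - direct - {person}

def shortest_path_length(start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(collaborators_of_collaborators("ann"))
print(shortest_path_length("ann", "ed"))
```

Each step of the search touches only the neighbors of the current node, which is why the cost depends on the part of the graph explored and not on the total number of nodes stored.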

Takeaways

Neo4j is fundamentally about depth queries. Where PostgreSQL needs chains of self-JOINs and MongoDB needs multi-stage $lookups, Neo4j follows each relationship in near-constant time per hop, independent of overall graph size. It is a CP system under CAP.


Part V: Wide-Column Stores — Cassandra

Context

Cassandra was born at Facebook for inbox search. It inherits from Amazon's Dynamo paper (consistent-hash partitioning, gossip, and hinted handoff for availability) and Google's Bigtable (data model). Unlike Dynamo, it resolves conflicting writes with last-write-wins timestamps rather than vector clocks. The result is a database optimized for write-heavy workloads across multiple data centers with tunable consistency.

Core Concepts Covered

Data Model

  • Keyspace: top-level namespace (analogous to a schema)
  • Table: row-based, but rows are not required to share columns (sparse)
  • Partition key: determines node placement and distribution
  • Clustering columns: determine sort order within a partition

Distribution and Replication

  • Consistent hashing via the partitioner (Murmur3 by default)
  • Replication factor and replication strategy (SimpleStrategy, NetworkTopologyStrategy)
  • Tunable consistency per operation: QUORUM, LOCAL_QUORUM, ONE, ALL
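How a partition key maps to a set of replica nodes can be sketched with a small hash ring. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 from hashlib stands in for the Murmur3 partitioner, each node owns a single token, and replicas are chosen by walking the ring clockwise (similar in spirit to SimpleStrategy). The Ring class and node names are hypothetical.

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Map a key to a ring position (MD5 here, as a stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token, collecting RF nodes."""
        start = bisect.bisect_right(self._tokens, token(partition_key))
        count = min(self.rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)
```

Because placement depends only on the hash of the partition key, every coordinator computes the same replica list without consulting a central directory.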

CAP and Consistency Math

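  • Quorum size for replication factor RF: floor(RF/2) + 1 (for RF = 3, a quorum is 2 replicas)
  • With W replicas acknowledging a write and R replicas answering a read, a read overlaps the latest acknowledged write when R + W > RF; QUORUM writes combined with QUORUM reads satisfy this
  • Lightweight transactions (IF NOT EXISTS) use a Paxos-based protocol to provide compare-and-set semantics
  • CAP position: AP by default; behaves more like CP with QUORUM reads/writes and LWTs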
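The quorum arithmetic is small enough to check directly. The sketch below assumes the standard overlap rule (a read is guaranteed to see the latest acknowledged write when read replicas plus write replicas exceed RF). The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM at replication factor rf."""
    return rf // 2 + 1

def overlaps(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf

rf = 3
q = quorum(rf)
print(q, overlaps(rf, q, q))  # QUORUM writes + QUORUM reads
print(overlaps(rf, 1, 1))     # ONE + ONE can miss the latest write
print(overlaps(rf, rf, 1))    # ALL writes + ONE reads also overlap
```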

Key Exercises

Design a Cassandra schema for a multi-region e-commerce catalog. Choose partition keys and clustering columns. Explain why this design cannot efficiently answer "top 10 cheapest products across all regions."

Model a time-series user activity feed using time-based partition keys. Discuss TTL patterns and compaction strategies.

Takeaways

Cassandra demands that you design your schema for your queries, not for data normalization. The partitioning model is non-negotiable. Its superpower is Multi-DC replication with per-operation consistency control.


Part VI: Cloud-Managed — DynamoDB

Context

Where Cassandra is typically self-hosted, DynamoDB is Amazon's fully managed service in the same Dynamo lineage. Late in the book, the focus shifts to the operational simplicity of a serverless database where capacity planning, replication, and patching are someone else's problem.

Core Concepts Covered

Data Model

  • Tables, items, attributes
  • Simple primary key (partition key only) or composite primary key (partition + sort key)
  • Typed scalar attributes: String, Number, Binary, Boolean, Null
  • Set types: String Set, Number Set, Binary Set
  • Document types: List, Map

DynamoDB's Consistency Trade-offs

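  • Eventually consistent reads (the default): cheaper, but may briefly return stale data after a write
  • Strongly consistent reads (opt-in per request): reflect all prior successful writes, cost twice the read capacity, and are not supported on global secondary indexes
  • Transactions: ACID across multiple items, capped at 4 MB per transaction (added in 2018; the item limit was 25 at launch and was later raised to 100)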

Access Patterns Drive Schema

  • Single-table design: multiple entity types in one table with composite sort keys
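  • Secondary indexes: global (GSI, with its own partition/sort key and projected attributes) and local (LSI, same partition key with an alternate sort key)
  • Access patterns should be known before table creation, because the primary key schema cannot be changed afterward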

Key Exercises

Design a single DynamoDB table for an e-commerce system storing Users, Orders, and Products in one physical table. Explain how the composite sort key encodes entity type and attributes.
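One way to picture the composite sort key is with an in-memory list of items. The sketch below does not call the AWS SDK. The PK/SK attribute names, the "USER#", "ORDER#", and "PRODUCT#" prefixes, and the query helper are hypothetical conventions chosen for illustration. The helper imitates a Query with an exact partition key and an optional begins_with condition on the sort key.

```python
# All entity types share one table; the key prefixes encode the entity type.
items = [
    {"PK": "USER#u1", "SK": "PROFILE", "name": "Ann"},
    {"PK": "USER#u1", "SK": "ORDER#2024-01-05#o1", "total": 40},
    {"PK": "USER#u1", "SK": "ORDER#2024-02-11#o2", "total": 15},
    {"PK": "USER#u2", "SK": "PROFILE", "name": "Bo"},
    {"PK": "PRODUCT#p9", "SK": "METADATA", "title": "Kettle"},
]

def query(pk, sk_prefix=""):
    """Imitate a Query: exact partition key, optional sort-key prefix, sorted by SK."""
    hits = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(hits, key=lambda i: i["SK"])

orders = query("USER#u1", "ORDER#")  # only the user's orders, oldest first
everything = query("USER#u1")        # profile and orders in one request

print([o["total"] for o in orders])
print([i["SK"] for i in everything])
```

Putting the date inside the sort key is what makes the orders come back in chronological order without a separate index.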

Add a GSI to support "find all orders for a given user." Discuss GSI read/write capacity and its impact on cost and latency.

Takeaways

DynamoDB compresses everything (schema design, consistency, access patterns, and cost) into explicit upfront decisions. A table's primary key schema cannot be changed once it is created, so new access patterns mean adding a GSI or migrating data to a new table. Consistency is chosen per read (eventual vs. strong), while RCU/WCU settings govern throughput and cost rather than CAP behavior.


Part VII: In-Memory Key-Value — Redis

Context

Redis closes the book at the infrastructure layer: a sub-millisecond, in-memory key-value store used as a cache, message broker, session store, leaderboard engine, and sometimes a primary database. Its breadth of data structures is unmatched among key-value stores.

Core Concepts Covered

Data Structures

  • String: O(1) get/set; binary-safe
  • List: O(1) push/pop from both ends; queue and stack semantics
  • Set: unordered unique collection; O(1) member add/remove
  • Sorted Set (ZSET): score-ordered; O(log N) insert; leaderboard natural fit
  • Hash: field-value map within a key; partial updates without reading whole value
  • Bitmaps, HyperLogLog, Geospatial indexes

Persistence Models

  • RDB: point-in-time snapshot to disk
  • AOF: every write operation appended to a log; fsync configurable
  • Hybrid: RDB + AOF for fast restarts and minimal data loss

Replication and High Availability

  • Master-replica replication (asynchronous by default)
  • Redis Sentinel: automatic failover with quorum
  • Redis Cluster: sharding across 16,384 hash slots; each master replicates asynchronously to its replicas, so acknowledged writes can be lost during a failover

Key Exercises

Install Redis, configure AOF with fsync-every-second, stop the process, corrupt the AOF, repair with redis-check-aof, and restart. Observe the persistence guarantees in practice.

Build a rate limiter using Redis sorted sets: ZADD rate-limit:user123 <timestamp> <requestId>. Implement sliding-window logic with ZREMRANGEBYSCORE and ZCARD. Measure latency with redis-benchmark.
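The sliding-window logic can be prototyped without a Redis server. The sketch below is an in-memory model only: a sorted Python list stands in for the sorted set's scores, and the comments note which Redis command each step corresponds to. The SlidingWindowLimiter class and its parameters are hypothetical. Against a real server the three steps would need to run atomically (for example inside MULTI/EXEC or a Lua script).

```python
import bisect

class SlidingWindowLimiter:
    """In-memory model of the sorted-set pattern: one score-ordered list per user."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._scores = {}  # user -> sorted list of request timestamps

    def allow(self, user, now):
        scores = self._scores.setdefault(user, [])
        # ZREMRANGEBYSCORE key -inf (now - window): drop entries outside the window
        cutoff = bisect.bisect_right(scores, now - self.window)
        del scores[:cutoff]
        # ZCARD key: count the requests still inside the window
        if len(scores) >= self.limit:
            return False
        # ZADD key <timestamp> <requestId>: record this request
        bisect.insort(scores, now)
        return True

limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("user123", t) for t in (0, 10, 20, 30, 61, 75)]
print(results)
```

The fourth request is rejected because three requests already sit inside the 60-second window; by the fifth, the oldest one has aged out.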

Takeaways

Redis is AP-oriented and prioritizes speed and data-structure richness over transactional guarantees. Its multi-model nature (not just key-value) makes it relevant for many problem categories, but it requires operational expertise around persistence, memory management, and cluster topology.


Cross-Database Comparison Summary

The following Mermaid diagram maps each database along the two CAP dimensions most discussed in the book, illustrating approximate positioning as the authors describe them in production contexts:

graph TD
    subgraph CP[Consistency & Partition Tolerance]
        PG[PostgreSQL<br/>ACID · SQL · Strong Consistency]
        HB[HBase<br/>Wide-Row · HDFS · Row Atomic]
        N4J[Neo4j<br/>Graph · Cypher · Strong Consistency]
    end
    subgraph AP[Availability & Partition Tolerance]
        MONGO[MongoDB<br/>Document · Flexible Schema · Tunable Consistency]
        CAS[Cassandra<br/>Wide-Column · Tunable Consistency · Multi-DC]
        DDB[DynamoDB<br/>Cloud-Managed · Serverless · Eventual/Strong]
        RED[Redis<br/>In-Memory · Sub-ms Latency · Master-Replica]
    end
    subgraph NOTE[PACELC Note]
        PACELC_NOTE[If P: Cassandra favors A over C<br/>Else: Cassandra favors L over C]
    end
    PG -->|baseline| MONGO
    PG -->|sharding| HB
    MONGO -->|graph traversal| N4J
    HB -->|write-optimized| CAS
    CAS -->|managed cloud| DDB
    DDB -->|speed layer| RED

Key Themes That Bind All Seven

Polyglot Persistence: Use PostgreSQL for financial transactions requiring ACID guarantees, Redis for the caching layer, Neo4j for the social graph, MongoDB for the product catalog, and Cassandra for the time-series metrics. The book's bottom line: every architecture that requires more than one of these patterns is a polyglot persistence architecture.

Consistency Is a Slider, Not a Switch: From PostgreSQL's SERIALIZABLE isolation through Cassandra's LOCAL_QUORUM and DynamoDB's strongly consistent reads, every system discussed lets you choose a point on the consistency-availability spectrum rather than forcing an all-or-nothing choice.

Schema Design Encodes Access Patterns: HBase row keys, Cassandra partition keys, DynamoDB sort keys, Neo4j relationship types—failure to design your schema around your reads means you've designed it for failure.

BASE vs ACID: BASE (Basically Available, Soft state, Eventually consistent) describes the trade-off envelope for AP systems. ACID describes the transactional guarantees of systems such as PostgreSQL; note that the C in ACID (constraints hold) is not the C in CAP (all replicas agree). Neither is inherently better. They solve different problems under different assumptions.

Operational Complexity Trade-off: Self-managed (HBase, Cassandra, self-hosted Redis) vs. fully managed (DynamoDB) vs. open source with managed offerings (PostgreSQL via RDS, MongoDB Atlas, Neo4j Aura). Planning for operational burden is a first-class architectural decision.


analysis

Part II: Critical Analysis

The Book's Thesis in One Sentence

A developer who understands seven databases, their data models, their consistency semantics, and their operational costs can design systems that use the right tool for each data domain—polyglot persistence—instead of forcing every problem into a single database paradigm.

Strengths

1. Structure Forces Real Engagement

A week per database, with exercises at the end of each day, means the reader does the work. The book is not reference material—it's a structured apprenticeship. The exercises are carefully sequenced: install → model → query → compare.

2. Comparative Framework Is Explicit

Rather than treating each database in isolation, the authors repeatedly draw comparisons: "PostgreSQL does this with a JOIN; MongoDB does this with $lookup; Neo4j does this with a MATCH pattern." This comparative lens—built into the book's DNA—is its most important intellectual contribution.

3. CAP and CAP-Adjacent Theory Is Accessible

The book introduces PACELC (if there's a Partition, your system must trade Availability against Consistency; Else, you can trade Latency against Consistency) without overwhelming the reader with formal proofs. It translates Brewer's theorem from a research paper into a decision-making tool.

4. Materials Age Better Than Most Tech Books

Published 10 years after the first edition, the 2nd edition replaces CouchDB with DynamoDB—recognition that cloud-managed databases changed the landscape. PostgreSQL coverage was upgraded to include JSONB. The core framework (compare 7 systems, model, query, reason) remains evergreen.

Weaknesses

1. No Depth on Operational Production Concerns

The book covers installation and basic configuration but omits production-critical topics: backup/restore procedures, disaster recovery planning, monitoring essential metrics (latency P99, replication lag, connection pool saturation), upgrade procedures, and capacity planning. This is a significant gap for the stated audience of practitioners.

2. DynamoDB Coverage Is Thin on Modern Features

The 2022 edition appeared when the DynamoDB Standard-IA table class, Kinesis Data Streams integration for change data capture, and IAM condition-based access control were all available, yet it covers none of them. It also doesn't address DynamoDB pricing models in operational depth.

3. No Coverage of PostgreSQL Internals

Given that PostgreSQL is the first database covered and the reference standard, readers might expect deeper internals coverage: MVCC mechanism, VACUUM, autovacuum tuning, WAL archiving, PITR configuration, and the postmaster process. These topics are absent.

4. Elasticsearch and Couchbase Omissions

The book's title promises seven databases, but the ecosystem has expanded. Elasticsearch, ubiquitous for search workloads, is not covered. Couchbase, which combines document and key-value models, appears in the index but not as a core chapter. These gaps could mislead readers about the full NoSQL landscape.

Core Frameworks Extracted from the Book

Use this matrix when evaluating which database to introduce for a given data domain:

| Question | PostgreSQL | MongoDB | Neo4j | Cassandra | DynamoDB | Redis |
|---|---|---|---|---|---|---|
| Do you need multi-row ACID transactions? | ✅ Yes | ❌ No | ❌ No | ❌ No | ⚠️ Limited | ❌ No |
| Is your primary access pattern deep graph traversals? | ❌ No | ❌ No | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Is availability across multiple DCs the top priority? | ⚠️ Yes | ⚠️ Yes | ❌ No | ✅ Yes | ⚠️ Managed | ❌ No |
| Is your workload write-heavy (>70% writes)? | ⚠️ Yes | ⚠️ Yes | ❌ No | ✅ Yes | ⚠️ Yes | ✅ Yes |
| Do you need sub-millisecond reads? | ❌ No | ❌ No | ⚠️ Sometimes | ❌ No | ⚠️ Sometimes | ✅ Yes |
| Do you want zero operational burden? | ⚠️ Managed | ⚠️ Managed | ⚠️ Managed | ❌ No | ✅ Yes | ⚠️ Managed |
| Is your data naturally hierarchical? | ⚠️ Yes | ✅ Yes | ❌ No | ❌ No | ⚠️ Yes | ❌ No |

CAP and Consistency Trade-Off Analysis

graph TB
    subgraph Theoretical Models
        ACID["ACID Guarantees<br/>(PostgreSQL baseline)"]
        BASE["BASE Philosophy<br/>(Cassandra, DynamoDB default)<br/>Basically Available, Soft state, Eventually consistent"]
        PACELC["PACELC Extension<br/>If P: trade A vs C<br/>If no P: trade L vs C"]
    end
    subgraph How Each DB Positions
        PG_POS["PostgreSQL: CA (single-node/single-DC)<br/>CP under network partition"]
        MONGO_POS["MongoDB: AP with tunable concern<br/>majority write = CP-like"]
        HB_POS["HBase: CP within a region<br/>row-level atomicity guaranteed"]
        N4J_POS["Neo4j: CP<br/>cluster election model"]
        CAS_POS["Cassandra: AP default<br/>CP available via LWT + QUORUM"]
        DDB_POS["DynamoDB: AP eventual default<br/>CP via strongly consistent reads"]
        RED_POS["Redis: AP (async replica)<br/>WAIT reduces but does not prevent lost writes"]
    end
    ACID --> PG_POS
    BASE --> CAS_POS
    BASE --> DDB_POS
    PACELC --> CAS_POS
    PACELC --> DDB_POS

Decision Flowchart: Which Database Fits Your Needs?

flowchart TD
    START["What is your primary query pattern?"]
    TRAVERSE{"Deep graph traversal<br/>(connected data)?"}
    TXN{"Multi-row ACID<br/>transactions?"}
    DOC{"Hierarchical/semi-structured<br/>document model?"}
    COL{"Wide-column / time-series<br/> + multi-DC?"}
    KV{"Sub-millisecond<br/>key-value access?"}
    MG{"Fully managed serverless<br/>with auto-scaling?"}
    KV_TYPES{"Rich data structures<br/>(lists, sets, sorted scores)?"}
    NEO["Neo4j — Graph database"]
    PG["PostgreSQL — Relational"]
    MONGO["MongoDB — Document database"]
    CAS["Cassandra — Wide-column store"]
    HB["HBase — Distributed columnar"]
    DDB["DynamoDB — Cloud-managed"]
    REDIS["Redis — In-memory store"]
    POSTGRES_GREAT["PostgreSQL is also<br/>a strong candidate"]
    NEO --> ORDER_NOTE["Use Cypher MATCH.<br/>See p. 144–156 (Neo4j)"]
    PG --> ORDER_NOTE2["Use PostgreSQL when<br/>consistency > availability trade-off"]
    MONGO --> ORDER_NOTE3["Use MongoDB for<br/>flexible schemas + rich queries"]
    CAS --> ORDER_NOTE4["Use Cassandra for<br/>write-heavy multi-region"]
    HB --> ORDER_NOTE5["Use HBase for<br/>random reads on big data"]
    DDB --> ORDER_NOTE6["Use DynamoDB when<br/>ops burden is unacceptable"]
    REDIS --> ORDER_NOTE7["Use Redis for caching,<br/>rate limiting, sessions"]
    TOPIC["Return to Part I: PostgreSQL<br/>as your default hypothesis"]
    START --> TRAVERSE
    TRAVERSE -->|Yes| NEO
    TRAVERSE -->|No| TXN
    TXN -->|Yes| PG
    TXN -->|No| DOC
    DOC -->|Yes, many writes| MONGO
    DOC -->|No / mixed| COL
    COL -->|Yes| CAS
    COL -->|No, more read/random| HB
    HB -->|Need managed| DDB
    HB -->|Self-managed OK| POSTGRES_GREAT
    PG -->|Speed is goal| REDIS
    REDIS -->|Structured values| KV_TYPES
    KV_TYPES -->|Yes| REDIS
    KV_TYPES -->|Simple strings| TOPIC
    MONGO -->|Need managed| MG
    MG -->|Yes| DDB
    MG -->|No, structured data| REDIS

Critique of Specific Chapters

PostgreSQL (Chapters 1–2)

What works: Grounding NoSQL exploration in a rigorous ACID baseline. JSONB introduction is forward-looking. What's missing: No coverage of connection pooling (PgBouncer), replication modes (streaming replication, logical replication), or how asynchronous read replicas introduce eventual consistency into a PostgreSQL deployment.

HBase (Chapters 3–4)

What works: Row-key design as a first-class design concern is handled well. The HDFS prerequisite is acknowledged honestly. What's missing: No coverage of coprocessors, the Phoenix SQL layer, or managed alternatives such as HBase on Amazon EMR and Google Cloud Bigtable (which exposes an HBase-compatible API).

MongoDB (Chapters 5–6)

What works: Aggregation pipeline is well demonstrated. Schema validation rules are introduced without overcomplicating things. What's missing: Change Streams (a production-critical feature for event sourcing), sharding architecture, and the security model (auth, TLS, role-based access) are absent.

Neo4j (Chapters 7–8)

What works: Cypher's ASCII-art syntax is well suited to a book format. Graph algorithms (PageRank, shortest path) are introduced practically. What's missing: Neo4j Fabric (querying across multiple databases), the APOC library, and Neo4j Bloom (graph visualization and exploration). These matter in many production deployments.

Cassandra (Chapters 9–10)

What works: Lightweight transactions (LWT) and consistency math (quorum arithmetic) are handled with unusual clarity. The multi-DC replication strategy is essential and well-taught. What's missing: SSTable compaction strategies (SizeTiered vs Leveled vs TimeWindow), materialized views (marked experimental in recent Cassandra releases), and tombstone management, all critical for long-lived Cassandra clusters.

DynamoDB (Chapters 11–12)

What works: Single-table design philosophy is articulated as clearly as anywhere in print. GSI modeling is practical. What's missing: As noted in weaknesses: modern features, detailed cost modeling, and serverless application integration patterns (Step Functions, AppSync).

Redis (Chapters 13–14)

What works: The breadth of data structures covered in a short chapter is impressive. The rate-limiter exercise is immediately production-applicable. What's missing: Redis as a primary data store (RedisJSON, RediSearch, RedisTimeSeries modules), cluster resharding, and Lua scripting for atomic complex operations.

Originality and Contribution to the Field

The book's primary original contribution is its format: forced sequential, hands-on comparison of diverse database systems rather than categorical encyclopedic treatment. This format directly encodes polyglot persistence philosophy into the reading experience itself. No single chapter stands as a groundbreaking contribution to database theory or practice, but the aggregate effect (a practitioner fluent in seven paradigms) is genuinely rare.

Final Verdict

Recommended for: Backend engineers, data engineers, technical architects who need to evaluate database technologies for production systems. The book assumes programming fluency and basic familiarity with database terminology—beginners should pair it with a fundamentals text.

Not recommended for: Database administrators needing production operations depth, or specialists seeking deep expertise in a single database system.


narration

Part III: Narration Script

A spoken-word companion to Seven Databases in Seven Weeks. Read it aloud or use it as an audio guide to the book's key ideas.


Opening

There is a moment every software engineer reaches when the single database that worked for everything starts to fail. Not catastrophically—not at first. It starts as a throb behind the locks. A subquery that used to take milliseconds now takes seconds. A write that used to be atomic becomes a coordination nightmare.

And the engineer asks: is this the database's fault, or is this my fault?

Seven Databases in Seven Weeks argues it is neither. It is a category mismatch. The problem isn't that your database is broken. The problem is that you are asking one tool to do seven tools' worth of work.


Week One: Why Start with PostgreSQL?

Every story about NoSQL begins with a story about SQL. And the honest version of that story starts in the same place: PostgreSQL. Not because it is the perfect database, but because it is the best-defined version of what a database promises.

ACID transactions. Declarative SQL. Atomic commits. Isolation levels. Write-ahead logging for durability. These are not features of a legacy technology. They are the contractual obligations a relational database makes with your data.

The book uses PostgreSQL as a control group. If Neo4j follows each relationship in near-constant time per hop, what is that compared against? Multi-way JOINs in SQL, whose cost climbs with every additional level of depth and is real, measurable, and production-critical. If MongoDB embeds documents to avoid the object-relational impedance mismatch, what is impedance mismatch costing you in the PostgreSQL world? It costs you an ORM layer, a mental translation, and a bunch of schema migration pain.

PostgreSQL also has a JSONB type. This is not a footnote—it is the book's first hint that the NoSQL/relational divide is not a wall. It is a gradient. And understanding where you sit on that gradient is the entire exercise.


Weeks Two and Three: The Distributed Reality

After PostgreSQL, the book enters the distributed database layer. HBase and Cassandra arrive in quick succession, and they arrive with a message: your data is too big for one machine. Therefore, your database must live on many. And if it lives on many machines, network partitions are not hypothetical—they are a scheduled event.

Here the book introduces CAP theorem: Consistency, Availability, Partition Tolerance—pick two. And then immediately expands it to PACELC: when there is a partition, trade availability against consistency; when there is no partition, trade latency against consistency.

HBase is CP-leaning within a single row. Cassandra is AP-leaning with tunable consistency per operation. These are not philosophical distinctions—they are operational decisions that show up in your latency SLAs during a network partition.

HBase's lesson is simpler and harder than Cassandra's: your row key is your data model. Lexicographic sorting of row keys isn't an implementation detail. It is the only reason your performance model works. Get the row key wrong and you cannot fix it later with an index.

Cassandra's lesson is different: distributed writes at scale require a different mental model entirely. You do not JOIN across machines. You do not JOIN inside a partition either, because Cassandra optimizes for writes, not for ad-hoc query flexibility. You design a table for each query pattern, and each query pattern is a table. This inverts the normalization instinct PostgreSQL taught you. And the inversion is not a mistake. It is the correct response to a different physical substrate.


Weeks Four and Five: The Document Alternative

MongoDB represents a different answer to a different problem: what if your data doesn't fit into rows and columns?

Documents are application-native. The BSON format is close to what your application objects already look like. This reduces reliance on an ORM layer, the impedance mismatch, and the schema migration cycle that slows down rapid product iteration. The book is clear-eyed about this trade: schema flexibility comes with fewer guarantees. Multi-document transactions require explicit handling. Referential integrity is left to the application.

Neo4j arrives next with a category shift, not just a data model change. Relationships are not foreign keys in Neo4j. They are the data. The shortest path between two nodes in a social network, or the impact radius of a service outage in a microservices dependency graph, is a query that typically takes milliseconds in Neo4j. In PostgreSQL the same question requires either a chain of self-JOINs that grows with every additional hop or a recursive CTE.
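
To see what a shortest-path question asks of the data, here is a minimal breadth-first search over an adjacency list in plain Python. It is a sketch of the traversal idea only, not a description of how Neo4j executes such queries, and the service names in the dependency graph are made up for illustration.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict, start: str, goal: str) -> Optional[list]:
    """Breadth-first search; returns one shortest path by hop count, or None."""
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical service dependency graph: edges point from a service to what it calls.
depends_on = {
    "checkout": ["payments", "cart"],
    "cart": ["inventory"],
    "payments": ["ledger"],
    "inventory": ["ledger"],
}
path = shortest_path(depends_on, "checkout", "ledger")
```

Each step follows edges directly from the current node. In a relational schema, every one of those hops is another join against the edge table.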

The Cypher query language, with patterns such as MATCH (person:Author)-[:WROTE]->(book:Book), is intentionally readable. The ASCII-art pattern syntax is not just pretty. It encodes graph structure visually in a way few other query languages attempt. Reading Cypher, a developer who has never seen Neo4j can intuit the query's structure.


Week Six: The Managed Cloud Absolves, and Constrains

DynamoDB is the most ideologically different chapter in the book, because it is not just a different database. It is a different contract with the user.

Self-hosted databases require you to manage servers, replication, backups, patching, and capacity planning. DynamoDB requires you to manage none of that. What it requires instead is upfront schema discipline: a single-table design, a known set of access patterns before you create the table, and the discipline to never need a query pattern that was not anticipated.

The single-table design philosophy, where Users, Orders, and Products all live in the same physical table with a composite sort key encoding the entity type, is counterintuitive to developers trained in relational normalization. But the book demonstrates it precisely: one table, one query, all the related entity types. The alternative is three tables, three queries, and multiple round trips.
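
A rough picture of how that works, using a Python list as a stand-in for the table. The attribute names PK and SK, the "USER#" and "ORDER#" key prefixes, and the query helper are conventions assumed for this sketch, not something the source prescribes. The helper imitates a single key-condition lookup (an exact partition key plus an optional sort-key prefix, the shape of DynamoDB's Query operation) rather than a full-table Scan. Only two of the entity types are shown.

```python
# One list stands in for the single table; PK and SK are hypothetical attribute names.
table = [
    {"PK": "USER#42", "SK": "PROFILE", "name": "Ada"},
    {"PK": "USER#42", "SK": "ORDER#2024-01-05#o-1", "total_cents": 1200},
    {"PK": "USER#42", "SK": "ORDER#2024-02-11#o-7", "total_cents": 450},
    {"PK": "USER#77", "SK": "PROFILE", "name": "Grace"},
]


def query(pk: str, sk_prefix: str = "") -> list:
    """Mimic a key-condition lookup: exact partition key, optional sort-key prefix."""
    matches = [item for item in table
               if item["PK"] == pk and item["SK"].startswith(sk_prefix)]
    # Items within a partition come back ordered by sort key.
    return sorted(matches, key=lambda item: item["SK"])


# One lookup returns the user's profile and orders together.
everything_for_user = query("USER#42")
# The same table answers a narrower access pattern via the sort-key prefix.
orders_only = query("USER#42", "ORDER#")
```

The entity type lives in the sort key, so one partition holds a user and everything that belongs to that user. The cost is that every access pattern has to be expressible as a key condition decided before the table exists.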


Week Seven: Speed at the Edge

Redis closes the book at the infrastructure boundary where databases meet caches meet message brokers meet session stores. Sub-millisecond reads. Rich data structures inside the database engine itself: sorted sets for leaderboards, hashes for user sessions, HyperLogLog for cardinality estimation.

The rate limiter exercise, which uses a Redis sorted set to implement a sliding window, is less a theoretical curiosity than a pattern you can put into production immediately. Every engineer who has ever rate-limited an API endpoint should understand the Redis sorted set sliding window. It is among the cleanest implementations of the idea.
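
The logic of the sliding window fits in a few lines. The sketch below is an in-memory stand-in written in plain Python, not Redis client code: a sorted list of timestamps per client plays the role of the sorted set, and the comments name the Redis commands each step corresponds to. The class name and its parameters are hypothetical. A real deployment would also need the three steps to run atomically, which this sketch does not address.

```python
import bisect


class SlidingWindowLimiter:
    """In-memory stand-in for the sorted-set pattern: scores are request timestamps."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = {}  # client id -> sorted list of request times

    def allow(self, client_id: str, now: float) -> bool:
        stamps = self._timestamps.setdefault(client_id, [])
        # Drop entries that have left the window (the ZREMRANGEBYSCORE step).
        cutoff = now - self.window_seconds
        del stamps[:bisect.bisect_right(stamps, cutoff)]
        # Count what remains (the ZCARD step) and reject if the window is full.
        if len(stamps) >= self.limit:
            return False
        # Record this request with its timestamp as the score (the ZADD step).
        bisect.insort(stamps, now)
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=10)
decisions = [limiter.allow("client-a", t) for t in (0.0, 1.0, 2.0, 3.0, 11.5)]
# decisions is [True, True, True, False, True]
```

The fourth request is refused because three requests already sit inside the ten-second window. The fifth is allowed because the first two have aged out by then.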

Redis is not safe for all data by default. Its async replication means a primary failure can lose acknowledged writes. Its in-memory model means data volume is bounded by RAM cost. The book is clear that Redis as a primary data store requires operational care that a disk-backed database does not.


The Book's Real Argument: Polyglot Persistence

Every chapter ends with a pattern: here is what this database does well, and here is what it explicitly does not do. SQL databases do not graph-traverse. Graph databases do not shard horizontally for write throughput. Document databases have unbounded schema flexibility at the cost of transactional scope.

No single database is right for every problem. The "right" answer is almost always two or three databases, used for different data domains in the same system.

This is polyglot persistence: not as an academic proposal, but as a production architectural pattern.

  • Use PostgreSQL for financial transactions requiring ACID guarantees.
  • Use Redis for the caching layer and session state.
  • Use Neo4j for the social graph and recommendation engine.
  • Use MongoDB for the product catalog with flexible product schemas.
  • Use Cassandra for the time-series metrics pipeline.
  • Use DynamoDB when you cannot afford to manage infrastructure.
  • Use HBase when your data volume requires HDFS-native random reads.

The book's contribution is not teaching you these databases individually. It is teaching you to think fluently across all seven—to select the right tool per data problem without ideological attachment to any one paradigm.


Closing

The last chapter of the book is not a conclusion. It is a beginning. A developer fluent in seven paradigms is not a database generalist in the pejorative sense. They are an architect who can match physical data substrate to cognitive data model—and stop asking one tool to carry seven jobs.

That, ultimately, is what Seven Databases in Seven Weeks teaches. Not seven tools. One philosophy, applied seven ways.


End of narration. ISBN 9781680502543. Pragmatic Bookshelf, 2022.