booklore

Designing Event-Driven Systems

Concepts and Patterns for Streaming Services


reading path: overview → analysis → narration


overview

Overview

Designing Event-Driven Systems: Concepts and Patterns for Streaming Services (O'Reilly, May 2018) by Ben Stopford is the authoritative guide to building business-critical systems with Apache Kafka and event streaming. Stopford, a principal engineer at Confluent (the company behind Kafka), draws on years of production experience to bridge the gap between messaging mentality and true event-driven architecture.

The book is organized in five parts spanning 15 chapters: setting the stage (Chapters 1–4), designing event-driven systems (Chapters 5–7), rethinking architecture at company scale (Chapters 8–10), consistency and evolution (Chapters 11–13), and implementing streaming services (Chapters 14–15).

The O'Reilly listing shows 171 pages, approximately 4 hours of reading. A print edition with ISBN 9781492038240 was later published by O'Reilly in April 2022.


Executive Summary

Stopford's central thesis: the log (a replayable, append-only event stream) should serve as the backbone of your system — the shared, immutable record that connects services, enables recovery, and eliminates data duplication across an organization.

| Part | Focus | Key Chapters |
|------|-------|-------------|
| I: Setting the Stage | Kafka fundamentals, what it is and isn't | Ch 1–4 |
| II: Designing Event-Driven Systems | Event patterns, stateful processing, CQRS | Ch 5–7 |
| III: Company-Scale Architecture | Shared data, inside-out databases, lean data | Ch 8–10 |
| IV: Consistency & Evolution | Concurrency, transactions, schema evolution | Ch 11–13 |
| V: Implementing with Kafka | Kafka Streams, KSQL, streaming services | Ch 14–15 |

The book sits between two extremes: event-driven architecture for small teams (solving the request-response reliability problem) and company-scale architectures (using event streams as the shared source of truth).


Key Takeaways

  1. Kafka is not a message broker. It is a distributed, replayable log — more like a database than a traditional queue. The "messages" are events that persist for as long as you need.

  2. Events, commands, and queries are different. Commands request action; events record what happened. Mixing them leads to brittle systems. Stopford argues for strict separation.

  3. Loose coupling has limits. Essential data coupling is unavoidable. The pattern you use — notification vs. state transfer vs. event collaboration — determines what data flows between services.

  4. Event Sourcing ≠ Event Streaming. Event sourcing rebuilds state from an event log. Event streaming moves data between services. They solve different problems; don't use one when you need the other.

  5. The Event Collaboration Pattern reduces inter-service dependencies by having services react to events rather than calling each other directly. The log is the communication backbone.

  6. Stateful stream processing is a first-class architectural choice. Kafka Streams tables and state stores let services maintain local state derived from events, recover after failure, and process in time windows.

  7. Schema compatibility is a production concern. Not just about APIs — your event schemas evolve, and backward/forward compatibility must be enforced at the boundary (Confluent Schema Registry is the canonical solution).

  8. The "competing consumer pattern" scales event processing horizontally by adding consumers to a consumer group. Kafka assigns each partition to exactly one consumer in the group, so each message is delivered to one member of the group, and parallelism is capped by the number of partitions.

  9. Eventual consistency is intentional, not a bug. The book provides practical tools — command topics, the single-writer principle, idempotent consumers — for managing consistency without distributed transactions.

  10. Don't use events when CRUD or a simple API will do. Streaming adds complexity: schema management, monitoring, consumer group coordination, and operational overhead. Stopford explicitly warns against over-applying the pattern.


Who Should Read

| Reader Type | Why |
|---|---|
| Senior backend engineers | Foundational patterns for event-driven design at scale |
| Platform/data engineers | Kafka internals, schema management, stream processing |
| Architects | Trade-offs between EDA, SOA, and CRUD-based architectures |
| Technical leads evaluating Kafka | Honest guide to when Kafka helps and when it doesn't |
| Anyone preparing for system design interviews | Concise treatment of EDA, log-based recovery, CQRS |


Who Should Skip

  • Absolute beginners to distributed systems — start with a Kafka getting-started guide first
  • Readers seeking an academic theory treatment — this is practitioner-focused, not a formal proof-based text
  • Teams with no Kafka or streaming workloads — less directly applicable

Core Themes

| Theme | Description |
|-------|-------------|
| The log as architecture | Append-only, replayable streams as a system-wide contract |
| Separation of concerns: events vs. commands | Naming and structure conventions that prevent coupling |
| Inside-out databases | Building services that expose their internals as streams |
| Lean data pipelines | Messages carry deltas, not full state; views are rebuilt |
| Company-wide data sharing | Event streams as a first-class organizational asset |
| Resilience through immutability | Never update or delete; append and rebuild |


Why This Book Matters

Stopford wrote the book that Kafka engineers needed but didn't have: a bridge between the promise of event-driven architecture and the messy reality of making it work across many teams. It is a practitioner's guide to streaming architecture, with a foreword by Sam Newman.

Sam Newman's foreword positions the book as the answer to a critical gap: microservices give us autonomy, but autonomy without a shared data strategy creates islands. Stopford shows how event streams are the bridge.


Related Books

| Book | Author | Connection |
|------|--------|------------|
| Building Microservices | Sam Newman | Foreword author; companion on distributed system design |
| Designing Data-Intensive Applications | Martin Kleppmann | Theoretical foundation; complementary to practice |
| Kafka: The Definitive Guide | Neha Narkhede et al. | Deep Kafka internals; co-written by Confluent co-founder Neha Narkhede |
| Domain-Driven Design | Eric Evans | Conceptual foundations for bounded context and service boundaries |
| Release It! | Michael Nygard | Production resilience patterns applicable to EDA |


Final Verdict

Designing Event-Driven Systems is the most concentrated, practitioner-oriented guide to event streaming architecture available. Stopford's clarity on when to use event-driven patterns, and when to avoid them, is alone worth the price of admission.

The book is best read with Kafka experience, but its conceptual chapters (5, 8–10, 12–13) reward any engineer building distributed systems.

Rating: 8.5/10 — The essential field guide for any engineer evaluating or implementing Apache Kafka and event-driven architecture.


content map

The Three Message Types

Stopford's most influential contribution is the strict separation of commands, events, and queries — three message archetypes with different semantics, lifecycles, and implications for system design.

```mermaid
graph TD
    subgraph Message_Types["Message Type Semantics"]
        CMD["Command<br/>(imperative, directed)<br/>'ProcessOrder(id=42)'"]
        EVT["Event<br/>(declarative, broadcast)<br/>'OrderProcessed(id=42)'"]
        QRY["Query<br/>(request-response)<br/>'GetOrder(id=42) => {...}'"]
    end

    subgraph Direction["Flow Direction"]
        D1["Client → Service"]
        D2["Service → Topic"]
        D3["Client ↔ Service"]
    end

    CMD --> D1
    EVT --> D2
    QRY --> D3
```

Commands request action. They are directed at a specific service, expect a response, and have an imperative verb form (ProcessOrder, DeleteUser). They are the synchronous world's default.

Events announce what happened. They are named in the past tense (OrderProcessed, UserDeleted), addressed to a topic, and broadcast to any number of consumers. They are the shared language of event-driven systems.

Queries ask for state without side effects. In practice they tend to be synchronous REST calls or materialized view lookups.

Mixing these types — especially using events as commands — causes operational chaos: consumers fail and the producer never knows, or consumers retry and produce duplicates.
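The three archetypes can be illustrated with a small in-memory sketch. It is not Kafka code and not taken from the book: the names `ProcessOrder`, `OrderProcessed`, and `OrderService` are hypothetical, and a plain Python list stands in for a topic. The command has one handler and returns a reply, the event is appended for any number of readers, and the query only reads state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessOrder:
    """Command: imperative name, directed at one service."""
    order_id: int


@dataclass(frozen=True)
class OrderProcessed:
    """Event: past-tense name, a fact about something that happened."""
    order_id: int


@dataclass
class OrderService:
    topic: list = field(default_factory=list)    # stands in for a Kafka topic
    processed: set = field(default_factory=set)  # the service's own state

    def handle(self, cmd: ProcessOrder) -> bool:
        """Command: a single handler, and the caller gets a success/failure reply."""
        if cmd.order_id in self.processed:
            return False
        self.processed.add(cmd.order_id)
        # Event: appended to the topic without knowing who will read it.
        self.topic.append(OrderProcessed(cmd.order_id))
        return True

    def order_status(self, order_id: int) -> str:
        """Query: reads state and has no side effects."""
        return "processed" if order_id in self.processed else "unknown"


svc = OrderService()
first = svc.handle(ProcessOrder(42))
second = svc.handle(ProcessOrder(42))  # duplicate command is refused

# Any number of consumers read the same events independently.
billing_view = [e.order_id for e in svc.topic]
shipping_view = [e.order_id for e in svc.topic]
```

Because the event is a fact recorded in a shared log rather than a call to a specific service, adding a third consumer requires no change to `OrderService`.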


Coupling: The Spectrum

Stopford forces the reader to confront the myth that message brokers eliminate coupling. They change the kind of coupling rather than eliminating it.

```mermaid
graph LR
    subgraph Coupling_Spectrum["Coupling Spectrum"]
        TIGHT["Tight Coupling<br/>(synchronous API calls)<br/>Service A calls Service B directly"]
        ESSENTIAL["Essential Data Coupling<br/>(unavoidable: shared schema, shared concepts)"]
        LOOSE["Loose Coupling<br/>(async message broker)<br/>Service A emits → Topic → Service B"]
        IDEAL["Ideal: Shared Log<br/>(Kafka)<br/>Both services read from replayable log"]
    end

    TIGHT --> ESSENTIAL
    ESSENTIAL --> LOOSE
    LOOSE --> IDEAL
```

The three collaborative patterns:

Event Notification (Level 1)

Service A emits OrderPlaced. Service B is notified, then looks up order details via a separate call. Works for simple cases but still requires coupling on the query interface.

```mermaid
sequenceDiagram
    participant P as Producer
    participant T as Topic
    participant C as Consumer

    P->>T: EventNotification("order_placed", orderId)
    T-->>C: deliver event
    C->>+P: getOrderDetails(orderId) [sync call]
    P-->>-C: { order details }
```

Event-Carried State Transfer (Level 2)

The event itself carries enough state that the consumer doesn't need to call back. Reduces coupling but increases message volume and risks data duplication.
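The difference between Level 1 and Level 2 shows up in what the consumer has to do when an event arrives. In this hypothetical sketch (`ORDERS_DB`, `get_order_details`, and both event classes are illustrative names, not a real API), the notification handler must call back to the producer, while the state-transfer handler reads everything it needs from the event itself.

```python
from dataclasses import dataclass

# Hypothetical data owned by the orders service.
ORDERS_DB = {42: {"customer": "alice", "total": 99.5}}
callback_log: list = []  # records synchronous calls back to the producer


def get_order_details(order_id: int) -> dict:
    """Synchronous query to the producer: the coupling that notification keeps."""
    callback_log.append(order_id)
    return ORDERS_DB[order_id]


@dataclass(frozen=True)
class OrderPlacedNotification:
    """Level 1: the event carries only an identifier."""
    order_id: int


@dataclass(frozen=True)
class OrderPlacedWithState:
    """Level 2: the event carries the state consumers need."""
    order_id: int
    customer: str
    total: float


def on_notification(evt: OrderPlacedNotification) -> float:
    # The consumer must call back to learn anything useful.
    return get_order_details(evt.order_id)["total"]


def on_state_transfer(evt: OrderPlacedWithState) -> float:
    # Everything needed is in the payload; no call back to the producer.
    return evt.total


order = ORDERS_DB[42]
total_via_callback = on_notification(OrderPlacedNotification(42))
total_via_payload = on_state_transfer(
    OrderPlacedWithState(42, order["customer"], order["total"])
)
```

Both handlers compute the same answer; the state-transfer version simply makes no entry in `callback_log`, at the cost of a larger event that duplicates the producer's data.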

Event Collaboration (Level 3)

Services interact primarily through events on a shared log. No direct calls between services; the log is the only communication channel. The most decoupled form, but requires careful design of event schemas and lifecycle.

```mermaid
graph TD
    subgraph Service_A["Service A"]
        A_L["Local State"]
        A_P["Producer"]
    end
    subgraph Kafka_Log["Kafka Log (Shared Backbone)"]
        T1["Topic: orders"]
        T2["Topic: payments"]
        T3["Topic: shipments"]
    end
    subgraph Service_B["Service B"]
        B_C["Consumer Group"]
        B_L["Local View<br/>(rebuilt from events)"]
    end
    subgraph Service_C["Service C"]
        C_C["Consumer Group"]
        C_L["Local View"]
    end

    A_P --> T1
    T1 --> B_C
    T1 --> C_C
    B_C --> B_L
    C_C --> C_L
    A_L --> A_P
```

The Kafka Log: What It Is

Stopford devotes an entire chapter (3) to demystifying Kafka. The core insight: Kafka is a persistent, distributed, replayable log — not a queue, not a message broker.

```mermaid
graph LR
    subgraph What_Kafka_Is_Not["Common Misconceptions"]
        M1["REST (synchronous)"]
        M2["Enterprise Service Bus"]
        M3["Traditional Message Queue"]
        M4["Database"]
    end

    subgraph What_Kafka_Is["What Kafka Is"]
        K1["Distributed Log"]
        K2["Immutable Append-Only"]
        K3["Replayable"]
        K4["Partitioned & Ordered"]
        K5["Retained by Time/Size"]
        K6["Backbone for Streams AND Shared Data"]
    end

    M1 -. critique .-> K1
    M2 -. critique .-> K2
    M3 -. critique .-> K3
    M4 -. critique .-> K4
```

A Kafka topic is divided into partitions, each an ordered, immutable sequence of records (a log). Each partition has a single leader broker and replicated followers. Consumers read at their own pace; the log is retained independently of consumption.

Critical properties:

  • Ordering is guaranteed per-partition
  • Retention is time- or size-based; events are not deleted on consumption
  • Replay means any consumer can re-read events from any offset
  • Compacted topics retain only the latest value per key
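A toy model makes these properties concrete. This is a simplified sketch, not Kafka's implementation: partitions are Python lists, `zlib.crc32` stands in for Kafka's default key hashing (murmur2), and compaction is computed on demand rather than by a background cleaner.

```python
import zlib


class PartitionedLog:
    """Toy topic: N append-only partitions; records with the same key share a partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, which is what gives per-key ordering.
        # Kafka's default partitioner hashes keys with murmur2; crc32 is a stand-in.
        return zlib.crc32(key.encode()) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple[int, int]:
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition: int, from_offset: int = 0) -> list:
        # Reading removes nothing, so any consumer can replay from any offset.
        return self.partitions[partition][from_offset:]

    def compacted(self, partition: int) -> list:
        # Compaction keeps only the latest value per key, with its original offset.
        latest = {}
        for offset, (key, value) in enumerate(self.partitions[partition]):
            latest[key] = (offset, value)
        return sorted((off, key, value) for key, (off, value) in latest.items())


log = PartitionedLog(num_partitions=3)
for status in ["created", "paid", "shipped"]:
    partition, offset = log.append("order-42", status)

history = [value for _, value in log.read(partition)]  # full replay
tail = log.read(partition, from_offset=2)              # resume mid-log
latest = log.compacted(partition)
```

Real compaction runs in the background over closed log segments; only its per-key outcome is modeled here.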

Event Streaming vs. Event Sourcing

One of Stopford's most valuable distinctions: event streaming and event sourcing solve different problems. Confusing them causes design errors.

```mermaid
graph LR
    subgraph Event_Streaming["Event Streaming"]
        ES1["Purpose: Decouple systems"]
        ES2["Events: transient messages"]
        ES3["Retention: short to medium"]
        ES4["Pattern: fire and listen"]
        ES5["Example: OrderPlaced → PaymentService → ShipmentService"]
    end
    subgraph Event_Sourcing["Event Sourcing"]
        EV1["Purpose: Rebuild state"]
        EV2["Events: system of record"]
        EV3["Retention: permanent append-only"]
        EV4["Pattern: replay to rebuild"]
        EV5["Example: rebuild bank account balance by<br/>replaying all transactions"]
    end
    Event_Streaming -. "often combined" .-> Event_Sourcing
```

Event Streaming: Get data from system A to system B. The event log is a communication channel. Events may be retained briefly. Consumption is typically fire-and-forget.

Event Sourcing: The event log is the data. The state of the system is derived from replaying events. Events are never updated or deleted. This enables audit trails, time-travel debugging, and complete rebuilds of derived views.

Many systems combine both: streaming events between services while using event sourcing within individual services.
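Event sourcing's core mechanic is a fold: current state is the result of applying every event, in order, to an initial value. A minimal sketch of the bank-account example above, assuming hypothetical `Deposited` and `Withdrawn` events:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply_event(balance: int, event) -> int:
    """Pure transition: (current state, one event) -> next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")


# The log of events is the system of record; the balance is derived from it.
events = [Deposited(100), Withdrawn(30), Deposited(50)]
balance = reduce(apply_event, events, 0)

# Time travel: the state at an earlier point is a replay of a prefix of the log.
balance_after_two = reduce(apply_event, events[:2], 0)
```

Because `apply_event` is pure, any derived view can be rebuilt at any time by replaying the same log.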


Stateful Stream Processing

Stopford distinguishes three processing models, each with different trade-offs for state management:

| Model | State Location | Recovery | Best For |
|-------|---------------|----------|----------|
| Stateless (pure streaming) | Stateless functions | No state to recover | Simple transformations, filtering |
| Event-driven (notification) | Source services | Callback to rebuild | Notifications, trigger workflows |
| Stateful streaming | Local state stores (Kafka Streams) | Rebuild from log | Aggregations, joins, windows |

```mermaid
graph TD
    subgraph Stateful_Stream["Stateful Stream Processing"]
        INPUT["Input Stream<br/>(Kafka changes topic)"]
        STORE["State Store<br/>(RocksDB, changelog topic)"]
        PROCESS["Topology<br/>(filter, map, aggregate, join)"]
        OUTPUT["Output Topic<br/>(derived view)"]
    end

    INPUT --> PROCESS
    PROCESS <--> STORE
    PROCESS --> OUTPUT

    subgraph Recovery["Failure Recovery"]
        CRASH["Service crashes"]
        REBUILD["State store rebuilt by<br/>replaying changelog topic"]
        REJOIN["Service rejoins consumer group"]
    end

    CRASH --> REBUILD
    REBUILD --> REJOIN
    REJOIN --> PROCESS
```

The changelog topic, a compacted topic that records every update made to the state store, is the mechanism that makes state stores durable. After a failure, the store is rebuilt by replaying the changelog, so no separate snapshot or external checkpoint coordination is needed.
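The recovery loop can be modeled in a few lines. This is an illustrative sketch, not the Kafka Streams API: a list stands in for the compacted changelog topic, a dict for the local RocksDB store, and `ChangelogBackedStore` and `count_words` are hypothetical names.

```python
class ChangelogBackedStore:
    """Toy key-value store whose every write is also appended to a changelog."""

    def __init__(self, changelog: list) -> None:
        self.changelog = changelog  # stands in for the compacted changelog topic
        self.local = {}             # stands in for the local RocksDB store

    def put(self, key, value) -> None:
        self.local[key] = value
        self.changelog.append((key, value))  # durable record of the change

    def get(self, key, default=None):
        return self.local.get(key, default)

    @classmethod
    def restore(cls, changelog: list) -> "ChangelogBackedStore":
        # Recovery: replay the changelog in order; later writes for a key win.
        store = cls(changelog)
        for key, value in changelog:
            store.local[key] = value
        return store


def count_words(store: ChangelogBackedStore, words) -> None:
    # A stateful operation: a running count per word.
    for w in words:
        store.put(w, store.get(w, 0) + 1)


changelog: list = []
store = ChangelogBackedStore(changelog)
count_words(store, ["kafka", "log", "kafka"])

del store  # simulate a crash: the local state is lost
recovered = ChangelogBackedStore.restore(changelog)
count_words(recovered, ["kafka"])  # processing continues from the restored counts
```

A real changelog is compacted, so replay only has to read the latest value per key rather than every historical write.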


Event Sourcing, CQRS, and Materialized Views

Chapter 7 is the deep dive. Stopford covers the full implementation spectrum with Kafka:

```mermaid
flowchart TD
    CMD["Command<br/>(createOrder)"]
    VLD["Validate & Authorize"]
    EVT["Event<br/>(orderCreated)"]
    LOG["Kafka Topic<br/>(immutable log)"]
    STORE["Event Store<br/>(Kafka as store)"]
    VS["View Service<br/>(Kafka Streams)"]
    DB["Materialized View<br/>(read-optimized)"]
    API["API Gateway"]

    CMD --> VLD
    VLD --> EVT
    EVT --> LOG
    LOG --> STORE
    STORE --> VS
    VS --> DB
    DB --> API

    subgraph Write_Path["Write Path"]
        CMD
        VLD
        EVT
    end
    subgraph Read_Path["Read Path"]
        API
        DB
        VS
    end
    subgraph Log_Backbone["Immutable Backbone"]
        LOG
        STORE
    end
```

Key patterns in this chapter:

Command Sourcing: Save every command (what was requested) alongside events (what happened). Enables full audit trail of intent and outcome.

CTR: Commands trigger validation; valid commands become events. Each event has a single, deterministic handler.
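Command sourcing and validation can be sketched together. In this hypothetical example (`PlaceOrder`, `OrderPlaced`, `handle`, and the two lists standing in for a commands topic and an events topic are all illustrative), every request is recorded as intent, but only commands that pass validation produce an event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaceOrder:
    """Command: what was requested."""
    order_id: int
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    """Event: what actually happened."""
    order_id: int
    quantity: int


command_log: list = []  # stands in for a commands topic (intent)
event_log: list = []    # stands in for an events topic (outcome)


def handle(cmd: PlaceOrder) -> bool:
    command_log.append(cmd)  # intent is recorded whether or not it is valid
    if cmd.quantity <= 0:    # validation step
        return False         # invalid command: no event is emitted
    event_log.append(OrderPlaced(cmd.order_id, cmd.quantity))
    return True


results = [handle(c) for c in [PlaceOrder(1, 3), PlaceOrder(2, 0)]]
```

Whether a rejected command should also emit an explicit rejection event is a design choice this sketch leaves out.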

Materialized Views: Pre-computed query results built from events. Views are rebuilt by replaying events — from the beginning or from a saved offset.

Polyglot Views: Different read models for different query patterns. One service writes events; N services build N views in N storage technologies. The log enables this without tight coupling.

Change Data Capture (CDC): Unlock legacy databases by streaming their change log (via Debezium + Kafka Connect) into the event ecosystem. Old systems become event producers without code changes.
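Materialized and polyglot views share one mechanism: each view folds the same event log into its own shape and remembers how far it has read. The sketch below is hypothetical (`View`, `by_customer`, and `recent_orders` are illustrative names, and a list stands in for the topic). It shows two differently shaped read models built from one log, incremental catch-up from a saved offset, and a full rebuild from offset zero.

```python
class View:
    """A materialized view: folds events into state and remembers its read offset."""

    def __init__(self, apply_fn, initial) -> None:
        self.apply_fn = apply_fn
        self.state = initial
        self.offset = 0  # next offset to read; in Kafka, the consumer's committed offset

    def catch_up(self, log: list) -> None:
        # Resume from the saved offset instead of re-reading the whole log.
        for event in log[self.offset:]:
            self.state = self.apply_fn(self.state, event)
        self.offset = len(log)


def by_customer(state: dict, e: dict) -> dict:
    # Read model 1: total spend per customer (key-value shaped).
    new_state = dict(state)
    new_state[e["customer"]] = new_state.get(e["customer"], 0) + e["total"]
    return new_state


def recent_orders(state: list, e: dict) -> list:
    # Read model 2: the three most recent order ids (list shaped).
    return (state + [e["order_id"]])[-3:]


log = [
    {"order_id": 1, "customer": "alice", "total": 10},
    {"order_id": 2, "customer": "bob", "total": 5},
]
spend = View(by_customer, {})
recent = View(recent_orders, [])
spend.catch_up(log)
recent.catch_up(log)

log.append({"order_id": 3, "customer": "alice", "total": 7})
spend.catch_up(log)  # applies only the new event

rebuilt = View(by_customer, {})
rebuilt.catch_up(log)  # full rebuild by replaying from offset 0
```

In a real system each view could live in a different store (a search index, a key-value cache, a relational table); the log is the only thing they share.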


Schema Evolution

Stopford treats event schemas as first-class API contracts, not an afterthought. The practical framework:

```mermaid
graph TD
    subgraph Schema_Evolution["Schema Evolution Contract"]
        BC["Backward Compatibility<br/>(new schema reads old data)"]
        FC["Forward Compatibility<br/>(old schema reads new data)"]
        FULL["Full Compatibility<br/>(both backward and forward)"]
        TRANS["Transform<br/>(migrate data, deprecate schema)"]
    end

    subgraph Tools["Schema Management Tools"]
        SR["Schema Registry<br/>(Confluent)"]
        VER["Compatibility Validation<br/>(at write time)"]
        VER2["Versioning<br/>(subject + version)"]
    end

    BC --> SR
    FC --> SR
    SR --> VER
    VER --> VER2
```

  • Backward compatible: consumers on the new schema can read data written with the old schema (Schema Registry's default mode; upgrade consumers first)
  • Forward compatible: consumers on the old schema can read data written with the new schema (upgrade producers first)
  • Full compatibility: both directions (the safe zone)
  • Schema transformations: reshape data at the boundary to avoid breaking changes

Structured schemas (Avro, Protobuf, JSON Schema with Registry) prevent the "unreadable message" problem — when a consumer cannot deserialize because the schema changed unexpectedly.
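The compatibility directions are easy to mix up, so a simplified checker helps. This sketch models a schema as a mapping from field name to whether that field has a default, and applies only the core Avro resolution rule: fields the reader expects but the writer lacks must have defaults, and extra writer fields are ignored. It ignores types, unions, and aliases, and it is not how Schema Registry is implemented.

```python
# A schema is modeled as {field_name: has_default}; real Avro rules are richer.


def can_read(reader: dict, writer: dict) -> bool:
    """Can a consumer using `reader` decode data written with `writer`?"""
    return all(has_default for name, has_default in reader.items() if name not in writer)


def backward(new: dict, old: dict) -> bool:
    return can_read(reader=new, writer=old)  # new consumers read old data


def forward(new: dict, old: dict) -> bool:
    return can_read(reader=old, writer=new)  # old consumers read new data


def full(new: dict, old: dict) -> bool:
    return backward(new, old) and forward(new, old)


v1 = {"order_id": False, "total": False}
v2_add_optional = {**v1, "currency": True}   # new field with a default
v2_add_required = {**v1, "currency": False}  # new field without a default
v2_drop_total = {"order_id": False}          # removes a field that had no default
```

Adding a field with a default is fully compatible; adding a required field breaks backward compatibility (new readers cannot fill it from old data); dropping a field that old readers require breaks forward compatibility.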


analysis

Strengths

  • Kafka as architecture, not infrastructure. Stopford elevates Kafka from middleware to architectural backbone. His "inside-out database" framing (Chapter 9), which builds on the "turning the database inside out" idea popularized by Martin Kleppmann, is clear and persuasive.
  • Grounded in production reality. Written from direct experience building Kafka at Confluent — not from conference talks or blog posts. The trade-off discussions carry genuine operational weight.
  • Events vs. commands taxonomy is a genuine contribution. Stopford's insistence on strict naming has shaped how teams structure their Kafka topics and event schemas across the industry.
  • Practical CQRS with Kafka. Rather than theoretical explanation, the book shows five concrete implementation patterns in Chapter 7 — covering trade-offs the reader will actually face.
  • Enterprise-scale thinking. Unlike microservice books written for startups, Stopford addresses the hard problems: data sharing across organizational boundaries, schema governance, the REST-to-ETL migration trap (Chapter 8).
  • Honest about when not to use events. Chapter 1 and recurring warnings throughout the book make it a corrective to the hype cycle around event-driven architecture.

Weaknesses

  • Dense for its length. At 171 pages the book is short, but its concepts-per-page ratio is very high. Some passages assume fluency in distributed systems vocabulary.
  • Kafka-centric. The patterns are real, but the practical implementation paths are Kafka-specific. Engineers building with Pulsar, Kinesis, or Service Bus must translate.
  • Schema Registry is a black box. The book explains concepts but delegates the implementation mechanics of schema management to the Confluent Schema Registry, without going deep on its operation.
  • Limited testing guidance. Unit and integration testing for event-driven services is barely addressed, even though it is where many teams struggle most in practice.
  • Transaction chapter is skeptical. Stopford accurately describes Kafka's transaction API (Chapter 12), but his conclusion — "do we really need transactions?" — may frustrate engineers in domains (financial services, healthcare) where strong consistency is required.

Criticism

The "Kafka Sales Brochure" Critique

Some readers see the book as marketing dressed as technical content. Stopford is a Confluent employee, and the book's canonical solutions lean heavily on Confluent's ecosystem (Schema Registry; KSQL, since rebranded as ksqlDB; Confluent Cloud). The patterns themselves are sound, but the commercial context is worth keeping in mind. The core concepts apply to any replayable log system.

The "Missing the Complexity" Critique

Several reviewers note that the book understates the operational difficulty of running event-sourced and CQRS systems at scale. Schema migration across 50+ consuming services, managing consumer lag during replays, and multi-topic atomic commits are either glossed over or presented as simpler than they are in practice. Teams that underestimate these challenges have learned this from production incidents, not from the book.

The "Dates the Book" Critique

Originally published as a shorter report in 2018 and then expanded, the book reflects the Kafka ecosystem circa 2017–2018. ksqlDB has evolved, Kafka Connect has changed, and newer patterns like Iceberg-backed log storage and tiered storage have emerged. The conceptual framework is timeless; the specific implementations require current documentation.


Context: Why This Book Exists

The early-to-mid-2010s were the peak of the microservices hype cycle, driven by Netflix, Amazon, and Martin Fowler's writings. Teams decomposed monoliths into thousands of services without a clear strategy for data sharing. The result was what the book's Chapter 8 calls the "REST-to-ETL problem": systems glued together with synchronous APIs that then required nightly batch ETL to share data.

Stopford's answer was the log: a single, ordered, replayable stream of events that describes "what happened" across the entire organization. Services subscribe to the log and build their own views. This is:

  • The pattern LinkedIn used to scale Kafka to trillions of messages
  • The pattern that underpins Kafka Streams and KSQL
  • The central idea that made Kafka more than "a message bus"

Sam Newman's foreword situates the book precisely: microservices give us team autonomy, but autonomy without a shared data strategy creates operationally expensive islands. Stopford's event streams are the bridge.


Appreciation: The "Inside-Out Database" Chapter

Chapter 9, "Event Streams as a Shared Source of Truth," is the intellectual centerpiece. Stopford inverts the traditional database mental model: instead of services exposing APIs and periodically dumping data to a data warehouse, services publish their internals as an event stream. Other services subscribe and build local views.

```mermaid
graph LR
    subgraph Outside_In["Traditional: Outside-In"]
        SVC["Service with private db"]
        ETL["Nightly ETL"]
        DW["Data Warehouse"]
        REP["Reports (3 days stale)"]
        SVC --> ETL --> DW --> REP
    end
    subgraph Inside_Out["Event-Driven: Inside-Out"]
        LOG["Shared Event Log<br/>(real-time, replayable)"]
        SVC2["Service writes to log"]
        V1["View Service A"]
        V2["View Service B"]
        V3["Analytics Engine"]
        SVC2 --> LOG
        LOG --> V1
        LOG --> V2
        LOG --> V3
    end
```

The traditional model has low-latency writes but high-latency reads (days until data reaches analysts). The inside-out model has near-zero latency for new data consumers, at the cost of requiring consumers to manage state and handle schema evolution.

This pattern is what Kafka advocates call the "unified log" — a single immutable log that replaces thousands of API endpoints and nightly batch pipelines.


Appreciation: The Competing Consumer Pattern

Stopford explains why Kafka's consumer group model makes scaling event processing genuinely simple compared to traditional message queues:

```mermaid
graph TD
    subgraph Log["Kafka Topic"]
        P1["Partition 1<br/>(offset 0, 1, 2...)"]
        P2["Partition 2<br/>(offset 0, 1, 2...)"]
        P3["Partition 3<br/>(offset 0, 1, 2...)"]
    end
    subgraph CG1["Consumer Group A<br/>(ordering required)"]
        C1["Consumer 1"]
        C2["Consumer 2"]
    end
    subgraph CG2["Consumer Group B<br/>(analytics, parallel)"]
        C3["Consumer 1"]
        C4["Consumer 2"]
        C5["Consumer 3"]
    end

    P1 --> C1
    P2 --> C2
    P3 --> C1

    P1 --> C3
    P2 --> C4
    P3 --> C5
```

  • Within a group, consumers compete for partitions: each partition is assigned to exactly one member, which preserves per-partition ordering
  • Adding consumers to a group increases parallelism up to the number of partitions; any extra consumers sit idle
  • Each consumer group tracks its own offsets, so multiple groups can process the same events for different purposes without interfering

This is the practical enabler of event-driven architecture at scale.
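Partition assignment within a group can be sketched as a simple round-robin. This is an illustration under assumptions, not Kafka's assignor code: Kafka ships several assignors (range, round-robin, sticky, cooperative-sticky) that differ in detail but share the rule that each partition has exactly one owner within a group. The consumer names are hypothetical.

```python
def assign(partitions: list, consumers: list) -> dict:
    """Round-robin assignment: every partition gets exactly one owner in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2]

# Group A: fewer consumers than partitions, so one consumer owns two partitions.
group_a = assign(partitions, ["a1", "a2"])

# Group B: more consumers than partitions, so one consumer sits idle.
group_b = assign(partitions, ["b1", "b2", "b3", "b4"])

# Each group is assigned independently and (in Kafka) commits its own offsets,
# so both groups receive every partition without interfering with each other.
```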


narration

Opening (0:00–0:45)

Welcome to BookAtlas. Today, we're diving into one of the most influential books on modern system architecture: Designing Event-Driven Systems: Concepts and Patterns for Streaming Services, by Ben Stopford.

Originally published by O'Reilly in May 2018, this 171-page book comes from an engineer who has lived the problems it describes. Ben Stopford is a principal engineer at Confluent, the company behind Apache Kafka. He was building Kafka infrastructure at scale when he wrote this book — so this isn't theory from someone watching from the sidelines. It's practice, written by one of the practitioners.

The book carries a foreword by Sam Newman, author of Building Microservices, who calls it the answer to a critical gap: microservices give us team autonomy, but autonomy without a shared data strategy creates expensive islands. Stopford shows how event streams are the bridge.


The Problem Stopford Is Solving (0:45–2:30)

To understand why this book matters, you have to understand the problem it arose from.

The early 2010s saw a massive push toward microservices. Netflix, Amazon, every startup — decompose the monolith, give teams autonomy, ship faster. And that worked, kind of.

But here's what happened next: those thousands of independently deployed services needed to share data. Some teams built synchronous REST APIs. Others used message queues. Some hauled data through nightly batch ETL pipelines into a data warehouse.

None of those approaches scaled. Synchronous APIs create tight coupling: if the payment service is down, the checkout service is down, and the whole site goes down with it. Nightly ETL means data is days old by the time analysts see it. And message queues don't remember: once a message is consumed and acknowledged, it's deleted, so nobody can go back and replay it.

Stopford's answer is the log. A shared, ordered, replayable stream of events that every service can read from. The log is the backbone of the system. And that insight — simple on the surface, profound in its implications — is the heart of this book.


What Kafka Really Is (2:30–5:00)

Stopford spends Chapter 3 unravelling what Kafka is. This is one of the most valuable sections of the book, because most engineers arrive at Kafka with the wrong mental model.

He says: Kafka is not a message queue like RabbitMQ. It's not an enterprise service bus. And it's not quite a database.

What it actually is: a distributed, persistent, replayable log. Think of an accounting ledger that never closes. Every event gets appended in order, and any service can read from any point in that history.

The genius of this design is that it separates the concerns of producers and consumers. Producers just write, quickly, without caring who's reading. Consumers read at their own pace, and can even re-read from any previous point. And because the log is persistent, a consumer that crashes and restarts doesn't lose anything. It just continues from its last committed position.

This is fundamentally different from a message queue, where once a message is acknowledged, it's gone. Kafka's events stick around until you decide to age them out — which means you can replay history whenever you need to.


Events vs. Commands: The Taxonomy That Changes Everything (5:00–8:00)

Chapter 5 might be the most important chapter in the book. Stopford insists that engineers get sloppy with their message naming — calling everything a "message" or an "event" — and that sloppiness causes real bugs and operational pain.

His taxonomy:

Commands are requests for action. They're directed, synchronous, imperative. "Process this order."

Events are records of what already happened. They're broadcast, asynchronous, declarative. "Order forty-two has been processed."

Queries are requests for information without side effects. They're typically synchronous API calls.

The reason this matters: commands and events have fundamentally different lifecycles. A command expects a response — if the consumer doesn't reply, the producer is left hanging. An event is fire-and-forget — the producer doesn't know or care who's listening.

When engineers treat events as commands — expecting every consumer to handle them — they end up with systems that silently fail. The producer thinks everything is fine, but half the consumers never received the message, or received a version they couldn't parse.

Stopford's prescription: name things correctly. If it's a command, call it a command and design for retries and responses. If it's an event, call it an event and design for idempotent consumption and eventual consistency.
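Idempotent consumption usually means remembering which events have already been applied, so a redelivered event is skipped. A minimal sketch, assuming each event carries a unique, stable `event_id` (a hypothetical field) and keeping the seen-set in memory; a production version would store it durably and update it atomically with the side effect.

```python
class IdempotentConsumer:
    """Applies each event at most once by remembering processed event IDs."""

    def __init__(self) -> None:
        self.seen: set = set()  # in production: durable, updated with the side effect
        self.balance = 0

    def on_event(self, event: dict) -> bool:
        if event["event_id"] in self.seen:
            return False  # duplicate redelivery: skip it
        self.balance += event["amount"]
        self.seen.add(event["event_id"])
        return True


consumer = IdempotentConsumer()
deliveries = [
    {"event_id": "e1", "amount": 10},
    {"event_id": "e2", "amount": 5},
    {"event_id": "e1", "amount": 10},  # redelivered after a retry
]
applied = [consumer.on_event(e) for e in deliveries]
```

With at-least-once delivery, duplicates are expected; this check turns them into no-ops instead of double-counting.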


Loose Coupling — and Its Limits (8:00–10:30)

Here's where Stopford challenges a common article of faith in the microservices community.

Engineers love loose coupling. It's supposed to mean you can change one service without breaking another. And message brokers are supposed to give you that.

Stopford says: message brokers change the type of coupling, they don't eliminate it. If consumers depend on a specific event schema, changing that schema still breaks consumers — it just happens at message parse time rather than at build time.

More importantly, he introduces the idea of "essential data coupling." Some coupling is inevitable because services share domain concepts. An "order" in the orders service is the same thing as an "order" in the payments service. You cannot eliminate that coupling — but you can manage it deliberately.

This is the practical impact of the three collaboration patterns: notification, state transfer, and event collaboration. Each makes different trade-offs about what data flows between services and when.

The key insight: choose your pattern deliberately based on the coupling requirements, not based on which pattern is currently fashionable.


Stateful Stream Processing — Kafka Streams (10:30–14:00)

One of the more advanced sections of the book covers stateful stream processing. This is where events aren't just forwarded — they're used to build and maintain local state.

Stopford makes a distinction that many engineers miss. There are three ways to process a stream:

Stateless: Read each event, transform it, write the result. No memory of previous events. Simple, but limited — you can only answer things like "count events per minute" if you keep that count outside the processor.

Event-driven: React to events by making calls back to other services. This works but reintroduces synchronous coupling.

Stateful streaming: Maintain local state (a table, a window, a join) derived from the stream itself. Kafka Streams is the primary example. It offers optional exactly-once processing semantics, local state stores backed by changelog topics, and automatic recovery when a processor restarts.

The changelog topic is the mechanism that makes this work. Every state change is written to a compacted Kafka topic. If the service crashes, the state store is rebuilt by replaying the changelog. No checkpoint coordination required.

Stopford treats this as a first-class architectural choice, not a performance optimization.
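The contrast between the stateless and stateful models is visible in code. In this hypothetical sketch, the filter looks at each event in isolation, while the windowed count must keep per-key state across events. It uses tumbling one-minute windows and ignores late or out-of-order events, grace periods, and persistence (which Kafka Streams would handle with a changelog-backed store).

```python
from collections import Counter


def stateless_filter(events: list) -> list:
    # Stateless: each event is handled on its own; nothing is remembered.
    return [e for e in events if e["type"] == "click"]


def tumbling_window_counts(events: list, window_ms: int) -> Counter:
    # Stateful: a running count per (user, window start) kept in local state.
    state = Counter()
    for e in events:
        window_start = e["ts"] - e["ts"] % window_ms
        state[(e["user"], window_start)] += 1
    return state


events = [
    {"user": "u1", "type": "click", "ts": 1_000},
    {"user": "u1", "type": "view", "ts": 30_000},
    {"user": "u1", "type": "click", "ts": 61_000},
    {"user": "u2", "type": "click", "ts": 59_999},
]
clicks = stateless_filter(events)
counts = tumbling_window_counts(clicks, window_ms=60_000)
```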


Event Sourcing and CQRS (14:00–18:00)

Chapter 7 gets into the deep end: Event Sourcing and CQRS.

Event Sourcing means current state is not your source of truth. You store every event that led to that state, and rebuild the current state by replaying the event log. It turns your database into a function that reduces a list of events into a single result.

CQRS — Command Query Responsibility Segregation — separates the write path from the read path. Commands go through validation and become events. Queries read from a pre-computed, read-optimized view built from those events.
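A minimal sketch of the split, with hypothetical class names and a Python list standing in for the event log. The command side validates and appends events; the query side is a separate, read-optimized view that catches up from the log on its own schedule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer: str
    amount: int


class OrderCommands:
    """Write path: validate a command, then record the resulting event."""

    def __init__(self, log: List[OrderPlaced]) -> None:
        self.log = log

    def place_order(self, order_id: str, customer: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")  # rejected commands emit no event
        self.log.append(OrderPlaced(order_id, customer, amount))


class CustomerTotalsView:
    """Read path: a query-optimized view derived from the same events."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}
        self.position = 0  # how far into the log this view has read

    def catch_up(self, log: List[OrderPlaced]) -> None:
        for event in log[self.position:]:
            self.totals[event.customer] = self.totals.get(event.customer, 0) + event.amount
        self.position = len(log)

    def total_for(self, customer: str) -> int:
        return self.totals.get(customer, 0)


if __name__ == "__main__":
    log: List[OrderPlaced] = []
    commands, view = OrderCommands(log), CustomerTotalsView()
    commands.place_order("o1", "alice", 30)
    commands.place_order("o2", "alice", 20)
    print(view.total_for("alice"))  # 0: the view has not caught up yet
    view.catch_up(log)
    print(view.total_for("alice"))  # 50
```

The gap between the two prints is the eventual consistency that CQRS accepts in exchange for read and write paths that can be shaped and scaled independently.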

Stopford walks through several concrete ways to implement these patterns with Kafka, and comparing them is the real value of this chapter:

In-process views with Kafka Streams tables: simplest, fastest reads, but state is local and must be rebuilt on failure.

Writing through a database into a Kafka topic: legacy systems can participate via Kafka Connect and CDC — no code changes required.

Writing through a state store to Kafka: everything stays inside Kafka Streams processors, which write to a local state store whose changes are also recorded in a changelog topic, so state survives crashes.

Unlocking legacy systems with CDC: Debezium captures row-level changes from existing databases (MySQL, PostgreSQL, MongoDB) and publishes them to Kafka topics. Suddenly that legacy system is part of your event ecosystem without a single line of application code.
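Debezium works by reading the database's own transaction log, which this sketch does not attempt. To show only the shape of what comes out, the hypothetical `change_events` function below diffs two table snapshots keyed by primary key and emits one record per changed row. The `before`, `after`, and `op` fields are loosely modeled on Debezium's change-event envelope (`c` for create, `u` for update, `d` for delete); everything else is simplified.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> row


def change_events(old: Table, new: Table) -> List[dict]:
    """Emit one change event per inserted, updated, or deleted row."""
    events = []
    for pk in sorted(old.keys() | new.keys()):
        before: Optional[Row] = old.get(pk)
        after: Optional[Row] = new.get(pk)
        if before == after:
            continue  # unchanged row, nothing to publish
        if before is None:
            op = "c"
        elif after is None:
            op = "d"
        else:
            op = "u"
        events.append({"key": pk, "op": op, "before": before, "after": after})
    return events


if __name__ == "__main__":
    old = {1: {"name": "Ada", "tier": "gold"}, 2: {"name": "Bob", "tier": "silver"}}
    new = {1: {"name": "Ada", "tier": "platinum"}, 3: {"name": "Cy", "tier": "bronze"}}
    for event in change_events(old, new):
        print(event["key"], event["op"])  # 1 u, then 2 d, then 3 c
```

Carrying both `before` and `after` lets downstream consumers see what changed without querying the legacy database, which is what turns it into a participant in the event ecosystem.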

The key reminder throughout: event sourcing is not for every system. It's complex, it requires new operational patterns, and it only pays off when you genuinely need to rebuild state, audit change history, or support time-travel-style queries.


Schema Evolution: The API Contract of Events (18:00–21:00)

Chapter 13 covers schema evolution — a topic many teams only discover the hard way, after a breaking schema change has crashed half their consumers.

Stopford treats event schemas as API contracts. The same discipline that applies to REST endpoints applies to event schemas: backward compatibility, versioning, deprecation, and migration.

The primary tools he discusses:

Confluent Schema Registry: stores schemas centrally, lets producer serializers register and look up the schema for each message at write time, and checks compatibility before a new schema version is accepted.

The compatibility modes: backward (consumers on the new schema can read data written with the old one), forward (consumers on the old schema can read data written with the new one), and full (both hold at once).

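Schema evolution scenarios: adding optional fields with defaults (safe in both directions), removing fields (fine for consumers on the new schema, but breaks old consumers that still expect the field unless it had a default), and renaming fields (risky: formats that match fields by name, such as JSON or Avro without aliases, see a rename as a removal plus an addition, so it needs careful testing).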
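The three scenarios can be checked mechanically. The sketch below uses a deliberately simplified, hypothetical schema model (field name mapped to "has a default"), ignoring types, aliases, and unions, and applies a reader/writer rule in the spirit of Avro schema resolution. The result labels mirror Schema Registry's compatibility level names BACKWARD, FORWARD, FULL, and NONE, although the real registry checks a proposed schema against one configured level rather than classifying it like this.

```python
from typing import Dict

# Hypothetical, simplified schema model: field name -> True if it has a default.
# Real Avro schema resolution also checks types, aliases, unions, and more.
Schema = Dict[str, bool]


def can_read(reader: Schema, writer: Schema) -> bool:
    """A reader can decode data written with `writer` if each of its own fields
    is either present in the writer schema or has a default to fall back on.
    Extra fields in the writer schema are ignored."""
    return all(name in writer or has_default for name, has_default in reader.items())


def compatibility(old: Schema, new: Schema) -> str:
    backward = can_read(reader=new, writer=old)  # new consumers, old data
    forward = can_read(reader=old, writer=new)   # old consumers, new data
    if backward and forward:
        return "FULL"
    if backward:
        return "BACKWARD"
    if forward:
        return "FORWARD"
    return "NONE"


if __name__ == "__main__":
    v1 = {"id": False, "amount": False}
    print(compatibility(v1, {**v1, "currency": True}))       # FULL: optional field added
    print(compatibility(v1, {"id": False}))                  # BACKWARD: field removed
    print(compatibility(v1, {"id": False, "total": False}))  # NONE: 'amount' renamed
```

The rename case fails in both directions because, to a name-matching format, it really is a removal plus a required addition.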

The reminder that stops teams in their tracks: schema changes are deployment changes. A producer shipping an incompatible schema breaks every consumer that hasn't been updated, so the order in which producers and consumers roll out matters. Schema compatibility checks are the safety net that stops an incompatible change from being registered before it can break consumers mid-rollout.


Consistency Without Transactions (21:00–24:00)

Chapter 11 is one of the most counter-intuitive sections. Stopford argues that in event-driven systems, you can often achieve better consistency without distributed transactions — not by weakening consistency, but by changing the system structure.

The core mechanism is the single writer principle: each type of state transition, such as changes to an order, is written by exactly one service to exactly one topic. Because no two services update the same entity, there are no competing writers to coordinate, which removes most of the need for distributed transactions.

graph TD
    subgraph Single_Writer["Single Writer for Each State Transition"]
        SVC1["Orders Service<br/>(writes to: orders topic)"]
        SVC2["Payments Service<br/>(writes to: payments topic)"]
        SVC3["Shipments Service<br/>(writes to: shipments topic)"]
    end
    subgraph Consumers["Consumers Rebuild Views"]
        V1["Read Model 1"]
        V2["Read Model 2"]
        V3["Analytics View"]
    end

    SVC1 --> T1["orders topic"]
    SVC2 --> T2["payments topic"]
    SVC3 --> T3["shipments topic"]
    T1 --> V1
    T1 --> V2
    T2 --> V1
    T3 --> V3

    style SVC1 fill:#d4edda
    style SVC2 fill:#d4edda
    style SVC3 fill:#d4edda

If two services need to coordinate — say, creating an order and reserving inventory — they use a command topic: one service issues a command, another service processes it asynchronously. The result is effectively a saga, but without explicit saga orchestrator code.
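A minimal sketch of that flow, assuming hypothetical topic and service names and using Python lists in place of Kafka topics. Each topic has exactly one writer: the orders service owns `orders` and `reserve-commands`, and the inventory service owns `inventory-events`. Compensation, retries, and timeouts are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    """An append-only list standing in for a Kafka topic."""
    name: str
    records: List[dict] = field(default_factory=list)


class InventoryService:
    """Sole writer of inventory-events; consumes reservation commands."""

    def __init__(self, stock: Dict[str, int], commands: Topic, events: Topic) -> None:
        self.stock = dict(stock)
        self.commands, self.events = commands, events
        self.position = 0  # next command offset to process

    def poll(self) -> None:
        for cmd in self.commands.records[self.position:]:
            ok = self.stock.get(cmd["sku"], 0) >= cmd["qty"]
            if ok:
                self.stock[cmd["sku"]] -= cmd["qty"]
            self.events.records.append(
                {"order_id": cmd["order_id"], "type": "Reserved" if ok else "Rejected"})
        self.position = len(self.commands.records)


class OrdersService:
    """Sole writer of orders and reserve-commands; never touches stock directly."""

    def __init__(self, orders: Topic, commands: Topic, inventory_events: Topic) -> None:
        self.orders, self.commands, self.inventory_events = orders, commands, inventory_events
        self.position = 0  # next inventory event offset to process

    def place(self, order_id: str, sku: str, qty: int) -> None:
        self.orders.records.append({"order_id": order_id, "status": "PENDING"})
        self.commands.records.append({"order_id": order_id, "sku": sku, "qty": qty})

    def poll(self) -> None:
        for event in self.inventory_events.records[self.position:]:
            status = "CONFIRMED" if event["type"] == "Reserved" else "REJECTED"
            self.orders.records.append({"order_id": event["order_id"], "status": status})
        self.position = len(self.inventory_events.records)


if __name__ == "__main__":
    orders, commands, inv_events = Topic("orders"), Topic("reserve-commands"), Topic("inventory-events")
    inventory = InventoryService({"sku-1": 1}, commands, inv_events)
    order_svc = OrdersService(orders, commands, inv_events)
    order_svc.place("o1", "sku-1", 1)
    order_svc.place("o2", "sku-1", 1)
    inventory.poll()
    order_svc.poll()
    print([r["status"] for r in orders.records])  # ['PENDING', 'PENDING', 'CONFIRMED', 'REJECTED']
```

Neither service ever writes the other's state, so there is nothing to lock across them; the saga emerges from each service reacting to the other's topic.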

Stopford addresses the elephant in the room: does this actually work? His answer: it works for a remarkably wide range of use cases, and the systems that genuinely need strict distributed transactions tend to be financial systems with regulatory requirements — a well-bounded category.

Chapter 12 covers Kafka transactions, but Stopford is careful to note their limitations: transactions are expensive, they don't scale as well as idempotent consumers, and they don't solve the broader problem of coordinating multiple services. Know when to use them, but prefer simpler patterns when possible.
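The idempotent-consumer alternative can be sketched in a few lines. This assumes every event carries a unique `event_id` (a hypothetical field name) and keeps the set of applied IDs in memory. A real service would persist that set alongside the data it updates, in one local transaction, and bound its size.

```python
from typing import Dict, Set


class IdempotentBalanceConsumer:
    """Applies each event at most once, even if the broker redelivers it."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.applied: Set[str] = set()  # event IDs already processed

    def handle(self, event: dict) -> bool:
        if event["event_id"] in self.applied:
            return False  # duplicate delivery: skip the side effect
        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + event["amount"]
        self.applied.add(event["event_id"])
        return True


if __name__ == "__main__":
    e1 = {"event_id": "e1", "account": "acct-1", "amount": 100}
    e2 = {"event_id": "e2", "account": "acct-1", "amount": 50}
    consumer = IdempotentBalanceConsumer()
    print([consumer.handle(e) for e in (e1, e2, e1)])  # [True, True, False]
    print(consumer.balances)  # {'acct-1': 150}
```

With duplicates made harmless, at-least-once delivery is enough, and no cross-service transaction is needed to keep the balance correct.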


When NOT to Use Events (24:00–25:30)

One of the most valuable aspects of this book is what it tells you to avoid.

Event-driven architecture is not a universal upgrade. Stopford is explicit: if your system is simple, if your teams are small, if you have a bounded context with a single database, events are complexity you don't need.

The overhead of event-driven architecture:

  • Schema management and compatibility enforcement
  • Consumer group monitoring and lag alerting
  • Topic lifecycle governance (creation, deletion, retention policy)
  • Event replay and data backfill procedures
  • Consumer idempotency design
  • Tracing across asynchronous boundaries
  • Operational knowledge (Kafka, ZooKeeper, Schema Registry, ksqlDB)
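Lag alerting, from the list above, is simple arithmetic once you have the offsets: per partition, lag is the log end offset minus the consumer group's committed offset. In practice those numbers come from the broker (the `kafka-consumer-groups` tool reports them, for example); here they are plain dicts, and the function names are hypothetical.

```python
from typing import Dict, List


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset minus the group's committed offset.

    Partitions with no committed offset are counted from 0 here; in a real
    consumer the starting point depends on its auto.offset.reset setting.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def partitions_to_alert(lag: Dict[int, int], threshold: int) -> List[int]:
    return sorted(p for p, n in lag.items() if n > threshold)


if __name__ == "__main__":
    end = {0: 1500, 1: 1200, 2: 900}
    committed = {0: 1490, 1: 700}
    lag = consumer_lag(end, committed)
    print(lag)                            # {0: 10, 1: 500, 2: 900}
    print(partitions_to_alert(lag, 100))  # [1, 2]
```

Alerting per partition rather than on the total matters, because one stuck partition can hide behind healthy ones.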

If your system is a CRUD API with one database and a small team, use a relational database. It will be simpler, better supported, and easier to operate. Stopford would agree: the simplest system that solves your problem is the right system.


The Organization-Scale View (25:30–27:30)

Chapters 8, 9, and 10 are what make this book stand apart from other EDA and Kafka books.

Most books on Kafka talk about integrating services within a team. Stopford talks about integrating organizations.

He introduces the "God Service" problem: as organizations grow, services accumulate APIs that serve every other service. A payment service starts as a clean API but gradually absorbs fraud checks, settlement, and reporting. It becomes the God Service: everything depends on it.

Event streams solve this by making data a first-class citizen. Rather than the God Service pushing data to fifty downstream systems via API, it publishes to a log. Consumers subscribe independently.
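The decoupling comes from each consumer tracking its own position in the log. In this sketch (a hypothetical `Log` class, loosely analogous to Kafka consumer groups committing their own offsets), the producer publishes once and never learns who reads, and one consumer can replay without affecting any other.

```python
from typing import Dict, List


class Log:
    """Append-only log where each consumer tracks its own read position."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.offsets: Dict[str, int] = {}  # consumer name -> next offset to read

    def publish(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, consumer: str) -> List[str]:
        start = self.offsets.get(consumer, 0)
        batch = self.records[start:]
        self.offsets[consumer] = len(self.records)
        return batch

    def seek(self, consumer: str, offset: int) -> None:
        """Rewind (or skip ahead) one consumer without affecting the others."""
        self.offsets[consumer] = offset


if __name__ == "__main__":
    log = Log()
    log.publish("payment-1")
    log.publish("payment-2")
    print(log.poll("fraud"))      # ['payment-1', 'payment-2']
    print(log.poll("fraud"))      # []
    print(log.poll("reporting"))  # ['payment-1', 'payment-2']
    log.seek("fraud", 0)          # replay for fraud only
    print(log.poll("fraud"))      # ['payment-1', 'payment-2']
```

Adding a fiftieth consumer changes nothing for the producer, which is the opposite of the God Service's ever-growing API surface.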

The "Lean Data" chapter is where the rubber meets the road. How do you handle data that consumers need but producers emit inefficiently? How do you avoid the "data divergence problem" where two services have different views of the same entity?

Stopford's approach: build view services that consume the log, derive what they need, cache the result, and rebuild on failure. The log is the source of truth. Views are rebuilt from the log, not duplicated by ETL.
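A sketch of a lean view, assuming (hypothetically) that each `CustomerUpdated` event carries the whole customer record. Each view keeps only the fields its service needs; when the service crashes or its needs change, it simply replays the log with a new field list instead of running a migration.

```python
from typing import Dict, Iterable, List


def build_view(events: Iterable[dict], fields: List[str]) -> Dict[str, dict]:
    """Derive a lean view holding only `fields` for each customer.

    Assumes each CustomerUpdated event carries the whole customer record, so
    the latest event per customer is enough. Rebuilding after a crash, or after
    `fields` changes, is just another replay from the start of the log.
    """
    view: Dict[str, dict] = {}
    for event in events:
        if event["type"] != "CustomerUpdated":
            continue  # other event types are irrelevant to this view
        view[event["customer_id"]] = {f: event[f] for f in fields if f in event}
    return view


if __name__ == "__main__":
    log = [
        {"type": "CustomerUpdated", "customer_id": "c1", "name": "Ada",
         "address": "1 Main St", "email": "ada@example.com"},
        {"type": "OrderPlaced", "customer_id": "c1", "order_id": "o1"},
        {"type": "CustomerUpdated", "customer_id": "c1", "name": "Ada",
         "address": "9 Elm St", "email": "ada@example.com"},
    ]
    print(build_view(log, ["address"]))        # {'c1': {'address': '9 Elm St'}}
    print(build_view(log, ["name", "email"]))  # {'c1': {'name': 'Ada', 'email': 'ada@example.com'}}
```

Because both views are derived from the same log, they cannot drift apart the way independently copied datasets do.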


Closing (27:30–28:30)

Designing Event-Driven Systems is a practitioner's guide. It's written by someone who was building the systems he describes — and who's seen the mistakes teams make.

The book's most important contribution is its framing: event-driven architecture is not a technology choice. It's an architectural choice about how your organization shares data across service boundaries.

The log as backbone is the central idea, and it's powerful because it's simple: every important thing that happens gets recorded once, in order. Every service can read and build what it needs. No point-to-point APIs. No nightly ETL. No God services.

The book has limitations. It's Kafka-centric. It assumes you're already operating at the scale where events make sense. It doesn't deeply cover testing or operational patterns.

But if you're a backend engineer building distributed systems, a platform engineer evaluating Kafka, or a technical lead making architecture decisions — this is a book you should read.

I'm BookAtlas. See you in the next one.