Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services

Brendan Burns, Dave Ratner, Kelsey Hightower · 168pp

sufficient

reading path: overview → analysis → narration

overview

Overview

Designing Distributed Systems provides a structured catalogue of repeatable, generic patterns that make the development of reliable distributed systems significantly more approachable. Brendan Burns, director of engineering at Microsoft Azure and co-founder of Kubernetes, adapts established software design patterns to the domain of distributed systems, giving engineers a shared vocabulary and proven building blocks for cloud-native applications.

The book is anchored in container and Kubernetes technology but focuses on patterns that transcend any specific tooling. Each pattern is explained with motivation, trade-offs, hands-on deployment guidance, and a discussion of when it does — and does not — apply.

About the Authors

Brendan Burns is a distinguished engineer at Microsoft Azure and co-founder of the Kubernetes project. His work focuses on distributed systems, container orchestration, and making cloud-native development accessible to a broad audience of engineers.

Dave Ratner is a former Google engineer with experience in large-scale infrastructure and distributed systems design. His contributions bring practical perspective on reliability engineering at massive scale.

Kelsey Hightower is a widely recognized Kubernetes ambassador and Google Cloud Platform expert known for his ability to translate complex infrastructure concepts into practical, developer-friendly guidance.

Key Themes

| Theme | Description | |-------|-------------| | Patterns as Shared Language | Established patterns provide a common vocabulary that dramatically raises the quality of systems discussions and designs | | Containers as Reusable Components | Container technology enables truly reusable, modular components for distributed system building blocks | | Single-Node to Multi-Node | Patterns progress logically from single-container/node abstractions to complex multi-node distributed systems | | Reliability by Design | Each pattern addresses specific failure modes and provides concrete mechanisms for resilience | | Hands-On Kubernetes | Every pattern includes deployable examples using Kubernetes primitives, making the book immediately actionable | | Trade-Off Transparency | The book honestly discusses when each pattern fails or becomes counterproductive |

Reader Profile

Systems engineers and application developers who are building cloud-native applications and need proven patterns for reliability, scalability, and maintainability. Ideal for teams adopting Kubernetes or container orchestration and looking for architectural guidance that goes beyond API documentation.

content map

01 — Content Summary

Designing Distributed Systems (March 2018, 168 pages) is a pattern-first guide to building reliable distributed services. Burns structures the book in three progressive parts — Single-Node Patterns, Serving Patterns, and Batch Computational Patterns — using Kubernetes as the deployment substrate throughout.

Part I: Single-Node Patterns

Single-node patterns wrap a primary application container with co-located helper containers on the same host, using Kubernetes' pod abstraction.

The Sidecar Pattern (Chapter 2)

A sidecar is an auxiliary container deployed in the same pod as the primary application, extending or augmenting its behavior without modifying its code. The canonical example adds HTTPS termination to a legacy HTTP service by placing nginx alongside it.

Key ideas:

Dynamic configuration — sidecars can watch a shared volume or ConfigMap and update the primary application's configuration at runtime without restarting it.
Modular containers — each container does one thing well, enabling independent upgrades, scaling, and replacement.
API discipline — sidecars should expose well-defined interfaces (Unix sockets, localhost HTTP ports) so the primary container can treat them as libraries, not remote network services.
Parameterization — containers should accept configuration via environment variables and command-line arguments so the same image serves many deployments.

Burns also demonstrates building a minimal platform-as-a-service using sidecars: each "application" is simply a pod with a standardized set of sidecar containers for logging, monitoring, and service discovery.

Ambassadors (Chapter 3)

An ambassador container proxies or brokers network access for the primary container, intercepting outbound calls to apply a cross-cutting concern transparently.

Sharding ambassador — placed in front of a backend service, it routes requests to the appropriate shard based on a key (e.g., Redis cluster sharding with consistent hashing).
Service brokering ambassador — abstracts the actual network endpoint so the application can use a stable localhost address while the ambassador resolves the real service location from a registry.
Experimentation / request splitting ambassador — routes a percentage of traffic to an experimental variant and the remainder to the stable version (e.g., 10% experiment roll-out).

Adapters (Chapter 4)

An adapter container transforms the primary application's output into a format consumable by external tooling. Three practical applications:

Monitoring — a Prometheus adapter scrapes application metrics and presents them in the Prometheus exposition format, even if the application natively uses a different schema.
Logging — a Fluentd adapter normalizes heterogeneous log formats into a unified structured schema before forwarding to a central logging pipeline.
Health monitoring — a health-check adapter supplements basic liveness/readiness probes with richer health signals that the application itself does not expose (e.g., disk remaining, downstream dependency health).

Part II: Serving Patterns

Serving patterns address multi-node, horizontally scalable request-serving architectures. These are the patterns most relevant to platforms, APIs, and user-facing services.

Replicated Load-Balanced Services (Chapter 5)

The simplest scalable pattern: multiple identical replicas behind a load balancer that distributes requests round-robin or via least-connections.

Stateless services are trivial to replicate — any replica can serve any request.
Readiness probes prevent the load balancer from routing to replicas that are still initializing.
Session-tracked services use sticky sessions tied to a cookie, routing a user consistently to the same replica. This limits horizontal scale-out and introduces failure-domain complexity.
Caching layers deployed as a separate replicated tier reduce load on the origin database. Different eviction policies (LRU, TTL, write-through) carry different consistency trade-offs.
Rate limiting and DoS defense at the edge (nginx, Envoy) protect backend capacity.
SSL termination at the edge offloads encryption from application containers.

Sharded Services (Chapter 6)

Sharding partitions a dataset or request stream across multiple independent instances based on a key, enabling horizontal scale beyond what a single instance can handle.

Why shard? When a cache or database exceeds the memory or throughput of one machine, replicas alone won't help — each replica holds a full copy. Sharding distributes the data.
Consistent hashing minimizes data movement when the number of shards changes, making shards elastically scalable.
Replicated, sharded caches — each shard is itself replicated for availability; the ambassador or proxy routes to the correct shard-replica pair.
Hot sharding systems allow live resharding with minimal downtime by migrating keys between shards.

Scatter/Gather (Chapter 7)

Scatter/gather is used when a single request logically fan-out to multiple independent data sources and requires a merged result.

Root distribution — a root node fans out requests to all leaf nodes, merges their individual responses, and returns a unified result. Used for document search across sharded indices.
Leaf sharding — the root node also partitions the request space, sending each leaf a different slice of the keyspace; the root then merges partial results. More efficient than root distribution for large result sets.
Choosing the right number of leaves — too few leaves underutilizes parallelism; too many leaves saturates network overhead and the merge coordinator.

Functions and Event-Driven Processing (Chapter 8)

Serverless / function-as-a-service (FaaS) patterns decompose services into discrete functions triggered by events, eliminating the need to manage long-lived servers.

When FaaS makes sense — sporadic, bursty workloads where per-invocation pricing beats always-on capacity for cost, but sustained high-throughput workloads can actually become more expensive.
Decorator pattern — a function that wraps the primary handler to apply cross-cutting logic (authentication, request validation, defaulting) before and after invocation.
Hands-on examples — implement two-factor authentication as a decorator; implement a new-user signup event pipeline.
Event-based pipelines — chain functions via message queues (Pub/Sub, Kafka) so each stage in a workflow is independently scalable and replaceable.

Ownership Election (Chapter 9)

Leader election ensures exactly one replica acts as coordinator for a shared resource at any time, preventing split-brain scenarios.

Do you need it? — often a distributed lock (e.g., etcd, Consul) is simpler than full leader election. Burns argues for minimizing coordination: is there a way to avoid the singleton bottleneck entirely?
Implementation with etcd — uses atomic compare-and-swap (CAS) operations to acquire a lock. Leases ensure locks expire if the holder crashes.
Handling concurrent data manipulation — once elected, the leader must read-modify-write shared state atomically, typically using etcd transactions.

Part III: Batch Computational Patterns

Batch patterns handle large-scale, non-real-time data processing where throughput and fault-tolerance matter more than latency.

Work Queue Systems (Chapter 10)

A work queue decouples task submission from task execution. A "source container" produces tasks onto a durable queue; multiple "worker containers" consume and process them.

Dynamic scaling — workers can be added or removed based on queue depth, making the system elastic to workload.
Multi-worker pattern — multiple workers compete for tasks, providing natural load balancing and fault tolerance (if one worker crashes, others pick up its tasks).
Hands-on example — a video thumbnail generation pipeline: source container uploads video metadata to a queue; workers pull jobs and generate thumbnails.

Event-Driven Batch Processing (Chapter 11)

Five canonical event-processing topologies applied to batch data pipelines:

| Pattern | Role | |---|---| | Copier | Duplicates a stream to multiple downstream consumers | | Filter | Passes through only events matching a predicate | | Splitter | Routes each event to the correct destination stream by type | | Sharder | Partitions events by key into multiple streams | | Merger | Joins multiple streams into one ordered stream |

Hands-on: deploying Kafka and building a new-user sign-up pipeline that chains copier → filter → sharder.

Coordinated Batch Processing (Chapter 12)

For operations that require synchronization across many parallel workers:

Join (barrier synchronization) — all workers must complete before the next stage begins. Used to assemble a final result from parallel computation fragments.
Reduce — workers compute partial aggregates; a final reduction step combines them into a single value (sum, count, histogram).
Hands-on examples — an image tagging pipeline using join synchronization; count, sum, and histogram reduction patterns.

Conclusion (Chapter 13)

Burns closes by arguing that containers and orchestration platforms have made these patterns practical at scale for the first time. The community's next challenge is standardizing the patterns themselves, not just the tools, so that future systems architecture can build on a shared foundation.

analysis

02 — Analysis

This document analyses the core design patterns introduced in Designing Distributed Systems, their structural rationale, trade-offs, and applicability criteria.

1. Sidecar Pattern

graph LR
    A["Primary<br/>Container"] <-.-> B["Sidecar<br/>Container<br/>(HTTPS / logging / metrics)"]
    A -.->|localhost| B
    style A fill:#4A90D9,color:#fff
    style B fill:#7B68EE,color:#fff

Structural rationale. A Kubernetes pod can hold multiple containers sharing a network namespace and volumes. The sidecar exploits this to extend the primary container's behavior at the network boundary without changing its code. The primary application sees only a localhost endpoint — the sidecar's presence is entirely transparent.

Trade-offs:

✅ Zero coupling to application code — the same image works unchanged across environments.
✅ Sidecars are independently upgradeable, scalable, and replaceable.
❌ Pod resource footprint grows with each additional sidecar.
❌ Debugging inter-container communication requires Kubernetes-level tooling.
❌ More than 2–3 sidecars per pod creates startup and operational complexity.

Applicability: Adding HTTPS termination, centralized logging, or metrics collection to a legacy or third-party application. Avoid stacking sidecars beyond practical limits.

2. Ambassador Pattern

graph LR
    A["Primary<br/>Container"] -->|localhost| B["Ambassador<br/>Proxy"]
    B -->|sharding / broker / split| C["Backend<br/>Shard 1"]
    B -->|routing| D["Backend<br/>Shard 2"]
    B -->|traffic split| E["Backend<br/>Shard N"]
    style A fill:#4A90D9,color:#fff
    style B fill:#E94B3C,color:#fff

Structural rationale. The ambassador intercepts outbound calls from the primary container, applying policies like sharding, service brokering, or A/B traffic splitting transparently. The application uses a stable localhost address regardless of backend changes.

Trade-offs:

✅ Abstracts network topology; applications are insulated from backend restructuring.
✅ Enables canary rollouts and experimentation at the infrastructure layer, not the application layer.
❌ Adds a network hop — latency-sensitive workloads may suffer measurable degradation.
❌ Ambassador state (shard maps, routing tables) lags behind the backend in eventually consistent systems, causing transient misrouting.
❌ Debugging misrouted requests adds layers to trace through.

Applicability: Multi-tenant SaaS routing, sharded external APIs, experimentation frameworks. Avoid for performance-critical internal service calls where direct connection is faster.

3. Adapter Pattern

graph LR
    A["Primary<br/>Container"] --> B["Adapter<br/>Container"]
    B -->|Prometheus<br/>format| C["Prometheus<br/>Server"]
    B -->|structured<br/>logs| D["Fluentd /<br/>Log Aggregator"]
    B -->|health<br/>signal| E["Kubernetes<br/>Health API"]
    style A fill:#4A90D9,color:#fff
    style B fill:#27AE60,color:#fff

Structural rationale. The adapter performs format translation between the application's native observability output and the format expected by external infrastructure. This keeps business logic cleanly separated from monitoring, logging, and health concerns.

Trade-offs:

✅ Swap monitoring or logging backends without recompiling or redeploying the application.
✅ Treats heterogeneous legacy applications uniformly in a single observability pipeline.
❌ Adapter failures silently break observability while the application continues running — a dangerous failure mode.
❌ Adapter code must be maintained as a separate artifact with its own release cycle.

Applicability: Integrating legacy applications with modern monitoring stacks, normalizing heterogeneous log formats, supplementing Kubernetes health probes. Avoid for new applications that can emit standard formats natively.

4. Replicated Load-Balanced Services

graph LR
    LB["Load Balancer<br/>(round-robin)"] --> R1["Replica 1"]
    LB --> R2["Replica 2"]
    LB --> R3["Replica N"]
    R1 -.->|sticky session| S1[("Session Store")]
    R2 -.->|sticky session| S1
    style LB fill:#F39C12,color:#fff
    style R1 fill:#4A90D9,color:#fff

Structural rationale. Identical replicas behind a load balancer distribute incoming requests. Readiness probes prevent the balancer from routing to initializing or failing replicas. Failures are absorbed by surviving replicas.

Trade-offs:

✅ Near-linear horizontal scalability for stateless workloads.
✅ Universally understood by cloud engineers — low cognitive overhead for new team members.
❌ Sticky sessions for stateful workloads reduce horizontal scale-out and create hot-spot risk.
❌ Caching layers on individual replicas create invalidation complexity; a shared cache tier adds infrastructure cost.
❌ Stateless sessions are fine; stateful sessions require careful external state design.

Applicability: REST APIs, web frontends, API gateways, any stateless request-serving tier. Avoid for stateful workloads without an externalized session store.

5. Sharded Services

graph LR
    P["Sharding<br/>Proxy / Ambassador"] -->|key → hash| S1["Shard 1<br/>(Replicated × 3)"]
    P -->|key → hash| S2["Shard 2<br/>(Replicated × 3)"]
    P -->|key → hash| S3["Shard N<br/>(Replicated × 3)"]
    subgraph "Shard 1"
    S1 --> R1a["Replica 1a"]
    S1 --> R1b["Replica 1b"]
    S1 --> R1c["Replica 1c"]
    end
    style P fill:#E94B3C,color:#fff

Structural rationale. Partition the dataset by key so each shard holds a disjoint subset. Combined with replication within each shard, the architecture achieves both horizontal capacity and intra-shard fault tolerance. Consistent hashing minimizes key migration when shards are added or removed.

Trade-offs:

✅ Total capacity scales linearly with shard count; no single-machine limit.
✅ Consistent hashing enables elastic resharding with minimal data movement.
❌ Uneven key distribution creates hot shards that become throughput bottlenecks — the most common production failure mode.
❌ No cross-shard transactions: every operation targets exactly one shard.
❌ The sharding proxy/coordinator is itself a potential SPOF unless replicated.

Applicability: Large distributed caches (Memcache, Redis clusters), partitioned databases, time-series stores, geo-distributed partitioned datasets. Avoid when the dataset fits on a single node, or when multi-shard atomicity is required.

6. Scatter/Gather

graph TD
    Q["Client Request"] --> R["Root Coordinator"]
    R -->|fan-out| L1["Leaf 1"]
    R -->|fan-out| L2["Leaf 2"]
    R -->|fan-out| L3["Leaf N"]
    L1 -.->|partial result| R
    L2 -.->|partial result| R
    L3 -.->|partial result| R
    R -->|merged result| Q
    style R fill:#F39C12,color:#fff
    style L1 fill:#4A90D9,color:#fff

Structural rationale. The root coordinator fans out independent sub-queries to N leaves, each holding a portion of the total dataset, then merges their partial results. Two variants: root distribution (full query to every leaf) and leaf sharding (keyspace partitioned, partial results merged). Leaf sharding is more efficient for large result sets.

Trade-offs:

✅ Enables large-scale read aggregation without a centralized global index or store.
✅ Leaves are independently scalable and replaceable.
❌ Response latency is bounded by the slowest straggling leaf — outlier nodes dominate P99 latency.
❌ Partial leaf failures must be handled explicitly: does the merge succeed with degraded results, or does the entire request fail?
❌ Too many leaves per root exhausts root threads and saturates network connections (fan-out storm).

Applicability: Distributed search (Elasticsearch), parallel aggregations, multi-region reads. Avoid for OLTP requiring strong consistency or sub-millisecond SLA.

7. Functions as a Service (FaaS)

graph LR
    E["Event<br/>(HTTP / Cron / Pub/Sub)"] --> F1["Auth<br/>Decorator"]
    F1 --> F2["Validation<br/>Decorator"]
    F2 --> F3["Business<br/>Logic"]
    F3 --> S[("External State /<br/>Database / Redis")]
    style F1 fill:#27AE60,color:#fff
    style F2 fill:#27AE60,color:#fff
    style F3 fill:#27AE60,color:#fff
    style S fill:#E94B3C,color:#fff

Structural rationale. FaaS decomposes services into ephemeral, stateless functions triggered by events. The platform manages scaling, scheduling, and recovery. The decorator pattern chains middleware functions (auth, validation, transformation) around a core handler, analogous to the sidecar stack.

Trade-offs:

✅ Zero infrastructure management; automatic scale-to-zero; pay-per-invocation pricing.
✅ Natural event-driven microservice decomposition; each function is independently deployable.
❌ Cold-start latency (100ms–seconds depending on runtime) rules out latency-sensitive interactive paths.
❌ Sustained high-throughput workloads can be more expensive than always-on containers.
❌ No local state between invocations — all state must be externalized.
❌ SDKs, trigger syntax, and deployment models are vendor-specific — real lock-in risk.

Applicability: Background event processing, scheduled batch tasks, bursty API backends, webhook handlers. Avoid for hot-path request processing or streaming workloads.

8. Ownership Election (Leader Election)

graph TD
    subgraph "Contenders"
    C1["Replica 1"]
    C2["Replica 2"]
    C3["Replica 3"]
    end
    C1 -->|CAS acquire| D[("etcd<br/>(Distributed Lock)"]
    C2 -->|try acquire| D
    C3 -->|try acquire| D
    D -.->|lease held| C1
    style C1 fill:#27AE60,color:#fff
    style C2 fill:#4A90D9,color:#fff
    style C3 fill:#4A90D9,color:#fff
    style D fill:#E94B3C,color:#fff

Structural rationale. Exactly one replica must coordinate access to a shared resource at any time. etcd provides atomic compare-and-swap and lease primitives: the first contender to CAS a lock key wins; a lease with TTL ensures the lock self-releases if the leader crashes. Replicas watch the key and race to acquire when it becomes available.

Trade-offs:

✅ Prevents split-brain and conflicting state mutations.
✅ Automatic failover on crash — lease expiry triggers a new election within the lease TTL window.
❌ Coordination service (etcd) is itself a critical component requiring a 3- or 5-node quorum cluster.
❌ The leader is an inherent throughput bottleneck — all writes route through exactly one node.
❌ Failover time is bounded by the lease TTL, typically seconds — not instantaneous.
❌ The most important design question: can the problem be solved without any coordination at all?

Applicability: Exactly-once schedulers, primary/replica database failover, singleton cron jobs. Prefer coordination-free alternatives (CRDTs, quorum reads, optimistic concurrency) wherever possible.

9. Work Queue Systems

graph LR
    S["Source<br/>Container"] -->|produce tasks| Q[("Durable Queue<br/>(Redis / Kafka)")]
    Q -->|competing consumers| W1["Worker 1"]
    Q -->|competing consumers| W2["Worker 2"]
    Q -->|competing consumers| W3["Worker N"]
    style Q fill:#E94B3C,color:#fff
    style S fill:#F39C12,color:#fff
    style W1 fill:#27AE60,color:#fff

Structural rationale. A durable queue decouples task production from task execution. Workers compete for tasks in a competing-consumers pattern. Failed tasks are re-queued automatically, making the system naturally fault-tolerant. Queue depth drives dynamic worker scaling.

Trade-offs:

✅ Natural load leveling; absorbs demand spikes without overwhelming workers.
✅ At-least-once delivery; workers must be idempotent, but the system does not lose tasks on failure.
❌ Queue is itself a critical HA component — queue failure halts the entire pipeline.
❌ Monitoring queue depth, consumer lag, and dead-letter queues adds operational overhead.
❌ Exactly-once semantics require idempotency + deduplication logic in the consumer.

Applicability: Video/audio transcoding, email delivery, report generation, any asynchronous background processing. Avoid for real-time request/response interaction.

10. Coordinated Batch Processing (Join / Reduce)

graph TD
    W1["Worker 1<br/>Partial"] --> J["Join /<br/>Barrier"]
    W2["Worker 2<br/>Partial"] --> J
    W3["Worker 3<br/>Partial"] --> J
    J --> R["Reduce<br/>(Sum / Count / Histogram)"]
    R --> OUT["Final<br/>Aggregate"]
    style J fill:#F39C12,color:#fff
    style R fill:#E94B3C,color:#fff

Structural rationale. Join (barrier synchronization) ensures all workers complete a stage before any advances. Reduce combines partial worker outputs into a single result. Both are fundamental to parallel batch computation where correctness depends on total ordering or total aggregation.

Trade-offs:

✅ Horizontal scaling to thousands of workers with linear throughput improvement for embarrassingly parallel sub-tasks.
✅ Reduce is associative and commutative — partials can be combined in any order, enabling flexible pipeline topologies.
❌ Barrier stages serialize the pipeline — all workers must wait for the slowest completion. Stragglers dominate wall-clock time.
❌ Barrier failure (e.g., a worker crashes mid-barrier) requires explicit timeout and recovery logic; without it, the entire pipeline deadlocks.
❌ Reduce correctness depends on the operation being truly associative — floating-point addition is not, leading to subtle precision errors at scale.

Applicability: MapReduce-style data processing, machine learning training data aggregation, histogram computation, large-scale ETL joins. Avoid when per-record latency is required (use streaming instead).

Cross-Pattern Observations

| Pattern | Coordination Needed | Primary Failure Mode | Key Enabling Technology | |---------|--------------------|--------------------|-----------------------| | Sidecar | None | Adapter failure → silent observability loss | Kubernetes pod abstraction | | Ambassador | Minimal | Stale routing state → transient misrouting | Localhost proxy + shared config | | Adapter | None | Adapter crash → blind observability gap | Container ephemerality | | Replicated LB | Stateless: none | Cascading replica failure | Load balancer + readiness probes | | Sharded Services | Shard map (coordination light) | Hot shard / key skew | Consistent hashing | | Scatter/Gather | None (logical) | Straggler leaf → high P99 | Asynchronous fan-out + merge | | FaaS | None (per-invocation) | Cold start / throttling / lock-in | Cloud platform | | Leader Election | etcd quorum | Split brain (without election) | etcd CAS + lease primitives | | Work Queue | Queue HA | Queue itself is SPOF | Durable message broker | | Event Batch Topologies | None (decoupled) | Dead-letter / pipeline stall | Kafka / message bus | | Coordinated Batch | Barrier + reduce leader | Straggler / deadlock | Distributed barrier + reduce |

narration

03 — Narration Script

A voice-first reading script covering all chapters, organized as timed narration segments with cue annotations for visualization and pacing breaks.

[OPENER — 0:00–0:45]

Narrator: If you've ever tried to build a distributed system from scratch — really build one, not just deploy a few microservices — you know it's brutal. Every failure mode you never considered suddenly becomes real. Latent networks. Partial failures. Split brains. And there's no textbook to follow, no catalogue of known solutions, no shared vocabulary for talking about what you just built. Or there wasn't, until this book.

Visual: Animated diagram showing a single machine → client-server → microservices mesh → Kubernetes cluster, with failure events pulsing red at each stage.

[CHAPTER 1: INTRODUCTION — 0:45–2:00]

Narrator: Brendan Burns opens with the history of patterns in software. Object-oriented programming, the Gang of Four, open source components — each wave gave developers a shared language and reusable building blocks. Distributed systems haven't had that. Until containers arrived and made it possible to package, version, and compose infrastructure primitives the way we compose functions.

Narrator: The book's central thesis is simple: the most powerful thing you can do is give engineers a name for what they're building. A name, a shape, a set of trade-offs everyone agrees on. That's what a pattern is. And that's what this book provides — in abundance.

Visual: Timeline from 1960s algorithms → 1990s OOP patterns → 2010s container orchestration → present Kubernetes ecosystem.

[PART I: SINGLE-NODE PATTERNS — 2:00–6:00]

Chapter 2: Sidecar — [2:00–3:00]

Narrator: The sidecar pattern is the book's foundational concept. Imagine you have a legacy service that only understands plain HTTP. You can't change its code — it's a third-party binary, or it's running in production and you can't afford to touch it. The sidecar sits alongside it in the same Kubernetes pod, intercepts incoming HTTPS traffic, decrypts it, and forwards plain HTTP to the primary container. The service doesn't know HTTPS exists.

Narrator: The deeper insight here is modularity. Each container is a separately versioned artifact. The sidecar can be upgraded independently of the application. It can be reused across dozens of different applications without writing a line of their code. And critically, you can swap the entire stack just by editing a YAML manifest.

Visual: Pod diagram showing primary container + nginx sidecar, traffic flow annotation, ConfigMap volume as shared configuration surface.

Stop cue: [pause for chapter summary card]

Chapter 3: Ambassadors — [3:00–4:15]

Narrator: Where the sidecar extends the primary container's behavior, the ambassador intercepts and transforms its outbound calls. The example that makes this click: sharded Redis. Your application doesn't know about shards. It sends all requests to localhost:6379. The ambassador reads a key from the request, hashes it to determine the correct shard, and forwards the request there. From the application's view, it's talking to a single Redis instance. It has no idea there are twelve of them behind the scenes.

Narrator: The same ambassador pattern enables service brokering — abstracting the real network endpoint behind a stable local address. Or experimentation: route 10% of traffic to a new version without touching application code. The ambassador learns what the target is from the platform; the application just trusts localhost.

Visual: Animated request flow from application → ambassador → shard selection → backend cluster. Show 10% traffic split highlighted in green.

Chapter 4: Adapters — [4:15–5:30]

Narrator: The adapter is about translation — taking the application's native output format and converting it to what the outside world needs. Three concrete examples make this concrete.

Narrator: First, monitoring. Your app emits metrics in its own proprietary format. A Prometheus adapter sits alongside it, scrapes those metrics on a schedule, reformats them into the Prometheus exposition format, and serves them on a standard metrics endpoint. You've given a legacy application modern observability without rewriting a single line of its code.

Narrator: Second, logging. A Fluentd adapter normalizes log lines from twenty different application formats into a single structured schema before shipping to your central log store. Individual applications can log however they want. The adapter makes it uniform.

Narrator: Third and subtlest: health monitoring. Kubernetes probes only know about HTTP status codes and TCP connections. What if your application is healthy syntactically but is silently degrading — a disk filling up, a downstream dependency timing out? A health adapter adds richer probes that the application itself never exposes.

Visual: Three adapters shown as sidecars, each with a different arrow color pointing to a different external system (Prometheus, Fluentd, Kubernetes API).

Stop cue: [pause for Part I summary card]

[PART II: SERVING PATTERNS — 6:00–15:00]

Chapter 5: Replicated Load-Balanced Services — [6:00–7:30]

Narrator: Part II moves to multi-node patterns. Starting with the simplest possible scalable architecture: replicated load-balanced services. Identical copies of your application behind a load balancer that routes requests round-robin. Add readiness probes so the load balancer knows not to send traffic to a pod that's still warming up. This is boring. And that's the point.

Narrator: Boring is good. Boring scales. The chapter covers the details that actually matter: stateless vs. session-aware services, when to use sticky sessions and why they're usually a mistake, deploying a caching layer as a separate tier, rate limiting at the edge, and SSL termination with nginx. All of these are infrastructure decisions you will face in production regardless of which language stack you use.

Visual: Load balancer → N replicated pods → optional cache tier diagram. Animated request flow with cache hit/miss paths highlighted.

Chapter 6: Sharded Services — [7:30–9:45]

Narrator: Replication gives you redundancy. Sharding gives you capacity. When a dataset is too large to fit on a single machine, replication doesn't help — each replica holds a full copy. Sharding partitions the data. Key-based hashing routes each request to the shard that owns its data. Now your total capacity scales linearly with shard count.

Narrator: Consistent hashing is the innovation that makes sharding practical. In simple modular hashing, adding one new shard means almost every key moves to a different shard — a massive cache invalidation storm. Consistent hashing minimizes this migration. Add or remove shards, and only a small fraction of keys need to move. This is what makes elastic sharding possible.

Narrator: The pattern in practice: an ambassador proxy sits in front of a sharded Memcache cluster. The ambassador holds the shard map, receives requests, hashes the key, and routes to the correct shard. The application talks to a single localhost address. It is completely unaware that it's using a sharded distributed cache. This is the power of the ambassador pattern applied at scale.

Visual: Consistent hashing ring animation showing key → shard mapping, then ring expansion showing minimal key migration. Shard diagram: proxy → shard-0 (×3 replicas), shard-1 (×3 replicas), shard-N (×3 replicas).

Chapter 7: Scatter/Gather — [9:45–12:00]

Narrator: Scatter/gather is the answer when a single query needs to fan out across many independent data sources and return a merged result. Think: a search query that must check dozens of sharded Elasticsearch indices, or an analytics query aggregating data across multiple regional databases.

Narrator: There are two flavors. Root distribution sends the full query to every leaf node — each leaf independently computes its partial result. The root merges them and returns the final answer. Leaf sharding is more efficient: the root partitions the keyspace and sends each leaf only the keys it owns, then merges partials.

Narrator: The critical design question is: how many leaves? Too few leaves and you underutilize parallelism. Too many leaves and the root becomes a bottleneck — connection storms, thread exhaustion, garbage collection pressure. The right answer depends on your per-leaf latency, your merge overhead, and your target P99. This is a system you must empirically tune, not design once and forget.

Visual: Scatter/gather request flow. Root splits key range → fan-out to N leaves → partial results converge at root → merged response. Show straggler leaf delaying merge as a labeled warning.

Stop cue: [pause for Part II midpoint summary card]

Chapter 8: Functions and Event-Driven Processing — [12:00–13:30]

Narrator: FaaS — Functions as a Service — promises to eliminate server management entirely. Deploy a function, the platform handles scaling, scheduling, and failure recovery. You pay only when the function runs. For sporadic, bursty workloads this is genuinely transformative.

Narrator: But Burns is honest about the trade-offs. Cold-start latency makes FaaS unsuitable for interactive, latency-sensitive paths. Sustained high-throughput can actually be more expensive than always-on containers. And you cannot hold state in memory between invocations — every function must treat itself as ephemeral and reach out to external storage for everything.

Narrator: The decorator pattern is the practical takeaway. A chain of small functions — auth decorator, validation decorator, business logic function — each independently deployable and replaceable. The same pattern that works for Kubernetes sidecars works for serverless function composition. The medium changes. The pattern does not.

Visual: Event → decorator chain (auth → validate → business logic) → external state store. Show ephemeral function instances being created and destroyed per invocation.

Chapter 9: Ownership Election — [13:30–14:45]

Narrator: Leader election exists because some problems fundamentally require a single, authoritative coordinator. A master database node. A singleton scheduler. A primary API serving writes. Having two leaders simultaneously means split brain — conflicting writes, corrupted state, the kind of failure that's agonizing to debug in production at 2 AM.

Narrator: The implementation via etcd uses atomic compare-and-swap: the first contender to write its identity to a lock key wins. A lease ensures the lock expires if the leader crashes — no manual cleanup needed. Replicas watch the lock key and race to acquire it when it becomes available again.

Narrator: But Burns delivers his most important advice in this chapter: before you build a leader election system, ask if you can avoid needing a leader at all. Optimistic concurrency, CRDTs, quorum reads — often you can. Every coordination layer you add is a potential source of unavailability. The best coordination is no coordination.

Visual: etcd lock acquisition race animation. Three replicas competing. One acquires CAS. Lease countdown timer. On expiry: new election, new leader wins.

[PART III: BATCH COMPUTATIONAL PATTERNS — 15:00–20:00]

Chapter 10: Work Queue Systems — [15:00–16:00]

Narrator: Batch processing is where distributed systems really earn their keep. The work queue pattern is its simplest and most powerful form. A source container produces tasks onto a durable queue. Workers compete for tasks. If a worker crashes mid-job, the task is retried by another worker. There's no coordination beyond the queue itself.

Narrator: Dynamic scaling is the critical advantage. When the queue depth rises above a threshold, launch more workers. When it drops, terminate workers. The system absorbs demand spikes without manual intervention. Burns demonstrates this with a video thumbnail generation pipeline — upload the video, queue the jobs, workers transcode and watermark at whatever scale the queue demands.

Visual: Task queue with depth meter. Workers scale in/out based on queue depth. Show failed task being re-queued and picked up by a different worker.

Chapter 11: Event-Driven Batch Processing — [16:00–17:15]

Narrator: Five topologies — Copier, Filter, Splitter, Sharder, Merger — cover virtually every batch data flow you'll encounter. These aren't just abstract shapes. Burns implements each in Kafka and connects them into a working new-user signup pipeline.

Narrator: The copier duplicates a stream for multiple independent downstream consumers — your event pipeline fan-out without writing custom routing logic. The filter applies a predicate and passes only matching events. The splitter routes events by type into separate streams. The sharder partitions by key. The merger recombines streams in a defined order.

Visual: Animated pipeline: source stream → copier → filter → splitter → per-type stream → sharder → merger → destination. Each stage labeled and colored distinctly.

Chapter 12: Coordinated Batch Processing — [17:15–18:30]

Narrator: Sometimes you need more than a queue. You need workers to coordinate. Two patterns: join (barrier synchronization) and reduce.

Narrator: Join — all workers must complete their assigned stage before any of them can proceed to the next stage. This is barrier synchronization. Your image tagging pipeline assigns each image to a different ML model worker; only when every classification is complete do you assemble the final metadata record.

Narrator: Reduce — workers compute partial aggregates. A final reduction step combines them into a single value. Sum, count, histogram. The sum of partial sums is the total sum. The math works. The pattern scales horizontally to thousands of workers. The example in the book uses histogram generation: each worker counts its local bucket frequencies; a final merger combines the bucket counts.

Visual: Parallel workers → barrier → merged output. Reduce diagram: worker partials (P1…PN) → reduce step → final aggregate. Show histogram bucket merge animation.

Chapter 13: Conclusion — [18:30–19:00]

Narrator: Burns closes with a forward-looking argument: containers and Kubernetes have made the infrastructure layer commoditized and stable. The next frontier for the industry is standardizing the patterns layer. Not reinventing how to run a container, but agreeing on what a sharded service looks like, naming it, specifying it, and building components that implement it correctly the first time.

Narrator: In other words: the tools have caught up to the problems. Now it's our job to document and teach the problems well enough that the next generation of engineers doesn't have to rediscover every failure mode from scratch.

Visual: Pattern catalogue grows from single = sharded = batch. Fade to book cover with call-to-action: "Stand on the shoulders of giants."

[OUTRO / KEY TAKEAWAY — 19:00–19:45]

Narrator: The single most important pattern in this entire book isn't sidecar, or sharding, or scatter/gather. It's the idea that naming a solution makes it reusable. That a shared vocabulary for distributed systems — real, operational vocabulary that engineers can agree on — is worth more than any individual implementation. Because once everyone calls the same shape by the same name, you stop reinventing it. And that's exactly what this book does.

Narrator: Whether you're building your first Kubernetes service or your hundredth, Designing Distributed Systems is the reference to have at your desk. Not a cookbook. A pattern language. And that's the difference.

Overview

About the Authors

Table of Contents

Key Themes

Reader Profile

01 — Content Summary

Part I: Single-Node Patterns

The Sidecar Pattern (Chapter 2)

Ambassadors (Chapter 3)

Adapters (Chapter 4)

Part II: Serving Patterns

Replicated Load-Balanced Services (Chapter 5)

Sharded Services (Chapter 6)

Scatter/Gather (Chapter 7)

Functions and Event-Driven Processing (Chapter 8)

Ownership Election (Chapter 9)

Part III: Batch Computational Patterns

Work Queue Systems (Chapter 10)

Event-Driven Batch Processing (Chapter 11)

Coordinated Batch Processing (Chapter 12)

Conclusion (Chapter 13)

02 — Analysis

1. Sidecar Pattern

2. Ambassador Pattern

3. Adapter Pattern

4. Replicated Load-Balanced Services

5. Sharded Services

6. Scatter/Gather

7. Functions as a Service (FaaS)

8. Ownership Election (Leader Election)

9. Work Queue Systems

10. Coordinated Batch Processing (Join / Reduce)

Cross-Pattern Observations

03 — Narration Script

[OPENER — 0:00–0:45]

[CHAPTER 1: INTRODUCTION — 0:45–2:00]

[PART I: SINGLE-NODE PATTERNS — 2:00–6:00]

Chapter 2: Sidecar — [2:00–3:00]

Chapter 3: Ambassadors — [3:00–4:15]

Chapter 4: Adapters — [4:15–5:30]

[PART II: SERVING PATTERNS — 6:00–15:00]

Chapter 5: Replicated Load-Balanced Services — [6:00–7:30]

Chapter 6: Sharded Services — [7:30–9:45]

Chapter 7: Scatter/Gather — [9:45–12:00]

Chapter 8: Functions and Event-Driven Processing — [12:00–13:30]

Chapter 9: Ownership Election — [13:30–14:45]

[PART III: BATCH COMPUTATIONAL PATTERNS — 15:00–20:00]

Chapter 10: Work Queue Systems — [15:00–16:00]

Chapter 11: Event-Driven Batch Processing — [16:00–17:15]

Chapter 12: Coordinated Batch Processing — [17:15–18:30]

Chapter 13: Conclusion — [18:30–19:00]

[OUTRO / KEY TAKEAWAY — 19:00–19:45]