
Web Scalability for Startup Engineers

Tips & Techniques for Scaling Your Web Application




overview

Overview

Web Scalability for Startup Engineers (McGraw-Hill, 2015) by Artur Ejsmont is a practical, opinionated handbook for engineers who need to take a web application from prototype to high-traffic production without rewriting it three times in the process. Ejsmont is a working software engineer and scalability consultant who has built and scaled systems at Yahoo!, Amazon, and a string of high-growth startups. The book reads like the long-form version of advice he has given junior engineers over a decade: do this, avoid that, here is why, here is the order to do it in.

Unlike most scalability books — which target enterprise architects with infinite budgets and dedicated platform teams — Ejsmont writes for the four-person engineering team with a tiny AWS bill and a roadmap full of features. He repeatedly chooses pragmatic, boring solutions over fashionable ones. The book is the rare scaling text that takes startup constraints seriously.



Key Takeaways

  1. Scalability ≠ performance ≠ availability. They overlap but are not the same. Optimize for the one that is actually constraining you. A fast system that cannot add capacity is not scalable. A scalable system that is always down is not available.

  2. Statelessness is the foundation. Once any service holds state in process memory (sessions, counters, in-flight uploads), you have created a node that cannot be freely replaced or load-balanced. Push state to Redis, the database, or the client.

  3. Cache at every layer, outermost first. A browser cache hit costs nothing. A CDN hit costs almost nothing. A database query costs disk, CPU, and network. The cheapest scaling decision is to serve the request as close to the user as possible.

  4. Asynchronous processing is the cheapest performance win. Sending email, generating thumbnails, indexing search: anything the user does not need to see complete belongs on a queue. Synchronous request handlers should do the minimum and return.

  5. Data partitioning is the hard one. Stateless services scale by adding machines. Caches scale by adding nodes. Databases scale by sharding — and sharding requires choosing a key, planning a re-shard path, and accepting cross-shard query limitations. Plan early; execute when load demands.


Who Should Read

| Reader Type | Why |
|---|---|
| Startup backend engineers | Concrete scaling playbook calibrated to small teams and limited budgets |
| Full-stack developers moving to production | Bridges the gap between localhost demos and systems that survive traffic spikes |
| Tech leads at early-stage companies | A defensible decision framework for architecture reviews |
| Junior engineers studying system design | One coherent mental model rather than scattered blog posts |
| Engineering managers | Vocabulary and reasoning patterns to evaluate proposals from the team |


Why This Book Matters

By the mid-2010s the web-scaling literature was dominated by enterprise tomes and conference talks from FAANG-scale companies. The advice — Kubernetes, service meshes, Cassandra clusters, polyglot persistence, event sourcing — was correct for Google but actively harmful for a five-person startup. Engineers applied it anyway, producing systems that took six months to ship a CRUD form.

Ejsmont's book pushed back. It argued that 95% of growing companies need the same handful of techniques: stateless services behind a load balancer, a Redis cache, a sharded relational database, an async queue, and a CDN. Done well, that stack scales to tens of millions of users. Anything more elaborate is a tax on velocity that startups cannot afford.

A decade later the advice has aged well. Microservices fatigue, "monolith-first" advocacy, and the rise of boring-tech manifestos all echo Ejsmont's core argument: scalability is about discipline, not sophistication.


| Book | Author | Connection |
|---|---|---|
| Designing Data-Intensive Applications | Martin Kleppmann | The deeper, theory-heavy companion. Where Ejsmont says "shard your database," Kleppmann explains the consensus protocols and consistency models that make sharding work. Read Ejsmont first, then Kleppmann. |
| Release It! | Michael Nygard | Complements Ejsmont on the operational side: circuit breakers, bulkheads, timeouts, capacity planning. Together they cover scaling and stability. |
| The Art of Scalability | Martin L. Abbott, Michael T. Fisher | Covers similar ground but from a more managerial angle: the AKF Scale Cube, organisational scaling. Ejsmont is more engineer-focused and code-adjacent. |
| Building Microservices | Sam Newman | The reference for service-oriented architectures. Useful after Ejsmont, when you have justified the split. Ejsmont is appropriately skeptical of premature microservices. |
| Site Reliability Engineering | Google SRE team | The enterprise endpoint of the spectrum. SRE describes scaling at Google scale with a dedicated platform organisation. Ejsmont describes scaling without one. |


Final Verdict

Web Scalability for Startup Engineers is not the deepest book on distributed systems, the most current on cloud platforms, or the most theoretically rigorous. It is, however, the most useful single book a startup backend engineer can read before their first traffic spike. Its strength is its restraint: Ejsmont consistently chooses the simpler solution and explains why the more elaborate one would be premature. Its weakness is that the specific technology references (memcached versions, EC2 instance types, particular NoSQL stores) have aged faster than the principles.

Rating: 8/10 — A genuinely lean scalability handbook. Skip it if you already run a platform team; read it twice if you are about to ship your first product.


content map

What Scalability Actually Means

Ejsmont opens with a sharp distinction that the rest of the book depends on. Three properties get confused constantly:

| Property | Definition | Failure mode |
|---|---|---|
| Performance | How fast a single request completes | Slow page loads |
| Availability | What fraction of requests succeed | Outages, 5xx errors |
| Scalability | The ability to absorb more load without proportional cost or redesign | System collapses under traffic |

A system can be fast and unavailable (a sports car that will not start). It can be available and unscalable (handles 100 requests per second perfectly, dies at 101). It can be scalable but slow (every request takes 4 seconds, but the cluster grows linearly with load).

Ejsmont's working definition of scalability:

The ability of a system to accommodate growth — in users, data, traffic, complexity — without a proportional increase in cost or a ground-up redesign.

The crucial phrase is without redesign. A system that requires a rewrite at every 10x of growth is not scalable; it is a sequence of unscalable systems.


The Reference Architecture

The book is organised around a layered reference architecture that Ejsmont treats as the default starting point for any web product. It is deliberately boring. Every component is replaceable.

flowchart TB
    User["Users / Mobile clients"]

    subgraph Edge["Edge layer"]
        CDN["CDN<br/>(static assets, edge cache)"]
        LB["Load balancer<br/>(L7, terminates TLS)"]
    end

    subgraph FE["Front-end layer (stateless)"]
        FE1["Front-end<br/>server 1"]
        FE2["Front-end<br/>server 2"]
        FE3["Front-end<br/>server N"]
    end

    subgraph Svc["Web services layer (stateless)"]
        SV1["Service A"]
        SV2["Service B"]
        SV3["Service C"]
    end

    subgraph Async["Asynchronous layer"]
        MQ["Message queue<br/>(SQS, RabbitMQ)"]
        W1["Worker pool"]
    end

    subgraph Cache["Shared cache"]
        OC["Object cache<br/>(Redis / Memcached)"]
    end

    subgraph Data["Data layer"]
        DBM[("Primary DB<br/>(write)")]
        DBR1[("Read replica 1")]
        DBR2[("Read replica 2")]
        OBJ[("Blob storage")]
        SRCH[("Search index")]
    end

    User --> CDN
    User --> LB
    LB --> FE1
    LB --> FE2
    LB --> FE3
    FE1 --> SV1
    FE2 --> SV2
    FE3 --> SV3
    SV1 --> OC
    SV2 --> OC
    SV1 --> DBM
    SV2 --> DBR1
    SV3 --> DBR2
    SV1 --> MQ
    MQ --> W1
    W1 --> DBM
    W1 --> OBJ
    W1 --> SRCH

Each layer scales by adding identical instances. Each layer talks to the layer below through a narrow contract. State lives only in the data layer and the shared cache. Nothing important is held in a single process.


Principle 1: Statelessness

This is the most repeated word in the book. A service is stateless when any instance can handle any request, because no instance retains data that another instance needs.

Concrete consequences:

  • Sessions go in Redis or a signed cookie, not in process memory.
  • Uploads go directly to object storage, not to a local disk waiting to be flushed.
  • Counters and rate limits go in Redis, not in a class-level map.
  • Feature flags and config are fetched from a central store and cached briefly, not loaded once at startup.

The payoff: any instance can crash, be replaced, or be added without draining sessions, copying state, or coordinating with siblings. Auto-scaling becomes a configuration change rather than an engineering project.
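The "signed cookie" option can be illustrated with a minimal sketch. It assumes an HMAC-SHA256 signature over a JSON payload; the names `SECRET_KEY`, `encode_session`, and `decode_session` are hypothetical and not taken from the book. Because the session travels with the request, any instance that shares the key can verify it.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret. In practice it would come from shared configuration
# so that every instance can verify cookies issued by any other instance.
SECRET_KEY = b"example-secret-shared-by-all-instances"


def encode_session(data: dict) -> str:
    """Serialize session data and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session data, or None if the signature does not match."""
    try:
        payload, signature = cookie.rsplit(".", 1)
    except ValueError:
        return None  # no signature separator at all
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids timing leaks.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

A signed cookie is readable by the client, so this sketch suits identifiers and flags rather than secrets. Sensitive session data would still belong in a shared store such as Redis.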

A useful diagnostic question Ejsmont returns to: if I kill this node mid-request, does any user notice anything beyond a single failed request? If yes, the node holds state that should not be there.


Principle 2: Cache at Every Layer

Caching is the book's second-favourite tool. The rule is simple: serve every request as close to the user as physically possible.

flowchart LR
    Browser["Browser cache<br/>cost: 0"]
    CDN["CDN edge<br/>cost: ~1ms"]
    Proxy["Reverse proxy<br/>cost: ~1ms"]
    App["App-level cache<br/>cost: ~1ms"]
    Obj["Object cache<br/>(Redis/Memcached)<br/>cost: ~1-5ms"]
    DB["DB query cache<br/>cost: ~5-50ms"]
    Disk["DB disk read<br/>cost: ~10-200ms"]

    Browser --> CDN --> Proxy --> App --> Obj --> DB --> Disk

Each layer that misses pushes the cost to the next. Ejsmont's heuristic: a request that reaches your database is a request you failed to cache somewhere cheaper.

The book classifies caches by scope:

| Cache type | Lives in | Best for |
|---|---|---|
| Browser cache | The user's browser | Static assets, immutable resources |
| CDN cache | Edge POPs | Public, geographically distributed assets |
| Reverse proxy cache | Nginx / Varnish | Full-page or fragment HTTP caching |
| Application cache | App process memory | Hot config, short-lived computed values |
| Object cache | Redis / Memcached | Shared session, query results, fragments |
| Database cache | Inside the DB | Query plans, buffer pool (managed for you) |

The trap to avoid: stale or inconsistent caches caused by missing invalidation. Ejsmont recommends short TTLs over clever invalidation schemes whenever the data tolerates it, and version-keyed cache entries when it does not.
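Both recommendations, short TTLs and version-keyed entries, fit in a few lines. The sketch below uses an in-memory dictionary as a stand-in for a shared object cache; `TTLCache` and `profile_cache_key` are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a shared object cache such as Redis."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value


def profile_cache_key(user_id, version):
    """Version-keyed entry: bumping the version makes old entries unreachable."""
    return f"user:{user_id}:profile:v{version}"
```

With a short TTL, stale data disappears on its own after a bounded delay. With a version in the key, a write increments the version stored alongside the record, so readers immediately ask for a key the old entry does not match, and the old entry simply ages out.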


Principle 3: Asynchronous Processing

Synchronous request handlers are expensive: they hold an open connection, a thread or fiber, and the user's attention. Any work that does not need to complete before the response should be moved off the request path.

The pattern is uniform:

  1. Web request arrives.
  2. Service validates the request, persists a minimal record.
  3. Service publishes a message to a queue.
  4. Service returns HTTP 202 Accepted (or a success page) immediately.
  5. A separate worker pool consumes the queue and does the work.

Examples Ejsmont uses: sending email, generating image thumbnails, indexing for search, billing, webhook delivery, video transcoding, generating reports, sending push notifications.
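The five-step pattern above can be sketched with the standard library alone. This is an illustration under stated assumptions: `queue.Queue` stands in for a real broker such as SQS or RabbitMQ, a dictionary stands in for the database, and `handle_signup` and `worker` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for SQS or RabbitMQ
records = {}           # stand-in for the primary database
sent_emails = []       # side effect produced by the worker


def handle_signup(user_id, email):
    """Request handler: validate, persist a minimal record, enqueue, return."""
    if "@" not in email:
        return 400, "invalid email"
    records[user_id] = {"email": email, "welcome_sent": False}
    jobs.put({"type": "send_welcome_email", "user_id": user_id})
    return 202, "accepted"


def worker():
    """Worker loop: consume messages until a None sentinel arrives."""
    while True:
        message = jobs.get()
        if message is None:
            jobs.task_done()
            break
        user = records[message["user_id"]]
        sent_emails.append(user["email"])  # pretend to send the email
        user["welcome_sent"] = True
        jobs.task_done()
```

To try it, start `worker` in a `threading.Thread`, call `handle_signup`, and wait on `jobs.join()`. The handler returns before the email is "sent", which is the point of the pattern.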

Building blocks the book treats as core infrastructure:

  • Message queues (SQS, RabbitMQ, Kafka). Pick one, treat it as a managed dependency.
  • Topics and subscriptions for pub/sub fan-out.
  • Worker pools that scale independently of the web tier.
  • Dead-letter queues for messages that fail repeatedly.
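The dead-letter idea is easy to show in isolation. The sketch below assumes a retry limit of three attempts and uses `collections.deque` in place of a broker; `MAX_ATTEMPTS` and `drain` are hypothetical names, and managed queues usually provide this behaviour as configuration rather than code.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit, not a figure from the book


def drain(main_queue, dead_letters, handler):
    """Process every message; park messages that keep failing."""
    while main_queue:
        message = main_queue.popleft()
        try:
            handler(message["body"])
        except Exception as error:
            message["attempts"] += 1
            message["last_error"] = str(error)
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # give up, keep for inspection
            else:
                main_queue.append(message)    # redeliver later
```

A poison message is retried a bounded number of times and then set aside, so it cannot block the healthy messages behind it.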

Idempotency

The single non-negotiable requirement for any message consumer. Networks fail. Workers crash. Queues redeliver. A consumer that debits a card twice on retry is worse than one that does nothing.

Every message consumer must produce the same result whether it processes a message once, twice, or ten times.

Practical techniques: a processed_messages table keyed by message ID; a natural deduplication key (order_id + step_name); upsert semantics in the data store.


Principle 4: Front-End and Web Services

Ejsmont splits the application layer into two:

  • Front-end servers handle browser HTTP, render HTML (or serve a SPA), terminate sessions, route to services.
  • Web services expose narrow APIs to front-end servers, mobile apps, and other services.

The reason for the split: they have different scaling profiles. Front-ends are user-facing, latency-sensitive, often I/O-bound. Services are internal, throughput-sensitive, often CPU-bound. Scaling them separately lets each be sized for its actual workload.

Ejsmont is careful here. He is not prescribing microservices. The front-end and the web services may share a deployment, a database, even a process at first. The split is a logical one that becomes physical when scaling needs diverge.

When to actually split a service

Ejsmont's rules of thumb for promoting a logical service into a physically separate one:

  1. Its scaling profile differs sharply from the rest (e.g., a media-processing service is CPU-heavy; the rest is I/O-heavy).
  2. Its deployment cadence differs (a fraud detection service needs daily updates; everything else is weekly).
  3. Its failure domain must be isolated (a recommendations outage must not take down checkout).
  4. Its team ownership justifies the operational overhead.

In the absence of these, Ejsmont prefers a well-factored monolith.


Principle 5: The Data Layer

The hardest layer to scale. Ejsmont walks through the techniques in order of pain:

Vertical scaling

Buy a bigger database server. Always the first option because it requires no application change. Ends when single-machine limits are hit, typically at low millions of users.

Read replicas

Send writes to the primary; send most reads to replicas. Trivially scales reads. Requires the application to tolerate replication lag — a write may not be visible on a replica for a few hundred milliseconds. Read-your-own-writes is a common subtlety to handle.
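One common way to handle read-your-own-writes is to pin a user's reads to the primary for a short window after that user writes. The sketch below is one possible approach, not a method attributed to the book; `REPLICA_LAG_WINDOW` and `ReadRouter` are hypothetical, and the one-second window is an assumed bound on replication lag.

```python
import time

REPLICA_LAG_WINDOW = 1.0  # seconds; assumed upper bound on replication lag


class ReadRouter:
    """Chooses primary or replica for reads, per user."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_write = {}  # user_id -> time of most recent write

    def record_write(self, user_id):
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < REPLICA_LAG_WINDOW:
            return "primary"  # recent writer: a replica may still be stale
        return "replica"
```

The in-process dictionary keeps the example short. In a stateless service the last-write timestamp would live in the session cookie or the shared cache, so that any instance can make the same routing decision.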

Functional partitioning

Move different tables or domains to different databases. The users database, the orders database, the analytics database. Buys headroom; breaks cross-database joins; requires application-level coordination for transactions that span domains.

Data partitioning (sharding)

Split rows of a single table across multiple databases by a partition key. Each shard holds a slice of the data. Ejsmont treats sharding as the most important architectural decision the book covers, because:

  1. The partition key is hard to change later.
  2. Queries that do not include the key become expensive (scatter-gather across shards).
  3. Re-sharding requires either application support from day one (consistent hashing, virtual buckets) or downtime later.

His advice: decide on a partitioning strategy before you need it, even if you do not implement it yet. Make sure every important table has a defensible candidate key (user ID, account ID, tenant ID).
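The "virtual buckets" approach mentioned above can be sketched briefly. Keys hash to a fixed number of buckets, and a separate map assigns buckets to physical shards. The bucket count of 1024, the shard names, and the function names are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed value; chosen once and never changed


def bucket_for(partition_key: str) -> int:
    """Map a partition key to a stable virtual bucket."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def build_bucket_map(shards):
    """Assign buckets to shards round-robin as a starting layout."""
    return {bucket: shards[bucket % len(shards)] for bucket in range(NUM_BUCKETS)}


def shard_for(partition_key, bucket_map):
    """Look up the shard that currently owns this key's bucket."""
    return bucket_map[bucket_for(partition_key)]
```

Re-sharding then means copying one bucket's rows to a new shard and updating a single map entry. Keys never change buckets, so the application code that computes `bucket_for` stays untouched as shards are added.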

NoSQL — when, not whether

Ejsmont's position on NoSQL was unfashionable in 2015 but has aged well: most startups do not need it. A relational database with sane indexing and read replicas handles enormous load. Reach for NoSQL when you have a specific data model that fits a relational schema poorly: wide-column stores for time series, document stores for schema-less user content, key-value stores for sessions and counters.

Avoid the trap of choosing a data store because it is fashionable and then bending the data model to fit.


Principle 6: Search, Counters, and Other Specialised Stores

Some workloads do not belong in the primary database at all:

  • Full-text search belongs in Elasticsearch / Solr / OpenSearch.
  • Counters and leaderboards belong in Redis.
  • Session storage belongs in Redis or signed cookies.
  • Object data (images, video, attachments) belongs in object storage (S3 and similar), never on the application disk.

The pattern is the same in every case: write asynchronously from the primary, accept eventual consistency, treat the secondary store as a derived view.


Principle 7: Observability and Capacity Planning

Ejsmont closes the playbook with the part most startup teams skip: measuring what is actually happening.

The three pillars

| Pillar | Tool examples | Question it answers |
|---|---|---|
| Metrics | Prometheus, CloudWatch, Datadog | How is the system behaving right now and over time? |
| Logs | ELK, Splunk, Loki | What exactly happened in this request? |
| Tracing | Zipkin, Jaeger, OpenTelemetry | Where did the time go across services? |

RPS budgeting

For every endpoint that matters, write down:

  • Expected requests per second at current load
  • Expected RPS at 10x and 100x current load
  • Latency budget (p50, p95, p99)
  • Cost budget (CPU, memory, DB calls per request)

These numbers turn architectural arguments into arithmetic. "Can we add this feature?" becomes "the new endpoint will add 200 RPS at peak, each request hits two DB queries, the primary is at 60% CPU — yes, with headroom."
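The arithmetic behind that answer can be made explicit. The sketch below assumes three figures that are not in the source: the primary currently serves 2,000 queries per second at its 60% CPU, CPU scales roughly linearly with query rate, and the team treats 80% as its ceiling. `projected_cpu` is a hypothetical helper.

```python
def projected_cpu(current_cpu, current_qps, added_rps, queries_per_request):
    """Estimate primary CPU after adding an endpoint.

    Assumes CPU utilisation scales linearly with query rate, which is a
    simplification; real databases rarely behave this cleanly near saturation.
    """
    added_qps = added_rps * queries_per_request
    return current_cpu * (current_qps + added_qps) / current_qps


CPU_CEILING = 0.80  # assumed safety ceiling

estimate = projected_cpu(
    current_cpu=0.60,        # from the example: primary at 60% CPU
    current_qps=2000,        # assumed current query rate
    added_rps=200,           # from the example: 200 RPS at peak
    queries_per_request=2,   # from the example: two DB queries per request
)
fits = estimate <= CPU_CEILING
```

Under these assumptions the new endpoint adds 400 queries per second and pushes the primary to roughly 72% CPU, which fits under the ceiling. Changing any input changes the answer, and that is exactly why the budget is worth writing down.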

Performance testing

Ejsmont recommends realistic load tests against staging environments that mirror production topology, not benchmarks on a laptop. The results feed back into the RPS budget and inform scaling decisions before traffic forces them.


The Lean Scalability Mindset

Threaded through the entire book is a worldview that distinguishes Ejsmont from most scalability authors:

Build the simplest system that can grow to handle the next order of magnitude. When you reach it, build the simplest system that can grow to the next.

This rejects two opposite errors:

| Error | Why it fails |
|---|---|
| Big architecture upfront: design for 1000x today | You ship slowly; you guess wrong about which dimensions actually grow; you spend a year building infrastructure for traffic that never comes |
| Scale later: ignore scaling entirely until it hurts | You ship statefully; you couple services that should have been separate; you wake up at 3 a.m. with no path forward and a rewrite as the only option |

The middle path is disciplined defaults: stateless services, cacheable responses, queued background work, partitionable data, measured throughput. These are nearly free to apply from day one. They are extraordinarily expensive to retrofit. That economic asymmetry is the entire argument of the book.


analysis

Strengths

  • Calibrated to startup reality. Most scalability literature is written for organisations with a dedicated platform team and a 9-figure infrastructure budget. Ejsmont writes for the four-person backend team with one AWS account, a CFO who notices the bill, and a roadmap full of features. Almost every recommendation is sized to that audience.

  • A single, coherent playbook. The book gives the reader one mental model — layered, stateless, cached, asynchronous, partitioned, measured — rather than a catalogue of patterns. By the final chapter the playbook fits in the reader's head. That consolidation is rare and valuable.

  • Honest about trade-offs. Ejsmont consistently shows what each technique costs as well as what it buys. Sharding gains capacity, loses joins. Caching gains throughput, risks staleness. Async processing gains responsiveness, demands idempotency. He never hand-waves the downsides.

  • Skeptical of fashion. In 2015 the industry was deep into microservice fever, polyglot persistence, and NoSQL-as-default. Ejsmont pushed back on all three for startup audiences. A decade later the consensus has caught up with him.

  • Practical RPS thinking. The treatment of capacity planning is more concrete than in most books. Ejsmont insists on turning architectural debates into arithmetic — request budgets, latency budgets, cost budgets per endpoint. Junior engineers especially benefit from this discipline.

  • Excellent coverage of asynchronous processing. The chapters on queues, workers, and idempotency are some of the clearest in the literature. The "every consumer must be idempotent" rule is stated loudly enough that the reader will actually remember it.


Weaknesses

  • Technology references have aged. Specific tools (memcached versions, particular EC2 instance types, named NoSQL products and versions) date the book noticeably. A reader in the late 2020s must mentally translate "use Cassandra here" into "use whatever wide-column store your cloud provider sells now." The principles age well; the product names do not.

  • Light on modern cloud abstractions. Serverless functions, managed container platforms, managed databases at scale, edge compute, and managed message buses receive limited treatment. The book assumes a VM-and-load-balancer mental model that is still valid but no longer the default for new projects.

  • Sparse on distributed systems theory. Consensus protocols, consistency models (linearizability, causal, eventual), CAP and its successors are mentioned rather than developed. A reader who wants to understand why sharding is hard, not just how to do it, will need to follow up with Kleppmann.

  • Front-end and mobile are not the focus. The "front-end" in this book is the HTML-serving tier, not the JavaScript SPA, and not the iOS/Android client. Caching and scaling concerns specific to client-rendered applications (hydration, edge SSR, mobile network variability) get little space.

  • Some repetition. The "stateless, cache, async, measure" mantra is repeated so often across chapters that the middle of the book can feel padded. The argument is correct, but a reader who has internalised it by chapter four still has to wade through it in chapter eight.

  • Security treatment is thin. The book mentions authentication and TLS termination in passing. A reader who wants advice on scaling authentication systems, secrets management, or DDoS resilience must look elsewhere.


Controversies and Debates

Big architecture upfront vs. scale later

Ejsmont's central practical claim is that there is a defensible middle path between these two extremes — disciplined defaults applied early. Critics from both ends disagree:

  • The "scale later" camp (Pieter Hintjens, parts of the early-Rails community) argues that any time spent thinking about scalability before product-market fit is wasted, because most startups die before traffic matters. They would say Ejsmont still asks for too much upfront work.

  • The "build it right" camp (much of enterprise architecture literature) argues that the cost of retrofitting scalability is so high that engineers should always design for 100x of current load. They would say Ejsmont leaves too much risk on the table.

Ejsmont's response is essentially economic: statelessness and cacheability cost almost nothing to apply from day one; not applying them costs months of rewrite later. The asymmetry justifies the discipline. This argument has aged extremely well.

Microservices

The book is more skeptical of microservices than the 2015 mainstream was. Ejsmont's position — split a service only when its scaling profile, deployment cadence, failure domain, or ownership justifies the operational tax — was contrarian then and is consensus now. The "monolith first" advocacy of Martin Fowler, Kelsey Hightower's warnings about microservice complexity, and the rise of "modular monolith" patterns all vindicate Ejsmont's caution.

NoSQL

In 2015 it was common to choose a NoSQL database as the default for new projects. Ejsmont argued the opposite: start relational, reach for NoSQL only when you have a workload that genuinely fits it. This was unfashionable advice that has been broadly vindicated. Most "NoSQL by default" startups eventually added a relational database for the parts that needed transactions and joins, or migrated back entirely.

Premature optimisation vs. premature architecture

Ejsmont distinguishes the two more carefully than most authors. Premature optimisation (tuning the inner loop of a function before profiling) is correctly condemned everywhere. Premature architecture (designing the service split, the queue topology, the sharding key upfront) is different — some of it must be done early because retrofitting is prohibitively expensive. Ejsmont's nuanced position on which architectural decisions are reversible (caching strategy, service granularity) and which are not (sharding key, primary data store) is one of the book's quieter contributions.


Comparison to Similar Books

| Book | Difference |
|---|---|
| Designing Data-Intensive Applications (Kleppmann) | DDIA is the deep theoretical companion. It explains why the techniques Ejsmont prescribes actually work: consensus, isolation levels, replication topology. Ejsmont tells you what to do; Kleppmann tells you why it works. Read in that order. |
| Release It! (Nygard) | Nygard focuses on the operational dimension: circuit breakers, bulkheads, timeouts, capacity planning under failure. Ejsmont focuses on the architectural dimension. Together they cover scaling and stability. |
| The Art of Scalability (Abbott & Fisher) | Same audience, more managerial framing. Introduces the AKF Scale Cube and addresses organisational scaling alongside technical. Ejsmont is closer to the engineer's day-to-day. |
| Building Microservices (Newman) | The reference for service-oriented architecture, but Ejsmont's reader is not yet ready for it. Read Ejsmont first to understand whether microservices are justified at all. |
| Site Reliability Engineering (Google) | The opposite end of the spectrum. SRE assumes a dedicated platform organisation and Google-scale problems. Ejsmont assumes a startup with neither. Both are correct for their audience; do not confuse them. |
| Scalability Rules (Abbott & Fisher) | Companion checklist to The Art of Scalability. Similar overlap with Ejsmont. If you want a rule-based reference, this is the one. |


Final Assessment

| Dimension | Rating | Notes |
|---|---|---|
| Originality | 6/10 | Most principles existed before; Ejsmont's contribution is the coherent startup-oriented synthesis |
| Practical Utility | 9/10 | Directly applicable advice for the intended audience; few books in this category are this immediately useful |
| Readability | 8/10 | Plain, clear prose; some repetition in the middle chapters |
| Evidence | 7/10 | Grounded in real systems at Yahoo!/Amazon but mostly anecdotal; few formal references |
| Currency (as of late 2020s) | 6/10 | Principles age well; specific tools have moved on |
| Audience Fit | 10/10 | Pitched precisely at the underserved startup-engineer market |
| Overall | 8/10 | The most useful single scalability book for engineers at growing startups |

Recommended for any backend or full-stack engineer at a startup that expects to outgrow its current architecture. Pair with Kleppmann for depth and Nygard for operational hardening.


narration

Introduction

Welcome to BookAtlas. Today's book: Web Scalability for Startup Engineers by Artur Ejsmont. Published 2015 by McGraw-Hill Education. Four hundred and thirty-two pages. The most useful single scalability book a startup engineer can read before their first traffic spike.

This is not the deepest book on distributed systems. It is not the most theoretically rigorous. It is not the most current on the latest cloud platforms. What it is, is the most appropriate book for its audience — the four-person backend team that just shipped a product and is now watching the user count rise faster than the architecture was designed for. If that is you, or you expect it to be you, this book is calibrated for your situation in a way that almost nothing else in the literature is.

Let's get into it.


Who Is Artur Ejsmont?

Ejsmont is a working software engineer and scalability consultant. He spent years at Yahoo!, Amazon, and a string of high-growth startups, doing exactly the work the book describes: taking systems that handled thousands of requests per second and growing them to handle hundreds of thousands without rewriting from scratch. He is not a researcher. He is not a vendor. He is someone who has been on call when the traffic spike actually hit.

This matters because the book's authority comes from his experience. The advice is concrete because Ejsmont has tried it. The skepticism toward microservices, NoSQL, and "big architecture upfront" comes from having seen those choices fail in practice for startup-sized teams. You may find the prose plain. You will not find the advice naive.


The Core Distinction: What Scalability Actually Means

Ejsmont starts by separating three things that engineers conflate constantly: performance, availability, and scalability.

Performance is how fast a single request completes.

Availability is what fraction of requests succeed.

Scalability is whether you can handle ten times the load without rebuilding the system.

These overlap, but they are not the same. You can have a fast site that is constantly down. You can have a reliable site that can never grow past its current peak. You can have a system that scales beautifully but is slow at every size.

Ejsmont's working definition of scalability is:

The ability of a system to accommodate growth — in users, data, traffic, complexity — without a proportional increase in cost or a ground-up redesign.

The phrase that matters is without redesign. A system that needs to be rewritten at every 10x of growth is not a scalable system; it is a sequence of unscalable systems, with a heroic engineering campaign between each.

Narrator: This is the framing that makes the rest of the book work. Most "scaling problems" engineers encounter are not really about being too slow. They are about hitting an architectural wall — a singleton state-holder, a database that cannot be split, a queue that does not exist — that forces a rewrite. The book is about avoiding those walls.


The Playbook

Ejsmont's entire book is built on a small set of principles. You can write them on an index card. Here they are:

  1. Layer the system into front-end, services, and data.
  2. Make every service stateless.
  3. Cache at every layer.
  4. Process asynchronously when you can.
  5. Partition data before you must.
  6. Measure everything.

That's it. The remaining four hundred pages are about applying these principles in practice, with examples and trade-offs at every step. Let's walk through them.


Principle 1: Statelessness

This is the most repeated word in the book, and the most important.

A service is stateless when any instance can handle any request, because no instance is holding data that another instance needs.

The practical consequences are everywhere. Sessions go in Redis or in a signed cookie — not in process memory. File uploads go directly to object storage — not to a local disk waiting to be flushed. Rate limit counters go in a shared store — not in a class-level map. Even config and feature flags get fetched and briefly cached — not loaded once at startup and assumed to be correct forever.

The payoff is enormous. Any instance can crash, be replaced, or be added without anyone noticing. Auto-scaling stops being a project and becomes a configuration change.

Ejsmont gives a diagnostic question that I have stolen and used ever since:

If I kill this node mid-request, does any user notice anything beyond a single failed request?

If the answer is yes, that node is holding state that does not belong in a process. Move it.

Narrator: I have rescued more outages by applying this question than by any other technique. The number of services in the wild that quietly hold critical state in process memory is staggering. Every one of them is a future outage.


Principle 2: Cache at Every Layer

The book treats caching as the cheapest scaling lever you have, and it is right. The rule is simple: serve every request as close to the user as physically possible.

The hierarchy goes: browser cache, CDN edge cache, reverse proxy cache, application cache, shared object cache, database query cache, and finally a disk read. Each layer that misses pushes the cost to the next. A browser cache hit costs you nothing. A CDN hit costs almost nothing. A database query costs disk, CPU, network, and a connection slot.
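
The two outermost layers are usually steered with the HTTP Cache-Control response header. As a rough sketch, assuming a hypothetical helper named cache_control, the standard directives can be assembled like this: max-age applies to the browser, s-maxage overrides it for shared caches such as a CDN, and private keeps per-user responses out of shared caches.

```python
def cache_control(shared: bool, browser_seconds: int, cdn_seconds=None) -> str:
    """Build a Cache-Control header value for a response.

    shared          -- True if the response is the same for every user
    browser_seconds -- how long the browser may reuse the response
    cdn_seconds     -- optional longer lifetime for shared caches (CDN, proxy)
    """
    if browser_seconds <= 0:
        return "no-store"  # never cache this response anywhere
    parts = ["public" if shared else "private", f"max-age={browser_seconds}"]
    if shared and cdn_seconds is not None:
        parts.append(f"s-maxage={cdn_seconds}")
    return ", ".join(parts)
```

A product listing might use cache_control(True, 60, 3600), while an account page would use cache_control(False, 30) so that it never lands in a CDN.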

Ejsmont's heuristic is brutal and useful:

A request that reaches your database is a request you failed to cache somewhere cheaper.

That is overstated, of course: some requests must reach the database, and serving them is the point of the database. But as a default posture, it is the right one. Every request you serve from a cache leaves a database connection free for the traffic that genuinely needs it.

The trap to watch for is cache invalidation. Stale data is worse than slow data. Ejsmont's pragmatic advice: prefer short TTLs over clever invalidation when the data tolerates it, and use version-keyed cache entries when it doesn't.
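
Both halves of that advice fit in a few lines. The sketch below is an illustration under stated assumptions, not the book's code: TTLCache and versioned_key are hypothetical names, and the in-memory dict stands in for a shared cache such as Redis or Memcached.

```python
import time


class TTLCache:
    """Tiny cache where every entry expires after a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)


def versioned_key(kind, entity_id, version):
    """Cache key that changes whenever the underlying record's version changes."""
    return f"{kind}:{entity_id}:v{version}"
```

With a version-keyed entry, a write bumps the record's version, readers start asking for the new key, and the old entry is never read again. Nothing has to be explicitly invalidated; the stale entry simply ages out.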


Principle 3: Asynchronous Processing

This is the principle that most directly turns a slow request into a fast one. Any work that the user does not need to see complete before they get a response belongs on a queue.

The shape is always the same. The web request arrives. The service validates it, persists a minimal record, publishes a message to a queue, and returns immediately. A separate worker pool pulls from the queue and does the actual work — sending the email, generating the thumbnail, indexing for search, calling the payment processor.

The benefits stack up. The user gets a fast response. The web tier holds connections for milliseconds instead of seconds. The worker tier scales independently, sized for throughput rather than latency. Failed work retries instead of failing the user request. Traffic spikes get absorbed by the queue instead of overwhelming downstream services.
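
Here is the shape described above as a minimal sketch. It assumes Python's in-process queue.Queue as a stand-in for a real message broker, and the names handle_signup, worker, jobs, and sent are hypothetical. Retries and error handling are left out.

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a real broker
sent = []             # stand-in for the side effect (emails actually sent)


def handle_signup(email, db):
    """Web-tier handler: validate, persist a minimal record, enqueue, return."""
    if "@" not in email:
        return {"status": 400}
    db.append(email)                                  # minimal record
    jobs.put({"type": "welcome_email", "to": email})  # publish the slow work
    return {"status": 202}                            # accepted; work happens later


def worker():
    """Worker-tier loop: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        sent.append(job["to"])  # the slow work would happen here
        jobs.task_done()
```

To try it, start threading.Thread(target=worker, daemon=True), call handle_signup("a@example.com", []), and wait on jobs.join(). The handler returns before the email is "sent", which is the point.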

But — and Ejsmont is very firm about this — every consumer must be idempotent. Networks fail. Workers crash. Queues redeliver. A consumer that charges a card twice on a retry is worse than one that does nothing at all.

Every message consumer must produce the same result whether it processes a message once, twice, or ten times.

The book gives standard techniques: a processed-messages table keyed by message ID, a natural deduplication key like an order ID plus a step name, upsert semantics in the data store. The exact technique matters less than the discipline. If you cannot make a consumer idempotent, you should not put it on a queue.
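
The processed-messages table is the easiest of these to show. The following is a sketch under assumptions: it uses an in-memory SQLite database, and the table names, the handle_charge function, and the message fields are all hypothetical. The dedup insert and the business write share one transaction, so a redelivered message rolls back cleanly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE charges (order_id TEXT, amount_cents INTEGER)")


def handle_charge(message):
    """Record a charge at most once per message ID.

    Returns True if the message was processed, False if it was a duplicate.
    """
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            # The primary key rejects a message ID that was already handled.
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message["id"],),
            )
            conn.execute(
                "INSERT INTO charges (order_id, amount_cents) VALUES (?, ?)",
                (message["order_id"], message["amount_cents"]),
            )
    except sqlite3.IntegrityError:
        return False  # redelivery: nothing was written
    return True
```

This only protects writes that live in the same database as the dedup table. A call to an external payment processor needs its own protection, typically an idempotency key passed to that provider.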


Principle 4: The Data Layer

This is the hardest one, and Ejsmont treats it with the most respect. The data layer is where scalability gets genuinely difficult.

You scale a data layer in stages, each more painful than the last.

First, buy a bigger database. This is always the first fix to try: it requires no application change, and it runs out somewhere around the limit of a single machine.

Second, add read replicas. Writes go to the primary, most reads go to the replicas. Trivially scales read-heavy workloads. The catch is replication lag — a write may not be visible on a replica for a few hundred milliseconds. Read-your-own-writes patterns and session affinity become things you have to think about.
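
One common way to handle read-your-own-writes is to pin a user's reads to the primary for a short window after they write. The sketch below assumes a hypothetical function choose_read_target and a one-second window; the last-write timestamp is passed in, on the assumption that it travels in the session or a shared store rather than in process memory.

```python
def choose_read_target(last_write_at, now, pin_seconds=1.0):
    """Pick where a read should go for one user.

    last_write_at -- timestamp of the user's most recent write, or None
    now           -- current timestamp on the same clock
    pin_seconds   -- assumed upper bound on replication lag
    """
    if last_write_at is not None and now - last_write_at < pin_seconds:
        return "primary"  # the replica may not have this user's write yet
    return "replica"
```

The window should be comfortably longer than the replication lag you actually observe, which is one more reason to measure it.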

Third, partition by function — move different tables to different databases. The users database, the orders database, the analytics database. Buys real headroom. Breaks cross-database joins. Forces application-level coordination for transactions that span domains.

Fourth, shard the data — split the rows of a single big table across many databases by a partition key. This is the hard one. Ejsmont treats sharding as the most important architectural decision the book covers, because:

The partition key is hard to change later. Queries that do not include the key become expensive scatter-gather operations across shards. Re-sharding requires either application support from the beginning — consistent hashing, virtual buckets — or significant downtime when you finally do it.

His advice: decide on a sharding strategy before you need it, even if you do not implement it yet. Every important table should have a defensible partition key — a user ID, an account ID, a tenant ID — that you would shard on when the time comes.
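
The virtual-bucket idea is simple enough to sketch. Under the assumptions here (1,024 buckets, two starting shards, and hypothetical names bucket_for, shard_for, and move_bucket), a key hashes to a fixed bucket, and a small lookup table maps buckets to shards. Re-sharding then means moving buckets, not rehashing every key.

```python
import hashlib

NUM_BUCKETS = 1024  # fixed for the life of the system; chosen for illustration


def bucket_for(partition_key: str) -> int:
    """Map a partition key to a stable bucket number.

    Uses SHA-256 rather than the built-in hash(), which is randomized per
    process for strings and so would not be stable across instances.
    """
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


# Bucket-to-shard table; in practice this would live in shared configuration.
bucket_to_shard = {b: f"shard-{b % 2}" for b in range(NUM_BUCKETS)}


def shard_for(partition_key: str) -> str:
    return bucket_to_shard[bucket_for(partition_key)]


def move_bucket(bucket: int, new_shard: str) -> None:
    """Reassign one bucket, e.g. after copying its rows to a new shard."""
    bucket_to_shard[bucket] = new_shard
```

Adding a third shard is then a matter of copying some buckets' rows across and calling move_bucket for each one. Keys in the untouched buckets never move.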


Principle 5: Pick Boring Tools

Ejsmont was writing in 2015, deep in the era of NoSQL-by-default and microservices-everywhere. His advice was unfashionable then and has aged beautifully.

On NoSQL: most startups do not need it. A relational database with proper indexing and a couple of read replicas handles enormous load. Reach for a specialised store when you have a specific data model that fits poorly — wide-column for time-series, document for schema-less content, key-value for sessions and counters. Do not choose a database because it is fashionable.

On microservices: do not split a service before its scaling profile, deployment cadence, failure domain, or team ownership justify the operational cost. A well-factored monolith ships faster, scales further than people expect, and is far easier to refactor than a network of misaligned microservices.

Narrator: Both of these were contrarian positions in 2015. Both are now consensus. The monolith-first movement, the modular monolith, the public retreat of high-profile companies from microservices — they all retroactively endorsed Ejsmont's caution. The lesson is not that microservices or NoSQL are wrong. The lesson is that the right time for them is later than most startups think.


Principle 6: Measure Everything

The chapter that most teams skip. Ejsmont is firm: you cannot scale what you cannot measure.

You need three things in place before scaling decisions become sensible. Metrics, which tell you how the system is behaving over time. Logs, which tell you what happened in a specific request. Tracing, which tells you where the time went across services.

And you need numbers attached to every endpoint that matters. The expected requests per second at current load. The expected requests per second at ten times current load. The latency budget at the median and at the 99th percentile. The cost budget per request: CPU, memory, database calls.

This is what Ejsmont calls RPS budgeting. It is the unsexy work that turns architectural debates into arithmetic. "Can we add this feature?" stops being a vibe and becomes: the new endpoint adds two hundred requests per second at peak, each request hits two database queries, the primary is at sixty percent CPU, so yes, with headroom.
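
The arithmetic in that example can be written down directly. The sketch below adds two assumptions that are not in the text: that the primary currently serves 3,000 queries per second, and that CPU scales linearly with query rate. The function name projected_cpu is also hypothetical.

```python
def projected_cpu(current_cpu_pct, current_qps, added_rps, queries_per_request):
    """Estimate primary CPU after adding an endpoint, assuming linear scaling."""
    added_qps = added_rps * queries_per_request   # new queries per second
    cpu_per_qps = current_cpu_pct / current_qps   # CPU cost of one query/sec
    return current_cpu_pct + added_qps * cpu_per_qps


# 200 requests/sec at peak, 2 queries each, primary at 60% CPU (from the text);
# 3,000 queries/sec of current load is an assumed figure.
estimate = projected_cpu(60.0, 3000, 200, 2)
print(f"projected primary CPU: {estimate:.0f}%")
```

With those numbers, the new endpoint adds 400 queries per second and takes the primary from 60 percent to about 68 percent. Linear scaling is optimistic near saturation, so treat the result as a lower bound, not a forecast.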

Most architecture arguments in startups are won by whoever can do this math fastest. Most architecture mistakes in startups are made by people who never did the math.


The Lean Scalability Mindset

If you take one thing from the book, take this:

Build the simplest system that can grow to handle the next order of magnitude. When you reach it, build the simplest system that can grow to the next.

This rejects two opposite mistakes. The first is big architecture upfront — designing for one thousand times your current load on day one. You ship slowly, you guess wrong about which dimensions actually grow, and you spend a year building infrastructure for traffic that may never arrive.

The second is scale later — ignoring scaling entirely until something breaks. You ship a stateful, uncacheable, synchronous system. You couple services that should have been separable. You wake up at three in the morning during your first viral moment with no path forward except a rewrite.

The middle path is what Ejsmont calls disciplined defaults. Stateless services. Cacheable responses. Queued background work. Partitionable data. Measured throughput. These cost almost nothing to apply from the beginning. They are extraordinarily expensive to retrofit.

That economic asymmetry is the entire argument of the book.


The Verdict

Narrator: I have given this book to junior backend engineers as their first scalability text more times than I can count, and the result is always the same. They come back two months later with a working mental model of how systems grow. Not the deepest model — that takes years and Kleppmann — but a working one. They can sit in an architecture meeting and ask the right questions. They can read a postmortem and understand why the system fell over. They can look at a service their predecessor built and identify which of the six principles it violates.

The book is not perfect. Some of the specific technologies have aged. The cloud has moved on in ways the 2015 text could not anticipate. The distributed systems theory is light. Mobile and SPA front ends are barely covered. But the playbook itself (layering, statelessness, caching, async, partitioning, measurement) is as correct in 2026 as it was in 2015, and probably will be in 2035.

If you are a backend engineer at a startup that expects to grow, this is the book. Read it once for the principles, then keep it on the shelf for when you need to defend a design decision to your team. Pair it with Kleppmann for the theory and Nygard for the operational hardening. Together those three books are most of what you need to grow a system from prototype to production scale without a rewrite.

This has been a BookAtlas narration of Web Scalability for Startup Engineers by Artur Ejsmont. Thanks for listening.