Explainer · infrastructure

What is a CDN?

A content delivery network (CDN) is the global cache layer that makes today's web feel instantaneous, and the shield that lets a one-developer side project survive a front-page traffic spike. Here is how CDNs actually work, and how the major providers compare.

The problem CDNs were invented to solve

In the late 1990s, a typical website ran on one server (often in someone's garage or a single datacenter in Virginia) while its users were increasingly far away: for visitors in East Asia or Europe, a page could take minutes to load. Bandwidth was expensive, transcontinental latency was punishing, and an unexpected mention on Slashdot would flatten a small site for hours. Akamai, founded out of MIT in 1998, pioneered the model that became the industry standard: copy the content to thousands of servers spread around the world, and serve each user from the one nearest to them.

Twenty-five years later that model has expanded enormously. A modern CDN is not just a cache — it terminates TLS, runs WAF rules, makes routing decisions, executes serverless code at the edge, transcodes images on the fly, and absorbs terabit-scale DDoS attacks. But the original premise still holds: get the response physically close to the user, and let your origin do as little work as possible.

What you actually get from a CDN

How a request actually flows through a CDN

  1. DNS resolution. Your domain's A record points at a CDN-owned IP, not your origin. DNS lookup tools show this — query www.shopify.com and you'll see a Fastly IP; query github.com and you'll see a Fastly IP again.
  2. Anycast routing. The CDN advertises that same IP over BGP from hundreds of points of presence (PoPs) worldwide. The user's ISP routes them to whichever PoP is closest in BGP terms, which is usually but not always the geographically closest. See the BGP glossary entry for how this works.
  3. Cache lookup. The edge node hashes the request (URL + relevant query parameters + relevant headers like Accept-Encoding) and checks its local cache. A hit returns the cached response immediately, with a header like cf-cache-status: HIT or x-cache: HIT telling you the response came from cache; companion headers such as cf-ray or x-served-by identify the edge that served you.
  4. Tier 2 / shield. If the edge missed, most modern CDNs send the request to a regional shield (a second-tier cache closer to your origin) instead of straight to origin. The shield catches a miss-from-edge before it becomes a hit on your origin server, dramatically reducing origin traffic.
  5. Origin fetch. If both levels of cache miss, the request finally reaches your origin. The CDN stores the response (according to your Cache-Control headers), serves it to the user, and serves subsequent users in that region from the cache.
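The lookup order in steps 3 to 5 can be sketched as a small simulation. This is a minimal illustration, not how any particular CDN is implemented: the names Tier, cache_key and fetch, the set of keyed query parameters, and the in-memory dictionaries are all assumptions made for the example, and real caches also honor TTLs and Cache-Control, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qsl, urlsplit

# Hypothetical configuration: which query parameters and request headers
# change the response and therefore belong in the cache key.
KEYED_PARAMS = {"page", "lang"}
KEYED_HEADERS = ("accept-encoding",)


def cache_key(url: str, headers: Dict[str, str]) -> Tuple:
    """Build a key from host + path, relevant query parameters, relevant headers."""
    parts = urlsplit(url)
    params = tuple(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k in KEYED_PARAMS
    ))
    lowered = {k.lower(): v for k, v in headers.items()}
    keyed = tuple((h, lowered.get(h, "")) for h in KEYED_HEADERS)
    return (parts.netloc + parts.path, params, keyed)


@dataclass
class Tier:
    """One cache level (an edge PoP or a regional shield)."""
    name: str
    store: Dict[Tuple, str] = field(default_factory=dict)


def fetch(url: str, headers: Dict[str, str], edge: Tier, shield: Tier,
          origin: Callable[[str], str]) -> Tuple[str, str]:
    """Return (body, served_by), where served_by is 'edge', 'shield' or 'origin'."""
    key = cache_key(url, headers)
    if key in edge.store:                 # step 3: edge hit
        return edge.store[key], "edge"
    if key in shield.store:               # step 4: shield hit, fill the edge
        edge.store[key] = shield.store[key]
        return edge.store[key], "shield"
    body = origin(url)                    # step 5: both tiers missed
    shield.store[key] = body
    edge.store[key] = body
    return body, "origin"


# Usage: two edges share one shield, so the origin is contacted only once.
origin_calls = []

def origin(url: str) -> str:
    origin_calls.append(url)
    return "<html>home</html>"

tokyo, paris, shield = Tier("edge-tokyo"), Tier("edge-paris"), Tier("shield")
hdrs = {"Accept-Encoding": "gzip"}
results = [
    fetch("https://example.com/?utm_source=a", hdrs, tokyo, shield, origin)[1],
    fetch("https://example.com/?utm_source=b", hdrs, tokyo, shield, origin)[1],
    fetch("https://example.com/", hdrs, paris, shield, origin)[1],
]
print(results, len(origin_calls))
```

Because utm_source is not in the keyed parameters, the second request maps to the same key as the first and is answered by the edge; the third comes from a different edge and is answered by the shield without touching the origin.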

The cache-control language

The behavior of every CDN is governed by HTTP cache headers your origin sends. The most important ones:

The traditional "cache invalidation is hard" problem is mostly solved by content-hashed URLs: rather than invalidating /styles.css, you publish /styles-d4f9e7c2.css and update the HTML to reference it. The old version is still cached (and harmlessly so); nobody asks for it anymore.
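As a rough sketch of the content-hashed URL approach, a build step might derive the published filename from the file's bytes and pair it with a long-lived Cache-Control header. The function names hashed_name and cache_headers, the 8-character digest length, and the specific header values are illustrative assumptions rather than a prescribed setup.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Turn /styles.css into /styles-<hash>.css, with the hash derived from content."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


def cache_headers(is_hashed_asset: bool) -> dict:
    """Hashed assets never change, so cache them for a year; HTML must revalidate."""
    if is_hashed_asset:
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # no-cache allows storing but forces revalidation, so new HTML
    # (referencing new asset names) is picked up promptly.
    return {"Cache-Control": "no-cache"}


old = hashed_name("/styles.css", b"body { color: black; }")
new = hashed_name("/styles.css", b"body { color: navy; }")
print(old, new, cache_headers(True))
```

Any change to the file's bytes yields a different URL, so the edge treats the new version as a brand-new object and no purge is needed.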

The major providers, in plain English

The CDN market is concentrated at the top but increasingly competitive at the budget tier. The shortlist:

Trade-offs and gotchas

TLS visibility

The CDN sees your decrypted traffic because it terminates TLS; it has to, to make routing and caching decisions. That means you are trusting the CDN with whatever your users send through it: form posts, authenticated session tokens, file uploads. Most CDNs publish clear data-handling policies; pick one whose privacy posture you can defend to your users. For sensitive workloads, encrypt the payload at the application layer so the CDN only ever handles ciphertext. Features such as Cloudflare's Encrypted Client Hello or Fastly's per-tenant TLS keys do not change this: ECH hides the requested hostname from on-path observers, and per-tenant keys isolate customers from one another, but the CDN still decrypts the traffic.

Origin IP leakage

If your origin's real IP is ever exposed, attackers can bypass the CDN entirely and hit your origin directly. Common leaks include email headers, historical DNS records on tools like SecurityTrails, certificate transparency logs (which publish every hostname you've held a cert for, including any that resolve straight to the origin), and careless error pages. The fix is a firewall on the origin that only accepts traffic from the CDN's published IP ranges. Most CDNs publish those ranges as a downloadable list, often JSON; refresh the firewall rules on a schedule, for example from cron.
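The allowlist check itself is simple to express. The sketch below uses Python's standard ipaddress module; the ranges are placeholders taken from the reserved documentation blocks, not any real CDN's ranges, and the names HYPOTHETICAL_CDN_RANGES, parse_ranges and is_from_cdn are made up for the example. In practice the list would be loaded from the provider's published file and the rule enforced in the firewall rather than in application code.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder ranges (reserved documentation blocks), standing in for the
# list a CDN would publish. Replace with the provider's real ranges.
HYPOTHETICAL_CDN_RANGES = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]


def parse_ranges(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings into network objects; raises ValueError on bad input."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_from_cdn(client_ip: str, networks: Iterable[Network]) -> bool:
    """True if the connecting address falls inside any allowed CDN range."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr.version == net.version and addr in net for net in networks
    )


allowed = parse_ranges(HYPOTHETICAL_CDN_RANGES)
for ip in ("192.0.2.10", "203.0.113.5", "2001:db8::1"):
    print(ip, "accept" if is_from_cdn(ip, allowed) else "drop")
```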

Cache poisoning

When the cache key doesn't include a header that affects the response (forgetting Vary: Authorization, for example), one user's logged-in response can be served to another user. This is a real bug, not a hypothetical: it has appeared at major sites. Solution: explicit Vary headers, Cache-Control: private on per-user responses, careful keying of the cache, and synthetic monitoring that fetches as multiple identities and compares the results.
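To make the failure concrete, here is a toy cache whose key is the URL plus the value of every request header named in Vary. It is a simplified model for illustration only: vary_key and serve are hypothetical names, and a real shared cache applies further rules (for instance around requests that carry Authorization) that are not modeled here.

```python
from typing import Callable, Dict, Tuple


def vary_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Cache key = URL plus the value of each request header listed in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


def serve(cache: Dict[Tuple, str], url: str, request_headers: Dict[str, str],
          vary: str, render: Callable[[Dict[str, str]], str]) -> str:
    """Return the cached body for this key, rendering and storing it on a miss."""
    key = vary_key(url, request_headers, vary)
    if key not in cache:
        cache[key] = render(request_headers)
    return cache[key]


def render(headers: Dict[str, str]) -> str:
    return "account page for " + headers["Authorization"]


alice = {"Authorization": "Bearer alice"}
bob = {"Authorization": "Bearer bob"}

# Without Vary, both users share one key and Bob receives Alice's page.
leaky: Dict[Tuple, str] = {}
serve(leaky, "/account", alice, "", render)
leaked = serve(leaky, "/account", bob, "", render)

# With Vary: Authorization, each identity gets its own cache entry.
safe: Dict[Tuple, str] = {}
serve(safe, "/account", alice, "Authorization", render)
isolated = serve(safe, "/account", bob, "Authorization", render)

print(leaked)
print(isolated)
```

The synthetic monitoring mentioned above amounts to running this comparison against production: fetch the same URL as two identities and alert if the bodies match when they should not.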

When NOT to use a CDN

Some workloads don't benefit: long-running WebSocket connections (the cache adds nothing, the routing layer adds latency), uncacheable real-time APIs that hit the origin on every request anyway (you're just adding a hop), and trusted internal services where you control both ends and don't want a third party in the middle. For those, point at your origin directly or build your own routing layer.

How to tell if a site is using a CDN

  1. Run a DNS lookup on the site. If the A record points at a known CDN range (104.16.x.x, 199.232.x.x, 23.227.x.x), you have your answer.
  2. Fetch the page and inspect headers. cf-cache-status means Cloudflare, x-cache + x-served-by means Fastly, x-amz-cf-id means CloudFront, x-akamai-transformed means Akamai.
  3. WHOIS the IP. The org will read "Cloudflare, Inc.", "Fastly, Inc.", "Akamai Technologies", etc.
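The header check in step 2 is easy to automate. The sketch below encodes only the header signatures listed above; the names SIGNATURES and guess_cdn are invented for the example, and the result is a heuristic guess, since headers can be stripped, renamed, or shared between providers.

```python
from typing import Iterable, List, Tuple

# Header signatures from the list above. A provider matches only when
# every header in its tuple is present in the response.
SIGNATURES: List[Tuple[str, Tuple[str, ...]]] = [
    ("Cloudflare", ("cf-cache-status",)),
    ("Fastly", ("x-cache", "x-served-by")),
    ("CloudFront", ("x-amz-cf-id",)),
    ("Akamai", ("x-akamai-transformed",)),
]


def guess_cdn(header_names: Iterable[str]) -> List[str]:
    """Return the providers whose signature headers all appear (case-insensitive)."""
    present = {name.lower() for name in header_names}
    return [
        provider
        for provider, required in SIGNATURES
        if all(h in present for h in required)
    ]


print(guess_cdn({"CF-Cache-Status": "HIT", "Content-Type": "text/html"}))
print(guess_cdn({"X-Cache": "HIT", "X-Served-By": "cache-example"}))
print(guess_cdn({"Content-Type": "text/html"}))
```

Passing a dict works because iterating a dict yields its keys; with the standard library, the header names of a live response could be supplied via urllib.request.urlopen(url).headers.keys().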
