The problem CDNs were invented to solve
In the late 1990s, websites had one server (often in someone's garage or a single datacenter in Virginia) and increasingly distant users (East Asia and Europe were minutes-of-page-load away). Bandwidth was expensive, transcontinental latency was punishing, and an unexpected mention on Slashdot would flatten a small site for hours. Akamai, founded out of MIT in 1998, pioneered the model that became the industry: copy the content to thousands of servers spread around the world, and serve each user from the one geographically nearest.
Twenty-five years later that model has expanded enormously. A modern CDN is not just a cache — it terminates TLS, runs WAF rules, makes routing decisions, executes serverless code at the edge, transcodes images on the fly, and absorbs terabit-scale DDoS attacks. But the original premise still holds: get the response physically close to the user, and let your origin do as little work as possible.
What you actually get from a CDN
- Latency reduction. Static assets — JavaScript bundles, CSS, images, fonts — live in cache at hundreds of points of presence. A user in Tokyo fetches your site from a Tokyo edge in 5–15 ms instead of crossing the Pacific for 120–180 ms. The bigger the bundle, the bigger the win, because the time-to-first-byte improvement compounds with the throughput improvement.
- Origin shielding. Most requests never touch your origin server. Cloudflare reports that for the average site behind it, more than 70% of bytes are served from cache. Your backend can run on a tiny VPS while still surviving viral traffic — the cache absorbs the spike.
- TLS termination. The CDN does the TLS handshake; your origin can run plain HTTP over a private link. Lower CPU on origin, hundreds of regional locations doing the certificate work, automatic HTTP/3 and TLS 1.3 even if your origin only speaks HTTP/1.1.
- DDoS mitigation. Tier-1 CDNs absorb attacks measured in terabits per second. The Mirai botnet attacks of 2016 reached roughly 1 Tbps against OVH and a reported 1.2 Tbps against the DNS provider Dyn. No single origin can absorb that. Sitting behind a serious CDN means the cost of absorbing an attack falls on the CDN, not on you.
- WAF and bot management. Layer 7 filtering at the edge blocks SQL injection probes, credential-stuffing attempts, scraper traffic, and malicious user agents before they reach your server. Most CDNs ship a default WAF ruleset that targets the common OWASP Top Ten attack classes without configuration.
- Image and video optimization. Many CDNs transcode on the fly, sending AVIF or WebP to browsers that support it, adaptive-bitrate streams (such as HLS or DASH) to video clients, and resized variants of the same source image depending on the requested viewport. You upload one master file; the CDN serves the right derivative.
- Edge compute. Modern CDNs (Cloudflare Workers, Fastly Compute@Edge, AWS Lambda@Edge, Vercel Edge Functions, Netlify Edge Functions) let you run code at the edge — A/B tests, auth checks, redirects, even full server-rendered HTML. The distinction between "CDN" and "platform" is steadily blurring.
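To put rough numbers on the latency bullet above, here is a minimal back-of-the-envelope sketch. The function name estimate_fetch_ms, the default of three setup round trips, and the throughput figures are all assumptions chosen for illustration, not measurements from any CDN.

```python
def estimate_fetch_ms(rtt_ms: float, size_kb: float, throughput_mbps: float,
                      round_trips: int = 3) -> float:
    """Very rough model: setup round trips plus time to stream the body.

    round_trips=3 is an assumed stand-in for TCP + TLS + the request itself;
    TLS 1.3, HTTP/3, and connection reuse need fewer in practice.
    """
    setup_ms = rtt_ms * round_trips
    # Kilobits divided by megabits-per-second yields milliseconds
    # (taking 1 KB = 1000 bytes = 8 kilobits).
    transfer_ms = (size_kb * 8) / throughput_mbps
    return setup_ms + transfer_ms


# Assumed scenario: a 500 KB bundle fetched from a nearby edge versus a
# distant origin with a longer round trip and lower effective throughput.
edge = estimate_fetch_ms(rtt_ms=10, size_kb=500, throughput_mbps=50)
origin = estimate_fetch_ms(rtt_ms=150, size_kb=500, throughput_mbps=10)
print(f"edge: {edge:.0f} ms, origin: {origin:.0f} ms")
```

Under these assumed inputs both terms shrink at the edge: the setup cost falls with the round-trip time, and the transfer cost falls with the higher throughput, which is why larger bundles see a larger absolute saving.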
How a request actually flows through a CDN
- DNS resolution. Your domain's A record points at a CDN-owned IP, not your origin. DNS lookup tools show this: query www.shopify.com and you'll see a Fastly IP; query github.com and you'll see a Fastly IP again.
- Anycast routing. The CDN advertises that same IP from hundreds of BGP points-of-presence worldwide. The user's ISP routes them to whichever PoP is closest in BGP terms, which is usually but not always the geographically closest. See the BGP glossary entry for how this works.
- Cache lookup. The edge node hashes the request (URL + relevant query parameters + relevant headers like Accept-Encoding) and checks its local cache. A hit returns the cached response immediately, with a header like cf-cache-status: HIT or x-cache: HIT telling you which edge served you.
- Tier 2 / shield. If the edge missed, most modern CDNs send the request to a regional shield (a second-tier cache closer to your origin) instead of straight to origin. The shield catches a miss-from-edge before it becomes a hit on your origin server, dramatically reducing origin traffic.
- Origin fetch. If both levels of cache miss, the request finally reaches your origin. The CDN stores the response (according to your Cache-Control headers), serves it to the user, and serves subsequent users in that region from the cache.
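The edge, shield, and origin steps above can be sketched as a chain of caches. This is a toy model under stated assumptions, not how any particular CDN is implemented: the names CacheTier, Origin, and cache_key are hypothetical, the key uses only the URL and Accept-Encoding, and entries never expire.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Headers = Dict[str, str]
CacheKey = Tuple[str, str]
Fetcher = Callable[[str, Headers], Tuple[bytes, str]]


def cache_key(url: str, headers: Headers) -> CacheKey:
    """Key on the URL plus Accept-Encoding (the only header this sketch varies on)."""
    return (url, headers.get("Accept-Encoding", ""))


@dataclass
class CacheTier:
    """One cache layer; on a miss it asks `upstream` and stores the result."""
    name: str
    upstream: Fetcher
    store: Dict[CacheKey, bytes] = field(default_factory=dict)

    def fetch(self, url: str, headers: Headers) -> Tuple[bytes, str]:
        key = cache_key(url, headers)
        if key in self.store:
            return self.store[key], self.name   # hit at this tier
        body, served_by = self.upstream(url, headers)
        self.store[key] = body                  # fill the cache on the way back
        return body, served_by


class Origin:
    """Hypothetical origin that counts how often it is actually reached."""

    def __init__(self) -> None:
        self.hits = 0

    def fetch(self, url: str, headers: Headers) -> Tuple[bytes, str]:
        self.hits += 1
        return f"body of {url}".encode(), "origin"


origin = Origin()
shield = CacheTier("shield", origin.fetch)
tokyo = CacheTier("edge-tokyo", shield.fetch)
osaka = CacheTier("edge-osaka", shield.fetch)

request_headers = {"Accept-Encoding": "br"}
for edge in (tokyo, tokyo, osaka):
    _, served_by = edge.fetch("/app.js", request_headers)
    print(edge.name, "->", served_by)
print("origin hits:", origin.hits)
```

In this run the first Tokyo request falls through to the origin, the second is answered by the Tokyo edge, and the Osaka request is answered by the shield, so the origin is reached only once for three user requests.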
The cache-control language
The behavior of every CDN is governed by HTTP cache headers your origin sends. The most important ones:
- Cache-Control: max-age=31536000, immutable means cache for a year, and the client may skip revalidation entirely, even on reload. Used for content-hashed assets like main.a8f3c2.js.
- Cache-Control: public, s-maxage=300, stale-while-revalidate=86400 means the CDN caches for 5 minutes; after expiry, it serves the stale copy for up to 24 hours while asynchronously fetching a fresh one. Excellent for "mostly-stable" pages.
- Cache-Control: no-store means never cache, anywhere. Used for personalized HTML or anything containing auth state.
- Vary: Accept-Encoding, Accept-Language instructs the CDN to cache a different version for each value of these headers, so a French-speaking user and an English-speaking user don't get each other's localized HTML.
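As a rough illustration of how a shared cache might read these directives, the sketch below parses a Cache-Control value and classifies a stored response by its age. The function names and the returned labels are assumptions made for this example; real CDNs apply many more rules (heuristic freshness, must-revalidate, provider-specific overrides).

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split 'a=1, b, c=2' into {'a': '1', 'b': None, 'c': '2'}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_decision(cache_control: str, age_s: int) -> str:
    """Classify a response of the given age, from a shared cache's viewpoint."""
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return "bypass"
    # s-maxage applies to shared caches and takes precedence over max-age.
    ttl_text = d.get("s-maxage") or d.get("max-age")
    if ttl_text is None:
        # Simplification for this sketch: no explicit lifetime, no caching.
        return "bypass"
    ttl = int(ttl_text)
    swr = int(d.get("stale-while-revalidate") or 0)
    if age_s < ttl:
        return "fresh"
    if age_s < ttl + swr:
        return "stale-while-revalidate"
    return "expired"


header = "public, s-maxage=300, stale-while-revalidate=86400"
for age in (100, 600, 90_000):
    print(age, shared_cache_decision(header, age))
```

With the header from the second bullet, an age of 100 seconds is still fresh, 600 seconds falls inside the stale-while-revalidate window, and 90,000 seconds is past both windows.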
The traditional "cache invalidation is hard" problem is mostly solved by content-hashed URLs: rather than invalidating /styles.css, you publish /styles-d4f9e7c2.css and update the HTML to reference it. The old version is still cached (and harmlessly so); nobody asks for it anymore.
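A minimal sketch of the content-hashing idea follows. Build tools normally do this for you; the helper name hashed_name, the choice of SHA-256, and the 8-character digest length are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Return the path with a short content digest inserted before the suffix."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


old = hashed_name("/styles.css", b"body { color: black; }")
new = hashed_name("/styles.css", b"body { color: navy; }")
print(old)
print(new)
```

Because the name is derived from the bytes, any change to the file produces a new URL, and unchanged files keep their existing URL and stay cached.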
The major providers, in plain English
The CDN market is concentrated at the top but increasingly competitive at the budget tier. The shortlist:
- Cloudflare — 300+ cities, generous free tier (yes, including HTTP/3 and TLS 1.3), broad feature set (Workers, R2 object storage, D1 SQL at the edge, Zero Trust, DNS, Pages). Best default for most projects, especially those starting small. See AS13335 on the IPFerret ASN explorer.
- Akamai — the originator (1998), tremendous global reach including many ISPs as embedded caching partners, enterprise pricing, complex configuration. Where you go when you have a CDN budget and a procurement department.
- Fastly — VCL-based configuration (very expressive — you can ship complex routing logic directly to the edge), strong developer community, loved by content sites (NYT, BuzzFeed, GitHub). Higher per-request cost but lower origin-egress cost.
- AWS CloudFront — tight integration with the rest of AWS, the obvious pick if your origin is already an S3 bucket or an EC2 instance. Less compelling as a free-standing product.
- Bunny.net — much smaller footprint than Cloudflare, much lower per-GB pricing (single-digit dollars per terabyte vs. Cloudflare R2's competitive but higher rate), great for video hosting and high-egress static delivery.
- Vercel, Netlify, Render — platform-CDN hybrids; the CDN is invisible because the platform provides the whole stack. Easiest path if your application is a modern Next.js / Astro / Remix / SvelteKit site.
Trade-offs and gotchas
TLS visibility
The CDN sees your decrypted traffic by definition, because it has to in order to make routing and caching decisions. That means you are trusting the CDN with whatever your users send through it: form posts, authenticated session tokens, file uploads. Most CDNs publish clear data-handling policies; pick one whose privacy posture you can defend to your users. For sensitive workloads, look at application-layer end-to-end encryption that the CDN can't decrypt. Features such as Cloudflare's Encrypted Client Hello and Fastly's per-tenant TLS keys protect handshake metadata and key material; they do not hide request content from the CDN itself.
Origin IP leakage
If your origin's real IP is ever exposed — through email headers, historical DNS records on tools like SecurityTrails, certificate transparency logs publishing every domain you've held a cert for, or a careless error page — attackers can bypass the CDN entirely and hit your origin directly. The fix is a firewall on the origin that only accepts traffic from the CDN's published IP ranges. Most CDNs publish those ranges as a downloadable JSON; rotate the firewall rules from cron.
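The allowlist check itself is simple to sketch with Python's standard ipaddress module. The CIDR ranges below are documentation-only placeholders, not any provider's real ranges; in practice you would load the list your CDN publishes and refresh it on a schedule.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder ranges reserved for documentation (assumption: replace these
# with the ranges downloaded from your CDN's published list).
EXAMPLE_CDN_RANGES = ["203.0.113.0/24", "2001:db8::/32"]


def load_networks(cidrs: Iterable[str]) -> List[Network]:
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_from_cdn(client_ip: str, networks: List[Network]) -> bool:
    """True if the connecting address falls inside any allowed range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


allowed = load_networks(EXAMPLE_CDN_RANGES)
for ip in ("203.0.113.7", "198.51.100.1", "2001:db8::1"):
    print(ip, "allow" if is_from_cdn(ip, allowed) else "drop")
```

This kind of check belongs in the firewall or load balancer in front of the origin rather than in application code, but the membership logic is the same.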
Cache poisoning
When the cache key doesn't include a header that affects the response (forgetting Vary: Authorization, for example), one user's logged-in response can be served to another user. This is a real bug, not a hypothetical: it has appeared at major sites. The fix is explicit Vary headers, careful keying of the cache, and synthetic monitoring that fetches as multiple identities and compares the responses.
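The failure mode is easy to reproduce in miniature. The toy cache below is a sketch, with the hypothetical names ToyCache and account_origin; it compares a cache keyed on the URL alone with one that also keys on the Authorization header.

```python
from typing import Callable, Dict, Iterable, Tuple

Headers = Dict[str, str]


class ToyCache:
    """Caches responses keyed on the URL plus the headers named in vary_on."""

    def __init__(self, vary_on: Iterable[str] = ()) -> None:
        self.vary_on = tuple(h.lower() for h in vary_on)
        self.store: Dict[Tuple[str, ...], str] = {}

    def _key(self, url: str, headers: Headers) -> Tuple[str, ...]:
        lowered = {k.lower(): v for k, v in headers.items()}
        return (url,) + tuple(lowered.get(h, "") for h in self.vary_on)

    def get_or_fetch(self, url: str, headers: Headers,
                     origin: Callable[[str, Headers], str]) -> str:
        key = self._key(url, headers)
        if key not in self.store:
            self.store[key] = origin(url, headers)
        return self.store[key]


def account_origin(url: str, headers: Headers) -> str:
    """Hypothetical origin whose response depends on who is asking."""
    return f"Hello, {headers.get('Authorization', 'anonymous')}"


alice = {"Authorization": "alice-token"}
bob = {"Authorization": "bob-token"}

naive = ToyCache()                          # key ignores Authorization
naive.get_or_fetch("/account", alice, account_origin)
print("naive:", naive.get_or_fetch("/account", bob, account_origin))

keyed = ToyCache(vary_on=["Authorization"])  # key includes Authorization
keyed.get_or_fetch("/account", alice, account_origin)
print("keyed:", keyed.get_or_fetch("/account", bob, account_origin))
```

The naive cache hands Bob the response generated for Alice, while the keyed cache stores a separate entry per identity. For genuinely personal pages, not caching at all (no-store) is often the simpler choice.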
When NOT to use a CDN
Some workloads don't benefit: long-running WebSocket connections (the cache adds nothing, the routing layer adds latency), uncacheable real-time APIs that hit the origin on every request anyway (you're just adding a hop), and trusted internal services where you control both ends and don't want a third party in the middle. For those, point at your origin directly or build your own routing layer.
How to tell if a site is using a CDN
- Run a DNS lookup on the site. If the A record points at a known CDN range (104.16.x.x, 199.232.x.x, 23.227.x.x), you have your answer.
- Fetch the page and inspect headers. cf-cache-status means Cloudflare, x-cache + x-served-by means Fastly, x-amz-cf-id means CloudFront, x-akamai-transformed means Akamai.
- WHOIS the IP. The org will read "Cloudflare, Inc.", "Fastly, Inc.", "Akamai Technologies", etc.
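The header check can be automated with a few lines of standard-library Python. This is a heuristic sketch: the header-to-provider table simply encodes the hints listed above, the names guess_cdn and fetch_headers are hypothetical, and providers can add, rename, or strip these headers at any time.

```python
import urllib.request
from typing import Dict, Iterable, Optional

# Hints taken from the list above; every header in a tuple must be present.
SIGNATURES = [
    ("Cloudflare", ("cf-cache-status",)),
    ("Fastly", ("x-cache", "x-served-by")),
    ("CloudFront", ("x-amz-cf-id",)),
    ("Akamai", ("x-akamai-transformed",)),
]


def guess_cdn(header_names: Iterable[str]) -> Optional[str]:
    """Return the first provider whose signature headers are all present."""
    present = {name.lower() for name in header_names}
    for provider, required in SIGNATURES:
        if all(h in present for h in required):
            return provider
    return None


def fetch_headers(url: str) -> Dict[str, str]:
    """Fetch response headers with a HEAD request (needs network access)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return dict(resp.headers.items())


# Offline example using a hand-written header set:
print(guess_cdn({"Content-Type": "text/html", "CF-Cache-Status": "HIT"}))
```

To check a live site, pass the result of fetch_headers for its URL to guess_cdn; a result of None only means none of these particular hints matched, not that no CDN is in use.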
Related reading
- Cloudflare's ASN page — see prefixes, country, and cloud-provider detection.
- TLS / HTTPS explained — the handshake the CDN terminates on your behalf.
- How traceroute works — what you see when you traceroute through a CDN's anycast network.
- BGP — the routing protocol that makes anycast possible.
