Rate limiting by IP — the right way

Rate-limiting by IP looks like a one-liner, and it is, until you ship it. Then CGNAT pools, mobile carriers, IPv6, corporate proxies, and the X-Forwarded-For header turn it into one of the most common sources of "why-am-I-blocked" support tickets.

Why IP rate-limiting is your first defense

When a request hits your service, the IP address is the one identifier you always have before any authentication or session exists. That makes it the obvious axis for rate-limiting anonymous traffic. Scrapers, credential stuffers, spam form submissions, exploratory vulnerability probes, and small-scale unauthenticated floods often hammer your endpoints from one IP or a handful of them. Limiting requests per IP per unit of time is the cheapest way to bound the damage.

The same simplicity that makes IP rate limiting attractive is also what makes the edge cases bite. IP is not a user identifier — it's a network identifier. Many users share one IP, one user has many IPs, and the IP you see in your handler may not be the IP the request actually came from. This article walks through the production considerations.

Pick the right algorithm

Token bucket — your default

Each IP has a bucket holding up to N tokens. Each request consumes one. Tokens are refilled at rate R per second up to the cap N. If the bucket is empty when a request arrives, it's rate-limited.

Properties: allows bursts of up to N requests at once while enforcing an average rate of R per second. It fits in a short Redis Lua script. The script keeps each IP's token count and last-refill time in a hash (HMGET/HSET) and sets an EXPIRE so idle buckets disappear. Many public API rate limits are token buckets at their core; Stripe, for one, has written about building its limiter this way.
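To make the refill arithmetic concrete, here is a minimal single-process sketch in Python. The class names TokenBucket and PerIPLimiter are hypothetical. The clock can be passed in so the behavior is testable. A real deployment would keep this state in shared storage (Redis or similar) and evict idle buckets.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Holds up to `capacity` (N) tokens, refilled continuously at `rate` (R) tokens/second."""

    def __init__(self, capacity: float, rate: float, now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so a new client may burst up to N at once
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token and return True, or return False if the request is rate-limited."""
        now = time.monotonic() if now is None else now
        # Refill for the time elapsed since the last call, never beyond the cap N.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIPLimiter:
    """One bucket per client IP, created on first use (this sketch never evicts buckets)."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[ip] = bucket
        return bucket.allow(now)
```

Take capacity=5 and rate=1.0 as an example. An IP can send five requests at once, and the sixth is refused until a second has passed and one token has been refilled. Other IPs are unaffected because each has its own bucket.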

Sliding window — when exact counts matter

Store a timestamped list of requests in the last window. Count entries newer than now - window; reject if the count exceeds the limit. Memory cost is O(requests-in-window) per IP; CPU cost is dominated by trimming the list. Use this when you need precise compliance (e.g., "exactly 100 requests per minute, no more") rather than a smoothed approximation.
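A sketch of the log-based sliding window follows, again with a hypothetical class name (SlidingWindowLog) and caller-supplied timestamps in seconds. A deque keeps timestamps oldest-first, so trimming only touches the left end. Rejected requests are not logged here. That is a design choice: logging them as well would keep a client that never stops hammering locked out indefinitely.

```python
from collections import deque
from typing import Deque, Dict


class SlidingWindowLog:
    """Exact limiter: at most `limit` requests in any `window`-second span, per key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.logs: Dict[str, Deque[float]] = {}

    def allow(self, key: str, now: float) -> bool:
        log = self.logs.setdefault(key, deque())
        # Drop timestamps outside the window (now - window, now]; the oldest are on the left.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False  # admitting this request would exceed the limit
        log.append(now)
        return True
```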

Sliding window counter — the practical compromise

Keep two fixed-window counters (this window and the previous one). Estimate the sliding count as the current count plus the previous count, weighted by the fraction of the previous window that still overlaps the sliding window. Cheaper than the log-based sliding window, more accurate than the naive fixed window. A common default for high-throughput limiters.
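Here is a minimal sketch of that weighting, assuming a hypothetical SlidingWindowCounter class that stores, per key, the fixed-window index and two counts. With a 60-second window, suppose a request arrives 30 seconds into the current window. The previous window then contributes half of its count.

```python
import math
from typing import Dict, Tuple


class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters per key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        # key -> (index of the current fixed window, its count, previous window's count)
        self.state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, key: str, now: float) -> bool:
        index = math.floor(now / self.window)  # which fixed window `now` falls in
        stored, current, previous = self.state.get(key, (index, 0, 0))
        if index != stored:
            # Moved to a new fixed window: the old current count becomes "previous",
            # unless a whole window or more passed with no traffic, in which case it is 0.
            previous = current if index == stored + 1 else 0
            current = 0
        # Fraction of the current fixed window that has elapsed; the remaining
        # (1 - fraction) of the previous window still overlaps the sliding window.
        fraction = (now - index * self.window) / self.window
        estimated = previous * (1 - fraction) + current
        if estimated >= self.limit:
            self.state[key] = (index, current, previous)
            return False
        self.state[key] = (index, current + 1, previous)
        return True
```

Take a limit of 10 per 60 seconds, with 10 requests at t=50 s. At t=90 s the previous window still counts as 10 × 0.5 = 5, so only 5 more requests are admitted. A naive fixed window would allow 10.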

Leaky bucket — for output smoothing

Requests enter a queue; they leave at a constant rate. Excess requests overflow the bucket and are dropped. Less useful for inbound rate limiting (you usually want to allow bursts) but valuable for outbound smoothing — e.g., capping your application's calls to a third-party API at exactly N/sec to stay inside their rate limit.
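For outbound smoothing, a leaky bucket can be sketched as a scheduler. Each call gets a departure time at least 1/rate seconds after the previous one. A call is dropped when capacity calls are already waiting. LeakyBucket and schedule are hypothetical names. The caller is assumed to sleep until the returned time before calling the third-party API.

```python
from typing import List, Optional


class LeakyBucket:
    """Queue drained at a constant `rate` per second; at most `capacity` requests may wait."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.interval = 1.0 / rate      # seconds between consecutive departures
        self.capacity = capacity
        self.next_free = float("-inf")  # earliest time the next request may leave
        self.waiting: List[float] = []  # departure times that are still in the future

    def schedule(self, now: float) -> Optional[float]:
        """Return when this request may be sent, or None if the bucket overflows."""
        # Requests whose departure time has passed have already leaked out.
        self.waiting = [t for t in self.waiting if t > now]
        if len(self.waiting) >= self.capacity:
            return None  # overflow: drop the request
        depart = max(now, self.next_free)
        self.next_free = depart + self.interval
        if depart > now:
            self.waiting.append(depart)  # only requests that must wait occupy the queue
        return depart
```

With rate=2.0 and capacity=3, suppose five calls arrive together. They are scheduled at 0.0, 0.5, 1.0 and 1.5 seconds, and the fifth is dropped because three are already waiting.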

Fixed window — what to avoid

The naive approach: count all requests this minute and reset the counter on the boundary. The problem is that an attacker can use the full quota just before the boundary and again just after. That puts 2× the per-window limit through within a second or two. Easy to implement, easy to exploit. Don't ship this in production for anything you care about.
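The boundary exploit is easy to reproduce. This sketch uses a hypothetical FixedWindow class with a limit of 100 requests per minute. It sends one burst at 59.5 s and another at 60.5 s.

```python
import math
from typing import Dict, Tuple


class FixedWindow:
    """Naive fixed-window counter: `limit` requests per window, reset on the boundary."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.counts: Dict[str, Tuple[int, int]] = {}  # key -> (window index, count)

    def allow(self, key: str, now: float) -> bool:
        index = math.floor(now / self.window)
        stored, count = self.counts.get(key, (index, 0))
        if stored != index:
            count = 0  # new window: the counter resets regardless of recent traffic
        if count >= self.limit:
            self.counts[key] = (index, count)
            return False
        self.counts[key] = (index, count + 1)
        return True


limiter = FixedWindow(limit=100, window=60.0)
accepted = sum(limiter.allow("attacker", 59.5) for _ in range(100))
accepted += sum(limiter.allow("attacker", 60.5) for _ in range(100))
print(accepted)  # 200: twice the per-minute limit, admitted within one second
```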

Get the IP right — the X-Forwarded-For trap

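If your application sits behind a CDN, load balancer, or reverse proxy, the peer IP your handler sees is the proxy's, not the client's. The real client IP arrives in a header the proxy sets. Usually that is X-Forwarded-For, a comma-separated list to which each proxy appends the address it received the request from. It may instead be a vendor-specific header such as Cloudflare's CF-Connecting-IP. Because clients can send their own X-Forwarded-For, its leftmost entries are not trustworthy.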

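The right pattern: configure your framework's trusted-proxy list explicitly. Trust the proxy's headers only when the request came from a known proxy IP; for requests from anywhere else, ignore the header and use the peer-socket IP. When you do read X-Forwarded-For, walk it from the right and take the first address that isn't one of your own proxies. Everything further left was supplied by the client, or by proxies you don't control, and can be forged. Express (the trust proxy setting), Rails (config.action_dispatch.trusted_proxies), and Uvicorn (--forwarded-allow-ips, the server FastAPI usually runs on) expose this configuration. Django has no built-in equivalent, so handle it in middleware or at the proxy. IPFerret's request headers tool shows exactly what headers a given request carried, which is invaluable when debugging a flaky setup.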
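The trusted-proxy rule can be sketched with Python's standard ipaddress module. TRUSTED_PROXIES and its ranges are hypothetical placeholders for your own load balancer or CDN ranges. The sketch assumes X-Forwarded-For entries are bare IP addresses without ports.

```python
import ipaddress
from typing import List, Optional, Sequence, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical trusted-proxy ranges: replace with your own load balancer / CDN ranges.
TRUSTED_PROXIES: List[Network] = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_trusted(ip: str, trusted: Sequence[Network] = TRUSTED_PROXIES) -> bool:
    """True if `ip` parses and falls inside one of the trusted proxy ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in trusted)


def client_ip(peer_ip: str, x_forwarded_for: Optional[str]) -> str:
    """Pick the IP to rate-limit on, honoring X-Forwarded-For only from trusted proxies."""
    if not x_forwarded_for or not is_trusted(peer_ip):
        # Not behind a known proxy: the header could be forged, so use the socket peer.
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    candidate = peer_ip
    # Walk right to left: each trusted proxy appended the address it received from.
    # The first untrusted entry is the client; anything left of it is client-supplied.
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop and keep the last valid hop
        candidate = hop
        if not is_trusted(hop):
            return hop
    return candidate
```

A client that sends "X-Forwarded-For: 6.6.6.6" through a trusted proxy gets its real address picked. The proxy turns the header into "6.6.6.6, <real client IP>", and only the rightmost untrusted entry is used.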

The shared-IP problem

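Per-IP rate limiting assumes IP-to-user is roughly one-to-one. In reality, CGNAT pools, mobile carriers, and corporate proxies put many users behind one IP. Meanwhile, a single user moving between Wi-Fi, mobile data, and a VPN shows up from many IPs.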

A naive limit of 60 requests per minute per IP, applied to an IP shared by 50 users, gives each user about 1 request per minute. That is few enough to break the application for everyone. The defensive moves:

  1. Set the per-IP limit at a level that tolerates moderate sharing. Multiply a single-user limit by 3–5× to leave room for shared egress. If a real user makes 30 requests per minute at peak, 90–150 per IP is sane.
  2. Add a tighter per-session/per-account limit on top. The IP limit stops broad abuse; the session limit catches the specific abusive user without hurting their neighbors on the same IP.
  3. Use connection-type metadata. If your geo-IP provider tags the IP as "datacenter" or "mobile," tune the limits per category. Datacenter IPs sending consumer-style traffic deserve stricter limits; mobile-carrier IPs deserve more generous ones because they're heavily shared.
  4. Distinguish well-known shared infrastructure. Some IPs you should rate-limit differently from defaults — known Tor exits, known VPN ranges, large corporate NAT IPs. Most providers (MaxMind GeoIP2, IPinfo) sell this as a feature.
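The sizing arithmetic in item 1 and the per-category tuning in item 3 can be sketched as one function. CATEGORY_FACTOR and its values are hypothetical placeholders. The text only says datacenter IPs deserve stricter limits and mobile-carrier IPs more generous ones. The category label is assumed to come from your geo-IP provider.

```python
# Hypothetical multipliers: stricter for datacenter IPs, looser for heavily shared mobile IPs.
CATEGORY_FACTOR = {
    "datacenter": 0.5,
    "residential": 1.0,
    "mobile": 2.0,
}


def per_ip_limit(single_user_peak: int, sharing_factor: float, category: str) -> int:
    """Requests per minute to allow per IP, given one real user's peak rate."""
    # Leave room for several users behind one shared egress IP.
    base = single_user_peak * sharing_factor
    # Adjust by connection type; unknown categories keep the base limit.
    return round(base * CATEGORY_FACTOR.get(category, 1.0))
```

Here per_ip_limit(30, 3, "residential") returns 90 and per_ip_limit(30, 5, "residential") returns 150. The same inputs for a mobile-carrier IP return 300.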

IPv6 changes the calculus

In IPv6, a single subnet is a /64, and each customer gets at least one. Home and business lines are often delegated a /56 or /48. Every device on a subnet picks its own /128 address inside the /64. If you rate limit by /128, an attacker can rotate addresses trivially within their allocation. If you rate limit by /32, you risk treating an entire ISP as one client.

The conventional answer is to rate-limit IPv6 by /64, one bucket per subnet, which gives roughly the same granularity as per-customer IPv4 rate limiting. Some operators go further and also limit by /48 for abuse spread across entire allocation blocks, with a separate /64 limit on top for individual networks.
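A minimal sketch of the key normalization, using Python's ipaddress module. The function name rate_limit_key is hypothetical. IPv4-mapped IPv6 addresses (::ffff:a.b.c.d, common on dual-stack sockets) are treated as the IPv4 client they represent.

```python
import ipaddress


def rate_limit_key(ip: str, v6_prefix: int = 64) -> str:
    """Bucket key: the full address for IPv4, the enclosing /v6_prefix network for IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        if addr.ipv4_mapped is not None:
            # ::ffff:a.b.c.d is really an IPv4 client, so key it like one.
            return str(addr.ipv4_mapped)
        # strict=False lets us pass a host address and get its enclosing network.
        return str(ipaddress.ip_network(f"{addr}/{v6_prefix}", strict=False))
    return str(addr)
```

Every address in 2001:db8:1:2::/64 maps to the key "2001:db8:1:2::/64". For the layered /48-plus-/64 scheme, call it twice, once with v6_prefix=48 and once with the default, and check both buckets.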

What to do when the limit fires

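Returning HTTP 429 Too Many Requests is the obvious answer. Make it informative. Send a Retry-After header with the number of seconds until the client can try again, and a short body saying which limit was hit.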
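When the limiter is a token bucket, the Retry-After value follows from its state. The wait is the time needed to refill one whole token, rounded up because Retry-After takes whole seconds. The sketch below returns a framework-neutral (status, headers, body) tuple. The function name and the JSON body fields are hypothetical.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(tokens: float, refill_rate: float, limit_name: str) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a request rejected by a token bucket.

    `tokens` is the bucket's current fractional token count (below 1 when rejecting),
    `refill_rate` is R in tokens per second, and `limit_name` says which limit fired.
    """
    # Seconds until one whole token is available, rounded up to at least 1.
    wait = max(1, math.ceil((1 - tokens) / refill_rate))
    headers = {"Retry-After": str(wait), "Content-Type": "application/json"}
    body = json.dumps({"error": "rate_limited", "limit": limit_name, "retry_after": wait})
    return 429, headers, body
```

An empty bucket refilling at 0.5 tokens per second yields "Retry-After: 2".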

Combine signals — never rely on IP alone

The strongest rate-limiting systems use IP as one signal among several: per-IP, per-account, per-device-fingerprint, and behavioral limits.

A token-bucket per IP catches volumetric abuse; per-account catches account abuse; per-fingerprint catches account-rotation abuse; behavioral catches the sophisticated attacks that successfully rotate everything else. The layers compound.
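The counting layers can be sketched as one limiter that checks a token bucket per dimension and admits a request only if every layer has capacity. Dimension names such as "ip", "account" or "fingerprint", the class names, and the limits are hypothetical. Behavioral detection is not a simple counter and is out of scope here. All layers are checked before any is charged, so a rejection by the account layer does not drain the shared IP bucket.

```python
from typing import Dict, List, Tuple


class Bucket:
    """Minimal token bucket (capacity N, refill R/second) with separate refill and consume."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity, self.rate = capacity, rate
        self.tokens, self.updated = capacity, now

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


class LayeredLimiter:
    """Admit a request only if every dimension (IP, account, fingerprint...) has a token."""

    def __init__(self, limits: Dict[str, Tuple[float, float]]) -> None:
        self.limits = limits  # dimension -> (capacity, refill rate per second)
        self.buckets: Dict[Tuple[str, str], Bucket] = {}

    def allow(self, keys: Dict[str, str], now: float) -> bool:
        selected: List[Bucket] = []
        for dimension, key in keys.items():
            bucket = self.buckets.get((dimension, key))
            if bucket is None:
                capacity, rate = self.limits[dimension]
                bucket = Bucket(capacity, rate, now)
                self.buckets[(dimension, key)] = bucket
            bucket.refill(now)
            selected.append(bucket)
        # Check every layer first; charge them only if all of them have a token.
        if any(b.tokens < 1 for b in selected):
            return False
        for b in selected:
            b.tokens -= 1
        return True
```

Suppose the IP limit is generous and the per-account limit tight. An abusive account on a shared IP is cut off after its own quota, while a neighbor's account on the same IP keeps working.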

Where to put the limiter

Three common architectural placements, each with trade-offs: at the edge (CDN or reverse proxy), at the API gateway, and in the application itself.

Most production systems use a layered combination: edge limits for the obvious abuse, gateway limits per tier, application limits for the per-feature business rules.
