What is Rate Limiting?
Rate limiting is a technique used by APIs and web servers to control the number of requests a client can make within a given time window. It protects server resources, prevents abuse, and ensures fair access for all users.
Why APIs use rate limiting
Without rate limiting, a single client could flood an API with thousands of requests per second, consuming all available server resources and degrading performance for every other user. Rate limits act as a safety valve: they cap usage per client so the service stays fast and reliable for everyone.
Rate limiting also prevents credential stuffing attacks, brute-force login attempts, and scraping abuse. For API providers, it's a way to enforce usage tiers — free plans get lower limits, paid plans get higher limits.
Common rate limiting algorithms
Fixed Window
Count requests in fixed time intervals (e.g., per minute). Simple to implement, but it allows bursts at window boundaries: with a limit of 100 requests per minute, a client could send 100 requests at 0:59 and 100 more at 1:00, squeezing 200 requests into two seconds.
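A fixed-window counter takes only a few lines. This Python sketch is illustrative (class and parameter names are ours, not from any particular library):

```python
import time

class FixedWindowLimiter:
    """Allow up to `limit` requests per fixed `window`-second interval."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.current_window = None  # index of the window we are counting in
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Note how the counter resets the instant a new window begins, which is exactly what permits the boundary burst described above.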
Sliding Window
Tracks requests over a rolling time window, smoothing out the burst problem of fixed windows. More accurate, but slightly more complex to implement. Many production APIs use a variant of this approach.
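One common implementation is a sliding-window log that keeps the timestamps of recent requests. A minimal Python sketch (names are illustrative):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow up to `limit` requests in any rolling `window`-second period."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.timestamps = deque()  # times of requests still inside the window

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Because the window rolls with each request, a burst at 0:59 still counts against requests made at 1:00, unlike the fixed-window version.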
Token Bucket
Clients spend "tokens" for each request, and tokens refill at a steady rate. This allows short bursts while enforcing an average rate over time. AWS, Stripe, and many other major APIs use token buckets.
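A minimal token-bucket sketch in Python (parameter names are illustrative, not any particular provider's implementation):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request spends one."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = None

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.last is not None:
            # Refill tokens for the time elapsed since the last check.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The `capacity` sets the maximum burst size, while `rate` enforces the long-run average.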
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. This ensures a perfectly smooth output rate regardless of input patterns.
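The queue-and-drain behavior can be sketched as two operations: admit a request if the bucket has room, and drain one request per tick of a fixed-rate timer (the timer itself is omitted here; names are illustrative):

```python
from collections import deque

class LeakyBucket:
    """Queue up to `capacity` requests; drain them at a fixed rate."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def offer(self, request):
        # Drop the request if the bucket is already full.
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain_one(self):
        # Called by a processor on a fixed timer (e.g., every 1/rate seconds).
        return self.queue.popleft() if self.queue else None
```

Because `drain_one` runs on a fixed schedule, the output rate is constant no matter how bursty the input is.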
How to handle rate limits
Respect Retry-After headers
When you receive a 429 response, check the Retry-After header. It tells you how long to wait before retrying, either as a number of seconds or as an HTTP date. Don't guess; use the value the server provides.
Implement exponential backoff
If no Retry-After header is present, retry with increasing delays: 1s, 2s, 4s, 8s. Add random jitter (e.g., ±500ms) to prevent thundering herd problems when multiple clients retry simultaneously.
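The two rules above (honor Retry-After when present, otherwise back off exponentially with jitter) can be sketched in Python. The `send` callable is a hypothetical stand-in for your HTTP client:

```python
import random
import time

def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=0.5):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Respect the server's explicit instruction.
        return float(retry_after)
    # Exponential backoff: 1s, 2s, 4s, 8s, ... capped at `cap`,
    # plus random jitter to avoid thundering-herd retries.
    return min(cap, base * (2 ** attempt)) + random.uniform(-jitter, jitter)

def request_with_retry(send, max_attempts=5):
    """`send` is any callable returning (status_code, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        time.sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

This sketch only handles numeric Retry-After values; a production client would also parse the HTTP-date form.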
Use request queues
Instead of firing requests as fast as possible, queue them and process at a controlled rate. Libraries like Bottleneck (Node.js) or ratelimit (Python) make this easy.
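If pulling in a library isn't an option, a paced queue takes only a few lines. This Python sketch runs queued jobs no faster than a fixed rate (names are illustrative):

```python
import time
from collections import deque

class PacedQueue:
    """Run queued jobs no faster than `rate` jobs per second."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.jobs = deque()
        self.next_at = 0.0  # earliest time the next job may start

    def submit(self, job):
        self.jobs.append(job)

    def run(self):
        results = []
        while self.jobs:
            # Sleep until the pacing interval has elapsed.
            wait = self.next_at - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            self.next_at = time.monotonic() + self.interval
            results.append(self.jobs.popleft()())
        return results
```

Real libraries like Bottleneck add concurrency limits and priorities on top of this basic pacing idea.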
Cache responses
If you're fetching the same data repeatedly, cache it locally. This reduces total API calls and keeps you well under rate limits. Many scraping workflows benefit from aggressive caching.
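A small time-to-live (TTL) cache is often all you need. A Python sketch (the 300-second default is an arbitrary example):

```python
import time

class TTLCache:
    """Cache responses for `ttl` seconds to avoid repeated API calls."""

    def __init__(self, ttl=300.0):
        self.ttl = ttl
        self.store = {}  # key -> (value, expiry time)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if now >= expires:
            # Entry has expired; evict it and report a miss.
            del self.store[key]
            return None
        return value

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (value, now + self.ttl)
```

Check the cache before each API call and only hit the network on a miss; every cache hit is one request that never counts against your limit.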
Monitor your usage
Track your API call counts against your rate limits. Most APIs include X-RateLimit-Remaining and X-RateLimit-Reset headers so you can see how close you are to the limit in real time.
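Reading those headers is straightforward. Note that header names vary between providers, so the two parsed below are a widely used convention rather than a formal standard:

```python
def parse_rate_limit_headers(headers):
    """Extract remaining quota and reset time from common X-RateLimit headers.

    Returns (remaining, reset) as ints, or None for any header that is absent.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")  # typically a Unix timestamp
    return (
        int(remaining) if remaining is not None else None,
        int(reset) if reset is not None else None,
    )
```

When `remaining` gets close to zero, slow your request rate or pause until the reset time instead of waiting for a 429.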
Frequently asked questions
What is rate limiting?
Rate limiting is a technique used by APIs and web servers to control how many requests a client can make within a given time window. For example, an API might allow 100 requests per minute per API key. Requests beyond that limit are rejected with a 429 status code.
Why do APIs use rate limiting?
APIs use rate limiting to prevent abuse, protect server resources, ensure fair usage across all clients, and maintain service stability. Without rate limits, a single client could overwhelm the server with requests and degrade performance for everyone.
What does HTTP 429 mean?
HTTP 429 "Too Many Requests" is the standard response code when a client exceeds the rate limit. The response usually includes a Retry-After header indicating how many seconds to wait before making another request.
How should I handle rate limit errors?
Implement exponential backoff: when you receive a 429 response, wait and retry with increasing delays (1s, 2s, 4s, etc.). Also check for Retry-After headers, use request queues to stay under limits, and cache responses to reduce unnecessary requests.
What is the difference between rate limiting and throttling?
Rate limiting rejects requests that exceed the limit (returning 429 errors). Throttling slows down request processing instead of rejecting them — requests are queued and processed at a controlled rate. Some systems use both.
Does SnapRender rate limit requests?
SnapRender applies per-key rate limits based on your plan. Free tier allows 5 concurrent requests; paid plans allow 10-50 concurrent requests. The API returns clear 429 responses with Retry-After headers so your code can handle limits gracefully.
Learn more
What is Web Scraping?
The automated process of extracting data from websites, including handling rate limits.
What is a Proxy Server?
How proxy servers help distribute requests and manage rate limits when scraping.
Scraping API
SnapRender handles rate limits, retries, and queuing for you. Just send a URL.
Cloudflare Bypass
How SnapRender handles Cloudflare-protected sites that aggressively rate limit scrapers.
Stop fighting rate limits.
SnapRender manages retries, queuing, and concurrency for you. Start free.
Start Free — 100 requests/month