How to Scrape Cloudflare-Protected Sites in 2026
Cloudflare now protects over 20% of the web. If you've tried scraping a Cloudflare-backed site, you've hit the wall: 403 errors, endless CAPTCHA loops, or JavaScript challenges that never resolve. Here's exactly how to get past them.
Why Cloudflare blocks scrapers
Cloudflare's Bot Management layer sits between your scraper and the origin server. It runs a multi-step challenge pipeline before serving any HTML:
- JavaScript fingerprinting: evaluates browser APIs, timing, and canvas rendering
- TLS fingerprinting: checks your TLS handshake against known browser profiles
- IP reputation: flags datacenter IPs, proxies, and known scraper ranges
- Behavioral analysis: bot-like request patterns trigger automatic challenges
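Standard HTTP clients (requests, axios, fetch) fail these checks out of the box: they can't execute the JavaScript challenge, and their TLS handshakes don't match a real browser's. You get a 403 or a Turnstile CAPTCHA page instead of the content you want.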
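Before reaching for a heavier tool, it helps to confirm that a failed request was actually a Cloudflare challenge and not an ordinary 403 from the origin. The sketch below is one way to check. The function name and the body markers are illustrative assumptions, not an official list; the `cf-mitigated: challenge` response header is the signal Cloudflare documents for challenge pages.

```python
from typing import Mapping

# Status codes commonly seen on challenge pages (heuristic, not exhaustive).
CHALLENGE_STATUSES = {403, 503}

# Substrings often present in challenge HTML. These are assumptions based on
# commonly observed pages and may change at any time.
CHALLENGE_MARKERS = ("just a moment", "challenges.cloudflare.com")


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Return True if a response looks like a Cloudflare challenge page."""
    lowered = {key.lower(): value.lower() for key, value in headers.items()}

    # Strongest signal: Cloudflare marks challenge responses with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback heuristic: served by Cloudflare, blocked status, challenge markup.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    has_marker = any(marker in body.lower() for marker in CHALLENGE_MARKERS)
    return served_by_cloudflare and status in CHALLENGE_STATUSES and has_marker
```

With requests, you would call it as `looks_like_cloudflare_challenge(r.status_code, r.headers, r.text)` and only fall back to a browser-based solver when it returns True.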
Solution 1: Self-hosted FlareSolverr
FlareSolverr is an open-source proxy server that launches a real Chromium browser, solves the Cloudflare challenge, and returns the final HTML to your scraper. It's free and runs locally — but there are real operational costs to consider.
Setup
# 1. Pull and run FlareSolverr
docker run -d \
  --name=flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  ghcr.io/flaresolverr/flaresolverr:latest
# 2. Check that it's running
curl http://localhost:8191/health
Usage
# Send a request through FlareSolverr
curl -X POST http://localhost:8191/v1 \
  -H "Content-Type: application/json" \
  -d '{
    "cmd": "request.get",
    "url": "https://cloudflare-protected-site.com",
    "maxTimeout": 60000
  }'
Trade-offs
Pros
- Free and open source
- Full control over the instance
- No per-request costs
Cons
- Requires a server with a real browser (2GB+ RAM)
- Sessions expire and need active management
- Cloudflare updates can break it without warning
- Each cold challenge takes ~30-60 seconds
- Not scalable for high concurrency without complex session pooling
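To make the session point concrete, here is a minimal Python sketch that sends the same `request.get` command as the curl example, but reuses one FlareSolverr session across several URLs so only the first request pays the cold-challenge cost. It assumes FlareSolverr is listening on localhost:8191 as in the setup above and that responses follow the shape in FlareSolverr's README (`status`, `session`, `solution.response`). The helper names are made up for illustration.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

FLARESOLVERR_URL = "http://localhost:8191/v1"  # matches the docker port mapping

Payload = Dict[str, Any]


def post_json(payload: Payload, timeout_s: float = 90.0) -> Payload:
    """POST a JSON command to FlareSolverr and return the decoded reply."""
    request = urllib.request.Request(
        FLARESOLVERR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def get_payload(
    url: str, session: Optional[str] = None, max_timeout_ms: int = 60000
) -> Payload:
    """Build a request.get command, optionally tied to an existing session."""
    payload: Payload = {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}
    if session is not None:
        payload["session"] = session
    return payload


def extract_html(result: Payload) -> str:
    """Return the solved page HTML, or raise if FlareSolverr reported an error."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {result.get('message')}")
    return result["solution"]["response"]


def fetch_many(
    urls: List[str], send: Callable[[Payload], Payload] = post_json
) -> List[str]:
    """Fetch several URLs through one session, then always clean it up."""
    session = send({"cmd": "sessions.create"})["session"]
    try:
        return [extract_html(send(get_payload(url, session))) for url in urls]
    finally:
        # Each session holds a browser instance open, so release it explicitly.
        send({"cmd": "sessions.destroy", "session": session})
```

Calling `fetch_many(["https://cloudflare-protected-site.com"])` returns a list of HTML strings. This handles one session sequentially; running many in parallel is where the pooling complexity mentioned above comes in.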
Solution 2: SnapRender API (managed)
SnapRender has FlareSolverr baked in. You make one API call with use_flaresolverr: true and we handle the browser, the session, and the challenge for you. No infra to manage.
import requests

response = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://cloudflare-protected-site.com",
        "output": ["html", "markdown"],
        "use_flaresolverr": True,
    },
    timeout=120,  # a cold challenge can take up to a minute
)
response.raise_for_status()  # fail loudly on a non-2xx response
data = response.json()
print(data["markdown"])
That's it. Two lines of real work: the URL and the flag. SnapRender returns HTML, Markdown, a screenshot, or a PDF, whichever you need.
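Challenge solving can be slow or fail intermittently, so in a real pipeline you may want to wrap the call in a retry with backoff. The sketch below is one way to do that. It reuses only the endpoint, header, and request fields shown above; the function names, retry counts, and delays are illustrative assumptions, not part of the SnapRender API.

```python
import time
from typing import Any, Callable, Dict, Sequence

RENDER_URL = "https://api.snaprender.dev/v1/render"  # endpoint from the example above

Payload = Dict[str, Any]


def build_render_request(
    url: str, outputs: Sequence[str] = ("html", "markdown")
) -> Payload:
    """Build the same request body as the example, with the flag enabled."""
    return {"url": url, "output": list(outputs), "use_flaresolverr": True}


def render_with_retry(
    send: Callable[[Payload], Payload],
    url: str,
    attempts: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Payload:
    """Call send() up to `attempts` times, doubling the wait after each failure."""
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(attempts):
        try:
            return send(build_render_request(url))
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay_s * 2**attempt)
    raise RuntimeError(f"render failed after {attempts} attempts") from last_error


def make_sender(api_key: str) -> Callable[[Payload], Payload]:
    """Return a send() function backed by the requests library."""
    import requests  # third-party; imported lazily so the helpers above stay stdlib-only

    def send(payload: Payload) -> Payload:
        response = requests.post(
            RENDER_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
            timeout=120,
        )
        response.raise_for_status()
        return response.json()

    return send
```

Usage would look like `data = render_with_retry(make_sender("YOUR_API_KEY"), "https://cloudflare-protected-site.com")`. Keeping `send` injectable also makes the retry logic easy to test without touching the network.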
Which approach is right for you?
| | Self-hosted FlareSolverr | SnapRender API |
|---|---|---|
| Setup time | 30-60 min | 2 minutes |
| Infrastructure | Your server | None |
| Concurrency | Manual session pooling | Handled automatically |
| Maintenance | You (Cloudflare updates) | Us |
| Cost | Server costs + engineering time | Starts free (100 requests/month) |
| Output formats | HTML only | HTML, Markdown, Screenshot, PDF |
Skip the infra headache
100 free requests every month. No credit card. Add use_flaresolverr: true and start scraping in minutes.