
How to Scrape Cloudflare-Protected Sites in 2026

Cloudflare now protects over 20% of the web. If you've tried scraping a Cloudflare-backed site, you've hit the wall: 403 errors, endless CAPTCHA loops, or JavaScript challenges that never resolve. Here's exactly how to get past them.

Why Cloudflare blocks scrapers

Cloudflare's Bot Management layer sits between your scraper and the origin server. It runs a multi-step challenge pipeline before serving any HTML:

  • JavaScript fingerprinting — evaluates browser APIs, timing, and canvas rendering
  • TLS fingerprinting — checks your TLS handshake against known browser profiles
  • IP reputation — flags datacenter IPs, proxies, and known scraper ranges
  • Behavioral analysis — bot-like request patterns trigger automatic challenges

Standard HTTP clients (requests, axios, fetch) typically fail these checks: they don't execute JavaScript, and their TLS handshakes don't look like a browser's. You get a 403 or a Turnstile CAPTCHA page instead of the content you want.
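Before reaching for a solver, it helps to know when you have actually been challenged rather than simply denied. The sketch below is a heuristic, not an official detection method: it checks the cf-mitigated response header, which Cloudflare sets to "challenge" on challenge pages, and falls back to a few body markers that are assumptions based on commonly seen interstitial pages. The function name and marker list are illustrative.

```python
from typing import Mapping

# Strings commonly seen on Cloudflare interstitial pages.
# Assumption: this list is a heuristic and is not exhaustive.
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "challenges.cloudflare.com")


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Return True if a response looks like a Cloudflare challenge page."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {key.lower(): value.lower() for key, value in headers.items()}

    # Strongest signal: the header Cloudflare attaches to challenge responses.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback: a 403/503 served by Cloudflare whose body contains a marker.
    if status not in (403, 503):
        return False
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    has_marker = any(marker in body.lower() for marker in CHALLENGE_MARKERS)
    return served_by_cloudflare and has_marker
```

A scraper could call this on every response and only route the URL through a browser-based solver when it returns True, which keeps the slow path off pages that don't need it.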


Solution 1: Self-hosted FlareSolverr

FlareSolverr is an open-source proxy server that launches a real Chromium browser, solves the Cloudflare challenge, and returns the final HTML to your scraper. It's free and runs locally — but there are real operational costs to consider.

Setup

# 1. Pull and run FlareSolverr
docker run -d \
  --name=flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  ghcr.io/flaresolverr/flaresolverr:latest

# 2. Test it's running
curl http://localhost:8191/health

Usage

# Send a request through FlareSolverr
curl -X POST http://localhost:8191/v1 \
  -H "Content-Type: application/json" \
  -d '{
    "cmd": "request.get",
    "url": "https://cloudflare-protected-site.com",
    "maxTimeout": 60000
  }'
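The same request can be made from Python. This is a minimal sketch using only the standard library; it assumes FlareSolverr is listening on localhost:8191 as in the setup above and that the JSON reply carries the page under solution.response with a top-level status of "ok". The helper names (build_get_payload, extract_html, fetch_html) are hypothetical.

```python
import json
import urllib.request

# Assumption: FlareSolverr runs locally on the port used in the setup step.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_get_payload(url, max_timeout_ms=60000):
    """Build the JSON body for a request.get command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def extract_html(result):
    """Pull the solved page HTML out of a parsed FlareSolverr reply."""
    if result.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {result.get('message', 'unknown')}")
    return result["solution"]["response"]


def fetch_html(url, endpoint=FLARESOLVERR_URL, max_timeout_ms=60000):
    """POST the command to FlareSolverr and return the final HTML."""
    body = json.dumps(build_get_payload(url, max_timeout_ms)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Give the HTTP call slightly longer than the solver's own timeout.
    with urllib.request.urlopen(request, timeout=max_timeout_ms / 1000 + 10) as resp:
        return extract_html(json.load(resp))
```

Calling fetch_html("https://cloudflare-protected-site.com") would then return the rendered HTML as a string, or raise if the solver reports an error.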

Trade-offs

Pros

  • Free and open source
  • Full control over the instance
  • No per-request costs

Cons

  • Requires a server with a real browser (2GB+ RAM)
  • Sessions expire and need active management
  • Cloudflare updates can break it without warning
  • Each cold challenge takes ~30-60 seconds
  • Not scalable for high concurrency without complex session pooling
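To give a sense of what session pooling involves, here is a rough sketch of a round-robin pool built on FlareSolverr's sessions.create and sessions.destroy commands and the session field of request.get. The SessionPool class and the injected send callable are hypothetical; send is assumed to POST a payload to /v1 and return the parsed JSON. The sketch is not thread-safe and does no health checking, both of which a production pool would need.

```python
import itertools
import uuid
from typing import Callable, List

# Assumption: `send` posts a command payload to FlareSolverr and returns JSON.
Send = Callable[[dict], dict]


class SessionPool:
    """Round-robin pool of pre-created FlareSolverr browser sessions."""

    def __init__(self, send: Send, size: int = 2):
        if size < 1:
            raise ValueError("size must be at least 1")
        self._send = send
        self._ids: List[str] = []
        for _ in range(size):
            session_id = f"pool-{uuid.uuid4().hex[:8]}"
            # Each session keeps a browser instance alive between requests.
            send({"cmd": "sessions.create", "session": session_id})
            self._ids.append(session_id)
        self._cycle = itertools.cycle(list(self._ids))

    def get(self, url: str, max_timeout_ms: int = 60000) -> dict:
        """Fetch a URL through the next session in the rotation."""
        return self._send({
            "cmd": "request.get",
            "url": url,
            "session": next(self._cycle),
            "maxTimeout": max_timeout_ms,
        })

    def close(self) -> None:
        """Destroy every session so the browsers are released."""
        for session_id in self._ids:
            self._send({"cmd": "sessions.destroy", "session": session_id})
        self._ids.clear()
```

Reusing a warm session lets later requests skip the cold challenge, which is where most of the 30-60 seconds goes. The cost is that you now own session expiry, cleanup, and memory use for every browser in the pool.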

Solution 2: SnapRender API (managed)

SnapRender has FlareSolverr baked in. You make one API call with use_flaresolverr: true and we handle the browser, the session, and the challenge for you. No infra to manage.

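import requests

# Ask SnapRender to render the page and solve the Cloudflare challenge on its side
response = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "url": "https://cloudflare-protected-site.com",
        "output": ["html", "markdown"],
        "use_flaresolverr": True,
    },
    timeout=120,  # challenge solving can be slow; don't hang indefinitely
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
data = response.json()
print(data["markdown"])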

That's it. The only parts you change are the URL and the flag. SnapRender can return HTML, Markdown, a screenshot, or a PDF, depending on what you list in output.


Which approach is right for you?

                | Self-hosted FlareSolverr        | SnapRender API
----------------+---------------------------------+--------------------------------
Setup time      | 30-60 min                       | 2 minutes
Infrastructure  | Your server                     | None
Concurrency     | Manual session pooling          | Handled automatically
Maintenance     | You (Cloudflare updates)        | Us
Cost            | Server costs + engineering time | From $0 free tier
Output formats  | HTML only                       | HTML, Markdown, Screenshot, PDF

Skip the infra headache

100 free requests every month. No credit card. Add use_flaresolverr: true and start scraping in minutes.
