Why scrape Indeed?
Job listing data is valuable across multiple use cases:
Job market analysis
Track which roles are in demand, which skills appear most frequently, and how job volume changes over time.
Salary research
Benchmark compensation across roles, companies, and locations. Build salary comparison tools with real data.
Competitor hiring
Monitor which positions competitors are hiring for. A flurry of ML engineer postings, for example, can signal a pivot to AI.
Recruitment automation
Aggregate listings from Indeed and other boards into a unified pipeline. Match candidates to relevant openings automatically.
The Cloudflare challenge
Indeed uses Cloudflare's enterprise bot management, which is one of the most sophisticated anti-scraping systems available. Here is what you are up against:
Protection layers
- JavaScript challenges that must be solved in a real browser environment
- TLS fingerprinting that detects non-browser HTTP clients
- Behavioral analysis that flags automated navigation patterns
- CAPTCHA challenges triggered by suspicious request velocity
- Cookie-based session tracking that detects session reuse across IPs
- Canvas and WebGL fingerprinting to detect headless browsers
Standard Puppeteer or Playwright setups fail against this stack. You need either a purpose-built stealth browser with proxy rotation or a service like SnapRender with FlareSolverr integration to handle the challenge for you.
Method 1: DIY with Puppeteer
Here is a basic Puppeteer scraper for Indeed search results. Note that the script as shown is only a starting point: it works only when combined with stealth plugins and residential proxies, because Cloudflare blocks vanilla Puppeteer within seconds.
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });

  try {
    const page = await browser.newPage();

    // Present a desktop Chrome user agent instead of the headless default
    await page.setUserAgent(
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
        'AppleWebKit/537.36 (KHTML, like Gecko) ' +
        'Chrome/124.0.0.0 Safari/537.36'
    );

    await page.goto(
      'https://www.indeed.com/jobs?q=software+engineer&l=remote',
      { waitUntil: 'networkidle2', timeout: 30000 }
    );

    // Wait for job cards to render
    await page.waitForSelector('.job_seen_beacon', { timeout: 15000 });

    // Extract one object per job card; salary is null when not listed
    const jobs = await page.$$eval('.job_seen_beacon', (cards) =>
      cards.map((card) => ({
        title: card.querySelector('.jobTitle span')?.innerText,
        company: card.querySelector('[data-testid="company-name"]')?.innerText,
        location: card.querySelector('[data-testid="text-location"]')?.innerText,
        salary: card.querySelector('.salary-snippet-container')?.innerText || null,
      }))
    );

    console.log(jobs);
  } finally {
    // Close the browser even if navigation or the selector wait times out
    await browser.close();
  }
})();
```

To make this work reliably, you would need puppeteer-extra with the stealth plugin, rotating residential proxies ($15-50/GB), and retry logic for Cloudflare challenges. The infrastructure overhead is significant.
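The retry logic mentioned above is independent of the browser layer, so it can be sketched on its own. The following is a minimal Python sketch, not a definitive implementation: `fetch` is a hypothetical callable supplied by the caller, `ChallengeError` is a hypothetical exception that `fetch` would raise when it sees a Cloudflare challenge page, and the attempt count and delays are illustrative values.

```python
import random
import time


class ChallengeError(Exception):
    """Hypothetical error raised by the caller's fetch function when it
    detects a Cloudflare challenge page instead of job results."""


def fetch_with_retry(fetch, url, max_attempts=4, base_delay=2.0,
                     sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff plus jitter.

    fetch is any callable supplied by the caller (hypothetical here);
    sleep is injectable so the function can be exercised without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except ChallengeError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Base delay doubles each attempt (2s, 4s, 8s, ...),
            # plus up to 1s of random jitter
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
```

Backing off exponentially, rather than retrying immediately, avoids adding to the request velocity that triggers challenges in the first place.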
Method 2: SnapRender API with FlareSolverr
SnapRender's use_flaresolverr: true flag routes requests through a Chromium session that solves Cloudflare challenges automatically. No proxies, no stealth plugins, no CAPTCHA services.
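The example requests in this article all target an Indeed search URL with `q` (query) and `l` (location) parameters. A small helper can build that URL for other searches. This is a sketch under stated assumptions: `build_search_url` is a hypothetical helper, and the optional `start` offset for pagination is an assumption that this article does not confirm.

```python
from urllib.parse import urlencode

INDEED_SEARCH = "https://www.indeed.com/jobs"


def build_search_url(query, location, start=None):
    """Build an Indeed search URL from a query and a location.

    q and l are the parameters used in this article's example URL.
    start (a result offset for pagination) is an assumption that the
    article does not confirm; leave it as None to omit it.
    """
    params = {"q": query, "l": location}
    if start is not None:
        params["start"] = start
    # urlencode uses quote_plus, so spaces become "+" as in the example URL
    return f"{INDEED_SEARCH}?{urlencode(params)}"


print(build_search_url("software engineer", "remote"))
# https://www.indeed.com/jobs?q=software+engineer&l=remote
```

Letting `urlencode` handle escaping means queries containing characters such as `+` or `,` are encoded correctly instead of being concatenated by hand.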
Render as markdown
Get the full Indeed search results page as clean markdown — ideal for LLM analysis or building job aggregator pipelines.
```python
import requests

# Render Indeed search results as clean markdown
render = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://www.indeed.com/jobs?q=software+engineer&l=remote",
        "format": "markdown",
        "use_flaresolverr": True,
    },
    timeout=60,  # challenge solving can take several seconds
)
render.raise_for_status()  # fail loudly on HTTP errors
print(render.json()["data"]["markdown"])
```

Extract structured data
Pull job titles, companies, locations, and salaries as structured JSON arrays.
```python
import requests

# Extract structured job listing data
extract = requests.post(
    "https://api.snaprender.dev/v1/extract",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://www.indeed.com/jobs?q=software+engineer&l=remote",
        "use_flaresolverr": True,
        # Each key maps to a CSS selector; results come back as arrays
        "selectors": {
            "titles": ".jobTitle span",
            "companies": "[data-testid='company-name']",
            "locations": "[data-testid='text-location']",
            "salaries": ".salary-snippet-container",
        },
    },
    timeout=60,  # challenge solving can take several seconds
)
extract.raise_for_status()  # fail loudly on HTTP errors
print(extract.json())
```

Example response
```json
{
  "status": "success",
  "data": {
    "titles": [
      "Senior Software Engineer",
      "Full Stack Developer",
      "Backend Engineer - Python"
    ],
    "companies": ["Stripe", "Shopify", "Datadog"],
    "locations": ["Remote", "Remote", "Remote - US"],
    "salaries": ["$180,000 - $220,000 a year", "$150,000 - $190,000 a year", null]
  },
  "url": "https://www.indeed.com/jobs?q=software+engineer&l=remote",
  "elapsed_ms": 4210
}
```

Legal considerations
Indeed is more litigious about scraping than most sites. Here is what you need to know:
1. Indeed's Terms of Service explicitly prohibit automated access, and they have filed lawsuits against scraping companies (Indeed v. Glassdoor, Indeed v. Mixrank).
2. The hiQ v. LinkedIn ruling supports scraping publicly accessible data under the Computer Fraud and Abuse Act, but it does not settle contract claims based on a site's terms, and Indeed has argued their data is not truly "public" since listings are submitted by employers under specific terms.
3. Never scrape applicant data, resumes, or any personally identifiable information. Stick to public job listing details.
4. Rate-limit aggressively. Beyond the legal risk, overwhelming Indeed's servers could amount to a denial-of-service attack.
5. Consider whether Indeed's paid API or job posting partnerships might be a better fit for your commercial use case.
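The rate-limiting advice in the list above can be sketched as a small helper that enforces a minimum gap between requests. This is a minimal sketch, not a recommendation of specific values: `MinIntervalLimiter` is a hypothetical class, and the interval you choose is your own call.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep  # injectable for testing
        self._last = None    # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval seconds have passed
        since the previous call, then record the current time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a scraping loop, you would create one limiter, for example `MinIntervalLimiter(min_interval=5.0)` with a purely illustrative value, and call `limiter.wait()` before each request.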
Disclaimer
This tutorial is for educational purposes. SnapRender provides the technical capability to render and extract web content, but it is your responsibility to ensure your use case complies with applicable laws and website terms of service.
Frequently asked questions
Is it legal to scrape Indeed?
Indeed's Terms of Service explicitly prohibit scraping and automated access. They actively enforce this through litigation: Indeed has sued multiple scraping companies. While the hiQ v. LinkedIn ruling supports scraping public data, Indeed's aggressive legal stance makes it riskier than most targets. Consult a lawyer before scraping Indeed at scale.
Why is Indeed hard to scrape?
Indeed uses Cloudflare's enterprise-tier bot protection, including JavaScript challenges, browser fingerprinting, and behavioral analysis. Simple HTTP clients and default headless browser configurations are detected almost immediately. SnapRender's use_flaresolverr flag handles Cloudflare challenges automatically.
Can I extract salary data from Indeed?
Indeed shows salary data on some listings, either employer-provided ranges or Indeed's estimated salary. You can extract these with CSS selectors targeting the salary metadata container. Not all listings include salary information, so your extraction should handle missing values gracefully.
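Handling missing salaries gracefully can be sketched as follows. The example assumes the extract response has the shape shown in the example response earlier: parallel arrays of equal length, with `null` where a listing has no salary. `parse_salary` and `to_records` are hypothetical helpers, and the regular expressions only cover dollar amounts in formats like the ones in that example.

```python
import re

# Matches dollar amounts such as "$180,000" or "$45.50"
_AMOUNT = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")
# Matches the pay period in phrases such as "a year" or "an hour"
_PERIOD = re.compile(r"\b(?:a|an|per)\s+(hour|day|week|month|year)\b")


def parse_salary(text):
    """Turn a salary string into (low, high, period), or None if absent."""
    if not text:
        return None  # listing has no salary
    amounts = [float(a.replace(",", "")) for a in _AMOUNT.findall(text)]
    if not amounts:
        return None  # unrecognised format: treat as missing
    period = _PERIOD.search(text)
    return (min(amounts), max(amounts), period.group(1) if period else None)


def to_records(data):
    """Zip the parallel arrays from the extract response into one dict per job."""
    keys = ("titles", "companies", "locations", "salaries")
    columns = [data.get(k, []) for k in keys]
    # Assumption: arrays are aligned by position; refuse to guess otherwise
    if len({len(c) for c in columns}) != 1:
        raise ValueError("selector results are not aligned")
    return [
        {"title": t, "company": c, "location": loc, "salary": parse_salary(s)}
        for t, c, loc, s in zip(*columns)
    ]
```

Calling `to_records(extract.json()["data"])` on the example response would yield three job dicts, the last with `salary` set to `None`. The length check matters because a selector that matches only some cards would otherwise silently pair salaries with the wrong jobs.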
How much does SnapRender cost?
SnapRender starts free with 100 requests/month. Paid plans begin at $9/month for 1,500 requests. Requests using FlareSolverr (needed for Indeed's Cloudflare protection) count as one request each, with no credit multipliers.