
How to Scrape Job Boards in 2026


Job boards are a rich source of data for market research, salary benchmarking, and talent intelligence. This guide covers scraping Indeed, LinkedIn, Glassdoor, ZipRecruiter, and Dice, five of the largest job platforms in the US. Each has its own anti-bot protections and DOM structure, but SnapRender handles them all through a single extraction API.

What you will learn

1. Scraping Indeed listings
2. Glassdoor salary extraction
3. Multi-platform scraping
4. Salary range parsing
5. Job market trend analysis
6. Handling bot detection
7. Cross-platform comparison
8. Data export and analysis

1. Scraping Indeed

Indeed is the largest job board with millions of listings. Scrape job titles, companies, salaries, and locations:

indeed_scraper.py
import time

import requests

API_KEY = "sr_live_YOUR_KEY"

def scrape_indeed(query, location, start=0):
    """Scrape one page of Indeed job listings"""
    # query and location must already be URL-encoded (e.g. "python+developer")
    url = (
        "https://www.indeed.com/jobs"
        f"?q={query}&l={location}&start={start}"
    )

    resp = requests.post(
        "https://api.snaprender.dev/v1/extract",
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": url,
            "selectors": {
                "titles": ".jobTitle a span",
                "companies": ".company_location [data-testid='company-name']",
                "locations": ".company_location [data-testid='text-location']",
                "salaries": ".salary-snippet-container",
                "summaries": ".job-snippet",
                "dates": ".date",
                "links": ".jobTitle a::attr(href)"
            },
            "use_flaresolverr": True  # Python boolean; serialized as JSON true
        },
        timeout=60,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors

    return resp.json()["data"]

# Scrape "python developer" jobs in NYC
all_jobs = []
for page in range(5):
    data = scrape_indeed("python+developer", "New+York+NY", page * 10)
    titles = data.get("titles", [])
    companies = data.get("companies", [])
    locations = data.get("locations", [])
    salaries = data.get("salaries", [])

    # Fields are paired by index. Cards without a salary can shift later
    # salaries out of alignment, so treat the salary column as approximate.
    for i, title in enumerate(titles):
        all_jobs.append({
            "title": title,
            "company": companies[i] if i < len(companies) else "",
            "location": locations[i] if i < len(locations) else "",
            "salary": salaries[i] if i < len(salaries) else "",
        })

    print(f"Page {page + 1}: {len(titles)} jobs")
    time.sleep(3)  # polite delay

print(f"Total: {len(all_jobs)} jobs scraped")
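
The loop above passes strings that are already URL-encoded ("python+developer"). When queries come from user input or a file, it may be safer to let the standard library do the encoding. The following is a minimal sketch, assuming the same q, l, and start parameters used above; build_indeed_url is a hypothetical helper name, not part of any API.

```python
from urllib.parse import urlencode

def build_indeed_url(query, location, start=0):
    """Build an Indeed search URL from plain-text inputs (hypothetical helper)."""
    # urlencode escapes spaces, commas, ampersands, etc. so they
    # cannot break the query string
    params = urlencode({"q": query, "l": location, "start": start})
    return f"https://www.indeed.com/jobs?{params}"

print(build_indeed_url("python developer", "New York, NY", 10))
# https://www.indeed.com/jobs?q=python+developer&l=New+York%2C+NY&start=10
```

The resulting URL can be passed as the "url" field of the extract request in place of the hand-built string.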

2. Scraping Glassdoor

Glassdoor provides company ratings alongside job listings, making it valuable for employer research:

glassdoor_scraper.py
def scrape_glassdoor_jobs(query, location):
    """Scrape Glassdoor job listings with salary data"""
    # Note: the IL/IC/KO values in this URL are hard-coded and may need
    # adjusting for other locations or queries.
    url = (
        "https://www.glassdoor.com/Job/"
        f"{location}-{query}-jobs-SRCH_IL.0,8_IC1132348_KO9,25.htm"
    )

    resp = requests.post(
        "https://api.snaprender.dev/v1/extract",
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": url,
            "selectors": {
                "titles": "[data-test='job-title']",
                "companies": "[data-test='emp-name']",
                "locations": "[data-test='emp-location']",
                "salaries": "[data-test='detailSalary']",
                "ratings": "[data-test='rating']",
                "ages": "[data-test='job-age']",
                "links": "[data-test='job-title']::attr(href)"
            },
            "use_flaresolverr": True  # Python boolean; serialized as JSON true
        },
        timeout=60,
    )
    resp.raise_for_status()  # fail loudly on HTTP errors

    return resp.json()["data"]

glassdoor_jobs = scrape_glassdoor_jobs(
    "software-engineer", "new-york-city"
)
print(f"Found {len(glassdoor_jobs.get('titles', []))} Glassdoor jobs")
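
Both scrapers return one list per selector rather than one record per job. A small helper can turn those parallel lists into per-job dictionaries. This is a sketch that assumes the response shape used above (a dict of lists); columns_to_rows is a hypothetical name, the sample values are made up, and index-based pairing is only reliable when every card contains every field.

```python
from itertools import zip_longest

def columns_to_rows(data, fields, fill=""):
    """Convert {"titles": [...], "companies": [...]} into per-job dicts.

    `fields` maps output keys to keys in `data`, e.g. {"title": "titles"}.
    Shorter lists are padded with `fill`.
    """
    columns = [data.get(source_key, []) for source_key in fields.values()]
    return [
        dict(zip(fields.keys(), values))
        for values in zip_longest(*columns, fillvalue=fill)
    ]

# Illustrative data, not real scraped output
sample = {
    "titles": ["Software Engineer", "Data Engineer"],
    "companies": ["Acme", "Globex"],
    "ratings": ["4.1"],  # one card had no rating
}
rows = columns_to_rows(
    sample, {"title": "titles", "company": "companies", "rating": "ratings"}
)
print(rows)
```

The second row gets an empty rating because the ratings list is shorter than the others.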

3. Multi-platform scraping

Scrape the same job search across multiple platforms to get comprehensive market coverage:

multi_platform.py
def scrape_all_platforms(query, location):
    """Scrape the same job search across multiple platforms"""
    platforms = {
        "indeed": f"https://www.indeed.com/jobs?q={query}&l={location}",
        "ziprecruiter": f"https://www.ziprecruiter.com/jobs-search?search={query}&location={location}",
        "dice": f"https://www.dice.com/jobs?q={query}&location={location}",
    }

    selectors_map = {
        "indeed": {
            "titles": ".jobTitle a span",
            "companies": "[data-testid='company-name']",
            "salaries": ".salary-snippet-container",
        },
        "ziprecruiter": {
            "titles": ".job_title a",
            "companies": ".job_org",
            "salaries": ".job_salary",
        },
        "dice": {
            "titles": ".card-title-link",
            "companies": ".card-company a",
            "salaries": ".card-salary",
        },
    }

    all_results = {}
    for platform, url in platforms.items():
        try:
            resp = requests.post(
                "https://api.snaprender.dev/v1/extract",
                headers={
                    "x-api-key": API_KEY,
                    "Content-Type": "application/json"
                },
                json={
                    "url": url,
                    "selectors": selectors_map[platform],
                    "use_flaresolverr": True
                },
                timeout=60,
            )
            resp.raise_for_status()  # HTTP errors are caught below
            data = resp.json()["data"]
            count = len(data.get("titles", []))
            all_results[platform] = data
            print(f"{platform}: {count} jobs found")
            time.sleep(3)  # polite delay between platforms
        except Exception as e:
            # One failing platform should not stop the others
            print(f"{platform}: Error - {e}")

    return all_results

results = scrape_all_platforms("data+engineer", "San+Francisco+CA")

Pro tip

Deduplicate results across platforms by matching on company name + job title. The same position often appears on 3-4 job boards. Deduplication gives a true count of unique open roles.
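
One way to implement that matching is to normalize company and title before comparing, so that "Acme, Inc." and "ACME Inc" count as the same employer. This is a rough sketch; job_key and deduplicate are hypothetical helper names, the sample rows are invented, and real listings may need fuzzier matching than this.

```python
import re

def job_key(job):
    """Normalize company + title into a comparison key."""
    def norm(text):
        # Lowercase and collapse punctuation/whitespace runs to single spaces
        return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()
    return norm(job.get("company")), norm(job.get("title"))

def deduplicate(jobs):
    """Keep the first occurrence of each (company, title) pair, in order."""
    seen = set()
    unique = []
    for job in jobs:
        key = job_key(job)
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique

# Illustrative rows, not real scraped data
jobs = [
    {"platform": "indeed", "company": "Acme, Inc.", "title": "Data Engineer"},
    {"platform": "dice", "company": "ACME Inc", "title": "data engineer"},
    {"platform": "ziprecruiter", "company": "Globex", "title": "Data Engineer"},
]
unique_jobs = deduplicate(jobs)
print(len(unique_jobs))  # 2
```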

4. Salary analysis

Parse salary ranges and analyze compensation trends across scraped listings:

salary_analysis.py
import pandas as pd
import re

def parse_salary(salary_str):
    """Extract min/max salary from text like '$80K - $120K'"""
    if not salary_str:
        return None, None
    # A number with optional thousands separators and decimals,
    # followed by an optional K suffix (so "$80.5K" becomes 80500)
    matches = re.findall(r"(\d[\d,]*(?:\.\d+)?)\s*([kK]\b)?", salary_str)
    values = [
        float(num.replace(",", "")) * (1000 if suffix else 1)
        for num, suffix in matches
    ]
    if len(values) >= 2:
        return values[0], values[1]
    elif len(values) == 1:
        return values[0], values[0]
    return None, None

# Analyze salary data across job listings
df = pd.DataFrame(all_jobs)

salary_data = df["salary"].apply(
    lambda x: pd.Series(parse_salary(x), index=["min_salary", "max_salary"])
).astype(float)  # None becomes NaN so numeric methods work
df = pd.concat([df, salary_data], axis=1)
# .copy() avoids SettingWithCopyWarning when mid_salary is added below
df_with_salary = df.dropna(subset=["min_salary"]).copy()

print("=== Python Developer Jobs - NYC ===")
print(f"Total jobs:       {len(df)}")
print(f"With salary:      {len(df_with_salary)} ({len(df_with_salary)/len(df)*100:.0f}%)")

if len(df_with_salary) > 0:
    df_with_salary["mid_salary"] = (
        df_with_salary["min_salary"] + df_with_salary["max_salary"]
    ) / 2
    print(f"Median salary:    ${df_with_salary['mid_salary'].median():,.0f}")
    print(f"Salary range:     ${df_with_salary['min_salary'].min():,.0f} - ${df_with_salary['max_salary'].max():,.0f}")

    # Top paying jobs by salary midpoint
    top = df_with_salary.nlargest(10, "mid_salary")[["title", "company", "salary"]]
    print("\n=== Top Paying Jobs ===")
    print(top.to_string(index=False))

df.to_csv("job_listings.csv", index=False)
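
Salary text often mixes pay periods, and a listing such as "$25 - $35 an hour" would skew a median computed over yearly figures. One possible approach is to scale each parsed amount to a yearly value before analysis. The sketch below assumes the period appears as a word like "hour" or "year" in the salary text, and that a year is 2,080 work hours (40 hours x 52 weeks); both are assumptions, and annualize is a hypothetical helper name.

```python
import re

# Assumed multipliers to convert each pay period to a yearly figure
PERIOD_MULTIPLIERS = {
    "hour": 2080,  # 40 hours x 52 weeks
    "day": 260,    # 5 days x 52 weeks
    "week": 52,
    "month": 12,
    "year": 1,
}

def annualize(amount, salary_str):
    """Scale `amount` to a yearly value based on the period named in `salary_str`."""
    match = re.search(r"\b(hour|day|week|month|year)", salary_str.lower())
    # Assume a yearly figure when no period is stated
    period = match.group(1) if match else "year"
    return amount * PERIOD_MULTIPLIERS[period]

print(annualize(25.0, "$25 - $35 an hour"))   # 52000.0
print(annualize(90000.0, "$90,000 a year"))   # 90000.0
```

Applying this to both values returned by a salary parser keeps hourly and yearly listings on the same scale.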

Scrape job boards without getting blocked

SnapRender handles bot detection, JavaScript rendering, and structured data extraction across all major job boards. One API for Indeed, Glassdoor, ZipRecruiter, and more.


Frequently asked questions

Is it legal to scrape job boards?

Most job boards prohibit automated scraping in their ToS. In hiQ Labs v. LinkedIn (2022), the Ninth Circuit held that scraping publicly accessible data likely does not violate the CFAA, but ToS breaches can still trigger civil claims. Use scraped job data for personal research, market analysis, or academic purposes only.

Which job boards are easiest to scrape?

Indeed and Dice have relatively simple HTML structures. LinkedIn and Glassdoor use aggressive bot detection and require JavaScript rendering. ZipRecruiter uses moderate protection. All major job boards require headless browser rendering for complete data extraction.

How do I avoid getting blocked?

Use rotating proxies, add random delays between requests (2-5 seconds), rotate user agents, and use a rendering API like SnapRender that handles bot detection automatically. Avoid scraping during peak hours and limit request volume to reasonable levels.
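
The random delay and user-agent rotation can be sketched as follows for requests made directly rather than through a rendering API. The user-agent strings are placeholders to be replaced with real browser values, and random_delay and next_headers are hypothetical helper names.

```python
import random
import time

# Placeholder strings; substitute real, current browser user agents
USER_AGENTS = [
    "ExampleBrowser/1.0 (placeholder A)",
    "ExampleBrowser/1.0 (placeholder B)",
]

def random_delay(min_s=2.0, max_s=5.0, sleep=time.sleep):
    """Sleep for a random interval in [min_s, max_s] and return the value used.

    `sleep` is injectable so the function can be tested without waiting.
    """
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay

def next_headers():
    """Pick a user agent at random for the next request."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Calling random_delay() between requests and passing next_headers() as the headers argument spreads requests out and varies their fingerprint slightly.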

Can I scrape salary data?

Many job listings now include salary ranges due to pay transparency laws. You can extract salary data where posted. For jobs without listed salaries, platforms like Glassdoor and Levels.fyi provide estimated compensation ranges that can be scraped separately.

How often should I scrape job boards?

For job market research, weekly scraping provides good trend data. For job aggregation, daily scraping catches new postings. For time-sensitive applications, scraping every few hours can help identify new listings quickly. Always respect rate limits.
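
Whatever the cadence, identifying new postings comes down to comparing the latest run against the previous one. A minimal sketch, assuming each job record carries a "link" field that uniquely identifies the posting (an assumption; new_listings is a hypothetical helper and the sample rows are invented):

```python
def new_listings(previous, current):
    """Return jobs in `current` whose link did not appear in `previous`."""
    seen_links = {job["link"] for job in previous}
    return [job for job in current if job["link"] not in seen_links]

# Illustrative snapshots from two consecutive runs
yesterday = [{"title": "Data Engineer", "link": "/job/1"}]
today = [
    {"title": "Data Engineer", "link": "/job/1"},
    {"title": "ML Engineer", "link": "/job/2"},
]
fresh = new_listings(yesterday, today)
print(fresh)  # [{'title': 'ML Engineer', 'link': '/job/2'}]
```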