Why scrape Reddit?
Reddit hosts 50+ million daily active users across 100,000+ communities. It is an unmatched source of organic human conversation:
Sentiment analysis
Track how people feel about your brand, product, or industry. Reddit comments are raw and unfiltered — more honest than reviews.
Market research
Discover pain points, feature requests, and buying signals from communities like r/SaaS, r/startups, or niche subreddits.
Competitor monitoring
Track mentions of competitor products, pricing complaints, and migration patterns between tools.
Content ideas
Find trending topics, common questions, and gaps in existing content by analyzing top posts and comments.
The Reddit API problem
In June 2023, Reddit introduced API pricing at $0.24 per 1,000 requests — up from free. This killed Apollo, RIF, and dozens of other third-party apps overnight. The free tier is now limited to 100 queries per minute with OAuth (10 without).
For research and monitoring at scale, the API is no longer practical. Web scraping — either through headless browsers or rendering APIs — is the most cost-effective approach.
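To put that pricing in perspective, here is the arithmetic behind the cost claim (the workload figure is a hypothetical example, not a Reddit quota):

```python
# Reddit API pricing: $0.24 per 1,000 requests (June 2023 rate)
PRICE_PER_1000 = 0.24

def api_cost(n_requests: int) -> float:
    """Monthly API bill in USD for a given request volume."""
    return n_requests / 1000 * PRICE_PER_1000

# Hypothetical monitoring workload: 5 million requests/month
print(f"${api_cost(5_000_000):,.2f}/month")  # → $1,200.00/month
```

At research scale the bill grows linearly with volume, which is why scraping the rendered HTML is usually cheaper.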
old.reddit.com is your friend
The legacy Reddit interface at old.reddit.com is server-rendered HTML with simple, stable CSS classes. It does not require JavaScript rendering, making it faster and easier to parse than the new React-based UI. Use it whenever possible.
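In practice this means rewriting any new-Reddit links you collect before fetching them. A minimal sketch (the helper name is ours; subreddit and thread paths are the same on both interfaces):

```python
from urllib.parse import urlsplit, urlunsplit

def to_old_reddit(url: str) -> str:
    """Rewrite a www.reddit.com URL to its old.reddit.com equivalent.
    Paths and query strings carry over unchanged."""
    parts = urlsplit(url)
    if parts.netloc in ("www.reddit.com", "reddit.com"):
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)

print(to_old_reddit("https://www.reddit.com/r/webdev/top/?t=week"))
# → https://old.reddit.com/r/webdev/top/?t=week
```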
Method 1: DIY with Python requests
Since old.reddit.com is server-rendered, you can scrape it with plain HTTP requests and BeautifulSoup — no headless browser needed:
#E8A0BF">import requests
#E8A0BF">from bs4 #E8A0BF">import BeautifulSoup
# old.reddit.com serves static HTML — no JS rendering needed
headers = {#A8D4A0">"User-Agent": #A8D4A0">"Mozilla/5.0 (research bot)"}
resp = requests.#87CEEB">get(
#A8D4A0">"https://old.reddit.com/r/webdev/top/?t=week",
headers=headers
)
soup = BeautifulSoup(resp.#87CEEB">text, #A8D4A0">"html.parser")
posts = soup.#87CEEB">select(#A8D4A0">".thing.link")
#E8A0BF">for post in posts[:10]:
title = post.#87CEEB">select_one(#A8D4A0">"a.title").#87CEEB">text
score = post.#87CEEB">select_one(#A8D4A0">".score.unvoted").#87CEEB">get(#A8D4A0">"title", #A8D4A0">"0")
comments = post.#87CEEB">select_one(#A8D4A0">".comments").#87CEEB">text
#E8A0BF">print(f#A8D4A0">"{score} | {title} | {comments}")This works for basic scraping, but has limitations:
Pain points
- Reddit rate-limits requests aggressively — you will hit 429 errors quickly without throttling
- New Reddit (www.reddit.com) requires full JS rendering — requests only gets the loading shell
- Comment threads are paginated — you need to follow "load more comments" links recursively
- Reddit changes its HTML structure periodically, breaking your CSS selectors
- No built-in handling for Reddit's anti-bot measures on high-traffic pages
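The rate-limit problem in particular is worth handling explicitly if you go the DIY route. One common pattern is exponential backoff on HTTP 429, honoring the Retry-After header when the server sends one (a sketch, not tuned to Reddit's actual limits):

```python
import time
import requests

HEADERS = {"User-Agent": "Mozilla/5.0 (research bot)"}

def fetch_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    """GET a page, backing off exponentially on HTTP 429."""
    delay = 2.0
    for attempt in range(max_retries):
        resp = requests.get(url, headers=HEADERS, timeout=30)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Honor Retry-After if present, else use our own growing delay
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"Still rate-limited after {max_retries} attempts: {url}")
```

Even with backoff, keep request volume low — a few requests per minute per IP is a safer ceiling than hammering until you see 429s.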
Method 2: SnapRender API
SnapRender handles the rendering, rate limiting, and anti-bot challenges for you. Use /render for full page content as markdown, or /extract for structured data with CSS selectors.
Render as markdown
Get the full subreddit or thread page as clean markdown — works with both old.reddit.com and new Reddit.
#E8A0BF">import requests
# Render a subreddit page #E8A0BF">as clean markdown
render = requests.#87CEEB">post(
#A8D4A0">"https://api.snaprender.dev/v1/render",
headers={#A8D4A0">"x-api-key": #A8D4A0">"sr_live_YOUR_KEY"},
json={
#A8D4A0">"url": #A8D4A0">"https://www.reddit.com/r/webdev/top/?t=week",
#A8D4A0">"format": #A8D4A0">"markdown"
}
)
#E8A0BF">print(render.#87CEEB">json()[#A8D4A0">"data"][#A8D4A0">"markdown"])Extract structured data
Pull post titles, scores, authors, and comment counts as structured JSON arrays.
#E8A0BF">import requests
# Extract post data #E8A0BF">from old.reddit.com (simpler selectors)
extract = requests.#87CEEB">post(
#A8D4A0">"https://api.snaprender.dev/v1/extract",
headers={#A8D4A0">"x-api-key": #A8D4A0">"sr_live_YOUR_KEY"},
json={
#A8D4A0">"url": #A8D4A0">"https://old.reddit.com/r/webdev/top/?t=week",
#A8D4A0">"selectors": {
#A8D4A0">"titles": #A8D4A0">"a.title",
#A8D4A0">"scores": #A8D4A0">".score.unvoted",
#A8D4A0">"authors": #A8D4A0">".author",
#A8D4A0">"comment_counts": #A8D4A0">".comments"
}
}
)
#E8A0BF">print(extract.#87CEEB">json())Example response
{
  "status": "success",
  "data": {
    "titles": [
      "I replaced my entire CI/CD pipeline with 40 lines of bash",
      "The mass layoffs are hitting differently in 2026",
      "Show r/webdev: I built a free alternative to Vercel"
    ],
    "scores": ["2847", "1923", "1456"],
    "authors": ["u/devops_dan", "u/senior_dev_42", "u/indiehacker"],
    "comment_counts": ["342 comments", "891 comments", "267 comments"]
  },
  "url": "https://old.reddit.com/r/webdev/top/?t=week",
  "elapsed_ms": 1840
}

Extracting comments from a thread
Individual threads contain the real gold — user opinions, detailed experiences, and direct product feedback. Use old.reddit.com thread URLs with the /extract endpoint to pull comment text, authors, and scores:
#E8A0BF">import requests
# Extract comments #E8A0BF">from a specific Reddit thread
extract = requests.#87CEEB">post(
#A8D4A0">"https://api.snaprender.dev/v1/extract",
headers={#A8D4A0">"x-api-key": #A8D4A0">"sr_live_YOUR_KEY"},
json={
#A8D4A0">"url": #A8D4A0">"https://old.reddit.com/r/webdev/comments/abc123/post_title/",
#A8D4A0">"selectors": {
#A8D4A0">"comments": #A8D4A0">".usertext-body .md",
#A8D4A0">"authors": #A8D4A0">".comment .author",
#A8D4A0">"scores": #A8D4A0">".comment .score"
}
}
)
#E8A0BF">print(extract.#87CEEB">json())For threads with hundreds of comments, consider rendering as markdown first — SnapRender captures the full rendered comment tree, including nested replies.
Start free — 100 requests/month
Get your API key in 30 seconds. Scrape Reddit posts and comments with a single API call. No browser infrastructure, no API rate limits.
Get Your API Key

Frequently asked questions
Is it legal to scrape Reddit?
Scraping publicly visible Reddit content is generally protected under the hiQ v. LinkedIn precedent. However, Reddit's Terms of Service prohibit unauthorized automated access, and since the 2023 API pricing changes Reddit has become more aggressive about enforcement. Avoid scraping private communities, personal data, or content behind login walls.
Why not just use the official Reddit API?
In June 2023, Reddit raised API pricing from free to $0.24 per 1,000 calls, killing most third-party apps. The free tier is severely limited (100 queries/minute with OAuth, 10/minute without). For large-scale data collection, rendering and extracting from the HTML is often more practical than paying for API access.
Should I scrape old.reddit.com or new Reddit?
old.reddit.com is significantly easier to parse because it uses server-rendered HTML with simple, stable CSS classes. New Reddit (www.reddit.com) is a React SPA that requires JavaScript rendering. If you are using SnapRender, either works — but old.reddit.com is faster because there is less JavaScript to execute.
How much does SnapRender cost?
SnapRender starts free with 100 requests/month. Paid plans begin at $9/month for 1,500 requests. Each Reddit page you render or extract from counts as one request. No credit multipliers.