Why scrape Twitter/X?
Despite the platform's turbulence, Twitter/X remains the go-to source for real-time public discourse. Here is why developers and businesses scrape it:
Sentiment analysis
Track public opinion on brands, products, or events in real time. Feed tweet text into NLP models to gauge market mood.
Trend tracking
Identify emerging topics, hashtags, and viral content before they hit mainstream media. Build early-warning systems for your industry.
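At its simplest, an early-warning system flags time buckets where mentions jump well above their recent baseline. Below is a minimal sketch, assuming you already have mention counts per hour. It uses a trailing-window z-score. The helper name `find_spikes`, the 24-bucket window, the 3.0 threshold, and the 1.0 floor on the standard deviation are illustrative choices, not a standard method.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=24, threshold=3.0):
    """Return indices of buckets whose count sits more than `threshold`
    standard deviations above the mean of the preceding `window` buckets.

    `counts` holds mention counts per time bucket (e.g. per hour), oldest first.
    Hypothetical helper; tune window and threshold for your own data.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history neither divides
        # by zero nor turns a +1 wobble into a huge z-score.
        spread = max(pstdev(history), 1.0)
        if (counts[i] - baseline) / spread > threshold:
            spikes.append(i)
    return spikes


# A flat baseline of 10 mentions/hour followed by a burst of 30
print(find_spikes([10] * 24 + [30]))  # → [24]
```

Buckets before the first full window are never flagged, so the function needs at least `window + 1` buckets before it can report anything.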
Social listening
Monitor brand mentions, competitor activity, and customer complaints. Respond faster than your competitors.
Research & datasets
Build datasets for academic research, machine learning training, or market analysis. Twitter data is uniquely timestamped and public.
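Every tweet URL ends in a numeric status ID, which works well as a deduplication key for a dataset. For tweets posted since late 2010, this is a Twitter Snowflake ID. Its upper bits store the creation time in milliseconds since the Snowflake epoch (1288834974657), and the low 22 bits hold machine and sequence data. The minimal sketch below pulls the ID out of a URL and decodes that timestamp. The helper names are hypothetical. The placeholder ID 1234567890 used elsewhere in this article is too small to decode to a meaningful date.

```python
import re
from datetime import datetime, timezone

# Snowflake epoch (ms) used by Twitter/X tweet IDs
TWITTER_EPOCH_MS = 1288834974657

# Matches .../status/<id> and the older .../statuses/<id>
_STATUS_RE = re.compile(r"/status(?:es)?/(\d+)")


def status_id(url):
    """Extract the numeric tweet ID from a tweet URL (hypothetical helper)."""
    match = _STATUS_RE.search(url)
    if match is None:
        raise ValueError(f"no status ID in {url!r}")
    return int(match.group(1))


def snowflake_time(tweet_id):
    """Decode the UTC creation time stored in a Snowflake tweet ID."""
    # Drop the low 22 bits (worker and sequence numbers), then add the epoch back
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

Keying records on `status_id(url)` prevents the same tweet from being stored twice when it is reached through different URLs, for example with and without query parameters.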
The Twitter API cost problem
Twitter/X restructured its API pricing in 2023, and it has only gotten more expensive since. Here is what you are looking at in 2026:
| Tier | Price | Tweets per month |
|---|---|---|
| Free | $0 | 1,500 posts (write only, no reads) |
| Basic | $100/mo | 10,000 reads |
| Pro | $5,000/mo | 1,000,000 reads |
| Enterprise | $42,000+/mo | Unlimited |
For most use cases — sentiment analysis on a few hundred tweets, tracking competitor mentions, or building a research dataset — $100/month for 10K tweets is overkill. Web scraping public tweet pages is the practical alternative.
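The arithmetic behind that point is easy to check. This short sketch uses only the Basic and Pro figures from the table above, since the Free tier has no read quota and Enterprise pricing is open-ended. It assumes a flat monthly fee regardless of how much of the quota you actually use.

```python
# Monthly price (USD) and monthly read quota for the paid read tiers in the table above
TIERS = {
    "Basic": (100, 10_000),
    "Pro": (5_000, 1_000_000),
}


def cost_per_thousand(price_usd, reads):
    """Effective USD cost of 1,000 tweet reads for a monthly price and read volume."""
    return price_usd / reads * 1_000


for tier, (price, quota) in TIERS.items():
    print(f"{tier} at full quota: ${cost_per_thousand(price, quota):.2f} per 1,000 reads")

# Reading only 500 tweets a month on Basic still costs the full $100
print(f"Basic at 500 reads: ${cost_per_thousand(100, 500):.2f} per 1,000 reads")
```

At full quota, Basic works out to $10 per 1,000 reads and Pro to $5. Reading only 500 tweets a month on Basic effectively costs $200 per 1,000.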
Why Twitter/X is hard to scrape
Anti-scraping measures
- Login walls: most content now requires authentication to view
- Aggressive bot detection using browser fingerprinting and behavioral analysis
- Rate limiting with IP-level and account-level throttling
- React-based single-page app: no tweet data in the initial HTML response
- data-testid attributes and generated class names that can shift between deployments
- Cloudflare-level protection on x.com with JavaScript challenges
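Throttling is easier to survive if the scraper backs off instead of retrying immediately. Below is a minimal sketch of exponential backoff with full jitter. `ThrottledError` and `fetch_with_backoff` are hypothetical names, and the sketch assumes your own fetch code raises `ThrottledError` when it sees an HTTP 429 or a block page. The `sleep` and `rng` parameters are injectable so the logic can be tested without actually waiting.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical: raise this from your fetch code on HTTP 429 or a block page."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=2.0, max_delay=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call fetch(); on ThrottledError, wait and retry up to max_attempts times.

    The wait before retry n (0-based) is a random fraction of
    min(max_delay, base_delay * 2**n): exponential backoff with full jitter.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())
```

Jitter spreads retries out, so parallel workers that hit the same limit do not all retry in lockstep.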
Method 1: DIY with Playwright
Launch a headless browser, navigate to a tweet URL, and extract data from the rendered DOM. Playwright is a common choice over Puppeteer here. It drives Chromium, Firefox, and WebKit, and each browser context can carry its own user agent, proxy, and cookies:
```javascript
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  try {
    const context = await browser.newContext({
      userAgent:
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
        'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    });
    const page = await context.newPage();

    // Navigate to a public tweet. Playwright discourages 'networkidle', and a
    // busy SPA like X may never go idle, so wait for the DOM instead.
    await page.goto('https://x.com/elonmusk/status/1234567890', {
      waitUntil: 'domcontentloaded',
      timeout: 30000,
    });

    // Wait for the tweet text to render (the page is a client-side React app)
    await page.waitForSelector('[data-testid="tweetText"]', { timeout: 15000 });

    // Extract tweet data from the rendered DOM. querySelector returns the first
    // match; if the tweet is a reply, earlier tweets in the thread render above it.
    const tweet = await page.evaluate(() => {
      const text = (selector) => document.querySelector(selector)?.innerText ?? null;
      return {
        text: text('[data-testid="tweetText"]'),
        likes: text('[data-testid="like"] span'),
        retweets: text('[data-testid="retweet"] span'),
        replies: text('[data-testid="reply"] span'),
        views: text('a[href*="/analytics"] span'),
      };
    });

    console.log(tweet);
  } finally {
    // Always close the browser, even if navigation or a selector times out
    await browser.close();
  }
})();
```

This works for individual tweets, but scaling is painful. You will need rotating residential proxies, browser fingerprint randomization, and ongoing selector maintenance as Twitter/X updates its frontend, so DIY scrapers tend to break without warning.
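For the most part, rotating proxies means handing each new browser context a different proxy. Below is a minimal round-robin sketch in Python. The proxy URLs and credentials are placeholders, and `proxy_rotation` is a hypothetical helper. Each dict follows the shape of the `proxy` option (`server`, plus optional `username` and `password`) that Playwright's `new_context()` accepts.

```python
from itertools import cycle


def proxy_rotation(proxy_urls, username=None, password=None):
    """Return an endless round-robin iterator of Playwright-style proxy settings."""
    if not proxy_urls:
        raise ValueError("need at least one proxy URL")

    def settings(server):
        config = {"server": server}
        if username is not None:
            config["username"] = username
            config["password"] = password or ""
        return config

    return (settings(server) for server in cycle(proxy_urls))


proxies = proxy_rotation(
    ["http://proxy1.example.com:8000", "http://proxy2.example.com:8000"],
    username="PROXY_USER",
    password="PROXY_PASS",
)
# With Playwright for Python (not run here), each context gets the next proxy:
#   context = browser.new_context(proxy=next(proxies))
print(next(proxies)["server"])  # → http://proxy1.example.com:8000
```

Round-robin is the simplest policy. A production scraper would usually also retire proxies that keep getting blocked.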
Method 2: SnapRender API
SnapRender handles the browser rendering, anti-bot bypass, and JavaScript execution. Use /render for markdown output or /extract for structured data.
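As the `requests` examples that follow show, both endpoints take a JSON body and an `x-api-key` header. If you would rather avoid the extra dependency, the sketch below builds the same kind of request with Python's standard library. `build_snaprender_request` is a hypothetical helper. The base URL, paths, and header are copied from those examples rather than from separate API documentation.

```python
import json
import urllib.request

API_BASE = "https://api.snaprender.dev/v1"


def build_snaprender_request(endpoint, payload, api_key):
    """Build (without sending) a POST request to /render or /extract."""
    if endpoint not in ("render", "extract"):
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_snaprender_request(
    "render",
    {"url": "https://x.com/elonmusk/status/1234567890", "format": "markdown"},
    "sr_live_YOUR_KEY",
)
# Sending it and parsing the JSON reply (not run here):
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       body = json.load(resp)
```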
Render as markdown
Get the full tweet content as LLM-ready markdown — perfect for sentiment analysis pipelines or archiving.
```python
import requests

# Render a tweet page as clean markdown
resp = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://x.com/elonmusk/status/1234567890",
        "format": "markdown",
        "wait_for": "[data-testid='tweetText']",
        "use_flaresolverr": True,
    },
    timeout=60,  # rendering a JavaScript-heavy page can take several seconds
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below
print(resp.json()["data"]["markdown"])
```

Extract structured data
Pull specific fields — text, likes, retweets, author — as clean JSON.
```python
import requests

# Extract structured tweet data with CSS selectors
resp = requests.post(
    "https://api.snaprender.dev/v1/extract",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://x.com/elonmusk/status/1234567890",
        "use_flaresolverr": True,
        "selectors": {
            "text": "[data-testid='tweetText']",
            "likes": "[data-testid='like'] span",
            "retweets": "[data-testid='retweet'] span",
            "replies": "[data-testid='reply'] span",
            "author": "[data-testid='User-Name'] a",
            "timestamp": "time",
        },
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```

Example response
```json
{
  "status": "success",
  "data": {
    "text": "The future of AI is going to be incredible...",
    "likes": "142.5K",
    "retweets": "18.3K",
    "replies": "12.1K",
    "author": "@elonmusk",
    "timestamp": "2026-04-10T14:32:00.000Z"
  },
  "url": "https://x.com/elonmusk/status/1234567890",
  "elapsed_ms": 4120
}
```

Practical use cases
Here are the most common ways developers use scraped Twitter/X data:
1. Brand monitoring: track every mention of your brand or product. Feed tweets into an LLM to classify sentiment (positive, negative, neutral) and alert your team on spikes.
2. Competitor intelligence: scrape competitor tweet engagement to understand what messaging resonates, and compare your engagement rates against theirs.
3. Influencer discovery: find accounts with high engagement rates in your niche. Extract follower counts, average likes, and posting frequency to build outreach lists.
4. Event tracking: monitor tweets about conferences, product launches, or breaking news. Build real-time dashboards that surface the most-engaged-with content.
5. Academic research: build datasets of public discourse on specific topics. Twitter data is especially valuable because it is real-time and timestamped.
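Several of these use cases rank tweets by engagement, but the `/extract` example above returns display strings such as "142.5K". The minimal sketch below normalizes them to integers. It assumes counts use plain digits, optional thousands separators, and an optional K or M suffix. `parse_count`, `engagement`, and `engagement_rate` are hypothetical helpers. The follower counts that `engagement_rate` needs would have to come from a profile page.

```python
import re
from decimal import Decimal

# Assumed abbreviation suffixes for display counts
_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}
_COUNT_RE = re.compile(r"^([\d,]*\.?\d+)\s*([KM]?)$", re.IGNORECASE)


def parse_count(text):
    """Convert display counts like "142.5K", "1,204" or "2M" to an int.

    Missing or empty values are treated as 0.
    """
    if text is None or not text.strip():
        return 0
    match = _COUNT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized count: {text!r}")
    number, suffix = match.groups()
    # Decimal avoids float rounding surprises (e.g. 18.3 * 1000)
    return int(Decimal(number.replace(",", "")) * _MULTIPLIERS[suffix.upper()])


def engagement(tweet):
    """Total likes + retweets + replies for one extracted tweet."""
    return sum(parse_count(tweet.get(key)) for key in ("likes", "retweets", "replies"))


def engagement_rate(tweet, followers):
    """Interactions per follower; returns 0.0 if followers is 0."""
    return engagement(tweet) / followers if followers else 0.0


data = {"likes": "142.5K", "retweets": "18.3K", "replies": "12.1K"}
print(engagement(data))  # → 172900
```

To rank a list of extracted tweets, sort it with `sorted(tweets, key=engagement, reverse=True)`.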
Legal considerations
Twitter/X scraping exists in a legal gray area. Here is what you should know:
1. Twitter/X's Terms of Service explicitly prohibit scraping, and the company has sued scrapers in the past (for example, X Corp.'s 2023 lawsuit against Bright Data).
2. Public tweets are generally considered public data, but how you access them matters. Bypassing technical barriers may raise legal issues.
3. GDPR may apply if you scrape tweets from users in the EU. Tweet text combined with a handle identifies a person and counts as personal data under EU law.
4. Never scrape DMs, protected accounts, or other non-public information. Only collect publicly visible data.
5. Rate-limit your requests and avoid disrupting the platform. Excessive scraping could be considered unauthorized access.
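One way to enforce point 5 in code is to keep a minimum gap between requests. Below is a minimal single-threaded sketch. `MinIntervalLimiter` is a hypothetical helper, and the clock and sleep functions are injectable for testing. No particular interval is documented as safe, so the value you pick is a judgment call.

```python
import time


class MinIntervalLimiter:
    """Make successive wait() calls return at least `interval` seconds apart.

    Not thread-safe; use one limiter per worker or add a lock.
    """

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = MinIntervalLimiter(interval=5.0)
# for url in tweet_urls:   # tweet_urls: your own list (hypothetical)
#     limiter.wait()
#     fetch(url)           # fetch: your own scraping call (hypothetical)
```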
Start free — 100 requests/month
Skip the $100/mo Twitter API. Get your SnapRender API key in 30 seconds and start extracting tweet data with a single API call.
Frequently asked questions
Scraping publicly available tweets is generally legal in the US, but Twitter/X's Terms of Service prohibit automated data collection. The platform actively detects and blocks scrapers. Use scraped data responsibly, respect rate limits, and never collect private or direct message data. Consult a lawyer for commercial use cases.
Read access to the Twitter/X API now starts at $100/month on the Basic tier (read access used to be free, and the current Free tier is write-only). The Pro tier costs $5,000/month. For many use cases, such as sentiment analysis, trend tracking, and competitive research, that cost is prohibitive. Web scraping offers an alternative for public data at a fraction of the cost.
Twitter/X now requires login to view most content. However, individual tweet pages and some profile pages are still accessible without authentication. SnapRender renders these public pages in a real browser, so no account is needed to retrieve them.
From public tweet pages, you can extract: tweet text, like count, retweet/repost count, reply count, bookmark count, view count, timestamp, author handle, author display name, and media URLs (images/videos). SnapRender's /extract endpoint can pull each of these, given a CSS selector for each field.