Tutorial

How to Scrape Twitter/X Data in 2026


Twitter/X is one of the richest real-time data sources on the internet — but scraping it in 2026 is harder than ever. The official API now costs $100+/month, login walls block unauthenticated access, and anti-bot measures are aggressive. This tutorial covers how to extract tweets, engagement metrics, and profile data despite these challenges.

Why scrape Twitter/X?

Despite the platform's turbulence, Twitter/X remains the go-to source for real-time public discourse. Here is why developers and businesses scrape it:

1. Sentiment analysis: Track public opinion on brands, products, or events in real time. Feed tweet text into NLP models to gauge market mood.

2. Trend tracking: Identify emerging topics, hashtags, and viral content before they hit mainstream media. Build early-warning systems for your industry.

3. Social listening: Monitor brand mentions, competitor activity, and customer complaints. Respond faster than your competitors.

4. Research & datasets: Build datasets for academic research, machine learning training, or market analysis. Twitter data is uniquely timestamped and public.

The Twitter API cost problem

Twitter/X restructured its API pricing in 2023, and it has only gotten more expensive since. Here is what you are looking at in 2026:

Tier        Price         Tweets/mo
Free        $0            1,500 (write only)
Basic       $100/mo       10,000 read
Pro         $5,000/mo     1,000,000 read
Enterprise  $42,000+/mo   Unlimited

For most use cases — sentiment analysis on a few hundred tweets, tracking competitor mentions, or building a research dataset — $100/month for 10K tweets is overkill. Web scraping public tweet pages is the practical alternative.
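
To put those tiers in perspective, a quick back-of-the-envelope calculation (using only the prices and quotas from the table above) shows what each tweet read actually costs you:

```python
# Per-tweet cost of the paid API tiers listed above
tiers = {
    "Basic": (100, 10_000),      # ($/mo, tweet reads/mo)
    "Pro": (5_000, 1_000_000),
}

for name, (price, reads) in tiers.items():
    print(f"{name}: ${price / reads:.4f} per tweet read")
# Basic: $0.0100 per tweet read
# Pro: $0.0050 per tweet read
```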

Why Twitter/X is hard to scrape

Anti-scraping measures

  • Login walls — most content now requires authentication to view
  • Aggressive bot detection using browser fingerprinting and behavioral analysis
  • Rate limiting with IP-level and account-level throttling
  • React-based SPA — no data in the initial HTML response
  • Dynamic data-testid attributes that shift between deployments
  • Cloudflare-level protection on x.com with JavaScript challenges

Method 1: DIY with Playwright

Launch a headless browser, navigate to a tweet URL, and extract data from the rendered DOM. Playwright is preferred over Puppeteer here because of better anti-detection capabilities:

scraper.js
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
      'AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36',
  });
  const page = await context.newPage();

  // Navigate to a public tweet
  await page.goto(
    'https://x.com/elonmusk/status/1234567890',
    { waitUntil: 'networkidle', timeout: 30000 }
  );

  // Wait for tweet content to render
  await page.waitForSelector('[data-testid="tweetText"]', {
    timeout: 15000,
  });

  // Extract tweet data
  const tweet = await page.evaluate(() => ({
    text: document.querySelector('[data-testid="tweetText"]')
      ?.innerText,
    likes: document.querySelector('[data-testid="like"] span')
      ?.innerText,
    retweets: document.querySelector('[data-testid="retweet"] span')
      ?.innerText,
    replies: document.querySelector('[data-testid="reply"] span')
      ?.innerText,
    views: document.querySelector(
      'a[href*="/analytics"] span'
    )?.innerText,
  }));

  console.log(tweet);
  await browser.close();
})();

This works for individual tweets, but scaling is painful. You will need rotating residential proxies, browser fingerprint randomization, and constant selector maintenance as Twitter/X updates its frontend. Most DIY scrapers break within weeks.

Method 2: SnapRender API

SnapRender handles the browser rendering, anti-bot bypass, and JavaScript execution. Use /render for markdown output or /extract for structured data.

Render as markdown

Get the full tweet content as LLM-ready markdown — perfect for sentiment analysis pipelines or archiving.

render.py
import requests

# Render a tweet page as clean markdown
resp = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://x.com/elonmusk/status/1234567890",
        "format": "markdown",
        "wait_for": "[data-testid='tweetText']",
        "use_flaresolverr": True
    }
)
print(resp.json()["data"]["markdown"])
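
To sketch the sentiment pipeline this enables, here is a deliberately naive scorer over the returned markdown. The keyword lexicons are invented for illustration; in practice you would feed the text to an LLM or a proper sentiment library instead:

```python
# Toy lexicons -- invented for illustration; swap in a real model or LLM
POSITIVE = {"incredible", "great", "love", "amazing", "win"}
NEGATIVE = {"terrible", "hate", "broken", "scam", "fail"}

def naive_sentiment(markdown_text):
    """Classify text by counting positive vs. negative lexicon hits."""
    words = {w.strip(".,!?").lower() for w in markdown_text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("The future of AI is going to be incredible..."))
# positive
```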

Extract structured data

Pull specific fields — text, likes, retweets, author — as clean JSON.

extract.py
import requests

# Extract structured tweet data with CSS selectors
resp = requests.post(
    "https://api.snaprender.dev/v1/extract",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://x.com/elonmusk/status/1234567890",
        "use_flaresolverr": True,
        "selectors": {
            "text": "[data-testid='tweetText']",
            "likes": "[data-testid='like'] span",
            "retweets": "[data-testid='retweet'] span",
            "replies": "[data-testid='reply'] span",
            "author": "[data-testid='User-Name'] a",
            "timestamp": "time"
        }
    }
)
print(resp.json())

Example response

response.json
{
  "status": "success",
  "data": {
    "text": "The future of AI is going to be incredible...",
    "likes": "142.5K",
    "retweets": "18.3K",
    "replies": "12.1K",
    "author": "@elonmusk",
    "timestamp": "2026-04-10T14:32:00.000Z"
  },
  "url": "https://x.com/elonmusk/status/1234567890",
  "elapsed_ms": 4120
}
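
Note that engagement counts come back as abbreviated display strings ("142.5K", not 142500). A small helper can normalize them before analysis, assuming the K/M/B suffix format shown in the response above:

```python
def parse_count(display):
    """Convert an abbreviated count like '142.5K' to an integer."""
    s = display.strip().replace(",", "")
    suffixes = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    if s and s[-1].upper() in suffixes:
        return int(round(float(s[:-1]) * suffixes[s[-1].upper()]))
    return int(float(s))

print(parse_count("142.5K"))  # 142500
print(parse_count("1,234"))   # 1234
```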

Practical use cases

Here are the most common ways developers use scraped Twitter/X data:

  1. Brand monitoring — track every mention of your brand or product. Feed tweets into an LLM to classify sentiment (positive, negative, neutral) and alert your team on spikes.
  2. Competitor intelligence — scrape competitor tweet engagement to understand what messaging resonates. Compare your engagement rates against theirs.
  3. Influencer discovery — find accounts with high engagement rates in your niche. Extract follower counts, average likes, and posting frequency to build outreach lists.
  4. Event tracking — monitor tweets about conferences, product launches, or breaking news. Build real-time dashboards that surface the most-engaged-with content.
  5. Academic research — build datasets of public discourse on specific topics. Twitter data is uniquely valuable because of its real-time, timestamped nature.
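
For the competitor-intelligence and influencer-discovery cases, the usual yardstick is engagement rate: interactions divided by audience size. A minimal sketch, using the example tweet's numbers plus an assumed follower count:

```python
def engagement_rate(likes, retweets, replies, followers):
    """Interactions per follower, expressed as a percentage."""
    return round(100 * (likes + retweets + replies) / followers, 2)

# Example tweet's metrics against an assumed 50M-follower account
print(engagement_rate(142_500, 18_300, 12_100, 50_000_000))  # 0.35
```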

Legal considerations

Twitter/X scraping exists in a legal gray area. Here is what you should know:

  1. Twitter/X's Terms of Service explicitly prohibit scraping. They have sued scrapers in the past (e.g., the 2023 lawsuit against data scrapers).
  2. Public tweets are generally considered public data, but the method of access matters. Bypassing technical barriers may raise legal issues.
  3. GDPR applies if you are scraping tweets from EU users. Tweet text combined with handles constitutes personal data under EU law.
  4. Never scrape DMs, protected accounts, or non-public information. Only collect publicly visible data.
  5. Rate-limit your requests and do not disrupt the platform. Excessive scraping could be considered unauthorized access.
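
The rate-limiting point is easy to honor in code. Here is a minimal politeness throttle that enforces a fixed gap between requests (a sketch; production scrapers usually add jitter and exponential backoff on errors):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        """Sleep just long enough to keep calls min_interval apart."""
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

limiter = RateLimiter(min_interval=2.0)
# Call limiter.wait() before each page fetch in your scraping loop
```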

Start free — 100 requests/month

Skip the $100/mo Twitter API. Get your SnapRender API key in 30 seconds and start extracting tweet data with a single API call.

Get Your API Key

Frequently asked questions

Is it legal to scrape Twitter/X data?

Scraping publicly available tweets is generally legal in the US, but Twitter/X's Terms of Service prohibit automated data collection. The platform actively detects and blocks scrapers. Use scraped data responsibly, respect rate limits, and never collect private or direct-message data. Consult a lawyer for commercial use cases.

Why not just use the official Twitter/X API?

Twitter/X's API now starts at $100/month for basic access (formerly free). The Pro tier costs $5,000/month. For many use cases — sentiment analysis, trend tracking, competitive research — the API cost is prohibitive. Web scraping offers an alternative for public data at a fraction of the cost.

Can you scrape Twitter/X without logging in?

Twitter/X now requires login to view most content. However, individual tweet pages and some profile pages are still accessible without authentication. SnapRender renders these public pages in a real browser, bypassing the login wall for publicly accessible content.

What data can you extract from a tweet?

From public tweet pages you can extract tweet text, like count, retweet/repost count, reply count, bookmark count, view count, timestamp, author handle, author display name, and media URLs (images and videos). SnapRender's /extract endpoint pulls all of these with CSS selectors.