Guide

Web Scraping for Market Research

13 min read

Market research traditionally costs thousands in analyst reports and survey tools. Web scraping gives you direct access to competitor pricing, customer sentiment, product catalogs, and market trends -- updated automatically and customized to your exact competitive landscape. This guide covers the complete market research scraping pipeline.

What you will learn

1. Competitor pricing scraping
2. Review sentiment analysis
3. Product catalog tracking
4. Market trend reports
5. Feature comparison extraction
6. Content strategy analysis
7. Automated intelligence reports
8. Multi-competitor monitoring

1. Competitor pricing analysis

Scrape competitor pricing pages to understand their plan structures, feature bundles, and price points:

pricing_scraper.py
import requests
import json
import time
from datetime import datetime

API_KEY = "sr_live_YOUR_KEY"

def scrape_competitor_pricing(competitor_url, selectors):
    """Scrape product pricing from a competitor site"""

    resp = requests.post(
        "https://api.snaprender.dev/v1/extract",
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": competitor_url,
            "selectors": selectors,
            "use_flaresolverr": True
        }
    )

    return resp.json()["data"]

# Define competitor product pages
competitors = {
    "competitor_a": {
        "pricing_url": "https://competitor-a.com/pricing",
        "selectors": {
            "plan_names": ".pricing-card h3",
            "prices": ".pricing-card .price",
            "features": ".pricing-card .feature-list li",
            "cta_text": ".pricing-card .cta-button"
        }
    },
    "competitor_b": {
        "pricing_url": "https://competitor-b.com/pricing",
        "selectors": {
            "plan_names": ".plan-header h2",
            "prices": ".plan-header .amount",
            "features": ".plan-features li",
            "cta_text": ".plan-cta"
        }
    }
}

pricing_data = {}
for name, config in competitors.items():
    data = scrape_competitor_pricing(
        config["pricing_url"],
        config["selectors"]
    )
    pricing_data[name] = data
    print(f"{name}: {len(data.get('plan_names', []))} plans found")
    time.sleep(2)  # polite delay between requests

print(json.dumps(pricing_data, indent=2))
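Scraped prices come back as display strings like "$49/mo" or "From $1,299", which can't be compared numerically. A small normalization step fixes that. This is a sketch, not part of the SnapRender API; the sample strings are illustrative:

```python
import re

def parse_price(price_text):
    """Convert a scraped price string like '$49/mo' or 'From $1,299' to a float."""
    match = re.search(r"[\d,]+(?:\.\d+)?", price_text)
    if not match:
        return None  # handles "Free", "Contact us", etc.
    return float(match.group().replace(",", ""))

sample_prices = ["$49/mo", "From $1,299", "Free", "€99.50"]
print([parse_price(p) for p in sample_prices])
# → [49.0, 1299.0, None, 99.5]
```

Run this on each competitor's `prices` list before charting or comparing plans, and decide explicitly how to treat `None` (free tiers and "contact sales" plans).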

2. Review sentiment analysis

Scrape competitor reviews from G2, Capterra, or Trustpilot to understand customer sentiment and pain points:

review_analysis.py
from collections import Counter

def scrape_competitor_reviews(product_url, review_selectors):
    """Scrape and analyze competitor product reviews"""

    resp = requests.post(
        "https://api.snaprender.dev/v1/extract",
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": product_url,
            "selectors": review_selectors,
            "use_flaresolverr": True
        }
    )

    return resp.json()["data"]

# Scrape G2 reviews for a competitor
g2_selectors = {
    "titles": ".review-title",
    "ratings": ".star-rating::attr(class)",
    "pros": ".review-pros p",
    "cons": ".review-cons p",
    "dates": ".review-date",
    "roles": ".reviewer-role"
}

reviews = scrape_competitor_reviews(
    "https://www.g2.com/products/competitor-tool/reviews",
    g2_selectors
)

# Simple sentiment analysis from pros/cons
pros = reviews.get("pros", [])
cons = reviews.get("cons", [])

# Extract common themes
def extract_themes(texts, top_n=10):
    """Find the most common words in review text"""
    stop_words = {"the", "a", "an", "is", "it", "to", "and", "of", "for", "in", "on",
                  "with", "that", "this", "was", "are", "but", "not", "very", "have", "has"}
    words = []
    for text in texts:
        words.extend([
            w.lower() for w in text.split()
            if len(w) > 3 and w.lower() not in stop_words
        ])
    return Counter(words).most_common(top_n)

print("=== Competitor Review Analysis ===")
print(f"Total reviews:  {len(reviews.get('titles', []))}")
print("\nTop POSITIVE themes:")
for word, count in extract_themes(pros):
    print(f"  {word}: {count}")
print("\nTop NEGATIVE themes:")
for word, count in extract_themes(cons):
    print(f"  {word}: {count}")
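The `ratings` selector above pulls the star widget's class attribute rather than a number, so the numeric rating has to be parsed out of class strings. The exact class naming (`stars-4`, `stars-4-5`) is an assumption about the site's markup; adjust the regex after inspecting the real HTML:

```python
import re
from statistics import mean

def parse_star_ratings(class_strings):
    """Extract numeric ratings from class attributes like 'star-rating stars-4'.

    Assumes a 'stars-N' or 'stars-N-M' (i.e. N.M) naming scheme -- verify
    against the actual site markup before relying on this.
    """
    ratings = []
    for cls in class_strings:
        m = re.search(r"stars-(\d+(?:-\d)?)", cls)
        if m:
            ratings.append(float(m.group(1).replace("-", ".")))
    return ratings

scraped_classes = ["star-rating stars-5", "star-rating stars-3", "star-rating stars-4-5"]
ratings = parse_star_ratings(scraped_classes)
print(f"Average rating: {mean(ratings):.2f}")
# → Average rating: 4.17
```

Tracking this average month over month is often more useful than any single scrape: a sliding rating signals a quality problem at a competitor before their customers start churning.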

3. Product catalog tracking

Track competitor product catalogs over time to spot new launches, discontinued products, and category shifts:

catalog_tracker.py
import pandas as pd

def track_product_catalog(site_name, category_url, selectors):
    """Scrape a competitor product catalog for changes"""

    resp = requests.post(
        "https://api.snaprender.dev/v1/extract",
        headers={
            "x-api-key": API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "url": category_url,
            "selectors": selectors,
            "use_flaresolverr": True
        }
    )

    data = resp.json()["data"]
    names = data.get("names", [])

    products = []
    for i in range(len(names)):
        products.append({
            "site": site_name,
            "name": names[i],
            "price": data["prices"][i] if i < len(data.get("prices", [])) else "",
            "category": data["categories"][i] if i < len(data.get("categories", [])) else "",
            "date_scraped": datetime.now().strftime("%Y-%m-%d"),
        })

    return products

# Track competitor catalog over time
current = track_product_catalog(
    "competitor_a",
    "https://competitor-a.com/products",
    {
        "names": ".product-card h3",
        "prices": ".product-card .price",
        "categories": ".product-card .category-badge",
    }
)

# Compare with previous scrape
try:
    previous = pd.read_csv("catalog_competitor_a.csv")
    prev_names = set(previous["name"])
    curr_names = set(p["name"] for p in current)

    new_products = curr_names - prev_names
    removed_products = prev_names - curr_names

    print(f"New products:     {len(new_products)}")
    print(f"Removed products: {len(removed_products)}")

    for p in new_products:
        print(f"  NEW: {p}")
    for p in removed_products:
        print(f"  REMOVED: {p}")
except FileNotFoundError:
    print("First scrape - no previous data to compare")

pd.DataFrame(current).to_csv("catalog_competitor_a.csv", index=False)
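The name diff above catches launches and removals, but not repricing of existing products. A pandas merge on product name surfaces those too. This sketch uses inline sample data with hypothetical product names; in practice `prev` and `curr` would come from the saved CSV and the latest scrape:

```python
import pandas as pd

# Previous and current snapshots (hypothetical sample data)
prev = pd.DataFrame({
    "name": ["Widget A", "Widget B"],
    "price": ["$10", "$25"],
})
curr = pd.DataFrame({
    "name": ["Widget A", "Widget B"],
    "price": ["$12", "$25"],
})

# Join on name; suffixes distinguish the old and new price columns
merged = prev.merge(curr, on="name", suffixes=("_old", "_new"))
changed = merged[merged["price_old"] != merged["price_new"]]

for _, row in changed.iterrows():
    print(f"PRICE CHANGE: {row['name']}: {row['price_old']} -> {row['price_new']}")
# → PRICE CHANGE: Widget A: $10 -> $12
```

An inner merge silently drops products present in only one snapshot, which is fine here because the set difference already reports those separately.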

Pro tip

Track catalog changes weekly and feed the delta into a Slack channel. When a competitor launches a new product or changes pricing, your team knows immediately -- not weeks later from an analyst report.
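Wiring the delta into Slack only takes a message formatter plus one POST to an incoming-webhook URL. The formatter below is a sketch; the webhook URL and product names are placeholders, and Slack's incoming webhooks accept a simple `{"text": ...}` JSON payload:

```python
def format_catalog_alert(competitor, new_products, removed_products):
    """Build a Slack incoming-webhook payload summarizing catalog changes."""
    lines = [f"*Catalog changes for {competitor}*"]
    for p in sorted(new_products):
        lines.append(f":new: {p}")
    for p in sorted(removed_products):
        lines.append(f":x: {p}")
    return {"text": "\n".join(lines)}

payload = format_catalog_alert("competitor_a", {"Gadget Pro"}, {"Gadget Classic"})
print(payload["text"])

# To deliver (SLACK_WEBHOOK_URL is your workspace's incoming-webhook URL):
# requests.post(SLACK_WEBHOOK_URL, json=payload)
```

Only post when the delta is non-empty; a daily "no changes" message trains the team to ignore the channel.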

4. Market intelligence reports

Combine all data sources into an automated market intelligence report:

market_report.py
import pandas as pd

def generate_market_report(pricing_data, review_data, catalog_data):
    """Generate a comprehensive market research report"""

    print("=" * 50)
    print("MARKET INTELLIGENCE REPORT")
    print(f"Generated: {datetime.now():%Y-%m-%d %H:%M}")
    print("=" * 50)

    # Pricing comparison
    print("\n--- PRICING LANDSCAPE ---")
    for competitor, data in pricing_data.items():
        plans = data.get("plan_names", [])
        prices = data.get("prices", [])
        print(f"\n{competitor}:")
        for i in range(len(plans)):
            price = prices[i] if i < len(prices) else "N/A"
            print(f"  {plans[i]}: {price}")

    # Review sentiment summary
    print("\n--- SENTIMENT ANALYSIS ---")
    print("Top complaints across competitors:")
    all_cons = []
    for comp, reviews in review_data.items():
        all_cons.extend(reviews.get("cons", []))
    for theme, count in extract_themes(all_cons, 5):
        print(f"  {theme}: mentioned {count} times")

    # Catalog velocity
    print("\n--- PRODUCT CATALOG VELOCITY ---")
    for comp, catalog in catalog_data.items():
        df = pd.DataFrame(catalog)
        print(f"{comp}: {len(df)} products tracked")

    print("\n--- OPPORTUNITIES ---")
    print("1. Gaps in competitor offerings (features they lack)")
    print("2. Common pain points (from review analysis)")
    print("3. Price positioning opportunities")
    print("4. New market segments (from catalog trends)")

# Run the report
generate_market_report(pricing_data, {"comp_a": reviews}, {"comp_a": current})

Build your market research engine

SnapRender handles JavaScript rendering, bot detection, and structured data extraction. Monitor competitors and generate market intelligence with a single API.

Get Your API Key — Free

Frequently asked questions

Is web scraping for market research legal?

Scraping publicly available data for market research and competitive analysis is a standard business practice. Many companies, including hedge funds and consulting firms, use web scraping at scale. Respect each site's ToS, use polite scraping rates, and use data for internal analysis only.

What competitive data can you collect with web scraping?

Competitor pricing and product catalogs, customer reviews and sentiment, product launch frequency, feature comparisons, market positioning (taglines, value props), content strategy (blog topics, keywords), and technology stack choices. Combine multiple data sources for comprehensive intelligence.

How do you analyze competitor review sentiment?

Use sentiment analysis (TextBlob, VADER, or a language model) to classify reviews as positive, negative, or neutral. Extract common themes by counting frequent nouns and adjectives. Track sentiment over time to spot quality issues or improvements in competitor products.

How often should you scrape each data source?

Pricing: daily to weekly. Product catalog changes: weekly. Reviews: weekly to monthly. Content strategy: monthly. Technology stack: quarterly. Market trends: depends on your industry velocity -- SaaS moves fast (weekly), manufacturing moves slowly (monthly).
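These cadences can live in a small config so one scheduler loop handles every source. A sketch, with the day counts taken from the guidance above; the structure and function names are illustrative:

```python
# Scrape cadence per data source, in days (from the cadences above)
MONITOR_SCHEDULE = {
    "pricing":    {"every_days": 1},   # daily to weekly
    "catalog":    {"every_days": 7},   # weekly
    "reviews":    {"every_days": 14},  # weekly to monthly
    "content":    {"every_days": 30},  # monthly
    "tech_stack": {"every_days": 90},  # quarterly
}

def is_due(last_run_days_ago, source):
    """Return True if a source is due for a fresh scrape."""
    return last_run_days_ago >= MONITOR_SCHEDULE[source]["every_days"]

print(is_due(8, "catalog"))   # True: catalog scrape is 1 day overdue
print(is_due(0, "pricing"))   # False: pricing was scraped today
```

A daily cron job that checks `is_due` for each source keeps fast-moving data fresh without hammering slow-moving pages.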

Is it legal to monitor competitor prices?

Yes, monitoring publicly displayed prices is a common competitive practice. Retailers, airlines, and SaaS companies all monitor competitor pricing. Do not scrape prices that require login or are personalized. Focus on publicly listed prices that any customer would see.