Why scrape Amazon?
Thousands of businesses scrape Amazon every day for legitimate reasons:
Price monitoring
Track competitor pricing across thousands of ASINs. Adjust your own prices in real time.
Competitor analysis
Monitor new product launches, review velocity, and Best Sellers Rank changes.
Product research
Find high-demand, low-competition niches by analyzing ratings, reviews, and pricing gaps.
Method 1: DIY with Puppeteer
The most common starting point. Launch a headless browser, navigate to the product page, and extract data from the DOM. Here is a working example:
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: 'new',
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();

  // Set a realistic user-agent to avoid instant blocks
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ' +
      'AppleWebKit/537.36 (KHTML, like Gecko) ' +
      'Chrome/124.0.0.0 Safari/537.36'
  );

  await page.goto('https://www.amazon.com/dp/B0DGHZT3GF', {
    waitUntil: 'networkidle2',
    timeout: 30000,
  });

  // Wait for the product title to render
  await page.waitForSelector('#productTitle', { timeout: 10000 });

  const product = await page.evaluate(() => ({
    title: document.querySelector('#productTitle')?.innerText.trim(),
    price: document.querySelector('.a-price .a-offscreen')?.innerText,
    rating: document.querySelector('#acrPopover .a-icon-alt')?.innerText,
    reviews: document.querySelector('#acrCustomerReviewText')?.innerText,
  }));

  console.log(product);
  await browser.close();
})();
```

This works — until it doesn't. In practice, you will hit several pain points:
Pain points
- Amazon detects and blocks headless Chromium within minutes at scale
- CAPTCHAs appear randomly — you need a solving service ($2-5 per 1,000 solves)
- IP rotation requires residential proxies ($10-50/GB depending on provider)
- Product page layout changes break selectors without warning
- Rate limiting kicks in fast — too many requests and your IP is burned
- Memory overhead: each Puppeteer instance uses 200-400 MB of RAM
- Maintenance burden: you are now running a browser fleet, not building your product
For a one-off scrape of 10 products? Puppeteer works fine. For a production pipeline monitoring thousands of ASINs daily? The infra cost and maintenance will eat you alive.
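To put rough numbers on that, here is a back-of-envelope sketch using the ranges quoted in the pain points above; the concurrency, daily volume, and per-page transfer figures are illustrative assumptions, not measurements:

```python
# Rough DIY-fleet cost model (all inputs are illustrative assumptions)
instances = 10               # concurrent Puppeteer browsers (assumption)
ram_per_instance_mb = 300    # midpoint of the 200-400 MB range above
pages_per_day = 5_000        # ASINs checked daily (assumption)
mb_per_page = 2.5            # transfer per product page, images included (assumption)
proxy_cost_per_gb = 30.0     # midpoint of the $10-50/GB range above

ram_gb = instances * ram_per_instance_mb / 1024
proxy_cost_per_month = pages_per_day * mb_per_page / 1024 * proxy_cost_per_gb * 30

print(f"RAM for the fleet: {ram_gb:.1f} GB")
print(f"Residential proxy bill: ${proxy_cost_per_month:,.0f}/month")
```

Under these assumptions the proxy bill alone lands around $11,000/month, before you pay for compute, CAPTCHA solving, or the engineering time to keep the fleet alive.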
Method 2: SnapRender API
Same task, no browser to manage. SnapRender's /render endpoint returns the page as clean markdown, and the /extract endpoint pulls structured data with CSS selectors.
Render as markdown
Get the full product page content as LLM-ready markdown — perfect for feeding into AI pipelines or storing in a database.
```python
import requests

# Render the product page as clean markdown
render = requests.post(
    "https://api.snaprender.dev/v1/render",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://www.amazon.com/dp/B0DGHZT3GF",
        "format": "markdown",
        "use_flaresolverr": True,
    },
)

print(render.json()["data"]["markdown"])
```

Extract structured data
Use CSS selectors to pull exactly the fields you need. Returns clean JSON — no HTML parsing on your end.
```python
import requests

# Extract structured data with CSS selectors
extract = requests.post(
    "https://api.snaprender.dev/v1/extract",
    headers={"x-api-key": "sr_live_YOUR_KEY"},
    json={
        "url": "https://www.amazon.com/dp/B0DGHZT3GF",
        "use_flaresolverr": True,
        "selectors": {
            "title": "#productTitle",
            "price": ".a-price .a-offscreen",
            "rating": "#acrPopover .a-icon-alt",
            "reviews": "#acrCustomerReviewText",
        },
    },
)

print(extract.json())
```

Example response
```json
{
  "status": "success",
  "data": {
    "title": "Apple AirPods Pro (2nd Generation)",
    "price": "$189.99",
    "rating": "4.7 out of 5 stars",
    "reviews": "82,431 ratings"
  },
  "url": "https://www.amazon.com/dp/B0DGHZT3GF",
  "elapsed_ms": 2140
}
```

Handling Amazon's anti-bot protection
Amazon uses a layered defense: browser fingerprinting, CAPTCHA challenges, IP reputation scoring, and request pattern analysis. Most simple HTTP clients and headless browsers get blocked within a handful of requests.
SnapRender integrates FlareSolverr, which routes your request through a real Chromium session that passes these checks. Just add use_flaresolverr: true to any request body. No extra config, no proxy management, no CAPTCHA-solving service.
When to use the flag
Not every Amazon request needs FlareSolverr. Try your request without it first. If you get a CAPTCHA page or empty response, add the flag. Requests with FlareSolverr take slightly longer (3-8 seconds vs. 1-2 seconds) because they run through a full browser session.
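That try-first, retry-with-the-flag flow can be sketched in a few lines. This is a sketch, not official client code: the `looks_blocked` heuristic (treating an all-empty data object as a sign of a CAPTCHA or block) is an assumption, while the endpoint, headers, and payload mirror the examples above.

```python
import requests

API = "https://api.snaprender.dev/v1/extract"
HEADERS = {"x-api-key": "sr_live_YOUR_KEY"}

def looks_blocked(data: dict) -> bool:
    """Heuristic (assumption): all-empty fields suggest a CAPTCHA or blocked page."""
    return not any(v for v in data.values())

def extract(url: str, selectors: dict) -> dict:
    """Try without FlareSolverr first; retry with it only if the page looks blocked."""
    body = {"url": url, "selectors": selectors}
    resp = requests.post(API, headers=HEADERS, json=body).json()
    if looks_blocked(resp.get("data") or {}):
        body["use_flaresolverr"] = True  # slower (3-8 s) but passes anti-bot checks
        resp = requests.post(API, headers=HEADERS, json=body).json()
    return resp
```

This keeps the fast path (1-2 seconds) for requests that succeed without a browser session and pays the FlareSolverr latency only when needed.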
Legal considerations
Web scraping law is nuanced. Here is what you should know:
1. The hiQ v. LinkedIn ruling (2022) confirmed that scraping publicly available data is not a violation of the CFAA in the US. However, Amazon's Terms of Service explicitly prohibit automated access.
2. Always check robots.txt before scraping. Amazon's robots.txt disallows many paths — respect it where possible.
3. Rate-limit your requests. Hammering Amazon's servers can constitute a denial-of-service, which is illegal everywhere.
4. Never scrape personal data (customer names, emails, addresses). Stick to public product information.
5. If in doubt, consult a lawyer familiar with web scraping case law in your jurisdiction.
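The rate-limiting advice above can be as simple as a randomized pause between requests. A minimal sketch: the 2-3 second window is an assumption rather than a documented Amazon threshold, and `fetch` stands in for whatever scraping call you use.

```python
import random
import time

def polite_crawl(asins, fetch, min_delay=2.0, max_delay=3.0):
    """Fetch each ASIN with a randomized pause between requests."""
    results = {}
    for i, asin in enumerate(asins):
        results[asin] = fetch(asin)
        if i < len(asins) - 1:  # no need to sleep after the last request
            time.sleep(random.uniform(min_delay, max_delay))
    return results

# Usage with a stand-in fetcher (swap in a real SnapRender call):
data = polite_crawl(["B0DGHZT3GF"], fetch=lambda a: {"asin": a}, min_delay=0, max_delay=0)
print(data)
```

The jitter matters: perfectly regular intervals are themselves a bot signature that request-pattern analysis can flag.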
Start free — 100 requests/month
Get your API key in 30 seconds. Scrape Amazon product data with five lines of code. No credit card, no browser fleet, no proxy bills.
Frequently asked questions
Is it legal to scrape Amazon?
Scraping publicly available data is generally legal in the US after the hiQ v. LinkedIn ruling, but Amazon's Terms of Service prohibit automated access. Use scraped data responsibly, respect robots.txt, rate-limit your requests, and never scrape personal user data. Consult a lawyer for your specific use case.
How does SnapRender handle Amazon's anti-bot measures?
Amazon uses aggressive anti-bot measures including CAPTCHAs, IP blocking, and browser fingerprinting. SnapRender's use_flaresolverr flag handles these challenges automatically by routing requests through a real browser session. For DIY scrapers, you'd need rotating residential proxies and headless browser automation.
What data can you scrape from an Amazon product page?
Product title, price (current and list), star rating, review count, ASIN, availability status, bullet-point features, product images, seller name, and Best Sellers Rank. SnapRender's /extract endpoint lets you target any of these with CSS selectors.
How much does it cost to scrape Amazon with SnapRender?
SnapRender starts free with 100 requests/month. Paid plans begin at $9/month for 1,500 requests. There are no credit multipliers — a scraping request costs the same as a screenshot or PDF. Each Amazon product page is one request.
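The per-request arithmetic behind that pricing, using only the figures quoted above:

```python
# Cost per request on the entry paid plan quoted above
monthly_price = 9.00
included_requests = 1500
cost_per_request = monthly_price / included_requests
print(f"${cost_per_request:.3f} per request")  # $0.006 per request
```

At $0.006 per page, monitoring 1,500 ASINs daily costs $9/month in API fees, versus the proxy and CAPTCHA-solving line items a DIY fleet carries.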