1. Your first scraper with Colly
Colly is the most popular Go scraping framework. Install it and write a working scraper in minutes:
go get github.com/gocolly/colly/v2

package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (compatible; MyBot/1.0)"),
	)

	// Extract book titles and prices
	c.OnHTML("article.product_pod", func(e *colly.HTMLElement) {
		title := e.ChildAttr("h3 a", "title")
		price := e.ChildText(".price_color")
		fmt.Printf("%s: %s\n", title, price)
	})

	// Handle errors
	c.OnError(func(r *colly.Response, err error) {
		fmt.Printf("Error on %s: %v\n", r.Request.URL, err)
	})

	// Start scraping
	c.Visit("https://books.toscrape.com/")
}

Colly uses a callback pattern: you register handlers for HTML elements, errors, and responses. The OnHTML callback fires for every element matching the CSS selector.
2. Lower-level scraping with goquery
When you need more control over HTTP requests, use net/http with goquery for parsing:
go get github.com/PuerkitoBio/goquery

package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

type Book struct {
	Title string
	Price string
}

func main() {
	// Create HTTP client with custom headers
	client := &http.Client{}
	req, err := http.NewRequest("GET", "https://books.toscrape.com/", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("User-Agent",
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0")

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Parse with goquery
	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Extract data with CSS selectors
	var books []Book
	doc.Find("article.product_pod").Each(func(i int, s *goquery.Selection) {
		title, _ := s.Find("h3 a").Attr("title")
		price := s.Find(".price_color").Text()
		books = append(books, Book{Title: title, Price: price})
	})

	for _, book := range books[:5] {
		fmt.Printf("%s: %s\n", book.Title, book.Price)
	}
}

3. Concurrent scraping with rate limiting
Go's goroutines make concurrent scraping trivial. Colly has built-in async mode with rate limiting:
package main

import (
	"fmt"
	"sync"
	"time"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.Async(true),
	)

	// Rate limit: up to 5 parallel requests per domain, with a
	// 500ms base delay plus up to 1s of random delay per request
	c.Limit(&colly.LimitRule{
		DomainGlob:  "*",
		Parallelism: 5,
		Delay:       500 * time.Millisecond,
		RandomDelay: 1 * time.Second,
	})

	var mu sync.Mutex
	var results []map[string]string

	c.OnHTML("article.product_pod", func(e *colly.HTMLElement) {
		mu.Lock()
		results = append(results, map[string]string{
			"title": e.ChildAttr("h3 a", "title"),
			"price": e.ChildText(".price_color"),
		})
		mu.Unlock()
	})

	// Follow pagination
	c.OnHTML("li.next a", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.Visit("https://books.toscrape.com/")
	c.Wait()
	fmt.Printf("Scraped %d books\n", len(results))
}

Pro tip
Always use a mutex when appending to shared slices from concurrent callbacks. Colly's OnHTML handlers can fire from multiple goroutines simultaneously.
4. JavaScript pages with SnapRender
Colly and goquery cannot execute JavaScript. For React, Vue, and Angular SPAs, use SnapRender to get fully-rendered content via a simple HTTP call:
Render as markdown
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Render any page as clean markdown (handles JS)
	payload, _ := json.Marshal(map[string]string{
		"url":    "https://example.com/spa-page",
		"format": "markdown",
	})

	req, _ := http.NewRequest("POST",
		"https://api.snaprender.dev/v1/render",
		bytes.NewBuffer(payload))
	req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	var result map[string]interface{}
	json.Unmarshal(body, &result)
	data := result["data"].(map[string]interface{})
	fmt.Println(data["markdown"])
}

Extract structured data
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Extract structured data with CSS selectors
	payload, _ := json.Marshal(map[string]interface{}{
		"url": "https://example.com/products/widget-pro",
		"selectors": map[string]string{
			"name":        "h1.product-title",
			"price":       ".price-current",
			"rating":      ".star-rating",
			"description": ".product-description p",
			"in_stock":    ".availability-status",
		},
	})

	req, _ := http.NewRequest("POST",
		"https://api.snaprender.dev/v1/extract",
		bytes.NewBuffer(payload))
	req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}

Bypass anti-bot protection
// Bypass Cloudflare / anti-bot protection
payload, _ := json.Marshal(map[string]interface{}{
	"url":              "https://protected-site.com/data",
	"format":           "markdown",
	"use_flaresolverr": true,
})

req, _ := http.NewRequest("POST",
	"https://api.snaprender.dev/v1/render",
	bytes.NewBuffer(payload))
req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
req.Header.Set("Content-Type", "application/json")

resp, err := http.DefaultClient.Do(req)
if err != nil {
	panic(err)
}
defer resp.Body.Close()

body, _ := io.ReadAll(resp.Body)
// Returns fully rendered content even behind
// Cloudflare, DataDome, etc.
fmt.Println(string(body))

Comparison: when to use what
| Approach | Best for | Limitation |
|---|---|---|
| Colly | Static sites, crawling, speed | No JS rendering |
| goquery + net/http | Custom requests, full control | No JS rendering |
| chromedp | JS pages, local browser | High RAM, complexity |
| SnapRender API | JS + anti-bot + scale | API cost at high volume |
Skip the browser infrastructure
SnapRender handles JavaScript rendering, anti-bot bypass, and data extraction. One HTTP call from your Go program — no chromedp, no browser binaries.
Frequently asked questions
Is Go good for web scraping?

Go is excellent for high-performance web scraping. Goroutines make concurrent scraping trivial (thousands of parallel requests with minimal RAM), the standard library includes a solid HTTP client, and Colly is one of the fastest scraping frameworks in any language. Go scrapers compile to a single binary — no dependency management on servers.

What is Colly?

Colly is the most popular Go scraping framework. It provides a callback-based API for visiting pages, extracting data with CSS selectors (via goquery), following links, handling cookies, rate limiting, and caching. It supports parallel scraping out of the box and is used in production by companies processing millions of pages.

How does Go compare to Python for scraping?

Go is significantly faster for CPU-bound parsing and uses less memory per concurrent request (goroutines vs threads). Python has more scraping libraries and is easier to prototype with. Go wins for production scrapers processing millions of pages; Python wins for quick scripts and data analysis pipelines.

Can Go scrape JavaScript-rendered pages?

The standard net/http package and Colly cannot execute JavaScript. You can use chromedp (a Go Chrome DevTools Protocol client) for headless browser scraping, or use SnapRender's API to offload JavaScript rendering entirely — just send a URL via net/http and get back rendered HTML or markdown.

How do I scrape concurrently in Go?

Goroutines and channels are Go's killer feature for scraping. Use a worker pool pattern: spawn N goroutines reading URLs from a channel, each making HTTP requests and sending results back through another channel. Colly has built-in parallelism with configurable concurrency limits via Async() and Limit().