
Web Scraping with Go: The Complete Guide


Go is the performance choice for web scraping. Goroutines handle thousands of concurrent requests with minimal memory, Colly provides a battle-tested scraping framework, and goquery brings jQuery-like CSS selectors to server-side parsing. This guide covers everything from your first Colly scraper to production-grade concurrent pipelines with SnapRender.

What you will learn

1. Colly framework basics
2. goquery CSS selectors
3. net/http for custom requests
4. Concurrent scraping patterns
5. Rate limiting with Colly
6. Handling JS-rendered pages
7. Anti-bot bypass with SnapRender
8. Structured data extraction

1. Your first scraper with Colly

Colly is the most popular Go scraping framework. Install it and write a working scraper in minutes:

terminal
go get github.com/gocolly/colly/v2
main.go
package main

import (
    "fmt"
    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        colly.UserAgent("Mozilla/5.0 (compatible; MyBot/1.0)"),
    )

    // Extract book titles and prices
    c.OnHTML("article.product_pod", func(e *colly.HTMLElement) {
        title := e.ChildAttr("h3 a", "title")
        price := e.ChildText(".price_color")
        fmt.Printf("%s: %s\n", title, price)
    })

    // Handle errors
    c.OnError(func(r *colly.Response, err error) {
        fmt.Printf("Error on %s: %v\n", r.Request.URL, err)
    })

    // Start scraping
    c.Visit("https://books.toscrape.com/")
}

Colly uses a callback pattern: you register handlers for HTML elements, errors, and responses. The OnHTML callback fires for every element matching the CSS selector.

2. Lower-level scraping with goquery

When you need more control over HTTP requests, use net/http with goquery for parsing:

terminal
go get github.com/PuerkitoBio/goquery
goquery_scraper.go
package main

import (
    "fmt"
    "log"
    "net/http"

    "github.com/PuerkitoBio/goquery"
)

type Book struct {
    Title string
    Price string
}

func main() {
    // Create HTTP client with custom headers
    client := &http.Client{}
    req, _ := http.NewRequest("GET", "https://books.toscrape.com/", nil)
    req.Header.Set("User-Agent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0.0.0")

    resp, err := client.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Parse with goquery
    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal(err)
    }

    // Extract data with CSS selectors
    var books []Book
    doc.Find("article.product_pod").Each(func(i int, s *goquery.Selection) {
        title, _ := s.Find("h3 a").Attr("title")
        price := s.Find(".price_color").Text()
        books = append(books, Book{Title: title, Price: price})
    })

    // Print the first five results (guard against short result sets,
    // since books[:5] panics when fewer than 5 books were found)
    limit := 5
    if len(books) < limit {
        limit = len(books)
    }
    for _, book := range books[:limit] {
        fmt.Printf("%s: %s\n", book.Title, book.Price)
    }
}

3. Concurrent scraping with rate limiting

Go's goroutines make concurrent scraping trivial. Colly has built-in async mode with rate limiting:

concurrent.go
package main

import (
    "fmt"
    "sync"
    "time"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector(
        colly.Async(true),
    )

    // Rate limit: at most 5 parallel requests per domain, with a
    // 0.5s base delay plus up to 1s of random jitter between requests
    c.Limit(&colly.LimitRule{
        DomainGlob:  "*",
        Parallelism: 5,
        Delay:       500 * time.Millisecond,
        RandomDelay: 1 * time.Second,
    })

    var mu sync.Mutex
    var results []map[string]string

    c.OnHTML("article.product_pod", func(e *colly.HTMLElement) {
        mu.Lock()
        results = append(results, map[string]string{
            "title": e.ChildAttr("h3 a", "title"),
            "price": e.ChildText(".price_color"),
        })
        mu.Unlock()
    })

    // Follow pagination
    c.OnHTML("li.next a", func(e *colly.HTMLElement) {
        e.Request.Visit(e.Attr("href"))
    })

    c.Visit("https://books.toscrape.com/")
    c.Wait()

    fmt.Printf("Scraped %d books\n", len(results))
}

Pro tip

Always use a mutex when appending to shared slices from concurrent callbacks. Colly's OnHTML handlers can fire from multiple goroutines simultaneously.

4. JavaScript pages with SnapRender

Colly and goquery cannot execute JavaScript. For React, Vue, and Angular SPAs, use SnapRender to get fully-rendered content via a simple HTTP call:

Render as markdown

render.go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Render any page as clean markdown (handles JS)
    payload, _ := json.Marshal(map[string]string{
        "url":    "https://example.com/spa-page",
        "format": "markdown",
    })

    req, _ := http.NewRequest("POST",
        "https://api.snaprender.dev/v1/render",
        bytes.NewBuffer(payload))
    req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)

    var result map[string]interface{}
    if err := json.Unmarshal(body, &result); err != nil {
        panic(err)
    }

    data, ok := result["data"].(map[string]interface{})
    if !ok {
        panic("unexpected response shape")
    }
    fmt.Println(data["markdown"])
}

Extract structured data

extract.go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Extract structured data with CSS selectors
    payload, _ := json.Marshal(map[string]interface{}{
        "url": "https://example.com/products/widget-pro",
        "selectors": map[string]string{
            "name":        "h1.product-title",
            "price":       ".price-current",
            "rating":      ".star-rating",
            "description": ".product-description p",
            "in_stock":    ".availability-status",
        },
    })

    req, _ := http.NewRequest("POST",
        "https://api.snaprender.dev/v1/extract",
        bytes.NewBuffer(payload))
    req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body))
}
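Printing the raw body is fine for exploration, but production code usually decodes into a typed struct. The field layout below is an assumption about the /v1/extract response shape (a `data` object keyed by your selector names) — verify it against SnapRender's API docs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExtractResult mirrors the assumed shape of the /v1/extract
// response; the schema here is hypothetical.
type ExtractResult struct {
	Data struct {
		Name        string `json:"name"`
		Price       string `json:"price"`
		Rating      string `json:"rating"`
		Description string `json:"description"`
		InStock     string `json:"in_stock"`
	} `json:"data"`
}

// parseExtract decodes a raw response body into the typed struct.
func parseExtract(body []byte) (ExtractResult, error) {
	var r ExtractResult
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Sample payload standing in for a live API response.
	sample := []byte(`{"data":{"name":"Widget Pro","price":"$49.99","in_stock":"In stock"}}`)

	result, err := parseExtract(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s (%s)\n", result.Data.Name, result.Data.Price, result.Data.InStock)
}
```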

Bypass anti-bot protection

bypass.go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Bypass Cloudflare / anti-bot protection
    payload, _ := json.Marshal(map[string]interface{}{
        "url":              "https://protected-site.com/data",
        "format":           "markdown",
        "use_flaresolverr": true,
    })

    req, _ := http.NewRequest("POST",
        "https://api.snaprender.dev/v1/render",
        bytes.NewBuffer(payload))
    req.Header.Set("x-api-key", "sr_live_YOUR_KEY")
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    body, _ := io.ReadAll(resp.Body)
    // Returns fully rendered content even behind
    // Cloudflare, DataDome, etc.
    fmt.Println(string(body))
}

Comparison: when to use what

Approach             Best for                         Limitation
--------             --------                         ----------
Colly                Static sites, crawling, speed    No JS rendering
goquery + net/http   Custom requests, full control    No JS rendering
chromedp             JS pages, local browser          High RAM, complexity
SnapRender API       JS + anti-bot + scale            API cost at high volume

Skip the browser infrastructure

SnapRender handles JavaScript rendering, anti-bot bypass, and data extraction. One HTTP call from your Go program — no chromedp, no browser binaries.

Get Your API Key — Free

Frequently asked questions

Is Go good for web scraping?

Go is excellent for high-performance web scraping. Goroutines make concurrent scraping trivial (thousands of parallel requests with minimal RAM), the standard library includes a solid HTTP client, and Colly is one of the fastest scraping frameworks in any language. Go scrapers compile to a single binary — no dependency management on servers.

What is Colly?

Colly is the most popular Go scraping framework. It provides a callback-based API for visiting pages, extracting data with CSS selectors (via goquery), following links, handling cookies, rate limiting, and caching. It supports parallel scraping out of the box and is used in production by companies processing millions of pages.

Is Go faster than Python for web scraping?

Go is significantly faster for CPU-bound parsing and uses less memory per concurrent request (goroutines vs threads). Python has more scraping libraries and is easier to prototype with. Go wins for production scrapers processing millions of pages; Python wins for quick scripts and data analysis pipelines.

Can Go scrape JavaScript-rendered pages?

The standard net/http package and Colly cannot execute JavaScript. You can use chromedp (a Go Chrome DevTools Protocol client) for headless browser scraping, or use SnapRender's API to offload JavaScript rendering entirely — just send a URL via net/http and get back rendered HTML or markdown.

How do I scrape concurrently in Go?

Goroutines and channels are Go's killer feature for scraping. Use a worker pool pattern: spawn N goroutines reading URLs from a channel, each making HTTP requests and sending results back through another channel. Colly has built-in parallelism with configurable concurrency limits via Async() and Limit().
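The worker-pool pattern described above can be sketched with the standard library alone. `fetch` here is a hypothetical stand-in for a real HTTP call (swap in http.Get plus parsing in practice):

```go
package main

import (
	"fmt"
	"sync"
)

// fetch stands in for a real HTTP request; replace with http.Get
// (plus parsing) in a real scraper.
func fetch(url string) string {
	return "fetched " + url
}

// scrapeAll runs a fixed-size worker pool: workers read URLs from
// one channel and push results onto another.
func scrapeAll(urls []string, workers int) []string {
	jobs := make(chan string)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				out <- fetch(u)
			}
		}()
	}

	// Feed the jobs, then close the results channel once every
	// worker has finished.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b"}
	for _, r := range scrapeAll(urls, 2) {
		fmt.Println(r)
	}
}
```

Note that results arrive in completion order, not input order; attach the URL to each result if ordering matters.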