1. Setting up your project
Install the NuGet packages:

```shell
dotnet add package HtmlAgilityPack
dotnet add package AngleSharp
dotnet add package System.Text.Json
```

2. Scraping with HtmlAgilityPack (XPath)
HtmlAgilityPack is the most popular .NET HTML parser. It uses XPath expressions to query elements:
```csharp
using HtmlAgilityPack;

var web = new HtmlWeb();
web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
var doc = web.Load("https://books.toscrape.com/");

// Select all book articles using XPath
var books = doc.DocumentNode
    .SelectNodes("//article[@class='product_pod']");

foreach (var book in books ?? Enumerable.Empty<HtmlNode>())
{
    var title = book.SelectSingleNode(".//h3/a")
        ?.GetAttributeValue("title", "N/A");
    var price = book.SelectSingleNode(".//p[@class='price_color']")
        ?.InnerText.Trim();
    Console.WriteLine($"{title}: {price}");
}
```

XPath is powerful for complex queries. SelectNodes() returns all matches (or null when nothing matches, hence the null guard above), while SelectSingleNode() returns the first match.
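To show what a more complex XPath query looks like, here is a small sketch against an inline HTML string (the markup and class names below are invented purely for illustration) using contains() in a predicate:

```csharp
using HtmlAgilityPack;

var doc = new HtmlDocument();
// Hypothetical markup, just to demonstrate the query syntax
doc.LoadHtml("""
    <ul>
      <li class="item sale"><a href="/a">Widget</a><span>$5</span></li>
      <li class="item"><a href="/b">Gadget</a><span>$9</span></li>
    </ul>
    """);

// All <li> whose class attribute contains 'sale', taking the link inside
var saleLinks = doc.DocumentNode
    .SelectNodes("//li[contains(@class, 'sale')]/a");

foreach (var a in saleLinks ?? Enumerable.Empty<HtmlNode>())
    Console.WriteLine($"{a.InnerText} -> {a.GetAttributeValue("href", "")}");
```

The same null-guard idiom applies here: `contains(@class, ...)` is the usual way to match one class among several, since XPath's `@class='...'` compares the whole attribute string.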
3. Scraping with AngleSharp (CSS selectors)
AngleSharp provides a browser-like API with CSS selectors. If you come from JavaScript, this will feel familiar:
```csharp
using AngleSharp;
using AngleSharp.Dom;

var config = Configuration.Default
    .WithDefaultLoader()
    .WithDefaultCookies();
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(
    "https://books.toscrape.com/"
);

// CSS selectors - just like JavaScript
var books = document.QuerySelectorAll("article.product_pod");

foreach (var book in books)
{
    var title = book.QuerySelector("h3 a")
        ?.GetAttribute("title") ?? "N/A";
    var price = book.QuerySelector(".price_color")
        ?.TextContent.Trim();
    Console.WriteLine($"{title}: {price}");
}
```

4. Configuring HttpClient
Always reuse a single HttpClient instance and set realistic browser headers:
```csharp
using System.Net.Http.Headers;

// Reuse a single HttpClient instance
var handler = new HttpClientHandler
{
    AutomaticDecompression =
        System.Net.DecompressionMethods.GZip |
        System.Net.DecompressionMethods.Deflate
};
var client = new HttpClient(handler);

client.DefaultRequestHeaders.UserAgent.ParseAdd(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 Chrome/124.0.0.0 Safari/537.36"
);
client.DefaultRequestHeaders.Accept.ParseAdd(
    "text/html,application/xhtml+xml"
);
client.DefaultRequestHeaders.AcceptLanguage.ParseAdd("en-US,en;q=0.9");

var html = await client.GetStringAsync(
    "https://example.com/products"
);
Console.WriteLine($"Fetched {html.Length} chars");
```

Pro tip
Never create a new HttpClient per request. .NET's socket exhaustion problem is real. Use IHttpClientFactory in ASP.NET or a shared static instance in console apps.
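Concretely, that means something like the sketch below: a static field in a console app, or a named client registered with IHttpClientFactory in ASP.NET Core (the "scraper" client name here is arbitrary):

```csharp
// Console app: one static client for the lifetime of the process
static class Http
{
    public static readonly HttpClient Client = new()
    {
        Timeout = TimeSpan.FromSeconds(30)
    };
}

// ASP.NET Core: register a named client once at startup...
// builder.Services.AddHttpClient("scraper", c =>
//     c.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (...)"));
//
// ...then resolve it via the injected factory where needed:
// var client = httpClientFactory.CreateClient("scraper");
```

The factory route has the extra benefit of rotating the underlying handlers periodically, which avoids stale-DNS issues that a long-lived static client can hit.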
5. Async concurrent scraping
C#'s async/await and SemaphoreSlim make concurrent scraping elegant:
```csharp
using System.Collections.Concurrent;

var urls = Enumerable.Range(1, 50)
    .Select(i => $"https://example.com/page/{i}")
    .ToList();

var results = new ConcurrentBag<(string url, string data)>();
var semaphore = new SemaphoreSlim(5); // max 5 concurrent

var tasks = urls.Select(async url =>
{
    await semaphore.WaitAsync();
    try
    {
        var html = await client.GetStringAsync(url);
        // ... parse with HtmlAgilityPack or AngleSharp
        results.Add((url, html));
        Console.WriteLine($"Done: {url}");
    }
    catch (HttpRequestException ex)
    {
        Console.WriteLine($"Error on {url}: {ex.Message}");
    }
    finally
    {
        // Polite delay while still holding the semaphore slot,
        // so it actually spaces out requests instead of running
        // after the next task has already started
        await Task.Delay(Random.Shared.Next(1000, 3000));
        semaphore.Release();
    }
});
await Task.WhenAll(tasks);

Console.WriteLine($"Scraped {results.Count} pages");
```

6. Storing scraped data
System.Text.Json handles JSON serialization. For CSV, string interpolation is fine for simple values; if fields may contain commas or quotes, use the CsvHelper package, which handles escaping for you:

```csharp
using System.Text.Json;

var products = new List<Product>
{
    new("Widget Pro", "$29.99", "/products/widget-pro"),
    new("Gadget Max", "$49.99", "/products/gadget-max")
};

// Save to JSON
var json = JsonSerializer.Serialize(products,
    new JsonSerializerOptions { WriteIndented = true });
File.WriteAllText("products.json", json);

// Save to CSV (no escaping - fine for simple values like these)
var csv = "Name,Price,Url\n" + string.Join("\n",
    products.Select(p => $"{p.Name},{p.Price},{p.Url}"));
File.WriteAllText("products.csv", csv);

record Product(string Name, string Price, string Url);
```

7. Handling JavaScript pages with SnapRender
HttpClient and HtmlAgilityPack cannot execute JavaScript. SPAs built with React, Angular, or Blazor return empty shells. Use SnapRender to get fully-rendered content:
Render as markdown

```csharp
using System.Net.Http.Json;
using System.Text.Json;

var client = new HttpClient();
client.DefaultRequestHeaders.Add("x-api-key", "sr_live_YOUR_KEY");

// Render any JS-heavy page as clean markdown
var payload = new
{
    url = "https://example.com/spa-page",
    format = "markdown"
};
var response = await client.PostAsJsonAsync(
    "https://api.snaprender.dev/v1/render", payload
);

var json = await response.Content.ReadAsStringAsync();
var doc = JsonDocument.Parse(json);
var markdown = doc.RootElement
    .GetProperty("data")
    .GetProperty("markdown")
    .GetString();
Console.WriteLine(markdown);
```

Extract structured data
Use CSS selectors to pull specific fields. Returns clean JSON — no parsing needed.
```csharp
var payload = new
{
    url = "https://example.com/products/widget-pro",
    selectors = new Dictionary<string, string>
    {
        ["name"] = "h1.product-title",
        ["price"] = ".price-current",
        ["rating"] = ".star-rating",
        ["description"] = ".product-description p",
        ["in_stock"] = ".availability-status"
    }
};
var response = await client.PostAsJsonAsync(
    "https://api.snaprender.dev/v1/extract", payload
);
var result = await response.Content.ReadAsStringAsync();
Console.WriteLine(result);
```

Comparison: when to use what
| Approach | Best for | Limitation |
|---|---|---|
| HtmlAgilityPack | Static HTML, XPath queries | No JS rendering |
| AngleSharp | CSS selectors, modern API | JS support is experimental |
| Playwright .NET | Full browser automation | Heavy, slow, resource-hungry |
| SnapRender API | JS + anti-bot + scale | API cost at high volume |
Skip the browser infrastructure
SnapRender handles JavaScript rendering, anti-bot bypass, and data extraction. Just send a URL from your C# app and get results back as JSON.
Frequently asked questions
**Is C# good for web scraping?**

C# is excellent for web scraping, especially in enterprise environments. HtmlAgilityPack and AngleSharp are mature, well-maintained libraries. C# offers strong typing, async/await, and excellent IDE support. It integrates naturally into .NET pipelines, Azure Functions, and Windows services.
**What's the difference between HtmlAgilityPack and AngleSharp?**

HtmlAgilityPack uses XPath for querying and is the older, more established library. AngleSharp uses CSS selectors (like jQuery) and has a more modern API. AngleSharp also supports JavaScript execution via AngleSharp.Js. For most scraping, AngleSharp is the more developer-friendly choice.
**Can C# scrape JavaScript-rendered pages?**

Standard C# HTTP clients (HttpClient, RestSharp) cannot execute JavaScript. You can use Playwright for .NET, Selenium WebDriver, or AngleSharp.Js for local rendering. For production scraping, an API like SnapRender handles JS rendering server-side without browser dependencies.
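As a rough sketch of the Playwright for .NET route (requires the Microsoft.Playwright package and a one-time browser download; the URL is a placeholder):

```csharp
using Microsoft.Playwright;

using var playwright = await Playwright.CreateAsync();
await using var browser = await playwright.Chromium.LaunchAsync(
    new BrowserTypeLaunchOptions { Headless = true });
var page = await browser.NewPageAsync();

await page.GotoAsync("https://example.com/spa-page");
// Wait until client-side rendering settles, then grab the final DOM
await page.WaitForLoadStateAsync(LoadState.NetworkIdle);
var html = await page.ContentAsync();

Console.WriteLine($"Rendered {html.Length} chars");
```

The rendered `html` can then be fed into HtmlAgilityPack or AngleSharp exactly like a static page.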
**How do I avoid getting blocked while scraping?**

Set realistic User-Agent headers, implement random delays between requests, rotate proxies, and handle 429/403 responses with exponential backoff. For Cloudflare-protected sites, use SnapRender with the use_flaresolverr flag to bypass protection automatically.
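A simple retry wrapper with exponential backoff along those lines (the retry count and delay values are arbitrary starting points, not recommendations):

```csharp
using System.Net;

async Task<string?> GetWithBackoffAsync(
    HttpClient client, string url, int maxRetries = 4)
{
    for (var attempt = 0; attempt < maxRetries; attempt++)
    {
        var response = await client.GetAsync(url);

        if (response.StatusCode is not (HttpStatusCode.TooManyRequests
            or HttpStatusCode.Forbidden))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }

        // 1s, 2s, 4s, 8s... plus jitter so parallel retries don't synchronize
        var delay = TimeSpan.FromSeconds(Math.Pow(2, attempt))
                  + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 500));
        await Task.Delay(delay);
    }
    return null; // gave up after maxRetries attempts
}
```

Honoring a Retry-After header, when the server sends one, is a further refinement worth adding before the computed delay.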
**Should I use HttpClient or RestSharp?**

HttpClient is built into .NET and is the recommended choice for most scraping. It supports connection pooling, automatic decompression, and cookies. RestSharp adds convenience methods but introduces an extra dependency. For scraping, HttpClient with a shared instance is the standard approach.