How the Siteimprove Crawler Works

Modified on: Thu, 11 Jun, 2026 at 9:05 PM

Summary

The Siteimprove crawler scans your website by following links, collecting HTML content, and generating data used across Siteimprove products.

Overview

This article explains how the Siteimprove crawler discovers, processes, and reports website data.

What is the Siteimprove crawler?

Web crawlers are computer programs that scan the web, ‘reading’ everything they find. A crawler starts by visiting your website, systematically identifies the hyperlinks on every page, and then follows each link to its destination.
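The link-following behavior described above can be illustrated with a minimal sketch. This is not Siteimprove's implementation: the `crawl` function, the `LinkExtractor` class, the in-memory `SITE` dictionary, and the `example.com` URLs are all hypothetical, and treating a page that cannot be fetched as "broken" is a simplifying assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first traversal from start_url.

    fetch(url) returns the page's HTML, or None if the page cannot be
    retrieved. Returns ({url: html}, set of URLs that could not be fetched).
    """
    seen = {start_url}
    queue = deque([start_url])
    pages, broken = {}, set()
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.add(url)  # assumption: an unfetchable link target counts as broken
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" so each page is visited once.
            target, _fragment = urldefrag(urljoin(url, href))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages, broken


# Hypothetical site held in memory, standing in for real HTTP responses.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="/missing">Old</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="#top">Top</a>',
}

pages, broken = crawl("https://example.com/", SITE.get)
```

Here `pages` ends up holding the two reachable pages, and `broken` holds the one link target that has no page behind it. The `seen` set is what keeps the crawler from revisiting pages that link back to each other.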
Our crawlers scan your website from Siteimprove servers, using specific IP addresses and identifiable user agents. They send HTTP (Hypertext Transfer Protocol) requests to collect each page's HTML code, which is then checked for errors.
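As a rough illustration of what an HTTP request with an identifiable user agent looks like, the sketch below builds (but does not send) a GET request using Python's standard library. The `USER_AGENT` string and the `build_request` helper are hypothetical placeholders, not the values or code Siteimprove actually uses.

```python
from urllib.request import Request

# Hypothetical user agent string, for illustration only.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/bot)"


def build_request(url):
    """Build a GET request that identifies the crawler via the User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT}, method="GET")


# The request is only constructed here; nothing is sent over the network.
req = build_request("https://example.com/")
```

Because the user agent travels with every request, it appears in the web server's access logs, which is what makes the crawler's traffic identifiable.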
The data collected by the crawler is stored in Siteimprove's databases. Based on the content found on each page, issues such as accessibility problems, misspellings, and broken links are reported in Siteimprove's online platform.
Learn more about the Siteimprove crawler and how it identifies broken links.

Key Concepts

Link discovery and crawling logic
HTTP request-based scanning
Data extraction and reporting
