How the Siteimprove Crawler Works
Summary
The Siteimprove crawler scans your website by following links, collecting HTML content, and generating data used across Siteimprove products.
Overview
This article explains how the Siteimprove crawler discovers, processes, and reports website data.
What is the Siteimprove crawler?
Web crawlers are computer programs that scan the web, ‘reading’ everything they find. A crawler starts by visiting your website and systematically identifying every hyperlink on each page. It then follows those links to new pages and repeats the process until it finds no new pages to visit.
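The link-following process described above can be sketched as a breadth-first crawl. This is a minimal, generic Python sketch, not Siteimprove's implementation. The function names `crawl` and `extract_links`, the `fetch` callable, the restriction to the start URL's host, and the `max_pages` limit are all illustrative assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Return absolute URLs (with #fragments removed) for every link on a page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(
    start_url: str,
    fetch: Callable[[str], Optional[str]],  # hypothetical: returns HTML or None
    max_pages: int = 1000,  # hypothetical safety limit
) -> Dict[str, List[str]]:
    """Breadth-first crawl that stays on the start URL's host.

    Returns a map of each successfully fetched URL to the links found on it.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # every URL is queued at most once
    site: Dict[str, List[str]] = {}
    while queue and len(site) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved, so there is nothing to parse
        links = extract_links(url, html)
        site[url] = links
        for link in links:
            # Follow only same-host links that have not been queued yet.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return site
```

Passing `dict.get` on an in-memory mapping of URL to HTML as `fetch` lets you try the crawl without network access. The `seen` set is what stops the crawler from looping forever when pages link back to each other.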
Our crawlers scan your website from Siteimprove servers, using specific IP addresses and identifiable user agents. They send HTTP (Hypertext Transfer Protocol) requests to collect each page's HTML code, which is then checked for errors.
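An HTTP request that identifies its sender through the `User-Agent` header might look like the following minimal sketch, built on Python's standard `urllib.request` module. The user agent string `ExampleCrawler/1.0`, the function names, and the choice to skip non-HTML responses are hypothetical placeholders; they are not Siteimprove's actual user agent or code.

```python
import urllib.request
from typing import Optional

# Hypothetical user agent for illustration only; not Siteimprove's real value.
USER_AGENT = "ExampleCrawler/1.0 (+https://example.com/crawler-info)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a GET request that identifies the crawler via the User-Agent header."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "text/html"},
        method="GET",
    )


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page and return its HTML, or None on failure or a non-HTML response."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            if response.headers.get_content_type() != "text/html":
                return None  # only HTML is collected for checking
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, LookupError):
        # HTTPError, URLError and socket timeouts are OSError subclasses;
        # LookupError covers an unknown charset name in the response header.
        return None
```

Because the user agent is sent with every request, a site owner can recognize the crawler's traffic in server logs, which is the point of using an identifiable user agent.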
The data harvested by the crawler is stored in Siteimprove's databases. Based on the content found on each page, issues such as accessibility problems, misspellings, and broken links are reported in Siteimprove's online platform.
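One kind of report mentioned above, broken links, could be derived from crawl data roughly as follows. This minimal sketch assumes the crawl produced a mapping of each page to the links it contains, and that a hypothetical `status_of` callable returns each link's HTTP status code, or `None` when no response arrives. Treating any 4xx or 5xx status as broken is also an assumption, not a description of Siteimprove's rules.

```python
from typing import Callable, Dict, List, Optional, Tuple


def find_broken_links(
    site: Dict[str, List[str]],
    status_of: Callable[[str], Optional[int]],  # hypothetical status lookup
) -> List[Tuple[str, str, Optional[int]]]:
    """Return (page, link, status) for every link that is unreachable or returns 4xx/5xx.

    Each distinct link is checked only once, even if many pages contain it.
    """
    cache: Dict[str, Optional[int]] = {}
    broken: List[Tuple[str, str, Optional[int]]] = []
    for page, links in site.items():
        for link in links:
            if link not in cache:
                cache[link] = status_of(link)
            status = cache[link]
            # None means no response at all; 400 and above are client/server errors.
            if status is None or status >= 400:
                broken.append((page, link, status))
    return broken
```

Reporting the page together with the link matters because the fix happens on the page that contains the bad link, not on the missing target. Caching statuses keeps a link that appears on hundreds of pages from being requested hundreds of times.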
Learn more about the Siteimprove crawler and how it identifies broken links.
Key Concepts
- Link discovery and crawling logic
- HTTP request-based scanning
- Data extraction and reporting