
Siteimprove Content Suite: Data Flows and Compliance

Modified on: Fri, 20 Feb, 2026 at 9:04 PM


What is the Siteimprove Content Suite?


The Siteimprove Content Suite is a platform that helps organizations monitor and improve their website content across four key areas: Quality Assurance (QA), Accessibility, SEO, and Policy Compliance.


How Crawling Works

The Content Suite uses a configurable crawler to scan your website. The crawler is primed with a set of configurations to yield optimal results, among them:

  • User agent strings to identify the crawler
  • Proxies to manage IP distribution or to access geo-specific content
  • Renderer configurations to correctly render the site’s pages (removing cookie banners, scrolling down, wait times, etc.)

Starting from a set of seed URLs, the crawler renders pages like a browser and extracts content, metadata, links, and accessibility elements. An illustrative sketch of such a configuration follows below.
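
As an illustration only, the hypothetical Python sketch below models the kind of configuration described above. The field names (user_agent, proxies, renderer, seed_urls) and their defaults are assumptions made for the example, not Siteimprove's actual configuration schema.

from dataclasses import dataclass, field

# Hypothetical sketch of a crawler configuration. All field names and defaults
# are illustrative assumptions, not Siteimprove's actual configuration schema.

@dataclass
class RendererConfig:
    dismiss_cookie_banners: bool = True   # remove cookie banners before capture
    scroll_to_bottom: bool = True         # trigger lazy-loaded content
    wait_seconds: float = 2.0             # settle time before extracting the DOM

@dataclass
class CrawlerConfig:
    user_agent: str = "ExampleCrawler/1.0 (+https://www.example.com/bot)"
    proxies: list[str] = field(default_factory=list)                  # IP distribution / geo-specific access
    renderer: RendererConfig = field(default_factory=RendererConfig)  # page rendering behaviour
    seed_urls: list[str] = field(default_factory=lambda: ["https://www.example.com/"])

# Example: a crawl routed through a (hypothetical) EU proxy
config = CrawlerConfig(proxies=["http://proxy-eu.example.net:8080"])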


Dataflow Overview

At a high level, data moves through the Content Suite in the following steps (a sketch of the sequence follows the list):

  • Crawling: Siteimprove crawls your website using the configured settings.
  • Data Transfer: The crawled data is securely transferred to AWS (Amazon Web Services).    
  • Processing & Storage: Content Suite application services (QA, Accessibility, SEO, Policy) process and store the data in AWS.
  • Analysis & Reporting: Each product uses this data to identify issues, generate insights, and provide actionable recommendations.


How the Data Is Used

  • QA: Finds broken links, typos, and outdated content.
  • Accessibility: Checks for WCAG compliance and usability issues.
  • SEO: Analyzes metadata, structure, and crawlability.
  • Policy: Flags content that violates internal or legal guidelines (a minimal example of such a check is sketched below).
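
The Python sketch below shows, in simplified form, the kind of rule-based check a policy product might run against crawled page text. The rule names and patterns are invented for the example and are not real Siteimprove policies.

import re

# Illustrative only: a minimal rule-based content check. The rule names and
# patterns below are invented examples, not real Siteimprove policies.

POLICY_RULES = {
    "outdated-branding": re.compile(r"\bOldProductName\b"),
    "unapproved-claim":  re.compile(r"\bguaranteed results\b", re.IGNORECASE),
}

def check_policies(page_text):
    """Return the names of the policy rules that the page text violates."""
    return [name for name, pattern in POLICY_RULES.items() if pattern.search(page_text)]

print(check_policies("Try OldProductName today for guaranteed results!"))
# -> ['outdated-branding', 'unapproved-claim']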

How do the Siteimprove Crawler and the Content Products work?


Figure 1 - Map of dataflow for EU Datacenter


Figure 2 - Map of dataflow for US Datacenter





