
How Siteimprove Classifies Internal vs External Content

Modified on: Tue, 30 Jun, 2026 at 7:47 PM

Summary

Siteimprove classifies content as internal or external primarily based on your site’s index URL and crawl configuration. Content matching your domain (or configured inclusions) is treated as internal, while anything outside those rules is considered external.

Overview

This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.

What Is Internal Content?

In QA, Accessibility, SEO, and Policy, internal content is defined primarily by your site's index URL.
Crawl settings that include or exclude content can also change what counts as internal.

For example, on a website with the index URL https://siteimprove.com and no additional Site Content Settings configured, the following links are considered internal:

  • https://siteimprove.com/work/quality
  • https://siteimprove.com/maps.pdf
  • https://siteimprove.com/files/offices.pdf

However, links that do not match the domain, including links on subdomains, are not considered internal to the site.

For example, the following links are not internal to the site with the index URL https://siteimprove.com by default:

  • https://www.siteimprove.com/maps.pdf
  • https://download.siteimprove.com/file.pdf
  • https://www.wikipedia.com/tell-me-something.pdf
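
The default behavior described above can be sketched in Python. This is an illustration only, not Siteimprove's implementation: the function name is_internal_by_default is hypothetical, and the exact host comparison is an assumption inferred from the examples in this article.

```python
from urllib.parse import urlsplit

INDEX_URL = "https://siteimprove.com"


def is_internal_by_default(link: str, index_url: str) -> bool:
    """Hypothetical default rule: a link is internal only when its host
    is exactly the host of the index URL (subdomains do not match)."""
    link_host = (urlsplit(link).hostname or "").lower()
    index_host = (urlsplit(index_url).hostname or "").lower()
    return link_host == index_host


# Example links taken from this article.
links = [
    "https://siteimprove.com/work/quality",
    "https://siteimprove.com/maps.pdf",
    "https://siteimprove.com/files/offices.pdf",
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

for link in links:
    label = "internal" if is_internal_by_default(link, INDEX_URL) else "external"
    print(f"{label:8}  {link}")
```

Under these assumptions, the first three links are reported as internal and the last three as external, which matches the two lists above.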

How Crawl Settings Affect Classification

If you want certain subdomains or URLs to be considered internal, you need to:

  • Add separate sites to your account so those subdomains are crawled, OR
  • Set up an inclusion or exclusion in the configuration for your original site

An inclusion or exclusion dictates whether links seen during a crawl should be regarded as internal or external.

If configured correctly, links matching specified URL elements (such as subdomains) will be treated as internal.
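
As a rough sketch of how inclusion and exclusion rules could change the classification, the Python example below layers rules on top of the default host check. The function classify, the substring matching, and the choice to check exclusions before inclusions are all assumptions made for illustration; the article does not describe how Siteimprove matches or orders its rules.

```python
from urllib.parse import urlsplit

INDEX_URL = "https://siteimprove.com"


def classify(link, index_url, inclusions=(), exclusions=()):
    """Return "internal" or "external" for a link (hypothetical model).

    Assumptions: rules are plain substrings matched against the full URL,
    and exclusions are checked before inclusions.
    """
    url = link.lower()
    if any(rule.lower() in url for rule in exclusions):
        return "external"
    if any(rule.lower() in url for rule in inclusions):
        return "internal"
    # No rule matched: fall back to comparing hosts with the index URL.
    same_host = urlsplit(link).hostname == urlsplit(index_url).hostname
    return "internal" if same_host else "external"


# Without rules, the subdomain link is external.
print(classify("https://download.siteimprove.com/file.pdf", INDEX_URL))

# With a specific inclusion for the subdomain, it becomes internal.
print(classify(
    "https://download.siteimprove.com/file.pdf",
    INDEX_URL,
    inclusions=["download.siteimprove.com"],
))
```

In this model, the same inclusion leaves https://www.wikipedia.com/tell-me-something.pdf external, because the rule names only the download subdomain.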

If a link is regarded as internal, Siteimprove will:

  • Follow the link
  • Crawl the content
  • Download, render, and store the content
  • Use it for analysis in QA, Accessibility, SEO, and Policy

Important Considerations

  • Configure inclusions and exclusions as specifically as possible
  • Broad rules (for example, including all .pdf URLs) may unintentionally classify external content as internal

For example, if .pdf is broadly included, all of the following would be treated as internal, even if they are on external domains:

  • https://www.siteimprove.com/maps.pdf
  • https://download.siteimprove.com/file.pdf
  • https://www.wikipedia.com/tell-me-something.pdf
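
The difference between a broad and a specific rule can be checked with a short Python sketch. The helper matches_rule is hypothetical and assumes that a rule is a plain substring of the URL, which may not be how Siteimprove evaluates its rules.

```python
LINKS = [
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]


def matches_rule(link: str, rule: str) -> bool:
    """Hypothetical matcher: the rule is a case-insensitive substring of the URL."""
    return rule.lower() in link.lower()


# A broad rule matches every PDF, including the one on an unrelated domain.
broad_matches = [link for link in LINKS if matches_rule(link, ".pdf")]

# A specific rule matches only the subdomain it names.
specific_matches = [
    link for link in LINKS
    if matches_rule(link, "https://download.siteimprove.com/")
]

print(len(broad_matches), "links match the broad rule")
print(len(specific_matches), "link matches the specific rule")
```

Here the broad rule matches all three links, while the specific rule matches only https://download.siteimprove.com/file.pdf.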

What Is External Content?

A link is considered external when:

  • Its URL does not match your site's domain (by default, links on subdomains do not match)
  • No inclusion rule has been configured to treat it as internal

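For example, on the website siteimprove.com with no inclusion rules configured, the links listed earlier as not internal by default are considered external:

  • https://www.siteimprove.com/maps.pdf
  • https://download.siteimprove.com/file.pdf
  • https://www.wikipedia.com/tell-me-something.pdf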


