
How does Siteimprove determine what is classified as internal and external in our inventory section?

Modified on: Wed, 27 Sep, 2023 at 3:17 AM

This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.


Internal content on your site in QA, Accessibility, SEO, and Policy is defined primarily by your "index URL".
It can also be affected by including or excluding content in the crawl settings.

For example, on the website with the index URL, without any additional Site Content Settings configured for the site, the following links are considered internal:

… but links not matching the domain, even those on subdomains, are not considered internal to the site.

For example, the following links are not internal to the site with the index URL by default.
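The default rule described above can be sketched in a few lines of Python. This is only an illustration, not Siteimprove's actual implementation: it assumes that "matching the domain" means the link's host is exactly equal to the index URL's host, and the function name `is_internal_by_default` and the `example.com` URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_internal_by_default(link: str, index_url: str) -> bool:
    """Return True when the link's host equals the index URL's host.

    Assumption: an exact, case-insensitive host comparison stands in for
    "matching the domain". The real crawler may apply further rules.
    """
    link_host = (urlsplit(link).hostname or "").lower()
    index_host = (urlsplit(index_url).hostname or "").lower()
    return link_host != "" and link_host == index_host


# Hypothetical index URL and links, used only for illustration.
INDEX_URL = "https://www.example.com/"

for link in (
    "https://www.example.com/about",      # same host as the index URL
    "https://blog.example.com/post",      # subdomain: a different host
    "https://other-site.example/page",    # unrelated domain
):
    label = "internal" if is_internal_by_default(link, INDEX_URL) else "external"
    print(f"{link} -> {label}")
```

Under this assumption the subdomain link is treated as external, which mirrors the default behavior the article describes.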

In this case, if you did want links like the following

to be considered internal, you'd need to either:

  • Add separate sites to your account so those subdomains are crawled, or
  • Set up an inclusion or exclusion in the configuration for your original site.

An inclusion or exclusion determines whether links seen during a crawl are regarded as internal or external.

If you include or exclude for 

and set up 

then the site will regard all links matching these URL elements as internal.

If a link is regarded as internal, Siteimprove will follow that link, crawl the content, and subsequently download, render, and store the content. This content will then be used for further analysis in the QA, Accessibility, SEO, and Policy products.

Note: We recommend that an inclusion or exclusion is set up to be as specific as possible to avoid a situation like the following.

If you configure Siteimprove to regard .pdf as internal by using that URL element as an inclusion or exclusion, any link containing .pdf is considered internal, regardless of whether it is on your domain or on someone else's. All of the PDFs listed below (including the Wikipedia PDF that you most likely do not want to check) would therefore be considered internal to your site.
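To see why a broad URL element is risky, the following sketch compares a broad inclusion with a more specific one. It is a simplified model, not Siteimprove's matching logic: it assumes a URL element matches when it appears anywhere in the link as a plain substring, and the function `is_internal`, the element values, and all URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_internal(link: str, index_url: str, inclusions=()) -> bool:
    """Classify a link as internal (True) or external (False).

    Assumptions: the default rule is an exact host match against the
    index URL, and an inclusion matches when its URL element occurs
    anywhere in the link as a substring.
    """
    link_host = (urlsplit(link).hostname or "").lower()
    index_host = (urlsplit(index_url).hostname or "").lower()
    if link_host and link_host == index_host:
        return True
    return any(element in link for element in inclusions)


# Hypothetical index URL and links, used only for illustration.
INDEX_URL = "https://www.example.com/"
OWN_PDF = "https://docs.example.com/files/report.pdf"
FOREIGN_PDF = "https://other-site.example/papers/article.pdf"

broad = [".pdf"]                        # matches a PDF link on any domain
specific = ["docs.example.com/files/"]  # matches only one subdomain path

for name, elements in (("broad", broad), ("specific", specific)):
    for link in (OWN_PDF, FOREIGN_PDF):
        label = "internal" if is_internal(link, INDEX_URL, elements) else "external"
        print(f"{name:8} {link} -> {label}")
```

In this model the broad element pulls in the PDF on the unrelated domain as well, while the specific element only captures the PDF on the intended subdomain.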


A link is considered external when its URL does not contain your domain name and no setting has been configured to include it.

For example, on the website, the following links are considered external:
