How does Siteimprove determine what is classified as internal and external in our inventory section?
This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.
Internal content on your site in QA, Accessibility, SEO, and Policy is defined primarily by your "index URL".
It can also be affected by configuring an alias in the crawl settings.
For example, on a website with the index URL https://siteimprove.com and no additional aliases configured, the following links are considered internal:
Links that do not match the domain, including those on subdomains, are not considered internal to the site.
For example, the following links are not internal to the site with the index URL https://siteimprove.com by default.
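The default rule described above can be sketched in Python. This is a simplified model, not Siteimprove's actual implementation: it assumes that "matching the domain" means the link's host is exactly equal to the index URL's host. The function name `is_internal_by_default` and the sample links are hypothetical.

```python
from urllib.parse import urlsplit


def is_internal_by_default(link: str, index_url: str) -> bool:
    """Return True when the link's host equals the index URL's host.

    Assumption: an exact, case-insensitive host comparison stands in for
    "matching the domain", so subdomains do not match.
    """
    link_host = (urlsplit(link).hostname or "").lower()
    index_host = (urlsplit(index_url).hostname or "").lower()
    return bool(index_host) and link_host == index_host


INDEX_URL = "https://siteimprove.com"

# Hypothetical sample links, for illustration only.
print(is_internal_by_default("https://siteimprove.com/about", INDEX_URL))      # True
print(is_internal_by_default("https://blog.siteimprove.com/post", INDEX_URL))  # False
```

Under this assumption, a link on a subdomain is treated as external until an alias or a separate site covers it.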
In this case, if you did want links like the following
to be considered internal, you'd need to either:
- Add separate sites to your account so those subdomains are crawled, or
- Set up an alias in the configuration for your original site.
An alias dictates whether links seen during a crawl should be regarded as internal or external.
If you create an alias for
and another alias for
then the site will regard all links matching these URL elements as internal.
If a link is regarded as internal, Siteimprove will follow that link, crawl the content, and subsequently download, render, and store the content. This content will then be used for further analysis in the QA, Accessibility, SEO, and Policy products.
Note: We recommend making each alias as specific as possible to avoid a situation like the following.
If you configure .pdf as an alias, Siteimprove will treat any link containing .pdf as internal, regardless of whether it is on your domain or someone else's. All of the PDFs listed below (including the Wikipedia PDF, which you most likely do not want to check) would then be considered internal to your site.
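To see why a broad alias over-matches, the following sketch models an alias as a plain substring test against the full link. That substring behavior is an assumption drawn from the description above, not Siteimprove's actual matching logic. The function name `matches_alias` and all URLs are hypothetical.

```python
from typing import Iterable


def matches_alias(link: str, aliases: Iterable[str]) -> bool:
    """Return True if any alias text appears anywhere in the link."""
    return any(alias in link for alias in aliases)


# Hypothetical links: one on your own domain, one on a third-party domain.
links = [
    "https://siteimprove.com/files/report.pdf",
    "https://example.org/files/paper.pdf",
]

broad = [".pdf"]                       # matches PDFs on any domain
specific = ["siteimprove.com/files/"]  # matches only this path on your domain

for link in links:
    print(link, matches_alias(link, broad), matches_alias(link, specific))
```

In this model the broad alias marks both links as internal, while the more specific alias matches only the link on your own domain.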
A link is considered external when its URL does not contain your domain name and no alias has been configured to include it.
For example, on the website siteimprove.com, the following links are considered external: