How does Siteimprove determine what is classified as internal and external in our inventory section?
This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.
Internal
Internal content on your site in QA, Accessibility, SEO, and Policy is defined primarily by your "index URL".
It can also be affected by including or excluding content in the crawl settings.
For example, on a website with the index URL https://siteimprove.com and no additional Site Content Settings configured, the following links are considered internal:
https://siteimprove.com/work/quality
https://siteimprove.com/maps.pdf
https://siteimprove.com/files/offices.pdf
Links that do not match the domain, including those on subdomains, are not considered internal to the site.
For example, the following links are not internal by default to the site with the index URL https://siteimprove.com:
https://www.siteimprove.com/maps.pdf
https://download.siteimprove.com/file.pdf
https://www.wikipedia.com/tell-me-something.pdf
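The default behavior described above can be sketched in a few lines of Python. This is only an illustration, not Siteimprove's actual implementation: the function name is_internal_by_default is hypothetical, and the sketch assumes that "matching the domain" means an exact comparison of the host in the link against the host in the index URL.

```python
from urllib.parse import urlsplit


def is_internal_by_default(link, index_url="https://siteimprove.com"):
    """Return True when the link's host equals the index URL's host exactly.

    Assumption: hosts are compared exactly, so subdomains such as
    "www." or "download." do not match the bare domain.
    """
    return urlsplit(link).hostname == urlsplit(index_url).hostname


links = [
    "https://siteimprove.com/maps.pdf",
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

# Print the label this simplified rule would give each link.
for link in links:
    label = "internal" if is_internal_by_default(link) else "external"
    print(f"{label}: {link}")
```

Under this assumption, only the first link in the list is labeled internal; the two subdomain links and the Wikipedia link are labeled external, which mirrors the examples above.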
In this case, if you did want links like the following
https://www.siteimprove.com/maps.pdf
https://download.siteimprove.com/file.pdf
to be considered internal, you'd need to either:
- Add separate sites to your account so those subdomains are crawled OR
- Set up an inclusion or exclusion in the configuration for your original site.
An inclusion or exclusion determines whether links seen during a crawl are regarded as internal or external.
If you set up an inclusion for www.siteimprove.com and download.siteimprove.com, then the site will regard all links matching these URL elements as internal.
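To make the effect of such a configuration concrete, here is a rough Python sketch. It is not Siteimprove's code: the classify helper and its inclusions parameter are hypothetical, exclusions are left out for brevity, and the sketch assumes that a URL element matches when it occurs anywhere in the link.

```python
from urllib.parse import urlsplit


def classify(link, index_url, inclusions=()):
    """Label a link "internal" or "external" (hypothetical helper)."""
    # Default rule (assumed): the link's host equals the index URL's host.
    if urlsplit(link).hostname == urlsplit(index_url).hostname:
        return "internal"
    # Assumption: an inclusion matches when its URL element appears
    # anywhere in the link.
    if any(element in link for element in inclusions):
        return "internal"
    return "external"


index_url = "https://siteimprove.com"
inclusions = ["www.siteimprove.com", "download.siteimprove.com"]

for link in [
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]:
    print(classify(link, index_url, inclusions), link)
```

With the two subdomain elements included, the sketch labels both subdomain links internal and leaves the Wikipedia link external.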
If a link is regarded as internal, Siteimprove will follow that link, crawl the content, and subsequently download, render, and store the content. This content will then be used for further analysis in the QA, Accessibility, SEO, and Policy products.
Note: We recommend that an inclusion or exclusion is set up to be as specific as possible to avoid a situation like the following.
If you configure Siteimprove to regard .pdf as internal by using that URL element as an inclusion or exclusion, then any link containing .pdf is considered internal, regardless of whether it is on your domain or on someone else's. So all of the PDFs listed below (including the Wikipedia PDF that you most likely do not want to check) would be considered internal to your site.
https://www.siteimprove.com/maps.pdf
https://download.siteimprove.com/file.pdf
https://www.wikipedia.com/tell-me-something.pdf
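One way to check how broad a URL element is before configuring it is to list the hosts it would pull in. The Python sketch below is purely illustrative: hosts_matched is a hypothetical helper, and it assumes a URL element matches when it appears anywhere in the link.

```python
from urllib.parse import urlsplit


def hosts_matched(url_element, links):
    """Return the set of hosts whose links contain the given URL element."""
    # Assumption: matching is a plain substring test on the full link.
    return {urlsplit(link).hostname for link in links if url_element in link}


links = [
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

# A broad element such as ".pdf" matches links on every host in the list.
print(sorted(hosts_matched(".pdf", links)))

# A specific element matches only the host it names.
print(sorted(hosts_matched("download.siteimprove.com", links)))
```

If the result contains hosts you do not own, the URL element is probably too broad.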
External
A link is considered external when its URL does not contain your domain name and no setting has been configured to include it.
For example, on the website siteimprove.com, the following links are considered external:
http://planning.com/work/quality
http://www.planning.com/maps.pdf
http://download.planning.com/file.pdf