How Siteimprove Classifies Internal vs External Content
Summary
Siteimprove classifies content as internal or external primarily based on your site’s index URL and crawl configuration. Content matching your domain (or configured inclusions) is treated as internal, while anything outside those rules is considered external.
Overview
This article explains how Siteimprove's tools determine internal and external inventory, including pages, documents, media files, links, and other digital assets.
What Is Internal Content?
Internal content on your site in QA, Accessibility, SEO, and Policy is defined primarily by your index URL.
It can also be affected by including or excluding content in the crawl settings.
For example, on the website with the index URL https://siteimprove.com, without any additional Site Content Settings being configured for the site, the following links are considered internal:
- https://siteimprove.com/work/quality
- https://siteimprove.com/maps.pdf
- https://siteimprove.com/files/offices.pdf
However, links whose host does not exactly match the domain of the index URL are not considered internal to the site. This includes subdomains such as www.
For example, the following links are not internal to the site with the index URL https://siteimprove.com by default:
- https://www.siteimprove.com/maps.pdf
- https://download.siteimprove.com/file.pdf
- https://www.wikipedia.com/tell-me-something.pdf
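The default rule can be pictured as an exact host comparison. The sketch below is a simplified model for illustration only, not Siteimprove's actual implementation. It assumes that only the host part of the URL is compared (scheme, port, and path are ignored), and the function name `is_internal_by_default` is hypothetical.

```python
from urllib.parse import urlsplit

def is_internal_by_default(url: str, index_url: str) -> bool:
    """Simplified model (assumption): with no Site Content Settings configured,
    a link is internal only if its host exactly matches the index URL's host.
    Scheme, port, and path are ignored; subdomains do not match."""
    link_host = urlsplit(url).hostname or ""
    index_host = urlsplit(index_url).hostname or ""
    return link_host == index_host

INDEX_URL = "https://siteimprove.com"

for link in [
    "https://siteimprove.com/work/quality",
    "https://siteimprove.com/files/offices.pdf",
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
]:
    label = "internal" if is_internal_by_default(link, INDEX_URL) else "external"
    print(link, "->", label)
```

Under this model, www.siteimprove.com and download.siteimprove.com are external because their hosts differ from siteimprove.com, matching the examples above.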
How Crawl Settings Affect Classification
If you want certain subdomains or URLs to be considered internal, you can either:
- Add them as separate sites in your account so those subdomains are crawled, or
- Set up an inclusion or exclusion in the crawl configuration for your original site
An inclusion or exclusion dictates whether links seen during a crawl should be regarded as internal or external.
If configured correctly, links matching specified URL elements (such as subdomains) will be treated as internal.
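To make the effect of inclusions and exclusions concrete, here is a hypothetical model. It assumes rules are plain "URL contains" substrings and that an exclusion takes precedence over both the default host match and any inclusion. Siteimprove's real rule syntax and precedence may differ; `classify`, `include`, and `exclude` are illustrative names only.

```python
from urllib.parse import urlsplit

def classify(url, index_url, include=(), exclude=()):
    """Hypothetical model of crawl-setting rules.

    include / exclude: substrings matched against the full URL (assumption).
    Exclusions are checked first and win; otherwise a host match with the
    index URL or any matching inclusion makes the link internal.
    """
    if any(pattern in url for pattern in exclude):
        return "external"
    if urlsplit(url).hostname == urlsplit(index_url).hostname:
        return "internal"
    if any(pattern in url for pattern in include):
        return "internal"
    return "external"

# Include one specific subdomain rather than a broad pattern.
rules = ("://download.siteimprove.com/",)
print(classify("https://download.siteimprove.com/file.pdf", "https://siteimprove.com", include=rules))
print(classify("https://www.siteimprove.com/maps.pdf", "https://siteimprove.com", include=rules))
```

Anchoring the pattern on `://` and the trailing `/` keeps it tied to the host, so it matches download.siteimprove.com without also matching www.siteimprove.com.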
If a link is regarded as internal, Siteimprove will:
- Follow the link
- Crawl the content
- Download, render, and store the content
- Use it for analysis in QA, Accessibility, SEO, and Policy
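The practical difference between internal and external links is whether the crawler follows them. The toy breadth-first crawl below illustrates that idea under stated assumptions: it uses a hypothetical in-memory link graph (`LINKS`) instead of real HTTP requests, treats "store" as appending to a list, and uses the simple exact-host rule. It is a generic sketch, not how Siteimprove's crawler is built.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph standing in for fetched pages: page -> links found on it.
LINKS = {
    "https://siteimprove.com": [
        "https://siteimprove.com/work/quality",
        "https://www.wikipedia.com/tell-me-something.pdf",
    ],
    "https://siteimprove.com/work/quality": [
        "https://siteimprove.com/maps.pdf",
        "https://siteimprove.com",
    ],
    "https://siteimprove.com/maps.pdf": [],
}

def is_internal(url, index_url):
    # Simplified rule (assumption): exact host match with the index URL.
    return urlsplit(url).hostname == urlsplit(index_url).hostname

def crawl(index_url, links):
    """Follow and store internal links only; external links are recorded
    but never fetched."""
    stored, external = [], set()
    seen = {index_url}
    queue = deque([index_url])
    while queue:
        url = queue.popleft()
        stored.append(url)  # stands in for download, render, and store
        for link in links.get(url, []):
            if not is_internal(link, index_url):
                external.add(link)
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return stored, external

stored, external = crawl("https://siteimprove.com", LINKS)
```

Only the three siteimprove.com URLs end up in `stored`, and the Wikipedia PDF is recorded as external without being followed.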
Important Considerations
- Configure inclusions and exclusions as specifically as possible
- Broad rules (for example, including all .pdf URLs) may unintentionally classify external content as internal
For example, if .pdf is broadly included, the following would all be treated as internal, even though none of them match the domain of the index URL:
- https://www.siteimprove.com/maps.pdf
- https://download.siteimprove.com/file.pdf
- https://www.wikipedia.com/tell-me-something.pdf
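The risk of a broad rule can be shown by comparing it with a specific one. This sketch again assumes hypothetical "URL contains" inclusion patterns and an exact-host default rule; `internal_links` is an illustrative name, not part of any Siteimprove API.

```python
from urllib.parse import urlsplit

INDEX_URL = "https://siteimprove.com"
CANDIDATES = [
    "https://www.siteimprove.com/maps.pdf",
    "https://download.siteimprove.com/file.pdf",
    "https://www.wikipedia.com/tell-me-something.pdf",
]

def internal_links(urls, include_patterns):
    """Return the URLs treated as internal under a simplified model: the host
    matches the index URL, or the URL contains any inclusion pattern."""
    index_host = urlsplit(INDEX_URL).hostname
    return [
        u for u in urls
        if urlsplit(u).hostname == index_host
        or any(p in u for p in include_patterns)
    ]

broad = internal_links(CANDIDATES, [".pdf"])                          # every PDF, any domain
specific = internal_links(CANDIDATES, ["://download.siteimprove.com/"])  # one subdomain only
```

With the broad `.pdf` pattern all three URLs, including the Wikipedia file, become internal; the host-anchored pattern admits only the download subdomain.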
What Is External Content?
A link is considered external when:
- Its host does not exactly match the domain of your index URL (links on subdomains do not match)
- No inclusion rule has been configured to treat it as internal
For example, on the website with the index URL https://siteimprove.com, the following links are considered external:
- http://planning.com/work/quality
- http://www.planning.com/maps.pdf
- http://download.planning.com/file.pdf