Crawl Errors: Excluded URLs, Redirects, and HTTP 400 Issues
Issue / Question
You are receiving the following errors:
- Final URL is not internal
- Final URL is excluded
- Redirect chain contains excluded URL
- HTTP 400 Bad Request (URL formatting issues)
Environment (If Applicable)
The issue occurs under the following conditions:
- Product version: Siteimprove (all supported versions)
- Platform / OS / Browser: Platform‑independent (issue occurs at the site configuration and crawl level, not browser‑specific)
- User role / permissions: Account Owner (required to edit site URL, manage remove‑link rules, or delete a site)
- Preconditions:
  - A site has been added in Siteimprove with an index (start) URL
  - The index URL redirects to another URL
  - One or more of the following apply:
    - The final URL does not match the internal content rules
    - The final or intermediate URL is blocked by a remove-link rule
    - The redirect chain includes an excluded URL
    - The index URL contains a formatting error that results in an HTTP 400 response
Solutions
If the final URL is not internal
- Make sure the start URL matches your internal content and doesn’t redirect elsewhere. The internal domain is set by both the index URL and any include content rules in your site content settings. Everything up to and including the last slash of the index URL defines the site’s internal content, and both http and https count as internal when the rest of the URL matches that pattern. For example, if the index URL is https://www.example.com/, the internal content is any URL that starts with http:// or https:// followed by www.example.com/. The URL https://example.com:443/about-us does not match this pattern, both because “www.” is missing and because of the explicit “:443” port after the domain name. Account Owners can edit the site URL or delete the site by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
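The prefix rule described above can be sketched in Python to test candidate URLs before you change the site URL. This is a simplified, hypothetical model, not Siteimprove’s actual matching logic: it assumes internal content is decided by a plain string-prefix comparison of the host (including any explicit port) and path, and it ignores include content rules. The function names `internal_prefix` and `is_internal` are illustrative only.

```python
from urllib.parse import urlsplit


def internal_prefix(index_url: str) -> str:
    """Scheme-less prefix that marks content as internal (hypothetical model).

    Everything up to and including the last slash of the index URL's path
    is kept; the scheme is dropped because http and https both count.
    """
    parts = urlsplit(index_url)
    path = parts.path or "/"
    path = path[: path.rfind("/") + 1]  # cut right after the last slash
    return parts.netloc.lower() + path


def is_internal(url: str, index_url: str) -> bool:
    """True if url falls under the internal prefix derived from index_url."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # netloc keeps any explicit port, so "example.com:443" stays distinct
    candidate = parts.netloc.lower() + (parts.path or "/")
    return candidate.startswith(internal_prefix(index_url))
```

With the index URL https://www.example.com/, `is_internal("http://www.example.com/about-us", "https://www.example.com/")` returns `True`, while `is_internal("https://example.com:443/about-us", "https://www.example.com/")` returns `False`, in line with the example in the text.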
If the final URL or redirect chain is excluded
- Delete the remove-link rule that blocks the redirected index URL, or delete the site if it’s no longer required. Account Owners can perform these actions by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
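To find which hop is blocked, list the URLs in the redirect chain (for example, from the Location headers printed by `curl -sIL https://www.example.com/`) and compare each one against your remove-link rules. The sketch below assumes, hypothetically, that a remove-link rule blocks any URL that contains the rule text; confirm how your rules actually match in Siteimprove. The function name, example URLs, and rule are illustrative only.

```python
from typing import List, Tuple


def find_excluded_hops(
    chain: List[str], remove_link_rules: List[str]
) -> List[Tuple[str, str, str]]:
    """Return (role, url, rule) for each hop that a rule would block.

    Assumption: a rule blocks a URL when the rule text occurs anywhere in it.
    chain[0] is the index URL and chain[-1] is the final URL.
    """
    hits = []
    last = len(chain) - 1
    for position, url in enumerate(chain):
        for rule in remove_link_rules:
            if rule in url:
                if position == last:
                    role = "final"
                elif position == 0:
                    role = "index"
                else:
                    role = "intermediate"
                hits.append((role, url, rule))
                break  # one matching rule is enough to block this hop
    return hits


chain = [
    "https://www.example.com/",
    "https://www.example.com/home/",
    "https://www.example.com/campaign/landing",
]
print(find_excluded_hops(chain, ["/campaign/"]))
```

Here only the final URL is blocked, which most likely corresponds to “Final URL is excluded”; an `index` or `intermediate` hit instead points to the case where the redirect chain contains an excluded URL.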
If the site returns an HTTP 400 Bad Request
- Check the index URL for typos or formatting issues. Account Owners can edit the site URL or delete the site by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
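A quick local check can catch common formatting mistakes before you edit the site URL. The helper below is hypothetical, not a Siteimprove feature, and covers only a few typical problems: stray whitespace, a missing or duplicated scheme, an invalid port, characters that many servers reject unless percent-encoded, and broken percent-encoding. Whether a given problem actually produces an HTTP 400 response depends on the target server.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Characters that commonly need percent-encoding in a URL
UNSAFE_CHARS = set(' <>"{}|\\^`')
# A "%" not followed by two hexadecimal digits
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def url_format_problems(url: str) -> List[str]:
    """Return human-readable formatting problems found in url (empty if none)."""
    problems: List[str] = []
    stripped = url.strip()
    if stripped != url:
        problems.append("leading or trailing whitespace")
    try:
        parts = urlsplit(stripped)
        _ = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid host or port")
        return problems
    if parts.scheme not in ("http", "https"):
        problems.append("scheme is not http or https")
    elif not parts.hostname:
        problems.append("missing host name")
    # "https://https://..." parses with "https:" as the host part
    if parts.netloc.rstrip(":").lower() in ("http", "https"):
        problems.append("duplicated scheme")
    bad = sorted(set(stripped) & UNSAFE_CHARS)
    if bad:
        problems.append(
            "characters that should be percent-encoded: "
            + ", ".join(repr(c) for c in bad)
        )
    if BAD_PERCENT.search(stripped):
        problems.append("malformed percent-encoding")
    return problems
```

For example, `url_format_problems("https://https://www.example.com/")` returns `['duplicated scheme']`, and a well-formed URL such as https://www.example.com/ returns an empty list.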
If the redirect chain contains an excluded URL
- Delete the remove-link rule blocking the redirect path, or edit the site URL to avoid the blocked page. Account Owners can do this by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
Cause
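These errors are caused by a mismatch between the index URL (or the URLs it redirects to) and the site’s internal, include, or exclude rules. The HTTP 400 error is caused by a formatting error in the index URL itself.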
Additional Information
- Related errors or messages: N/A
- Known limitations: N/A
- Configuration notes: N/A