Crawl Errors: What They Mean and How to Fix Them
At Siteimprove, we help you keep your websites healthy by regularly crawling them and checking for issues. This ensures your content is always up to date and optimized. However, sometimes a site crawl fails, and no pages are found. This can happen for several reasons: maybe the site no longer exists, your server is blocking our crawler, or the site settings are preventing access. To help you quickly understand and resolve these issues, we’ve listed the most common crawl errors below with simple explanations and what you can do about them.
Common Crawl Errors Explained
1. Final URL is not internal
- What it means: The starting URL (index URL) redirects to a URL that doesn’t match your site’s configured internal content rules.
- Example: Your crawl might expect to start at http://www.example.com/ but end up at https://example.com:443/about-us.
- Recommended options:
- Edit site URL
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Make sure the start URL matches your internal content and doesn’t redirect elsewhere. The internal domain is set by both the index URL and any include content rules in your site content settings. The index URL up to its last slash defines the internal content of the site; both the http and https versions of that pattern are treated as internal. In the example above, the internal content would be either http:// or https:// followed by www.example.com/. The URL https://example.com:443/about-us is not part of that pattern, both because it is missing the “www.” and because it includes the explicit port “:443”. Account Owners can edit the site URL or delete the site by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
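The internal-content rule described above can be sketched in Python. This is only an illustration, not Siteimprove's actual matching logic: the function names `internal_prefix` and `is_internal` are hypothetical, and the sketch assumes the rule is a plain prefix comparison on host plus path, with no include content rules configured.

```python
from urllib.parse import urlsplit


def internal_prefix(index_url: str) -> str:
    """Return host + path up to the last slash of the index URL (assumed rule)."""
    parts = urlsplit(index_url)
    path = parts.path or "/"
    return parts.netloc.lower() + path[: path.rfind("/") + 1]


def is_internal(index_url: str, final_url: str) -> bool:
    """True if final_url is http(s) and starts with the index URL's pattern."""
    parts = urlsplit(final_url)
    if parts.scheme not in ("http", "https"):
        return False
    candidate = parts.netloc.lower() + (parts.path or "/")
    return candidate.startswith(internal_prefix(index_url))


# Same host, different scheme: still internal under the assumed rule.
print(is_internal("http://www.example.com/", "https://www.example.com/about-us"))
# Missing "www." and an explicit port: not internal.
print(is_internal("http://www.example.com/", "https://example.com:443/about-us"))
```

Under these assumptions the first check passes and the second fails, which mirrors the example in this section.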
2. Request URL is external
- What it means: The index URL is blocked by an exclude-content rule in your site settings.
- Example: The crawl expects http://www.example.com/about, but /about is excluded.
- Recommended options:
- Delete exclude-content rule
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Review exclude-content rules and delete the one blocking the index URL. Account Owners can delete the rule by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
3. 404 Not Found
- What it means: The requested page isn’t available.
- Example: A crawl of http://www.example.com/page returns a 404 response.
- Recommended options:
- Edit site URL
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Double-check the URL. If the page was removed, update the index URL to point somewhere valid, or delete the site if it’s no longer needed. Account Owners can make these changes by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
4. Request URL is excluded
- What it means: The index URL is blocked by a remove-link rule in your site settings.
- Example: The crawl expects http://www.example.com/about, but /about is removed.
- Recommended options:
- Delete remove-link rule
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Delete the remove-link rule that blocks the index URL. Account Owners can delete the rule by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
5. 403 Forbidden
- What it means: The crawler’s request is blocked (login required or firewall issue).
- Recommended options:
- Contact IT or web agency
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: Ask your IT department or hosting provider to allow the Siteimprove crawler. This error must be resolved outside the platform.
6. No handlers able to process resource
- What it means: The URL points to a file that cannot be crawled (e.g., .json).
- Recommended options:
- Edit site URL
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Use a standard web page (HTML/XML) as the start URL. Account Owners can edit the site URL or delete the site by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
7. HTTP 429 Too Many Requests
- What it means: The server is limiting requests because it received too many in a short period (rate limiting).
- Recommended options:
- Edit site URL
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: Partially
- What to do: Check with your hosting provider and review crawl-delay rules in your robots.txt file. Account Owners can change the starting point by using Edit site URL in the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites. Adjusting crawl-delay requires contacting Customer Support.
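To review the crawl-delay rules mentioned above, one option is to read the value with Python's standard `urllib.robotparser` module. The robots.txt content below is made up for illustration. `Crawl-delay` is a non-standard directive, and this sketch only shows what the file declares, not how the Siteimprove crawler applies it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the declared delay in seconds for the matching
# user agent, or None when no Crawl-delay line applies to it.
delay = parser.crawl_delay("SiteimproveBot")
print(delay)
```

A large declared delay, or none at all on a server with strict rate limits, may be worth raising with your hosting provider or Customer Support.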
8. HTTP 401 Unauthorized
- What it means: The site requires a login or is blocking the crawler.
- Example: A crawl attempt redirects to a login page.
- Recommended options:
- Add/replace credentials
- Contact IT or hosting provider
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Test the index URL in an incognito browser. If login is required, add or update credentials by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites. If blocked by CDN/firewall, IT or hosting must fix the issue outside the platform.
9. Final URL is excluded
- What it means: The index URL redirects to a URL blocked by a remove-link rule in your site settings.
- Example: http://www.example.com/about redirects to /index.html, but /index is blocked.
- Recommended options:
- Delete remove-link rule
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Delete the remove-link rule that blocks the redirected index URL, or delete the site if it’s no longer required. Account Owners can perform these actions by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
10. Redirect chain contains excluded URL
- What it means: The index URL redirects through a chain where one URL is blocked by a remove-link rule.
- Recommended options:
- Delete remove-link rule
- Edit site URL
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Delete the remove-link rule blocking the redirect path, or edit the site URL to avoid the blocked page. Account Owners can do this by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
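If you already have the list of URLs in a redirect chain (for example, from your browser's developer tools), a small Python sketch can help spot which hop is blocked. The helper name `first_blocked_url` is hypothetical, and the sketch assumes remove-link rules behave like simple path prefixes, which may not match how rules are evaluated in the platform.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def first_blocked_url(chain: Sequence[str], blocked_prefixes: Sequence[str]) -> Optional[str]:
    """Return the first URL in the chain whose path starts with a blocked prefix."""
    for url in chain:
        path = urlsplit(url).path or "/"
        if any(path.startswith(prefix) for prefix in blocked_prefixes):
            return url
    return None


# Hypothetical redirect chain, starting at the index URL.
chain = [
    "http://www.example.com/about",
    "http://www.example.com/old-about",
    "https://www.example.com/index.html",
]

# Assumed remove-link rule that blocks the middle hop.
print(first_blocked_url(chain, ["/old-about"]))
```

If the function returns a URL, either the rule that matches it or the index URL itself is the likely candidate for a change.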
11. HTTP 400 Bad Request
- What it means: The server couldn’t understand the request.
- Recommended options:
- Edit site URL
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Check the index URL for typos or formatting issues. Account Owners can edit the site URL or delete the site by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
12. HTTP 410 Gone
- What it means: The page has been permanently removed.
- Recommended options:
- Edit site URL
- Delete Site
- Can I resolve this myself in the Siteimprove platform?: Yes
- What to do: Point the index URL to a valid page, or delete the site if it’s no longer needed. Account Owners can take these actions by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
13. Authentication failure
- What it means: The crawler could not log in to your site.
- Recommended options:
- Update credentials
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: Partially
- What to do: Verify that the login credentials are correct. Account Owners can update credentials by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites. If the login method has changed, contact Customer Support.
14. Proxy error
- What it means: A proxy or network issue blocked the crawl.
- Recommended options:
- Contact IT or web agency
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: Adjust proxy settings or contact IT. If the issue persists, contact Customer Support.
15. Rendering error
- What it means: The page was too complex to load.
- Recommended options:
- Simplify the page
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: Reduce overly complex elements (scripts, heavy resources). If the issue continues, contact Customer Support.
16. Connection error
- What it means: The crawler could not connect to the site.
- Recommended options:
- Contact IT or hosting provider
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: Check your server or network settings. If the error persists, contact Customer Support.
17. Max tries reached (Other)
- What it means: The crawler attempted multiple times but could not reach the site.
- Recommended options:
- Contact IT or hosting provider
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: Verify that your site is online. If the error continues, contact Customer Support.
18. 500 Internal Server Error
- What it means: The server had an unexpected issue.
- Recommended options:
- Delete Site
- Contact hosting provider
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: Partially
- What to do: Contact your hosting provider to resolve the server issue. If the site is no longer needed, Account Owners can delete it by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
19. HTTP 503 Service Unavailable
- What it means: The server is down or overloaded.
- Recommended options:
- Delete Site
- Contact hosting provider
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: Partially
- What to do: Try again later or contact your hosting provider. If the site is no longer needed, Account Owners can delete it by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites.
20. Index URL blocked by robots.txt
- What it means: The index URL you want to crawl is blocked by the domain's robots.txt file. One or more rules in robots.txt are disallowing the Siteimprove bot from accessing this site’s index URL.
- Example: Your robots.txt may restrict all bots from crawling the site:
    User-agent: *
    Disallow: /
  To allow Siteimprove but block other bots, you could use:
    User-agent: *
    Disallow: /

    User-agent: SiteimproveBot
    Disallow:
- Recommended options:
- Update robots.txt to allow SiteimproveBot
- Contact your IT department, web agency, or hosting provider
- Can I resolve this myself in the Siteimprove platform?: No
- What to do: This error must be fixed outside the platform by updating your robots.txt file. Contact your IT department, web agency, or hosting provider if needed.
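Before asking for a new crawl, you can check an updated robots.txt locally with Python's standard `urllib.robotparser` module. The sketch below uses the "allow Siteimprove, block other bots" rules from the example in this section; `SomeOtherBot` is a made-up user agent. Python's parser may interpret edge cases differently from the Siteimprove crawler, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# Rules that block every bot except SiteimproveBot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: SiteimproveBot
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

index_url = "http://www.example.com/"

# An empty Disallow line places no restriction on SiteimproveBot.
print(parser.can_fetch("SiteimproveBot", index_url))
# Any other bot falls back to the "*" group and is blocked.
print(parser.can_fetch("SomeOtherBot", index_url))
```

To test your live file instead, the same parser can load it with `set_url()` followed by `read()`.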
21. Unidentified error
- What it means: The Siteimprove crawler could not crawl the site, and the cause is unknown.
- Recommended options:
- Contact your IT department or web agency
- Edit site URL
- Add/replace credentials for non-public sites
- Update site content settings
- Delete Site if no longer needed
- Contact Customer Support
- Can I resolve this myself in the Siteimprove platform?: Partially
- What to do: Account Owners can adjust settings (e.g., Edit site URL, Delete Site, Add/replace credentials, Update site content settings) by using the options listed under the How to fix this dropdown menu in the Site summary panel under Manage sites. If the error continues, contact Customer Support.