Crawl Errors: Crawler Blocked by Server, Network, or robots.txt Settings
Issue / Question
You are receiving one or more of the following crawl errors:
- 403 Forbidden
- Proxy error
- Connection error
- Max tries reached
- Index URL blocked by robots.txt
Environment (If Applicable)
The issue occurs under the following conditions:
- Product version: Siteimprove (all supported versions)
- Platform / OS / Browser: Platform‑independent (issue occurs at the site configuration and crawl level, not browser‑specific)
- User role / permissions: Account Owners
- Preconditions: N/A
Solutions
If you are seeing a 403 Forbidden error
- Ask your IT department or hosting provider to allowlist the Siteimprove crawler; this error must be resolved outside the platform. A quick way to confirm a user-agent based block is sketched below.
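Before contacting IT, you can check whether the block is user-agent based by requesting a page with a browser-like and a crawler-like User-Agent header and comparing the responses. This is a minimal Python sketch; the "SiteimproveBot" token matches the robots.txt example later in this article, but confirm the exact user-agent string your crawler sends with Siteimprove.

    import requests  # third-party: pip install requests

    URL = "https://www.example.com/"  # placeholder: replace with your site

    # Illustrative User-Agent strings; the exact string the Siteimprove
    # crawler sends may differ from the bare "SiteimproveBot" token.
    candidates = {
        "browser-like": "Mozilla/5.0",
        "crawler-like": "SiteimproveBot",
    }

    for label, agent in candidates.items():
        resp = requests.get(URL, headers={"User-Agent": agent}, timeout=10)
        print(f"{label}: HTTP {resp.status_code}")

If the browser-like request returns 200 while the crawler-like request returns 403, the server (or a firewall in front of it) is filtering by user agent, and IT will need to allowlist the crawler.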
If you are seeing a Proxy error
- Adjust your proxy settings or contact your IT department; a quick proxied-request test is sketched below. If the issue persists, contact Customer Support.
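If your network routes outbound traffic through a proxy, comparing a direct request with a proxied one can show whether the proxy is at fault. A minimal sketch; the site URL and proxy address below are placeholders for your own:

    import requests  # third-party: pip install requests

    URL = "https://www.example.com/"          # placeholder: your site
    PROXY = "http://proxy.example.com:8080"   # placeholder: your proxy

    direct = requests.get(URL, timeout=10)
    print("direct:", direct.status_code)

    # Route the same request through the proxy.
    proxied = requests.get(
        URL, timeout=10, proxies={"http": PROXY, "https": PROXY}
    )
    print("proxied:", proxied.status_code)

If the direct request succeeds but the proxied one fails or returns an error, the proxy configuration is the likely cause.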
If you are seeing a Connection error
- Contact Customer Support.
If you are seeing Max tries reached
- Verify that your site is online (see the reachability sketch below). If the error continues, contact Customer Support.
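To check that the site is online, you can poll it a few times with a short pause between attempts, loosely mirroring a crawler retrying a URL. A minimal sketch (the URL, retry count, and delay are illustrative):

    import time

    import requests  # third-party: pip install requests

    URL = "https://www.example.com/"  # placeholder: replace with your site
    MAX_TRIES = 3

    for attempt in range(1, MAX_TRIES + 1):
        try:
            resp = requests.get(URL, timeout=10)
            print(f"attempt {attempt}: HTTP {resp.status_code}")
            break  # the site responded, so stop retrying
        except requests.RequestException as exc:
            print(f"attempt {attempt} failed: {exc}")
            time.sleep(5)  # brief pause before the next attempt
    else:
        print("No response after all attempts; the site may be offline.")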
If you are seeing Index URL blocked by robots.txt
- This error must be fixed outside the platform by updating your robots.txt file. Contact your IT department, web agency, or hosting provider if needed.
- Example: Your robots.txt may currently block all bots from crawling the site:

    User-agent: *
    Disallow: /

- To allow Siteimprove while still blocking other bots, you could use:

    User-agent: *
    Disallow: /

    User-agent: SiteimproveBot
    Disallow:

- Crawlers obey the most specific matching User-agent group, so SiteimproveBot follows its own empty Disallow rule while all other bots remain blocked. A verification sketch follows these examples.
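After updating robots.txt, you can verify the rules with Python's standard urllib.robotparser module before waiting for the next crawl. The sketch below uses the SiteimproveBot token from the example above and a hypothetical second bot for contrast:

    from urllib.robotparser import RobotFileParser

    # Placeholder site; point this at your own robots.txt.
    parser = RobotFileParser("https://www.example.com/robots.txt")
    parser.read()  # fetch and parse the live robots.txt

    # can_fetch() reports whether a user agent may crawl the given URL.
    for agent in ("SiteimproveBot", "SomeOtherBot"):
        allowed = parser.can_fetch(agent, "https://www.example.com/")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")

With the rules shown above, SiteimproveBot should print "allowed" and SomeOtherBot "blocked".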
Cause
These errors are caused by blocking that happens outside of Siteimprove: the server, network, proxy, or robots.txt file prevents the Siteimprove crawler from reaching or indexing the site. They cannot be resolved from within the platform.
Additional Information
- Related errors or messages: N/A
- Known limitations: N/A
- Configuration notes: N/A