3 things you should know about crawler queuing times
Before your site is crawled, it is added to a queue of sites that are waiting their turn to be crawled.
The amount of time it spends in the queue depends on the following:
- The time at which it's added to the queue
- The maximum simultaneous crawls allowed
- The number of crawl slots globally available
The time at which it's added to the queue
Sites are ordered by the time at which they were added to the crawl queue. A site that has been in the queue longer than other sites will receive a crawl slot earlier.
Maximum simultaneous crawls
The number of crawl slots available on your account is called the maximum simultaneous crawls. This value is configured per account.
By default, we crawl a maximum of 2 sites simultaneously on an account. This ensures that we do not overload your servers. Multiple websites are often hosted on the same server, so increasing the maximum simultaneous crawls adds requests and therefore increases the load on that server.
Each site is crawled with one request at a time and a delay of 200 milliseconds by default. Read more about "How do requests to my website affect crawl speed?"
If your servers can handle more simultaneous requests from our crawler without getting overloaded, then the number of simultaneously crawled sites on your account can be increased by Siteimprove Technical Support. Information on your server's capacities should be available from your hosting provider or IT department.
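To judge whether your servers could handle a higher limit, it can help to estimate the request rate the defaults imply. The following is a rough sketch, not Siteimprove's own calculation. It assumes that all crawled sites share one server and that the 200 millisecond delay is applied after each response before the next request is sent; the function name and the avg_response_ms parameter are hypothetical.

```python
def max_requests_per_second(simultaneous_crawls: int = 2,
                            delay_ms: float = 200.0,
                            avg_response_ms: float = 0.0) -> float:
    """Estimate an upper bound on crawler requests per second.

    Assumption (not confirmed by the article): each crawl sends one
    request, waits for the response, then waits delay_ms before the
    next request. All crawled sites are assumed to share one server.
    """
    if simultaneous_crawls < 0 or delay_ms < 0 or avg_response_ms < 0:
        raise ValueError("arguments must be non-negative")
    cycle_ms = delay_ms + avg_response_ms  # time per request for one crawl
    if cycle_ms == 0:
        return float("inf")
    return simultaneous_crawls * 1000.0 / cycle_ms


# Defaults from the article: 2 simultaneous crawls, 200 ms delay.
print(max_requests_per_second())            # upper bound, instant responses
print(max_requests_per_second(4, 200, 300)) # 4 crawls, 300 ms average response
```

Under these assumptions the default settings produce at most about 10 requests per second on a shared server, and real response times lower that figure further.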
The number of crawl slots globally available
If your account has an empty slot available (i.e. it has not reached maximum simultaneous crawls), a queued crawl will start as soon as there is a free crawl slot available globally. Siteimprove has thousands of simultaneous crawl slots available, allowing crawls to start soon after entering the queue.
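The three factors above can be combined into a small illustration. This is a minimal sketch of the rules as described in this article, not the actual Siteimprove scheduler; the QueuedCrawl class, the start_eligible_crawls function, and all parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueuedCrawl:
    site: str
    account: str
    queued_at: float  # time the site entered the queue


def start_eligible_crawls(queue, running_per_account, account_limits,
                          free_global_slots, default_limit=2):
    """Return the queued crawls that may start now, oldest first.

    A crawl starts only if its account is below its maximum
    simultaneous crawls and a global crawl slot is still free.
    """
    started = []
    running = dict(running_per_account)  # copy so the input is not mutated
    for crawl in sorted(queue, key=lambda c: c.queued_at):
        if len(started) >= free_global_slots:
            break  # no global slots left
        limit = account_limits.get(crawl.account, default_limit)
        if running.get(crawl.account, 0) < limit:
            running[crawl.account] = running.get(crawl.account, 0) + 1
            started.append(crawl)
    return started


queue = [
    QueuedCrawl("a.example", "A", 1.0),
    QueuedCrawl("b.example", "A", 2.0),
    QueuedCrawl("c.example", "A", 3.0),
    QueuedCrawl("d.example", "B", 4.0),
]
started = start_eligible_crawls(queue, {}, {}, free_global_slots=3)
print([c.site for c in started])
```

In this example account A uses the default limit of 2, so c.example keeps waiting even though a global slot is free, while d.example on account B starts right away.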
Why do my scans not show a queue time for previous scans?
We started tracking the queue time of scans on August 12th, 2020, and only for scans using the new Siteimprove Crawler. Read more about why the scan history might show "not available" for certain queue, crawl, or processing times.