Is there a simple way to re-queue a page for crawling? Many sites apply request rate limiting (HTTP 429 status code), and typically it is just a matter of putting the page back in the queue for a later retry.
An alternative would be a way to rate-limit the crawler beyond maxConcurrency, perhaps a global maximum in requests per second (with the ability to pass values below 1 for even slower crawling).
Setting maxConcurrency to 1 still crawls too quickly.
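For illustration, here is a minimal sketch of the kind of global throttle being asked for, assuming each outgoing request is wrapped by the caller; `createThrottle` is a hypothetical helper and not part of website-scraper:

```javascript
// Hypothetical helper (not part of website-scraper): enforce a minimum
// gap between requests, so a rate of 0.5 requests/s means a 2000 ms gap.
function createThrottle(requestsPerSecond) {
  const interval = 1000 / requestsPerSecond; // 0.5 req/s -> 2000 ms
  let chain = Promise.resolve();
  return function throttle(task) {
    const run = chain.then(() => task());
    // Only release the next task after the previous one has finished
    // and the enforced delay has elapsed.
    chain = run.catch(() => {}).then(
      () => new Promise((resolve) => setTimeout(resolve, interval))
    );
    return run;
  };
}

// Usage (fetch is global in Node 18+):
// const throttle = createThrottle(0.5); // slower than one request per second
// const res = await throttle(() => fetch('https://example.com'));
```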
To achieve retries I suggest checking the `request` option. website-scraper uses the got module internally to make HTTP requests, and I suppose it's possible to configure got to retry when a request fails.
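A minimal sketch of that suggestion, assuming the `request` options are forwarded to got as described above; got's retry option can be given as an object, and 429 is among the status codes it retries by default:

```javascript
const scrape = require('website-scraper');

scrape({
  urls: ['https://example.com'],
  directory: './downloaded-site',
  // Assumption: these options are passed through to got. got's retry
  // option accepts a retry limit and the HTTP status codes that should
  // trigger a retry (429 is in its defaults).
  request: {
    retry: {
      limit: 5,
      statusCodes: [408, 429, 500, 502, 503, 504]
    }
  }
}).catch((err) => console.error(err));
```

As far as I know, got also honors the Retry-After header when scheduling retries, which is what most 429 responses send, so this may cover the re-queue case without changes to the crawler itself.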
This issue has been automatically closed because there has been no response from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.