Crawling rate limit or requeue? #485

Closed
abale opened this issue Mar 9, 2022 · 2 comments
Comments

abale commented Mar 9, 2022

Is there a simple way to re-queue a page for crawling? Many sites rate-limit requests (HTTP 429 status code), and typically it's just a matter of putting the page back in the queue for a retry.

An alternative would be a function to rate limit the crawler beyond max concurrency - perhaps a global maximum requests/s (with the ability to provide less-than-1 for slower crawling).

Setting maxConcurrency to 1 still crawls too quickly.

Member

s0ph1e commented Mar 30, 2022

Hi @abale 👋

Sorry for the late response.

To get retries, I suggest checking the `request` option. website-scraper uses the got module internally to make HTTP requests, and I suppose it's possible to configure got to retry when a request fails.
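
A minimal sketch of what that might look like, assuming website-scraper forwards the `request` object to got unchanged (not confirmed in this thread). `retry.limit` and `retry.statusCodes` are got options; the values here are illustrative, and the exact behaviour can differ between got versions.

```javascript
// Options intended for got's `retry` setting (values are assumptions).
const requestOptions = {
  retry: {
    limit: 5,                // retry a failed request up to 5 times
    statusCodes: [429, 503]  // only retry on "Too Many Requests" / "Service Unavailable"
  }
};

// Hypothetical usage with website-scraper (URL and directory are placeholders):
//   await scrape({
//     urls: ['https://example.com'],
//     directory: './out',
//     request: requestOptions
//   });
```

If the retries do not seem to take effect, it is worth checking how the installed website-scraper version passes `request` through to got.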

You can also try adding delays between requests. Please check the example of `beforeRequest` action usage.
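
As a rough sketch of the delay idea, a plugin can register a `beforeRequest` action that waits before each request is sent. The `RateLimitPlugin` class and its `minIntervalMs` parameter are hypothetical names, not part of website-scraper; only the `apply(registerAction)` plugin shape and the `beforeRequest` action come from the library. It spaces request starts at least `minIntervalMs` apart, so an interval above 1000 ms gives a rate below one request per second.

```javascript
// Hypothetical plugin: spaces request starts at least `minIntervalMs` apart.
class RateLimitPlugin {
  constructor(minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.nextSlot = 0; // earliest timestamp (ms) at which the next request may start
  }

  apply(registerAction) {
    registerAction('beforeRequest', async ({ requestOptions }) => {
      const now = Date.now();
      // Reserve the next free slot before waiting, so concurrent calls queue up in order.
      const startAt = Math.max(now, this.nextSlot);
      this.nextSlot = startAt + this.minIntervalMs;

      const wait = startAt - now;
      if (wait > 0) {
        await new Promise((resolve) => setTimeout(resolve, wait));
      }
      // Pass the request options through unchanged.
      return { requestOptions };
    });
  }
}

// Hypothetical usage (URL and directory are placeholders):
//   await scrape({
//     urls: ['https://example.com'],
//     directory: './out',
//     plugins: [new RateLimitPlugin(2000)] // roughly one request every 2 seconds
//   });
```

This only throttles how often requests start; it does not re-queue a page that has already received a 429.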

Hope it helps

no-response bot commented Apr 13, 2022

This issue has been automatically closed because there has been no response from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.

no-response bot closed this as completed Apr 13, 2022