Is it possible to pause and resume crawling with the Java crawler crawler4j? #253

Closed
ukul3l3 opened this issue Oct 16, 2017 · 3 comments

@ukul3l3

ukul3l3 commented Oct 16, 2017

I already know that you can configure crawling to be resumable. But is it possible to use the resumable functionality to pause the crawling process and then resume it later, programmatically? For example, I could gracefully shut down the crawl with the crawler's shutdown method while the resumable parameter is set to true, and then start crawling again. Will that work, given that the primary purpose of the resumable parameter is to handle accidental crashes of the crawler? Is there another, or better, way to achieve this with crawler4j?

@s17t
Contributor

s17t commented Oct 19, 2017

Hi, the resumable option does what it says. If you run an instance with that option set to true and your crawler stops, whether because of a crash or a programmatic shutdown, the next execution will resume from where the previous run stopped. The storage folder must be the same.
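Why the storage folder must stay the same can be illustrated with a toy model in plain Java. This is a hypothetical sketch, not crawler4j's implementation: `ToyFrontier`, the file name `frontier.txt`, and its methods are illustrative names, and the real library keeps its own state in the storage folder in its own format. In crawler4j itself, the corresponding settings are, to our understanding, `CrawlConfig.setCrawlStorageFolder(...)` and `CrawlConfig.setResumableCrawling(true)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model (not crawler4j code): a frontier persisted in a storage folder. */
public class ToyFrontier {
    private static final String FILE_NAME = "frontier.txt"; // hypothetical file name
    private final Path file;
    private final Deque<String> pending = new ArrayDeque<>();

    /** Opens the frontier; if resumable and a saved frontier exists in this folder, reloads it. */
    public ToyFrontier(Path storageFolder, boolean resumable, List<String> seeds) throws IOException {
        Files.createDirectories(storageFolder);
        this.file = storageFolder.resolve(FILE_NAME);
        if (resumable && Files.exists(file)) {
            pending.addAll(Files.readAllLines(file)); // resume where the last run stopped
        } else {
            pending.addAll(seeds); // fresh start from the seed URLs
        }
    }

    /** Takes the next URL to crawl, or null if the frontier is empty. */
    public String next() {
        return pending.poll();
    }

    /** Graceful shutdown: writes every URL not yet crawled back to the storage folder. */
    public void shutdown() throws IOException {
        Files.write(file, new ArrayList<>(pending));
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("crawl-storage");
        List<String> seeds = List.of("a", "b", "c", "d", "e");

        ToyFrontier first = new ToyFrontier(folder, true, seeds);
        first.next();     // crawl "a"
        first.next();     // crawl "b"
        first.shutdown(); // pause: "c", "d", "e" are saved to the folder

        ToyFrontier second = new ToyFrontier(folder, true, seeds);
        System.out.println(second.next()); // prints c: the second run resumed
    }
}
```

Opening a `ToyFrontier` on a different folder, or with `resumable` set to false, starts again from the seeds, which mirrors why the folder must not change between runs.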

Because of the library's multi-threaded nature, a crash may cause some links to be lost. A programmatic shutdown (i.e., calling shutdown on the controller) is supposed to be reliable, and no links should be lost.
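The difference between a crash and a programmatic shutdown can be sketched with a deterministic toy model, again hypothetical and not taken from crawler4j. It assumes that a worker thread removes a URL from the persisted frontier as soon as it takes it: a crash then loses those in-flight URLs, while a graceful shutdown lets them finish before the frontier is saved. In crawler4j, the graceful path would presumably be `controller.shutdown()` followed by `controller.waitUntilFinish()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy model: why a crash can lose links while a graceful shutdown does not. */
public class ShutdownModel {
    final Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be crawled (persisted)
    final List<String> inFlight = new ArrayList<>();    // URLs taken by a worker but not finished
    final List<String> crawled = new ArrayList<>();     // URLs fully processed

    ShutdownModel(List<String> seeds) {
        frontier.addAll(seeds);
    }

    /** A worker thread takes a URL: it leaves the persisted frontier immediately. */
    void take() {
        String url = frontier.poll();
        if (url != null) {
            inFlight.add(url);
        }
    }

    /** Crash: in-flight work vanishes; only the persisted frontier survives. */
    List<String> crash() {
        inFlight.clear();
        return new ArrayList<>(frontier);
    }

    /** Graceful shutdown: take no new URLs, let in-flight work finish, then persist the rest. */
    List<String> shutdown() {
        crawled.addAll(inFlight);
        inFlight.clear();
        return new ArrayList<>(frontier);
    }

    public static void main(String[] args) {
        ShutdownModel crashed = new ShutdownModel(List.of("u1", "u2", "u3"));
        crashed.take();
        crashed.take();
        // prints "crash: saved=[u3] crawled=[]" (u1 and u2 are lost)
        System.out.println("crash: saved=" + crashed.crash() + " crawled=" + crashed.crawled);

        ShutdownModel graceful = new ShutdownModel(List.of("u1", "u2", "u3"));
        graceful.take();
        graceful.take();
        // prints "shutdown: saved=[u3] crawled=[u1, u2]" (nothing is lost)
        System.out.println("shutdown: saved=" + graceful.shutdown() + " crawled=" + graceful.crawled);
    }
}
```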

@progrock2002

I have problems using the resume functionality; see #257. Has anyone had positive experiences with it?

@s17t
Contributor

s17t commented Feb 14, 2018

@progrock2002 #257 could help your case.

s17t closed this as completed Feb 14, 2018