I already know that crawling can be configured to be resumable. But is it possible to use the resumable functionality to pause the crawling process and then resume it later programmatically? E.g. I could gracefully shut down crawling with the crawler's shutdown method, with the resumable parameter set to true, and then start crawling again. Will it work this way, given that the primary purpose of the resumable parameter is to handle accidental crashes of the crawler? Is there any other or better way to achieve this with crawler4j?
Hi, the resumable option does what it says. If you run an instance with that option set to true and your crawler stops, either because of a crash or because of a programmatic shutdown, the next execution will resume where the previous run stopped. The storage folder must be the same.
Because of the multi-threaded nature of the library, a crash can lose some links. A programmatic shutdown (i.e. calling shutdown on the controller) is supposed to be reliable, and no links should be lost.
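To picture why the storage folder has to stay the same, here is a small self-contained model of the idea. It is only a sketch and not crawler4j's actual code: the class `ResumableFrontier`, its methods, and the file name `pending-urls.txt` are all made up for illustration. It keeps a queue of pending URLs, writes it to the storage folder on a clean shutdown, and reloads it on the next run when the resumable flag is true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model of a resumable crawl frontier.
// This is NOT crawler4j's implementation; all names here are illustrative.
final class ResumableFrontier {
    private final Path stateFile;
    private final Deque<String> pending = new ArrayDeque<>();

    ResumableFrontier(Path storageFolder, boolean resumable) throws IOException {
        Files.createDirectories(storageFolder);
        // Assumed file name; the state lives inside the storage folder.
        this.stateFile = storageFolder.resolve("pending-urls.txt");
        if (resumable && Files.exists(stateFile)) {
            // Resume: reload whatever the previous run left unprocessed.
            pending.addAll(Files.readAllLines(stateFile));
        } else {
            // Fresh start: discard any state from an earlier run.
            Files.deleteIfExists(stateFile);
        }
    }

    // Queue a URL for a later visit.
    void schedule(String url) {
        pending.addLast(url);
    }

    // Return the next URL to visit, or null if nothing is pending.
    String next() {
        return pending.pollFirst();
    }

    int size() {
        return pending.size();
    }

    // Clean shutdown: persist every URL that has not been handed out yet.
    void shutdown() throws IOException {
        Files.write(stateFile, pending);
    }
}
```

In this model, a second `ResumableFrontier` created on the same folder with `resumable` set to true continues with the URLs that were still pending at shutdown, while a different folder (or `resumable` set to false) starts empty. A crash corresponds to the process dying before `shutdown()` runs, which is where links could be lost.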