Is it possible to pause and resume crawling using Java crawler crawler4j? #253

ukul3l3 · 2017-10-16T11:36:32Z

I already know that you can configure crawling to be resumable. But is it possible to use the resumable functionality to pause the crawling process and then resume it later programmatically? E.g. I could gracefully shut down crawling with the crawler's shutdown method, with the resumable parameter set to true, and then start crawling again. Will it work this way, given that the primary purpose of the resumable parameter is to handle accidental crashes of the crawler? Is there any other or better way to achieve this functionality with crawler4j?

s17t · 2017-10-19T07:20:44Z

Hi, the resumable option does what it says. If you run an instance with that option set to true and your crawler stops, either because of a crash or because of a programmatic shutdown, the next execution will resume where the previous run stopped. The storage folder must be the same.

Because of the multi-threaded nature of the library, a crash can cause some links to be lost. A programmatic shutdown (i.e. calling shutdown on the controller) is supposed to be reliable, and no links should be lost.
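To make the idea concrete, here is a minimal, self-contained sketch of why the storage folder has to stay the same between runs. It is a toy model using only the JDK, not crawler4j's actual implementation: the class `ResumableFrontierSketch`, the file name `pending-urls.txt`, and all method names are hypothetical and chosen for illustration. In crawler4j itself the corresponding steps would be enabling `setResumableCrawling(true)` on the `CrawlConfig`, pointing it at the same crawl storage folder, and calling `shutdown()` on the `CrawlController`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a resumable crawl frontier. NOT crawler4j code; all names are hypothetical. */
class ResumableFrontierSketch {
    private final Path stateFile;
    private final Deque<String> pending = new ArrayDeque<>();

    /** Opens the frontier and reloads any URLs a previous run left in the same folder. */
    ResumableFrontierSketch(Path storageFolder) throws IOException {
        Files.createDirectories(storageFolder);
        this.stateFile = storageFolder.resolve("pending-urls.txt");
        if (Files.exists(stateFile)) {
            pending.addAll(Files.readAllLines(stateFile, StandardCharsets.UTF_8));
        }
    }

    /** Adds a URL to the end of the queue. */
    void schedule(String url) {
        pending.addLast(url);
    }

    /** "Visits" up to maxPages URLs in FIFO order and returns them. */
    List<String> crawl(int maxPages) {
        List<String> visited = new ArrayList<>();
        while (visited.size() < maxPages && !pending.isEmpty()) {
            visited.add(pending.pollFirst());
        }
        return visited;
    }

    /** Graceful shutdown: writes the unvisited URLs to disk so a later run can pick them up. */
    void shutdown() throws IOException {
        Files.write(stateFile, pending, StandardCharsets.UTF_8);
    }

    /** Number of URLs still waiting to be visited. */
    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("crawl-storage");

        // First run: visit one page, then "pause" via a graceful shutdown.
        ResumableFrontierSketch first = new ResumableFrontierSketch(folder);
        first.schedule("http://example.com/a");
        first.schedule("http://example.com/b");
        first.schedule("http://example.com/c");
        first.crawl(1);
        first.shutdown();

        // Second run: same folder, so the remaining URLs are still there.
        ResumableFrontierSketch second = new ResumableFrontierSketch(folder);
        System.out.println(second.crawl(10)); // prints [http://example.com/b, http://example.com/c]
    }
}
```

In this model a crash corresponds to the process ending without `shutdown()` being called, so whatever was only held in memory is gone. That is the same kind of loss described above for a crashed crawl, though a real crawler persists its state differently.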

progrock2002 · 2017-10-30T10:25:06Z

I'm having problems with the resume functionality, see #257. Has anyone had positive experiences with it?

s17t · 2018-02-14T18:12:07Z

@progrock2002 #257 could help your case.

s17t closed this as completed Feb 14, 2018
