This repository has been archived by the owner on Apr 20, 2019. It is now read-only.
as results are fetched, report each one back to the manager immediately
repeat
Workers communicate with the manager through an HTTP RESTful API. At the time of writing, the manager's location is predefined in each worker's configuration.
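To make the worker-to-manager exchange more concrete, here is a minimal sketch of how a worker might build its HTTP requests. The endpoint paths (`/urls`, `/results`), the JSON field names, and `MANAGER_URL` are assumptions made for illustration only; they are not taken from this project's actual API. The requests are only constructed here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical manager location; in the real project this comes from
# the worker's configuration.
MANAGER_URL = "http://localhost:8000"


def build_fetch_request(manager_url, batch_size=10):
    """Build a GET request asking the manager for URLs to crawl.

    The /urls path and the limit parameter are assumed names.
    """
    base = manager_url.rstrip("/")
    return Request(f"{base}/urls?limit={batch_size}", method="GET")


def build_report_request(manager_url, source_url, found_links):
    """Build a POST request reporting one crawled page and its links.

    The /results path and the JSON keys are assumed names.
    """
    payload = json.dumps(
        {"url": source_url, "links": sorted(found_links)}
    ).encode("utf-8")
    return Request(
        manager_url.rstrip("/") + "/results",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A worker would pass these objects to `urllib.request.urlopen` once per batch and once per crawled page, which matches the "report each result at once" step above.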
When a worker finds a URL on a page it has just retrieved, the URL is new to that worker, but another worker may already have visited it. For this reason, workers crawl only the URLs they receive from the queue manager and report every URL they find back to it. This approach gives the manager full control over which URLs are crawled and when. The disadvantage is heavy traffic between workers and the manager, in the form of link lists.
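The manager-side bookkeeping this implies can be sketched as follows. This is an in-memory illustration under simple assumptions, not the project's implementation: the class name `QueueManager` and its methods `report` and `next_batch` are hypothetical, and URLs are compared as plain strings without normalization.

```python
from collections import deque


class QueueManager:
    """Hypothetical in-memory manager: tracks seen URLs and a crawl queue."""

    def __init__(self, seeds):
        self.seen = set()       # every URL ever queued, crawled or not
        self.pending = deque()  # URLs waiting to be handed to a worker
        self.report(seeds)

    def report(self, urls):
        """Accept links reported by a worker.

        Only URLs not seen before are queued. Returns how many were new.
        """
        added = 0
        for url in urls:
            if url not in self.seen:
                self.seen.add(url)
                self.pending.append(url)
                added += 1
        return added

    def next_batch(self, size):
        """Hand out up to `size` URLs; workers crawl only these."""
        batch = []
        while self.pending and len(batch) < size:
            batch.append(self.pending.popleft())
        return batch
```

For example, after `manager = QueueManager(["http://example.com/"])`, a worker would call `manager.next_batch(10)`, crawl the returned pages, and pass every link it found to `manager.report(...)`. Duplicates are dropped by the manager, which is why workers never decide on their own what to crawl next.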