Web crawler based on the crawler4j library for large-scale download of news pages.
- Crawls the latest news content.
- Gives access to each page's source and HTTP headers.
- Follows outgoing links (to enhance PageRank computation).
- Settings for the number of spiders (concurrent crawlers), politeness delay, etc.
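
As an illustration of why outgoing links matter, the sketch below shows one way the collected links could feed a PageRank computation. It is plain Java with no crawler4j dependency and is not code from this project: the class `PageRankSketch`, the method `compute`, the example URLs and the damping value 0.85 are all assumptions made for the example. The input is assumed to be a map from each crawled URL to the outgoing URLs found on that page.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of crawler4j or of this project.
final class PageRankSketch {

    // Runs a fixed number of power-iteration steps over the link graph.
    static Map<String, Double> compute(Map<String, List<String>> outgoingLinks,
                                       double damping, int iterations) {
        // Every URL seen either as a source or as a link target is a node.
        Set<String> pages = new LinkedHashSet<>(outgoingLinks.keySet());
        for (List<String> targets : outgoingLinks.values()) {
            pages.addAll(targets);
        }
        int n = pages.size();
        Map<String, Double> rank = new LinkedHashMap<>();
        if (n == 0) {
            return rank;
        }
        for (String page : pages) {
            rank.put(page, 1.0 / n); // start from a uniform distribution
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new LinkedHashMap<>();
            for (String page : pages) {
                next.put(page, (1.0 - damping) / n); // random-jump term
            }
            double danglingMass = 0.0;
            for (String page : pages) {
                List<String> targets = outgoingLinks.get(page);
                if (targets == null || targets.isEmpty()) {
                    // Pages without outgoing links spread their rank evenly below.
                    danglingMass += rank.get(page);
                    continue;
                }
                double share = damping * rank.get(page) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            double danglingShare = damping * danglingMass / n;
            for (String page : pages) {
                next.merge(page, danglingShare, Double::sum);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Made-up link graph standing in for crawl output.
        Map<String, List<String>> links = new LinkedHashMap<>();
        links.put("news/a", Arrays.asList("news/b", "news/c"));
        links.put("news/b", Arrays.asList("news/c"));
        links.put("news/c", Arrays.asList("news/a"));
        compute(links, 0.85, 50)
            .forEach((url, score) -> System.out.println(url + " " + score));
    }
}
```

Pages that are linked to but were never crawled have no entry in the map; this sketch treats them as pages without outgoing links, which is one common convention rather than the only option.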