Web crawler for news pages, built to feed a news search engine.

yixiangding/news-spider

News Spider

Intro

Web crawler based on the crawler4j library for downloading news pages in bulk.

Features

  • Crawls the latest news content.
  • Accesses page source and headers.
  • Follows outgoing links (to support PageRank computation).
  • Settings for the number of spiders (concurrent crawling), politeness delay, etc.
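
The outgoing-link feature can be illustrated with a minimal sketch of how crawled links might be collected into an edge list for a later PageRank pass. This is not the repository's code: the `LinkGraph` class, its method names, and the `source,target` output format are all hypothetical, and only the Java standard library is used. In a crawler4j-based spider, the list of outgoing URLs would presumably come from the parsed page data inside the crawler's `visit` callback.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: collects page -> outgoing-link edges for a later PageRank pass. */
final class LinkGraph {
    // Insertion-ordered so the exported edge list is deterministic.
    private final Map<String, Set<String>> outLinks = new LinkedHashMap<>();

    /** Records the outgoing links found on one crawled page; duplicates and self-links are dropped. */
    void addPage(String pageUrl, List<String> outgoingUrls) {
        Set<String> targets = outLinks.computeIfAbsent(pageUrl, k -> new LinkedHashSet<>());
        for (String target : outgoingUrls) {
            if (!target.equals(pageUrl)) {
                targets.add(target);
            }
        }
    }

    /** Returns one "source,target" line per edge, in insertion order. */
    List<String> toEdgeList() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : outLinks.entrySet()) {
            for (String target : entry.getValue()) {
                lines.add(entry.getKey() + "," + target);
            }
        }
        return lines;
    }
}
```

Calling `addPage` once per visited page and writing `toEdgeList()` to a file at the end of the crawl would give a PageRank implementation a plain edge list to read. Dropping duplicates and self-links is a design choice of this sketch, not something the project states.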
