yixiangding / news-spider Public

Web crawler for news pages and serving news search engine.

News Spider

Intro

A web crawler built on the crawler4j library for large-scale downloading of news pages.

Features

Crawls the latest news content.
Accesses page source and response headers.
Follows outgoing links (to improve PageRank computation).
Configurable settings such as the number of spiders (concurrent crawlers) and the politeness delay.
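
As a rough illustration of why outgoing links matter for PageRank, the sketch below collects each crawled page's outgoing links into an edge list that a ranking step could consume. It is not taken from this repository: the class `OutgoingLinkGraph`, its methods, and the `source,target` line format are all hypothetical, and it uses only the Java standard library. In a crawler4j-based crawler, the link set would likely come from `HtmlParseData.getOutgoingUrls()` inside the crawler's `visit` method.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not part of this repository): a directed link graph. */
final class OutgoingLinkGraph {
    // Source URL -> distinct target URLs, in insertion order.
    private final Map<String, Set<String>> edges = new LinkedHashMap<>();

    /** Records that the page at {@code from} links to each URL in {@code outgoing}. */
    void addPage(String from, Collection<String> outgoing) {
        Set<String> targets = edges.computeIfAbsent(from, k -> new LinkedHashSet<>());
        for (String to : outgoing) {
            if (!to.equals(from)) { // ignore self-links; duplicates collapse in the set
                targets.add(to);
            }
        }
    }

    /** Number of distinct pages that {@code url} links to (0 if never crawled). */
    int outDegree(String url) {
        Set<String> targets = edges.get(url);
        return targets == null ? 0 : targets.size();
    }

    /** One "source,target" line per edge (assumes URLs contain no commas). */
    List<String> toEdgeList() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : edges.entrySet()) {
            for (String to : entry.getValue()) {
                lines.add(entry.getKey() + "," + to);
            }
        }
        return lines;
    }
}
```

Calling `addPage` once per visited page and writing `toEdgeList()` to a file at the end of the crawl would give one line per link, which is a common input shape for PageRank tools.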

Languages

Java 100.0%