Heroshi – open source web crawler.

doc
heroshi-worker
heroshi
limitmap
slow-server
.gitignore
README
all.bash
README
Heroshi, open source web crawler.

Motivation 1: learn HTTP, libraries, and real-world quirks.
Motivation 2: build a collection of libraries and tools for writing custom crawlers.
Motivation 3: provide access to a representative subset of the Web for educational and research purposes.

As of 2012-10-12, work on the last goal has not even started, but these guys did an amazing job at it: http://commoncrawl.org/

See http://temoto.github.com/heroshi/ for more information.