Spider and analysis for a given domain, Rails 3 with anemone

Site Structure Analyzer

A web spider (anemone) with a Rails analysis app on top.

Usage

  • Edit config/database.yml and config/spider.yml
  • bundle install
  • rake db:migrate
  • rake spider:start
  • Wait for the crawl to finish (this can take all night)
  • rake spider:refresh
    • Refreshes the cached count columns for every page. This is required for sorting by, and displaying, the link and backlink counts in the overview.
  • Open localhost:3000/pages
  • Validator: the app can also check pages for W3C parsing errors. Set “w3c_url:” in config/spider.yml to point to a private W3C validator installation, or set it to nil.
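The counts cached by rake spider:refresh amount to an outgoing-link count and an incoming-link (backlink) count per page. A minimal sketch, assuming the crawl results are available as a hash mapping each page URL to the URLs found on it; the real app keeps these in database tables whose columns aren't documented here. The helper `link_counts` is hypothetical, and counting a target only once per page is an assumption:

```ruby
# Hypothetical helper, not the project's code: computes the per-page
# "links" (outgoing) and "backlinks" (incoming) counts that spider:refresh
# caches. `links` maps each page URL to the URLs found on that page.
def link_counts(links)
  counts = Hash.new { |hash, url| hash[url] = { :links => 0, :backlinks => 0 } }
  links.each do |from, targets|
    counts[from] # make sure pages without outgoing links still get an entry
    # Assumption: a target linked several times from one page counts once.
    targets.uniq.each do |to|
      counts[from][:links] += 1
      counts[to][:backlinks] += 1
    end
  end
  counts
end

counts = link_counts({
  "/"        => ["/about", "/contact", "/about"],
  "/about"   => ["/"],
  "/contact" => []
})
counts["/about"][:backlinks] # => 1
```

In the app, these numbers would then be written into each page's cached count columns; that persistence step is left out because the schema isn't shown in this README.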
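How a configured w3c_url might be turned into a validation request: a minimal sketch, assuming the private installation exposes the standard W3C Markup Validator `check?uri=` endpoint. The helper `validator_request_url` is hypothetical; this README doesn't show how the app actually builds its requests. `URI.encode_www_form_component` needs Ruby 1.9.2 or later (on 1.8.7, `CGI.escape` does the same job):

```ruby
require 'uri'

# Hypothetical helper: returns the URL to request from the validator for
# page_url, or nil when w3c_url is nil (no validator configured).
def validator_request_url(w3c_url, page_url)
  return nil if w3c_url.nil?
  base = w3c_url.end_with?("/") ? w3c_url : w3c_url + "/"
  # Assumes the standard Markup Validator interface: GET <base>check?uri=<page>
  "#{base}check?uri=#{URI.encode_www_form_component(page_url)}"
end

validator_request_url(nil, "http://example.com/")
# => nil
validator_request_url("http://localhost/w3c-validator", "http://example.com/")
# => "http://localhost/w3c-validator/check?uri=http%3A%2F%2Fexample.com%2F"
```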

Tested with Ruby 1.8.7 only!

To crawl again, or to crawl a different site, clear out the database first (TODO):

  rake spider:clear
  rake spider:start
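The two commands can also be chained into a single Rake task, since Rake runs a task's prerequisites in the order they are listed. A sketch under that assumption: `spider:recrawl` is a hypothetical task name, and the `clear`/`start` bodies below are stand-ins that only record that they ran; in the project's own rake files only the `recrawl` line would be needed:

```ruby
require 'rake'

# Records which tasks ran, so the order can be checked.
RAN = []

namespace :spider do
  # Stand-ins for the project's real spider:clear and spider:start tasks.
  task(:clear) { RAN << "spider:clear" }
  task(:start) { RAN << "spider:start" }

  # Hypothetical convenience task: clear first, then crawl.
  desc "Clear the database, then crawl again"
  task :recrawl => [:clear, :start]
end
```

With this in a Rakefile, `rake spider:recrawl` replaces the two separate commands.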

Tip: when crawling on a different machine, run the crawl inside the Unix tool “screen”, so your own client machine doesn't have to stay connected the whole time.

TODO

  • i18n
  • more filtering options
  • support multiple domains (currently only one can be set in config/spider.yml, and everything has to be cleared out before crawling another)
  • Crawling speed: ideas, anyone? Threading does not seem to work