Graboid

Simply awesome web scraping with Nokogiri. Better docs later; for now, see the specs.

0.3.4 Update

http://twoism.posterous.com/new-graboid-dsl

Installation

gem install nokogiri graboid

Usage

%w{rubygems graboid}.each { |f| require f }

class RedditEntry
  include Graboid::Scraper

  selector '.entry'

  set :title
  set :domain, :selector => '.domain a'

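  # Extract the href of the first anchor inside the matched '.title' element.
  set :link,   :selector => '.title' do |entry|
    entry.css('a').first['href']
  end

  # Return the URL of the next page: the href of the first
  # 'p.nextprev' link whose text contains "next".
  page_with do |doc|
    doc.css('p.nextprev a').find { |a| a.text =~ /next/i }['href']
  end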

  before_paginate do
    puts "opening page: #{self.source}"
    puts "collection size: #{self.collection.length}"
    puts "#{"*"*100}"
  end

end

@posts = RedditEntry.new( :source => 'http://reddit.com' ).all( :max_pages => 2 )

@posts.each do |p| 
  puts "title: #{p.title}"
  puts "domain: #{p.domain}"
  puts "link: #{p.link}"
  puts "#{"*"*100}"
end
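
One thing to watch in the example above: the page_with block indexes the matched link with ['href'] without checking that a link was found, so on a page with no "next" link it would raise NoMethodError on nil. Below is a minimal sketch of a nil-safe lookup. To keep it runnable without any gems, a small hypothetical Anchor class stands in for a Nokogiri element, and next_page_href is an illustrative helper name, not part of Graboid. How Graboid itself reacts to a nil return from page_with is not covered here; check the specs before relying on it.

```ruby
# Hypothetical stand-in for a Nokogiri element: it answers #text and #[]
# the same way the anchors used in the page_with block do.
class Anchor
  attr_reader :text

  def initialize(text, attributes)
    @text = text
    @attributes = attributes
  end

  def [](name)
    @attributes[name]
  end
end

# Illustrative helper (not part of Graboid): return the href of the first
# link whose text matches /next/i, or nil when no such link exists.
def next_page_href(anchors)
  link = anchors.find { |a| a.text =~ /next/i }
  link && link['href']
end

middle_page = [
  Anchor.new('prev', 'href' => '/?before=t3_a'),
  Anchor.new('next', 'href' => '/?after=t3_b')
]
last_page = [Anchor.new('prev', 'href' => '/?before=t3_c')]

p next_page_href(middle_page)
p next_page_href(last_page)
```

Inside a scraper, the same two lines from next_page_href could replace the body of the page_with block, with doc.css('p.nextprev a') passed in place of the stand-in array.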

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, but do not touch the Rakefile, version, or history. (If you want your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2010 Christopher Burnett. See LICENSE for details.
