Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
tag: v0.3.4
Fetching contributors…

Cannot retrieve contributors at this time

62 lines (41 sloc) 1.554 kb

Graboid

Graboid

Simply awesome web scraping. Better docs later. See specs.

Installation

gem install nokogiri graboid

Usage

%w{rubygems graboid}.each { |f| require f }

class RedditEntry
  include Graboid::Entity

  selector '.entry'

  set :title
  set :domain, :selector => '.domain a'

  set :link,   :selector => '.title' do |entry| 
    entry.css('a').first['href'] 
  end

  pager do |doc|
    doc.css('p.nextprev a').select{|a| a.text =~ /next/i  }.first['href']
  end

  before_paginate do
    puts "opening page: #{self.source}"
    puts "collection size: #{self.collection.length}"
    puts "#{"*"*100}"
  end

end

RedditEntry.source = 'http://reddit.com'

RedditEntry.all(:max_pages => 5).each do |p| 
  puts "title: #{p.title}"
  puts "domain: #{p.domain}"
  puts "link: #{p.link}"
  puts "#{"*"*100}"
end

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright (c) 2010 Christopher Burnett. See LICENSE for details.

Jump to Line
Something went wrong with that request. Please try again.