Scrape data using method chaining

# HTML Scraper

The scraper has three components, http → split → extract, executed in that order. You make an HTTP request to fetch a page, split the page into sections, and finally extract the data from each section using a custom parser.

The best part about this scraper is that you can chain together all the actions you need to perform.

### Order of execution

http → split → extract → http → split and so on…
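
To make the ordering concrete, here is a toy model of the chain in plain JavaScript. It is only a sketch and not this library's implementation: `PAGES`, `chain` and `launch` are hypothetical names, pages are plain strings held in memory instead of HTML fetched over the network, and everything runs synchronously, whereas the real scraper returns a promise.

```javascript
// Toy model of the http -> split -> extract chain (NOT the library's code).
// PAGES is a hypothetical in-memory stand-in for real HTTP responses.
const PAGES = {
  '/index': 'a.html|b.html',
  'a.html': 'Alice',
  'b.html': 'Bob',
};

function chain() {
  const steps = [];
  const api = {
    // http: read a URL from `key` on each record and attach the "fetched" body.
    http(key) {
      steps.push(records => records.map(r => ({ ...r, body: PAGES[r[key]] })));
      return api;
    },
    // split: break each body into sections, producing one record per section.
    split(separator) {
      steps.push(records =>
        records.flatMap(r =>
          r.body.split(separator).map(section => ({ ...r, body: section }))));
      return api;
    },
    // extract: run a custom parser over each section.
    extract(parser) {
      steps.push(records => records.map(r => parser(r.body)));
      return api;
    },
    // launch: feed the base params through the steps in the order they were chained.
    launch(params) {
      return steps.reduce((records, step) => step(records), [params]);
    },
  };
  return api;
}

const result = chain()
  .http('url')                        // fetch the index page
  .split('|')                         // one section per link
  .extract(body => ({ href: body }))  // parse each section into a record
  .http('href')                       // follow each extracted link
  .extract(body => ({ name: body }))  // parse the linked pages
  .launch({ url: '/index' });

console.log(result);
```

Each call only records a step and returns the same object, which is what lets the calls be chained. No work happens until the final call supplies the base parameters.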

### Example

Say I want to extract all the information about students who were admitted to the University of Southern California, Los Angeles. I would do it as follows:

  1. Make an HTTP request to this page: http://edulix.com/universityfinder/university_of_southern_california.
  2. Page consists of multiple anchor tags containing links of each

    # Standard require
    Scraper = require 'HTML-Scraper'

    # Specify the key to read urls
    Scraper().http 'url'
    .split '.archive a'
    .extract (doc) ->
        href: "http://tusharm.com" + doc.attr 'href'
        text: doc.html()
    .http 'href'
    .extract ($) ->
        http: $('a:nth-child(2)').attr('href')

    # Launch with base params
    .$launch  url: 'http://tusharm.com/projects.html'

    # Returns a promise
    .then (val) -> console.log val
    .done()