Mr Spider

Crawl the web politely.

Join the chat at https://gitter.im/vermiculite/mrspider

Installation

$ npm i mrspider --save

Example

An example for parking data

Included streams

#### Fetch the page

mrspider request

let mr = require('mrspider');

let spider = mr.Spider({
    baseUrl: 'http://www.idealista.com'
});
let request = mr.request();

// createReadStream is a method, so call it before piping
spider.createReadStream().pipe(request);

#### Parse DOM

mrspider cheerio

let mr = require('mrspider');

let spider = mr.Spider({
    baseUrl: 'http://www.idealista.com'
});
let cheerio = mr.cheerio;
spider.createReadStream().pipe(...).pipe(cheerio);

mrspider JSDOM

let mr = require('mrspider');

let spider = mr.Spider({
    baseUrl: 'http://www.idealista.com'
});
let jsdom = mr.jsdom;
spider.createReadStream().pipe(...).pipe(jsdom);

#### Parse Data

mrspider regex data extractor

mrspider css data extractor

mrspider css links

mrspider image extraction

#### Data validation

mrspider validator

#### Data persistence

mrspider mongodb persister

Features

  • Super simple API.
  • Streaming architecture allows complete customisation.
  • Uses the full power of JavaScript, giving you great flexibility.
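
Because every stage is a stream, a custom stage can be written as an ordinary Node.js object-mode Transform and piped into the chain. The sketch below is a minimal illustration that uses only Node's built-in `stream` module. The page object shape (`url`, `content`, `data`), the `extractTitle` helper and the `createTitleExtractor` factory are assumptions made up for this example, not mrspider's documented format, and `Readable.from` stands in for `spider.createReadStream()`.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical helper: pull the <title> text out of an HTML string.
// Returns null when no title element is found.
function extractTitle(html) {
    const match = /<title[^>]*>([^<]*)<\/title>/i.exec(html || '');
    return match ? match[1].trim() : null;
}

// Hypothetical custom stage: an object-mode Transform that adds
// `data.title` to each page object and passes the page downstream.
function createTitleExtractor() {
    return new Transform({
        objectMode: true,
        transform(page, _encoding, callback) {
            page.data = Object.assign({}, page.data, {
                title: extractTitle(page.content)
            });
            callback(null, page);
        }
    });
}

// Stand-in for spider.createReadStream(): the field names here are
// assumed for illustration only.
const pages = Readable.from([
    { url: 'http://example.com/a', content: '<html><title> Page A </title></html>' },
    { url: 'http://example.com/b', content: '<html><body>No title</body></html>' }
]);

pages
    .pipe(createTitleExtractor())
    .on('data', page => console.log(page.url, page.data.title));
```

In a real crawl the same stage would sit after the fetch stage, in place of one of the `.pipe(...)` steps shown in the sections above, provided the objects flowing through the pipeline carry the fields the stage expects.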

Tests

To run the test suite, first install the dependencies, then run npm test:

$ npm install
$ npm test
