Skip to content

thraxil/canhaz

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CANHAZ is a simple scraping tool

Give it a URL and an XPath expression and it will fetch the URL, parse it, and return a JSON structure of the results of the XPath selection.

Parameters

  • url: the URL to fetch
  • xpath: the xpath expression

Example

curl 'http://localhost:8080/?url=http://example.com/&xpath=//img/@src'

Results

A successful request will return something like:

{"results" : {"text": "foo.jpg"}}

A bad xpath expression will return:

{"error" : "bad xpath"}

And HTTP errors will get passed along as well.

Future Plans

  • CSS Selectors
  • multiple xpath expressions
  • multiple URLs
  • async callbacks

About

erlang/mochiweb app to scrape the web

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published