Erlang/MochiWeb app to scrape the web

CANHAZ is a simple scraping tool

Give it a URL and an XPath expression and it will fetch the URL, parse it, and return a JSON structure of the results of the XPath selection.

Parameters

  • url: the URL to fetch
  • xpath: the XPath expression to evaluate against the fetched document

Example

curl 'http://localhost:8080/?url=http://example.com/&xpath=//img/@src'
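The curl call above passes the target URL and the XPath expression unescaped, which works for simple values but can break once either one contains characters such as `&`, `#`, or spaces. As a minimal sketch, assuming the service listens on `http://localhost:8080/` as in the example, the request URL could be built with percent-encoded parameters. The helper name `build_canhaz_url` is hypothetical and not part of CANHAZ.

```python
from urllib.parse import urlencode


def build_canhaz_url(target_url, xpath, base="http://localhost:8080/"):
    """Return a request URL with `url` and `xpath` percent-encoded.

    `base` is an assumption taken from the curl example; adjust it to
    wherever the service is actually running.
    """
    query = urlencode({"url": target_url, "xpath": xpath})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Same request as the curl example, with the parameters escaped.
    print(build_canhaz_url("http://example.com/", "//img/@src"))
```

The resulting string can be passed to curl or any HTTP client in place of the hand-written URL.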

Results

A successful request will return something like:

{"results" : {"text": "foo.jpg"}}

An invalid XPath expression will return:

{"error" : "bad xpath"}

HTTP errors are passed along to the caller as well.
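A client can tell the two documented response shapes apart by checking for the `error` key. The sketch below assumes the body is always a JSON object with either a `results` or an `error` member, as in the samples above; the exact structure of `results` is only described as "something like" the example, so the code returns it untouched. `CanhazError` and `parse_canhaz_response` are hypothetical names used for illustration.

```python
import json


class CanhazError(Exception):
    """Raised when the service reports an error such as "bad xpath"."""


def parse_canhaz_response(body):
    """Return the `results` member, or raise CanhazError on an error reply."""
    data = json.loads(body)
    if "error" in data:
        raise CanhazError(data["error"])
    return data["results"]


if __name__ == "__main__":
    print(parse_canhaz_response('{"results" : {"text": "foo.jpg"}}'))
    try:
        parse_canhaz_response('{"error" : "bad xpath"}')
    except CanhazError as exc:
        print("service error:", exc)
```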

Future Plans

  • CSS selectors
  • multiple XPath expressions
  • multiple URLs
  • async callbacks