Forgiving HTML/XML/RSS Parser in JS for *both* Node and Browsers

A forgiving HTML parser written in JS for both the browser and Node.js (yes, despite the name it works just fine in any modern browser).
The parser can handle streams (chunked data) and supports custom handlers for building your own DOM or other output.

Node Usage:
	var util = require("util");
	var htmlparser = require("node-htmlparser");
	var rawHtml = "Xyz <script language= javascript>var foo = '<<bar>>';< /  script><!--<!-- Waah! -- -->";
	var handler = new htmlparser.DefaultHandler();
	var parser = new htmlparser.Parser(handler);
	parser.ParseComplete(rawHtml);
	// Print the whole parsed tree (depth null = no nesting limit)
	console.log(util.inspect(handler.dom, false, null));

Browser Usage:
	var handler = new Tautologistics.NodeHtmlParser.DefaultHandler();
	var parser = new Tautologistics.NodeHtmlParser.Parser(handler);
	parser.ParseComplete(document.body.innerHTML);
	alert(JSON.stringify(handler.dom, null, 2));

Example output (from the Node example above):
		[ { raw: 'Xyz ', data: 'Xyz ', type: 'text' }
		, { raw: 'script language= javascript'
		  , data: 'script language= javascript'
		  , type: 'script'
		  , name: 'script'
		  , attribs: { language: 'javascript' }
		  , children: 
		     [ { raw: 'var foo = \'<bar>\';<'
		       , data: 'var foo = \'<bar>\';<'
		       , type: 'text'
		       }
		     ]
		  }
		, { raw: '<!-- Waah! -- '
		  , data: '<!-- Waah! -- '
		  , type: 'comment'
		  }
		]
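
Walking the result:
The parsed tree is a plain array of node objects, so it can be traversed with ordinary JS. The sketch below is only an illustration: collectByType is a hypothetical helper (not part of this library), and the sample tree is hand-written to mirror the shape of the example output above (type, data, attribs, children) with an abbreviated script body.

```javascript
// Hand-written sample in the same shape as the example output above.
// In real use this would be handler.dom after parsing.
var dom = [
  { raw: "Xyz ", data: "Xyz ", type: "text" },
  {
    raw: "script language= javascript",
    data: "script language= javascript",
    type: "script",
    name: "script",
    attribs: { language: "javascript" },
    children: [
      { raw: "var foo = 1;", data: "var foo = 1;", type: "text" }
    ]
  },
  { raw: "<!-- Waah! -- ", data: "<!-- Waah! -- ", type: "comment" }
];

// Hypothetical helper (not a library function): depth-first search that
// collects every node whose "type" field matches the requested type.
function collectByType(nodes, type, found) {
  found = found || [];
  for (var i = 0; i < nodes.length; i++) {
    var node = nodes[i];
    if (node.type === type) {
      found.push(node);
    }
    // Only element-like nodes carry a "children" array.
    if (node.children) {
      collectByType(node.children, type, found);
    }
  }
  return found;
}

var textNodes = collectByType(dom, "text");
var scriptNodes = collectByType(dom, "script");
```

Here textNodes holds the top-level text node and the text inside the script element, in document order. The same pattern should work in the browser, since it relies only on the node fields shown in the example output.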