HTMLParser

A simple HTML Parser using ANTLR4

Maven Coordinates

<dependency>
	<groupId>com.khubla.htmlparser</groupId>
	<artifactId>htmlparser</artifactId>
	<version>1.0</version>
	<type>jar</type>
	<scope>compile</scope>
</dependency>

Fetching and Validating a Page

HTMLParser can be run as a command-line jar to fetch a single page and parse it. Parse errors are logged to the console. For example:

sh fetch.sh http://www.slashdot.org

Example Usage of the Library

To parse an arbitrary HTML document using the callback parser, pass an implementation of HTMLParserListener, along with an InputStream of HTML, to HTMLDocumentParser.parse:

  final InputStream inputStream = TestTreeWalk.class.getResourceAsStream("/example1.html");
  final HTMLParserListener htmlParserListener = new ExampleListener();
  HTMLDocumentParser.parse(inputStream, htmlParserListener);
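
The parser takes an InputStream, so HTML that is already in memory (rather than in a classpath resource) has to be wrapped in a stream first. The following is a minimal sketch of one way to do that using only the JDK. The class name InMemoryHtmlSource and its toInputStream method are hypothetical helpers, not part of HTMLParser, and the choice of UTF-8 is an assumption, since this README does not state which encoding the parser expects. The call into the library is left as a comment so that the sketch compiles without HTMLParser on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HTMLParser.
final class InMemoryHtmlSource {

    private InMemoryHtmlSource() {
    }

    // Wraps an HTML string in an InputStream.
    // Assumption: UTF-8 is an acceptable encoding for the parser.
    static InputStream toInputStream(final String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(final String[] args) throws IOException {
        final String html = "<html><body><p>Hello</p></body></html>";
        try (InputStream inputStream = toInputStream(html)) {
            // With HTMLParser on the classpath, the stream would be handed
            // to the parser in the same way as the resource stream above:
            // HTMLDocumentParser.parse(inputStream, new ExampleListener());
            System.out.println(inputStream.available() + " bytes ready to parse");
        }
    }
}
```

Because the stream is backed by a byte array, closing it releases nothing external; the try-with-resources block is there only to keep the pattern the same as for file or network streams.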

Licensing

HTMLParser is licensed under the GPLv2.