vaperce/eclair-html-parser

eclair-html-parser

A simple and efficient C++ HTML parser that follows the WHATWG HTML specification.

Features:

  • full HTML/XML parsing with browser-like DOM tree corrections
  • HTML5 compatible
  • browser-like encoding detection
  • HTML content cleanup
  • simple DOM access API
  • pretty printing
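
As an illustration of what "browser-like encoding detection" involves, browsers following the WHATWG standards check for a byte order mark before consulting any other signal. The sketch below shows only that first step. `sniff_bom` is a hypothetical helper written for this example; it is not part of the eclair-html-parser API, and the README does not describe how the library performs detection internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of eclair-html-parser: returns the encoding
// implied by a leading byte order mark, or an empty string when none is found.
std::string sniff_bom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return "UTF-8";
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return "UTF-16BE";
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return "UTF-16LE";
    }
    return "";  // no BOM: a real detector falls back to other signals
}
```

When no BOM is present, a full detector would move on to declared encodings and statistical detection, which is presumably where the compact_enc_det dependency comes in.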

Made with ❤️ in Paris 🇫🇷.

Usage example

Document document;

// Parse an HTML string
document.parse(
  "<html><head><title>test</title></head><body><h1>Test</h1></body></html>",
  71,      // length of the markup in bytes
  "utf8",  // encoding of the input
  nullptr
);

// Clean document
document.clean(CleanFlags::SPACE);

// Access root node
Node root = document.root();
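
The length `71` in the example above is the byte length of the markup string. A minimal sketch of computing that value instead of counting by hand follows. `kMarkup` and `markup_length` are names invented for this illustration, and the `parse` call in the trailing comment assumes the first two arguments are a character pointer and a size, which the example suggests but this README does not confirm.

```cpp
#include <cstddef>
#include <string>

// The same markup as in the usage example above.
const std::string kMarkup =
    "<html><head><title>test</title></head><body><h1>Test</h1></body></html>";

// Byte length of the markup, computed instead of counted by hand.
std::size_t markup_length() {
    return kMarkup.size();
}

// Assumed call shape, mirroring the usage example (not verified against the
// library headers):
//   document.parse(kMarkup.data(), kMarkup.size(), "utf8", nullptr);
```

Deriving the length from the string keeps the two arguments in sync when the markup changes.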

Build

Requires CMake.

Builds static libraries only.

$ mkdir build
$ cd build
$ cmake -DCMAKE_INSTALL_PREFIX=~/my-install-dir ..
$ make -j
$ make install

Dependencies

  • compact_enc_det: head
  • icu4c: 65-1

Dependencies are downloaded and built automatically as part of the build.

License

eclair-html-parser is distributed under the MIT license.
