Html5depurate is an HTTP frontend for the validator.nu HTML5 parser. It parses input text and returns the reserialized HTML.
Ubuntu build/test dependencies:
- openjdk-7-jdk
- maven2
- jsvc
Compile and generate a .jar file by running mvn package.
We use the Maven "shade" plugin to bundle all dependencies into the .jar, so to run the server you only need the generated .jar file on the class path. To start the server as a daemon, use something like:
/usr/bin/jsvc \
-cp $(pwd)/target/html5depurate-1.0-SNAPSHOT.jar \
-pidfile /tmp/html5depurate.pid \
-errfile /tmp/html5depurate.err \
-outfile /tmp/html5depurate.out \
-procname html5depurate \
org.wikimedia.html5depurate.DepurateDaemon
The default log format is pretty bad, but it can be configured by the usual java.util.logging means: pass -Djava.util.logging.config.file=/path/to/logging.properties (with jsvc, as an option before the class name).
Then to test:
curl http://localhost:4339/document -F text=foo
This will return an HTML document which is a reserialized version of "foo".
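curl -F sends the text field as multipart/form-data. To call the service from Java code instead of the shell, the following is a minimal sketch of an equivalent client using only HttpURLConnection (Java 7 compatible). The class and method names (DepurateClient, multipartBody, depurate) are hypothetical, and decoding the response as UTF-8 is an assumption rather than documented behaviour. Like curl, the form part is sent without its own Content-Type, so how the server decodes non-ASCII input is up to the server.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client, mirroring: curl http://localhost:4339/document -F text=foo
public class DepurateClient {

    // Builds a multipart/form-data body with a single "text" field,
    // laid out the way curl's -F text=... sends a plain (non-file) field.
    static byte[] multipartBody(String boundary, String text) {
        String body = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"text\"\r\n"
                + "\r\n"
                + text + "\r\n"
                + "--" + boundary + "--\r\n";
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // POSTs the text to the given endpoint and returns the response body.
    // Assumption: the reserialized HTML comes back UTF-8 encoded.
    static String depurate(String endpoint, String text) throws IOException {
        String boundary = "----depurate" + System.nanoTime();
        byte[] body = multipartBody(boundary, text);

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
        conn.setFixedLengthStreamingMode(body.length);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Requires the daemon to be running on the default host and port.
        System.out.println(depurate("http://localhost:4339/document", "foo"));
    }
}
```

Building the body by hand keeps the sketch free of extra dependencies; a real client would more likely use an HTTP library with multipart support.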
Configuration options may be specified in /etc/html5depurate/html5depurate.conf. Possible configuration options and their default values are documented below:
# Max POST size, in bytes.
maxPostSize = 100000000
# Host or IP and port on which Html5depurate will listen.
host = localhost
port = 4339
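The key = value lines with # comments shown above are also valid java.util.Properties syntax. Assuming the file is read that way (this README does not say how Html5depurate actually parses it), here is a minimal sketch of loading it with the documented defaults as fallbacks. The class DepurateConfig and its helper methods are hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical loader; assumes the config file uses java.util.Properties syntax.
public class DepurateConfig {
    static final String CONFIG_PATH = "/etc/html5depurate/html5depurate.conf";

    // The documented default values, used for any key the file leaves out.
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("maxPostSize", "100000000");
        d.setProperty("host", "localhost");
        d.setProperty("port", "4339");
        return d;
    }

    // Parses config text; lines starting with '#' are treated as comments.
    static Properties parse(String text) {
        Properties config = new Properties(defaults());
        try {
            config.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader does not throw, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        return config;
    }

    static long maxPostSize(Properties config) {
        return Long.parseLong(config.getProperty("maxPostSize").trim());
    }

    static int port(Properties config) {
        return Integer.parseInt(config.getProperty("port").trim());
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties(defaults());
        File file = new File(CONFIG_PATH);
        if (file.isFile()) {
            try (Reader r = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
                config.load(r);
            }
        }
        System.out.println("host=" + config.getProperty("host").trim()
                + " port=" + port(config)
                + " maxPostSize=" + maxPostSize(config));
    }
}
```

Because Properties is constructed with a defaults table, a config file only needs to list the keys it changes; getProperty falls through to the defaults for everything else.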
TODO:
- Debian packaging:
  - A SysV init script wrapping jsvc should be fairly simple.
  - A Java security.policy file could provide very strong security guarantees.
  - There is no Debian package for Grizzly, so we will have to bundle it for now. Fetching dependencies from Maven Central during the build, rather than creating about 9 new Debian source packages, is not allowed in upstream Debian, but WMF can distribute the resulting file.
- Collect warnings/errors and return them in a JSON-serialized format exposed at /info.
- Help out MediaWiki (MW) a bit by extracting the contents of the body tag. This could be provided at /body.
- A servlet version, if someone needs that. An early version depended on a servlet container, but I abandoned that approach in favour of the robustness and management simplicity of a standalone daemon.