GitHub - wikimedia/mediawiki-services-html5depurate: This is an HTTP frontend for the validator.nu HTML 5 parser. It parses some input text and returns the reserialized HTML. Mirror from https://gerrit.wikimedia.org/g/mediawiki/services/html5depurate - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)

This is an HTTP frontend for the validator.nu HTML 5 parser. It parses some input text and returns the reserialized HTML.

Compile and test

Ubuntu build/test dependencies:

openjdk-7-jdk
maven2
jsvc

Compile and generate a .jar file by running mvn package.

We use the "shade" plugin to bundle all dependencies. So to run it, you only need the generated .jar file in the class path. To start the server as a daemon, use something like:

/usr/bin/jsvc \
	-cp $(pwd)/target/html5depurate-1.0-SNAPSHOT.jar \
	-pidfile /tmp/html5depurate.pid \
	-errfile /tmp/html5depurate.err \
	-outfile /tmp/html5depurate.out \
	-procname html5depurate \
	org.wikimedia.html5depurate.DepurateDaemon

The default log format is pretty bad but can be configured by the usual means, with -Djava.util.logging.config.file=/path/to/logging.properties

Then to test:

curl http://localhost:4339/document -F text=foo

This will return an HTML document which is a reserialized version of "foo".

Configuration

Configuration options may be specified in /etc/html5depurate/html5depurate.conf. Possible configuration options and their default values are documented below:

# Max POST size, in bytes.
maxPostSize = 100000000

# Host or IP and port on which Html5depurate will listen.
host = localhost
port = 4339

To do

Debian packaging
- A SysV init script wrapping jsvc should be fairly simple.
- Very strong security guarantees are possible by using a security.policy file.
- There is no package for grizzly, so we will have to bundle it for now. Using Maven Central during build, instead of creating about 9 new Debian source packages, is not allowed in Debian upstream, but WMF can distribute the resulting file.
Collect warnings/errors and provide a JSON serialized return format exposed at /info.
Help out MW a bit by extracting the contents of the body tag. This could be provided at /body.
A servlet version, if someone needs that. An early version depended on a servlet container, but I abandoned that approach in favour of the robustness and management simplicity of a standalone daemon.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
src		src
web/WEB-INF		web/WEB-INF
.gitignore		.gitignore
.gitreview		.gitreview
LICENSE.TXT		LICENSE.TXT
README.md		README.md
build-tomcat.xml		build-tomcat.xml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

web/WEB-INF

web/WEB-INF

.gitignore

.gitignore

.gitreview

.gitreview

LICENSE.TXT

LICENSE.TXT

README.md

README.md

build-tomcat.xml

build-tomcat.xml

pom.xml

pom.xml

Repository files navigation

Compile and test

Configuration

To do

About

Contributors 3

Languages

License

wikimedia/mediawiki-services-html5depurate

Folders and files

Latest commit

History

Repository files navigation

Compile and test

Configuration

To do

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Languages