GitHub mirror of "analytics/wikihadoop" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for contributing)

wikimedia/analytics-wikihadoop

WikiHadoop

This repository contains Java and Scala code that lets Hadoop parse MediaWiki XML dumps through InputFormat classes.

The Java code provides an InputFormat for the old Hadoop API (mapred), and the Scala code provides one for the new API (mapreduce).
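
To give a rough idea of what parsing an XML dump into records involves, here is a minimal, self-contained sketch in plain Java. It is not WikiHadoop's implementation and uses no Hadoop classes. The class `PageSplitter` and the method `splitPages` are hypothetical names chosen for illustration. It assumes the dump marks each article with a bare `<page>` ... `</page>` pair and that the text fits in memory, which a real InputFormat reading from file splits would not rely on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of WikiHadoop or Hadoop.
public class PageSplitter {
    private static final String OPEN = "<page>";
    private static final String CLOSE = "</page>";

    /** Returns every complete <page>...</page> element in the text, in order. */
    public static List<String> splitPages(String dumpXml) {
        List<String> pages = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = dumpXml.indexOf(OPEN, from);
            if (start < 0) {
                break; // no further pages
            }
            int end = dumpXml.indexOf(CLOSE, start);
            if (end < 0) {
                break; // truncated final page: skipped in this sketch
            }
            end += CLOSE.length();
            pages.add(dumpXml.substring(start, end));
            from = end; // continue scanning after this page
        }
        return pages;
    }

    public static void main(String[] args) {
        String dump = "<mediawiki>"
                + "<page><title>A</title></page>"
                + "<page><title>B</title></page>"
                + "</mediawiki>";
        for (String page : splitPages(dump)) {
            System.out.println(page);
        }
    }
}
```

In a Hadoop job, a record reader would typically hand each such page to the mapper as one record. How WikiHadoop handles pages that straddle split boundaries is not described here.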
