Zip Reader for Hadoop Streaming

This is a reader (a Hadoop input format) that returns (filename, line) key/value pairs for a zip file in Hadoop Streaming.

Note that currently only the first file in the zip is processed. If you want more, submit a pull request :)
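
To make the record shape concrete, here is a small sketch of a filter over such pairs. It assumes each record reaches a streaming script as one text line, with the filename and the line separated by a tab (Hadoop Streaming's default key/value separator). The function name `count_lines_per_file` is made up for this example and is not part of this project.

```shell
#!/bin/bash
# Sketch only: count how many lines were read from each zip entry.
# Assumption: stdin holds "filename<TAB>line" records, one per line.

count_lines_per_file() {
    # Split on tab; $1 is the filename key, the rest is the line itself.
    awk -F'\t' '{ n[$1]++ } END { for (f in n) print f "\t" n[f] }' | sort
}

# Demo with hand-written records instead of real job output.
printf 'a.txt\tfirst line\na.txt\tsecond line\nb.txt\tonly line\n' |
    count_lines_per_file
```

The demo prints one `filename<TAB>count` line per file. The same function could be pointed at real job output, as long as that output uses the tab-separated layout assumed here.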

Usage

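#!/bin/bash
# Unzip a file in HDFS: run a streaming job that reads zipped input from INDIR
# and writes the (filename, line) pairs as text to OUTDIR

case "$1" in
    -h | --help ) echo "usage: $(basename "$0") INDIR OUTDIR"; exit;;
esac

if [ $# -ne 2 ]; then
    "$0" -h
    exit 1
fi

# /bin/cat as mapper and reducer passes the records through unchanged
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars zipmapred-1.0-SNAPSHOT.jar \
    -mapper /bin/cat \
    -reducer /bin/cat \
    -inputformat com.mikitebeka.mapred.ZipInputFormat \
    -input "$1" -output "$2"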
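
Once the job finishes, the output directory should hold the records as text. The sketch below assumes they are written as tab-separated (filename, line) pairs and shows one way to drop the filename and keep only the unzipped lines. The helper name `values_only`, the script name `unzip-hdfs.sh`, and the HDFS paths in the comments are placeholders for illustration.

```shell
#!/bin/bash
# Sketch only: strip the filename key from "filename<TAB>line" records.

values_only() {
    # Keep everything after the first tab, so tabs inside the line survive.
    cut -f2-
}

# Possible use after running the script above (hypothetical names and paths):
#   ./unzip-hdfs.sh /data/zipped /data/unzipped
#   hadoop fs -cat '/data/unzipped/part-*' | values_only

# Local demo with a hand-written record.
printf 'a.txt\thello\tworld\n' | values_only
```

Using `cut -f2-` rather than `cut -f2` keeps lines intact when the original text itself contains tabs.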

FAQ

A. It uses the old(?) mapreduce API and doesn't work with CDH4
Q. Where does this project live?