Zip Reader for Hadoop Streaming
Latest commit 04cd2d3 Mar 12, 2013 @tebeka works with many files

This is an input format for Hadoop Streaming that reads a zip file and returns (filename, line) key-value pairs.

Note that currently only the first file in the zip is processed. If you want more, submit a pull request :)
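
To make the (filename, line) shape concrete, here is a minimal standalone sketch using only java.util.zip. It is not the project's ZipInputFormat code: the class name ZipLines and the method firstEntryLines are made up for illustration, and the sketch assumes UTF-8 text and simply takes the first entry of the archive.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of this project.
class ZipLines {
    /** Returns (filename, line) pairs for the first entry of a zip stream. */
    static List<Map.Entry<String, String>> firstEntryLines(InputStream in) throws IOException {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                return pairs; // empty archive: nothing to emit
            }
            // ZipInputStream reports end-of-stream at the end of the current
            // entry, so this reader never runs into the next file.
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(zip, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                pairs.add(new SimpleEntry<>(entry.getName(), line));
            }
        }
        return pairs;
    }
}
```

For an archive whose first entry is a.txt, every returned pair has a.txt as its key and one line of that file as its value; later entries are ignored, mirroring the limitation described above.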

Usage

#!/bin/bash
# Unzip a file in HDFS

case $1 in
    -h | --help ) echo "usage: $(basename "$0") INDIR OUTDIR"; exit;;
esac

# Require exactly two arguments: the input and output directories
if [ $# -ne 2 ]; then
    "$0" -h
    exit 1
fi

# Identity mapper and reducer: ZipInputFormat does the unzipping
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars zipmapred-1.0-SNAPSHOT.jar \
    -mapper /bin/cat \
    -reducer /bin/cat \
    -inputformat com.mikitebeka.mapred.ZipInputFormat \
    -input "$1" -output "$2"
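
The script above uses /bin/cat as an identity mapper. If you want a mapper that looks at the file name, the sketch below shows one way to parse the records. It assumes the usual Hadoop Streaming convention of handing each record to the mapper on stdin as key, tab, value (here: filename, tab, line). The class name FilenameLineMapper and what it emits are illustrative only and not part of this project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming mapper, not part of this project.
class FilenameLineMapper {
    /** Splits one streaming record at the first tab into {key, value}. */
    static String[] split(String record) {
        int tab = record.indexOf('\t');
        if (tab < 0) {
            return new String[] {record, ""}; // no tab: treat it all as the key
        }
        return new String[] {record.substring(0, tab), record.substring(tab + 1)};
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String record;
        while ((record = in.readLine()) != null) {
            String[] kv = split(record);
            // Emit the line's length keyed by file name (an arbitrary illustration).
            System.out.println(kv[0] + "\t" + kv[1].length());
        }
    }
}
```

Splitting at the first tab only keeps any tabs inside the line itself intact.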

FAQ

A. It uses the old(?) MapReduce API and doesn't work with CDH4.
Q. Where does this project live?