Zip Reader for Hadoop Streaming

This is an input format for Hadoop Streaming that reads a zip file and returns (filename, line) key-value pairs.

Note that currently only the first file in the zip is processed. If you want more, submit a pull request :)
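To make the record shape concrete, here is a minimal sketch of the same idea using only ``java.util.zip`` from the JDK. It is not this project's implementation and has no Hadoop dependency: the class name ``FirstEntryLines``, the helpers ``read`` and ``sampleZip``, and the UTF-8 encoding are assumptions made for illustration. It reads only the first entry of a zip and pairs each line with that entry's name, which mirrors the behavior described above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not the project's ZipInputFormat.
class FirstEntryLines {

    /** Returns (filename, line) pairs for the first entry of a zip stream. */
    static List<Map.Entry<String, String>> read(InputStream in) throws IOException {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                return records; // empty archive: no records
            }
            // ZipInputStream reports end-of-stream at the end of the current
            // entry, so this loop never reads past the first file.
            BufferedReader reader =
                new BufferedReader(new InputStreamReader(zip, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(new SimpleEntry<>(entry.getName(), line));
            }
        }
        return records;
    }

    /** Builds a small in-memory zip with two files, for demonstration. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("a.txt"));
            zip.write("first\nsecond\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("b.txt"));
            zip.write("ignored\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print each record as key<TAB>value, the usual streaming layout.
        for (Map.Entry<String, String> r : read(new ByteArrayInputStream(sampleZip()))) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

Running ``main`` prints two records, both keyed by ``a.txt``; the lines of ``b.txt`` never appear because only the first entry is read.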

Usage

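#!/bin/bash
# Unzip a file in HDFS

# Print usage and exit when help is requested
case $1 in
    -h | --help ) echo "usage: $(basename "$0") INDIR OUTDIR"; exit;;
esac

# Exactly two arguments are required: input and output directories
if [ $# -ne 2 ]; then
    "$0" -h
    exit 1
fi

# Identity mapper and reducer; the zip reading happens in the input format.
# Paths are quoted so that names containing spaces are passed intact.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars zipmapred-1.0-SNAPSHOT.jar \
    -mapper /bin/cat \
    -reducer /bin/cat \
    -inputformat com.mikitebeka.mapred.ZipInputFormat \
    -input "$1" -output "$2"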

FAQ

A. It uses the old(?) mapreduce API and doesn't work with CDH4
Q. Where does this project live?