Skip to content
Advanced common functionality for hadoop
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



This project creates a runnable jar file that can do some common advance functionality with hadoop.

This first version was built for CDH 4. I will be making it work for CDH 3 shortly.



a collections of layered functionality for advance putting. For details on how to use this functionality click here The user will be able to use all the following:

Layer 1: Reading

CSV files, Delimiter Files, Flat Files, Variable Length Delimiter files, Variable Length Flat Files

Layer 2: Aggregating

Many files into a few

Appending file name to every row of aggregated files

Layer 3: Threading

Run in single or multi thread mode

Each thread writing to a different HDFS file to increase write speed

Layer 4: Listening

Report progress to console

Layer 5: Compresing

Use Snappy, Gzip, or Bzip2

Layer 6: Writing

Sequence, Avro Files, Rc Files, or to HBase


This allows you to make one or more directories pumps files into HDFS as you favorite splittable formates (sequence, avro, or rc) Like the "put" functionality the route logic is also layered.

Layer 1: Route

Event driven

Schedule driven

Layer 2: Put Threads

Define number of put threads in the thread pool

Layer 3: Put

Get all the functionality and options from the above put command


hadoop fs -get is good but. What if you want to get a sequence, avro, or rc file? And what if you want to be able to read the results? Well then you can use these get methods to uncompress sequence, avro or rc files into text to your local drive.


hadoop fs -text only goes so far this takes us to the next step by being able to output rc files and avro files in clear text. Click here for more information.


Converting a {key}|{field}|{value} env files to an avro file with a generated schema

Converting a multiple row type file to multiple avro files each having a generated schema


Converts a non-splittable gzip file stored in hdfs to a sequence file of your choose of compression (snappy, gzip, bzip2)


Converts a non-splittable zip file stored in hdfs to a sequence file(s) of your choose of compression (snappy, gzip, bzip2). There will be a sequence file for every file in the original zip file.

Something went wrong with that request. Please try again.