Hadoopy Examples
Brandyn White <bwhite@dappervision.com>

The examples are grouped into 'levels' by requirements and background.  The background is intended to give you a sense of what is recommended; don't let it scare you off.  Each exercise is listed below with one of three statuses: 'TODO' (incomplete or not started), 'Untested' (almost done but still rough), or 'Tested' (completed and integrated into the automated tests in test_examples.py).

L0 - Basic examples without Hadoop (everything is run locally)
Requirements: Hadoopy, Python 2.6+, Linux/OS X
Background: basic knowledge of MapReduce (TODO link to resources), basic familiarity with Python syntax
    ex0[Tested]: Wordcount
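
For readers new to MapReduce, the shape of a wordcount job can be sketched in plain Python. This is not the code from ex0; it is a minimal illustration that assumes the usual convention of a mapper and a reducer written as generators that yield (key, value) pairs. The run_local helper is hypothetical and only stands in for the map, sort, and reduce phases so that the sketch runs without Hadoopy or Hadoop installed.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in value.split():
        yield word, 1


def reducer(key, values):
    """Sum the counts collected for one word."""
    yield key, sum(values)


def run_local(records):
    """Hypothetical helper: map, sort by key (the 'shuffle'), then reduce."""
    mapped = [kv for k, v in records for kv in mapper(k, v)]
    mapped.sort(key=itemgetter(0))
    results = []
    for word, group in groupby(mapped, key=itemgetter(0)):
        # All counts for one word arrive together, as they would at a reducer.
        results.extend(reducer(word, (count for _, count in group)))
    return results


if __name__ == "__main__":
    # Input records are (line offset, line text) pairs, chosen for illustration.
    lines = [(0, "a b a"), (1, "b c")]
    for word, count in run_local(lines):
        print("%s\t%d" % (word, count))
```

Sorting before grouping matters here: groupby only merges adjacent equal keys, which mirrors how the reduce phase receives its input sorted by key.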

L1 - Basic examples with Hadoop (interacts with HDFS and Hadoop)
Requirements: L0 + CDH2/3
Background: L0 + basic understanding of Hadoop job execution and HDFS
    ex0[Tested]: Wordcount
    ex1[Tested]: Direct write to HDFS + Wordcount

L2 - Basic examples with Hadoop using a Whirr cluster
Requirements: L0 + Whirr, an Amazon AWS account, a few dollars
Background: L0 + familiarity with Amazon AWS

L3 - Intermediate examples with Hadoop
Requirements: L1 or L2
Background: L0 + understanding of Hadoop design patterns (TODO link to Jimmy's Book)

L4 - Image processing + Computer Vision with Hadoop
Requirements: (L1 or L2) + Python Imaging Library (PIL), OpenCV
Background: L0 + familiarity with PIL and OpenCV

L5 - Automated parallel job execution with Hadoopy Flow
Requirements: (L1 or L2) + Hadoopy Flow, gevent
Background: L0 + familiarity with 'greenlets', dataflow

L6 - Mixing Java Hadoop code with Hadoopy
Requirements: (L1 or L2) + JDK
Background: L0 + familiarity with Java

L7 - Using Hadoopy with the Oozie job execution engine
Requirements: (L1 or L2) + Oozie
Background: L0 + familiarity with Oozie

L8 - Using Hadoopy with the Avro serialization format
Requirements: (L1 or L2) + Avro
Background: L0 + familiarity with Avro

L9 - Using Hadoopy with the Cassandra database
Requirements: (L1 or L2) + Cassandra
Background: L0 + familiarity with Cassandra