# Using Yhat with Hadoop streaming
This example hooks up Yhat with the Hadoop streaming API. We feed CPG product names into the pipeline, and the mapper (`yhat-batch.py`) scores and tags each record using a Product Classification model deployed on Yhat. There is no reducer step, so data is output straight from the `yhat-batch.py` job to our S3 bucket as line-delimited JSON that looks like this:
{"text": "Aranciata Orange", "guesses": {"Frozen Foods": 0.30769230769230771, "Hair Shampoo": 0, "Pet Care": 0, "Soda": 0.69230769230769229}}
{"text": "Tonic Water. Contains Quinine", "guesses": {"Frozen Foods": 0.23076923076923078, "Hair Shampoo": 0, "Pet Care": 0.15384615384615385, "Soda": 0.61538461538461542}}
Results can be downloaded using the `sync.sh` script.
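Because each output line is a standalone JSON object, the results are easy to post-process. A quick sketch (the output path is illustrative; use wherever `sync.sh` puts your results):

```python
# Read line-delimited JSON results and print the best guess per product.
import json

with open("output/part-00000") as f:
    for line in f:
        record = json.loads(line)
        best = max(record["guesses"], key=record["guesses"].get)
        print(record["text"], "->", best)
```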
## Requirements

- `s3cmd` (to upload data to S3): `brew install s3cmd` or http://s3tools.org/s3cmd
- `awscli` (to use EMR): `pip install awscli`
- `scikit-learn` (to build the model): `pip install scikit-learn==0.14.1`
- `pandas` (to build the model): `pip install pandas==0.14.1`
- `factual-api`, optional (to source additional data): `pip install factual-api`
## Running the example

Make sure all of your data is uploaded to S3. If you're setting things up on your own S3 account, you'll need to do a find/replace on `yhat-hadoop-example`, swapping in the name of your own bucket.
```bash
# replace your YHAT_USERNAME and YHAT_APIKEY in model/classifier.py
$ cd model
$ python classifier.py
$ cd ..
$ ./sync.sh up
$ ./spin-up-server.sh
{
    "ClusterId": "j-3UH0PR3KEO157"
}
$ ./job/run.sh j-3UH0PR3KEO157
```
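For reference, `model/classifier.py` is the step that builds the product classifier and pushes it to Yhat. The sketch below is a hedged illustration, not the repo's actual code: the training file, column names, and model choice (a naive Bayes bag-of-words classifier) are all assumptions.

```python
# Hypothetical sketch of model/classifier.py: train a product-category text
# classifier to be deployed on Yhat. File name, column names, and model choice
# are illustrative assumptions, not taken from this repo.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# training data: one product name per row with a known category label
df = pd.read_csv("products.csv")  # columns assumed: "text", "category"

vec = CountVectorizer()
X = vec.fit_transform(df["text"])
clf = MultinomialNB().fit(X, df["category"])

def guess(text):
    # return {category: probability} for a single product name
    probs = clf.predict_proba(vec.transform([text]))[0]
    return dict(zip(clf.classes_, probs))

# deployment to Yhat would happen here via the yhat client, using the
# YHAT_USERNAME / YHAT_APIKEY values mentioned above; see the Yhat docs
# for the client version you have installed.
```

Naive Bayes here is just a convenient stand-in; the real script may well use a different estimator.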