udacity_hadoop_mapreduce

Udacity - Intro to Hadoop and MapReduce by Cloudera

This repository contains source code for the Udacity course Intro to Hadoop and MapReduce, by Cloudera.

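Command to run the scripts with Hadoop Streaming (replace mapper_file.py, reducer_file.py, path_to_input_file and path_to_output_directory with your own scripts and HDFS paths; the output directory must not already exist):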

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar -mapper 'python mapper_file.py' -reducer 'python reducer_file.py' -file mapper_file.py -file reducer_file.py -input path_to_input_file -output path_to_output_directory
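Hadoop Streaming feeds each input line to the mapper on stdin, reads tab-separated key/value lines from its stdout, sorts them by key, and pipes them to the reducer. As a minimal sketch, assuming a word-count task (the file name wordcount.py and the functions below are hypothetical, not taken from this repository), a mapper and reducer pair could look like this. `run_local` mimics `cat input | mapper | sort | reducer` so the logic can be checked without a cluster.

```python
#!/usr/bin/env python
"""Hypothetical word-count mapper and reducer for Hadoop Streaming.

Run as `python wordcount.py map` or `python wordcount.py reduce`;
Streaming supplies input on stdin and reads tab-separated output on stdout.
"""
import sys


def mapper(lines):
    """Emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def reducer(lines):
    """Sum counts per key; relies on input being sorted by key, as Streaming guarantees."""
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            # A new key starts: flush the total of the previous one.
            if current_key is not None:
                yield "%s\t%d" % (current_key, total)
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield "%s\t%d" % (current_key, total)


def run_local(lines):
    """Simulate `cat input | mapper | sort | reducer` in memory."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

With a single script like this, the mapper and reducer options would become `-mapper 'python wordcount.py map' -reducer 'python wordcount.py reduce' -file wordcount.py`.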

Command to run the scripts with a combiner (here the reducer script is reused as the combiner):

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar -mapper 'python mapper_file.py' -reducer 'python reducer_file.py' -combiner 'python reducer_file.py' -file mapper_file.py -file reducer_file.py -input path_to_input_file -output path_to_output_directory
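The combiner runs on each mapper's sorted output before the shuffle, and Hadoop may apply it zero, one or several times. Reusing reducer_file.py as the combiner is therefore only safe when the reducer's output has the same key<TAB>value format as its input and its aggregation is associative and commutative (summing is; averaging is not). The sketch below is a hypothetical local check, assuming each split holds one mapper's already-emitted records; `make_reducer`, `run_job` and the sample data are illustrative names, not part of this repository.

```python
"""Hypothetical local check of whether a reducer can double as a combiner.

Each split holds one mapper's 'key<TAB>value' output. Hadoop sorts map
output before the combiner and again before the reducer; sorted() stands
in for both steps here.
"""
from itertools import groupby


def _parse(record):
    key, value = record.split("\t", 1)
    return key, float(value)


def make_reducer(aggregate):
    """Build a Streaming-style reducer applying `aggregate` to each key's values."""
    def reducer(records):
        pairs = (_parse(r) for r in records)
        # groupby only merges adjacent equal keys, so input must be sorted.
        for key, group in groupby(pairs, key=lambda kv: kv[0]):
            yield "%s\t%g" % (key, aggregate([v for _, v in group]))
    return reducer


sum_reducer = make_reducer(sum)
mean_reducer = make_reducer(lambda vs: sum(vs) / len(vs))


def run_job(splits, reducer, combiner=None):
    """Optionally combine each split's sorted output, then shuffle (sort) and reduce."""
    shuffled = []
    for split in splits:
        records = sorted(split)
        shuffled.extend(combiner(records) if combiner else records)
    return list(reducer(sorted(shuffled)))


if __name__ == "__main__":
    splits = [["a\t1", "a\t2", "b\t4"], ["a\t3"]]
    for name, red in [("sum", sum_reducer), ("mean", mean_reducer)]:
        plain = run_job(splits, red)
        combined = run_job(splits, red, combiner=red)
        print(name, plain, combined, "safe" if plain == combined else "UNSAFE")
```

With these sample values the sum reducer gives the same totals with or without the combiner, while the mean reducer reports 2.25 instead of 2 for key `a` once it is also used as a combiner, because it averages partial averages.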