Udacity - Intro to Hadoop and MapReduce by Cloudera
This repository contains the source code for the Udacity course Intro to Hadoop and MapReduce by Cloudera.
Command to run the mapper and reducer scripts with Hadoop Streaming:
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar -mapper 'python mapper_file.py' -reducer 'python reducer_file.py' -file mapper_file.py -file reducer_file.py -input path_to_input_file -output path_to_output_directory
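Hadoop Streaming sends input lines to the mapper on stdin and reads tab-separated `key\tvalue` lines from its stdout. It then sorts those lines by key before feeding them to the reducer, so all values for a key arrive together. The sketch below is a hypothetical word-count pair, not one of the course scripts. It keeps both steps in one file, called `wordcount.py` here for illustration. To match the command above, you would save the `mapper` logic as `mapper_file.py` and the `reducer` logic as `reducer_file.py`, or pass `-mapper 'python wordcount.py map' -reducer 'python wordcount.py reduce'` instead.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper/reducer for Hadoop Streaming (illustrative sketch)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts per key.

    Assumes the input is sorted by key, which Hadoop's shuffle/sort
    guarantees, so groupby sees each key's records contiguously.
    """
    # Skip blank or malformed lines that have no tab separator.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


# Only read stdin when invoked explicitly as `... map` or `... reduce`.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Calling the functions directly with `sorted()` standing in for the shuffle phase lets you check the logic before submitting a job. For example, `list(reducer(sorted(mapper(["a b a", "b c"]))))` returns `["a\t2", "b\t2", "c\t1"]`.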
Command to run the scripts with a combiner (the reducer script is reused as the combiner):
sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar -mapper 'python mapper_file.py' -reducer 'python reducer_file.py' -combiner 'python reducer_file.py' -file mapper_file.py -file reducer_file.py -input path_to_input_file -output path_to_output_directory
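Reusing the reducer as the combiner is only safe under two conditions. First, its output must have the same `key\tvalue` shape as its input. Second, its operation must be associative and commutative, such as a sum. Both conditions matter because Hadoop may run the combiner zero, one, or several times on each map task's sorted output. The sketch below simulates that locally. It is a hypothetical helper, not part of the course code. It treats each inner list as one map task's output and checks that pre-aggregating with the reducer gives the same final result while shuffling fewer records.

```python
from itertools import groupby


def sum_reducer(lines):
    """Sum integer values per key; input must be sorted by key.

    Output has the same 'key\tvalue' shape as the input, which is what
    lets the same script act as both combiner and reducer.
    """
    pairs = (line.split("\t", 1) for line in lines if "\t" in line)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(v) for _, v in group)}"


def run_job(map_outputs, use_combiner):
    """Simulate shuffle/sort over several map tasks (hypothetical helper).

    Returns (final reducer output, number of records shuffled).
    """
    shuffled = []
    for task_output in map_outputs:
        # Each map task's output is sorted before the combiner sees it.
        records = sorted(task_output)
        shuffled.extend(sum_reducer(records) if use_combiner else records)
    # The reducer receives all records, sorted so equal keys are adjacent.
    return list(sum_reducer(sorted(shuffled))), len(shuffled)


tasks = [["a\t1", "b\t1", "a\t1"], ["a\t1", "c\t1"]]
plain, plain_count = run_job(tasks, use_combiner=False)
combined, combined_count = run_job(tasks, use_combiner=True)
print(plain == combined, plain_count, combined_count)
```

A reducer that outputs something like a mean cannot be reused this way. Averaging partial averages generally gives the wrong answer, so such a job would need a separate combiner that emits sums and counts.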