kevinavicenna/hadoop-multi-node

Hadoop MapReduce Multi Node

This repository contains an example of running a MapReduce job on a multi-node Hadoop cluster in Docker. Below are the setup steps and how the input data is loaded:

Prerequisites

  • Docker

Setup Instructions

  1. Clone the Repository

    Clone this repository to your local machine:

    git clone https://github.com/kevinavicenna/hadoop-multi-node.git
    cd hadoop-multi-node
  2. Run Hadoop Cluster

    Start a multi-node Hadoop cluster using Docker:

    docker compose up -d
  3. Verify Hadoop Cluster Status

    Ensure all containers are running and Hadoop services are up:

    docker compose ps
  4. Prepare Input Data

    Upload the input data file (Indonesian.csv in this example) to the root of HDFS:

    docker exec -it hadoop-settings-namenode-1 bash -c "hadoop fs -put /usr/hadoop/Indonesian.csv /"
  5. Run MapReduce Job

    Execute a word count example job:

    docker exec -it hadoop-settings-namenode-1 bash -c "yarn jar /opt/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /Indonesian.csv /output"
  6. View Job Results

    Monitor the job progress and view results:

    • Check job status and details in the Hadoop Resource Manager UI (http://localhost:8088).
    • View job counters and output files in Hadoop HDFS:
      docker exec -it hadoop-settings-namenode-1 bash -c "hadoop fs -ls /output"
      docker exec -it hadoop-settings-namenode-1 bash -c "hadoop fs -cat /output/part-r-00000"
  7. Cleanup

    Stop and remove the Docker containers:

    docker compose down
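The cluster itself is defined by the repository's docker-compose.yml. As a rough sketch of what such a file typically contains (the image, service names, and commands below are illustrative assumptions, not the repo's actual configuration; note that the commands above address the NameNode container as hadoop-settings-namenode-1):

```yaml
# Hypothetical sketch of a multi-node Hadoop compose file.
# Consult the repository's docker-compose.yml for the real definitions.
services:
  namenode:
    image: apache/hadoop:3.3.6
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"   # NameNode web UI
  datanode:
    image: apache/hadoop:3.3.6
    command: ["hdfs", "datanode"]
  resourcemanager:
    image: apache/hadoop:3.3.6
    command: ["yarn", "resourcemanager"]
    ports:
      - "8088:8088"   # Resource Manager UI used in step 6
  nodemanager:
    image: apache/hadoop:3.3.6
    command: ["yarn", "nodemanager"]
```

Running more than one datanode/nodemanager service (or scaling them with `docker compose up --scale`) is what makes the cluster multi-node.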
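For intuition about what the wordcount job in step 5 does, its map-shuffle-reduce phases can be approximated locally with standard Unix tools: splitting input into one word per line plays the mapper, `sort` plays the shuffle, and `uniq -c` plays the reducer. This is a toy sketch on sample data, not a substitute for the Hadoop job:

```shell
# Toy input standing in for Indonesian.csv
printf 'satu dua dua tiga tiga tiga\n' > sample.txt

# map: one word per line; shuffle: sort; reduce: count duplicates
tr -s '[:space:]' '\n' < sample.txt | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
# prints:
# dua	2
# satu	1
# tiga	3
```

The Hadoop output in `/output/part-r-00000` has the same word-TAB-count shape, just computed in parallel across the cluster's nodes.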
