zhaolj/hadoop_docker
Overview 2022-12-24

This is my personal hadoop-3.3.4 Docker image based on ubuntu:latest (latest = 22.04 jammy when this image was built).

The aim of this image is to provide a base for building a Hadoop cluster. After start-up, an ssh daemon is launched automatically. For all the information about the experimental Hadoop cluster environment, please refer to this Hadoop tutorial in Chinese.

Please check the Dockerfile for details.

All the source code can be found at github.com/zhaolj/hadoop_docker.
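
To get the image, either pull it from Docker Hub or build it from source (a sketch; it assumes the Dockerfile sits at the repository root):

    docker pull zhaolj/hadoop:latest
    # or clone the repository and build it yourself:
    git clone https://github.com/zhaolj/hadoop_docker.git
    cd hadoop_docker
    docker build -t zhaolj/hadoop:latest .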

Use as a single node

  1. start a single node

    docker run -d -p10122:22 -p19870:9870 --privileged --name hadoop zhaolj/hadoop:latest
  2. log in via ssh (password: hadoop)

    ssh hadoop@localhost -p10122
  3. check ip

    ip addr | grep 172

    In the example above, the container IP is 172.17.0.2.

  4. configure Hadoop. Go to the configuration directory /opt/hadoop/etc/hadoop and add the following properties inside <configuration>...</configuration>, using the container IP from step 3.

    cd $HADOOP_HOME/etc/hadoop
    • core-site.xml

      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://<your ip>:9000</value>
      </property>
    • hdfs-site.xml

      <property>
          <name>dfs.replication</name>
          <value>1</value>
      </property>
  5. format HDFS

    hdfs namenode -format
  6. start HDFS

    start-dfs.sh
  7. check processes via jps; you should see NameNode, DataNode, and SecondaryNameNode

  8. open the HDFS web UI: open http://localhost:19870/ in your host browser
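
To verify that HDFS works end to end, a quick smoke test can be run inside the container (a sketch; the target directory is just an example):

    hdfs dfs -mkdir -p /user/hadoop
    hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/hadoop/
    hdfs dfs -ls /user/hadoop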

Use as a cluster

We are going to build a Hadoop cluster with 1 NameNode and 2 DataNodes.


We can use the pre-configured image zhaolj/hadoop-cluster for a quick start.

  1. create a network for the cluster

    docker network create --subnet=172.20.0.0/16 hnet
  2. create the containers (3 nodes)

    docker run -d -p10122:22 -p19870:9870 --name=nn --hostname=nn --network=hnet --ip=172.20.1.1 --add-host=dn1:172.20.1.2 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest
    docker run -d --name=dn1 --hostname=dn1 --network=hnet --ip=172.20.1.2 --add-host=nn:172.20.1.1 --add-host=dn2:172.20.1.3 --privileged zhaolj/hadoop-cluster:latest
    docker run -d --name=dn2 --hostname=dn2 --network=hnet --ip=172.20.1.3 --add-host=nn:172.20.1.1 --add-host=dn1:172.20.1.2 --privileged zhaolj/hadoop-cluster:latest
  3. log in to the NameNode via ssh (password: hadoop)

    ssh hadoop@localhost -p10122
  4. format HDFS

    hdfs namenode -format
  5. start HDFS

    start-dfs.sh


  6. check processes via jps on the NameNode; you should see NameNode and SecondaryNameNode

  7. check processes via jps on the DataNodes; each should show a DataNode process

    • connect to dn1 via ssh from nn

      ssh dn1
      jps
      exit


    • connect to dn2 via ssh from nn

      ssh dn2
      jps
      exit


  8. open the HDFS web UI: open http://localhost:19870/ in your host browser
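
To confirm that both DataNodes have joined the cluster, you can ask the NameNode for a cluster report (an extra check, not part of the original steps; run it as the hadoop user on nn):

    hdfs dfsadmin -report

The report should list two live DataNodes (172.20.1.2 and 172.20.1.3).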

Differences between zhaolj/hadoop and zhaolj/hadoop-cluster

zhaolj/hadoop-cluster includes three pre-configured files for the example cluster network.

  1. workers ($HADOOP_HOME/etc/hadoop/workers)

    dn1
    dn2
  2. core-site.xml ($HADOOP_HOME/etc/hadoop/core-site.xml)

    add the following content to the default configuration (inside <configuration>...</configuration>).

    <!-- specify HDFS host & port -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nn:9000</value>
    </property>
    <!-- specify Hadoop temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/tmp</value>
    </property>
  3. hdfs-site.xml ($HADOOP_HOME/etc/hadoop/hdfs-site.xml)

    add the following content to the default configuration (inside <configuration>...</configuration>).

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/hdfs/name</value>
    </property>
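
The same setup can be reproduced from the plain zhaolj/hadoop image: save the three files above next to a Dockerfile and layer them over the defaults. A minimal sketch:

    FROM zhaolj/hadoop:latest
    # overwrite the default Hadoop configs with the cluster versions
    COPY workers core-site.xml hdfs-site.xml /opt/hadoop/etc/hadoop/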
