## Configure Hadoop HDFS

Let us configure HDFS as part of setting up a single-node cluster. We will start and validate HDFS in the next topic.

* Here are the contents of **/opt/hadoop/etc/hadoop/core-site.xml**.

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```
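
To confirm which filesystem URI a given file actually sets, the value can be pulled out with a few lines of shell. This is only a minimal sketch: `get_prop` is a hypothetical helper, not part of Hadoop, and it assumes each `<name>` and `<value>` sits on its own line, as in the file above. The demo works on a scratch copy, so the real configuration is not touched.

```shell
# Hypothetical helper (not a Hadoop command): print the <value> that follows
# <name>KEY</name> in a Hadoop XML config file.
# Assumes <name> and <value> are on separate lines, as in the file above.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Demo against a scratch copy of the configuration shown above.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

get_prop "$demo_conf" fs.defaultFS
rm -f "$demo_conf"
```

Once Hadoop's `bin` directory is on `PATH`, `hdfs getconf -confKey fs.defaultFS` reports the value that Hadoop itself resolves.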

* Here are the contents of **/opt/hadoop/etc/hadoop/hdfs-site.xml**.

```xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/opt/hadoop/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```
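
All three storage paths sit under `/opt/hadoop/dfs`, so the account that runs the HDFS daemons needs write access there. A small pre-flight check along the following lines can catch permission problems early. It is a sketch under that assumption, and `nearest_existing` and `can_create` are hypothetical helper names.

```shell
# Hypothetical helpers (not Hadoop commands).
# Print the closest ancestor of a path that already exists (or the path itself).
nearest_existing() {
    p=$1
    while [ ! -e "$p" ]; do
        p=$(dirname "$p")
    done
    printf '%s\n' "$p"
}

# Succeed if the nearest existing ancestor is writable by the current user.
can_create() {
    [ -w "$(nearest_existing "$1")" ]
}

# Paths taken from the hdfs-site.xml shown above.
for d in /opt/hadoop/dfs/name /opt/hadoop/dfs/namesecondary /opt/hadoop/dfs/data; do
    if can_create "$d"; then
        echo "ok: $d"
    else
        echo "not writable: $d (nearest existing ancestor: $(nearest_existing "$d"))"
    fi
done
```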

* Validate the JDK location before setting **JAVA_HOME**. If the directory below does not exist, locate the JDK with the `find` command: `find /usr/lib/jvm -name javac`

```shell
ls -ltr /usr/lib/jvm/java-1.8.0-openjdk-amd64
```
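
If `javac` is already on `PATH`, the JDK directory can also be derived from it instead of being searched for. The sketch below assumes GNU `readlink -f` is available (it is on typical Linux distributions); `jdk_home_of` is a hypothetical helper name.

```shell
# Hypothetical helper: given the path of a javac binary (possibly a symlink),
# print the directory two levels above the resolved file, i.e. the JDK home.
# Assumes GNU readlink, which supports -f.
jdk_home_of() {
    real=$(readlink -f "$1") || return 1
    dirname "$(dirname "$real")"
}

if javac_path=$(command -v javac); then
    echo "Candidate JAVA_HOME: $(jdk_home_of "$javac_path")"
else
    echo "javac is not on PATH; search /usr/lib/jvm instead" >&2
fi
```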

* Set the environment variables in **~/.profile** by appending the export statements below to the existing file. Start a new login session (or run `source ~/.profile`) so that the changes take effect.

```shell
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
```
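
Appending the same exports more than once leaves duplicate lines in the profile and repeated entries in `PATH`. One way to keep the step repeatable is to append each line only if it is missing. The sketch below uses a hypothetical `append_once` helper and runs against a scratch file rather than the real profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (the file is created if it does not exist).
append_once() {
    grep -qxF -e "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demo on a scratch file; set profile="$HOME/.profile" to apply it for real.
profile=$(mktemp)
append_once 'export HADOOP_HOME=/opt/hadoop' "$profile"
append_once 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' "$profile"
append_once 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64' "$profile"
append_once 'export HADOOP_HOME=/opt/hadoop' "$profile"   # repeated call adds nothing

cat "$profile"
rm -f "$profile"
```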

* Update JAVA_HOME in **/opt/hadoop/etc/hadoop/hadoop-env.sh**. Here are the contents of the file after deleting all commented lines.

```shell
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
```
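
The stock `hadoop-env.sh` consists mostly of comments, so it helps to view only the active lines without editing the file. The following is a sketch with a hypothetical `active_lines` helper; the scratch file holds made-up sample content, not the real file.

```shell
# Hypothetical helper: print only the lines that are neither blank nor comments.
active_lines() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demo on a scratch file with made-up sample content.
demo_env=$(mktemp)
cat > "$demo_env" <<'EOF'
# A comment line.
# export JAVA_HOME=

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
  # Indented comments are skipped too.
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
EOF

active_lines "$demo_env"
rm -f "$demo_env"
```

Against the real installation this would be `active_lines /opt/hadoop/etc/hadoop/hadoop-env.sh`.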

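* Format the NameNode. This creates the directory configured in `dfs.namenode.name.dir`; the Secondary NameNode and DataNode directories are created later, when those daemons start for the first time.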

```shell
hdfs namenode -format
ls -ltr /opt/hadoop/dfs/
```
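
Formatting a NameNode that is already formatted discards the existing filesystem metadata, so it is worth checking first. The guard below is a sketch that assumes a formatted directory can be recognised by its `current/VERSION` file. `is_formatted` and the `NAME_DIR` variable are hypothetical names.

```shell
# Hypothetical helper: treat a NameNode directory as formatted
# if it contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Assumption: this matches dfs.namenode.name.dir in hdfs-site.xml.
name_dir=${NAME_DIR:-/opt/hadoop/dfs/name}

if is_formatted "$name_dir"; then
    echo "$name_dir is already formatted; skipping format"
else
    echo "$name_dir is not formatted; run: hdfs namenode -format"
fi
```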

Listing the storage root after formatting shows only the NameNode directory (here `$HADOOP_HOME` is `/home/nghiaht7/.sdkman/candidates/hadoop/current`):

    $ ls -R $HADOOP_HOME/dfs
    /home/nghiaht7/.sdkman/candidates/hadoop/current/dfs:
    namenode

    /home/nghiaht7/.sdkman/candidates/hadoop/current/dfs/namenode:
    current

    /home/nghiaht7/.sdkman/candidates/hadoop/current/dfs/namenode/current:
    fsimage_0000000000000000000      seen_txid
    fsimage_0000000000000000000.md5  VERSION


**My configuration**

Replace `/opt/hadoop` with the actual `$HADOOP_HOME` (in my case `~/.sdkman/candidates/hadoop/current`).

* Here are the contents of **$HADOOP_HOME/etc/hadoop/core-site.xml**.

```xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://0.0.0.0:9000</value>
    </property>
</configuration>
```

* Here are the contents of **$HADOOP_HOME/etc/hadoop/hdfs-site.xml**.

```xml
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/nghiaht7/.sdkman/candidates/hadoop/current/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:/home/nghiaht7/.sdkman/candidates/hadoop/current/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/nghiaht7/.sdkman/candidates/hadoop/current/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

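    <!-- A safe mode threshold of 0 avoids having to run: hdfs dfsadmin -safemode leave.
         dfs.safemode.threshold.pct is the older name of dfs.namenode.safemode.threshold-pct. -->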
    <property>
        <name>dfs.safemode.threshold.pct</name>
        <value>0</value>
    </property>
</configuration>
```
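
The file above repeats the absolute installation path in every storage property. As an alternative, the same file can be generated from `$HADOOP_HOME` so that the paths follow wherever Hadoop is installed. This is a sketch, not the method used above: `render_hdfs_site` is a hypothetical helper, and the output is printed for review rather than written into place.

```shell
# Hypothetical helper: print an hdfs-site.xml whose storage paths
# live under the Hadoop home passed as the first argument.
render_hdfs_site() {
    cat <<EOF
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:$1/dfs/namenode</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:$1/dfs/namesecondary</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:$1/dfs/datanode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.safemode.threshold.pct</name>
        <value>0</value>
    </property>
</configuration>
EOF
}

# Falls back to /opt/hadoop when HADOOP_HOME is not set.
render_hdfs_site "${HADOOP_HOME:-/opt/hadoop}"
```

Once the output looks right, it can be redirected to `$HADOOP_HOME/etc/hadoop/hdfs-site.xml`.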
