Aim:
To install and configure Hadoop 3.3.6 in single-node (pseudo-distributed) mode on Google Colab.

Software Required:

Google Colab
Ubuntu Linux (in Colab environment)
OpenJDK 11 (headless JDK)
Hadoop 3.3.6


In [3]:
# Refresh the package index, then install the headless OpenJDK 11 JDK
!apt-get update -qq > /dev/null
!apt-get install openjdk-11-jdk-headless -qq > /dev/null

# Verify Java installation
!java -version


openjdk version "11.0.28" 2025-07-15
OpenJDK Runtime Environment (build 11.0.28+6-post-Ubuntu-1ubuntu122.04.1)
OpenJDK 64-Bit Server VM (build 11.0.28+6-post-Ubuntu-1ubuntu122.04.1, mixed mode, sharing)


In [4]:
# Download Hadoop
!wget -q https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Extract Hadoop tar file
!tar -xzf hadoop-3.3.6.tar.gz

# Move Hadoop to /usr/local for global access
!mv hadoop-3.3.6 /usr/local/hadoop

# Verify folder structure
!ls /usr/local/hadoop


bin  include  libexec	      licenses-binary  NOTICE-binary  README.txt  share
etc  lib      LICENSE-binary  LICENSE.txt      NOTICE.txt     sbin
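
The download above is not checked for integrity before extraction. As a minimal sketch, the archive's SHA-512 digest could be computed and compared against a reference value. The helper names are hypothetical, and the reference digest is assumed to be copied by hand from the .sha512 file that Apache publishes next to the tarball.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare digests, ignoring case and whitespace in the published value."""
    cleaned = "".join(published_hex.split()).lower()
    return sha512_of_file(path) == cleaned
```

In the notebook this would be called as matches_published_digest("hadoop-3.3.6.tar.gz", "<digest copied from the .sha512 file>") before running tar.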


In [5]:
import os

# Set environment paths
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-11-openjdk-amd64"
os.environ["HADOOP_HOME"] = "/usr/local/hadoop"
os.environ["PATH"] = f"{os.environ['HADOOP_HOME']}/bin:{os.environ['JAVA_HOME']}/bin:" + os.environ["PATH"]

# Verify by printing Hadoop and Java paths
!echo $JAVA_HOME
!echo $HADOOP_HOME


/usr/lib/jvm/java-11-openjdk-amd64
/usr/local/hadoop
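
The JAVA_HOME value above is hardcoded for the amd64 OpenJDK 11 package. A possible alternative, sketched here with hypothetical helper names, is to derive it from whichever java binary is on PATH by following its symlinks. This assumes the usual JDK layout in which the binary lives at <home>/bin/java.

```python
import os
import shutil
from pathlib import Path


def java_home_from_binary(java_binary):
    """Given the resolved path to <home>/bin/java, return <home>."""
    return str(Path(java_binary).parent.parent)


def detect_java_home():
    """Resolve the java found on PATH through symlinks; None if java is absent."""
    java = shutil.which("java")
    if java is None:
        return None
    return java_home_from_binary(os.path.realpath(java))
```

If detect_java_home() returns a path, it can be assigned to os.environ["JAVA_HOME"] in place of the literal string. If several JDKs are installed, the result depends on which one the system alternatives point to.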


In [6]:
# Check Hadoop version to confirm successful installation
!hadoop version


Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c
Compiled by ubuntu on 2023-06-18T08:22Z
Compiled on platform linux-x86_64
Compiled with protoc 3.7.1
From source with checksum 5652179ad55f76cb287d9c633bb53bbd
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.6.jar


In [7]:
# Create config files for Hadoop (local single-node setup)
core_site = """<configuration>
 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
 </property>
</configuration>
"""

hdfs_site = """<configuration>
 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///root/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///root/hdfs/datanode</value>
 </property>
</configuration>
"""

# Write configuration files
with open("/usr/local/hadoop/etc/hadoop/core-site.xml", "w") as f:
    f.write(core_site)
with open("/usr/local/hadoop/etc/hadoop/hdfs-site.xml", "w") as f:
    f.write(hdfs_site)

print("✅ Hadoop configuration files created successfully.")


✅ Hadoop configuration files created successfully.
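
The configuration files above are written as hand-typed XML strings. As an illustrative alternative, the same <configuration>/<property> layout can be generated from a dictionary and parsed back to confirm what was written. Both function names are hypothetical and not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def render_hadoop_config(properties):
    """Build a Hadoop-style <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # ET.indent is available from Python 3.9
        ET.indent(root, space=" ")
    return ET.tostring(root, encoding="unicode") + "\n"


def read_hadoop_config(xml_text):
    """Parse a <configuration> document back into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
```

For example, render_hadoop_config({"fs.defaultFS": "hdfs://localhost:9000"}) produces text that could be written to core-site.xml, and read_hadoop_config can be run on the file contents afterwards as a sanity check.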


In [8]:
# Create directories for Hadoop data
!mkdir -p /root/hdfs/namenode /root/hdfs/datanode

# Format Hadoop Namenode
!hdfs namenode -format


2025-11-11 15:25:14,947 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = aa5eef1b5175/172.28.0.12
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.3.6
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/netty-resolver-4.1.89.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-transport-classes-epoll-4.1.89.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-codec-socks-4.1.89.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.9.4.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-config-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-security-9.4.51.v20230217.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-all-4.1.89.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/had
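
Because the format log above is cut off, it does not show whether formatting completed. One way to check, sketched below with a hypothetical helper, relies on the fact that a formatted NameNode directory contains a current/VERSION file. The path passed in is assumed to be the local directory behind dfs.namenode.name.dir (here /root/hdfs/namenode).

```python
import os


def namenode_is_formatted(name_dir):
    """Return True if name_dir holds the current/VERSION file written by a format."""
    return os.path.isfile(os.path.join(name_dir, "current", "VERSION"))
```

Calling namenode_is_formatted("/root/hdfs/namenode") after the format cell gives a quick yes/no answer, and can also guard against reformatting a directory that already holds metadata.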

In [9]:
# Create a sample directory in HDFS (requires a running NameNode and DataNode)
!hdfs dfs -mkdir /testdir

# List directories in HDFS
!hdfs dfs -ls /


mkdir: Call From aa5eef1b5175/172.28.0.12 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
ls: Call From aa5eef1b5175/172.28.0.12 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
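
The "Connection refused" errors above indicate that nothing was listening on localhost:9000. No earlier cell starts the NameNode or DataNode, so the HDFS commands had no service to talk to. The sketch below shows one way this might be handled: start both daemons with the same hdfs --daemon interface that the notebook uses for stopping them, then wait until the port from fs.defaultFS accepts connections. The function names, retry count, and delay are assumptions chosen for illustration.

```python
import socket
import subprocess
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=30, delay=1.0):
    """Poll host:port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


def start_hdfs_daemons():
    """Start NameNode and DataNode, then wait for the NameNode RPC port."""
    for daemon in ("namenode", "datanode"):
        # Assumes hdfs is on PATH and JAVA_HOME is set, as in the earlier cells
        subprocess.run(["hdfs", "--daemon", "start", daemon], check=True)
    return wait_for_port("localhost", 9000)
```

Running start_hdfs_daemons() after the format cell and before the mkdir cell should let the HDFS commands reach the NameNode. If it returns False, the NameNode logs under /usr/local/hadoop/logs are the place to look.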


In [10]:
# Stop the NameNode and DataNode daemons, if they are running
!hdfs --daemon stop namenode
!hdfs --daemon stop datanode


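Result:

Hadoop 3.3.6 was installed and configured for single-node (pseudo-distributed) mode on Google Colab: hadoop version reported 3.3.6, core-site.xml and hdfs-site.xml were written, and the NameNode format command was run (its log is truncated above).
The HDFS commands in In [9] failed with "Connection refused" because the NameNode and DataNode daemons were never started, so /testdir was not created and the stop commands in In [10] had no running services to stop. To complete the verification, start both daemons (hdfs --daemon start namenode and hdfs --daemon start datanode) after formatting and before running any hdfs dfs command.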