# Sqoop
![Sqoop](https://sqoop.apache.org/images/sqoop-logo.png)

- https://sqoop.apache.org/

## Setup

- Version: 1.4.7
- Release page: https://downloads.apache.org/sqoop/1.4.7 (the cell below fetches the same tarball from archive.apache.org instead)

In [None]:
%%bash

# Download package
cd /opt/pkgs
# Original release URL, kept for reference; the archive mirror below is the one actually used
# wget -q -c https://downloads.apache.org/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
wget -q -c https://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

# Unpack the archive and create a version-independent symlink
# (-sfn replaces an existing link, so the cell can be re-run)
tar -zxf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt
ln -sfn /opt/sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop

# Swap the bundled commons-lang3 JAR for the commons-lang 2.6 JAR shipped with Hadoop
rm -f /opt/sqoop/lib/commons-lang3-3.4.jar
cp /opt/hadoop/share/hadoop/yarn/timelineservice/lib/commons-lang-2.6.jar /opt/sqoop/lib

# update envvars.sh
cat >> /opt/envvars.sh << EOF
# Sqoop
export SQOOP_HOME=/opt/sqoop
export PATH=\${PATH}:\${SQOOP_HOME}/bin

EOF

cat /opt/envvars.sh
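
Because the cell above appends with `cat >>`, running it a second time adds the Sqoop block to `/opt/envvars.sh` again. A minimal sketch of one way to guard against that, assuming the `# Sqoop` comment line can serve as a marker; `append_sqoop_env` and `ENV_FILE` are hypothetical names, and a temporary file stands in for `/opt/envvars.sh`:

```shell
# ENV_FILE is a hypothetical stand-in for /opt/envvars.sh
ENV_FILE="$(mktemp)"

# Append the Sqoop block only if its marker line is not already in the file
append_sqoop_env() {
    local file="$1"
    # -x matches the whole line, -F treats the marker as a fixed string
    if ! grep -qxF '# Sqoop' "$file"; then
        # Quoted 'EOF' keeps ${PATH} and ${SQOOP_HOME} unexpanded in the file
        cat >> "$file" << 'EOF'
# Sqoop
export SQOOP_HOME=/opt/sqoop
export PATH=${PATH}:${SQOOP_HOME}/bin

EOF
    fi
}

append_sqoop_env "$ENV_FILE"
append_sqoop_env "$ENV_FILE"   # second call finds the marker and does nothing

# Count how many times the SQOOP_HOME assignment was written
grep -c 'SQOOP_HOME=' "$ENV_FILE"
```

Quoting the heredoc delimiter has the same effect as the `\$` escapes in the original cell: the variable references are written literally and expanded only when the file is loaded.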

In [None]:
# Load environment variables
%load_ext dotenv
%dotenv -o /opt/envvars.sh
%env

### MySQL Connector/J

- https://dev.mysql.com/downloads/connector/j/

In [None]:
%%bash

# Download package
cd /opt/pkgs
wget -q -c https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java_8.0.22-1ubuntu18.04_all.deb

# Install the package, then copy the JDBC driver JAR into Sqoop's lib directory
sudo dpkg -i mysql-connector-java_8.0.22-1ubuntu18.04_all.deb

cp /usr/share/java/mysql-connector-java-8.0.22.jar /opt/sqoop/lib

## MySQL installation

In [None]:
%%bash

sudo apt install -qq -y mysql-server unzip >> /tmp/install.log 2>&1

# Enable external access (from worker nodes) by commenting out the bind-address setting
sudo sed -i "s/^bind-address/#bind-address/g" /etc/mysql/mysql.conf.d/mysqld.cnf

sudo service mysql restart
sudo service mysql status

# Create the hadoop user (no password) and grant it all privileges from any host
sudo mysql -e "create user 'hadoop'"
sudo mysql -e "grant all privileges on *.* to 'hadoop'@'%'"
sudo mysql -e "flush privileges"
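
The `sed` expression in the cell above is anchored with `^`, so it only comments out lines that begin with `bind-address`. A small sketch of that behaviour on a throwaway file; the file contents are a hypothetical sample, not a copy of the real `mysqld.cnf`, and the result is written to a new file rather than edited in place:

```shell
# CNF is a temporary file with hypothetical sample settings
CNF="$(mktemp)"
cat > "$CNF" << 'EOF'
[mysqld]
bind-address            = 127.0.0.1
mysqlx-bind-address     = 127.0.0.1
EOF

# Same substitution as in the installation cell, written to a separate file
sed "s/^bind-address/#bind-address/g" "$CNF" > "$CNF.new"

# Show the result: only the line starting with bind-address is commented out
cat "$CNF.new"
```

With the setting commented out, the server falls back to its default listening address, which is why the worker nodes can then reach it. Combined with a passwordless `hadoop` account that has all privileges, this setup is only suitable for an isolated lab environment.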

## Employees database setup

In [None]:
%%bash

# Download EmployeesDB sample database
cd /opt/pkgs
wget -q -c https://github.com/datacharmer/test_db/archive/master.zip

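# -o overwrites without prompting, so the cell can be re-run
unzip -o -q master.zip

# employees.sql loads its dump files by relative path, so run it from inside the repository
cd test_db-master

mysql -u hadoop < employees.sql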

## Explore database

In [None]:
%%bash

mysql -u hadoop -e 'show databases'

printf "\n%40s\n\n" | tr ' ' '='

mysql -u hadoop -D employees -e 'show tables'

printf "\n%40s\n\n" | tr ' ' '='

mysql -u hadoop -D employees -e 'describe employees'

## Using Sqoop

In [None]:
%%bash

# The "hadoop" after jdbc:mysql:// is the MySQL host name; --username hadoop is the database user
sqoop list-databases --connect jdbc:mysql://hadoop --username hadoop

In [None]:
%%bash

sqoop list-tables --connect jdbc:mysql://hadoop/employees --username hadoop

In [None]:
%%bash

sqoop import --connect jdbc:mysql://hadoop/employees --username hadoop --table employees
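
The import above relies on Sqoop's defaults for the output directory, the split column and the number of map tasks. The sketch below spells those choices out with the `--target-dir`, `--delete-target-dir`, `--split-by` and `--num-mappers` options. It only builds and prints the command, so it can be inspected before running; `build_import_cmd` is a hypothetical helper, and the mapper count of 2 is an arbitrary example value.

```shell
# Build a sqoop import command as an array and print it; nothing is executed
build_import_cmd() {
    local table="$1" mappers="$2" split_col="$3"
    local cmd=(
        sqoop import
        --connect "jdbc:mysql://hadoop/employees"
        --username hadoop
        --table "$table"
        --target-dir "$table"        # HDFS path, relative to the user's home directory
        --delete-target-dir          # remove a previous import so the job can be re-run
        --split-by "$split_col"      # column used to divide rows among the mappers
        --num-mappers "$mappers"     # number of parallel map tasks (one part file each)
    )
    printf '%s ' "${cmd[@]}"
    printf '\n'
}

# emp_no is the primary key of the employees table
build_import_cmd employees 2 emp_no
```

To run the import for real, paste the printed line into a `%%bash` cell. Without `--delete-target-dir`, a second import into the same directory fails because the output directory already exists.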

In [None]:
%%bash

hdfs dfs -ls -h employees

hdfs dfs -head employees/part-m-00000

In [None]:
%%bash

# Remove the Java class that Sqoop generated in the working directory during the import
rm -f employees.java

# Stop MySQL
sudo service mysql stop