## Copying files from HDFS to Local

We can copy files from HDFS to the local file system using either the `copyToLocal` or the `get` command.

* `hdfs dfs -copyToLocal` or `hdfs dfs -get` – copies files or directories from HDFS to the local file system. The two commands are identical.
* It reads the blocks of each file in sequence and reassembles the file on the local file system.
* If the target file already exists in the local file system, `get` fails with a **File exists** error unless `-f` is passed.
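The existence check described above can also be made explicit before `get` is called. The following is a minimal sketch, not part of the original notebook: `get_if_absent` is a hypothetical helper name, and it assumes that the `hdfs` CLI is on the `PATH` and that the local destination directory already exists. It only checks the top-level target, not the files nested inside it.

```shell
# Hypothetical helper (not from the notebook): copy an HDFS path into a local
# directory only when the local target does not exist yet.
# Usage: get_if_absent <hdfs_src> <local_dir>
get_if_absent() {
    src=$1
    dest_dir=$2
    # get places <hdfs_src> inside <local_dir> under its last path component
    target="$dest_dir/$(basename "$src")"
    if [ -e "$target" ]; then
        echo "skip: $target already exists" >&2
        return 1
    fi
    hdfs dfs -get "$src" "$dest_dir"
}

# Example call (commented out so the snippet can be loaded without a cluster):
# get_if_absent "/user/${USER}/retail_db" "/home/${USER}"
```

Checking first avoids a partial run in which some files are reported as existing while others are copied.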

In [1]:
%%sh

hdfs dfs -help get

-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name.  <src> is kept. 
  When copying multiple files, the destination must be a directory. Passing -f
  overwrites the destination if it already exists and -p preserves access and
  modification times, ownership and the mode.


In [2]:
%%sh

hdfs dfs -help copyToLocal

-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.


```{warning}
The `get` command below copies the entire `/user/${USER}/retail_db` folder from HDFS into the local target directory (here `~/data-engineer/data-engineering-essentials/to-local`), creating a `retail_db` folder inside it.
```

In [3]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 6 items
drwxr-xr-x   - nghiaht7 supergroup          0 2021-08-24 21:35 /user/nghiaht7/retail_db/categories
drwxr-xr-x   - nghiaht7 supergroup          0 2021-08-24 21:35 /user/nghiaht7/retail_db/customers
drwxr-xr-x   - nghiaht7 supergroup          0 2021-08-24 21:35 /user/nghiaht7/retail_db/order_items
drwxr-xr-x   - nghiaht7 supergroup          0 2021-08-24 21:35 /user/nghiaht7/retail_db/orders
-rw-r--r--   1 nghiaht7 supergroup         60 2021-08-24 21:35 /user/nghiaht7/retail_db/part-00000
drwxr-xr-x   - nghiaht7 supergroup          0 2021-08-24 21:35 /user/nghiaht7/retail_db/products


In [4]:
%%sh

ls -ltr /home/${USER}/

total 88
drwxrwxrwx  6 nghiaht7 nghiaht7 4096 Thg 6   7 22:27 hadoop
drwxrwxrwx  7 nghiaht7 nghiaht7 4096 Thg 6   7 22:52 pyspark
drwxrwxrwx  5 nghiaht7 nghiaht7 4096 Thg 6   7 22:53 scala_playground
drwxrwxrwx  3 nghiaht7 nghiaht7 4096 Thg 6   8 02:34 nghiaht962
drwxrwxrwx  8 nghiaht7 nghiaht7 4096 Thg 6  13 20:56 pythonic
drwxrwxr-x  5 nghiaht7 nghiaht7 4096 Thg 6  15 01:07 distributed-computing
drwxrwxr-x  3 nghiaht7 nghiaht7 4096 Thg 6  15 09:56 nlp-in-action
drwxrwxr-x  6 nghiaht7 nghiaht7 4096 Thg 7  10 06:37 pydata-in-action
drwxrwxr-x  2 nghiaht7 nghiaht7 4096 Thg 7  25 23:11 pypy-test
-rw-rw-r--  1 nghiaht7 nghiaht7  206 Thg 7  25 23:14 how-fast.py
drwxrwxr-x  2 nghiaht7 nghiaht7 4096 Thg 7  26 15:09 DSA
drwxrwxr-x  6 nghiaht7 nghiaht7 4096 Thg 7  26 15:28 pystat
drwxrwxr-x  2 nghiaht7 nghiaht7 4096 Thg 7  27 11:22 jax
drwxrwxr-x  2 nghiaht7 nghiaht7 4096 Thg 7  28 21:45 jetbrain-edu
drwxrwxrwx  8 nghiaht7 nghiaht7 4096 Thg 8   2 08:18 fastai-adventure
drwxrwxr-x  3 nghiaht7 n

In [5]:
%%sh

mkdir ~/data-engineer/data-engineering-essentials/to-local

In [6]:
%%sh

hdfs dfs -get /user/${USER}/retail_db ~/data-engineer/data-engineering-essentials/to-local

In [9]:
%%sh

ls -ltr -R ~/data-engineer/data-engineering-essentials/to-local

/home/nghiaht7/data-engineer/data-engineering-essentials/to-local:
total 4
drwxr-xr-x 7 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 retail_db

/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db:
total 24
drwxr-xr-x 2 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 categories
drwxr-xr-x 2 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 customers
drwxr-xr-x 2 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 order_items
drwxr-xr-x 2 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 orders
-rw-r--r-- 1 nghiaht7 nghiaht7   60 Thg 8  24 21:40 part-00000
drwxr-xr-x 2 nghiaht7 nghiaht7 4096 Thg 8  24 21:40 products

/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/categories:
total 4
-rw-r--r-- 1 nghiaht7 nghiaht7 1029 Thg 8  24 21:40 part-00000

/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/customers:
total 932
-rw-r--r-- 1 nghiaht7 nghiaht7 953719 Thg 8  24 21:40 part-00000

/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/orde

```{note}
The following command fails because the files under `retail_db` already exist in the local target directory.
```

In [10]:
%%sh

hdfs dfs -get /user/${USER}/retail_db ~/data-engineer/data-engineering-essentials/to-local

get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/categories/part-00000': File exists
get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/customers/part-00000': File exists
get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/order_items/part-00000': File exists
get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/orders/part-00000': File exists
get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/part-00000': File exists
get: `/home/nghiaht7/data-engineer/data-engineering-essentials/to-local/retail_db/products/part-00000': File exists


CalledProcessError: Command 'b'\nhdfs dfs -get /user/${USER}/retail_db ~/data-engineer/data-engineering-essentials/to-local\n'' returned non-zero exit status 1.
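According to the help text for `get`, passing `-f` overwrites the destination if it already exists. A minimal sketch of repeating the copy with that flag follows; `get_overwrite` is a hypothetical wrapper name introduced here, and the sketch assumes the `hdfs` CLI is on the `PATH`.

```shell
# Hypothetical wrapper (not from the notebook): same copy as above, but with -f
# so that files which already exist locally are overwritten.
# Usage: get_overwrite <hdfs_src> <local_dir>
get_overwrite() {
    hdfs dfs -get -f "$1" "$2"
}

# Example call (commented out so the snippet can be loaded without a cluster):
# get_overwrite "/user/${USER}/retail_db" ~/data-engineer/data-engineering-essentials/to-local
```

Overwriting replaces the local files, so this is only appropriate when the local copy has no changes worth keeping.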

```{note}
As an alternative, copy the folder and its contents directly into the local home directory.
```

In [None]:
%%sh

rm -rf /home/${USER}/retail_db

In [None]:
%%sh

ls -ltr /home/${USER}

In [None]:
%%sh

hdfs dfs -get /user/${USER}/retail_db /home/${USER}

In [None]:
%%sh

ls -ltr /home/${USER}/retail_db/*

* We can also use patterns with the `get` command to copy matching files from HDFS to the local file system. We can also pass multiple HDFS files or folders to a single `get` command; in that case the destination must be a directory.
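When a pattern is used, it is meant to be matched against HDFS paths, not against local files. A minimal sketch that quotes the pattern so the local shell passes it through unchanged; `get_matching` is a hypothetical helper name, and the sketch assumes the `hdfs` CLI is on the `PATH`.

```shell
# Hypothetical helper (not from the notebook): copy every HDFS path matching a
# pattern into a local directory.
# Usage: get_matching '<hdfs_pattern>' <local_dir>
get_matching() {
    pattern=$1
    dest_dir=$2
    # With several matches the destination must be a directory, so create it
    mkdir -p "$dest_dir" || return 1
    # "$pattern" stays quoted, so the local shell does not expand the wildcard
    hdfs dfs -get "$pattern" "$dest_dir"
}

# Example call (commented out so the snippet can be loaded without a cluster):
# get_matching "/user/${USER}/retail_db/order*" "/home/${USER}/retail_db"
```

Quoting matters mainly when the current local directory happens to contain names that match the same pattern.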

In [None]:
%%sh

rm -rf /home/${USER}/retail_db

In [None]:
%%sh

ls -ltr /home/${USER}

In [None]:
%%sh

mkdir /home/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -get "/user/${USER}/retail_db/order*" /home/${USER}/retail_db

In [None]:
%%sh

ls -ltr /home/${USER}/retail_db

In [None]:
%%sh

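# NOTE: `departments` does not appear in the HDFS listing of retail_db shown earlier;
# adjust the path to a folder that exists in your copy of retail_db.
hdfs dfs -get /user/${USER}/retail_db/departments /user/${USER}/retail_db/products /home/${USER}/retail_db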

In [None]:
%%sh

ls -ltr /home/${USER}/retail_db

In [None]:
%%sh

hdfs dfs -get /user/${USER}/retail_db/categories /user/${USER}/retail_db/customers /home/${USER}/retail_db

In [None]:
%%sh

ls -ltr /home/${USER}/retail_db
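After copying, one rough sanity check is to compare the number of files in HDFS with the number of files in the local copy. The sketch below is an illustration rather than part of the original notebook: `compare_file_counts` is a hypothetical helper name, it assumes the `hdfs` CLI is on the `PATH`, and it relies on `hdfs dfs -count` printing the directory count, file count, content size, and path name in that order. It compares counts only, not file contents.

```shell
# Hypothetical helper (not from the notebook): compare the number of files under
# an HDFS path with the number of files under its local copy.
# Usage: compare_file_counts <hdfs_path> <local_path>
compare_file_counts() {
    # hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    hdfs_files=$(hdfs dfs -count "$1" | awk '{print $2}')
    # Ignore hidden .crc checksum files that may sit next to local copies
    local_files=$(find "$2" -type f ! -name '.*.crc' | wc -l | tr -d ' ')
    echo "hdfs=$hdfs_files local=$local_files"
    # Succeed only when both counts are equal
    [ "$hdfs_files" = "$local_files" ]
}

# Example call (commented out so the snippet can be loaded without a cluster):
# compare_file_counts "/user/${USER}/retail_db" "/home/${USER}/retail_db"
```

The function returns a non-zero status when the counts differ, so it can be used in a conditional.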