# Introduction

In this demo we use a dataset containing social networking, tagging, and music artist listening information 
from about 2K users of the **[Last.fm](http://www.last.fm)** online music system. 

**[Last.fm Dataset](https://grouplens.org/datasets/hetrec-2011/):** 1892 users; 17632 artists; 12717 bi-directional user friend relations, i.e. 25434 (user_i, user_j) pairs; 92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]; 11946 tags; 186479 tag assignments (tas), i.e. tuples [user, tag, artist]. Last updated May 2011.
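
The relation between the two friendship counts can be checked with a little arithmetic: each bi-directional relation yields two ordered (user_i, user_j) pairs. The sketch below illustrates this on a made-up toy list; the function name `directed_pairs` and the toy data are assumptions for illustration and are not part of the dataset.

```python
def directed_pairs(friendships):
    """Expand undirected friendships into ordered (user_i, user_j) pairs."""
    pairs = set()
    for a, b in friendships:
        pairs.add((a, b))
        pairs.add((b, a))
    return pairs

# Toy example: three undirected friendships give six ordered pairs.
toy = [(1, 2), (1, 3), (2, 3)]
toy_pairs = directed_pairs(toy)
print(len(toy_pairs))  # 6

# The same doubling applied to the count stated above.
print(12717 * 2)  # 25434
```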
         

## Understanding the dataset

See http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-readme.txt for a detailed description of the dataset.

## Downloading the dataset

In [None]:
!wget http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip -q --show-progress
!unzip hetrec2011-lastfm-2k.zip
!rm hetrec2011-lastfm-2k.zip

In [None]:
!ls

In [None]:
!tail artists.dat
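
The downloaded files can also be inspected from Python. The sketch below assumes that `artists.dat` is tab-separated with a header row, which should be checked against the readme; the function name `read_dat` and the sample rows are made up for illustration.

```python
import csv
import io

def read_dat(lines):
    """Parse tab-separated lines with a header row into a list of dicts."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)

# Made-up rows in the assumed layout (id, name, url, pictureURL).
sample = io.StringIO(
    "id\tname\turl\tpictureURL\n"
    "1\tExample Artist\thttp://example.org/a\thttp://example.org/a.jpg\n"
    "2\tAnother Artist\thttp://example.org/b\thttp://example.org/b.jpg\n"
)
rows = read_dat(sample)
print(rows[0]["name"])  # Example Artist
```

With the real file, something like `read_dat(open("artists.dat", encoding="utf-8", errors="replace", newline=""))` should work if the layout assumption holds.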

# HDFS Commands

You can interact with the Hadoop Distributed File System (HDFS) by invoking:
```bash
hdfs dfs <args>
```
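
The cells in this notebook call the command through the `!` shell escape. As an alternative, the same invocation can be wrapped in Python. This is a minimal sketch: the helper names `hdfs_dfs_cmd` and `run_hdfs_dfs` are hypothetical, and `run_hdfs_dfs` assumes the `hdfs` executable is on the `PATH`.

```python
import subprocess

def hdfs_dfs_cmd(*args):
    """Build the argument list for `hdfs dfs <args>`."""
    return ["hdfs", "dfs", *map(str, args)]

def run_hdfs_dfs(*args):
    """Run `hdfs dfs <args>` and return its standard output as text.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        hdfs_dfs_cmd(*args), capture_output=True, text=True, check=True
    )
    return result.stdout

# Building the command does not require Hadoop to be installed.
print(hdfs_dfs_cmd("-ls", "/user"))  # ['hdfs', 'dfs', '-ls', '/user']
```

On a machine with Hadoop available, `run_hdfs_dfs("-ls", "/user")` would return the same listing that `!hdfs dfs -ls /user` prints.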

## Listing the content of a directory

In [None]:
!hdfs dfs -ls /user

## Creating a directory

In [None]:
!hdfs dfs -mkdir /user/theo

In [None]:
!hdfs dfs -ls /user

## Copying files from the local system to HDFS

In [None]:
!hdfs dfs -ls /user/theo

In [None]:
#Using put
!hdfs dfs -put artists.dat /user/theo

In [None]:
#Using copyFromLocal
!hdfs dfs -copyFromLocal user_artists.dat /user/theo

In [None]:
!hdfs dfs -ls /user/theo

## Displaying the contents of a file

In [None]:
!hdfs dfs -cat /user/theo/artists.dat

In [None]:
!hdfs dfs -tail /user/theo/artists.dat
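
`-cat` prints the whole file, while `-tail` shows only its last kilobyte. To preview just the first few lines, one option is to read the `-cat` output as a stream and stop early. The sketch below shows only the stream-handling part on an in-memory example; the function name `first_lines` is hypothetical.

```python
import io
from itertools import islice

def first_lines(stream, n=5):
    """Return the first n lines of a text stream without reading the rest."""
    return [line.rstrip("\n") for line in islice(stream, n)]

# In-memory stand-in for the output of a command.
demo = io.StringIO("a\nb\nc\nd\n")
preview = first_lines(demo, 2)
print(preview)  # ['a', 'b']
```

With Hadoop available, the `stdout` of `subprocess.Popen(["hdfs", "dfs", "-cat", "/user/theo/artists.dat"], stdout=subprocess.PIPE, text=True)` could be passed as `stream`.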

## Creating an empty file

In [None]:
#touchz - Create a file of zero length. An error is returned if the file exists with non-zero length.
!hdfs dfs -touchz /user/theo/newfile.txt

In [None]:
!hdfs dfs -ls /user/theo/

## Copying a file from HDFS to the local system

In [None]:
!hdfs dfs -get /user/theo/newfile.txt newfile.txt

In [None]:
!ls

## Merging files

In [None]:
!hdfs dfs -put tags.dat /user/theo

In [None]:
#getmerge - Takes one or more HDFS sources (files or a directory) and a local destination file, and concatenates the sources into that local file
!hdfs dfs -getmerge /user/theo/tags.dat /user/theo/artists.dat artisttags.txt

In [None]:
!cat artisttags.txt
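
To make the effect of the merge concrete, here is a local analogue of the concatenation in plain Python. It is a sketch of the idea only, not Hadoop's implementation; the function name `merge_files` and the temporary file names are made up.

```python
import tempfile
from pathlib import Path

def merge_files(sources, destination):
    """Concatenate the source files, in the given order, into one file."""
    with open(destination, "wb") as out:
        for src in sources:
            out.write(Path(src).read_bytes())

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, "first.dat")
    second = Path(tmp, "second.dat")
    merged = Path(tmp, "merged.txt")
    first.write_text("tag rows\n")
    second.write_text("artist rows\n")
    merge_files([first, second], merged)
    merged_text = merged.read_text()

print(merged_text)
```

The merged file holds the contents of the first source followed by the contents of the second, which is the layout to expect in `artisttags.txt` as well.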

## Verifying the replication factor

In [None]:
!hdfs dfs -stat %r /user/theo/artists.dat

## Changing the replication factor

In [None]:
!hdfs dfs -setrep 3 /user/theo/artists.dat

In [None]:
#The replication factor is the second column
!hdfs dfs -ls /user/theo/artists.dat

In [None]:
!hdfs dfs -stat %r /user/theo/artists.dat
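
Since the replication factor is the second column of a `-ls` line, it can also be extracted programmatically. The sketch below parses made-up sample lines in the usual `-ls` layout (permissions, replicas, owner, group, size, date, time, path); the function name `replication_from_ls_line` and the sample values are assumptions for illustration.

```python
def replication_from_ls_line(line):
    """Return the replication factor from one `hdfs dfs -ls` output line.

    Directories show "-" in that column, in which case None is returned.
    """
    fields = line.split()
    value = fields[1]
    return None if value == "-" else int(value)

# Made-up sample lines, not real output from this cluster.
file_line = "-rw-r--r--   3 theo supergroup     123456 2011-05-10 12:00 /user/theo/artists.dat"
dir_line = "drwxr-xr-x   - theo supergroup          0 2011-05-10 12:00 /user/theo"

print(replication_from_ls_line(file_line))  # 3
print(replication_from_ls_line(dir_line))   # None
```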

## Deleting a file

In [None]:
!hdfs dfs -ls /user/theo

In [None]:
!hdfs dfs -rm /user/theo/tags.dat

In [None]:
!hdfs dfs -ls /user/theo

In [None]:
#Quote the wildcard so that it is expanded by HDFS rather than by the local shell
!hdfs dfs -rm '/user/theo/*'

In [None]:
!hdfs dfs -ls /user/theo

## Deleting a directory

In [None]:
!hdfs dfs -ls /user

In [None]:
!hdfs dfs -rmdir /user/theo/

In [None]:
!hdfs dfs -ls /user

## Getting help

In [None]:
#usage - Return the help for an individual command
!hdfs dfs -usage chmod

In [None]:
!hdfs dfs -help

## More commands

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html