
___Introduction to Unix Bash commands___

While I use Jupyter notebooks for illustration, it is more common to use a [terminal](https://en.wikipedia.org/wiki/Terminal_(macOS)) directly. On macOS, you can find it in `Others -> Terminal` or through Spotlight search.

If you are also interested in using Bash in a notebook, please check out [takluyver/bash_kernel](https://github.com/takluyver/bash_kernel).

<hr>

#### Introduction

Bash is not so different from other languages such as Java or Python: we write a command, and the computer does what we coded. For example, we can find where a program is installed with `which`, or print the current working directory with `pwd`.

In [6]:
!which python 

/usr/local/bin/python


In [7]:
!which git

/usr/bin/git


In [8]:
!which go

In [9]:
pwd

'/content'

We can also list what is in the current working directory.

In [10]:
ls

[0m[01;34msample_data[0m/


In [11]:
%cd sample_data 
!ls 

/content/sample_data
anscombe.json		      mnist_test.csv
california_housing_test.csv   mnist_train_small.csv
california_housing_train.csv  README.md


In [12]:
!cat README.md

This directory includes a few sample datasets to get you started.

*   `california_housing_data*.csv` is California housing data from the 1990 US
    Census; more information is available at:
    https://developers.google.com/machine-learning/crash-course/california-housing-data-description

*   `mnist_*.csv` is a small sample of the
    [MNIST database](https://en.wikipedia.org/wiki/MNIST_database), which is
    described at: http://yann.lecun.com/exdb/mnist/

*   `anscombe.json` contains a copy of
    [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet); it
    was originally described in

    Anscombe, F. J. (1973). 'Graphs in Statistical Analysis'. American
    Statistician. 27 (1): 17-21. JSTOR 2682899.

    and our copy was prepared by the
    [vega_datasets library](https://github.com/altair-viz/vega_datasets/blob/4f67bdaad10f45e3549984e17e1b3088c731503d/vega_datasets/_data/anscombe.json).


Just as with graphical user interfaces (GUIs), we can speak the "Bash language" to interact with our computers. In fact, the command line is often more powerful. For example, you cannot use the [HPC](https://hpc.uiowa.edu/) systems until you know something about shell programming. Note that Bash is only one of several shell programs, but probably the most popular one.

---

#### Working directories

Let's get started with the concept of the ___working directory___. As its name suggests, the working directory is where you are working, that is, the folder/directory you are in right now. As the previous example shows, there is a ___program___ called `pwd` (print working directory) that reports it.

In [13]:
from google.colab import drive 
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [14]:
pwd

'/content/sample_data'

In [15]:
cd ..

/content


In [16]:
ls

[0m[01;34mgdrive[0m/  [01;34msample_data[0m/


In [17]:
%cd /content/gdrive/My\ Drive/

/content/gdrive/My Drive


In [18]:
!pwd

/content/gdrive/My Drive


In [19]:
!ls

 anh-o-phap
'Colab Notebooks'
 Courses
'Data-Driven Feature Characterization Techniques for Laser Printer Attribution.pdf'
 du-lich
 hinh-cuoi
'Identifying 3D Printer Residual Data via Open Source Documentation .pdf'
 IMG_2749.JPG
 IMG_2752.JPG
 IMG_2754.JPG
 IMG_2758.JPG
 IMG_2759.JPG
 IMG_2762.JPG
 IMG_2784.JPG
 IMG_2800.JPG
 IMG_2801.JPG
'IMG_2816 (1).JPG'
 IMG_2816.JPG
 IMG_2872.JPG
 IMG_2886.JPG
 IMG_2905.JPG
'IMG_2906 (1).JPG'
 IMG_2906.JPG
 IMG_2907.JPG
 IMG_2908.JPG
 IMG_2909.JPG
 IMG_2910.JPG
 IMG_2911.JPG
 IMG_2912.JPG
 IMG_2913.JPG
'IMG_2914 (1).JPG'
 IMG_2914.JPG
'IMG_2915 (1).JPG'
 IMG_2915.JPG
 IMG_2916.JPG
 IMG_2917.JPG
 IMG_2918.JPG
 IMG_2919.JPG
 IMG_2920.JPG
'IMG_2921 (1).JPG'
 IMG_2921.JPG
'IMG_2922 (1).JPG'
'IMG_2922 (2).JPG'
 IMG_2922.JPG
 IMG_2924.JPG
 IMG_2925.JPG
 IMG_2926.JPG
 IMG_2927.JPG
 IMG_2928.JPG
 IMG_2929.JPG
 IMG_2930.JPG
 IMG_2931.JPG
 IMG_2934.JPG
 IMG_2935.JPG
'IMG_2936 (1).JPG'
 IMG_2936.JPG
 IMG_2937.JPG
 IMG_2938.JPG
 IMG_2940.JPG
 IMG_2941.JPG
 

`!mkdir /content/gdrive/My\ Drive/Courses`

`!mkdir /content/gdrive/My\ Drive/Courses/python-DS`

In [20]:
!mkdir /content/gdrive/My\ Drive/Courses2

In [21]:
!mkdir /content/gdrive/My\ Drive/Courses2/python-DS
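
The two `mkdir` cells above create the parent folder first and then the child. As a minimal sketch, the `-p` option of `mkdir` can do both in one step, and it does not complain if the folder already exists. The folder names mirror the cells above, but they are created inside a temporary directory (made with `mktemp -d`, an assumption of this sketch) so nothing on Drive is touched:

```shell
# Throwaway location for the demonstration
base=$(mktemp -d)

# -p creates Courses2 and Courses2/python-DS in one step
mkdir -p "$base/Courses2/python-DS"

# Running the same command again is harmless with -p
mkdir -p "$base/Courses2/python-DS"

ls "$base/Courses2"
```

Without `-p`, `mkdir` fails when the parent folder is missing or when the target already exists.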

In [22]:
!pwd
import os
os.chdir('/content/gdrive/My Drive/Courses/python-DS')
!pwd

/content/gdrive/My Drive
/content/gdrive/My Drive/Courses/python-DS


And, as before, we can list what we have in our current working directory with `ls`.

In [23]:
ls

[0m[01;34msample-data[0m/


What if I do not want to stay here? Suppose I want to go to the ___sample-data___ folder. I can ___change directory___ with `cd`:

In [24]:
cd sample-data

/content/gdrive/My Drive/Courses/python-DS/sample-data


Now we can verify that we indeed changed our directory with `pwd` and `ls`:

In [25]:
pwd

'/content/gdrive/My Drive/Courses/python-DS/sample-data'

In [26]:
ls

ag_news.csv       happy_test.txt  sample.html        terrorists/
another_test.txt  karate.gml      sample_tweets.csv


Note that each space in our path has to be escaped with a backslash, as in `My\ Drive` (having spaces in paths is actually NOT a good habit).
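
As a small sketch of the escaping point above, quoting the whole path is an alternative to putting a backslash before each space. The folder `My Drive` here is a stand-in created in a temporary directory, not the real Drive mount:

```shell
# Create a folder whose name contains a space, in a throwaway location
base=$(mktemp -d)
mkdir "$base/My Drive"
cd "$base"

# Three equivalent ways to refer to the same folder
escaped=$(cd My\ Drive && pwd)     # backslash before the space
double=$(cd "My Drive" && pwd)     # double quotes
single=$(cd 'My Drive' && pwd)     # single quotes

echo "$escaped"
echo "$double"
echo "$single"
```

All three lines print the same path. An unquoted `cd My Drive` would instead be read as two separate arguments, `My` and `Drive`.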

Instead of typing the full path every time, we can use a ___relative path___. The name is pretty self-explanatory: we refer to a place relative to our current working directory. Two notations are important here:
- `.` (a dot) means the current directory
- `..` (two dots without spaces) means the parent directory, one level up.
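
The two notations can be tried out safely with a sketch like the following. The folders `project` and `data` are hypothetical names, created in a temporary directory just for this illustration:

```shell
# Build a small throwaway tree: <tmp>/project/data (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/project/data"
cd "$base/project/data"

here=$(cd . && pwd)      # `.` resolves to the current directory
parent=$(cd .. && pwd)   # `..` resolves to the parent directory

echo "current: $here"
echo "parent:  $parent"
ls ..                    # lists the parent (shows `data`) without leaving `data`
pwd                      # still .../project/data
```

The `$( ... )` parts run in a subshell, so the `cd` inside them does not move the main shell.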

For example, we can pass a path to the `ls` command to list the files at that path.

In [27]:
ls .

ag_news.csv       happy_test.txt  sample.html        terrorists/
another_test.txt  karate.gml      sample_tweets.csv


In [28]:
ls ..

[0m[01;34msample-data[0m/


Therefore, with `..` (two dots), we can look one level up without switching the working directory, as `ls ..` just did. Our working directory is unchanged:

In [29]:
!pwd
!cat sample_tweets.csv

/content/gdrive/My Drive/Courses/python-DS/sample-data
We'll have a slice of that 👇 #PiDay https://t.co/Gfzhl5n9Ly
That smile you get on your face when you realize you're #AlwaysAHawkeye. https://t.co/5wiAl9sbiX
In a first of its kind study, #uiowa researchers proved that early intervention for children with hearing loss can… https://t.co/7dcmQ4k63G
Hawkeyes are #B1GTourney champions! Congratulations, @IowaWBB! It's time to go dancing. #FightForIowa 🖤💛🏀 https://t.co/cS4P336JfI
Judges have a history of finding ways to avoid big sentences in white collar economic crimes, with the exception of… https://t.co/KeGgfeiUSs
Creator and co-star of the hit Netflix series Love, Hawkeye Paul Rust said it best this weekend when he visited Iow… https://t.co/jLpGSp3XCd
On #InternationalWomensDay we're celebrating the many incredibly talented and inspirational women of #uiowa. 🖤💛 https://t.co/3y2U8EyF2U
Current Mood: 😁🏆  #MondayMotivation 🖤💛🏀 @IowaWBB https://t.co/klUrInwN7c
New: Montserrat Fuentes, de

Finally, it is worth noting that `~` (tilde) means the ___home directory___ on Unix systems.

In [30]:
!ls ~

In [31]:
!echo $HOME

/root
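
A short sketch of how `~` relates to `$HOME`: the shell replaces an unquoted `~` with the home directory before the command runs, while a quoted `~` stays a plain character. The actual path printed depends on the machine and user:

```shell
# Unquoted tilde is expanded by the shell to the home directory
tilde_value=$(echo ~)
echo "$tilde_value"

# The HOME environment variable holds the same path
echo "$HOME"

# Inside quotes, the tilde is NOT expanded and is printed literally
quoted_value=$(echo "~")
echo "$quoted_value"
```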


---

#### Options/Input arguments 

Shell commands can take input arguments or options. By convention, options start with `-` (dash). For example, we can ask `ls` to show detailed information about each file/folder with `-l`:

In [32]:
ls -l

total 1853
-rw------- 1 root root 1883166 Mar 26  2019 ag_news.csv
-rw------- 1 root root      46 May 27  2020 another_test.txt
-rw------- 1 root root     929 Mar 17  2012 happy_test.txt
-rw------- 1 root root    5077 Mar 26  2019 karate.gml
-rw------- 1 root root     152 May 27  2020 sample.html
-rw------- 1 root root    1755 Mar 26  2019 sample_tweets.csv
drwx------ 2 root root    4096 Mar  1 02:21 terrorists/


We can combine options by appending them one after another. The following example shows sizes in a human-readable format (the `-h` option) along with a detailed view (`-l`):

In [33]:
ls -lh

total 1.9M
-rw------- 1 root root 1.8M Mar 26  2019 ag_news.csv
-rw------- 1 root root   46 May 27  2020 another_test.txt
-rw------- 1 root root  929 Mar 17  2012 happy_test.txt
-rw------- 1 root root 5.0K Mar 26  2019 karate.gml
-rw------- 1 root root  152 May 27  2020 sample.html
-rw------- 1 root root 1.8K Mar 26  2019 sample_tweets.csv
drwx------ 2 root root 4.0K Mar  1 02:21 terrorists/
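
Combined options are only a shorthand: `ls -lh` should behave the same as `ls -l -h`. A quick check in a throwaway directory (`a.txt` is a hypothetical file created only for this sketch):

```shell
# Make a temporary directory with one small file in it
base=$(mktemp -d)
printf 'hello\n' > "$base/a.txt"

combined=$(ls -lh "$base")    # options written together
separate=$(ls -l -h "$base")  # the same options written one by one

echo "$combined"
[ "$combined" = "$separate" ] && echo "same output"
```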


Sometimes commands take arguments for various purposes. Again using `ls` as an example, it can take a ___path___ as an argument. Without a path, it lists the current directory by default, as shown above. Given a path, it lists the items at that path:

In [None]:
ls ../

[0m[01;34msample-data[0m/


In [34]:
ls ../sample-data/terrorists/

terrorist.gml  terrorist.groups  terrorist.names  terrorist.pairs


Note that all these options can hardly be memorized. Often we will refer to the manual (or documentation). To do this, we can use `man command_name`. For example:

In [35]:
man ls | head -20

Here, `man` is a ___command___ that takes one input argument (the name of a command) and outputs the corresponding manual. Therefore, we can even pull up the manual for `man` itself 🤓

In [36]:
man man | head -20

Note that I use `| head -20` to limit the output to 20 lines/rows. `|` is the ___pipe character___, which sends the output of one command to the next, and `head` is a command that shows the ___head___ of some output, where `-20` limits it to the first 20 lines/rows. Detailed coverage is beyond the scope of this workshop though.
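
As a small sketch of the pipe idea: the output of the command on the left of `|` becomes the input of the command on the right. The five words below are made-up sample input, and `head -n 2` is the portable spelling of `head -2`:

```shell
# printf prints one word per line; head keeps only the first two lines
first_two=$(printf '%s\n' one two three four five | head -n 2)
echo "$first_two"

# Pipes can be chained: count how many lines survived
printf '%s\n' one two three four five | head -n 2 | wc -l
```

The first command prints `one` and `two`, and the second reports a count of 2.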

In [37]:
!cat > sample.txt

hi class
class
nice to meet you
you
have you had lunch?
^C


In [38]:
ls -l

total 1853
-rw------- 1 root root 1883166 Mar 26  2019 ag_news.csv
-rw------- 1 root root      46 May 27  2020 another_test.txt
-rw------- 1 root root     929 Mar 17  2012 happy_test.txt
-rw------- 1 root root    5077 Mar 26  2019 karate.gml
-rw------- 1 root root     152 May 27  2020 sample.html
-rw------- 1 root root    1755 Mar 26  2019 sample_tweets.csv
-rw------- 1 root root      56 Jun 28 12:11 sample.txt
drwx------ 2 root root    4096 Mar  1 02:21 terrorists/


In [39]:
cat sample.txt

hi class
class
nice to meet you
you
have you had lunch?
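
The cells above create `sample.txt` by typing lines into `cat > sample.txt` and then interrupting it. A non-interactive alternative, sketched here under the assumption that the content is known in advance, is a here-document. The file is written to a temporary directory so the real `sample.txt` is not affected:

```shell
base=$(mktemp -d)

# Everything between the two EOF markers is written to the file
cat > "$base/sample.txt" <<'EOF'
hi class
class
EOF

# > overwrites a file, while >> appends to it
echo "nice to meet you" >> "$base/sample.txt"

cat "$base/sample.txt"
```

In an interactive terminal, `cat > file` is normally ended with Ctrl-D (end of input) rather than Ctrl-C.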


In [40]:
!touch sample2.txt

In [41]:
ls -l

total 1853
-rw------- 1 root root 1883166 Mar 26  2019 ag_news.csv
-rw------- 1 root root      46 May 27  2020 another_test.txt
-rw------- 1 root root     929 Mar 17  2012 happy_test.txt
-rw------- 1 root root    5077 Mar 26  2019 karate.gml
-rw------- 1 root root       0 Jun 28 12:13 sample2.txt
-rw------- 1 root root     152 May 27  2020 sample.html
-rw------- 1 root root    1755 Mar 26  2019 sample_tweets.csv
-rw------- 1 root root      56 Jun 28 12:11 sample.txt
drwx------ 2 root root    4096 Mar  1 02:21 terrorists/


In [42]:
!rm sample.txt sample2.txt

In [43]:
ls -l

total 1853
-rw------- 1 root root 1883166 Mar 26  2019 ag_news.csv
-rw------- 1 root root      46 May 27  2020 another_test.txt
-rw------- 1 root root     929 Mar 17  2012 happy_test.txt
-rw------- 1 root root    5077 Mar 26  2019 karate.gml
-rw------- 1 root root     152 May 27  2020 sample.html
-rw------- 1 root root    1755 Mar 26  2019 sample_tweets.csv
drwx------ 2 root root    4096 Mar  1 02:21 terrorists/


In [44]:
!which python

/usr/local/bin/python


In [45]:
!which python3

/usr/bin/python3


In [46]:
!echo $PATH

/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin
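
The `which` results and the `$PATH` value above are related: the shell looks for a command in each directory listed in `$PATH`, from left to right, and `which` reports the first match. A small sketch (the paths printed depend on your machine):

```shell
# Print each PATH entry on its own line, in search order
echo "$PATH" | tr ':' '\n'

# command -v is the POSIX way to ask where the shell finds a command
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```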


In [47]:
!echo $HOME

/root


In [48]:
!uname

Linux


In [49]:
!ps

    PID TTY          TIME CMD
      1 ?        00:00:04 node
     16 ?        00:00:00 tail
     49 ?        00:00:05 jupyter-noteboo
     50 ?        00:00:02 dap_multiplexer
     61 ?        00:00:22 python3
     81 ?        00:00:04 python3
    294 ?        00:00:00 bash
    295 ?        00:00:00 drive
    296 ?        00:00:00 grep
    402 ?        00:00:00 drive
    416 ?        00:00:00 fusermount <defunct>
    457 ?        00:00:00 bash
    458 ?        00:00:00 tail
    459 ?        00:00:00 python3
    772 ?        00:00:00 ps


In [50]:
!whoami

root


In [51]:
!which git

/usr/bin/git


In [52]:
!which pip

/usr/local/bin/pip


In [53]:
!which pip3

/usr/local/bin/pip3


In [54]:
!pip install missingno



### Daily best practices for data science (DS)

`wc` stands for word count. As the name implies, it is mainly used for counting. It reports the number of lines, words, and bytes (or characters) in the files given as arguments. In the output below, the three numbers are lines, words, and bytes, in that order.

In [55]:
!wc ag_news.csv

   7601  280558 1883166 ag_news.csv
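
Each of the three counts can also be requested on its own with `-l`, `-w`, or `-c`. A minimal sketch on a two-line file (`demo.txt` is a made-up file in a temporary directory):

```shell
base=$(mktemp -d)
printf 'hello world\nsecond line here\n' > "$base/demo.txt"

wc -l < "$base/demo.txt"   # number of lines: 2
wc -w < "$base/demo.txt"   # number of words: 5
wc -c < "$base/demo.txt"   # number of bytes: 29
```

Reading the file through `<` keeps the file name out of the output. Some systems pad the number with leading spaces.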


The `uniq` command is a command-line utility that reports or filters out repeated lines in a file. 
In simple words, `uniq` detects adjacent duplicate lines and removes the duplicates from its output.

In [56]:
!uniq happy_test.txt

I'm happy for him...really, I am. She's an amazing girl, and they deserve each other. He's happy &amp; thats all that matters...right?.....
Feel so happy with no reason... Just happy... Hey my brain, am I missing something? :))
We finished our first season of @TheBEATDance &amp; I am so happy &amp; proud &amp; thankful &amp; overwhelmed &amp; lots of other good stuff! So Amazing #2013
am i allowed to be happy about something, or do yo wanna distroy the little i have left?
I am so happy right now I can't even focus on anything else
Why am I being sneaked around her fam when I'm open about us.... But we both happy shit don't add up.
Heavens suppose to be the happiest place in the world I am happy everyday with the people I love but I feel like I live in heaven everyday:)
I am  so happy since I have get an $100,00 STARBUCKS GIFT-CARD for Free. I grab it here http://t.co/cg8M1Ubq
I am one #happy girl :)
I Am So HAPPY .
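
Because `uniq` only compares adjacent lines, it is commonly paired with `sort` so that identical lines end up next to each other. A sketch on made-up data (`letters.txt` is a hypothetical file in a temporary directory):

```shell
base=$(mktemp -d)
printf 'b\na\nb\nb\na\n' > "$base/letters.txt"

# Only the adjacent pair of b's is collapsed: prints b, a, b, a
uniq "$base/letters.txt"

# Sorting first groups identical lines together: prints a, b
sort "$base/letters.txt" | uniq

# -c prefixes each distinct line with the number of times it occurred
sort "$base/letters.txt" | uniq -c
```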


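`sed` (stream editor) can perform many operations on a file, such as searching, find and replace, insertion, or deletion. Simply speaking, it is mostly used for search and replace. Note that the match is case-sensitive, so `Amazing` in the third line below is left unchanged.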

In [57]:
!sed 's/amazing/interesting/' happy_test.txt

I'm happy for him...really, I am. She's an interesting girl, and they deserve each other. He's happy &amp; thats all that matters...right?.....
Feel so happy with no reason... Just happy... Hey my brain, am I missing something? :))
We finished our first season of @TheBEATDance &amp; I am so happy &amp; proud &amp; thankful &amp; overwhelmed &amp; lots of other good stuff! So Amazing #2013
am i allowed to be happy about something, or do yo wanna distroy the little i have left?
I am so happy right now I can't even focus on anything else
Why am I being sneaked around her fam when I'm open about us.... But we both happy shit don't add up.
Heavens suppose to be the happiest place in the world I am happy everyday with the people I love but I feel like I live in heaven everyday:)
I am  so happy since I have get an $100,00 STARBUCKS GIFT-CARD for Free. I grab it here http://t.co/cg8M1Ubq
I am one #happy girl :)
I Am So HAPPY .
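
The `s/old/new/` form used above replaces only the first match on each line. A minimal sketch of the difference the trailing `g` (global) flag makes, using a made-up sentence instead of the real file:

```shell
line='amazing girl, amazing day, So Amazing'

# Without g: only the first match on the line is replaced
first_only=$(echo "$line" | sed 's/amazing/interesting/')
echo "$first_only"

# With g: every match on the line is replaced (still case-sensitive)
all_matches=$(echo "$line" | sed 's/amazing/interesting/g')
echo "$all_matches"
```

In both cases `sed` writes the result to standard output; the input itself is not modified.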


In [None]:
!echo 'I like programming so much' | awk '{ print $4 $5}'

somuch
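
In the `awk` cell above, `$4` and `$5` are the fourth and fifth whitespace-separated fields of the line. Writing them side by side concatenates them, which is why the output is `somuch`. A short sketch of two variations on the same input:

```shell
text='I like programming so much'

# Adjacent fields are concatenated with nothing in between
joined=$(echo "$text" | awk '{ print $4 $5 }')
echo "$joined"

# A comma inserts the output field separator (a space by default)
spaced=$(echo "$text" | awk '{ print $4, $5 }')
echo "$spaced"

# NF holds the number of fields on the line
field_count=$(echo "$text" | awk '{ print NF }')
echo "$field_count"
```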


---

#### Conclusions

Due to time constraints, we can only cover these simple examples. There is a lot more to read about: `grep`, `ssh`, `scp`, `ps`, `rm`, etc. Bash is really powerful and is used extensively for various purposes. Below I list two tutorials for Bash that I find really helpful.

Further readings:

- http://www.bash.academy/
- https://ryanstutorials.net/bash-scripting-tutorial/

## Exercises

1. Create a new folder named `Practice`
2. In `Practice`, create a file named `test.txt` with the content `Hello world`
3. In the parent directory of `Practice`, create an empty `test2.txt` file. 
4. Remove both `test.txt` and `test2.txt` 