# Introduction to Bash Scripting

The Unix command line helps users combine existing programs in new ways, automate repetitive tasks, and run programs on clusters and clouds.

Resources:
- [Data Camp - Intro to Shell](https://www.datacamp.com/courses/introduction-to-shell)
- [Data Camp - Intro to Bash Scripting](https://www.datacamp.com/learn/courses/introduction-to-bash-scripting)
- [Command on Mac](https://www.davidbaumgold.com/tutorials/command-line/#mac-os-x)
- [Unix](https://datasciencepractice.study/unix-systems.html)
- [Working with unix](https://seankross.com/the-unix-workbench/working-with-unix.html)

Advanced:
- [O'Reilly - Data Science at the Command line](https://www.oreilly.com/library/view/data-science-at/9781492087908/)

---
### Table of Contents

* [Terminal](#terminal)
* [Helper](#help)
* [Manual](#manual)
* [Search](#search)
    * Current directory
    * Explore directory
    * Navigate directory
* [Edit](#edit)
    * Create & Remove
    * Copy
    * Move & Rename
    * Edit content of a file
* [View & Manipulate data](#view)
    * Helper
    * Stack
* [Run program](#run)
* [Combine command with Pipe](#pipe)
* [Count](#count)
* [Wildcards](#wildcards)
* [Sorting](#sorting)
* [Remove duplicate](#duplicate)
* [Stop a command](#stopcommand)
* [Variables](#variables)
* [For Loops](#forLoops)
* [Scripts](#script)
* [Arguments](#arguments)
* [Quotation marks](#quotation)
* [Numeric variables](#numeric)
* [Arrays](#arrays)
* [Associative Arrays](#associative)
* [IF statements](#if)
* [For and While loops](#for-part2)
* [Case statements](#case)
* [Functions](#functions)
---
---

### Terminal <a class="anchor" id="terminal"></a>

    clear = clear terminal window
    q = quit (in pagers such as man and less)
    
---

### Helper <a class="anchor" id="help"></a>

    command --help
        e.g. ls --help

---

### Manual <a class="anchor" id="manual"></a>
    
    man head
    
   * man automatically opens the page in less, so press the spacebar to page through the information and q to quit.

       * The one-line description under NAME tells you briefly what the command does, and the summary under SYNOPSIS lists all the flags it understands. Anything that is optional is shown in square brackets [...], either/or alternatives are separated by |, and things that can be repeated are shown by ..., so head's manual page is telling you that you can either give a line count with -n or a byte count with -c, and that you can give it any number of filenames.
        
---

### Search <a class="anchor" id="search"></a>

#### Current directory

   * <b>pwd</b> 
            = print working directory
     
---

#### Explore directory

   * <b>ls</b> 
            = list
    
        <b>ls</b> seasonal/winter/x
            = print list of all files in a relative path
            
        <b>ls</b> -lrS = long listing (-l) sorted by size (-S) in reverse order (-r), so the smallest files come first
    
        <b>ls</b> *.txt = list all files ending in .txt
            - "*" matches zero or more characters
            
        <b>ls</b> -R -F
            = R recursively lists all subdirectories & files
            = F marks directories with a trailing / & runnable programs with a trailing *
            

            
---
        
   * <b>find</b> . -type d -depth 1 
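           = find the directories directly inside the current one (this is the macOS/BSD syntax; with GNU find on Linux use find . -mindepth 1 -maxdepth 1 -type d)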
    
---

#### Navigate directory

   * <b>cd</b> 
           = change directory 
        (press tab for auto completion) 
        (use double quotes around file/dir names that contain spaces)
        
        <b>cd</b> .. = move up to the parent directory
            - cd ../ does the same

        <b>cd</b> ~ = change to home directory
            The ~ is shorthand for your Home directory, so '~/Documents' is the Documents folder in your Home folder. 
            
---

### Edit <a class="anchor" id="edit"></a>

#### Create/Remove directory

   * <b>mkdir</b> dir1 
           = make directory
    
---
    
   * <b>rmdir</b> dir1 
           = remove an empty directory
    
---

#### Create/Remove file

   * <b>touch</b> file_name 
            = create file
    
---
    
   * <b>echo</b> 'This is my content!' > my_new_file
        Creating a file with a small amount of content
        
---

   * <b>rm</b> file_name 
           = remove file
    
        rm -rf dir_name = remove a directory and everything inside it, without asking for confirmation
            - Be careful when using rm. Unlike Windows Explorer and macOS Finder, there is no Recycle Bin when using the shell. Once you delete a file or a directory you will never be able to get it back. 
            - This is especially dangerous because it would be very easy to make a typo and run rm -rf /, which would delete every file on your computer. Make sure you never do this!
    
---

#### Copy file

   * <b>cp</b> file_name directory_name 
           = copy file
    
   * <b>cp</b> file_1 file_2 new_dir
            = copy 2 specific files from the same dir to a new dir
    
#### Copy directory
    
   * <b>cp</b> -r dir2 dir1 
           = recursively copy dir2 and everything in it into dir1 (if dir1 does not exist, it is created as a copy of dir2)

   * <b>cp</b> dir1/file . 
           = copy file to current working directory (. is shortcut)
        
---



#### Move & Rename file
    
   * <b>mv</b> file_name directory_name 
           = move file
        - You can think of it as being just like the cp command, except that it deletes the original file/directory. 
        - mv can also be used to rename files & directories
        - e.g. mv main.txt dir1/new.txt moves main.txt into dir1 and renames it new.txt
        
    <b>mv</b> file_1 file_2 new_dir
            = moves 2 specific files from the same dir to a new dir

#### Move & Rename directory
        
   * <b>mv</b> 2017-* 2017/ = move all files starting with 2017- to folder 2017

    <b>mv</b>  -v ~/Downloads/* ~/Videos/ 
            = moving all files from one folder to the other
        
---

    sudo command = run command with superuser (administrator) privileges; sudo stands for "super user do"


 ---

### Viewing & Manipulating files/data<a class="anchor" id="view"></a>

   * <b>cat</b> file1
            = print file to screen
                - The cat command lets you concatenate and print files to the screen.
                
                - The syntax is cat [file1] [file2] [file3] ... and the command prints one file after another. In practice cat is most often used to print a single file to the screen.
                
                - cat prints everything at once. To page through several files, open them with less instead (less file1 file2) and use:
                    - spacebar to show the next page
                    - :n to go to the next file
                    - :p to go back to the previous file
                    - :q to quit
        
 ---
 
   * <b>more</b> file1 
            = print long file using full screen
                - press enter or space to scroll down
                - press q to quit
        
---

   * <b>less</b> file1 
            = print long file to screen, one page at a time
                - use the arrow keys to scroll up and down, and q to quit
                - less is installed on almost every system you will encounter in real life.
        
---    

   * <b>head</b> file1 
            = print the first 10 lines of the file (the default)
                - head -n 1 file1 = print the first line
        
---

   * <b>tail</b> file1 
           = print the last 10 lines of the file (the default)
            - tail -n 1 file1 = print the last line
    
             - tail -n +8 file1
                = print everything from line 8 to the end of the file
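
A minimal sketch of how `head` and `tail` select lines. The 12-line sample file is hypothetical and is created in a temporary location, so none of the names below come from these notes.

```bash
# Build a throwaway file containing the numbers 1 to 12, one per line.
sample=$(mktemp)
seq 1 12 > "$sample"

first_three=$(head -n 3 "$sample")          # lines 1 to 3
last_two=$(tail -n 2 "$sample")             # lines 11 and 12
from_line_8=$(tail -n +8 "$sample")         # line 8 through the end of the file
line_5=$(head -n 5 "$sample" | tail -n 1)   # only line 5

echo "$from_line_8"
rm -f "$sample"
```

The `+` in `tail -n +8` is what switches the meaning from "last 8 lines" to "starting at line 8".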

---

   * <b>cut</b> file1 
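            = print selected columns (fields) of a file; cut needs -f to say which fields and -d to set the delimiter
    
            cut -d , -f 1-5,8 file.csv
                = print columns 1 to 5 and 8 of a csv file
                NB: cut is a simple command. It does not understand quoted strings, so every , (in this example) is treated as a delimiter, even inside quotes.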

---

   * <b>grep</b> search_term file1 
                = print lines containing the search_terms in specific file
    
    common flags:

        -c: print a count of matching lines rather than the lines themselves
        
        -h: do not print the names of files when searching multiple files
        
        -i: ignore case (e.g., treat "Regression" and "regression" as matches)
        
        -l: print the names of files that contain matches, not the matches
        
        -n: print line numbers for matching lines
        
        -v: invert the match, i.e., only show lines that don't match
        
        -E: enable extended regex patterns, e.g. grep -E 'a|b' keeps the lines containing a or b
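
The flags above can be tried on a small file. This is only a sketch: the four sample lines are made up for illustration and written to a temporary file.

```bash
# Hypothetical sample data, one entry per line.
data=$(mktemp)
printf '%s\n' 'Regression model' 'regression test' 'Classification' 'cluster' > "$data"

match_count=$(grep -c -i regression "$data")    # count lines matching in any case
non_matching=$(grep -v -i regression "$data")   # lines that do NOT match
numbered=$(grep -n Class "$data")               # matching lines prefixed with their line number
either=$(grep -E 'Class|clus' "$data")          # extended regex: either alternative

echo "$match_count"
echo "$numbered"
rm -f "$data"
```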

---

* the output of any command can be saved to a new file with **>** (an existing file with the same name is overwritten)

        head -n 5 file_1 > first_5.csv

---

<b> MAGIC TRICK ALERT</b>:

While searching for a dir or a file, start typing the name then:
    
   * press tab, the shell will auto-complete the name the best it can
   * press tab twice, the shell will list the possible file names

 ---

### Editing a file<a class="anchor" id="edit"></a>

   * nano    
         easiest to use, but less powerful than vim. Not always installed.
    
---
    
   * vim
            very powerful, but hard to use. Installed by default on most systems.
    
---
    
   * vi 
           prehistoric precursor to vim. Only use this if nano and vim are unavailable on your system.

NB: Saving and closing is fairly straight-forward in nano (the keyboard commands are displayed on screen at all times) and basically impossible to remember in vim/vi. If you are going to need to edit files in the shell regularly it is probably worth learning at least some of the basic commands in vim, which can be done by using the vimtutor program installed alongside vim.

 ---

### Run program<a class="anchor" id="run"></a>

- python3 file.py = run python script

- jupyter console = open jupyter in terminal

---
### Re-run command<a class="anchor" id="re-run"></a>

* history
        
        will print a list of commands you have recently run
        
        !55 will run the 55th command in your history
        
        !head (or !command in general) re-runs the most recent command that starts with head

---
### Combine command with Pipe<a class="anchor" id="pipe"></a>

* **|** 
        a pipe lets you combine several commands by using the output of one command as the input of the next
        
        example 1 :
                head -n 5 file_1.csv | tail -n 1 
                        gets the 5th row 
                        
        example 2:
                cut -f 2 -d , seasonal/summer.csv | grep -v Tooth
                output 2nd column of summer csv and select rows not containing Tooth
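
Both pipe examples can be reproduced on a small file. The sketch below assumes a made-up CSV with a `Date,Tooth` header; it stands in for the seasonal files, whose real contents are not shown in these notes.

```bash
# Hypothetical CSV: one header row followed by four data rows.
csv=$(mktemp)
printf '%s\n' 'Date,Tooth' '2017-01-25,wisdom' '2017-02-19,canine' \
              '2017-02-24,canine' '2017-02-28,wisdom' > "$csv"

# head keeps lines 1 to 5, tail keeps the last of those: the 5th line.
fifth_line=$(head -n 5 "$csv" | tail -n 1)

# Take the 2nd column, drop the header row, count what is left.
tooth_rows=$(cut -d , -f 2 "$csv" | grep -v Tooth | wc -l)

echo "$fifth_line"
echo "$tooth_rows"
rm -f "$csv"
```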

---
### Count<a class="anchor" id="count"></a>

* **wc** (word count)

        -c print the number of bytes (use -m for the number of characters)
        -w print the number of words
        -l print the number of lines
        
        example 1:
                grep 2017-07 seasonal/spring.csv | wc -l
                        output rows from July 2017 and count the lines
                        
        example 2 (using further commands):
        
            wc -l seasonal/* | grep -v total | sort -n | head -n 1
            
            count lines in all files in seasonal folder, remove lines with total, sort numerically in ascending order, output first line

---
### Wildcards<a class="anchor" id="wildcards"></a>

* <b> * </b>
        matches zero or more characters
    
* <b> ? </b>
        matches exactly ONE character
    
* <b> [...] </b>

        matches any ONE of the characters inside the brackets, e.g. 201[78].txt matches 2017.txt and 2018.txt
    
* <b> {...} </b>

        matches any of the comma-separated patterns inside the curly brackets (no spaces after the commas), e.g. {*.txt,*.csv}
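
A short sketch of the four patterns in action. The file names are hypothetical and live in a temporary directory; `{...}` is a Bash feature (brace expansion), so the example assumes it is run with bash.

```bash
# Create a scratch directory with a few empty, made-up files.
work=$(mktemp -d)
touch "$work/2017.txt" "$work/2018.txt" "$work/2018.csv" "$work/notes.md"

any_txt=$(cd "$work" && echo *.txt)                # every name ending in .txt
one_char=$(cd "$work" && echo 201?.txt)            # exactly one character after 201
seven_or_eight=$(cd "$work" && echo 201[78].csv)   # 7 or 8 in that position
txt_or_csv=$(cd "$work" && echo {*.txt,*.csv})     # both patterns, one after the other

echo "$txt_or_csv"
rm -r "$work"
```

The shell expands the patterns before the command runs, so `echo` only ever sees the resulting file names.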

---
### Sorting<a class="anchor" id="sorting"></a>

* <b> sort </b>
        default is alphabetical ascending order
        
        common flags:
        
            -n numerical sort
            -r reverse (descending)
            -b ignore leading blanks
            -f fold case (= case insensitive)
            
        NB: sort is often used with grep: grep removes the unwanted records, then sort orders the rest

---
### Remove duplicate<a class="anchor" id="duplicate"></a>

* <b> uniq </b>
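        removes adjacent duplicate lines, so sort the input first if the duplicates are scattered. NB: it only compares neighbouring lines because it was made to work on large files which most likely wouldn't fit in memory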
        
        example 1:

                cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort | uniq -c
                
                output 2nd column of file winter, exclude rows with Tooth, sort, output unique instance with count
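
Why `sort` comes before `uniq` can be seen on a tiny input. The list of tooth names below is invented for the sketch.

```bash
# Hypothetical column of values in which equal lines are never neighbours.
teeth=$(mktemp)
printf '%s\n' canine wisdom canine molar wisdom canine > "$teeth"

# Without sorting, uniq finds no adjacent duplicates and removes nothing.
unsorted_groups=$(uniq "$teeth" | wc -l)

# Sorting first groups equal lines, so uniq -c can count each value.
counts=$(sort "$teeth" | uniq -c | sort -n -r)
most_common=$(echo "$counts" | head -n 1)

echo "$counts"
rm -f "$teeth"
```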

---
### Text editing<a class="anchor" id="text_editing"></a>

* **sed**
        is mostly used to replace text in a file. The simple sed command below replaces the first occurrence of “team” on each line of the file with “team 1” and prints the result (the file itself is not changed).
        
        sed 's/team/team 1/' file.txt
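
A small sketch of the substitution, run on a made-up two-line file. It also shows the `g` flag, which is not covered above: it replaces every match on a line instead of only the first.

```bash
# Hypothetical input file.
notes=$(mktemp)
printf '%s\n' 'team wins, team celebrates' 'no match here' > "$notes"

first_only=$(sed 's/team/team 1/' "$notes")     # first match on each line
every_match=$(sed 's/team/team 1/g' "$notes")   # g = all matches on each line

echo "$first_only"
echo "$every_match"
rm -f "$notes"
```

The quotes around the expression matter: without them the shell splits `s/team/team 1/` at the space and sed receives a broken command.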

---
### Stop a command<a class="anchor" id="stopcommand"></a>

* <b> Ctrl + C </b> or ** ^C **

---
### Variables<a class="anchor" id="variables"></a>

* **set**
        - prints the complete list of shell variables, including the environment variables (quite long)
        - use grep to filter for the variable you need, e.g. set | grep USER
        
* **echo**
        - prints its arguments as text, so echo USER prints the word USER
        - to print the value of a variable, put a $ in front of its name, e.g. echo $USER
        - NB: environment variables are usually written with uppercase
        
* **local variables**
        - to assign (no spaces around the =):
            var_name=file_2.csv
            
        - example 1:
                winter_data=seasonal/winter.csv
                head -n 1 $winter_data
                
                prints the first row of winter.csv

---
### For Loops<a class="anchor" id="forLoops"></a>

* **for var in list; do something; done**
        
      - EXAMPLE 1:
      
        for filename in dir/*; do echo $filename; done
        
        loops over every file in dir, assigning each name in turn to the variable filename, and prints it to the screen
        
        
      - EXAMPLE 2 (using local variables):
      
        files=dir/*
        for filename in $files; do echo $filename; done
        
        loops through all **values** that the local variable files expands to and prints them to the screen
      
      - EXAMPLE 3:
        
        for file in seasonal/*csv; do grep 2017-07 $file | tail -n 1; done
        
        loop through all csv files in seasonal, assign to local variable file, for each value of file, filter row with 2017-07 and print last row
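
The loops above work as long as file names contain no spaces. A slightly more defensive sketch quotes the variable; the directory and file names here are hypothetical.

```bash
# Scratch directory with two made-up CSV files, one with a space in its name.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first file.csv"
printf 'c\n'    > "$dir/second.csv"

total=0
for file in "$dir"/*.csv; do
    # Quoting "$file" keeps a name with spaces in one piece.
    lines=$(wc -l < "$file")
    echo "$file: $lines"
    total=$((total + lines))
done

echo "total lines: $total"
rm -r "$dir"
```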

---
### Scripts<a class="anchor" id="script"></a>

* **nano**
    is a text editor
    
    create or edit a file in the shell
    
    command while editing:
        * ctrl + K = delete (cut) the current line
        * ctrl + U = un-delete (paste back) a line
        * ctrl + O = save Output
        * ctrl + X = exit editor
        
* example 1:

    cp seasonal/spring.csv seasonal/summer.csv ~
    
    grep -h -v Tooth spring.csv summer.csv > temp.csv
    
    history | tail -n 3 > steps.sh
    
    bash steps.sh > steps.out
    
    cat steps.out
    
    
    
    copies 2 files to the home directory, keeps the lines that do not contain the keyword (-h stops grep from printing the file names) and saves them in temp.csv, uses the command history to save the last 3 steps in steps.sh, runs those steps with bash, saves the output to another file and prints that file to the shell
    
    
<B>NB:  since the commands you type in are just text, you can store them in files for the shell to run over and over again. </B>
        

---
### Arguments<a class="anchor" id="arguments"></a>

* **ARGV**
        is the name commonly given to the list of all the arguments passed to a program; in Bash they are read through the special variables below

* `$@` or `$*`

        means "all of the command-line parameters given to the script"
        
* example 1:

    nano unique-lines.sh
        (in the editor)
        sort $@ | uniq -c
    bash unique-lines.sh seasonal/*.csv
    
    
    creates a script called unique-lines.sh that uses the special parameter $@, saves it, then runs it with the files to process given as parameters
    
* `$1`  `$2`  `$3` 

        parameters can also be referred to by position: $1 is the first argument, $2 the second, and so on, in the order they are given when running bash __.sh 1 2 3
        
* `$#`
        gives the number of arguments
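
As a sketch of how these parameters behave, the snippet below writes a tiny script to a temporary file and runs it with two made-up arguments (the file names are placeholders, the script never opens them).

```bash
# Write a throwaway script; the quoted 'EOF' keeps $# and $1 unexpanded here.
script=$(mktemp)
cat > "$script" <<'EOF'
echo "count: $#"
echo "first: $1"
for arg in "$@"; do
    echo "arg: $arg"
done
EOF

# Two arguments; the second contains a space and stays one argument.
output=$(bash "$script" winter.csv "spring data.csv")
echo "$output"
rm -f "$script"
```

Writing `"$@"` with double quotes keeps each argument intact, which matters as soon as an argument contains a space.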

---
### Quotation marks<a class="anchor" id="quotation"></a>

* **single quote**
        var1='NOW'
        var2='$var1'
        echo $var2
        
                output: $var1
                Shell interprets the content of the quote literally

* **Double quote**
        var1='NOW'
        var2="$var1"
        echo $var2
        
                output: NOW
                Shell interprets the content of the quote literally EXCEPT for $ and backticks, which are still expanded
                
* **Backticks**

   `` `command` `` 
           Shell runs the command and captures its STDOUT, which can then be stored in a variable; commonly called shell-within-a-shell
           
           example:
           
           var="The date is `date`."
           echo $var
           STDOUT (for example): The date is Tue 15 Feb.
           
           alternatively: $(date), the more modern form, which is easier to nest
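
The three quoting styles side by side, as a minimal sketch. The `+%Y` format passed to `date` is an addition for the example; it prints only the four-digit year.

```bash
var1='NOW'

single='$var1'                   # single quotes: stays the literal text $var1
double="$var1"                   # double quotes: $var1 is replaced by its value
year=$(date +%Y)                 # command substitution, same effect as `date +%Y`
message="Year: $(date +%Y)"      # substitution also works inside double quotes

echo "$single"
echo "$double"
echo "$message"
```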

---
### Numeric variables<a class="anchor" id="numeric"></a>

* **expr**
        only accepts integers; the operator must be surrounded by spaces
        
        example 1:
        
            expr 3 + 4
        
        alternatively you can use double parentheses
        
            echo $((3+4))

* **bc**
        bc = basic calculator.
        Also accepts floats.
        
        use the scale argument to control how many decimal places to keep
        
            echo "scale=3; 10 / 3" | bc
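
A short sketch comparing the three ways of doing arithmetic. Because `bc` is not installed everywhere, the example checks for it first; that guard is an addition for the sketch, not part of the notes above.

```bash
sum_expr=$(expr 3 + 4)     # expr: integers only, spaces around the operator
sum_arith=$((3 + 4))       # built-in integer arithmetic
int_div=$((10 / 3))        # integer division drops the fractional part

# bc handles decimals; scale sets the number of digits after the decimal point.
if command -v bc >/dev/null 2>&1; then
    ratio=$(echo "scale=2; 10 / 3" | bc)
    echo "$ratio"
fi

echo "$sum_expr $sum_arith $int_div"
```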

Example of numeric variables:

F° to C° converter, saved with nano as script.sh:

    # Get first ARGV into variable
    temp_f=$1

    # Subtract 32
    temp_f2=$(echo "scale=2; $temp_f - 32" | bc)

    # Multiply by 5/9
    temp_c=$(echo "scale=2; $temp_f2 * 5 / 9" | bc)

    # Print the celsius temp
    echo $temp_c

Run it with:

    bash script.sh 100

---
### Arrays<a class="anchor" id="arrays"></a>

* **declare**
        declare -a my_first_array
            create an empty array
            
        alternatively
            my_first_array=(1 2 3)
            
* **append elements**
        my_first_array+=("Tomato")
            
* **${array[@]}**
        returns all the elements of the array
        
        example 1:
            my_array=(1 3 4 5)
            echo ${my_array[@]}
        
* **${#array[@]}**
        returns the length of the array
        
* **${array[i]}**
        indexing, similar to Python (starts at 0)
        
* **${array[@]:N:M}**
        slicing: M elements starting at index N

* **array+=(elements)**
        adds elements to the end of the array

Example: Average temperature from 2 files

    #Create variables from the temperature data files
    temp_b="$(cat temps/region_B)"
    temp_c="$(cat temps/region_C)"

    #Create an array with these variables as elements
    region_temps=($temp_b $temp_c)

    #Call an external program to get average temperature
    average_temp=$(echo "scale=2; (${region_temps[0]} + ${region_temps[1]}) / 2" | bc)

    # Append average temp to the array
    region_temps+=($average_temp)

    #Print out the whole array
    echo ${region_temps[@]}
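
The array operations listed in this section can be tried together in a short sketch. The fruit names are placeholders chosen for the example.

```bash
fruits=(apple banana cherry)
fruits+=("tomato")             # append one element

all="${fruits[*]}"             # every element, joined with spaces
count=${#fruits[@]}            # number of elements
second=${fruits[1]}            # indexes start at 0, so this is the 2nd element
middle="${fruits[*]:1:2}"      # slice: 2 elements starting at index 1

echo "$all"
echo "$count $second"
echo "$middle"
```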

---
### Associative array<a class="anchor" id="associative"></a>

similar to Dictionary in Python

* **declare**
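        declare -A my_first_array
            create an empty associative array
            
        NB: you must use declare -A (uppercase A; lowercase -a creates a regular array). Associative arrays need Bash 4 or later.
        
        example 1:
        
            declare -A city_details=([city_name]="New York" [population]=14000000)
            echo ${city_details[city_name]}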
            
* **!**
        return keys
        
        example:
            echo ${!city_details[@]}
            
**NB: Indexing works as with regular arrays, using keys instead of numeric positions. The order of the keys is not guaranteed.**
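
A sketch that extends the `city_details` example. It assumes Bash 4 or later (the default bash on macOS is older), and the `country` key is added purely for illustration.

```bash
declare -A city_details=([city_name]="New York" [population]=14000000)
city_details[country]="USA"          # hypothetical extra key, added after creation

name=${city_details[city_name]}      # look up a value by key
key_count=${#city_details[@]}        # number of key/value pairs

# ${!array[@]} lists the keys; their order is not guaranteed.
for key in "${!city_details[@]}"; do
    echo "$key -> ${city_details[$key]}"
done
```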

---
### IF<a class="anchor" id="if"></a>

* **if [ CONDITION ]; then #some code else #some other code fi**
        if [ "$x" == 'A' ] for strings (the spaces just inside the brackets are required)
        
        if (($x > 5)) for numerical values
        or
        if [ $x -gt 5 ]

* **common flags for arithmetic if**
        -eq  equal to
        -ne  not equal to
        -lt  less than
        -le  less than or equal to
        -gt  greater than
        -ge  greater than or equal to
        
* **other flags**
        -e  if file exists
        -s  if file exists and size greater than 0
        -r  if file exists and readable
        -w  if file exists and writable
        and many more
        
* **&&** and **||**
        if [[ $x -gt 5 && $x -lt 10 ]]

Sorting models based on metadata training:

    # Extract Accuracy from first ARGV element
    accuracy=$(grep Accuracy $1 | sed 's/.* //')

    # Conditionally move into good_models folder (-ge so that exactly 90 is handled too)
    if [ $accuracy -ge 90 ]; then
        mv $1 good_models/
    fi

    # Conditionally move into bad_models folder
    if [ $accuracy -lt 90 ]; then
        mv $1 bad_models/
    fi
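
The numeric and file tests from this section can be combined with `elif` and `else`. The sketch below wraps them in two small functions whose names (`classify`, `file_state`) are made up for the example.

```bash
# Classify a number using [[ ]] with && and the (( )) arithmetic form.
classify() {
    local x=$1
    if [[ $x -gt 5 && $x -lt 10 ]]; then
        echo "between"
    elif (( x <= 5 )); then
        echo "low"
    else
        echo "high"
    fi
}

# Report on a path using the -s and -e file tests.
file_state() {
    if [ -s "$1" ]; then
        echo "non-empty"
    elif [ -e "$1" ]; then
        echo "empty"
    else
        echo "missing"
    fi
}

empty_file=$(mktemp)                 # mktemp creates an empty file
state=$(file_state "$empty_file")
rm -f "$empty_file"
echo "$(classify 7) $state"
```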

---
### For Loop - Part 2 & While<a class="anchor" id="for-part2"></a>

* **for var in list; do something; done**


* **for loop number ranges**
        {START..STOP..INCREMENT}

        example 1:
        
        for x in {1..5..2};
        do echo $x;
        done
        
        alternatively, with the three-expression syntax
        
        for ((x=2;x<=4;x+=2))
        do echo $x;
        done
        
* **while [ CONDITION ]; do something; done**
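
A minimal sketch of a `while` loop next to the two `for` range forms. The variable names are arbitrary, and the `{1..5..2}` step syntax assumes Bash 4 or later.

```bash
# while: keep looping as long as the condition holds.
count=1
total=0
while [ "$count" -le 5 ]; do
    total=$((total + count))
    count=$((count + 1))      # without this update the loop would never end
done
echo "$total"

# Brace range with a step of 2.
stepped=""
for x in {1..5..2}; do stepped+="$x "; done

# Three-expression form.
c_style=""
for ((x=2; x<=4; x+=2)); do c_style+="$x "; done

echo "$stepped"
echo "$c_style"
```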



---
### Case statement<a class="anchor" id="case"></a>

* <B>case 'STRINGVAR' in
        PATTERN1)
        COMMAND1;;
        PATTERN2)
        COMMAND2;;
        *)
        DEFAULT command;;
    esac</B>
    
    
* example: Moving output files of models into various folders based on file name:

            # Use a FOR loop for each file in 'model_out/'
        for file in model_out/*
        do
            # Create a CASE statement for each file's contents
            case $(cat $file) in
              # Match on tree and non-tree models
              *"Random Forest"*|*GBM*|*XGBoost*)
              mv $file tree_models/ ;;
              *KNN*|*Logistic*)
              rm $file ;;
              # Create a default
              *) 
              echo "Unknown model in $file" ;;
            esac
        done
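
The same patterns can be tested without touching any files by matching on a string. In this sketch the function name `model_family` and the sample strings are hypothetical.

```bash
# Return a label for the model named in the text passed as $1.
model_family() {
    case "$1" in
        *"Random Forest"*|*GBM*|*XGBoost*)
            echo "tree" ;;
        *KNN*|*Logistic*)
            echo "non-tree" ;;
        *)
            # Default branch: nothing above matched.
            echo "unknown" ;;
    esac
}

model_family "Model: Random Forest v2"
model_family "KNN classifier"
model_family "SVM"
```

Patterns are tried from top to bottom and only the first match runs, so the catch-all `*)` belongs last.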


---
### Functions<a class="anchor" id="functions"></a>

* <B>function_name () {
        #function_code
        return #something
    }
    </B>
    or 
    
    <B>function function_name {
        #function_code
        return #something
    }
    </B>
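
One point worth knowing when filling in these templates: in Bash, `return` only sets an exit status (a number from 0 to 255), so a function usually hands back data by printing it and letting the caller capture it with `$(...)`. A minimal sketch, with made-up function names:

```bash
# Print the result; the caller captures it with $(...).
to_celsius() {
    local temp_f=$1                       # local keeps the variable inside the function
    echo $(( (temp_f - 32) * 5 / 9 ))     # integer arithmetic only
}

# Use the exit status as a yes/no answer: 0 means success (true).
is_positive() {
    [ "$1" -gt 0 ]      # the status of the last command becomes the function's status
}

temp_c=$(to_celsius 212)
if is_positive 5; then result="positive"; else result="not positive"; fi

echo "$temp_c $result"
```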
    

---
**TBC**