# Day1 - Afternoon - More *nix

_Note: this is a static notebook (i.e., not interactive). All exercises and commands in this session will be run directly in a terminal window. To launch a new terminal, click on `File>New>Terminal`._

## Bash (shell) scripting

* manage complex workflows more easily
* work on a cluster
    - distribute multiple similar jobs
    - good responsible coding practice
* beginning of reproducible research

### A simple command

```
$ echo 'Hello, world!'
```

### Make it a script

How many genes are annotated on chromosome 19 in our file, `hg38genes.txt`?

Remember pipes? `grep` selects the lines that contain `chr19` and `wc -l` counts them:

`$ grep chr19 hg38genes.txt | wc -l`
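To turn this one-liner into a script, the same pipeline can be saved in a file and its result stored in a variable. The sketch below is one way that might look. Since `hg38genes.txt` is not reproduced in these notes, it uses a hypothetical `toy_genes` function that prints three made-up, tab-delimited rows (chromosome assumed to be in column 1); with the real file you would keep `grep chr19 hg38genes.txt | wc -l` as the pipeline.

```shell
#!/bin/bash

# Hypothetical stand-in for hg38genes.txt: prints made-up rows, not real annotations.
toy_genes() {
    printf 'chr19\t100\t200\t+\tTOYGENE1\n'
    printf 'chr19\t300\t400\t-\tTOYGENE2\n'
    printf 'chr1\t500\t600\t+\tTOYGENE3\n'
}

# $( ... ) captures the output of the pipeline in a variable.
COUNT=$(toy_genes | grep chr19 | wc -l)
echo "genes on chr19: $COUNT"
```

Saved as, say, `chr19count.sh` (a hypothetical name), this could be run with `bash chr19count.sh`.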

## Looping with a script

Now let's get the number of genes on chromosomes 11, 12, 13, and 14. We could run the command once per chromosome, but that means more typing and more chances for mistakes.

Let's write a script for this:

```
$ nano countgenes.sh

#!/bin/bash

# print each chromosome name, then the number of matching lines
for i in chr11 chr12 chr13 chr14
do
   echo "$i"
   grep "$i" hg38genes.txt | wc -l
done
```
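A variation on this loop, sketched with made-up data: it builds each chromosome name from a number and compares a plain `grep` count with a whole-word (`-w`) count. The `toy_genes` function is a hypothetical stand-in that prints invented rows instead of reading `hg38genes.txt`, and the chromosomes looped over were chosen to make the difference visible.

```shell
#!/bin/bash

# Hypothetical stand-in for hg38genes.txt: prints made-up rows, not real annotations.
toy_genes() {
    printf 'chr1\t10\t20\t+\tTOYGENE1\n'
    printf 'chr11\t10\t20\t+\tTOYGENE2\n'
    printf 'chr11\t30\t40\t-\tTOYGENE3\n'
    printf 'chr12\t10\t20\t-\tTOYGENE4\n'
}

for n in 1 11 12
do
    CHROM="chr$n"                                # prepend the string chr to the number
    LOOSE=$(toy_genes | grep -c "$CHROM")        # -c counts matching lines
    EXACT=$(toy_genes | grep -c -w "$CHROM")     # -w only accepts whole-word matches
    echo "$CHROM loose=$LOOSE exact=$EXACT"      # name and counts on one line
done
```

For `chr1` the plain count also picks up the `chr11` and `chr12` rows, while the `-w` count does not.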

### A script with terminal input

```
$ nano inputscript.sh

#!/bin/bash
echo "this script takes input from the terminal."
echo "type an integer between 1 and 10:"
read MYNUM    # wait for a line of input and store it in MYNUM
echo "you typed $MYNUM"
```

### Practice exercises

1. Write a script that uses grep, cut, sort, and uniq with pipes to print the number of genes in hg38genes.txt annotated on each chromosome on the plus strand and the number of genes on the minus strand, for chromosomes 14, 15, 16, and 17.

2. Write a script that accepts terminal input and that will count the number of genes on the chromosome specified. The input should be only numerical, so your program will need to prepend the string chr. Be sure to use the -w option with grep so that chr1 really does only retrieve chr1 and not chr10, chr11 etc.

## While loops

`while` will continue executing the code in the loop as long as its condition is true:

```
$ nano while.sh

#!/bin/bash

VAR=1
while [ "$VAR" -lt 5 ]
do
    echo "VAR=$VAR"
    VAR=$(( VAR + 1 ))   # arithmetic expansion; replaces the deprecated $[ ... ] form
done
```

Note that `[` is actually a command, not just punctuation, so you need spaces before and after each square bracket. Integer tests include `-eq`, `-ne`, `-lt`, `-gt`, `-le`, and `-ge`. Strings can be compared with `=` and `!=`. Many other tests are possible; see `man test`.
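A small sketch of both kinds of test. It uses the `if` statement described in the Conditions section, and the variable names and values are arbitrary choices for illustration.

```shell
#!/bin/bash

# Ask the shell what [ is; bash typically reports it as a builtin command.
type [

A=3
B=7
CHROM="chr1"

# Integer comparison with -lt
if [ "$A" -lt "$B" ]; then
    INT_RESULT="$A is less than $B"
else
    INT_RESULT="$A is not less than $B"
fi
echo "$INT_RESULT"

# String comparison with != (chr1 and chr10 are different strings)
if [ "$CHROM" != "chr10" ]; then
    STR_RESULT="$CHROM is not chr10"
else
    STR_RESULT="$CHROM is chr10"
fi
echo "$STR_RESULT"
```

Quoting the variables keeps the test well-formed even when a value is empty or contains spaces.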


It's easy to write a while loop that never finishes. Remember that Ctrl-C will kill a runaway process.

An example of a while loop that never terminates, because `VAR` is never incremented:

```
VAR=1
while [ $VAR -lt 5 ]
do
    echo "VAR=$VAR"
done
```

## Conditions

`if`-`then`-`else` statements let you execute code only when some condition is met. `fi` terminates the `if` statement.

```
VAR=2
if [ $VAR -gt 5 ] ; then
    echo "VAR > 5"
else
    echo "VAR <= 5"
fi
```

Additional conditions can be added with `elif`:

```
VAR=2
if [ $VAR -gt 5 ] ; then
    echo "VAR > 5"
elif [ $VAR -lt 0 ]; then
    echo "VAR < 0"
else
    echo "0 <= VAR <= 5"
fi
```
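The same pattern can check a value like the one requested by `inputscript.sh`. This is a minimal sketch: `MYNUM` is set directly so the example runs without typing, and it assumes the value is an integer (a non-numeric value would make `[` report an error).

```shell
#!/bin/bash

MYNUM=7    # stands in for a value typed at the read prompt

if [ "$MYNUM" -lt 1 ]; then
    VERDICT="too small"
elif [ "$MYNUM" -gt 10 ]; then
    VERDICT="too large"
else
    VERDICT="in range"
fi
echo "$MYNUM is $VERDICT"
```

Changing `MYNUM` to 0 or 11 exercises the other two branches.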

## Command-line arguments

```
$ nano args.sh

#!/bin/bash

echo "the number of arguments is $#."
echo "the name of the program is $0."
echo "the first argument is $1."
echo "\$@ is a variable with all arguments: $@"


$ chmod +x args.sh
$ ./args.sh argument1 arg2 another argument
```
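To see what these variables would hold without creating a file, the sketch below uses `set --`, which replaces the positional parameters of the current shell. It imitates the call above and then a quoted variant; the uppercase variable names are arbitrary.

```shell
#!/bin/bash

# set -- replaces the positional parameters of the current shell, imitating
# ./args.sh argument1 arg2 another argument
set -- argument1 arg2 another argument
UNQUOTED_COUNT=$#
echo "unquoted: $# arguments, the fourth is '$4'"

# Quoting keeps "another argument" together as a single argument.
set -- argument1 arg2 "another argument"
QUOTED_COUNT=$#
echo "quoted: $# arguments, the third is '$3'"

# "$@" expands to each argument in turn, so a for loop can visit them all.
for ARG in "$@"
do
    echo "argument: $ARG"
done
```

Unquoted, `another argument` is split on the space and counts as two arguments.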

## awk
awk is a powerful Unix utility for processing text files column by column. Each line is split into fields (`$1`, `$2`, ...) that you can test and print.

```
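$ awk '{print $1}' hg38genes.txt                             # column 1 of every line (assumes tab/space delimited columns)
$ awk '{if ($1=="chr2") print $5}' hg38genes.txt             # column 5 where column 1 is exactly chr2
$ awk '{if ($5 ~ /MIR/) print}' hg38genes.txt                # whole lines whose column 5 contains MIR
$ awk '{if ($5 ~ /MIR/ && $1!="chr3") print}' hg38genes.txt  # same, but skipping lines on chr3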
```
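These filters can also be written in awk's `pattern { action }` form, and an `END` block can report a total after the last line. A minimal sketch, using a hypothetical `toy_genes` function that prints made-up rows (chromosome in column 1 and gene name in column 5, as the commands above imply; the other columns are assumptions):

```shell
#!/bin/bash

# Hypothetical stand-in for hg38genes.txt: prints made-up rows, not real annotations.
toy_genes() {
    printf 'chr2\t100\t200\t+\tMIR_TOY1\n'
    printf 'chr2\t300\t400\t-\tTOYGENE2\n'
    printf 'chr20\t100\t200\t+\tTOYGENE3\n'
    printf 'chr3\t500\t600\t+\tMIR_TOY4\n'
}

# pattern { action }: same result as '{if ($1=="chr2") print $5}'
CHR2_NAMES=$(toy_genes | awk '$1 == "chr2" { print $5 }')
echo "$CHR2_NAMES"

# Count matching lines inside awk; n + 0 prints 0 if nothing matched.
CHR2_COUNT=$(toy_genes | awk '$1 == "chr2" { n++ } END { print n + 0 }')
echo "chr2 genes: $CHR2_COUNT"

# Two conditions joined with &&
MIR_NOT_CHR3=$(toy_genes | awk '$5 ~ /MIR/ && $1 != "chr3" { print $5 }')
echo "$MIR_NOT_CHR3"
```

Because `==` compares the whole field, the `chr20` row is not counted as `chr2`.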

## sed
sed, the Unix stream editor, is also a very useful tool. It has many commands, but one of the most useful works like find-and-substitute: the pattern to find is given first and the replacement next.

```
$ sed -e 's/chr/CHR/' hg38genes.txt
```

This finds the first occurrence of `chr` in each line and substitutes `CHR`. To substitute all occurrences, add the `g` (global) flag:

```
$ sed -e 's/chr/CHR/g' hg38genes.txt
```
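The difference is easiest to see on a single line of text. The sketch below runs both substitutions on a made-up string, plus an anchored variant (`^` matches the start of the line) that removes only a leading `chr`. The expressions are quoted so the shell passes them to sed unchanged.

```shell
#!/bin/bash

LINE="chr1 chr1_toy_name"    # made-up text containing chr twice

FIRST=$(echo "$LINE" | sed -e 's/chr/CHR/')      # first occurrence only
ALL=$(echo "$LINE" | sed -e 's/chr/CHR/g')       # every occurrence
ANCHORED=$(echo "$LINE" | sed -e 's/^chr//')     # delete chr only at the start of the line

echo "$FIRST"       # CHR1 chr1_toy_name
echo "$ALL"         # CHR1 CHR1_toy_name
echo "$ANCHORED"    # 1 chr1_toy_name
```

sed writes the edited text to standard output and leaves the input untouched, so the result has to be redirected or captured to be kept.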


### Practice Exercises