# The Unix Shell

[course link](https://swcarpentry.github.io/shell-novice/)

[workshop schedule](https://indico.cern.ch/event/1190572/timetable/)


Keynotes:

- duration: 70 mins + 30 mins break + 60 mins
- Nelle’s Pipeline: A Typical Problem:
    + navigate to a file/directory
    + create a file/directory
    + check the length of a file
    + chain commands together
    + retrieve a set of files
    + iterate over files
    + run a shell script containing her pipeline

## 0. setup

#### Option 1: type every command in the 'Terminal' (Linux, macOS, Unix) or the 'Command Line' (Windows).

Note: these tools come with Linux and macOS. The Windows command line does not provide them on its own; see the Git Bash note below.

#### Option 2: install Anaconda (or another conda-based Python distribution) and use the terminal inside Jupyter Notebook. Follow the Anaconda installation guide.

Setup on Windows: install `Git Bash` and make it accessible from the system command line (this is not the default setting in the installer).

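#### Option 3: install the bash kernel for Jupyter and run this notebook directly

To install the bash kernel, run one of the following commands in a Jupyter cell that uses the Python kernel (the leading `!` passes the command to the system shell):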

```
!conda install -c conda-forge bash_kernel -y
```

or, if you have the mamba package manager:

```
!mamba install -c conda-forge bash_kernel -y
```

In [None]:
# Running shell commands in a Jupyter notebook:
# - Python kernel: prefix the command with ! (e.g. !ls)
# - bash kernel: type the command as is, as below

help

### Download course data

In [1]:
# start from a clean state: remove any earlier copy of the course data
cd ~/TheCarpentries/2022-09-28-upr-online
rm -f shell-lesson-data.zip
rm -rf shell-lesson-data

In [2]:
# download course file with wget
wget https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip

--2022-09-28 01:31:44--  https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip
Resolving swcarpentry.github.io (swcarpentry.github.io)... 2606:50c0:8001::153, 2606:50c0:8003::153, 2606:50c0:8002::153, ...
Connecting to swcarpentry.github.io (swcarpentry.github.io)|2606:50c0:8001::153|:443... 


In [3]:
# or download with curl
curl https://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip --output shell-lesson-data.zip

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  449k  100  449k    0     0   844k      0 --:--:-- --:--:-- --:--:--  844k


In [4]:
# unzip course file shell-lesson-data.zip
# to current directory
unzip shell-lesson-data.zip -d ./ 

Archive:  shell-lesson-data.zip
   creating: ./shell-lesson-data/
   creating: ./shell-lesson-data/north-pacific-gyre/
   creating: ./shell-lesson-data/exercise-data/
   creating: ./shell-lesson-data/exercise-data/writing/
  inflating: ./shell-lesson-data/exercise-data/writing/LittleWomen.txt  
  inflating: ./shell-lesson-data/exercise-data/writing/haiku.txt  
   creating: ./shell-lesson-data/exercise-data/creatures/
  inflating: ./shell-lesson-data/exercise-data/creatures/basilisk.dat  
  inflating: ./shell-lesson-data/exercise-data/creatures/unicorn.dat  
  inflating: ./shell-lesson-data/exercise-data/creatures/minotaur.dat  
   creating: ./shell-lesson-data/exercise-data/animal-counts/
  inflating: ./shell-lesson-data/exercise-data/animal-counts/animals.csv  
 extracting: ./shell-lesson-data/exercise-data/numbers.txt  
   creating: ./shell-lesson-data/exercise-data/proteins/
  inflating: ./shell-lesson-data/exercise-data/proteins/ethane.pdb  
  inflating: ./shell-lesson-data/exercis

## 1. navigate to a file/directory

key commands

- `ls` = list
- `cd` = change directory
- `pwd` = print working directory
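The three commands can be tried out safely in a scratch directory. This is a minimal sketch, assuming bash and the standard `mktemp` utility; the directory names `demo`, `north`, and `south` are made up for illustration, and the scratch directory is deleted when the shell exits.

```shell
#!/usr/bin/env bash
# Scratch directory, removed automatically when the shell exits
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical layout: demo/north and demo/south
mkdir -p "$workdir/demo/north" "$workdir/demo/south"

cd "$workdir/demo"      # cd: change directory
here=$(pwd)             # pwd: print working directory
echo "now in:   $here"

listing=$(ls)           # ls: list the current directory
echo "contents: $listing"

cd north                # relative path, resolved from the current directory
cd ..                   # .. is the parent directory, i.e. back to demo
back=$(pwd)
echo "back in:  $back"

cd ~                    # ~ is shorthand for your home directory
pwd
```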

In [5]:
# display the current (working) directory
pwd

/home/pi/TheCarpentries/2022-09-28-upr-online


In [6]:
# list current directory
ls

LICENSE    shell-lesson-data      unix-shell.ipynb
README.md  shell-lesson-data.zip  version-control-with-git.ipynb


In [7]:
# list home directory, ~ denotes home directory
# Question: where is your home directory?
ls ~

Bookshelf  gems                      mambaforge  scripts         Videos
Desktop    jupyterhub_cookie_secret  Music       Templates       workshop
Documents  jupyterhub-proxy.pid      Pictures    TheCarpentries
Downloads  jupyterhub.sqlite         Public      tmp


In [8]:
# list current directory, . denotes current directory
ls ./

LICENSE    shell-lesson-data      unix-shell.ipynb
README.md  shell-lesson-data.zip  version-control-with-git.ipynb


In [9]:
# list a specific directory
ls ./shell-lesson-data

exercise-data  north-pacific-gyre


In [None]:
# help on a command
# command --help
# or man command
ls --help

### Exercise: options of ls

- `ls -a`
- `ls -F`
- `ls -r`
- `ls -R`
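To see what each option changes on a predictable set of files, here is a small sketch assuming bash and GNU or BSD `ls`; the names `notes.txt`, `.hidden`, `results`, and `run.sh` are hypothetical and live in a throwaway directory.

```shell
#!/usr/bin/env bash
# Throwaway directory, removed when the shell exits
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch notes.txt .hidden            # names starting with . are hidden by default
mkdir results                      # a directory
printf '#!/bin/sh\necho hi\n' > run.sh
chmod +x run.sh                    # an executable file

plain=$(ls)           # hidden entries are not listed
all=$(ls -a)          # -a: also list hidden entries, including . and ..
classified=$(ls -F)   # -F: append / to directories and * to executables
reversed=$(ls -r)     # -r: reverse the sort order

echo "ls:"; echo "$plain"
echo "ls -a:"; echo "$all"
echo "ls -F:"; echo "$classified"
echo "ls -r:"; echo "$reversed"
```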

In [11]:
ls -a

.     .gitignore          README.md              unix-shell.ipynb
..    .ipynb_checkpoints  shell-lesson-data      version-control-with-git.ipynb
.git  LICENSE             shell-lesson-data.zip


In [12]:
ls -F

LICENSE    shell-lesson-data/     unix-shell.ipynb
README.md  shell-lesson-data.zip  version-control-with-git.ipynb


In [13]:
ls -r

version-control-with-git.ipynb  shell-lesson-data.zip  README.md
unix-shell.ipynb                shell-lesson-data      LICENSE


In [14]:
ls -R

.:
LICENSE    shell-lesson-data      unix-shell.ipynb
README.md  shell-lesson-data.zip  version-control-with-git.ipynb

./shell-lesson-data:
exercise-data  north-pacific-gyre

./shell-lesson-data/exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./shell-lesson-data/exercise-data/animal-counts:
animals.csv

./shell-lesson-data/exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./shell-lesson-data/exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./shell-lesson-data/exercise-data/writing:
haiku.txt  LittleWomen.txt

./shell-lesson-data/north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt


In [15]:
cd shell-lesson-data

In [16]:
pwd

/home/pi/TheCarpentries/2022-09-28-upr-online/shell-lesson-data


### Exercise: relative and absolute paths

- `cd ../shell-lesson-data`
- `cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data`
- `cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data.zip`
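A sketch, again in a throwaway directory, showing that a relative path (`../thesis`) and an absolute path reach the same place, and that `cd` refuses a file such as a `.zip` archive. It assumes bash; the `project` layout is made up for illustration.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical layout: project/data, project/thesis and a file project/data.zip
mkdir -p "$workdir/project/data" "$workdir/project/thesis"
touch "$workdir/project/data.zip"

cd "$workdir/project/data" || exit 1

# Relative path: interpreted from the current directory (.. = parent)
cd ../thesis && rel=$(pwd)

# Absolute path: starts at / and does not depend on where you are
cd "$workdir/project/thesis" && abs=$(pwd)

echo "relative -> $rel"
echo "absolute -> $abs"

# cd only accepts directories; a regular file is rejected
if cd "$workdir/project/data.zip" 2>/dev/null; then
    zip_status=0
else
    zip_status=1
    echo "data.zip is not a directory"
fi
```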

In [17]:
# .. denotes the parent directory

cd ../shell-lesson-data

In [18]:
cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data

In [19]:
cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data.zip

bash: cd: /home/pi/TheCarpentries/2022-09-28-upr-online/shell-lesson-data.zip: Not a directory


## 2. create/change a file/directory

key commands

- `mkdir` = make a directory
- `touch` = create an empty file (or update the timestamp of an existing one)
- `mv` = move
- `cp` = copy
- `rm` = remove
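The whole create/move/copy/remove cycle fits in one throwaway directory. A minimal sketch assuming bash; `draft.txt`, `paper.txt`, and `backup.txt` are hypothetical names, and `mkdir -p` is an extra option not used elsewhere in this lesson (it makes `mkdir` succeed silently when the directory already exists).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir thesis                    # make a directory
mkdir -p thesis                 # -p: no error if it already exists
touch thesis/draft.txt          # create an empty file
mv thesis/draft.txt paper.txt   # move (and rename) the file
cp paper.txt backup.txt         # copy it
rm paper.txt                    # remove a file: there is no undo or trash bin
rm -r thesis                    # -r is needed to remove a directory

ls                              # only backup.txt is left
```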

In [20]:
pwd

/home/pi/TheCarpentries/2022-09-28-upr-online/shell-lesson-data


In [21]:
rm -r thesis

rm: cannot remove 'thesis': No such file or directory


In [22]:
# create a directory
mkdir thesis

In [23]:
# check if the directory has been created
ls

exercise-data  north-pacific-gyre  thesis


In [24]:
# try to create the same directory again
mkdir thesis

mkdir: cannot create directory ‘thesis’: File exists


In [25]:
rm thesis/paper.txt

rm: cannot remove 'thesis/paper.txt': No such file or directory


In [26]:
# create a file thesis/paper.txt
touch thesis/paper.txt

In [27]:
# touch the same file again: no error, only its timestamp is updated
touch thesis/paper.txt

### Exercise: move and copy a file or directory

- `ls -R`
- `mv thesis/paper.txt writing.txt`
- `ls -R`
- `cp thesis/paper.txt thesis.txt`
- `ls -R`
- `rm writing.txt`
- `ls -R`
- `rm thesis`
- `rm -r thesis`

In [28]:
ls -R

.:
exercise-data  north-pacific-gyre  thesis

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt

./thesis:
paper.txt


In [29]:
mv thesis/paper.txt writing.txt

In [30]:
ls -R

.:
exercise-data  north-pacific-gyre  thesis  writing.txt

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt

./thesis:


In [31]:
cp thesis/paper.txt thesis.txt

cp: cannot stat 'thesis/paper.txt': No such file or directory


In [32]:
# list all files recursively
ls -R

.:
exercise-data  north-pacific-gyre  thesis  writing.txt

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt

./thesis:


In [33]:
touch thesis/paper.txt

In [34]:
cp thesis/paper.txt thesis.txt

In [35]:
ls -R

.:
exercise-data  north-pacific-gyre  thesis  thesis.txt  writing.txt

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt

./thesis:
paper.txt


In [36]:
rm writing.txt

In [37]:
ls -R

.:
exercise-data  north-pacific-gyre  thesis  thesis.txt

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt

./thesis:
paper.txt


In [38]:
rm thesis

rm: cannot remove 'thesis': Is a directory


In [39]:
rm -r thesis

In [40]:
ls -R

.:
exercise-data  north-pacific-gyre  thesis.txt

./exercise-data:
animal-counts  creatures  numbers.txt  proteins  writing

./exercise-data/animal-counts:
animals.csv

./exercise-data/creatures:
basilisk.dat  minotaur.dat  unicorn.dat

./exercise-data/proteins:
cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb

./exercise-data/writing:
haiku.txt  LittleWomen.txt

./north-pacific-gyre:
goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt


## 3. check the length of a file

key commands

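- `wc` = word count (counts lines, words, and bytes)
- `echo` = print text
- write output to a file with `COMMAND > file.txt` (overwrites the file) and append to a file with `COMMAND >> file.txt`
- `cat` = concatenate (print the contents of files)
- `sort -n file.txt` = sort lines numerically
- `head -n 3 file.txt` = show the first 3 lines
- `tail -n 3 file.txt` = show the last 3 lines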
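A small sketch of the redirection and sorting commands on a made-up `numbers.txt`, assuming bash. It also shows one extra not in the list, `wc -l < file`, which prints the count without the file name, and it writes sorted output to a new file because redirecting `sort` into the file it is reading would empty that file (the shell truncates it before `sort` runs).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo 10 > numbers.txt     # > creates the file, or overwrites it if it exists
echo 9 >> numbers.txt     # >> appends to the end of the file
echo 100 >> numbers.txt
cat numbers.txt           # prints 10, 9 and 100 on separate lines

alpha=$(sort numbers.txt)       # text order: compares character by character
numeric=$(sort -n numbers.txt)  # -n: compares the values as numbers
echo "sort:"; echo "$alpha"
echo "sort -n:"; echo "$numeric"

smallest=$(sort -n numbers.txt | head -n 1)
largest=$(sort -n numbers.txt | tail -n 1)
lines=$(wc -l < numbers.txt)    # reading via < prints only the count
echo "min=$smallest max=$largest lines=$((lines))"

# Write to a NEW file: "sort -n numbers.txt > numbers.txt" would leave it empty
sort -n numbers.txt > sorted-numbers.txt
```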

In [41]:
ls

exercise-data  north-pacific-gyre  thesis.txt


In [42]:
cd exercise-data

In [43]:
ls proteins

cubane.pdb  ethane.pdb  methane.pdb  octane.pdb  pentane.pdb  propane.pdb


In [44]:
cd proteins

In [45]:
wc cubane.pdb

  20  156 1158 cubane.pdb


In [46]:
# use wildcard
wc *.pdb

  20  156 1158 cubane.pdb
  12   84  622 ethane.pdb
   9   57  422 methane.pdb
  30  246 1828 octane.pdb
  21  165 1226 pentane.pdb
  15  111  825 propane.pdb
 107  819 6081 total


In [47]:
# write output to a file
wc -l *.pdb > lengths.txt

In [48]:
# show the content of the file
cat lengths.txt

  20 cubane.pdb
  12 ethane.pdb
   9 methane.pdb
  30 octane.pdb
  21 pentane.pdb
  15 propane.pdb
 107 total


In [49]:
# sort numerically
sort -n lengths.txt

   9 methane.pdb
  12 ethane.pdb
  15 propane.pdb
  20 cubane.pdb
  21 pentane.pdb
  30 octane.pdb
 107 total


In [50]:
# write the sorted output to a new file
sort -n lengths.txt > sorted-lengths.txt

In [51]:
head -n 1 sorted-lengths.txt

   9 methane.pdb


## 4. chain commands together and retrieve a set of files

- `|`: pass output to another command
- `*`: wildcard that matches any run of characters in file names ("globbing"; related to, but not the same as, [regular expressions](https://tldp.org/LDP/abs/html/x17129.html), which are advanced material)
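A sketch of a pipeline built from `wc`, `sort`, and `head`, plus a look at what `*` expands to, assuming bash and a throwaway directory with hypothetical `.pdb` files. Shell wildcards are not regular expressions: in the shell, `*` on its own matches any run of characters in a file name, while in a regular expression `*` means "repeat the previous item".

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical files with 3, 1 and 2 lines
printf 'a\nb\nc\n' > long.pdb
printf 'a\n'       > short.pdb
printf 'a\nb\n'    > medium.pdb
touch notes.txt    # does not match *.pdb

# The shell replaces *.pdb with the matching names before the command runs
echo *.pdb

# Each | feeds the output of one command into the next:
# count lines -> sort numerically -> keep the first (smallest) line
shortest=$(wc -l *.pdb | sort -n | head -n 1)
echo "$shortest"
```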

In [52]:
# Passing output to another command
sort -n lengths.txt | head -n 1

   9 methane.pdb


In [53]:
# Put together everything
wc -l *.pdb | sort -n | head -n 1

   9 methane.pdb


![](https://swcarpentry.github.io/shell-novice/fig/redirects-and-pipes.svg)

### Exercise: find the shortest files

In our current directory, we want to find the 3 files that have the fewest lines. Which of the commands listed below would work?

1. `wc -l * > sort -n > head -n 3`
2. `wc -l * | sort -n | head -n 1-3`
3. `wc -l * | head -n 3 | sort -n`
4. `wc -l * | sort -n | head -n 3`

In [54]:
pwd

/home/pi/TheCarpentries/2022-09-28-upr-online/shell-lesson-data/exercise-data/proteins


In [55]:
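# option 1: each > redirects into a file (creating files named sort and head),
# and -n and 3 become arguments of wc
wc -l *.pdb > sort -n > head -n 3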

wc: invalid option -- 'n'
Try 'wc --help' for more information.


In [56]:
# option 2: head -n takes a single number, not a range
wc -l * | sort -n | head -n 1-3

head: invalid number of lines: ‘1-3’
sort: fflush failed: 'standard output': Broken pipe
sort: write error


In [57]:
# option 3: head keeps the first 3 lines before they are sorted
wc -l * | head -n 3 | sort -n

   0 head
  12 ethane.pdb
  20 cubane.pdb
wc: write error


In [58]:
# option 4 works; head and sort show up only because option 1 created these empty files
wc -l * | sort -n | head -n 3

   0 head
   0 sort
   7 lengths.txt


## 5. iterate over files

- `for` loop
- use `$` to get the value of a variable (e.g. `$filename`)
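A sketch of a `for` loop that reads a variable with `$`, assuming bash. The two `.dat` files are made up, and one deliberately has a space in its name to show why looping over a wildcard (and quoting `"$filename"`) is safer than looping over `$(ls)`.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical data files; one name contains a space
printf 'AUTHOR: x\nCLASSIFICATION: a\n' > basilisk.dat
printf 'AUTHOR: y\nCLASSIFICATION: b\n' > "sea serpent.dat"

count=0
# Loop over the wildcard: each matching name arrives intact
for filename in *.dat
do
    # "$filename" reads the variable; the quotes keep "sea serpent.dat" as one word
    head -n 2 "$filename" | tail -n 1
    count=$((count + 1))
done
echo "processed $count files"

# For comparison: $(ls) is split on whitespace, giving 3 words for 2 files
words=0
for word in $(ls)
do
    words=$((words + 1))
done
echo "\$(ls) produced $words words"
```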

In [59]:
cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data/exercise-data/creatures

In [60]:
ls

basilisk.dat  minotaur.dat  unicorn.dat


In [61]:
# store the output of ls in a variable (works here, but breaks on file names with spaces)
files=$(ls)

In [62]:
echo $files

basilisk.dat minotaur.dat unicorn.dat


In [63]:
for filename in $files
do
    # print the second line of each file
    head -n 2 "$filename" | tail -n 1
done

CLASSIFICATION: basiliscus vulgaris
CLASSIFICATION: bos hominus
CLASSIFICATION: equus monoceros


### Exercise: loop over files

Loop over the `*.pdb` files under `shell-lesson-data/exercise-data/proteins`.

How would you write the loop?

In [64]:
cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data/exercise-data/proteins

In [65]:
ls

cubane.pdb  head         methane.pdb  pentane.pdb  sort
ethane.pdb  lengths.txt  octane.pdb   propane.pdb  sorted-lengths.txt


In [66]:
for filename in *.pdb
do
    echo $filename
done

cubane.pdb
ethane.pdb
methane.pdb
octane.pdb
pentane.pdb
propane.pdb


## 6. run a shell script with a pipeline

Write a pipeline into a `.sh` file and run it with `bash`.
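A minimal sketch of writing a script from the shell and running it with `bash`, assuming bash and `seq` (available on Linux and macOS); `sample.txt`, `script.sh`, and `loop.sh` are hypothetical names. The key point is quoting: single quotes or a quoted here-document (`<< 'EOF'`) keep `$filename` literal in the file, whereas inside double quotes the shell would expand it while writing the script.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical input: the numbers 1 to 20, one per line
seq 1 20 > sample.txt

# Single quotes store the pipeline exactly as typed
echo 'head -n 15 sample.txt | tail -n 5' > script.sh

# A quoted here-document does the same for multi-line scripts:
# $filename is written literally, not expanded now
cat > loop.sh << 'EOF'
for filename in *.txt
do
    echo "$filename"
done
EOF

result=$(bash script.sh)   # run the script with bash
loop_out=$(bash loop.sh)
echo "$result"
echo "$loop_out"
```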

In [67]:
echo 'head -n 15 octane.pdb | tail -n 5' > script.sh

In [68]:
cat script.sh

head -n 15 octane.pdb | tail -n 5


In [69]:
bash script.sh

ATOM      9  H           1      -4.502   0.681   0.785  1.00  0.00
ATOM     10  H           1      -5.254  -0.243  -0.537  1.00  0.00
ATOM     11  H           1      -4.357   1.252  -0.895  1.00  0.00
ATOM     12  H           1      -3.009  -0.741  -1.467  1.00  0.00
ATOM     13  H           1      -3.172  -1.337   0.206  1.00  0.00


## 7. Exercise

Write a script that processes each `NENE*.txt` file under `shell-lesson-data/north-pacific-gyre`.
For each file, it first outputs the filename, then outputs the minimum and maximum number in the file. 
Finally, it calls the `goostats.sh` script with `bash goostats.sh $filename stats-$filename`.


In [70]:
cd ~/TheCarpentries/2022-09-28-upr-online/shell-lesson-data/north-pacific-gyre

In [71]:
ls

goodiff.sh      NENE01736A.txt  NENE01843A.txt  NENE01978B.txt  NENE02040Z.txt
goostats.sh     NENE01751A.txt  NENE01843B.txt  NENE02018B.txt  NENE02043A.txt
NENE01729A.txt  NENE01751B.txt  NENE01971Z.txt  NENE02040A.txt  NENE02043B.txt
NENE01729B.txt  NENE01812A.txt  NENE01978A.txt  NENE02040B.txt


Write these lines to a script file and run it:

```shell
for filename in NENE*.txt
do
    echo "$filename"
    cat "$filename" | sort -n | head -n 1
    cat "$filename" | sort -n | tail -n 1
    bash goostats.sh "$filename" "stats-$filename"
    echo '-----'
done
```

In [72]:
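# single quotes keep $filename literal in script.sh; with double quotes the shell
# would expand it (to an empty or leftover value) while writing the file
echo 'for filename in NENE*.txt
do
    echo "$filename"
    cat "$filename" | sort -n | head -n 1
    cat "$filename" | sort -n | tail -n 1
    bash goostats.sh "$filename" "stats-$filename"
    echo "-----"
done' > script.sh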

In [None]:
bash script.sh

## recap

Nelle's Pipeline: A Typical Problem

- navigate to a file/directory: `cd`, `ls`, `pwd`
- create/change a file/directory: `mkdir`, `touch`, `cp`, `mv`, `rm`
- check the length of a file: `wc`, `head`, `tail`, `sort`, `echo`
- chain commands together: `|`
- retrieve a set of files: use wildcard `*`
- iterate over files: `for` loop
- run a shell script containing her pipeline: `bash script.sh`

## how to explore a new command

In [None]:
# check how to use the zip command
man zip

In [None]:
# what data do we have?
ls

In [None]:
# zip all txt files
zip nene-data.zip *.txt 

In [None]:
# observe the result
ls

## check running processes

- get all running processes: `ps aux`
- select lines that contain a term: `grep term`
- select all processes run by `doris`: `ps aux | grep doris`
- select all python processes: `ps aux | grep python`
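Because `grep` matches text anywhere in a line, `ps aux | grep doris` can also catch lines that only mention "doris" in a command argument, and often the `grep doris` process itself. The sketch below assumes bash and uses made-up lines in place of real `ps` output, so it behaves the same on any machine; `-c` (count), `-v` (invert), and the `^` anchor are standard `grep` features.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Made-up lines standing in for `ps aux` output (not real processes)
cat > "$workdir/processes.txt" << 'EOF'
doris     101  python analysis.py
doris     102  bash
root      103  sshd
nelle     104  python fit.py --out doris-results
EOF

# Every line containing "doris" anywhere, including nelle's line
grep doris "$workdir/processes.txt"

matches=$(grep -c doris "$workdir/processes.txt")         # -c: count matching lines
user_doris=$(grep -c '^doris ' "$workdir/processes.txt")  # ^: match only at line start
python_lines=$(grep -c python "$workdir/processes.txt")
not_python=$(grep -vc python "$workdir/processes.txt")    # -v: lines that do NOT match

echo "doris anywhere: $matches, doris as user: $user_doris"
echo "python: $python_lines, not python: $not_python"
```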

In [None]:
ps aux

In [None]:
man grep

In [None]:
ps aux | grep doris

In [None]:
ps aux | grep python