# Basic Unix Survival Skills

## Configuring your environment

The first thing you should do when you set up a unix machine is configure your shell (since you'll be working in it a lot). I'm assuming we'll be using a `bash` shell, but if you're using `zsh` or `sh` or any other shell, just replace `bash` with that in what follows.

The user-defined settings that determine where Unix looks for executables are stored in 
```
~/.bash_profile
```

on OSX and 
```
~/.bashrc
``` 
on most Linux distros. Generally speaking, files starting with a `.` contain configuration information and shouldn't be edited unless you need to.

In [1]:
%%bash

# This won't show any dot-files

ls ~

Applications
Coursework
Desktop
Documents
Downloads
Library
Movies
Music
Pictures
Projects
Public
anaconda
nltk_data


In [2]:
%%bash

# This will

ls -A ~

.CFUserTextEncoding
.DS_Store
.Trash
.atom
.bash_history
.bash_profile
.bash_profile-anaconda.bak
.bashrc
.continuum
.cups
.gitconfig
.graphlab
.ipython
.jupyter
.matplotlib
.ssh
.viminfo
Applications
Coursework
Desktop
Documents
Downloads
Library
Movies
Music
Pictures
Projects
Public
anaconda
nltk_data


## Man Pages

In [11]:
%%bash

man ls


LS(1)                     BSD General Commands Manual                    LS(1)

NNAAMMEE
     llss -- list directory contents

SSYYNNOOPPSSIISS
     llss [--AABBCCFFGGHHLLOOPPRRSSTTUUWW@@aabbccddeeffgghhiikkllmmnnooppqqrrssttuuwwxx11] [_f_i_l_e _._._.]

DDEESSCCRRIIPPTTIIOONN
     For each operand that names a _f_i_l_e of a type other than directory, llss displays its name as well as any requested, associated information.  For each oper-
     and that names a _f_i_l_e of type directory, llss displays the names of files contained within that directory, as well as any requested, associated informa-
     tion.

     If no operands are given, the contents of the current directory are displayed.  If more than one operand is given, non-directory operands are displayed
     first; directory and non-directory operands are sorted separately and in lexicographical order.

     The following options are availabl

## Exercise

Using the man page for `ls`, find the most recently changed file in your home directory.

In [8]:
%%bash 

# Suppose we have a script say_hello that prints the word hello to the terminal

echo 'echo "hello"' > say_hello
chmod u+x say_hello

#I can run it from here by typing

./say_hello

hello


In [9]:
%%bash

# But if I try to run it without the ./

say_hello

# or from somewhere else

cd ~
say_hello

bash: line 4: say_hello: command not found
bash: line 9: say_hello: command not found


In [10]:
%%bash

# In order to run an executable file, unix needs to know where it is
# That's what the PATH variable is for: it stores a list of locations
# where unix will look for commands that you type in

echo "Path"
echo $PATH

# You can add directories to your PATH by adding the following line to your .bash_profile

export PATH="$PATH:/Users/$USER/Coursework/fundamentals/lectures/"

# Now, this directory will be in your path every time you start a bash shell

echo "New Path"
echo $PATH

# And I can run my script from anywhere
echo "Saying Hello!"
say_hello

rm say_hello 

Path
/Users/brianmann/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
New Path
/Users/brianmann/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/brianmann/Coursework/fundamentals/lectures/
Saying Hello!
hello
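
Here's a minimal sketch of the same idea that doesn't touch your `.bash_profile`: it puts a throwaway script in a scratch directory and appends that directory to `PATH` for the current shell only. The names `demo_dir` and `say_hello_demo` are made up for this illustration.

```shell
# Scratch directory and script; both names are illustrative, not from the lecture
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello"\n' > "$demo_dir/say_hello_demo"
chmod u+x "$demo_dir/say_hello_demo"

# Before the change, the shell has nowhere to find the command
command -v say_hello_demo || echo "not on PATH yet"

# Append the directory to PATH (this lasts only for the current shell session)
export PATH="$PATH:$demo_dir"

# command -v reports where the shell found the command
resolved=$(command -v say_hello_demo)
greeting=$(say_hello_demo)
echo "$resolved"
echo "$greeting"
```

Because the `export` isn't written to a config file, the change disappears as soon as you close the shell.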


### Side Note

The `$` operator is used to access the value of a bash variable.

In [6]:
%%bash

echo $PATH

# versus

echo PATH

/usr/local/Cellar/opencv/2.4.10/lib/python2.7/site-packages/:/Users/brianmann/spark-1.4.1-bin-hadoop2.4/python:/usr/local/Cellar/pandoc/1.16/bin:/Library/TeX/texbin:/usr/local/bin/:~/Applications/Postgres.app/Contents/Versions/9.4/bin/:/Users/brianmann/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
PATH
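
Quoting changes what `$` does, which trips up a lot of people. A small sketch, using a made-up variable called `greeting`:

```shell
greeting="hello"                 # no spaces around '=' when assigning

literal=$(echo greeting)         # no $, so this is just the word "greeting"
expanded=$(echo "$greeting")     # $ substitutes the value
single=$(echo '$greeting')       # single quotes switch substitution off
joined="${greeting}_world"       # braces mark where the variable name ends

echo "$literal" "$expanded" "$single" "$joined"
```

Without the braces, bash would look for a variable named `greeting_world`, which doesn't exist here.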


In [11]:
%%bash

echo "hello"

hello


The `cat` command does the same thing, except it takes a file as input.

In [12]:
%%bash

cat ../data/truncated.txt

TEXT FLY WITHIN 
THE BOOK ONLY 



Tight Binding Book 



CO > UJ 

ft <OU_1 68052 >m 



OUP 2273 19-1 1-79 10,000 Copies. 

OSMANIA UNIVERSITY LIBRARY 

Call No ^^L Accession No ^95 gg 

Author -j- 4.STt> y^ 

Title W<U *'<P4>*' ( *~' 

This book should bc^rcturned on or before the date last marked below. 



War and Peace 



BY LEO TOLSTOY 



Translated b\ LOUISE and AYLMER MAUDE 




WILLIAM BENTON, Publisher 



ENCYCLOPEDIA BR1TANNICA, INC. 





In [13]:
%%bash

# There are other environment variables like PATH

echo "Your username:"
echo $USER

echo "Your home directory:"
echo $HOME

echo "Your shell:"
echo $SHELL

Your username:
brianmann
Your home directory:
/Users/brianmann
Your shell:
/bin/bash
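
Variables like these are environment variables, which means programs started from the shell inherit them. A plain shell variable stays local unless you `export` it. A quick sketch (both variable names are made up for the illustration):

```shell
# Both names below are hypothetical, chosen only for this example
course_note="only in this shell"
export COURSE_NAME="fundamentals"

# A child process sees exported variables but not plain shell variables
child_view=$(sh -c 'echo "${course_note:-unset} / ${COURSE_NAME:-unset}"')
echo "$child_view"
```

The `${name:-unset}` form prints the word `unset` when the variable is empty or missing, which makes the difference visible.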


## Permissions

In Unix, a user can have 3 types of permissions on a file: Read, Write, and Execute. 

In [14]:
%%bash

#The ls command will list all the files and directories contained in a directory

ls

bss-correct.pdf
unix_lecture.ipynb


In [15]:
%%bash

# You can also add the -l flag to see more information

ls -l

total 496
-rw-r--r--  1 brianmann  staff  226758 Oct 22 15:23 bss-correct.pdf
-rw-r--r--  1 brianmann  staff   21655 Oct 23 09:49 unix_lecture.ipynb


The first 10 characters of each line in the output above tell you the permissions:

The first character is either `d` or `-`. This tells you whether it is a directory or not (a file). 

In [16]:
%%bash

# .. refers to the directory one level up

ls -l ..

total 96
-rw-r--r--  1 brianmann  staff   7514 Oct 22 15:23 Data science workflow.ipynb
-rw-r--r--  1 brianmann  staff  13336 Oct 22 15:23 EDA.ipynb
-rw-r--r--  1 brianmann  staff   5688 Oct 22 13:35 README.md
drwxr-xr-x  5 brianmann  staff    170 Oct 22 21:48 data
drwxr-xr-x  4 brianmann  staff    136 Oct 22 13:35 img
-rw-r--r--  1 brianmann  staff   5501 Oct 22 13:35 individual.md
drwxr-xr-x  5 brianmann  staff    170 Oct 23 09:51 lectures
-rw-r--r--  1 brianmann  staff   3439 Oct 22 13:35 miniquiz.md
-rw-r--r--  1 brianmann  staff     28 Oct 22 13:35 pair.md


The next 9 characters are in blocks of 3: 
```
user permissions | group permissions | other permissions
```

The first character is `'r'` or `'-'` : read permission.

The second is `'w'` or `'-'` : write permissions.

The third is `'x'` or `'-'` : execute permissions.

In each line above, the owning user and the owning group are displayed in columns 3 and 4. So, for example, after the `chmod u+x say_hello` we ran earlier, `brianmann` has `rwx` permissions on `say_hello`, but no one else has permission to write to it or execute it. 

The `chmod` command will change permissions. In its numeric form it takes three digits, one each for the user, the group, and everyone else. The command looks something like 

```
chmod 400 filename
``` 
or
```
chmod 777 filename
```


`7` : read, write and execute || rwx

`6` : read and write || rw-

`5` : read and execute || r-x

`4` : read only || r--

`3` : write and execute || -wx

`2` : write only || -w-

`1` : execute only || --x

`0` : none || ---
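
Each digit is just a sum: 4 for read, 2 for write, 1 for execute. Here's a small sketch on a scratch file (the variable names are illustrative) that sets a few modes and reads them back with `ls -l`:

```shell
demo_file=$(mktemp)    # scratch file so nothing real is touched

# 640 = rw- for the user, r-- for the group, --- for everyone else
chmod 640 "$demo_file"
perms_640=$(ls -l "$demo_file" | cut -c1-10)

# 755 = rwx for the user, r-x for the group and for everyone else
chmod 755 "$demo_file"
perms_755=$(ls -l "$demo_file" | cut -c1-10)

# Symbolic form: u, g, o pick who, and +, -, = pick what happens
chmod u=rw,go=r "$demo_file"
perms_sym=$(ls -l "$demo_file" | cut -c1-10)

echo "$perms_640 $perms_755 $perms_sym"
rm "$demo_file"
```

The symbolic form is the same one used earlier in `chmod u+x say_hello`.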


## Exercise

Figure out what it means for a directory to have read, write, or execute permissions.

## Navigating the Filesystem

So, how do you navigate to, move, copy, or delete files and directories?

`ls` [dir] : List the contents of a directory. Uses the current directory if dir is missing.

`pwd` : Print current directory

`cd` [dir] : Change directory. Uses `$HOME` if dir is missing.

`mkdir` dir : Make a new directory

`rmdir` dir : Delete an empty directory

`rm` file : Delete file

`rm -rf` * : Nuke everything it's pointed at, whether a file or an entire directory tree (with `*`, that's everything in the current directory). This cannot be undone, and it will not warn you if you're about to delete something important. Never do this.

`mv` from to : Move a file or directory. Can also be used to rename a file or directory.

`cp` : Copy files or directories.

`find` : Traverse directory tree and execute commands

`chmod` : Change permissions

`chown` : Change ownership

In [41]:
%%bash 

mkdir new
ls -l

total 496
-rw-r--r--  1 brianmann  staff  226758 Oct 22 15:23 bss-correct.pdf
drwxr-xr-x  2 brianmann  staff      68 Oct 23 11:00 new
-rw-r--r--  1 brianmann  staff   22125 Oct 23 11:00 unix_lecture.ipynb
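
To see several of these commands working together, here's a short sketch that runs entirely inside a scratch directory. The file and directory names are made up for the example.

```shell
work=$(mktemp -d)             # scratch directory so nothing real is touched
cd "$work"

mkdir new                     # make a directory
echo "some text" > notes.txt  # make a file
cp notes.txt new/copy.txt     # copy it into the new directory
mv notes.txt renamed.txt      # mv doubles as rename
listing=$(ls)

rm new/copy.txt               # delete the copy
rmdir new                     # rmdir only works once the directory is empty
after=$(ls)

echo "$listing"
echo "$after"
cd "$OLDPWD"                  # go back to where we started
```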


## Examining files

What if you want to look at / know something about a file?

`less` : View contents of a file and page through it.

`grep` : Search lines for matches of a regular expression or string. You probably want to use `grep -E` all the time, since it accepts extended regular expression syntax. 

`cat` : Write one or more files to `stdout` one after the other.

`head` : Print the first few lines of a file to `stdout`.

`tail` : Print the last few lines of a file to `stdout`.

`wc` : Print the number of lines, words, and characters in a file.

`cmp` : Tell if files are the same.

`diff` : Show difference between files.

`sum` : Compute checksum of file.
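
A quick sketch of a few of these on a five-line scratch file (the file contents and variable names are invented for the example):

```shell
sample=$(mktemp)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$sample"

first_two=$(head -n 2 "$sample")              # first two lines
last_one=$(tail -n 1 "$sample")               # last line
line_count=$(wc -l < "$sample" | tr -d ' ')   # some versions of wc pad with spaces
t_lines=$(grep -E '^t' "$sample")             # lines starting with t

# cmp -s is silent and reports through its exit status
copy=$(mktemp)
cp "$sample" "$copy"
if cmp -s "$sample" "$copy"; then same="yes"; else same="no"; fi

# After appending a line to the copy, diff marks it with a leading '>'
echo "changed" >> "$copy"
extra=$(diff "$sample" "$copy" | grep -c '^>')

echo "$first_two / $last_one / $line_count / $same / $extra"
rm "$sample" "$copy"
```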

In [62]:
%%bash

ls

bss-correct.pdf
say_hello
unix_lecture.ipynb


## Doing things with files

Most Unix text tools are based on the idea that text files are made up of lines of text. 

`sort` : Sorts the lines in a file

`uniq` : Outputs file with adjacent identical lines collapsed to one

`cut` : Extract parts of each line of input

`paste` : Joins files by outputting sequentially corresponding lines of each file (horizontal version of `cat`)

`sed` : Find and replace with regex.

`tr` : A simple character find and replace. Slightly less useful than `sed`, but it makes it easy to replace things with newlines `'\n'`, which is awkward in some versions of `sed` (the BSD `sed` that ships with OSX, for example). 

`awk` : Command for more advanced line-by-line file manipulation.

`grep` : Command for searching files with regular expressions.
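
Here's a small sketch of several of these tools on a made-up comma-separated file. The data and variable names are invented for the illustration.

```shell
data=$(mktemp)
printf 'apple,3\nbanana,1\napple,3\ncherry,2\n' > "$data"

# uniq only collapses adjacent duplicates, so sort first
deduped=$(sort "$data" | uniq)

# cut pulls out one comma-separated field per line
names=$(mktemp)
counts=$(mktemp)
cut -d ',' -f 1 "$data" > "$names"
cut -d ',' -f 2 "$data" > "$counts"

# paste glues corresponding lines back together, here with ':' between them
rejoined=$(paste -d ':' "$names" "$counts" | head -n 1)

# sed replaces text that matches a pattern
shouted=$(sed 's/apple/APPLE/' "$data" | head -n 1)

# awk can do arithmetic across lines: add up the second field
total=$(awk -F ',' '{ sum += $2 } END { print sum }' "$data")

echo "$rejoined / $shouted / $total"
rm "$data" "$names" "$counts"
```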

In [14]:
%%bash

grep -E "pipelines" unix_lecture.ipynb

    "## More advanced pipelines\n",


## More advanced pipelines

What if I want to chain these commands together? Do I need to write the result to a file each time and then read that in to the next command? NO!

You can use IO redirection and pipes: `>, <, >>, |`

`>` : Redirect output to a file. WILL OVERWRITE IF EXISTS.

`>>` : Append output to an existing file.

`<` : Read a command's input from a file.

`|` : Pass output from one command as the input to another.
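
A minimal sketch of all four operators on a scratch file (the variable names are illustrative):

```shell
out=$(mktemp)

echo "first"  >  "$out"      # > creates the file or overwrites it
echo "second" >> "$out"      # >> appends to it
two_lines=$(cat "$out")

echo "replaced" > "$out"     # the earlier contents are gone
one_line=$(cat "$out")

shouted=$(tr 'a-z' 'A-Z' < "$out")   # < feeds the file to the command's stdin

# | chains commands: count the distinct lines in some text
distinct=$(printf 'b\na\nb\n' | sort | uniq | wc -l | tr -d ' ')

echo "$one_line / $shouted / $distinct"
rm "$out"
```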

In [40]:
%%bash

# A simple pipeline to count the number of appearances of the word 'war' in War and Peace

cat ../data/war_and_peace.txt | tr 'A-Z' 'a-z' | tr ' ' '\n' | sed 's/\.|,//g' | grep -E '^war$' | wc -l

     536


In [39]:
%%bash

# We can also compare the number of occurrences of 'peace' with 'war'

cat ../data/war_and_peace.txt | tr 'A-Z' 'a-z' | tr ' ' '\n' | sed 's/\.|,//g' |
grep -E '^(peace|war)$' | sort | uniq -c

 408 peace
 536 war
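
One caveat about the `sed` step in these pipelines: without the `-E` flag, `sed` treats `|` as an ordinary character, so `'s/\.|,//g'` only removes the literal sequence `.|,` and leaves periods and commas alone. That means words like `war.` and `war,` aren't counted. Here's a sketch on a made-up sentence comparing it with a bracket expression, which is one way (not the only one) to strip both characters:

```shell
sample="War. War, war and peace"   # invented sample text

# Original sed expression: the punctuation survives, so only the bare "war" matches
with_pipe=$(echo "$sample" | tr 'A-Z' 'a-z' | tr ' ' '\n' | sed 's/\.|,//g' | grep -c -E '^war$')

# Bracket expression: removes every period and comma
with_class=$(echo "$sample" | tr 'A-Z' 'a-z' | tr ' ' '\n' | sed 's/[.,]//g' | grep -c -E '^war$')

echo "$with_pipe versus $with_class"
```

So expect the counts to go up if you make this change and rerun the pipelines on the full text.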


In [25]:
%%bash

# Using the < operator

grep -E '[Pp]ublisher' < ../data/war_and_peace.txt

WILLIAM BENTON, Publisher 


## A Simple Bash Script

Let's write a simple program that will count the number of times your name appears in a text file.

```
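#!/bin/bash

# $1 is the name to count, $2 is the text file. Lowercase the name so case is ignored.
NAME=$(echo "$1" | tr 'A-Z' 'a-z')

# Lowercase the text, put one word per line, then count lines that are exactly NAME
cat "$2" | tr 'A-Z' 'a-z' | tr ' ' '\n' | grep "^${NAME}\$" | wc -l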
```
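
To try the idea without saving a script file, here's a sketch that wraps the same pipeline in a shell function. The function name `count_name` and the sample sentence are made up, and it uses `grep -c` in place of `wc -l` so the count comes back without padding.

```shell
# count_name is a hypothetical helper name, not something the lecture defines
count_name() {
    name=$(echo "$1" | tr 'A-Z' 'a-z')
    tr 'A-Z' 'a-z' < "$2" | tr ' ' '\n' | grep -c "^${name}\$"
}

sample=$(mktemp)
echo "Anna met anna and Annabel" > "$sample"

hits=$(count_name "Anna" "$sample")
echo "$hits"
rm "$sample"
```

Note that `Annabel` isn't counted, because the `^` and `$` anchors require the whole word to match.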

Bash scripting is kind of a lost art. If you want to write a complicated bash script, be prepared to do a lot of googling.

## Remote machines

### ssh

If you need to access a machine remotely, for example an EC2 instance on AWS, you'll need to use `ssh`.

The general syntax is 

```
ssh user@remote-hostname
```

This will log you into the host and open a command shell. There are some other options, like `-i`, which points `ssh` at the private key file to use for authentication.

### scp

If you don't want access to a terminal, but just to transfer files to and from a remote host, use `scp`.

The basic syntax is

```
scp source_file_name username@destination_host:destination_folder
```

### sftp

Using syntax similar to `ssh` we can run

```
sftp user@remote-hostname
```

to open a secure file transfer protocol (SFTP) session. This opens an sftp shell which allows you to use a limited set of commands to navigate the remote filesystem and transfer files back and forth.



## Text Editors

If you `ssh` into a host, you need to do everything via the command line. That means no Atom or SublimeText or Gedit to edit your code. You'll need to use a terminal-based text editor like Vi, Vim, Emacs, Nano, or Pico. 

People debate which is 'best' with nearly religious fervor. I like Vim. Lots of people like Emacs. I'm not very familiar with Nano or Pico. Pick one and learn the keyboard shortcuts - they'll all work just fine.

## Regular Expressions (Regex)

Regular expressions are a way to search text for patterns instead of just matching strings.

For example, if you wanted to search a file for phone numbers, you might do something like:

```
[0-9]{3}-[0-9]{4}
```

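`\d` : Matches a numeric character 0-9. Equivalent to `[0-9]`. Note that `\d` is not part of the POSIX regex syntax: the `grep -E` that ships with OSX accepts it, but GNU `grep -E` on Linux does not, so `[0-9]` is the portable choice.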

`{n}` : Means 'match the previous pattern exactly n times'

In [27]:
%%bash 

echo "345-5678" | grep -E '\d{3}-\d{4}'
echo "345-567" | grep -E '\d{3}-\d{4}'

echo 'world' | grep -E world

ps -A | grep python

345-5678
world
 2430 ttys001    0:00.00 /bin/bash /Users/brianmann/anaconda/bin/python.app /Users/brianmann/anaconda/bin/ipython notebook
 2431 ttys001    0:04.10 /Users/brianmann/anaconda/python.app/Contents/MacOS/python /Users/brianmann/anaconda/bin/ipython notebook
 2441 ttys001    0:01.98 /Users/brianmann/anaconda/bin/python -m ipykernel -f /Users/brianmann/Library/Jupyter/runtime/kernel-0c73ae62-3dba-4185-9b23-a3b02477490d.json
 2798 ttys001    0:00.00 grep python


This regex is pretty simplistic though. What about phone numbers like (123) 456-7890 or 123-456-7890?

Turns out you need something like 

```
^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
```
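
Here's a sketch for trying that pattern out with `grep -E`. To keep it portable, I've swapped `\d` for `[0-9]` and `\s` for `[[:space:]]`; that rewrite is mine, and the helper name `check_phone` and the test numbers are made up.

```shell
# Portable rewrite of the pattern above: [0-9] for \d, [[:space:]] for \s
phone_re='^(\+[0-9]{1,2}[[:space:]])?\(?[0-9]{3}\)?[[:space:].-]?[0-9]{3}[[:space:].-]?[0-9]{4}$'

# check_phone is a hypothetical helper; grep -q prints nothing and reports via exit status
check_phone() {
    if printf '%s\n' "$1" | grep -Eq "$phone_re"; then
        echo "match"
    else
        echo "no match"
    fi
}

parens=$(check_phone "(123) 456-7890")
dashes=$(check_phone "123-456-7890")
country=$(check_phone "+1 123.456.7890")
too_short=$(check_phone "345-567")

echo "$parens / $dashes / $country / $too_short"
```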

Regular expressions can get pretty complicated if you need to search for something general. It can be a puzzle to create one that works. Here's a [tutorial](http://www.tutorialspoint.com/python/python_reg_expressions.htm). The best way to learn is just to practice!

![Regex](https://imgs.xkcd.com/comics/regex_golf.png)