# Unix Basics

This lecture borrows some content from Ryan Henning and Brian Mann.

## `vim` survival skills

From [Vim's Website](http://www.vim.org/about.php):
> Vim is often called a "programmer's editor," and so useful for programming that many consider it an entire IDE. ... Vim isn't an editor designed to hold its users' hands. It is a tool, the use of which must be learned.

`vim` is the "`vi` i**M**proved" editor. `vi` is an old-school Unix text-editor program.

In `vim` you are either in:
- _insert mode_, or
- _command mode_

In _insert mode_, you get to type into the document.

In _command mode_, you get to move the cursor around and give vim higher-level commands (save, quit, search, delete lines, and so on).

By default, you start in _command mode_. This confuses many new users, who want to start editing the file immediately. There are many ways to enter _insert mode_ from _command mode_; the simplest is to press `i` on the keyboard ('i' for insert).

Now that you are in _insert mode_, you can type into the document. When you are ready, return to _command mode_ by pressing _escape_ on the keyboard.

Now that you are in _command mode_, let's learn a command. Enter `:w` then press enter. You just saved the file. Now enter `:q` and press enter. You just exited vim. (Btw, you can save _and_ exit with the command `:wq`.)

If you ever get totally lost in `vim`, just hit _escape_ a bunch then type `:q!`, which tells vim to exit _without_ saving the file.

# Basic Unix Survival Skills

## Configuring your environment

The first thing you should do when you set up a Unix machine is configure your shell (since you'll be working in it a lot). I'm assuming we'll be using a `bash` shell, but if you're using `zsh` or `sh` or any other shell, just replace `bash` with that in what follows.

The user-defined settings for your shell, including the list of places Unix will look for executables, are stored in 
```
~/.bash_profile
```

on OS X and 
```
~/.bashrc
```
on most Linux distros. (Newer versions of macOS default to `zsh`, which reads `~/.zshrc` instead.) Generally speaking, files whose names start with a `.` contain configuration information, are hidden by a plain `ls`, and shouldn't be messed with unless you need to!

In [None]:
%%bash

# This won't show any dot-files

ls ~

In [None]:
%%bash

# This will

ls -A ~
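
To see the difference without depending on what's in your real home directory, here is a small sketch that builds a scratch directory holding one ordinary file and one dot-file and lists it both ways. The names `notes.txt` and `.myconfig` are made up for illustration.

```bash
# Scratch directory, so the demo doesn't depend on your real home directory
demo_dir=$(mktemp -d)
touch "$demo_dir/notes.txt" "$demo_dir/.myconfig"   # made-up file names

visible=$(ls "$demo_dir")        # plain ls skips names starting with .
everything=$(ls -A "$demo_dir")  # -A includes them (but not . and ..)

echo "ls:"
echo "$visible"                  # notes.txt
echo "ls -A:"
echo "$everything"               # .myconfig and notes.txt

rm -r "$demo_dir"
```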

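## Man Pages

Every standard command comes with a manual page. `man <command>` opens it in a pager: scroll with the arrow keys or the space bar, search with `/`, and quit with `q`.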

In [None]:
%%bash

man ls

## Exercise

Using the man page for `ls`, find the most recently changed file in your home directory.

In [None]:
%%bash

# Suppose we have a script, say_hello, that prints the word hello to the terminal

# Create it by redirecting echo's output into a file called say_hello (instead of stdout)
echo 'echo "hello"' > say_hello

# Give the file's owning user (u) execute permission (+x); other users' permissions are unchanged
chmod u+x say_hello

# I can run it from here by typing its path relative to the current directory

./say_hello

In [None]:
%%bash

# But if I try to run it without the ./ prefix, bash can't find it ("command not found")

say_hello

# or from somewhere else

cd ~
say_hello

In [None]:
%%bash

# In order to run an executable file by name, Unix needs to know where it is.
# That's what the PATH variable is for: it stores a colon-separated list of
# directories where the shell looks for the commands you type in

echo "Path"
echo $PATH
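
Because `$PATH` is one long string with the directories separated by colons, it is easier to read one entry per line. The sketch below prints it that way and then asks the shell where it actually found a command. `command -v` is the POSIX way to do this lookup; `which` is a common alternative.

```bash
# Print each PATH entry on its own line by turning the ':' separators into newlines
echo "$PATH" | tr ':' '\n'

# Show where the shell finds ls: it searches the PATH directories in order and the first match wins
command -v ls
```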

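In [None]:
%%bash
# You can add a directory to your PATH with a line like the one below.
# $(pwd) expands to the current directory (where say_hello lives), and the
# double quotes let $PATH expand (single quotes would keep it literal).
export PATH="$PATH:$(pwd)"

# Run here, this only changes PATH for this cell's shell. Add the line to your
# ~/.bash_profile (or ~/.bashrc) and the directory will be on your PATH every
# time you start a bash shell (use the directory's full path there, not $(pwd)).

echo "New Path"
echo $PATH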

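In [None]:
%%bash
# Each %%bash cell starts a brand-new shell, so an export from an earlier cell
# is gone; repeat it here (a line in ~/.bash_profile would do this automatically)
export PATH="$PATH:$(pwd)"

# Now I can run my script from anywhere, e.g. from the home directory
# (the parentheses run the cd in a subshell, so this cell stays where it is)
echo "Saying Hello!"
(cd ~ && say_hello)

rm say_hello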

### Side Note

Prefixing a variable name with `$` expands it to the variable's value; without the `$`, bash treats the name as a plain word.

In [None]:
%%bash

echo $PATH

# versus

echo PATH
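
The same rule explains why quoting matters: double quotes still expand `$` variables, while single quotes keep the text exactly as written. A short sketch, where the variable name `greeting` is just an example:

```bash
greeting="hello"     # no spaces around = in a bash assignment

echo $greeting       # hello
echo greeting        # greeting  (no $, so it's just a word)
echo "$greeting"     # hello     (double quotes still expand variables)
echo '$greeting'     # $greeting (single quotes prevent expansion)
```

Writing `"$greeting"` with double quotes is a good habit, since it also keeps values that contain spaces in one piece.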

In [None]:
%%bash

echo "hello"

The `cat` command also writes to stdout, but instead of printing its arguments, it prints the contents of the file(s) you give it.

In [None]:
%%bash

# Make a small file to read (or point cat at any text file you have)
echo "hello from a file" > test.txt
cat test.txt
rm test.txt

In [None]:
%%bash

# There are other environment variables like PATH

echo "Your username:"
echo $USER

echo "Your home directory:"
echo $HOME

echo "Your shell:"
echo $SHELL
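
Strictly speaking, a variable becomes an *environment* variable (one that programs you launch can see) only after it is `export`ed, which is why the `PATH` example uses `export`. A minimal sketch, using the made-up variable name `DEMO_VAR`:

```bash
DEMO_VAR="set in the parent shell"

# A child process (here a new sh) does not see an un-exported shell variable
sh -c 'echo "child sees: [$DEMO_VAR]"'   # child sees: []

# After export, the variable is passed to every child process started afterwards
export DEMO_VAR
sh -c 'echo "child sees: [$DEMO_VAR]"'   # child sees: [set in the parent shell]

# env lists the exported (environment) variables; filter the list for ours
env | grep '^DEMO_VAR='
```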

## Permissions

In Unix, a user can have 3 types of permissions on a file: Read, Write, and Execute. 

In [None]:
%%bash

#The ls command will list all the files and directories contained in a directory

ls

In [None]:
%%bash

# You can also add the -l flag to see more information

ls -l

The first 10 characters of each line of the output above tell you the file type and permissions:

The first character is the type: `d` for a directory, `-` for a regular file (you may also see `l` for a symbolic link).

In [None]:
%%bash

# .. refers to the directory one level up

ls -l ..

The next 9 characters are in blocks of 3: 
```
user permissions | group permissions | other permissions
```

Within each block of three:

The first character is `'r'` or `'-'` : read permission.

The second is `'w'` or `'-'` : write permission.

The third is `'x'` or `'-'` : execute permission.

In each line above, the file's owning user and owning group are displayed in columns 3 and 4. So, for example, if `brianmann` owns `say_hello` and its permissions are `-rwxr--r--`, then `brianmann` has `rwx` permissions on it, but no one else has permission to write to it or execute it.
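
To pull those columns out in a script, you can split the `ls -l` output with `awk`. Here is a sketch on a throwaway file. `show_perms` is a hypothetical helper name, not a standard command.

```bash
# show_perms FILE: print the permission string, owning user and owning group
# (columns 1, 3 and 4 of ls -l). Hypothetical helper, not a built-in command.
# substr keeps the 10 mode characters; some systems append a marker such as @ or +
show_perms() {
  ls -l "$1" | awk '{ print substr($1, 1, 10), $3, $4 }'
}

demo_file=$(mktemp)       # throwaway file so nothing real is touched
chmod 744 "$demo_file"    # owner rwx, group r--, other r--
show_perms "$demo_file"   # e.g. -rwxr--r-- yourname yourgroup
rm -f "$demo_file"
```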

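The `chmod` command changes permissions. In its numeric form, the three digits set the user, group, and other permissions, in that order, and each digit is the sum of read (4), write (2), and execute (1). The command looks something like 

```
chmod 400 filename
``` 
which makes a file readable by its owner only, or
```
chmod 777 filename
```
which gives everyone full access (rarely a good idea).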


`7` : read, write and execute || rwx

`6` : read and write || rw-

`5` : read and execute || r-x

`4` : read only || r--

`3` : write and execute || -wx

`2` : write only || -w-

`1` : execute only || --x

`0` : none || ---
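
The sketch below sets a few numeric modes on a throwaway file and shows the permission string `ls -l` reports for each one. `perms` is a hypothetical helper name used only for this demo.

```bash
# perms FILE: print the 10-character type+permission string from ls -l
# (hypothetical helper, not a standard command)
perms() {
  ls -l "$1" | cut -c1-10
}

demo_file=$(mktemp)       # throwaway file so nothing real is touched

chmod 640 "$demo_file"    # 6 = rw- (user), 4 = r-- (group), 0 = --- (other)
perms "$demo_file"        # -rw-r-----

chmod 755 "$demo_file"    # 7 = rwx (user), 5 = r-x (group), 5 = r-x (other)
perms "$demo_file"        # -rwxr-xr-x

chmod 400 "$demo_file"    # read-only for the owner, nothing for anyone else
perms "$demo_file"        # -r--------

rm -f "$demo_file"
```

The symbolic form used earlier (`chmod u+x`) changes individual bits instead of setting all nine at once.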


## Exercise

Figure out what it means for a directory to have read, write, or execute permissions.

## Navigating the Filesystem

So, how do you navigate to, move, copy, or delete files and directories?

`ls` [dir] : List the contents of a directory. Uses the current directory if dir is missing.

`pwd` : Print current directory

`cd` [dir] : Change directory. Uses `$HOME` if dir is missing.

`mkdir` dir : Make a new directory

`rmdir` dir : Delete an empty directory

`rm` file : Delete file

`rm -rf *` : Nuke everything in the current directory, files and entire subdirectories alike (`-r` recurses into directories, `-f` never prompts). This cannot be undone. It will not give you a warning if you're about to delete something important. Never do this.

`mv` from to : Move a file or directory. Can also be used to rename a file or directory.

`cp` : Copy files or directories (use `cp -r` for directories).

`find` : Traverse a directory tree, listing files that match given criteria and optionally executing commands on them

`chmod` : Change permissions

`chown` : Change ownership
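
Here is a short walk-through of these commands inside a scratch directory created with `mktemp -d`, so nothing outside it is touched. The directory and file names (`project`, `draft.txt`, and so on) are made up for illustration.

```bash
sandbox=$(mktemp -d)            # scratch directory, so nothing else is touched
cd "$sandbox"
pwd                             # prints the scratch directory's path

mkdir project                   # make a new directory
cd project
echo "first draft" > draft.txt  # create a small file
cp draft.txt backup.txt         # copy it
mv draft.txt final.txt          # rename it (mv both moves and renames)
ls                              # lists backup.txt and final.txt

mkdir empty_dir
rmdir empty_dir                 # rmdir only removes empty directories

cd ..                           # back up to the sandbox
cp -r project project_copy      # copying a directory needs -r
find . -name 'final.txt'        # ./project/final.txt and ./project_copy/final.txt, in some order
rm project/backup.txt           # delete a single file
ls project                      # final.txt
```

When you're done, `cd` somewhere else and remove the whole sandbox with `rm -r "$sandbox"`.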

In [None]:
%%bash 

mkdir new
ls -l

## Examining files

What if you want to look at / know something about a file?

`less` : View contents of a file and page through it.

`grep` : Search lines for matches of a regular expression or string. Probably you want to use `grep -E` all the time, since it accepts the more modern extended regex syntax (`+`, `?`, `|`, `{n}`, and `()` work without backslashes).

`cat` : Write one or more files to `stdout` one after the other.

`head` : Print the first few lines of a file to `stdout`.

`tail` : Print the last few lines of a file to `stdout`.

`wc` : Print the number of lines, words, and characters (bytes) in a file.

`cmp` : Tell if files are the same.

`diff` : Show difference between files.

`sum` : Compute checksum of file.
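
Here is a quick tour of several of these commands on two small generated files. `a.txt`, `b.txt` and their contents are made up for the demo.

```bash
files_dir=$(mktemp -d)                     # scratch directory for two small demo files
printf 'apple\nbanana\ncherry\ndate\n' > "$files_dir/a.txt"
printf 'apple\nbanana\ncherry\nelderberry\n' > "$files_dir/b.txt"

head -n 2 "$files_dir/a.txt"               # apple, banana
tail -n 1 "$files_dir/a.txt"               # date
wc -l < "$files_dir/a.txt"                 # 4 lines
grep -E 'an+a' "$files_dir/a.txt"          # banana
cat "$files_dir/a.txt" "$files_dir/b.txt"  # all 8 lines, a.txt first

# cmp prints nothing if the files are identical; here it reports the first difference
cmp "$files_dir/a.txt" "$files_dir/b.txt"
# diff shows which lines changed: date in a.txt versus elderberry in b.txt
diff "$files_dir/a.txt" "$files_dir/b.txt"
```

Delete the scratch files afterwards with `rm -r "$files_dir"`.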

In [None]:
%%bash

ls

## Doing things with files

Most Unix text tools are built on the idea that text files are made up of lines of text.

`sort` : Sorts the lines in a file

`uniq` : Outputs file with adjacent identical lines collapsed to one

`cut` : Extract parts of each line of input

`paste` : Joins files by outputting sequentially corresponding lines of each file (horizontal version of `cat`)

`sed` : Find and replace with regex.

`tr` : A simple character-by-character find and replace. Less flexible than `sed`, but it makes it easy to replace characters with newlines (`'\n'`), which not every `sed` supports in a replacement (the BSD `sed` on OS X, for example).

`awk` : Command for more advanced line-by-line file manipulation.

`grep` : Command for searching files with regular expressions.
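
The sketch below combines several of these tools on a tiny, made-up CSV of names and cities (`people.csv`). The file is written with a here-document, so no external data is needed.

```bash
csv_dir=$(mktemp -d)    # scratch directory for the made-up data
cat > "$csv_dir/people.csv" <<'EOF'
alice,paris
bob,london
carol,paris
dave,berlin
erin,london
frank,paris
EOF

# cut: take the 2nd comma-separated field; sort | uniq -c: count each city
cut -d, -f2 "$csv_dir/people.csv" | sort | uniq -c    # 1 berlin, 2 london, 3 paris

# sed: regex find and replace (capitalise paris everywhere)
sed 's/paris/Paris/g' "$csv_dir/people.csv" | head -n 1   # alice,Paris

# tr: character replacement, here every comma becomes a newline
tr ',' '\n' < "$csv_dir/people.csv" | head -n 2           # alice, then paris

# awk: print the names of everyone whose city is london
awk -F, '$2 == "london" { print $1 }' "$csv_dir/people.csv"   # bob, erin

# paste: put corresponding lines of two files side by side
cut -d, -f1 "$csv_dir/people.csv" > "$csv_dir/names.txt"
cut -d, -f2 "$csv_dir/people.csv" > "$csv_dir/cities.txt"
paste -d' ' "$csv_dir/names.txt" "$csv_dir/cities.txt" | head -n 2   # alice paris, bob london
```

Remove the scratch data with `rm -r "$csv_dir"` when you're finished.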

In [None]:
%%bash

grep -E "pipelines" unix_lecture.ipynb

## More advanced pipelines

What if I want to chain these commands together? Do I need to write the result to a file each time and then read that in to the next command? NO!

You can use IO redirection and pipes: `>, <, >>, |`

`>` : Redirect output to a file. WILL OVERWRITE IF EXISTS.

`>>` : Append output to an existing file (creating it if it doesn't exist).

`<` : Read a command's input from a file.

`|` : Pass output from one command as the input to another.
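
Here are the four operators in action inside a scratch directory. The file name `log.txt` is just an example.

```bash
scratch=$(mktemp -d)             # scratch directory so existing files are safe
cd "$scratch"

echo "first line"  > log.txt     # > creates log.txt (or overwrites it if it exists!)
echo "second line" >> log.txt    # >> appends to it
wc -l < log.txt                  # < feeds log.txt to wc's stdin: 2

echo "replaced" > log.txt        # > again: both earlier lines are gone
cat log.txt                      # replaced

# | sends one command's output straight into the next, no temporary file needed
printf 'b\na\nb\n' | sort | uniq # a, then b
```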

In [None]:
%%bash

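# We can also compare the number of occurrences of 'peace' with 'war':
# lowercase everything, put one word per line, strip punctuation,
# keep only the two words we care about, then count each one

cat /Users/sversage/Galvanize/demo_data/war_and_peace.txt | tr '[:upper:]' '[:lower:]' | tr ' ' '\n' |
sed 's/[[:punct:]]//g' | grep -E '^(peace|war)$' | sort | uniq -c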

In [None]:
%%bash

# Using the < operator

grep -E 'published'  < /Users/sversage/Galvanize/demo_data/war_and_peace.txt

## Remote machines

### ssh

If you need to access a machine remotely, for example an EC2 instance on AWS, you'll need to use `ssh`.

The general syntax is 

```
ssh user@remote-hostname
```

This will log you into the host and open a command shell. There are some other options, like `-i`, which points `ssh` at the private key file (identity file) to authenticate with; that's how you'll usually connect to an EC2 instance.

### scp

If you don't want access to a terminal, but just to transfer files to and from a remote host, use `scp`.

The basic syntax is

```
scp source_file_name username@destination_host:destination_folder
```

### sftp

Using syntax similar to `ssh` we can run

```
sftp user@remote-hostname
```

to open a Secure File Transfer Protocol (SFTP) session. This gives you an `sftp>` prompt with a limited set of commands (such as `ls`, `cd`, `get`, and `put`) to navigate the remote filesystem and transfer files back and forth.



## Text Editors

If you `ssh` into a host, you need to do everything via the command line. That means no Atom or SublimeText or Gedit to edit your code. You'll need to use a terminal based text editor like Vi, Vim, Emacs, Nano, or Pico. 

People debate which is 'best' with nearly religious fervor. I like Vim. Lots of people like Emacs. I'm not very familiar with Nano or Pico. Pick one and learn the keyboard shortcuts - they'll all work just fine.

## Regular Expressions (Regex)

Regular expressions are a way to search text for patterns instead of just matching fixed strings.

For example, if you wanted to search a file for phone numbers, you might do something like:

```
[0-9]{3}-[0-9]{4}
```

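`\d` : Matches a numeric character 0-9. Equivalent to `[0-9]`. This shorthand comes from Perl/Python regex syntax; POSIX `grep -E` does not reliably support it, so with `grep -E` write `[0-9]` (or `[[:digit:]]`) instead.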

`{n}` : Means 'match the previous pattern exactly n times'

In [None]:
%%bash 

echo "345-5678" | grep -E '[0-9]{3}-[0-9]{4}'   # matches, so the line is printed
echo "345-567" | grep -E '[0-9]{3}-[0-9]{4}'    # no output: only 3 digits after the dash

echo 'world' | grep -E world


This regex is pretty simplistic though. What about phone numbers like (123) 456-7890 or 123-456-7890?

Turns out you need something like 

```
^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$
```
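
That pattern uses Perl/Python shorthands (`\d`, `\s`), so it is meant for tools such as Python's `re` module or GNU `grep -P`. The sketch below writes the same idea in POSIX extended-regex syntax for `grep -E` by swapping `\d` for `[0-9]` and `\s` for `[[:space:]]`. The sample numbers are made up.

```bash
# POSIX ERE version of the phone-number pattern above
phone_re='^(\+[0-9]{1,2}[[:space:]])?\(?[0-9]{3}\)?[[:space:].-]?[0-9]{3}[[:space:].-]?[0-9]{4}$'

# Only the first four sample lines match; the last two are too short or malformed
printf '%s\n' \
  '123-456-7890' \
  '(123) 456-7890' \
  '123.456.7890' \
  '+1 123 456 7890' \
  '456-7890' \
  '12-3456-7890' |
  grep -E "$phone_re"
```

Because `^` and `$` anchor the whole line, this only matches lines that contain nothing but a phone number.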

Regular expressions can get pretty complicated if you need to search for something general. It can be a puzzle to create one that works. Here's a [tutorial](http://www.tutorialspoint.com/python/python_reg_expressions.htm). The best way to learn is just to practice!

![Regex](https://imgs.xkcd.com/comics/regex_golf.png)