# Unix Basics

This lecture borrows some content from Ryan Henning and Brian Mann.

## `vim` survival skills

From [Vim's Website](http://www.vim.org/about.php):
> Vim is often called a "programmer's editor," and so useful for programming that many consider it an entire IDE. ... Vim isn't an editor designed to hold its users' hands. It is a tool, the use of which must be learned.

`vim` is the "`vi` i**M**proved" editor. `vi` is an old-school Unix text-editor program.

In `vim` you are either in:
- _insert mode_, or
- _command mode_

In _insert mode_, you get to type into the document.

In _command mode_, you get to move the cursor around and instruct vim to do high-level commands.

By default, you start in _command mode_. This confuses many new users who want to begin editing the file immediately. There are many ways to enter _insert mode_ from _command mode_; the simplest is to press `i` on the keyboard ('i' for insert).

Now that you are in _insert mode_, you can type into the document. When you are ready, return to _command mode_ by pressing _escape_ on the keyboard.

Now that you are in _command mode_, let's learn a command. Enter `:w` then press enter. You just saved the file. Now enter `:q` and press enter. You just exited vim. (Btw, you can save _and_ exit with the command `:wq`.)

If you ever get totally lost in `vim`, just hit _escape_ a bunch, then type `:q!` and press enter, which tells vim to exit _without_ saving the file.

# Basic Unix Survival Skills

## Configuring your environment

The first thing you should do when you set up a Unix machine is configure your shell (since you'll be working in it a lot). I'm assuming we'll be using a `bash` shell, but if you're using `zsh` or `sh` or any other shell, just replace `bash` with that shell's name in what follows.

The user-defined settings that determine where Unix looks for executables are stored in
```
~/.bash_profile
```

on OS X and
```
~/.bashrc
```
on most Linux distros. Generally speaking, files starting with a `.` contain configuration information and shouldn't be changed unless you need to!

In [1]:
%%bash

# This won't show any dot-files

ls ~

Applications
Desktop
Documents
Downloads
Galvanize
Library
Movies
Music
Pictures
Public
Versage.pem
anaconda
config-bash
day2
nltk_data
zipfian


In [2]:
%%bash

# This will

ls -A ~

.CFUserTextEncoding
.DS_Store
.Trash
.anaconda
.atom
.bash_history
.bash_profile
.bash_profile-anaconda.bak
.bash_sessions
.bashrc
.conda
.condarc
.continuum
.cups
.dbshell
.ipython
.jupyter
.kite
.matplotlib
.mongorc.js
.oracle_jre_usage
.pgadmin
.psql_history
.pylint.d
.ssh
.viminfo
Applications
Desktop
Documents
Downloads
Galvanize
Library
Movies
Music
Pictures
Public
Versage.pem
anaconda
config-bash
day2
nltk_data
zipfian
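
The difference between the two listings can be sketched in Python. This is only an illustration, not how `ls` is implemented: `os.listdir` returns every entry in a directory, dot-files included, and the helper names `hide_dotfiles` and `list_dir` are made up for this example.

```python
import os

def hide_dotfiles(names):
    """Drop every name that starts with a dot, as plain `ls` does."""
    return [name for name in names if not name.startswith(".")]

def list_dir(path, show_all=False):
    """Roughly `ls path` (show_all=False) or `ls -A path` (show_all=True)."""
    names = sorted(os.listdir(path))
    return names if show_all else hide_dotfiles(names)

if __name__ == "__main__":
    print(list_dir("."))                 # like `ls`
    print(list_dir(".", show_all=True))  # like `ls -A`
```

The sort order may differ from what `ls` prints, since `ls` sorts according to your locale.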


## Man Pages

%%bash

man ls

## Exercise

Using the man page for `ls`, find the most recently changed file in your home directory.

## The PATH variable

When you type a command into your shell, it must first _find_ the program you're trying to run. E.g. when you type `python my_script.py` into your shell, the shell must find the `python` program before it can run it. Where does it look?

Does it look _everywhere_? No... that would be bad for many reasons.

Does it look only in your current directory? No... by default it doesn't even look there.

Well, someone at some point decided it would look at all the paths in the `PATH` environment variable. Run the command `echo $PATH` to see which directories are in your system's `PATH` variable.

Run the command `which python` to see which directory the system finds the `python` command in. The `which` command also looks through the `PATH` variable to find commands, which is handy for diagnosing problems when things go wrong.
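
To make the lookup concrete, here is a minimal Python sketch of a `PATH` search. It is an approximation of what the shell and `which` do, not their actual implementation, and `find_command` is a hypothetical name chosen for this example.

```python
import os
import shutil

def find_command(name, path=None):
    """Return the first executable called `name` on the search path, or None.

    The directories are tried from left to right, so earlier entries
    in PATH win over later ones.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # A match must be a regular file that we are allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

if __name__ == "__main__":
    print(find_command("python"))
    # The standard library offers a comparable lookup:
    print(shutil.which("python"))
```

Because the search stops at the first match, the order of directories in `PATH` decides which copy of a program runs when several are installed.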

## The PS1 variable

Another special variable which `bash` looks at is the `PS1` variable. It tells `bash` what to print as your prompt!

Run this in your bash shell: `PS1="talk to me > "`

See what happened?

If you want a cool prompt (with colors and handy info), check out [Ryan's bash config repo](https://github.com/acu192/config-bash).

## The `.bashrc` file (and friends)

When `bash` starts (e.g. when you open a new terminal window), it loads a few config files before showing the first prompt. This is your chance to configure bash the way you like it. You can do stuff like add directories to the PATH (like the directory containing `psql`) and set the PS1 variable to something sensible.

Let's use `vim` to edit your `.bashrc` file. Type `vim ~/.bashrc`

You can also take a look at `.profile` and `.bash_profile`. Google for the difference between these three files.

Btw, whenever you modify any of those three files, **be sure to restart your shell!** (e.g. close your terminal window and open a new one, or run `source ~/.bashrc` to reload that file in the current shell)

Btw #2, we'll be editing these files periodically throughout the rest of the DSI (and your life as a data scientist / software engineer), so be sure to get familiar with them now.

Again, if you want a coolish bash config, check out [Ryan's bash config repo](https://github.com/acu192/config-bash).

## Aliases

Typing sucks, so let's not do that so much. Aliases are one way to type less!

Add the following alias to your `.bashrc` file:
```
alias lla='ls -alhF'
```

Close and reopen your terminal and try out the alias by typing `lla` and hitting enter.

The `cat` command prints the contents of a file to the terminal; it takes a file as input. The cell below uses `head`, which prints only the first few lines.

In [16]:
%%bash

head /Users/sversage/Desktop/test.txt

helllo,

tesing out cat.


## Permissions

In Unix, a user can have 3 types of permissions on a file: Read, Write, and Execute. 

In [18]:
%%bash

#The ls command will list all the files and directories contained in a directory

ls

EDA-Versage.ipynb
Workflow-Versage.ipynb
unix_lecture-versage.ipynb


In [19]:
%%bash

# You can also add the -l flag to see more information

ls -l

total 576
-rw-r--r--  1 sversage  admin  232702 Jul 20 15:27 EDA-Versage.ipynb
-rw-r--r--  1 sversage  admin   10350 Jul 20 12:49 Workflow-Versage.ipynb
-rw-r--r--  1 sversage  admin   45259 Jul 21 10:34 unix_lecture-versage.ipynb


The first 10 characters of each line of the output above tell you the permissions:

The first character is either `d` or `-`. This tells you whether the entry is a directory (`d`) or a regular file (`-`).

In [20]:
%%bash

# .. refers to the directory one level up

ls -l ..

total 0
drwxr-xr-x  8 sversage  admin  272 Jul 20 13:44 ben_skrainka
drwxr-xr-x  5 sversage  admin  170 Jul 20 13:44 brian_mann
drwxr-xr-x  4 sversage  admin  136 Jul 11 12:23 jack_bennetto
drwxr-xr-x  6 sversage  admin  204 Jul 11 12:23 miles_erickson
drwxr-xr-x  9 sversage  admin  306 Jul 17 15:08 ryan_henning
drwxr-xr-x  7 sversage  admin  238 Jul 21 10:34 sky_versage
drwxr-xr-x  3 sversage  admin  102 Jul 17 15:08 tzeiske


The next 9 characters are in blocks of 3: 
```
user permissions | group permissions | other permissions
```

Within each block, the first character is `'r'` or `'-'` : read permission.

The second is `'w'` or `'-'` : write permission.

The third is `'x'` or `'-'` : execute permission.

In each line above, the owning user and the owning group are displayed in columns 3 and 4. So, for example, `sversage` has `rwx` permissions on `brian_mann`, while members of the `admin` group and everyone else have `r-x`: they can read and execute it, but not write to it.
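
As a rough sketch of how to read these strings, the Python below splits the first 10 characters of an `ls -l` line into the pieces described above. The function names `parse_mode` and `can` are made up for this example, and it only distinguishes directories from everything else.

```python
def parse_mode(mode):
    """Split a 10-character `ls -l` mode string into its parts."""
    if len(mode) != 10:
        raise ValueError("expected 10 characters, e.g. 'drwxr-xr-x'")
    return {
        "is_directory": mode[0] == "d",
        "user": mode[1:4],    # permissions of the owning user
        "group": mode[4:7],   # permissions of the owning group
        "other": mode[7:10],  # permissions of everyone else
    }

def can(mode, who, action):
    """True if `who` ('user', 'group', 'other') has `action` ('r', 'w', 'x')."""
    return action in parse_mode(mode)[who]

if __name__ == "__main__":
    print(parse_mode("drwxr-xr-x"))
    print(can("-rw-r--r--", "user", "w"))   # True
    print(can("-rw-r--r--", "group", "w"))  # False
```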

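The `chmod` command changes permissions. Each of the three digits sets the permissions for the user, the group, and everyone else, in that order. The command looks something like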

```
chmod 400 filename
``` 
or
```
chmod 777 filename
```


`7` : read, write and execute || rwx

`6` : read and write || rw-

`5` : read and execute || r-x

`4` : read only || r--

`3` : write and execute || -wx

`2` : write only || -w-

`1` : execute only || --x

`0` : none || ---
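
The table above is not arbitrary: each digit is the sum of 4 (read), 2 (write), and 1 (execute). The following Python sketch rebuilds the `rwx` strings from the digits; `digit_to_rwx` and `mode_to_rwx` are hypothetical helper names used only here.

```python
def digit_to_rwx(digit):
    """Turn one octal digit (0-7) into a string such as 'r-x'."""
    if not 0 <= digit <= 7:
        raise ValueError("a permission digit must be between 0 and 7")
    return (
        ("r" if digit & 4 else "-")    # 4 = read
        + ("w" if digit & 2 else "-")  # 2 = write
        + ("x" if digit & 1 else "-")  # 1 = execute
    )

def mode_to_rwx(mode):
    """Turn a three-digit mode string such as '754' into 'rwxr-xr--'."""
    if len(mode) != 3:
        raise ValueError("expected three digits, e.g. '754'")
    # int(ch, 8) rejects anything that is not an octal digit.
    return "".join(digit_to_rwx(int(ch, 8)) for ch in mode)

if __name__ == "__main__":
    print(mode_to_rwx("400"))  # r--------
    print(mode_to_rwx("777"))  # rwxrwxrwx
```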


## Exercise

Figure out what it means for a directory to have read, write, or execute permissions.

## Navigating the Filesystem

So, how do you navigate to, move, copy, or delete files and directories?

`ls` [dir] : List the contents of a directory. Uses the current directory if dir is missing.

`pwd` : Print current directory

`cd` [dir] : Change directory. Uses `$HOME` if dir is missing.

`mkdir` dir : Make a new directory

`rmdir` dir : Delete directory (it must be empty)

`rm` file : Delete file

`rm -rf` * : Nuke everything, whether a file or an entire directory. This cannot be undone. It will not warn you if you're about to delete something important. Never do this.

`mv` from to : Move a file or directory. Can also be used to rename a file or directory.

`cp` : Copy files or directories.

`chmod` : Change permissions

`chown` : Change ownership

In [None]:
%%bash 

mkdir new
ls -l
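
The same operations are available from Python's standard library, which is handy once you start scripting. The sketch below pairs each call with the shell command it roughly corresponds to; the file names and the `demo` function are invented for this example, and everything happens inside a throwaway temporary directory.

```python
import os
import shutil
import tempfile

def demo(workdir):
    """Run Python counterparts of mkdir, cp, mv, ls, rm and rmdir."""
    new_dir = os.path.join(workdir, "new")
    os.mkdir(new_dir)                          # mkdir new
    notes = os.path.join(new_dir, "notes.txt")
    with open(notes, "w") as f:
        f.write("hello\n")
    copy = os.path.join(new_dir, "copy.txt")
    shutil.copy(notes, copy)                   # cp notes.txt copy.txt
    renamed = os.path.join(new_dir, "renamed.txt")
    shutil.move(copy, renamed)                 # mv copy.txt renamed.txt
    listing = sorted(os.listdir(new_dir))      # ls new
    os.remove(notes)                           # rm notes.txt
    os.remove(renamed)                         # rm renamed.txt
    os.rmdir(new_dir)                          # rmdir new (must be empty)
    return listing

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(demo(scratch))  # ['notes.txt', 'renamed.txt']
```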

## Examining files

What if you want to look at / know something about a file?

`less` : View contents of a file and page through it.

`cat` : Write one or more files to `stdout` one after the other.

`head` : Print the first few lines of a file to `stdout`.

`tail` : Print the last few lines of a file to `stdout`.
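
To show what `head` and `tail` are doing, here is a small Python sketch of both. It is an illustration of the idea rather than the real implementation, and it works on any iterable of lines, including an open file. The default of 10 mirrors the default line count of the real commands.

```python
from collections import deque
from itertools import islice

def head(lines, n=10):
    """Return the first n lines, like `head -n`."""
    return list(islice(lines, n))

def tail(lines, n=10):
    """Return the last n lines, like `tail -n`."""
    # A deque with maxlen=n discards the oldest line as each new one arrives,
    # so only n lines are ever held in memory.
    return list(deque(lines, maxlen=n))

if __name__ == "__main__":
    sample = ["line %d" % i for i in range(1, 21)]
    print(head(sample, 3))  # ['line 1', 'line 2', 'line 3']
    print(tail(sample, 3))  # ['line 18', 'line 19', 'line 20']
```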

## More advanced pipelines

What if I want to chain these commands together? Do I need to write the result to a file each time and then read that in to the next command? NO!

You can use IO redirection and pipes: `>, <, >>, |`

`>` : Redirect output to a file. WILL OVERWRITE IF EXISTS.

`>>` : Append output to an existing file.

`<` : Read a command's input from a file.

`|` : Pass output from one command as the input to another.
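
The four operators can be mimicked with Python's `subprocess` module, which makes it clearer what the shell is wiring together. This is a sketch under some assumptions: `EMIT` and `SORT` are stand-in "commands" written as tiny Python one-liners so the example does not depend on any particular Unix tool, and `demo` is a made-up name.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in commands: one prints two lines, the other sorts its input.
EMIT = [sys.executable, "-c", "print('b'); print('a')"]
SORT = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

def demo(workdir):
    out_path = os.path.join(workdir, "out.txt")

    # EMIT > out.txt   (mode "w" overwrites an existing file)
    with open(out_path, "w") as out:
        subprocess.run(EMIT, stdout=out, check=True)

    # EMIT >> out.txt  (mode "a" appends to it)
    with open(out_path, "a") as out:
        subprocess.run(EMIT, stdout=out, check=True)

    # SORT < out.txt   (the file becomes the command's stdin)
    with open(out_path) as src:
        from_file = subprocess.run(SORT, stdin=src, stdout=subprocess.PIPE,
                                   check=True, universal_newlines=True).stdout

    # EMIT | SORT      (stdout of the first process feeds stdin of the second)
    first = subprocess.Popen(EMIT, stdout=subprocess.PIPE)
    second = subprocess.Popen(SORT, stdin=first.stdout,
                              stdout=subprocess.PIPE, universal_newlines=True)
    first.stdout.close()  # lets `first` receive SIGPIPE if `second` exits early
    piped, _ = second.communicate()
    first.wait()
    return from_file, piped

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(demo(scratch))
```

Note that with a pipe, nothing is written to disk: the two processes run at the same time and the data flows directly between them.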

## stdin, stdout, stderr

Every process on Unix gets a special input "file" named _stdin_, a special output "file" named _stdout_, and another special output "file" named _stderr_. I put "file" in quotes because everything in Unix is a file, but these are not necessarily normal files. For example, all three of these (stdin, stdout, and stderr) are _usually_ connected to your terminal: stdin reads what you type, while stdout and stderr print to the screen.

Let's do an example in Python:
```python
import sys

while True:
    line = sys.stdin.readline()
    if not line:  # an empty string means end of input
        break
    line = line.rstrip('\n')  # strip the newline character, if present
    revsd = line[::-1]        # reverse the string
    sys.stdout.write(revsd + '\n')
```

Name the script `reverse.py`.

Notice we don't have to open stdin or stdout; we just import `sys` and use them. Btw, `input` (`raw_input` in Python 2) and `print` use stdin and stdout, respectively.
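
The script above never touches stderr, so here is a hedged variation that does. It takes the three streams as parameters (the function name `reverse_lines` and the summary message are inventions for this example), writes the real output to one stream, and sends a diagnostic message to another, so redirecting stdout to a file would not capture the diagnostic.

```python
import io
import sys

def reverse_lines(instream, outstream, errstream):
    """Reverse each input line; report a summary on the error stream."""
    count = 0
    for line in instream:
        outstream.write(line.rstrip("\n")[::-1] + "\n")
        count += 1
    # Diagnostics go to the error stream so they never mix with the output.
    errstream.write("reversed %d line(s)\n" % count)
    return count

if __name__ == "__main__":
    # An in-memory stream stands in for typed input in this demonstration;
    # a real script would pass sys.stdin instead.
    fake_in = io.StringIO("hello\nworld\n")
    reverse_lines(fake_in, sys.stdout, sys.stderr)
```

Passing the streams in as arguments also makes the function easy to test, since `io.StringIO` objects can replace the terminal.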

In [25]:
%%bash

# Compare the number of occurrences of 'peace' with 'war'

cat /Users/sversage/Galvanize/demo_data/war_and_peace.txt | tr [A-Z] [a-z] | tr ' ' '\n' | sed 's/\.|,//g' |
grep -E '^(peace|war)$' | sort | uniq -c

  57 peace
 175 war


In [26]:
%%bash

# Using the < operator

grep -E 'published'  < /Users/sversage/Galvanize/demo_data/war_and_peace.txt

published,” recounted Bítski, emphasizing certain words and opening


## Remote machines

### ssh

If you need to access a machine remotely, for example an EC2 instance on AWS, you'll need to use `ssh`.

The general syntax is 

```
ssh user@remote-hostname
```

This will log you into the host and open a command shell. There are some other options, like `-i`, which specifies the private key (identity) file to authenticate with.

### scp

If you don't need a terminal and just want to transfer files to and from a remote host, use `scp`.

The basic syntax is

```
scp source_file_name username@destination_host:destination_folder
```

### sftp

Using syntax similar to `ssh` we can run

```
sftp user@remote-hostname
```

to start a Secure File Transfer Protocol session. This opens an `sftp` shell, which allows you to use a limited set of commands to navigate the remote filesystem and transfer files back and forth.



## Text Editors

If you `ssh` into a host, you need to do everything via the command line. That means no Atom or SublimeText or Gedit to edit your code. You'll need to use a terminal-based text editor like Vi, Vim, Emacs, Nano, or Pico.

People debate which is 'best' with nearly religious fervor. I like Vim. Lots of people like Emacs. I'm not very familiar with Nano or Pico. Pick one and learn the keyboard shortcuts - they'll all work just fine.