# Shell
<img src="Images/bash.png" alt="Bash" align="right">

Let's pick up where we left off. Last time we finished by learning how to use git, and I emphasized using the **terminal (aka shell)** over a GUI.
Don't be scared of the shell! Even if you learn only the most basic commands (which you will, in the next few minutes) it will make your life much easier. The key skills you need and will learn are:
- Navigating the file system (aka folders)
- Basic file manipulation (copying, moving, deleting, renaming)
- Creating and viewing simple text files
- Writing simple scripts
- Downloading files ***from the terminal***
- Some miscellaneous things, like file permissions and customizing bash

## Basic file system navigation and file manipulation

I won't write any explanation here, because the readings below do it so much better. **Take your time and read through them!** Also, don't be afraid of trying out some things that they show you in the tutorial -- you literally just have to write a few words in your terminal and hit Enter :)

1. [Introduction to the shell](http://swcarpentry.github.io/shell-novice/01-intro/index.html)
2. [Navigating files and directories](http://swcarpentry.github.io/shell-novice/02-filedir/index.html)
3. [Working with files and directories](http://swcarpentry.github.io/shell-novice/03-create/index.html)
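
For orientation, here is a minimal sketch of the kind of commands the readings cover, run inside a throwaway directory. All names in it (`notes`, `draft.txt`, `final.txt`) are made up for illustration and have nothing to do with the tutorial folder.

```shell
# Work inside a throwaway directory so nothing important is touched.
# All names below (notes, draft.txt, ...) are hypothetical.
scratch=$(mktemp -d)
cd "$scratch"

pwd                                  # print the current (working) directory
mkdir notes                          # create a directory
cd notes                             # move into it
printf 'first line\n' > draft.txt    # create a small text file
ls -lh                               # list contents with human-readable sizes
cp draft.txt copy.txt                # copy a file
mv copy.txt final.txt                # rename (move) a file
cat final.txt                        # print the file's contents
rm draft.txt                         # delete a file
cd ..                                # go up one directory
ls notes                             # only final.txt is left
```

Typing these in one at a time, and running `ls` in between, is a good way to see what each command changes.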

### Exercises

Now that you have carefully read through the readings and tried some stuff out on your own, it's time to test your knowledge with exercises! As usual, write out your solutions in the cell below, and don't forget the proper code formatting (consult the previous lecture if needed). There are a lot of exercises, but all of them are really simple.

1. Open a new terminal window (or, if you're a real pro, open a new tab in the current terminal window - the one running Jupyter lab). Check what the current directory is and check its contents.
2. Navigate to the tutorial folder.
3. Ok, let's go to your Downloads folder - in one `cd` command, without using `..`.
4. Now let's go back to the tutorial folder - in one command of course, without using `~`.
6. Go to the images folder. Check how big the files are, in a human-readable format (hint -- this was shown in the readings).
7. Now go back **2** directories. You should end up in the Documents folder.
8. Create a directory called `simple-text`, and inside it a text file called `text1.txt`, write whatever in it. Rename the file `whatever.txt`. Print the contents of the file (hint: use the `cat` command).
9. Ok, navigate to the tutorial folder. From there, copy the text file created in the previous step, and after that, delete the entire folder created in the previous step (hint: this one is just a bit more sophisticated than the rest).
10. Delete the file you copied in the previous step - and done!


----
1. Write solutions (the `code`) to your exercises here, and don't forget proper
```
code formatting
```

## Writing scripts, downloading, and the rest

Now you'll learn a few independent, yet very useful skills (except for piping/filtering, you'll need to use all of them later in this course). First off, you'll take a look at pipes and filters, which enable you to combine shell commands. This isn't something you will write a lot on your own, but you will see it in many places, so it's important that you understand it. Next, you'll learn about file permissions in Unix systems (you'll bump into this when you create scripts (executables) or public keys!). After that, you'll learn how to download files from the terminal (the main use of this is in automating installations).
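
To make the idea of combining commands a bit more concrete before the readings, here is a small sketch of a pipe and of output redirection. The file `fruits.txt` and its contents are invented sample data, not something from the course materials.

```shell
# Hypothetical sample data: one fruit name per line.
scratch=$(mktemp -d)
cd "$scratch"
printf 'pear\napple\nfig\napple\n' > fruits.txt

# A pipe (|) feeds the output of one command into the next one.
# sort orders the lines, uniq drops adjacent duplicates, wc -l counts lines.
sort fruits.txt | uniq | wc -l

# > writes output to a file (overwriting it), >> appends to the end of it.
sort fruits.txt | uniq > unique.txt
head -n 1 fruits.txt >> unique.txt
cat unique.txt
```

Each command in the chain is a small filter that does one thing; the pipe is what lets you stack them.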

And finally, you'll learn how to write basic Shell scripts. This is one of the main features of the Shell, as it enables you to automate a lot of tasks. As you will see, the bash scripting language is, well, a programming language, so it is very powerful. But you will very rarely use anything more than a simple sequence of commands (perhaps with some user input), as it's much simpler to do everything else in Python.
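
As a rough sketch of what "a simple sequence of commands with some user input" can look like: the script name `greet.sh`, its prompt and the piped-in answer are all made up for this example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Write a tiny script to a file. greet.sh and its prompt are hypothetical.
cat > greet.sh <<'EOF'
#!/bin/bash
# Ask a question, read the answer from standard input, and print a reply.
echo "What is your name?"
read -r name
echo "Hello, $name!"
EOF

chmod u+x greet.sh           # make the script executable for its owner

# Run it. Here the answer is piped in; interactively you would type it.
echo "Ada" | ./greet.sh
```

In practice you would create the file in a text editor rather than with `cat`; the here-document is used only so the whole example fits in one block.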

So without further ado, the readings:
1. [Pipes and filters](http://swcarpentry.github.io/shell-novice/04-pipefilter/index.html)
2. [Permissions](http://swcarpentry.github.io/shell-extras/04-permissions/)
3. [Downloading files](http://swcarpentry.github.io/shell-extras/03-file-transfer/)
4. [Writing scripts](https://www.taniarascia.com/how-to-create-and-use-bash-scripts/)
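
As a warm-up for the permissions reading, the sketch below sets the same permissions twice, once with letters and once with numbers. The file `report.txt` and the mode chosen for it are arbitrary and only serve as an illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch report.txt             # hypothetical file used only for this demo

# Symbolic form: who (u=user, g=group, o=others, a=all), then +, - or =,
# then what (r=read, w=write, x=execute).
chmod u=rw,g=r,o= report.txt
ls -l report.txt             # the mode column starts with -rw-r-----

# Numeric form: one octal digit each for user, group and others,
# where r=4, w=2 and x=1 are added up: 6 = 4+2 (rw), 4 = r, 0 = nothing.
chmod 640 report.txt
ls -l report.txt             # same mode as before: -rw-r-----
```

Running `ls -l` after each `chmod` is the quickest way to check that a change did what you expected.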

### Exercises

#### Piping
1. Open the terminal, navigate to the project directory. With a single command, write `Hello world` to a file called `intro.txt`.
2. Check out what the command `cal` does (it's fun!). Ok, now using this command, **append** the current year and month to the `intro.txt` file (hint: `head`).
3. Check out what the command `date` does. Ok, now using this command, **append** the current month and day (eg: `May 11`) to the `intro.txt` file (hint: `cut`).

#### Permissions
4. In the last chapter of this lecture, we will download a key to access the Amazon servers (called `my-key-pair.pem`). After that, we will need to set its permissions with `chmod 400 my-key-pair.pem`. Explain exactly what these permissions mean for each of the 3 user groups (hint: you would know how to do this if letters were used instead of numbers. But numbers are easy too - just open the manual for the `chmod` command and you will get all necessary information there).
5. If you look for tutorials on how to write bash scripts online, some will tell you to make the file executable with `chmod +x`, and some with `chmod u+x`. What is the difference between them?
6. Go to the project directory, and check permissions for `README.md`. Copy them to the solutions, and explain exactly what they mean.
7. Make sure only the current user can do stuff with `README.md` - remove everyone else's permissions (but leave the current user's permissions untouched).
8. Now give everyone read/write permissions for `README.md`.

#### Downloading
9. Let's download the famous (you'll see...) diamonds dataset. Download it from `https://github.com/mwaskom/seaborn-data/raw/master/diamonds.csv`, and use `curl`.
10. Rename the file `diamonds_curl.csv`. You've done this using `mv`, right? Ok, now open the `curl` manual and find an option that enables you to rename the file as you download it, and do that!
11. Repeat the previous two steps, except using `wget` (so rename `diamonds_wget`).
12. As a foreshadowing of things to come, go to [this link](https://docs.conda.io/en/latest/miniconda.html) and copy the download link for the Python 3.7 64-bit **bash** installer for your operating system. Now download this link, and rename the download `miniconda.sh` (in one command). Then, **make the file executable**. After that, delete the file.

#### Writing scripts
No need to write solutions here, just make sure to save all the `.sh` files in the project directory (no need to save them in `bin`, as was done in the tutorial), so git can save them. Also, always make them executable. And run them to make sure they work.
13. Write a script that does both commands of exercise 11 - save it as `ex11.sh`.
14. Suppose you're done with the exercises, and are ready to commit and push them. Write a script that does all of this (hint: if you've read the reading, this is too easy), name it `git.sh`.
15. Write a script that asks the user "What day is it?", reads what the user writes in the terminal, and **appends** "Today is {whatever the user answered}" to `intro.txt`. Name it `date.sh`.

----
1. Write solutions (the `code`) to your exercises here, and don't forget proper
```
code formatting
```

## Bonus: terminal from Jupyter Lab

A very cool feature of Jupyter Lab is that you can run the terminal from within it. There are two ways to do this:

### Full-fledged terminal

To open a full fledged terminal, click on the + button in the sidebar. You will see some choices, from "Other" select the Terminal. And voila, you have a terminal! This will be very useful when you are working on Jupyter Lab from the server (later in this lecture), because this way if you need to do some Terminal work you don't need to reconnect again in a new window.

### Single commands

Opening a new terminal window in Jupyter Lab was easy, but things get even easier. If you want to execute simple commands you can just start the (code) cell with a `!`, and then write the command. Try it out by executing the cell below.

In [None]:
! ls

# Conda (and virtual environments)

<img src="Images/conda.png" alt="Bash" align="right">

First, what is Conda? Conda is mainly a package manager, that is, it installs and manages Python and its packages for you (well, works for some other languages too). So instead of installing Python directly on your machine (and using `pip` as the package manager), you would do it all through Conda. Why would you do that? Well, first because Conda is really easy to use, and second, because of its virtual environments.

Let me start with a quick description of what environments are, and why I think they are useful (readings fill in the details). Virtual environments enable you to have multiple different "installations" of python (meaning versions, packages...) at the same time, and switch between them easily. For example, one such virtual environment might be called `data_analysis`, and you would install Python, Pandas and Seaborn. Another would be `machine_learning`, and you would install Python, Pandas, Scipy and Scikit-learn. You might be wondering, why not just install the union of all these packages in a single environment? To answer this question, let me explain some use cases for virtual environments:
- You are collaborating with multiple people on a project, and to minimize potential problems, you need to ensure that you all have the exact same packages (and exact same versions of those packages) installed on your machines.
- You mainly work on your local machine, but for some tasks you need to go to the server and do the work there.

For that reason, you want only the packages you really use to be installed - otherwise you will have bloat, which will slow down installation on new machines, and may complicate debugging. Said in another way, the main advantages of using environments are:
- **Consistency**: You can achieve the same setup on multiple machines
- **Portability**: You can *quickly and easily* achieve the same setup on multiple machines.

Ok, now for the readings. The first one gives a great introduction to the concept of virtual environments, but is quite wordy, so not suitable for a reference. The second one is purely technical - just describes how to use environments with conda. This is what you'll come back to once you already understand how virtual environments work, and just need a refresher on how to create one in Conda.

1. [Introduction to Python environments with Conda](https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c)
2. [Manage environments - conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)

## Exercises

OK, now that you know about conda and conda environments, it's time for you to do the entire installation process **properly** yourself, so that you know how to do it in the future (future = next section, when you will need to repeat all this on the server). As we will be messing with the conda installation, you will have to save and close this notebook. But this doesn't mean you can't view it anymore - go to the GitHub page of your fork, and open the notebook there (yup, GitHub can open Jupyter notebooks). Write the answers in a text file meanwhile, and when you are done with the exercises, paste them in the cell below. As always, everything should be done in the Terminal, and you should show all the commands that you use in the solutions (even the "navigate to..." stuff).

1. Let's start by deleting conda first (so that we start from a clean slate). You delete it by deleting `miniconda3` folder in your home directory.
2. Do exercise 12 from the previous part (i.e., download miniconda and make it executable) - you should do that from your home directory, not the project directory.
3. Now, execute the file you downloaded in the previous step - congratulations, you have installed miniconda! 
4. Let's create an environment in which you will be working. Name the environment `data`, and make sure it has the `jupyterlab` and `pandas` packages installed (they will make sure all the dependencies, including python, are installed). Do this in one command.
5. Cool, we have the environment! Go activate it, run jupyterlab to see that things work. Now close it again, and export the environment to `data.yml` (in the project folder, so I can see it).

Alright, now let's step things up a bit, and use what we have learned from the previous part to cut out the interactive part of the installation (no more confirmations).
6. Repeat exercises 1 and 2.
7. Now let's install conda 'silently'. Install it with
```
bash ~/miniconda.sh -b -p $HOME/miniconda3
```
(if you're wondering what `-b -p` does, see [this](https://conda.io/projects/conda/en/latest/user-guide/install/macos.html#install-macos-silent)). After this, execute a command that **appends** 
```
. ~/miniconda3/etc/profile.d/conda.sh \nconda activate base
```
to `~/.bashrc` (hint: previous section, because of the newline character `\n` you will need the `-e` option for `echo`). What this does is to activate conda when you open a new terminal window. After that, delete `miniconda.sh`. Close and reopen the terminal window after this step.
8. Create the `data` environment from `data.yml` file. 
9. Add the conda-forge channel (which has a lot more packages than the default channel) with `conda config --add channels conda-forge`.
10. Install the `requests` package.

Now you are proficient in conda and understand how environments work, and most importantly, how to install conda and packages properly, congrats!
11. Just to make sure you know how to use what you have installed, show which 3 commands (navigate to ..., activate ..., jup...) you would use when you turn on your computer, open a new terminal window and want to start working on the exercises from this tutorial (hint: you should **always** be using the environment you just installed). 

----
1. Write solutions (the `code`) to your exercises here, and don't forget proper
```
code formatting
```

# AWS Servers

<img src="Images/aws.png" alt="Bash" align="right">

Now let's put the skills we just learned to good use! We'll be working with **Amazon Web Services** to create a simple server, connect to it and run a Jupyter notebook from there.

And the good thing about AWS is that it has a [free tier](https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&all-free-tier.sort-order=asc&awsf.Free%20Tier%20Types=categories%23featured), which gives you *1 year* of free use of certain basic services - which includes the server we will be working with. So relax, you won't have to pay anything! And prices are quite low even for some very high-performing machines - for example, if you look at the [price list](https://aws.amazon.com/ec2/pricing/on-demand/) you can see that even for a GPU optimized machine (which you would use for some deep learning stuff) with 61GB of RAM, you pay only 3`$` per hour - compare that to at least 1000`$` if you were to build such a machine yourself.

## Creating an account

So let's get started. The first thing you should do is to create an account. Just go to the [AWS website](https://aws.amazon.com/) and do it from there.

## Creating a server

Ok, now that you have an account, sign in to AWS. Further down the page you should see some handy shortcuts, one of them saying "Launch a virtual machine". Select that (you will be creating an EC2 instance).

You will see some options - select the "Ubuntu Server" one. On the next page, make sure that the type is `t2.micro`, and then click "Review and Launch" (we don't need anything else for now). So what you are doing here is creating a very basic server that will already come with the Ubuntu system installed.

Click "Launch" on the next page. This will bring up a pop-up menu asking you to select a key-pair. The key-pair method is a way of authenticating yourself when you access the server - you will save this key on your computer. You will need this key to access the server, and you can **only download it once**. Name the key `data`, and download it. I suggest you save the key on your Dropbox - this way you can't lose it if your hard drive crashes. Launch the instance after that.

After that, go back to the AWS Console, and from the "Services" dropdown at the top select "EC2". From there, select "Instances" from the sidebar. You can see your server being alive here, nice!

### Exercise

1. Just a small exercise here. Your `data.pem` key is hopefully safely stored (in your Dropbox) by now, but that's not actually the place we will use it from. The key will be used to connect to the server by SSH (more on this later), so it should be stored in the appropriate folder - which is `~/.ssh`. Copy the key to that folder (using the terminal). After that, set the permissions for the key in the following way: the user (u) should have all the permissions, and everyone else (g and o) should have no permissions at all (this is actually needed, otherwise you can't connect). Do this with just one command, and using numbers (hint: exercise 4 from the first section of this lecture).

----
1. Write solutions (the `code`) to your exercises here, and don't forget proper
```
code formatting
```

## Connecting to the server

Now we are ready to connect to the server. As I said, we'll be connecting using SSH. What is SSH, and how does it work? Read all about it in the reading below.

1. [SSH reading](http://swcarpentry.github.io/shell-extras/02-ssh/)

OK, now that we understand what SSH does in theory, let's see how to use it in practice. If you read [this link](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html), you will see that to connect to your instance, you need to execute
```
ssh -i ~/.ssh/data.pem ubuntu@<PUBLIC-DNS-NAME>
```

where you need to use the public DNS name of your instance (just go to Console > EC2 > Instances, and copy it from there) - for example, for me this is `ec2-3-19-79-219.us-east-2.compute.amazonaws.com`. Notice also that I have changed (compared to the link) the user name from `ec2-user` to `ubuntu` - this is because we are using an Ubuntu AMI, and not an Amazon Linux AMI. Now you should be in, and your prompt should look like this:

```
Welcome to Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-1032-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sun May 12 16:06:55 UTC 2019

  System load:  0.0               Processes:           86
  Usage of /:   13.7% of 7.69GB   Users logged in:     0
  Memory usage: 14%               IP address for eth0: 172.31.23.237
  Swap usage:   0%


  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.


Last login: Sun May 12 16:05:17 2019 from 174.62.235.225
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

ubuntu@ip-172-31-23-237:~
```

Congrats, you have a functioning server! Now have some fun with it - try out a few terminal commands that you know on the server, or try doing a few of the exercises from the first section on the server. Log out of the server using `exit`, and then log back in.

## Conda/Jupyter lab on the server

Ok enough fun, let's get back to work. Let's install Conda on the server, and create the `data` environment - just repeat what you did in exercises 6-10 of the previous section. Now we need to run the notebook on the server - but you should somehow be able to use it from the browser on your machine. The way we'll do this is with SSH tunnelling. What this does is that you will connect a certain port between your machine and the server (this is called a **tunnel**), and "cast" the Jupyter lab to that port (if you look at the URL right now it will be something like `http://localhost:8888/lab` - meaning that the Jupyter lab is on port 8888 of localhost - your local machine). That way, when you type `http://localhost:9000/lab` in your browser, you are actually connecting to port 9000 on the server! We do this in two steps:

1. On your local machine, execute (this should all be one command)
```
ssh -N -f -L 9000:localhost:9000 ubuntu@<PUBLIC-DNS-NAME>\
       -i ~/.ssh/data.pem
```
This connects the port 9000 on your machine with the port 9000 on the server.
2. On the server, first activate the `data` environment, and then execute 
```
jupyter-lab --no-browser --port=9000
```
which will run Jupyter lab on port 9000 of the server (and not open it in a browser window, since the server does not have a browser installed). This command will output a link; you can either Ctrl+click the link or copy/paste it into your browser to open it.

Now you should have the Jupyter lab **from the server** running in your browser on port 9000, awesome! Notice that I set the port number to 9000 intentionally, so as not to conflict with the port 8888 that Jupyter lab uses on your machine. This way you can run Jupyter lab both on your local machine and on the server at the same time!
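
To keep the pieces of the tunnel command straight, here is a small dry-run sketch. The helper `tunnel_cmd` is hypothetical and not part of any tool: it only prints the `ssh` command for a given host and port instead of running it, so you can try it without a server.

```shell
# tunnel_cmd is a hypothetical helper: it only PRINTS the ssh command,
# so it is safe to run without a server.
#   $1 = public DNS name of the instance
#   $2 = port (the same number is used on both ends here)
tunnel_cmd() {
    host=$1
    port=$2
    # -N: do not run a remote command, just forward ports
    # -f: go to the background once connected
    # -L local_port:destination:remote_port, where "localhost" is
    #    resolved on the server side, i.e. it means the server itself
    printf 'ssh -N -f -L %s:localhost:%s -i ~/.ssh/data.pem ubuntu@%s\n' \
        "$port" "$port" "$host"
}

tunnel_cmd "<PUBLIC-DNS-NAME>" 9000
```

The first number after `-L` is the port on your machine and the second is the port on the server, so they do not have to match; using 9000 for both just keeps things easy to remember.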

### Exercises

Remember how I told you that Conda environments and shell scripts are very useful for automating things? It's time to demonstrate this.

1. Create an SSH key for Github on your server and add it to your Github, following instructions [here](https://help.github.com/en/enterprise/2.15/user/articles/adding-a-new-ssh-key-to-your-github-account)
2. On your local machine, create a `data.yml` file for the `data` environment, and save it to your repository (just like in exercise 5 of the previous section).
3. Create a script called `conda_install.sh` (and save it to your project folder), that:
    1. Navigates to the home directory, `~`
    2. From there, silently installs conda (hint: exercises 6-7 from the previous section)
    3. Executes `. ~/.bashrc` (this is the same as closing and opening the terminal)
    4. Navigates to your project folder
    5. From there, creates the environment based on the `data.yml` file in your project folder. 
4. Commit and push changes to your repository.
5. Terminate your server (google how) and create a new one - this time don't create a new key, just select `data.pem` from the list of existing keys.
6. What you will do is to create a step-by-step guide for someone working on your project about how to 
    1. Connect to the server, create an SSH key (it's ok to just tell the user to follow instructions on the link here) and then clone the repository to the server 
    2. Install conda (and the environment) using the script you have just created, 
    3. Run Jupyter lab on the server (and use it in their browser). 
Make sure to test the instructions yourself first, on the newly created server.

I hope now you can appreciate how environments and scripts can make your life easier.

----
1. Write solutions (the `code`) to your exercises here, and don't forget proper
```
code formatting
```

<img src="Images/tmux.png" alt="Bash" align="right">

## Running jobs on the server while you sleep

You may have noticed that if you close the terminal window from which you connected to the server and ran Jupyter lab, Jupyter lab (on the server) stops working. But if you can't run a computationally intensive job on the server while you sleep, what's the point?

Fortunately, there is a way to let the tasks run "in the background". The way we will do this is with `tmux` (terminal multiplexer) - you may see some people do it with `screen`, the choice between them is a matter of taste.

### How to use tmux

Let's connect to the server. First thing you need to do is to create a new session (basically, a terminal), where you will write the commands. So let's create a session and name it `jupyterlab`:
```
tmux new -s jupyterlab
```
Ok, now we have this session. Now in this session, navigate to your project directory, activate the `data` environment and run Jupyter lab (as shown in the previous part, don't forget the tunnel) - keep it open in your browser. Now what we want is to leave the session without killing jupyter lab. We do this in two steps:
1. First, we need to **detach** the session. To do this, press Ctrl+B and after that D (for detach). Now you should be back in the main terminal.
2. In this terminal, execute the command `exit`. This will close the terminal session and disconnect you from the server.

Now you are out of the server, but Jupyter lab should still work! Make sure that it does.

Next thing we want to do is to go back to the server and kill Jupyter lab. This is done in three steps:
1. Connect back to the server. Execute `tmux ls` to see which sessions are running - you should see `jupyterlab`.
2. Now you need to attach the session again. To do this, execute
```
tmux a -t jupyterlab
```
3. Now you are back in the session. To kill Jupyter Lab, press Ctrl+C, and confirm. After it shuts down, execute `exit` - this will kill the session (but won't disconnect you from the server, for that you need to execute `exit` again).

Alternatively, you could have also used `tmux kill-session -t jupyterlab` to kill the session, but the first approach is better.

# Up next...

This was the last non-Python lecture in this course. In the next lecture, I'll show you some Python basics (loops, if-statements, data types...) and also how to use regular expressions (regex), a very useful skill.