# Afternoon: Spinning up a cluster

To do this the long way, see this walkthrough: https://github.com/gSchool/dsi-spark-aws/blob/master/pair_part1.md

To do this the short way, we've written a script that uses the AWS CLI to start up a cluster:
https://github.com/gSchool/dsi-spark-aws/blob/master/scripts/launch_cluster.sh

The script requires you to have:
- AWS CLI set up
- an S3 bucket
- a PEM key pair, with the PEM file stored in `~/.ssh/` (if you need to create one, go [here](https://console.aws.amazon.com/ec2/v2/home#KeyPairs))
- the accompanying file [`bootstrap-emr.sh`](https://github.com/gSchool/dsi-spark-aws/blob/master/scripts/bootstrap-emr.sh) in the same folder as `launch_cluster.sh`

When you run the script, pass the name of the bucket, the name of the PEM key, and the number of worker nodes for your cluster, in that order. For example:
```bash
bash launch_cluster.sh mybucket mypem 4
```
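Before launching, it can help to confirm that the prerequisites listed above are in place. The function below is a minimal sketch, not part of the `dsi-spark-aws` repo: `missing_prereqs` is a hypothetical name, and it assumes the key file is stored as `~/.ssh/<pem_name>.pem`, which you should check against your own setup. It does not check that the bucket exists.

```bash
# Hypothetical pre-flight helper (not part of launch_cluster.sh).
# Prints one line per missing prerequisite; prints nothing if all are found.
# Usage: missing_prereqs <pem_name> <script_dir>
missing_prereqs() {
    local pem_name="$1" script_dir="$2"
    # the AWS CLI must be installed and on the PATH
    command -v aws >/dev/null 2>&1 || echo "aws CLI not found on PATH"
    # ASSUMPTION: the key file is named <pem_name>.pem
    [ -f "$HOME/.ssh/${pem_name}.pem" ] || echo "missing $HOME/.ssh/${pem_name}.pem"
    # both scripts must sit in the same folder
    [ -f "$script_dir/launch_cluster.sh" ] || echo "missing $script_dir/launch_cluster.sh"
    [ -f "$script_dir/bootstrap-emr.sh" ] || echo "missing $script_dir/bootstrap-emr.sh"
}

# Example, run from the folder that contains launch_cluster.sh:
#   problems=$(missing_prereqs mypem .)
#   [ -z "$problems" ] && bash launch_cluster.sh mybucket mypem 4
```

If the function prints anything, fix those items before running the launch script.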

### AWS Command Line Interface

Install the CLI with `pip install awscli`, then run `aws configure` and answer its prompts as follows:

- leave `AWS Access Key ID` and `AWS Secret Access Key` as `None` (just press Enter), since you should have already put them in your `~/.bash_profile` (`~/.bashrc` on Linux)
- make sure `Default region name` matches the location of your cluster, for example `us-west-1`. [This page](https://www.npmjs.com/package/aws-regions) lists region codes.
- leave `Default output format` at its current value (just press Enter)
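Leaving the keys as `None` only works if the CLI can find them in your environment. The sketch below checks for the two standard variable names the AWS CLI reads, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. That your `~/.bash_profile` exports them under exactly these names is an assumption, and `check_aws_env` is a hypothetical helper name.

```bash
# Hypothetical helper: report which credential variables are not exported.
# ASSUMPTION: your profile exports the keys under the standard AWS CLI names.
check_aws_env() {
    local rc=0 var
    for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # printenv only sees exported variables, which is what the CLI needs
        if [ -z "$(printenv "$var")" ]; then
            echo "$var is not set"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_aws_env && echo "credentials found in environment"
```

You can also run `aws configure list` to see which values the CLI has picked up and where each one came from.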

### S3 buckets with AWS CLI
- Create a bucket:
  - `aws s3 mb s3://mybucket`
- List files in a bucket:
  - `aws s3 ls s3://mybucket`
- Copy local file to bucket:
  - `aws s3 cp path/to/localfile s3://mybucket`
- Copy from bucket to local current directory:
  - `aws s3 cp s3://mybucket/path/to/file .`
- [AWS CLI S3 management reference](https://docs.aws.amazon.com/cli/latest/userguide/using-s3-commands.html)
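The commands above copy one file at a time. For a whole directory, a small wrapper around `aws s3 sync` is one option. This is only a sketch: `s3_upload_dir` is a hypothetical name, `sync` is not used elsewhere in these notes, and the choice to mirror the directory under a prefix with the same name is an assumption.

```bash
# Hypothetical helper: upload a local directory to s3://<bucket>/<dir name>.
# Pass --dryrun as the third argument to preview without copying anything.
# Usage: s3_upload_dir <local_dir> <bucket> [--dryrun]
s3_upload_dir() {
    local src_dir="$1" bucket="$2" dry_run="${3:-}"
    if [ ! -d "$src_dir" ]; then
        echo "not a directory: $src_dir" >&2
        return 1
    fi
    # sync transfers only files that are new or changed;
    # $dry_run is left unquoted so it disappears when empty
    aws s3 sync "$src_dir" "s3://$bucket/$(basename "$src_dir")" $dry_run
}

# Example:
#   s3_upload_dir path/to/data mybucket --dryrun
```

Running with `--dryrun` first shows what would be transferred, which is a cheap way to catch a wrong bucket name or path.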

### EC2: setting up ssh
- example `~/.ssh/config` entry:

```bash
Host host_alias
    # see AWS console for public DNS
    HostName ec2-54-219-176-90.us-west-1.compute.amazonaws.com
    # depending on your machine, the user may be 'ubuntu' or 'ec2-user' instead
    User hadoop
    # make sure this is the same key you chose when you set up the instance
    IdentityFile ~/.ssh/key_file.pem
```

- logging in to a remote terminal:
  - `ssh host_alias`
    - (This is shorthand for `ssh -i ~/.ssh/key_file.pem <User>@<HostName>`)
- copying a file to a remote machine's home directory (note the colon!):
  - `scp path/to/local/file host_alias:`
    - (This is shorthand for `scp -i ~/.ssh/key_file.pem path/to/local/file <User>@<HostName>:`)
- copying a file to a specific directory on a remote machine:
  - `scp path/to/local/file host_alias:path/to/target/directory`
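The two commands combine naturally: copy a script over, then run it remotely. The function below is a rough sketch under the assumption that `host_alias` is already defined in `~/.ssh/config` and that the script is a bash script. `run_remote` is a hypothetical name.

```bash
# Hypothetical helper: copy a local script to the remote home directory,
# then run it there over ssh.
# Usage: run_remote <host_alias> <path/to/script.sh>
run_remote() {
    local host_alias="$1" script="$2"
    # the trailing colon sends the file to the remote home directory
    scp "$script" "$host_alias:" || return 1
    # ssh runs the quoted command on the remote machine and then exits
    ssh "$host_alias" "bash $(basename "$script")"
}

# Example:
#   run_remote host_alias path/to/job.sh
```

The remote command still runs inside your `ssh` connection, so it stops if the connection drops.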


### Tmux - set it and forget it!
Many processes are tied to your terminal. If you `ssh` into your remote machine and run a process in that terminal, the process is killed if you lose your connection.

`tmux` is a terminal multiplexer: it lets you manage many terminal sessions, and it keeps them running on the remote machine after you disconnect. To start a process in a `tmux` session that is detached from your `ssh` terminal:

- `ssh` into your remote machine
- start a tmux session with `tmux new -s some_name`
- start your process (notebook or script or whatever)
- press `<ctrl>-b`, then `d`, to detach (now you are back in your `ssh` terminal)
- exit or shut down or go to sleep or whatever
- `ssh` back in to your remote machine
- type `tmux a -t some_name` to reattach and check on that process
- [handy tmux reference](https://gist.github.com/MohamedAlaa/2961058)
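
The start and detach steps can also be done in one go, because `tmux new-session -d` creates a session that is already detached. The function below is a sketch of that idea, meant to be run on the remote machine. `start_detached` is a hypothetical name, and `train.py` in the example is a placeholder.

```bash
# Hypothetical helper: start a command in a new, already-detached tmux session.
# Usage: start_detached <session_name> <command...>
start_detached() {
    local session="$1"
    shift
    # refuse to start a second session with the same name
    if tmux has-session -t "$session" 2>/dev/null; then
        echo "session $session already exists; attach with: tmux a -t $session"
        return 1
    fi
    # -d: do not attach; -s: session name; the command runs inside the session
    tmux new-session -d -s "$session" "$*"
}

# Example:
#   start_detached train python train.py
#   tmux a -t train     # check on it later
```

Note that a session started this way closes when its command finishes, so any output not written to a file is gone by then.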