<h1 align="center">Basic AWS Cluster Setup</h1> 
<h3 align="center">Author: Guorong Xu (g1xu@ucsd.edu) </h3>
<h3 align="center">2019-7-17</h3> 

## This notebook shows how to call the API to install and configure the ParallelCluster package, create a cluster, and connect to the master node. Currently only Linux and macOS are supported.

## <font color='red'>Notice:</font> The first step is to fill in your AWS account access keys. Then follow the instructions to install the ParallelCluster package and create a cluster.

In [1]:
import os
import sys

sys.path.append("../../src/cirrus_ngs")

## Input the AWS account access keys
aws_access_key_id = "AKIXXXXXXXXXXXXXXXXMBA" 
aws_secret_access_key = "1irasdasdfsfafwefafeasfasdsdf+5Ob"

## ParallelCluster name
your_cluster_name = "clustername"

## Path to the private key (.pem) file used to access the cluster.
private_key = "/path/to/your_aws_key.pem"

## Whether to delete the cluster after the job is done.
delete_cluster = False
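Hard-coding keys in a notebook makes them easy to leak (the configuration printout further down even shows the secret key in plain text). A minimal alternative, assuming you export the standard AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before starting Jupyter, is to read them from the environment. The helper name `load_aws_keys` is hypothetical and not part of `cirrus_ngs`.

```python
import os


def load_aws_keys(environ=os.environ):
    """Return (access_key_id, secret_access_key) read from environment variables.

    Hypothetical helper: uses the standard AWS variable names instead of
    hard-coding the keys. Raises KeyError listing any variable that is missing.
    """
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return environ["AWS_ACCESS_KEY_ID"], environ["AWS_SECRET_ACCESS_KEY"]


# Usage in the first cell, replacing the two literal assignments:
# aws_access_key_id, aws_secret_access_key = load_aws_keys()
```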

## 1. Install Cluster

### Notice: The cluster package can only be installed on a Linux machine that supports pip installation.

In [None]:
sys.path.append("../../src/cirrus_ngs")
from awsCluster import ClusterManager
ClusterManager.install_cluster()
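Before running the install cell, it can help to confirm that the machine meets the requirements stated above. The sketch below, a hypothetical helper that does not call `ClusterManager`, reports the operating system, whether `pip` is on the `PATH`, and whether the ParallelCluster command-line tool `pcluster` is already installed.

```python
import platform
import shutil


def check_install_prerequisites():
    """Report whether this machine looks ready for a pip-based ParallelCluster install.

    Hypothetical helper: checks the OS (this notebook targets Linux and macOS),
    whether pip is on PATH, and whether the `pcluster` CLI is already present.
    """
    system = platform.system()  # e.g. "Linux", "Darwin" (macOS), "Windows"
    return {
        "os": system,
        "os_supported": system in ("Linux", "Darwin"),
        "pip_available": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        "pcluster_installed": shutil.which("pcluster") is not None,
    }


print(check_install_prerequisites())
```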

## 2. Upgrade ParallelCluster

In [None]:
from awsCluster import ClusterManager
ClusterManager.upgrade_cluster()
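What `ClusterManager.upgrade_cluster()` runs internally is not shown here. As a sketch of the equivalent manual steps, assuming the PyPI package name `aws-parallelcluster`, the helpers below report the installed version and build a `pip` upgrade command bound to the notebook kernel's interpreter. Both helper names are hypothetical.

```python
import sys
from importlib import metadata


def installed_version(package="aws-parallelcluster"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_upgrade_command(package="aws-parallelcluster"):
    """Build a pip command that upgrades `package` for the current interpreter."""
    # `sys.executable -m pip` installs into the same Python the notebook kernel uses.
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


print("installed:", installed_version())
print("upgrade with:", " ".join(pip_upgrade_command()))
# To actually run it: subprocess.run(pip_upgrade_command(), check=True)
```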

## 3. Configure ParallelCluster

### To configure ParallelCluster settings, import ClusterManager. The functions below show how to insert the AWS access keys and configure the instance types, spot price, and S3 resources.

In [6]:
from awsCluster import ClusterManager

## Configure ParallelCluster settings
## aws access keys are the AWS credentials section (required). These settings apply to all clusters.
ClusterManager.insert_access_keys(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)

## "private_key" is the path to the .pem file of an existing EC2 key pair; its file name (without .pem) becomes key_name, which enables SSH access to the instances.
ClusterManager.config_key_name(private_key)

## "master_instance_type" is to specify the EC2 instance type used for the master node.
## "compute_instance_type" is to specify the EC2 instance type used for the cluster compute nodes.
ClusterManager.config_instance_types(master_instance_type="t2.medium", compute_instance_type="r3.8xlarge")

## "initial_cluster_size" is to specify the initial number of EC2 instances to launch as compute nodes in the cluster.
ClusterManager.config_initial_cluster_size(initial_cluster_size="0")

## "spot_price" is to specify the maximum spot price for the ComputeFleet.
ClusterManager.config_spot_price(spot_price="1.5")

## "volume_size" is to specify the size, in GB, of the volume to be created (if not using a snapshot).
ClusterManager.config_volume_size(volume_size="300")

## "ebs_snapshot_id" is to specify the EBS snapshot that contains all prebuilt pipelines and software.
## "snap-047728f70680eae54" is the released snapshot ID for the LATEST version. 
ClusterManager.config_ebs_snapshot_id(ebs_snapshot_id="snap-0c9df91b9c0aff12c")

## "aws_region_name" is to specify the AWS region in which the cluster will be created.
ClusterManager.config_aws_region_name(aws_region_name="us-west-2")

## "post_install" is a URL to a postinstall script. This is executed after creation of cluster.
#ClusterManager.config_post_install(post_install="s3://path/to/postinstall.sh")

## "master_subnet_id" is to specify the ID of an existing subnet into which the master server will be provisioned.
## "vpc_id" is to specify the ID of the VPC into which the cluster will be provisioned.
ClusterManager.config_vpc_subnet_id(master_subnet_id="subnet-00000000", vpc_id="vpc-00000000")

## "s3_read_resource" is to specify an S3 bucket to which ParallelCluster nodes will be granted read-only access.
## "s3_read_write_resource" is to specify an S3 resource to which ParallelCluster nodes will be granted read-write access.
ClusterManager.config_s3_resource(s3_read_resource="bucket_name", s3_read_write_resource="bucket_name")
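The calls above write an INI-style configuration file, as the `view_cluster_config()` printout shows. To make the mapping from arguments to sections visible, here is a sketch that builds an equivalent file with the standard `configparser` module. The section and option names are copied from that printout; how `ClusterManager` actually writes the file is not shown in this notebook, so treat `build_pcluster_config` as a hypothetical illustration. Note that `initial_cluster_size` appears as `initial_queue_size` in the file, and credentials are deliberately left out.

```python
import configparser
import io
import os


def build_pcluster_config(region, key_path, master_type, compute_type,
                          initial_queue_size, spot_price, snapshot_id,
                          subnet_id, vpc_id):
    """Build an INI config shaped like the one printed by view_cluster_config().

    Sketch only: section/option names mirror that printed output, and the
    AWS keys are omitted so the result can be shared safely.
    """
    # The printed key_name ("your_aws_key") matches the .pem file name without extension.
    key_name = os.path.splitext(os.path.basename(key_path))[0]
    config = configparser.ConfigParser()
    config["aws"] = {"aws_region_name": region}
    config["cluster cluster"] = {
        "vpc_settings": "ucsd",
        "key_name": key_name,
        "master_instance_type": master_type,
        "compute_instance_type": compute_type,
        "initial_queue_size": str(initial_queue_size),
        "cluster_type": "spot",
        "spot_price": str(spot_price),
        "ebs_settings": "custom",
    }
    config["vpc ucsd"] = {"master_subnet_id": subnet_id, "vpc_id": vpc_id}
    config["global"] = {"update_check": "true", "sanity_check": "true",
                        "cluster_template": "cluster"}
    config["ebs custom"] = {"ebs_snapshot_id": snapshot_id}
    return config


cfg = build_pcluster_config("us-west-2", "/path/to/your_aws_key.pem", "t2.medium",
                            "r3.8xlarge", 0, 1.5, "snap-0c9df91b9c0aff12c",
                            "subnet-00000000", "vpc-00000000")
buffer = io.StringIO()
cfg.write(buffer)
print(buffer.getvalue())
```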


### After you finish the configuration, call the function below to double-check that your settings are correct.

### Before you create a new cluster, you can also list the currently running clusters with `ClusterManager.list_aws_cluster()`, so that you avoid reusing an existing cluster name.

In [7]:
ClusterManager.view_cluster_config()

[aws]
aws_region_name = us-west-2
aws_access_key_id = AKIXXXXXXXXXXXXXXXXMBA
aws_secret_access_key = 1irasdasdfsfafwefafeasfasdsdf+5Ob

[cluster cluster]
vpc_settings = ucsd
key_name = your_aws_key
master_instance_type = t2.medium
compute_instance_type = r3.8xlarge
initial_queue_size = 0
cluster_type = spot
spot_price = 1.5
ebs_settings = custom
#s3_read_resource = arn:aws:s3:::bucket_name
#s3_read_write_resource = arn:aws:s3:::bucket_name/*
#post_install = s3://bucket_name/path/to/postinstall.sh

[vpc ucsd]
master_subnet_id = subnet-00000000
vpc_id = vpc-00000000

[global]
update_check = true
sanity_check = true
cluster_template = cluster

[ebs custom]
ebs_snapshot_id = snap-088438396378400a8
#volume_size = 300
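Two things are worth checking in output like the above: the section cross-references (`cluster_template`, `vpc_settings`, and `ebs_settings` must each name a section that exists), and the secret key, which is printed in plain text. The sketch below is a hypothetical checker using only `configparser`; it is not part of `cirrus_ngs`. You could feed it the config file's text, which for ParallelCluster 2.x typically lives at `~/.parallelcluster/config`.

```python
import configparser


def check_config(text):
    """Return a list of problems found in a ParallelCluster-style INI config.

    Hypothetical checks: the global cluster_template and each *_settings
    reference must name a section that exists.
    """
    config = configparser.ConfigParser()
    config.read_string(text)
    problems = []
    template = config.get("global", "cluster_template", fallback=None)
    cluster_section = "cluster %s" % template
    if template is None or not config.has_section(cluster_section):
        problems.append("cluster_template does not match a [cluster ...] section")
        return problems
    for option, prefix in (("vpc_settings", "vpc"), ("ebs_settings", "ebs")):
        value = config.get(cluster_section, option, fallback=None)
        if value is not None and not config.has_section("%s %s" % (prefix, value)):
            problems.append("%s = %s has no [%s %s] section" % (option, value, prefix, value))
    return problems


def mask_secret(text):
    """Replace the secret key's value before printing or sharing the config."""
    lines = []
    for line in text.splitlines():
        if line.strip().startswith("aws_secret_access_key"):
            line = "aws_secret_access_key = ****"
        lines.append(line)
    return "\n".join(lines)
```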


In [None]:
ClusterManager.list_aws_cluster()
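Once you know which clusters already exist, you can check a candidate name before creating the cluster. The sketch below is hypothetical: it applies the CloudFormation stack-name rule (letters, digits, and hyphens, starting with a letter) on the assumption that ParallelCluster backs each cluster with a CloudFormation stack; ParallelCluster may enforce a stricter limit of its own. `existing_names` is whatever names you copy from the `list_aws_cluster()` output, whose exact format is not shown here.

```python
import re

# Assumption: CloudFormation-style naming rule; ParallelCluster may be stricter.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def check_cluster_name(name, existing_names):
    """Return a reason the name is unusable, or None if it looks fine.

    `existing_names` is a hypothetical iterable of names taken from the
    list_aws_cluster() output.
    """
    if not _NAME_PATTERN.fullmatch(name):
        return "name must start with a letter and contain only letters, digits and hyphens"
    if name in set(existing_names):
        return "a cluster named %r already exists" % name
    return None
```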

### To create a new cluster, set a cluster name and call the function below. When creation is complete, the output shows information about your cluster, including its IP address, which is stored in `master_ip_address`.

In [None]:
master_ip_address = ClusterManager.create_aws_cluster(cluster_name=your_cluster_name)
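Even after creation reports success, the master node may need a moment before it accepts SSH connections. A minimal sketch, assuming SSH listens on the default port 22, polls the returned IP address with the standard `socket` module until the port opens or a timeout passes. `wait_for_ssh` is a hypothetical helper, not part of `cirrus_ngs`.

```python
import ipaddress
import socket
import time


def wait_for_ssh(host, port=22, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once the port accepts connections, False on timeout.
    Raises ValueError if `host` is not a plain IP address.
    """
    ipaddress.ip_address(host)  # validate the address returned by cluster creation
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage: wait_for_ssh(master_ip_address) before calling ConnectionManager.connect_master.
```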

## 4. Manage cluster

### To manage your newly created cluster, import ConnectionManager. ConnectionManager can open a connection to the master node, execute commands on it, and transfer files to it. To create a connection, set the hostname, username, and private key file. The hostname is the master node's IP address (MasterPublicIP), which is reported when cluster creation is complete. The private key file should be the same one you used when configuring ParallelCluster.

In [None]:
from awsCluster import ConnectionManager
ssh_client = ConnectionManager.connect_master(hostname=master_ip_address,
                                              username="ec2-user",
                                              private_key_file=private_key)
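If the connection fails, two things are easy to check by hand. OpenSSH refuses private keys whose permissions are too open, and the same connection can be tried with the plain `ssh`/`scp` tools. The sketch below uses hypothetical helper names and only builds command lists for `subprocess.run`; it does not reflect how `ConnectionManager` is implemented.

```python
import os
import stat


def key_permissions_ok(path):
    """True if the key file grants no permissions to group or others.

    If this returns False, tighten it with os.chmod(path, 0o400) (or `chmod 400`).
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def ssh_command(hostname, private_key_file, username="ec2-user", remote_command=None):
    """Build an equivalent `ssh` command line, optionally running one remote command."""
    cmd = ["ssh", "-i", private_key_file, "%s@%s" % (username, hostname)]
    if remote_command:
        cmd.append(remote_command)
    return cmd


def scp_command(local_path, hostname, remote_path, private_key_file, username="ec2-user"):
    """Build an `scp` command line that copies a local file to the master node."""
    return ["scp", "-i", private_key_file, local_path,
            "%s@%s:%s" % (username, hostname, remote_path)]


# Usage: subprocess.run(ssh_command(master_ip_address, private_key, remote_command="hostname"))
```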

### When the job is done, call the function below to close the connection.

In [None]:
ConnectionManager.close_connection(ssh_client)
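If a command fails between connecting and closing, the close cell may never run. A small context manager guarantees the close. This sketch takes the connect and close functions as arguments, so it can wrap `ConnectionManager.connect_master` and `ConnectionManager.close_connection` without assuming anything else about them; `master_connection` itself is a hypothetical name.

```python
from contextlib import contextmanager


@contextmanager
def master_connection(connect, close, **connect_kwargs):
    """Open a connection with connect(**connect_kwargs) and always close it afterwards."""
    client = connect(**connect_kwargs)
    try:
        yield client
    finally:
        close(client)  # runs even if the body of the `with` block raises


# Usage:
# with master_connection(ConnectionManager.connect_master,
#                        ConnectionManager.close_connection,
#                        hostname=master_ip_address, username="ec2-user",
#                        private_key_file=private_key) as ssh_client:
#     ...  # run commands on the master node here
```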

### To delete the cluster, set `delete_cluster = True` in the first cell and run the cell below; it deletes the cluster named in `your_cluster_name`.

In [None]:
from awsCluster import ClusterManager

if delete_cluster:
    ClusterManager.delete_aws_cluster(cluster_name=your_cluster_name)