# CFNCluster Set-up 
* Guorong Xu, Center for Computational Biology and Bioinformatics, UCSD (g1xu@ucsd.edu)

## Introduction

This notebook walks through installing and configuring the CFNCluster package, creating a cluster, and connecting to its master node.

<div class="alert alert-info">

Before running this notebook, ensure:

* You are running it on a Linux or Mac OS X platform
* You have installed the `paramiko` and `scp` packages in the environment where it is running
* You have an existing AWS account, and have the access keys and a private key pair file for this account

</div>
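
The checks listed above can be partly automated. The sketch below is a minimal, hypothetical helper (`check_prerequisites` is not part of CFNCluster or this repository) that tests the platform and whether the `paramiko` and `scp` packages can be found in the current environment. It does not verify AWS credentials.

```python
import importlib.util
import platform


def check_prerequisites(required_packages=("paramiko", "scp")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # platform.system() returns "Linux" on Linux and "Darwin" on Mac OS X.
    system = platform.system()
    if system not in ("Linux", "Darwin"):
        problems.append("unsupported platform: " + system)
    # find_spec returns None when a top-level package cannot be located.
    for name in required_packages:
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


print(check_prerequisites() or "all prerequisites satisfied")
```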

## Input Parameters

<div class="alert alert-warning">
<h4>Analyst note:</h4>
The values shown below are example settings, and <strong>MUST</strong> be replaced with appropriate values for your cluster.
</div>

In [None]:
# new CFNCluster name
your_cluster_name = "myclustername"

# AWS account access keys
aws_access_key_id = "AKIXXXXXXXXXXXXXXXXMBA" 
aws_secret_access_key = "1irasdasdfsfafwefafeasfasdsdf+5Ob"

# private key pair file for accessing the new cluster
private_key = "/path/to/your_aws_key.pem"
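
Before continuing, it may help to confirm that the key file path is valid. The following is a sketch only: `check_private_key` is a hypothetical helper, and the permission check reflects the common SSH convention that a private key should not be accessible by group or others, which this notebook's own code may or may not enforce.

```python
import os
import stat


def check_private_key(path):
    """Return a list of problems found with the key file at `path`."""
    if not os.path.isfile(path):
        return ["key file not found: " + path]
    problems = []
    # Keep only the permission bits of the file mode.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any group or other permission bit is flagged (assumed convention).
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("key file is accessible by group/others; consider chmod 600")
    return problems


# The placeholder path from the cell above is reported as not found.
print(check_private_key("/path/to/your_aws_key.pem"))
```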

<div class="alert alert-warning">
<h4>Analyst note:</h4>
The values shown below are usually shared by all users within an organization. Once set, they <strong>SHOULD NOT</strong> be modified without a clear understanding of what change should be made and why it is necessary.
</div>

In [None]:
# "spot_price" is the maximum spot price, in dollars per hour, for the ComputeFleet
spot_price="1.5"
# "volume_size" is the size of volume to be created (if not using a snapshot)
volume_size="300"
# "aws_region_name" is the aws region in which the cluster will be created
aws_region_name="us-west-2"
# "master_subnet_id" is the ID of an existing subnet into which you want to provision the Master server
master_subnet_id="subnet-c86788ad" 
# "vpc_id" is the ID of the VPC into which you want to provision the cluster
vpc_id="vpc-c7e503a2"
# "s3_read_resource" is the S3 resource to which cfncluster nodes will be granted read-only access
s3_read_resource="ucsd-ccbb-projects"
# "s3_read_write_resource" is the S3 resource to which cfncluster nodes will be granted read-write access
s3_read_write_resource="ucsd-ccbb-projects"

<div class="alert alert-warning">
<h4>Analyst note:</h4>
The values shown below are standard settings for ALL users, and <strong>SHOULD NOT</strong> be modified without a clear understanding of what change should be made and why it is necessary.
</div>

In [None]:
# "ebs_snapshot_id" is the ID of the EBS snapshot that contains the prebuilt pipelines and software
ebs_snapshot_id="snap-088438396378400a8"
# "master_instance_type" is the EC2 instance type used for the master node
master_instance_type = "t2.medium"
# "compute_instance_type" is the EC2 instance type used for the cluster compute nodes
compute_instance_type="r3.8xlarge"
# "initial_cluster_size" is the initial number of EC2 instances to launch as compute nodes in the cluster
initial_cluster_size="0"

## CFNCluster Installation

Import the scripts to support CFNCluster installation:

In [None]:
import os
import sys
sys.path.append("../../src/cirrus_ngs")

from cfnCluster import CFNClusterManager

Install the cfncluster package (if the package is already installed, this command will simply report that all the installation requirements are already satisfied):

In [None]:
CFNClusterManager.install_cfn_cluster()

Check for and install any upgrades to the cfncluster package.  Again, if no new updates are available, the command will report that all requirements are satisfied and/or up-to-date.

In [None]:
CFNClusterManager.upgrade_cfn_cluster()

## Cluster Configuration

Set cluster configuration values from the input parameters:

In [None]:
## Configure cfncluster settings
## The AWS access keys (required) go in the AWS credentials section. These settings apply to all clusters.
CFNClusterManager.insert_access_keys(aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
## "private_key" identifies the existing EC2 KeyPair used to enable SSH access to the instances
CFNClusterManager.config_key_name(private_key)
CFNClusterManager.config_instance_types(master_instance_type=master_instance_type, 
                                        compute_instance_type=compute_instance_type)
CFNClusterManager.config_initial_cluster_size(initial_cluster_size=initial_cluster_size)
CFNClusterManager.config_spot_price(spot_price=spot_price)
CFNClusterManager.config_volume_size(volume_size=volume_size)
CFNClusterManager.config_ebs_snapshot_id(ebs_snapshot_id=ebs_snapshot_id)
CFNClusterManager.config_aws_region_name(aws_region_name=aws_region_name)
CFNClusterManager.config_vpc_subnet_id(master_subnet_id=master_subnet_id, 
                                       vpc_id=vpc_id)
CFNClusterManager.config_s3_resource(s3_read_resource=s3_read_resource, 
                                     s3_read_write_resource=s3_read_write_resource)

View the configuration settings:

In [None]:
CFNClusterManager.view_cfncluster_config()

<div class="alert alert-warning">
<h4>Analyst note:</h4>

Examine the output above; if any of the settings look surprising given the input parameters, address these before creating the cluster.

</div>
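
One way to make this comparison less error-prone is to parse the configuration text and compare it with the values you expect. The sketch below assumes the configuration is INI-formatted text that you have copied into a string. The section and key names in `sample_config` are illustrative assumptions, and `find_mismatches` is a hypothetical helper rather than part of `CFNClusterManager`.

```python
import configparser


def find_mismatches(config_text, expected):
    """Return {key: (expected_value, found_value)} for every setting that differs."""
    # Interpolation is disabled so values containing "%" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    found = {}
    # Flatten all sections; a key repeated in a later section overwrites an earlier one.
    for section in parser.sections():
        for key, value in parser.items(section):
            found[key] = value
    return {
        key: (value, found.get(key))
        for key, value in expected.items()
        if found.get(key) != value
    }


# Illustrative configuration text (section names are assumptions).
sample_config = """
[aws]
aws_region_name = us-west-2

[cluster default]
master_instance_type = t2.medium
compute_instance_type = r3.8xlarge
"""

expected = {"aws_region_name": "us-west-2", "compute_instance_type": "r3.8xlarge"}
print(find_mismatches(sample_config, expected))
```

An empty dictionary means every expected setting was found with the expected value.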

## Cluster Creation

Examine the existing list of cluster names on the AWS account:

In [None]:
CFNClusterManager.list_cfn_cluster()

Examine the input name for the new cluster:

In [None]:
your_cluster_name

<div class="alert alert-warning">
<h4>Analyst note:</h4>

Each cluster must have a unique name.  If the chosen new cluster name already exists in the list of clusters for this AWS account, return to the Input Parameters section, set a new name, and rerun the Input Parameters cell.  Then return to this section and rerun the check above.

</div>
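
The uniqueness check can also be expressed in code. This sketch assumes you have copied the existing cluster names from the listing above into a Python list by hand. `is_name_available` is a hypothetical helper, and the names in `existing_names` are made up for illustration.

```python
def is_name_available(new_name, existing_names):
    """Return True if `new_name` does not exactly match any existing cluster name."""
    # Surrounding whitespace is ignored; the comparison is otherwise exact.
    taken = {name.strip() for name in existing_names}
    return new_name.strip() not in taken


# Hypothetical names copied from the cluster listing.
existing_names = ["alignment-cluster", "myclustername"]
print(is_name_available("myclustername", existing_names))
print(is_name_available("my-new-cluster", existing_names))
```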

Create the new cluster (note that this may take a few minutes to run):

In [None]:
master_ip_address = CFNClusterManager.create_cfn_cluster(cluster_name=your_cluster_name)

<div class="alert alert-warning">
<h4>Analyst note:</h4>

Examine the output above and find the value shown for `MasterPublicIP` (e.g., `52.38.87.227` if the output includes the line `MasterPublicIP: 52.38.87.227`).  This is the IP address of the new cluster, which will be necessary for connecting to the cluster to run jobs on it and manage it.  <strong>Make a note of this IP.</strong>

</div>
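
If you have the creation output as text, the address can be pulled out with a regular expression instead of being read by eye. The sketch below assumes only the `MasterPublicIP: <address>` line format described above. `extract_master_ip` is a hypothetical helper, not part of `CFNClusterManager`.

```python
import re


def extract_master_ip(output_text):
    """Return the IPv4 address that follows "MasterPublicIP", or None if absent."""
    # \W+ skips the separator (for example ": ") between the label and the address.
    match = re.search(r"MasterPublicIP\W+(\d{1,3}(?:\.\d{1,3}){3})", output_text)
    return match.group(1) if match else None


print(extract_master_ip("MasterPublicIP: 52.38.87.227"))  # prints 52.38.87.227
```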

## Example Cluster Usage

To interact with your new cluster from Python code, import `ConnectionManager`, which can create a connection to the master node, execute commands on it, and transfer files to it. Create a connection to the master node with code like that shown below.  The hostname should be the cluster IP address identified above, the username should be `ec2-user`, and the private key file should be the one you specified in Input Parameters above.

    # example cluster connection code
    from cfnCluster import ConnectionManager
    ssh_client = ConnectionManager.connect_master(hostname="52.38.87.227",
                   username="ec2-user",
                   private_key_file="/path/to/your_aws_key.pem")

After the job you are running on the cluster is done, you can close the connection with code like this:

    ConnectionManager.close_connection(ssh_client)
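
To make sure the connection is closed even if a step in between raises an error, the connect and close calls can be wrapped in a context manager. This is a sketch under assumptions: `master_connection` is a hypothetical wrapper, and it takes the connect and close functions as arguments, so it relies only on their behaving as in the examples above.

```python
from contextlib import contextmanager


@contextmanager
def master_connection(connect, close, **connect_kwargs):
    """Open a connection with `connect`, yield it, and always call `close` on it."""
    client = connect(**connect_kwargs)
    try:
        yield client
    finally:
        # Runs whether the body of the `with` block succeeds or raises.
        close(client)


# Intended usage with the functions shown above (not executed here):
# with master_connection(ConnectionManager.connect_master,
#                        ConnectionManager.close_connection,
#                        hostname="52.38.87.227",
#                        username="ec2-user",
#                        private_key_file="/path/to/your_aws_key.pem") as ssh_client:
#     ...  # run commands or transfer files here
```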

When you are **completely** done using a cluster, you can **permanently delete** it by specifying the cluster name in code like this:

    CFNClusterManager.delete_cfn_cluster(cluster_name="myclustername")