The aim of this notebook is to show how to use awscli together with the boto3 Python library for some basic management of AWS EC2 instances. Many of these code snippets can be turned into Python scripts and combined with shell commands to get simple interactions like  
`aws-ssh MyJupyterInstance` 

Necessary setup:
* Python3 (I recommend Anaconda, https://www.anaconda.com/download/)
* An AWS account, https://aws.amazon.com/
* AWS Access Key ID and Secret Access Key. You obtain these when creating a user in AWS IAM.
* An instance created via the EC2 dashboard

Install the Python libraries:
  
`  pip install awscli`  
`  pip install boto3`

From the shell/command line, call  
  
`aws configure`  
  
This will ask you for the Access Key ID and Secret Access Key, as well as your default region and output format.
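
To double-check what `aws configure` stored, the region can be read back from the AWS config file. The following is a minimal sketch, assuming the default location `~/.aws/config` and the `[default]` profile; the helper name `read_default_region` is made up for this example and is not part of awscli or boto3.

```python
import configparser
from pathlib import Path


def read_default_region(config_path=None):
    """Return the region stored under [default], or None if it is not set."""
    if config_path is None:
        # Assumed default location of the file written by `aws configure`.
        config_path = Path.home() / ".aws" / "config"
    parser = configparser.ConfigParser()
    # ConfigParser.read silently skips files that do not exist.
    parser.read(config_path)
    return parser.get("default", "region", fallback=None)


if __name__ == "__main__":
    print(read_default_region())
```

If this prints `None`, no default region is configured and `region_name` has to be passed explicitly when creating a client.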

Now that we have the setup and configuration out of the way, let's get started with some code! First, let's import boto3:

In [None]:
import boto3

The main boto3 class we will be working with for the basic EC2 operations is the boto3 client, which gives low-level access to the AWS service APIs. The full documentation can be found here: http://boto3.readthedocs.io/en/latest/reference/core/session.html#boto3.session.Session.client, but for now we need at most one of the extra arguments, *region_name*. This defaults to the region that you gave when calling `aws configure`, so it can be safely ignored if all the instances you want to manage are in that region. Let's assume instead that you have instances spread across different regions, say the US West Coast and Western Europe. Let's connect to the Oregon data centre for now:

In [None]:
ec2 = boto3.client('ec2', region_name='us-west-2')

To get a list of all available region names, create a client without `region_name`, which connects to your default region, and then call `describe_regions()`:

In [None]:
regions = [x['RegionName'] for x in boto3.client('ec2').describe_regions()['Regions']]
regions

To obtain a list of your instances in the current region, simply call `describe_instances()`. The response is a nested Python dictionary that mirrors the JSON returned by the API, and we can extract the relevant part with a simple list comprehension.

In [None]:
response = ec2.describe_instances()

instances = [x for r in response['Reservations'] for x in r['Instances'] ]
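
A single `describe_instances()` call may not return everything when an account holds many instances, because the results can be split across pages. A minimal sketch of collecting all pages with the client's `get_paginator` method follows; the function name `list_all_instances` is an assumption made for this example.

```python
def list_all_instances(ec2_client):
    """Collect the instances from every page of describe_instances."""
    paginator = ec2_client.get_paginator("describe_instances")
    instances = []
    for page in paginator.paginate():
        # Each page has the same shape as a plain describe_instances response.
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    return instances
```

With the client created earlier, this would be called as `instances = list_all_instances(ec2)`.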

So far so good. However, the response object contains all kinds of data, most of which we don't really need right now:

In [None]:
instances[0]

Let's pick out the few relevant pieces of information and put the whole thing into a nested dictionary keyed on the instance name. Note that the code below assumes that the `Name` tag is the first tag on every instance:

In [None]:
keys = ['InstanceId', 'InstanceType', 'State', 'PublicDnsName']
instance_info = {instance['Tags'][0]['Value']: { k: instance[k] for k in keys } for instance in instances}

In [None]:
instance_info
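
Indexing `instance['Tags'][0]` only gives the name if the `Name` tag happens to come first, and it fails for instances without any tags. A slightly more defensive lookup could be sketched as below; `get_name_tag` is a hypothetical helper, and `sample` is a hand-written dictionary shaped like one entry of `r['Instances']`, not real API output.

```python
def get_name_tag(instance, default=None):
    """Return the value of the instance's 'Name' tag, or `default` if absent."""
    # 'Tags' is missing entirely for untagged instances, hence .get().
    for tag in instance.get("Tags", []):
        if tag["Key"] == "Name":
            return tag["Value"]
    return default


# Illustrative values only.
sample = {
    "InstanceId": "i-0123456789abcdef0",
    "Tags": [
        {"Key": "Project", "Value": "demo"},
        {"Key": "Name", "Value": "MyJupyter"},
    ],
}
print(get_name_tag(sample))  # MyJupyter
```

Using `get_name_tag(instance)` as the dictionary key instead of `instance['Tags'][0]['Value']` would make the lookup independent of tag order.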

We can wrap this all into a nice `get_instance_info` function:

In [None]:
def get_instance_info(ec2_client):
    response = ec2_client.describe_instances()
    instances = [x for r in response['Reservations'] for x in r['Instances'] ]
    keys = ['InstanceId', 'InstanceType', 'State', 'PublicDnsName']
    return {instance['Tags'][0]['Value']: { k: instance[k] for k in keys } for instance in instances}

In [None]:
get_instance_info(ec2)
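
When only one instance is of interest, the lookup by name can also be left to AWS by passing a `Filters` argument to `describe_instances`, rather than fetching everything and filtering in Python. This is a sketch under the assumption that instances are identified by their `Name` tag; `find_instance_ids` is a name chosen for this example.

```python
def find_instance_ids(ec2_client, name):
    """Return the IDs of all instances whose Name tag equals `name`."""
    response = ec2_client.describe_instances(
        # 'tag:<key>' filters on the value of the tag with that key.
        Filters=[{"Name": "tag:Name", "Values": [name]}]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
```

A list is returned because nothing stops several instances from sharing the same `Name` tag; a call such as `find_instance_ids(ec2, 'MyJupyter')` may therefore yield zero, one, or several IDs.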

Now we can easily start and stop one of our instances, say the 'MyJupyter' one:

In [None]:
ec2.start_instances(InstanceIds=[instance_info['MyJupyter']['InstanceId']])

In [None]:
ec2.stop_instances(InstanceIds=[instance_info['MyJupyter']['InstanceId']])

Again, let's put these commands into wrapper functions that also do some basic checking of the response:

In [None]:
def start_instance_simple(ec2_client, name):
    instance_info = get_instance_info(ec2_client)
    response = ec2_client.start_instances(InstanceIds=[instance_info[name]['InstanceId']])
    si = response['StartingInstances']
    if len(si) != 1 or si[0]['CurrentState']['Name'] not in ['running', 'pending']:
        print("Something went wrong!", response)
        return False
    else:
        return True

def stop_instance(ec2_client, name):
    instance_info = get_instance_info(ec2_client)
    response = ec2_client.stop_instances(InstanceIds=[instance_info[name]['InstanceId']])
    si = response['StoppingInstances']
    if len(si) != 1 or si[0]['CurrentState']['Name'] not in ['stopping', 'stopped']:
        print("Something went wrong!", response)
        return False
    else:
        return True

In [None]:
stop_instance(ec2, 'MyJupyter')

The `stop_instance` function already does what it's supposed to do, but there is something that can be improved when spinning up an instance. 99% of the time when I spin up an EC2 instance, I want to ssh into it and start some web service, do some (big) data analysis that my notebook cannot cope with, or gain access to a GPU for training some ML model. To do so, we need the Public DNS name, which, as you will have spotted, is not included in the response to our `start_instances` call. It does appear in the response to `describe_instances` once the instance is up and running. As the DNS name is a much more useful piece of information than a simple "True" when starting an instance, let's obtain it using a delay and repeated calls to `get_instance_info`:

In [None]:
def start_instance(ec2_client, name):
    from time import sleep
    instance_info = get_instance_info(ec2_client)
    response = ec2_client.start_instances(InstanceIds=[instance_info[name]['InstanceId']])
    si = response['StartingInstances']
    if len(si) != 1 or si[0]['CurrentState']['Name'] not in ['running', 'pending']:
        print("Something went wrong!", response)
        return False
    else:
        while True:
            sleep(0.5)
            dns_name = get_instance_info(ec2_client)[name]['PublicDnsName']
            if dns_name:
                return dns_name

In [None]:
start_instance(ec2, 'MyJupyter')
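
As an alternative to a hand-written polling loop, boto3 clients provide waiters that poll on your behalf and give up after a bounded number of attempts. The sketch below uses the EC2 `instance_running` waiter; the function name `start_and_get_dns` is an assumption for this example, and it takes an instance ID rather than a name to keep the example short.

```python
def start_and_get_dns(ec2_client, instance_id):
    """Start an instance, block until it is running, return its public DNS name."""
    ec2_client.start_instances(InstanceIds=[instance_id])
    # The waiter polls describe_instances and raises an exception if the
    # instance does not reach the 'running' state within its retry budget.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    # The name is empty or missing if the instance has no public address.
    return instance.get("PublicDnsName", "")
```

Unlike the `while True` loop, this cannot spin forever: if the instance never comes up, the waiter raises instead of blocking indefinitely.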

In [None]:
stop_instance(ec2, 'MyJupyter')
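
To get towards the `aws-ssh MyJupyterInstance` interaction mentioned at the start, the DNS name returned by `start_instance` can be turned into an ssh command. This is only a sketch: `build_ssh_command` is a hypothetical helper, the login user `ec2-user` is an assumption that depends on the AMI (Ubuntu images use `ubuntu`, for example), and the host name and key path below are made-up placeholders.

```python
def build_ssh_command(dns_name, user="ec2-user", key_path=None):
    """Build the argument list for an ssh call to the given host."""
    command = ["ssh"]
    if key_path is not None:
        # -i selects the private key file to authenticate with.
        command += ["-i", key_path]
    command.append(f"{user}@{dns_name}")
    return command


# Placeholder host name and key path, for illustration only.
cmd = build_ssh_command(
    "ec2-203-0-113-25.us-west-2.compute.amazonaws.com",
    key_path="/path/to/my-key.pem",
)
print(" ".join(cmd))
```

In a script, the list could be handed to `subprocess.run(cmd)` after calling `start_instance(ec2, name)` to open the session.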