<a id="top"></a>
Following Workflow

### [File Storage](#File_Storage)
* [Create](#File_Storage_Create) an **S3 Bucket**
* [Load](#File_Storage_Load) csv file and script to execute
* [Delete](#File_Storage_Delete) object and content
* [List](#File_Storage_List) objects

### [Execute Code](#Execute_Code)
#### [EC2](#Execute_Code_EC2) Miniconda Image
* [Create](#Execute_Code_EC2_Create) the EC2 Instance
* [Mount](#Execute_Code_EC2_Mount) S3 Storage
* [Execute](#Execute_Code_EC2_Script) Script
* [Stop](#Execute_Code_EC2_Stop) \ Terminate server

#### [SageMaker](#Execute_Code_SM) Instance
* [Create](#Execute_Code_SM_Create) SageView
* Mount S3 Storage
* Execute Script

### Requirements
pycloud environment
```
conda env create -f pycloud.env.ymp --force
```

From the command line, run `aws configure` to set up your AWS Access Key and AWS Secret Key.

Local Test Code is located under `example_src`

[Go back to top](#top)
## File Storage<a id="File_Storage"></a>
Code to create an S3 Storage
https://realpython.com/python-boto3-aws-s3/#creating-a-bucket

### Create S3 Object<a id="File_Storage_Create"></a>

In [4]:
import uuid

import boto3


def create_bucket_name(bucket_name, include_uid=False):
    # The generated bucket name must be between 3 and 63 chars long
    if include_uid:
        return ''.join([bucket_name, str(uuid.uuid4())])
    else:
        return bucket_name

def create_bucket(bucket_name, s3_connection):
    # Use the region configured for the current session (see `aws configure`)
    session = boto3.session.Session()
    current_region = session.region_name

    bucket_name = create_bucket_name(bucket_name, True)

    # us-east-1 is the default location and is rejected as a LocationConstraint,
    # so the configuration is only passed for other regions
    if current_region == "us-east-1":
        bucket_response = s3_connection.create_bucket(
            Bucket=bucket_name)
    else:
        bucket_response = s3_connection.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": current_region})

    print(bucket_name, current_region)
    return bucket_name, bucket_response

In [5]:
import boto3

s3_resource = boto3.resource('s3')
bucket_name, first_response = create_bucket(
    bucket_name='iris-train', 
    s3_connection=s3_resource.meta.client)

iris-train358f23b3-22ee-4d49-972b-293fb7d33344 us-east-1
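
The comment in `create_bucket_name` mentions the 3 to 63 character limit. The following is a minimal sketch of a pre-flight check, assuming the basic naming rules for general-purpose buckets (lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit). `is_valid_bucket_name` is a hypothetical helper, not part of boto3, and it does not cover every rule (for example, names formatted like IP addresses).

```python
import re
import uuid

# Hypothetical helper: checks length (3-63) and allowed characters only.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if name passes the basic S3 bucket naming checks."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


candidate = "iris-train" + str(uuid.uuid4())
print(len(candidate), is_valid_bucket_name(candidate))
```

A UUID string adds 36 characters, so the prefix passed to `create_bucket_name` has to stay at 27 characters or fewer.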


### Load Files into S3 Object<a id="File_Storage_Load"></a>
The Python SDK has no native folder sync, so this step shells out to the `aws s3 sync` command instead.

In [6]:
import subprocess

path_to_src = "../example_src"
path_to_s3 = "s3://" + bucket_name

# Passing the arguments as a list avoids shell quoting problems in the paths
result = subprocess.run(
    ["aws", "s3", "sync", path_to_src, path_to_s3, "--acl", "private"],
    stdout=subprocess.PIPE)

print(result.stdout.decode('utf-8'))


Completed 3.6 KiB/9.7 KiB (20.2 KiB/s) with 5 file(s) remaining
upload: ../example_src/data/iris.csv to s3://iris-train358f23b3-22ee-4d49-972b-293fb7d33344/data/iris.csv
Completed 3.6 KiB/9.7 KiB (20.2 KiB/s) with 4 file(s) remaining
Completed 3.8 KiB/9.7 KiB (11.4 KiB/s) with 4 file(s) remaining
upload: ../example_src/requirements.yml to s3://iris-train358f23b3-22ee-4d49-972b-293fb7d33344/requirements.yml
Completed 3.8 KiB/9.7 KiB (11.4 KiB/s) with 3 file(s) remaining
Completed 6.1 KiB/9.7 KiB (18.0 KiB/s) with 3 file(s) remaining
upload: ../example_src/sm_train.py to s3://iris-train358f23b3-22ee-4d49-972b-293fb7d33344/sm_train.py
Completed 6.1 KiB/9.7 KiB (18.0 KiB/s) with 2 file(s) remaining
Completed 6.1 KiB/9.7 KiB (18.0 KiB/s) with 2 file(s) remaining
upload: ../example_src/requirements.txt to s3://iris-train358f23b3-22ee-4d49-972b-293fb7d33344/requirements.txt
Completed 6.1 KiB/9.7 KiB (18.0 KiB/s) with 1 file(s) remaining
Completed 9.7 KiB/9.7 KiB (26.6 KiB/s) with 1 file(s) re

[Go to EC2](#Execute_Code_EC2)
[Go to SageMaker](#Execute_Code_SM)
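
As an alternative to shelling out to `aws s3 sync`, the folder can be walked in Python and each file uploaded with the bucket resource's `upload_file`. This is only a sketch of a one-way upload, not a true sync (it never skips unchanged files or removes stale keys). `iter_upload_pairs` and `upload_dir` are hypothetical helper names.

```python
import os


def iter_upload_pairs(src_dir):
    """Yield (local_path, s3_key) for every file under src_dir."""
    for root, _dirs, files in os.walk(src_dir):
        for filename in sorted(files):
            local_path = os.path.join(root, filename)
            # S3 keys always use forward slashes, whatever the local OS uses.
            key = os.path.relpath(local_path, src_dir).replace(os.sep, "/")
            yield local_path, key


def upload_dir(bucket, src_dir):
    """Upload every file under src_dir through an s3_resource.Bucket object."""
    for local_path, key in iter_upload_pairs(src_dir):
        bucket.upload_file(local_path, key)
```

With the names used in this notebook, the call would be `upload_dir(s3_resource.Bucket(bucket_name), "../example_src")`.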

### Delete S3 Object and Content<a id="File_Storage_Delete"></a>
Remove all resources and delete bucket!<br>
**This does not back-up anything!**

In [2]:
import boto3
s3_resource = boto3.resource('s3')
def delete_all_objects(bucket_name):
    res = []
    bucket=s3_resource.Bucket(bucket_name)
    for obj_version in bucket.object_versions.all():
        res.append({'Key': obj_version.object_key,
                    'VersionId': obj_version.id})
    print(res)
    if len(res) > 0:
        bucket.delete_objects(Delete={'Objects': res})

In [14]:
delete_all_objects(bucket_name)    

s3_resource.Bucket(bucket_name).delete()

[{'Key': 'data/training/iris.csv', 'VersionId': 'null'}, {'Key': 'model/iris-randomforest.pkl', 'VersionId': 'null'}, {'Key': 'requirements.yml', 'VersionId': 'null'}, {'Key': 'train.py', 'VersionId': 'null'}]


{'ResponseMetadata': {'RequestId': '47F879E285BFA452',
  'HostId': 'g1Ue8rg/KEvdhrCMP6mJLjm0C8wsDkzikdkswbQ7ajcpnaVSepVP6xEK+g4GT/a+G6RkTg41m3Q=',
  'HTTPStatusCode': 204,
  'HTTPHeaders': {'x-amz-id-2': 'g1Ue8rg/KEvdhrCMP6mJLjm0C8wsDkzikdkswbQ7ajcpnaVSepVP6xEK+g4GT/a+G6RkTg41m3Q=',
   'x-amz-request-id': '47F879E285BFA452',
   'date': 'Thu, 07 Nov 2019 00:51:55 GMT',
   'server': 'AmazonS3'},
  'RetryAttempts': 0}}
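
`delete_objects` accepts at most 1,000 keys per request, so `delete_all_objects` as written only works for small buckets like this one. A minimal sketch of batching the delete, where `chunked` and `delete_in_batches` are hypothetical helpers and `bucket` is assumed to be an `s3_resource.Bucket` object:

```python
def chunked(items, size=1000):
    """Yield successive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(bucket, objects, batch_size=1000):
    """Delete a list of {'Key', 'VersionId'} dicts in batches of batch_size."""
    for batch in chunked(objects, batch_size):
        bucket.delete_objects(Delete={"Objects": batch})
```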

### List S3 Buckets<a id=File_Storage_List></a>
Quickly list and clean up all buckets with `iris-train` in their name

In [3]:
import boto3
s3_resource = boto3.resource('s3')
for bucket in s3_resource.buckets.all():
  if "iris-train" in bucket.name:
    print(bucket.name)
    delete_all_objects(bucket.name)  
    s3_resource.Bucket(bucket.name).delete()

iris-traind73bd071-eb96-4720-bf80-a7dbf7887340
[{'Key': 'data/iris.csv', 'VersionId': 'null'}, {'Key': 'requirements.txt', 'VersionId': 'null'}, {'Key': 'requirements.yml', 'VersionId': 'null'}, {'Key': 'sm_train.py', 'VersionId': 'null'}, {'Key': 'train.py', 'VersionId': 'null'}]


[Go back to top](#top)
## Execute Code<a id=Execute_Code></a>

## EC2 Instance<a id=Execute_Code_EC2></a>

### Create EC2 Instance<a id=Execute_Code_EC2_Create></a>
https://blog.ipswitch.com/how-to-create-an-ec2-instance-with-python

You need to bring your own:
* Security Group
* pem key

We will build the following EC2 instance:
* MiniConda - ami-062c42cbecc1d5ec0
* t2.medium (the launch code below currently uses `t2.micro`, with `t2.medium` commented out)

I built my own security group and granted SSH access.
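
The security group itself was created outside this notebook. As a rough sketch of how one could be created with the EC2 client, assuming a VPC ID and a source CIDR that you supply yourself (`ssh_ingress_rule` and `create_ssh_security_group` are hypothetical helper names):

```python
def ssh_ingress_rule(cidr):
    """Build an ingress rule that allows TCP port 22 from the given CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": cidr, "Description": "SSH access"}],
    }


def create_ssh_security_group(ec2_client, name, vpc_id, cidr):
    """Create a security group, open SSH to cidr, and return the group ID."""
    group = ec2_client.create_security_group(
        GroupName=name,
        Description="SSH access for the iris-train instance",
        VpcId=vpc_id,
    )
    ec2_client.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpPermissions=[ssh_ingress_rule(cidr)],
    )
    return group["GroupId"]
```

Here `ec2_client` would be `boto3.client('ec2')`, and the returned ID is the kind of value passed to `SecurityGroupIds` when launching the instance. Restricting `cidr` to your own address (a `/32`) keeps port 22 closed to everyone else.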

In [7]:
import uuid
def create_ec2_name(bucket_prefix):
    # Append a UUID so the value used for the instance's Name tag is unique
    return ''.join([bucket_prefix, str(uuid.uuid4())])

Use a bash script to create the S3 mount; see [Mount S3 onto EC2 Instance](#Execute_Code_EC2_Mount) for details.

### Mount S3 onto EC2 Instance<a id=Execute_Code_EC2_Mount></a>
https://cloudkul.com/blog/mounting-s3-bucket-linux-ec2-instance/

These steps were first worked out by hand on an existing EC2 instance.
The goal is to use the API to create the instance and perform the same mount setup automatically.

**Required setup on EC2**
```
sudo yum update
sudo yum install automake fuse fuse-devel gcc-c++ git libcurl-devel libxml2-devel make openssl-devel
git clone https://github.com/s3fs-fuse/s3fs-fuse.git
cd s3fs-fuse
./autogen.sh
./configure --prefix=/usr --with-openssl
make
sudo make install
```

You must create an IAM role for S3 mounting. For the sake of simplicity, I'm using my admin IAM access keys.
```
sudo touch /etc/passwd-s3fs
sudo vim /etc/passwd-s3fs
```
Provide `Your_accesskey:Your_secretkey` inside the file
```
sudo chmod 640 /etc/passwd-s3fs
```

Let's mount it! Replace `iris-trainc0e3588c-d9bb-4699-821c-1883670ace42` with your bucket name.
`uid=500` is the `ec2-user` account on this image; confirm the value with `id -u ec2-user`.
```
sudo mkdir /mys3bucket
sudo chown ec2-user:ec2-user /mys3bucket
s3fs iris-trainc0e3588c-d9bb-4699-821c-1883670ace42 -o use_cache=/tmp -o allow_other -o uid=500 -o mp_umask=002 -o multireq_max=5 /mys3bucket
```

Validate
```
df -Th
```
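
The same check can be scripted in Python on the instance. This is a small sketch using only the standard library; `is_mounted` is a hypothetical helper and `/mys3bucket` is the mount point used above.

```python
import os


def is_mounted(path):
    """Return True if path is a mount point on this machine."""
    return os.path.ismount(path)


print(is_mounted("/mys3bucket"))
```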

Mount at reboot
```
vi /etc/rc.local
/usr/bin/s3fs iris-trainc0e3588c-d9bb-4699-821c-1883670ace42 -o use_cache=/tmp -o allow_other -o uid=500 -o mp_umask=002 -o multireq_max=5 /mys3bucket
```
**or** add it to the User Data at execution!

In [8]:
import os
from dotenv import load_dotenv
load_dotenv(dotenv_path="../.env")

user_data = [
    "#cloud-boothook",
    "#!/bin/bash",
    "yum update -q -y",
    "yum install automake fuse fuse-devel gcc-c++ git libcurl-devel libxml2-devel make openssl-devel -q -y",
    "git clone https://github.com/s3fs-fuse/s3fs-fuse.git /tmp/s3fs-fuse",
    "cd /tmp/s3fs-fuse",
    "./autogen.sh",
    "./configure --prefix=/usr --with-openssl",
    "make",
    "make install",
    'echo "' + os.getenv("AWS_ACCESS_ID") + ':' + os.getenv("AWS_SECRET_ACCESS_KEY") + '" > /tmp/passwd-s3fs',
    "mv -f /tmp/passwd-s3fs /etc",
    "chmod 640 /etc/passwd-s3fs",
    "mkdir -p /mys3bucket",
    "chown ec2-user:ec2-user /mys3bucket",
    "s3fs " + bucket_name + " -o use_cache=/tmp -o uid=500 -o mp_umask=002 -o multireq_max=5 -o allow_other /mys3bucket",
    # "while read requirement; do conda install --yes $requirement; done < /mys3bucket/requirements.txt"
]
user_data = "\n".join(user_data)
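
If either environment variable is missing from `../.env`, `os.getenv` returns `None` and the string concatenation above fails with a `TypeError` that does not name the variable. A minimal sketch of a clearer check, where `require_env` and `passwd_s3fs_line` are hypothetical helpers and the variable names are the ones used in this notebook:

```python
import os


def require_env(name):
    """Return the value of environment variable name, or raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError("Environment variable {} is not set".format(name))
    return value


def passwd_s3fs_line(access_id_var="AWS_ACCESS_ID",
                     secret_var="AWS_SECRET_ACCESS_KEY"):
    """Build the 'access_key:secret_key' line that s3fs reads from its password file."""
    return "{}:{}".format(require_env(access_id_var), require_env(secret_var))
```

The `echo` entry in `user_data` could then be built as `'echo "' + passwd_s3fs_line() + '" > /tmp/passwd-s3fs'`.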

In [9]:
from IPython.display import display

import boto3
import socket
import time
from time import sleep

ec2 = boto3.resource('ec2')

# create a new EC2 instance
ec2_name = create_ec2_name("iris-train")
security_group = ['sg-0d24aec64507df8b5'] # SSH allowed
pem_key =  os.getenv("AWS_PEM_KEY")

instances = ec2.create_instances(
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                {
                    'Key': 'Name',
                    'Value': ec2_name
                },
            ]
        }
    ],
    ImageId='ami-062c42cbecc1d5ec0',
    MinCount=1,
    MaxCount=1,
    InstanceType='t2.micro',
#     InstanceType='t2.medium',
    KeyName=pem_key,
    SecurityGroupIds=security_group, #Bring your own!
    UserData=user_data
)

instance = instances[0]
print("instance id: ",instance.id)

#Provide status when instance is finally up!
retries = 10
retry_delay = 10
retry_count = 0

print("Wait till instance state changes to running")
instance.wait_until_running()
instance = ec2.Instance(id=instance.id)
print("Instance State Up, waiting for boot-up")

waiting_status = "instance is still loading retrying . . . "
dh = display(waiting_status,display_id=True)

while retry_count <= retries:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(retry_delay)  # do not block indefinitely on a single attempt
    result = sock.connect_ex((instance.public_ip_address,22))
    sock.close()
    if result == 0:
        print("Instance is UP & accessible on port 22, the IP address is:  ",instance.public_ip_address)
        break
    else:
        retry_count += 1  # without this increment the loop never gives up
        if len(waiting_status) < 50:
            waiting_status += ". "
        else:
            waiting_status = waiting_status[0:41]

        dh.update(waiting_status)
        time.sleep(retry_delay)

instance id:  i-02076ade24fd81199
Wait till instance state changes to running
Instance State Up, waiting for boot-up


'instance is still loading retrying . . . '

Instance is UP & accessible on port 22, the IP address is:   52.91.206.192


Run the following commands via SSH

In [11]:
import boto3
import botocore
# import boto
import paramiko
import os
from dotenv import load_dotenv
load_dotenv(dotenv_path="../.env")

pem_key = os.getenv("AWS_PEM_KEY")
pem_location = os.getenv("AWS_PEM_LOCATION")
pub_dns_name = instance.public_ip_address

def exec_cmd_ec2(public_dns_name,cmd):
    # Run one command over SSH as ec2-user and return stdout as a list of byte strings
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    privkey = paramiko.RSAKey.from_private_key_file(pem_location+pem_key+'.pem')
    ssh.connect(public_dns_name,username='ec2-user',pkey=privkey)
    try:
        stdin, stdout, stderr = ssh.exec_command(cmd)
        data = stdout.read().splitlines()
    finally:
        # Always release the connection, even if the command fails
        ssh.close()
    return data

From what I can tell, the Miniconda setup on this image only finishes after UserData has completed, so the conda packages cannot be installed from UserData.

Instead, we will run the following commands through the SSH client with root access
```
sudo su
while read requirement; do conda install --yes $requirement; done < /mys3bucket/requirements.txt
```

In [12]:
ssh_result = exec_cmd_ec2(pub_dns_name,"mkdir /mys3bucket/output")
ssh_result = exec_cmd_ec2(pub_dns_name,"mkdir /mys3bucket/model")
ssh_result = exec_cmd_ec2(pub_dns_name,"echo 'for requirement in `cat /mys3bucket/requirements.txt` ; do  conda install --yes  ${requirement}  ;  done' > /mys3bucket/setup.sh")
ssh_result = exec_cmd_ec2(pub_dns_name,"chmod +x /mys3bucket/setup.sh")
ssh_result = exec_cmd_ec2(pub_dns_name,"sudo  -i /mys3bucket/setup.sh")

### Execute Python Script<a id=Execute_Code_EC2_Script></a>

Execute the train.py file!
```
cd /mys3bucket
python train.py
```

In [21]:
ssh_result = exec_cmd_ec2(pub_dns_name,"cd /mys3bucket/; python /mys3bucket/train.py")

[ print(l) for l in ssh_result]

b'Starting the training.'
b'Fitting 5 folds for each of 36 candidates, totalling 180 fits'
b'Training complete.'


[None, None, None]
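
Two things in the output above are side effects of how it is printed: `stdout.read()` returns bytes, hence the `b'...'` prefixes, and the list comprehension collects the `None` returned by each `print` call, which the notebook then echoes as `[None, None, None]`. A small sketch of a cleaner way to show the result, where `decode_lines` is a hypothetical helper and `sample` stands in for `ssh_result`:

```python
def decode_lines(raw_lines, encoding="utf-8"):
    """Convert a list of byte strings into a list of text lines."""
    return [line.decode(encoding) for line in raw_lines]


# Stand-in for the value returned by exec_cmd_ec2
sample = [b"Starting the training.", b"Training complete."]

for line in decode_lines(sample):
    print(line)
```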

### Stop \ Terminate EC2 Instance<a id=Execute_Code_EC2_Stop></a>



In [22]:
import boto3
client = boto3.client('ec2')

response = client.stop_instances(
    InstanceIds=[
        instances[0].id,
    ]
)

In [23]:
response

{'StoppingInstances': [{'CurrentState': {'Code': 64, 'Name': 'stopping'},
   'InstanceId': 'i-02076ade24fd81199',
   'PreviousState': {'Code': 16, 'Name': 'running'}}],
 'ResponseMetadata': {'RequestId': 'e75439cf-78b4-4287-863f-c5cd7515f6ac',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'text/xml;charset=UTF-8',
   'content-length': '579',
   'date': 'Tue, 12 Nov 2019 02:43:59 GMT',
   'server': 'AmazonEC2'},
  'RetryAttempts': 0}}
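
The cell above only stops the instance. A stopped instance keeps its EBS volume and can be started again, while a terminated one is deleted. A minimal sketch of the terminate path, assuming the same `boto3.client('ec2')` object (`terminate_and_wait` is a hypothetical helper):

```python
def terminate_and_wait(ec2_client, instance_id):
    """Terminate one instance and block until EC2 reports it as terminated."""
    response = ec2_client.terminate_instances(InstanceIds=[instance_id])
    waiter = ec2_client.get_waiter("instance_terminated")
    waiter.wait(InstanceIds=[instance_id])
    return response
```

With the names used in this notebook, the call would be `terminate_and_wait(client, instances[0].id)`.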

[Go back to top](#top)
## Sagemaker<a id=Execute_Code_SM></a>
SageMaker requires the S3 folder structure to be set up a little differently
```
example_src/
+----data/
    +--iris.csv
+----output/
```

### Setup S3<a id=Execute_Code_SM_S3></a>
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_iris/Scikit-learn%20Estimator%20Example%20With%20Batch%20Transform.ipynb

In [39]:
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# from sagemaker import get_execution_role
# role = get_execution_role()
# role
role = "arn:aws:iam::741519135447:role/service-role/AmazonSageMaker-ExecutionRole-20191018T183794"

In [43]:
import numpy as np
import os
from sklearn import datasets

# Load Iris dataset, then join labels and features
iris = datasets.load_iris()
joined_iris = np.insert(iris.data, 0, iris.target, axis=1)

# Create directory and write csv
os.makedirs('./data', exist_ok=True)
np.savetxt('./data/iris.csv', joined_iris, delimiter=',', fmt='%1.1f, %1.3f, %1.3f, %1.3f, %1.3f')

WORK_DIRECTORY = 'data'

train_input = sagemaker_session.upload_data(WORK_DIRECTORY, key_prefix="{}/{}".format("iris-train", WORK_DIRECTORY) )

print(train_input)

s3://sagemaker-us-east-1-741519135447/iris-train/data


In [47]:
from sagemaker.sklearn.estimator import SKLearn

script_path = '../example_src/sm_train.py'

sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.c4.xlarge",  # SageMaker Python SDK v1 name; v2 renamed it to instance_type
    role=role,
    sagemaker_session=sagemaker_session,
    hyperparameters={'max_leaf_nodes': 30})
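
The contents of `sm_train.py` are not shown in this notebook. As a rough sketch of how a script-mode entry point usually receives its inputs, assuming the conventions of the linked AWS scikit-learn example: hyperparameters arrive as command-line flags, and the container sets the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables. The local fallback values `model` and `data` are assumptions added so the sketch also runs outside SageMaker.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments a SageMaker script-mode training job passes in."""
    parser = argparse.ArgumentParser()
    # Hyperparameters given to the estimator arrive as command-line flags.
    parser.add_argument("--max_leaf_nodes", type=int, default=-1)
    # Inside the training container these environment variables are set;
    # the fallbacks are only for running the script locally.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "data"))
    return parser.parse_args(argv)


args = parse_args(["--max_leaf_nodes", "30"])
print(args.max_leaf_nodes, args.model_dir, args.train)
```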

In [48]:
sklearn.fit({'train': train_input})

2019-11-10 22:56:06 Starting - Starting the training job...
2019-11-10 22:56:11 Starting - Launching requested ML instances.........
2019-11-10 22:57:42 Starting - Preparing the instances for training...
2019-11-10 22:58:36 Downloading - Downloading input data...
2019-11-10 22:58:58 Training - Downloading the training image..
2019-11-10 22:59:18,182 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2019-11-10 22:59:18,185 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-11-10 22:59:18,195 sagemaker_sklearn_container.training INFO     Invoking user training script.
2019-11-10 22:59:18,495 sagemaker-containers INFO     Module sm_train does not provide a setup.py. 
Generating setup.py
2019-11-10 22:59:18,495 sagemaker-containers INFO     Generating setup.cfg
2019-11-10 22:59:18,495 sagemaker-containers INFO     Generating MANIFEST.in
2019-11-10 22:59:18,4

In [49]:
predictor = sklearn.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*

UnexpectedStatusException: Error hosting endpoint sagemaker-scikit-learn-2019-11-10-22-56-06-322: Failed. Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

In [50]:
import itertools
import pandas as pd

shape = pd.read_csv("data/iris.csv", header=None)

a = [50*i for i in range(3)]
b = [40+i for i in range(10)]
indices = [i+j for i,j in itertools.product(a,b)]

test_data = shape.iloc[indices[:-1]]
test_X = test_data.iloc[:,1:]
test_y = test_data.iloc[:,0]
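
The index arithmetic above relies on the Iris CSV having 150 rows ordered by class, 50 rows per class. It picks the last ten rows of each class block, and the `[:-1]` slice then drops the final one. The sketch below repeats only the index computation so the selection can be inspected without pandas:

```python
import itertools

a = [50 * i for i in range(3)]   # first row of each class block: 0, 50, 100
b = [40 + i for i in range(10)]  # offsets 40..49 within a block
indices = [i + j for i, j in itertools.product(a, b)]

# First three indices, last three indices, and the size after dropping the last
print(indices[:3], indices[-3:], len(indices[:-1]))
```

The test set therefore contains 29 rows: rows 40 to 49, 90 to 99, and 140 to 148.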

In [51]:

print(predictor.predict(test_X.values))
print(test_y.values)

NameError: name 'predictor' is not defined