# Admin Setup
In this notebook we play the roles of a data engineer and a DevOps engineer, who maintain the S3 resources (datalake and project buckets) and, respectively, the code repository, the Docker image for model training/hosting, and the IAM credentials.

**This sample is provided for demonstration purposes. Make sure to conduct appropriate testing if you adapt this code for your own use cases!**


### Step 0: Give a title to be used in the name of all project artifacts (code repos, S3 buckets, ML artifacts, Lambda functions, automated pipeline, etc.)

In [None]:
WORKFLOW_NAME = "<Give a name to your project>"  # replace the placeholder with your own project name
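
Because the project name is reused in S3 bucket names of the form `<name>-datalake-<account id>`, it may help to check it up front. The following is a minimal sketch, not part of the original notebook: `validate_workflow_name` is a hypothetical helper, and the rule it applies (lowercase letters, digits and hyphens, short enough to keep the bucket name within S3's 63-character limit) is a deliberately conservative assumption.

```python
import re

S3_MAX_BUCKET_LEN = 63        # S3 bucket names are limited to 63 characters
ACCOUNT_ID_LEN = 12           # AWS account IDs are 12 digits long
LONGEST_SUFFIX = "-datalake-"  # longest infix this notebook adds to the name


def validate_workflow_name(name):
    """Return the name unchanged if it is safe to embed in the bucket names."""
    max_len = S3_MAX_BUCKET_LEN - len(LONGEST_SUFFIX) - ACCOUNT_ID_LEN
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", name):
        raise ValueError(
            "use lowercase letters, digits and hyphens only: {!r}".format(name)
        )
    if len(name) > max_len:
        raise ValueError("name is longer than {} characters".format(max_len))
    return name
```

A call such as `WORKFLOW_NAME = validate_workflow_name("mlops-demo")` would then fail early instead of at bucket creation time.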

### Step 1: Create CodeCommit Repo and Push Code to it
The cell below creates an AWS CodeCommit repo for this demo. It then adds, commits and pushes our code to this repo.

In [None]:
import pandas as pd
import boto3
from botocore.exceptions import ClientError
import logging
import json
import sagemaker
from sagemaker.s3 import S3Uploader
session = sagemaker.Session()
region = session.boto_session.region_name

codecommit_client = boto3.client('codecommit')

repo_name = WORKFLOW_NAME
repo_desc = "Automated model (re)training/tuning/hosting via AWS Lambda and Step Functions"


response = codecommit_client.create_repository(
    repositoryName=repo_name,
    repositoryDescription=repo_desc,
    tags={
        'author': "author name"
    }
)
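
Re-running the cell above fails once the repository exists. One way to make the step repeatable is sketched here, under the assumption that falling back to the existing repository is acceptable; `ensure_repository` is a hypothetical helper that takes the CodeCommit client as an argument.

```python
def ensure_repository(client, name, description):
    """Create the CodeCommit repo, or return the existing one with that name.

    `client` is expected to behave like boto3.client('codecommit').
    """
    try:
        response = client.create_repository(
            repositoryName=name,
            repositoryDescription=description,
        )
    except client.exceptions.RepositoryNameExistsException:
        # The repo is already there: look it up instead of failing.
        response = client.get_repository(repositoryName=name)
    return response["repositoryMetadata"]

# Possible usage in this notebook:
# metadata = ensure_repository(codecommit_client, repo_name, repo_desc)
```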

#### Clone the above (empty) CodeCommit repo. This repo will be ingested by our pipeline automation step.

In [None]:
!git clone https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name} /home/ec2-user/SageMaker/{repo_name}

#### Copy code to the repo (local)

In [None]:
!cp -r ./* /home/ec2-user/SageMaker/{repo_name}/

#### Add and commit code files to the local repo, then push them to the master branch of the remote repo on CodeCommit

In [None]:
%cd /home/ec2-user/SageMaker/{repo_name}/
!git add .
!git commit -m "add your comment here..."

!git push

#### Go back to dev folder

In [None]:
%cd /home/ec2-user/SageMaker/serverless-mlops-with-aws-sagemaker-lambda-and-stepfunctions/

### Step 2: Create a toy "datalake" S3 and a "my-project" bucket
The following cell creates two S3 buckets, `my-datalake` and `my-project`; feel free to choose whatever names you like. The two buckets are used as follows:
* The `my-datalake` bucket receives our labeled but pre-processed dataset. We will not modify this bucket; maintaining it is the job of a data engineer.
* The `my-project` bucket is where we do our work. We store everything in it, from our processed datasets to our source code, model binaries and model performance statistics, all versioned by the date on which the workflow was launched.
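
To illustrate what versioning artifacts by launch date could look like, here is a small sketch. The key layout, the timestamp format and the helper name `versioned_uri` are all assumptions made for illustration; the actual layout is decided by the workflow code, not by this notebook.

```python
from datetime import datetime


def versioned_uri(bucket, launched_at, artifact):
    """Build an S3 URI that groups artifacts under the workflow launch time."""
    stamp = launched_at.strftime("%Y-%m-%d-%H-%M-%S")
    return "s3://{}/{}/{}".format(bucket, stamp, artifact)


# Hypothetical example: every artifact of one run shares the same prefix.
launch = datetime(2020, 1, 2, 3, 4, 5)
print(versioned_uri("my-project", launch, "model/model.tar.gz"))
print(versioned_uri("my-project", launch, "metrics/report.json"))
```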

In [None]:
def create_bucket(bucket_name, region=None):
    """Create an S3 bucket in a specified region

    If a region is not specified, the bucket is created in the S3 default
    region (us-east-1).

    :param bucket_name: Bucket to create
    :param region: String region to create bucket in, e.g., 'us-west-2'
    :return: True if bucket created, else False
    """

    # Create bucket
    try:
        if region is None:
            s3_client = boto3.client('s3')
            s3_client.create_bucket(Bucket=bucket_name)
        else:
            s3_client = boto3.client('s3', region_name=region)
            location = {'LocationConstraint': region}
            s3_client.create_bucket(Bucket=bucket_name,
                                    CreateBucketConfiguration=location)
    except ClientError as e:
        logging.error(e)
        return False
    return True


# Create S3 Buckets for this project (pass if they exist)
account_id = boto3.client('sts').get_caller_identity().get('Account')
source_bucket = "{}-datalake-{}".format(WORKFLOW_NAME, account_id)
project_bucket = "{}-project-{}".format(WORKFLOW_NAME, account_id)

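# Outside us-east-1, S3 requires an explicit LocationConstraint, so pass the session's region there
bucket_region = None if region == "us-east-1" else region
create_bucket(bucket_name=source_bucket, region=bucket_region)
create_bucket(bucket_name=project_bucket, region=bucket_region)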
print("*****************************Storage*******************************")
print("Source Bucket: {bucket}".format(bucket=source_bucket))
print("Project Bucket: {bucket}".format(bucket=project_bucket))
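
The region handling in `create_bucket` comes down to one rule: S3 expects no `CreateBucketConfiguration` in `us-east-1` and an explicit `LocationConstraint` everywhere else. A minimal sketch of that rule as a pure function follows; `create_bucket_kwargs` is a hypothetical helper, not part of boto3.

```python
def create_bucket_kwargs(bucket_name, region=None):
    """Return the keyword arguments to pass to s3_client.create_bucket()."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the S3 default region and must not be sent as a constraint.
    if region is not None and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs

# Possible usage:
# s3_client = boto3.client('s3', region_name=region)
# s3_client.create_bucket(**create_bucket_kwargs(project_bucket, region))
```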

#### Upload toy data to the source-bucket "datalake". Here we use the Boston house prices dataset

In [None]:
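# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn release
from sklearn.datasets import load_boston
data = load_boston()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['PRICE'] = data.target
df.to_csv("boston.csv", index=False)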
data_source = S3Uploader.upload(local_path='boston.csv',
                               desired_s3_uri="s3://{}/{}".format(source_bucket, "data"),
                               #session=session
                               )
print("Source Dataset: {} \n".format(data_source))

### Step 3: Create Docker image for model training/hosting

Our training container needs a Docker image hosted in Amazon ECR. Rather than building our own, in this demo we will use the pre-built [sagemaker-scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-docker-containers-frameworks.html) image.

In [None]:
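# Build the ECR image URI of the pre-built scikit-learn container for the training job
# (despite its name, container_arn returns an image URI, not an ARN)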
def container_arn(region):
    image_registry_map = {
        'us-west-1': '746614075791',
        'us-west-2': '246618743249',
        'us-east-1': '683313688378',
        'us-east-2': '257758044811',
        'ap-northeast-1': '354813040037',
        'ap-northeast-2': '366743142698',
        'ap-southeast-1': '121021644041',
        'ap-southeast-2': '783357654285',
        'ap-south-1': '720646828776',
        'eu-west-1': '141502667606',
        'eu-west-2': '764974769150',
        'eu-central-1': '492215442770',
        'ca-central-1': '341280168497',
        'us-gov-west-1': '414596584902',
        'us-iso-east-1': '833128469047',
    }
    return (image_registry_map[region] + '.dkr.ecr.' + region 
            + '.amazonaws.com/sagemaker-scikit-learn:0.20.0-cpu-py3')

print("**************************Training Image***************************")
print("Docker Training Image: {} \n".format(container_arn(region)))
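
`container_arn` raises a bare `KeyError` when the notebook runs in a region missing from the map. The sketch below shows one way to fail with a clearer message. `sklearn_image_uri` is a hypothetical name, and the map is shortened to two of the entries listed above purely to keep the example small.

```python
# Subset of the registry map above, shortened for illustration only.
SKLEARN_REGISTRY_ACCOUNTS = {
    "us-east-1": "683313688378",
    "us-west-2": "246618743249",
}


def sklearn_image_uri(region, tag="0.20.0-cpu-py3"):
    """Return the ECR URI of the pre-built scikit-learn image for a region."""
    try:
        account = SKLEARN_REGISTRY_ACCOUNTS[region]
    except KeyError:
        raise ValueError(
            "no scikit-learn image registry known for region {!r}".format(region)
        ) from None
    return "{}.dkr.ecr.{}.amazonaws.com/sagemaker-scikit-learn:{}".format(
        account, region, tag
    )
```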

### Step 4: Create IAM Policies and Roles
Finally, we create an IAM role for our workflow. IAM roles grant services permissions within your AWS environment. For this demo, we create a single role that Step Functions, SageMaker and Lambda can all assume, and attach the AWS-managed `AdministratorAccess` policy to it.
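
A role's trust policy lists which services may assume it, with one `sts:AssumeRole` statement per service principal. As a rough sketch of that structure, the hypothetical helper `build_trust_policy` generates such a document from a list of principals; it is an illustration, not the notebook's own code.

```python
import json


def build_trust_policy(service_principals):
    """Return a trust policy that lets each listed AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
            for service in service_principals
        ],
    }


trust_policy_sketch = build_trust_policy(
    ["states.amazonaws.com", "sagemaker.amazonaws.com", "lambda.amazonaws.com"]
)
print(json.dumps(trust_policy_sketch, indent=2))
```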

In [None]:
# Create IAM Policies and Roles for this project (pass if they exist)
iam = boto3.client('iam')

# Set Role/Policy names
workflow_role_name = WORKFLOW_NAME+"-iam-role"
workflow_role_description = "Role to allow a step function systematic access to invoke Lambda functions, run data processing jobs in Glue, run SageMaker training jobs and update endpoints"

# Get Admin Policy ARN in order to give our workflow role the permission
# it needs to run the entire workflow on its own
aws_managed_admin_policy_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"
# Note: This is not a recommended practice in dev/prod environments. Please work with
# your admin on an IAM setup that addresses your security requirements.


# This trust policy allows other services to use this role (lambda/step-functions)
trust_policy = {
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "states.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
        "Effect": "Allow",
        "Principal": {
            "Service": "sagemaker.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

# Create Step Function Role (pass if it already exists)
try:
    step_function_role = iam.create_role(
        RoleName=workflow_role_name,
        Description=workflow_role_description,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
        MaxSessionDuration=3600
    )
except Exception as e:
    print(e)
    print("Unable to create {}".format(workflow_role_name))

    
# Attach the Admin Full Access Policy
try:
    iam.attach_role_policy(
        PolicyArn=aws_managed_admin_policy_ARN,
        RoleName=workflow_role_name
    )
except Exception as e:
    print(e)
    print("Unable to attach {} to {}".format(aws_managed_admin_policy_ARN, workflow_role_name))


# Get Role ARN in order to print
workflow_role_ARN = "arn:aws:iam::{account}:role/{name}".format(account=account_id, name=workflow_role_name)
print("Step Function Role ARN: {arn}".format(arn = workflow_role_ARN))
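
The cell above catches every exception from `create_role` and builds the role ARN by string formatting. An alternative, sketched here under the assumption that reusing an existing role of the same name is acceptable, is to handle only the "already exists" case and read the ARN from the IAM response. `ensure_role` is a hypothetical helper that takes the IAM client as an argument.

```python
import json


def ensure_role(iam_client, role_name, trust_policy, description=""):
    """Create the role if needed and return its ARN as reported by IAM.

    `iam_client` is expected to behave like boto3.client('iam').
    """
    try:
        response = iam_client.create_role(
            RoleName=role_name,
            Description=description,
            AssumeRolePolicyDocument=json.dumps(trust_policy),
        )
    except iam_client.exceptions.EntityAlreadyExistsException:
        # The role was created by an earlier run: look it up instead.
        response = iam_client.get_role(RoleName=role_name)
    return response["Role"]["Arn"]

# Possible usage in this notebook:
# workflow_role_ARN = ensure_role(iam, workflow_role_name, trust_policy,
#                                 workflow_role_description)
```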

### Write the setup ARNs to disk

In [None]:
admin_setup = {
    "workflow_name": WORKFLOW_NAME,
    "source_bucket":source_bucket,
    "raw_data_path":data_source,
    "project_bucket":project_bucket,
    "repo_name":repo_name,
    "docker_image": container_arn(region),
    "workflow_execution_role": workflow_role_ARN
}

with open('admin_setup.txt', 'w') as filehandle:
    filehandle.write(json.dumps(admin_setup))
    
admin_setup
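
A later notebook that consumes `admin_setup.txt` could read it back as sketched below. This is only an illustration: `load_admin_setup` is a hypothetical helper, and the set of required keys simply mirrors the dictionary written above.

```python
import json

REQUIRED_KEYS = {
    "workflow_name",
    "source_bucket",
    "raw_data_path",
    "project_bucket",
    "repo_name",
    "docker_image",
    "workflow_execution_role",
}


def load_admin_setup(path="admin_setup.txt"):
    """Load the admin setup file and check that every expected key is present."""
    with open(path) as filehandle:
        setup = json.load(filehandle)
    missing = REQUIRED_KEYS - setup.keys()
    if missing:
        raise KeyError("admin setup is missing keys: {}".format(sorted(missing)))
    return setup
```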