# Building a Scikit-Learn base Docker Image
We start our MLOps journey by creating a generic base Docker image that supports Scikit-learn algorithms and models.

After we create the Dockerfile and test it locally, we'll send it to our first pipeline, which will build the image and make it available in ECR.

![Docker Diagram](../../imgs/DockerScikit_A.jpg)

## First, let's create a Dockerfile

In [None]:
%%writefile Dockerfile
FROM python:3.6-jessie

RUN apt-get update -y && apt-get install -y libev-dev
RUN pip install bottle bjoern opencv-python pandas numpy scipy scikit-learn

RUN mkdir -p /opt/program
RUN mkdir -p /opt/ml

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

COPY app.py /opt/program
WORKDIR /opt/program

EXPOSE 8080
ENTRYPOINT ["python", "app.py"]

## Then, a basic application that will host our model code

Notice that we're creating a web service application with two routes: **ping** and **invocations**. Ping is the health check and invocations calls your model.

For a production environment it is important to use a **WSGI** server. We will use a combination of **bottle** and **bjoern**: bottle is our web framework and bjoern is our WSGI server. Since bjoern is single-threaded, it can't run multiple predictions at the same time. If you need concurrent predictions, consider gunicorn behind a reverse proxy that protects your endpoint.
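
To make the request contract concrete, here is a minimal client-side sketch that builds (but does not send) a request for the **invocations** route. The `build_invocation` helper, the `http://localhost:8080` base URL and the `"logistic"` algorithm name are assumptions for illustration; only the route and the `X-Amzn-SageMaker-Custom-Attributes` header come from the application described here.

```python
import json
import urllib.request


def build_invocation(payload, algo, base_url="http://localhost:8080"):
    """Build a POST request for /invocations (hypothetical helper).

    payload: any JSON-serializable object; the app parses the body as JSON.
    algo:    value for the custom-attributes header the app forwards to the model.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/invocations",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Amzn-SageMaker-Custom-Attributes": algo,
        },
    )


# "logistic" is a made-up algorithm name
req = build_invocation([[5.1, 3.5, 1.4, 0.2]], "logistic")
print(req.get_method(), req.full_url)
```

With a container running locally, passing `req` to `urllib.request.urlopen` would send the request; a plain GET on `/ping` should return an empty body if the service is healthy.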

In [None]:
%%writefile app.py
import json
import pickle
import sys
import os
import bjoern
import bottle

from bottle import run, request, post, get
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

# adds the model.py path to the list
prefix = '/opt/ml'
model_path = os.path.join(prefix, 'model')
sys.path.insert(0,model_path)

print(os.listdir(model_path))
import model

@get('/ping')
def ping():
    return ""

@post('/invocations')
def invoke():
    # parse the JSON request body; the custom-attributes header selects the algorithm
    req = json.loads(request.body.read())
    algo = request.get_header('X-Amzn-SageMaker-Custom-Attributes')
    return json.dumps(model.predict(req, algo))
    
if __name__ == '__main__':
    if len(sys.argv) < 2 or sys.argv[1] not in ("serve", "train", "test"):
        raise Exception("Invalid argument: pass 'train' for training mode, 'serve' for prediction mode, or 'test' for a one-off local prediction")

    train = sys.argv[1] == "train"
    test = sys.argv[1] == "test"
    
    if train:
        model.train()

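    elif test:
        # usage: app.py test <algo> <payload>; payload is a Python literal
        import ast
        algo = sys.argv[2]
        # literal_eval only parses literals, so it cannot run arbitrary code
        req = ast.literal_eval(sys.argv[3])
        print( model.predict(req, algo) )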
       
    else:
        bjoern.run(bottle.app(), "0.0.0.0", 8080)
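
The application above imports a `model` module from `/opt/ml/model` and calls `model.train()` and `model.predict(req, algo)`. The base image does not ship that module, so the following is only a minimal sketch of what it could look like. The mean-value baseline, the `MODEL_DIR` environment variable, the `baseline.pkl` file name and the extra keyword arguments are all assumptions made for illustration, not part of this workshop's code.

```python
# model.py (hypothetical sketch): the two functions app.py expects
import os
import pickle
import statistics

# Assumption: MODEL_DIR is a made-up override so the sketch also runs
# outside the container; inside it, artifacts live in /opt/ml/model.
MODEL_DIR = os.environ.get("MODEL_DIR", "/opt/ml/model")
ARTIFACT = "baseline.pkl"  # hypothetical artifact name


def train(targets=None, model_dir=MODEL_DIR):
    """Fit a trivial baseline (the mean of the targets) and persist it."""
    targets = targets or [1.0, 2.0, 3.0]  # placeholder training data
    artifact = {"mean": statistics.mean(targets)}
    os.makedirs(model_dir, exist_ok=True)
    with open(os.path.join(model_dir, ARTIFACT), "wb") as f:
        pickle.dump(artifact, f)
    return artifact


def predict(req, algo=None, model_dir=MODEL_DIR):
    """Return one prediction per input row.

    req:  list of rows, as parsed from the request body.
    algo: value of the custom-attributes header; unused by this baseline.
    """
    with open(os.path.join(model_dir, ARTIFACT), "rb") as f:
        artifact = pickle.load(f)
    return {
        "algorithm": algo or "baseline",
        "predictions": [artifact["mean"] for _ in req],
    }
```

Because both functions can be called with the arguments `app.py` passes (`train()` and `predict(req, algo)`), a real Scikit-learn estimator could replace the baseline without changing the application.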

## Finally, let's create the buildspec
This file will be used by CodeBuild to create our base image.

In [None]:
%%writefile buildspec.yml
version: 0.2

phases:
  install:
    runtime-versions:
      docker: 18
        
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG

  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - echo docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
      - echo $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG > image.url
      - echo Done!
artifacts:
  files:
    - image.url
  name: image_url
  discard-paths: yes
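
The buildspec tags and pushes the image under a URI assembled from four environment variables, and writes that same URI to `image.url`. As a small sketch of the composition, the helper below rebuilds the string in Python; `ecr_image_uri` is a hypothetical name and the account ID and region are placeholder values.

```python
def ecr_image_uri(account_id, region, repo_name, image_tag):
    """Compose the ECR image URI the same way the buildspec does."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{image_tag}"


# placeholder account ID and region
print(ecr_image_uri("123456789012", "us-east-1", "model", "test"))
# -> 123456789012.dkr.ecr.us-east-1.amazonaws.com/model:test
```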

### Building the image locally, first

In [None]:
!sudo docker build -f Dockerfile -t base-image:latest .
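
Once the image is built, it can be smoke-tested locally. Since `app.py` imports `model` from `/opt/ml/model` at startup, the container needs a directory containing a `model.py` mounted at that path. The sketch below only assembles the `docker run` command line; `docker_run_args` and the host path are assumptions for illustration, and the command still has to be executed separately (for example with `subprocess.run`).

```python
import shlex


def docker_run_args(image, model_dir, mode="serve", port=8080, extra=()):
    """Build the argv for `docker run` (hypothetical helper).

    mode and extra are appended after the image name, so they become
    the arguments of the image's ENTRYPOINT (python app.py ...).
    """
    if mode not in ("serve", "train", "test"):
        raise ValueError("mode must be 'serve', 'train' or 'test'")
    args = ["docker", "run", "--rm", "-v", f"{model_dir}:/opt/ml/model"]
    if mode == "serve":
        # publish the port exposed by the Dockerfile
        args += ["-p", f"{port}:8080"]
    return args + [image, mode, *extra]


# /home/user/model is a made-up host directory holding model.py
cmd = docker_run_args("base-image:latest", "/home/user/model")
print(shlex.join(cmd))
```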

### Before we push our code to the repo, let's check the building process

In [None]:
import boto3

sts_client = boto3.client("sts")
session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name
credentials = session.get_credentials()
credentials = credentials.get_frozen_credentials()

repo_name='model'
image_tag='test'

In [None]:
!mkdir -p tests
!cp app.py Dockerfile buildspec.yml tests/
# the with-block closes the file, so no explicit close() is needed
with open('tests/vars.env', 'w') as f:
    f.write("AWS_ACCOUNT_ID=%s\n" % account_id)
    f.write("IMAGE_TAG=%s\n" % image_tag)
    f.write("IMAGE_REPO_NAME=%s\n" % repo_name)
    f.write("AWS_DEFAULT_REGION=%s\n" % region)
    f.write("AWS_ACCESS_KEY_ID=%s\n" % credentials.access_key)
    f.write("AWS_SECRET_ACCESS_KEY=%s\n" % credentials.secret_key)
    # the session token only exists for temporary credentials
    if credentials.token:
        f.write("AWS_SESSION_TOKEN=%s\n" % credentials.token)

!cat tests/vars.env
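
Printing the file with `cat` also prints the AWS credentials into the notebook output. If the notebook may be shared, a redacted view is safer. The following is a minimal sketch; `redact_env` and the `****` mask are assumptions for illustration, and the sample text uses placeholder values.

```python
SECRET_KEYS = {"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"}


def redact_env(text, secret_keys=SECRET_KEYS):
    """Return KEY=VALUE lines with the values of secret keys masked."""
    lines = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in secret_keys:
            value = "****"
        lines.append(key + sep + value)
    return "\n".join(lines)


# placeholder values, not real credentials
sample = "AWS_ACCOUNT_ID=123456789012\nAWS_SECRET_ACCESS_KEY=abc123\n"
print(redact_env(sample))
```

To apply it to the real file, read `tests/vars.env` with `open(...).read()` and print the result of `redact_env` instead of using `cat`.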

In [None]:
%%time

!/tmp/aws-codebuild/local_builds/codebuild_build.sh \
    -a "$PWD/tests/output" \
    -s "$PWD/tests" \
    -i "samirsouza/aws-codebuild-standard:2.0" \
    -e "$PWD/tests/vars.env"

## Ok, now it's time to push everything to the correct repo

In [None]:
%%bash

cd ../../docker
cp $OLDPWD/buildspec.yml $OLDPWD/app.py $OLDPWD/Dockerfile .

git add --all
git commit -a -m " - build docker base image"
git push

### Ok, now open the AWS console in another tab and go to the CodePipeline console to see the status of our building pipeline