# Introduction to Deployment

This section is as platform-agnostic as possible, but the application part focuses on Amazon Web Services (AWS).

Contents

- Cloud Computing
- Machine Learning in the Workplace
- Deployment 

After getting familiar with machine learning deployment, we'll put these ideas into practice using [Amazon SageMaker](https://aws.amazon.com/sagemaker/) as one way to deploy machine learning models.

Questions to answer: 

- What is the machine learning workflow?
- How does **deployment** fit into the machine learning workflow?
- What is cloud computing?
- Why are we using cloud computing for deploying machine learning models?
- Why isn't deployment a part of many machine learning curriculums?
- What does it mean for a model to be deployed?
- What are the essential characteristics associated with the code of deployed models?
- What are the different cloud computing platforms we might use to deploy our machine learning models?

## Machine Learning Workflow

Consists of three components: 

1. Explore & Process Data
 - Retrieve data
 - Clean & Explore: Explore patterns, remove any outliers
 - Transform and prepare: Data Normalization, train-validation-test split
2.  Modeling
 - Develop & Train Model
 - Validate / Evaluate Model
3. Deployment 
 - Deploy to Production
 - Monitor and Update Model & Data 
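
The "Transform and prepare" step above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the 60/20/20 split, and the toy data are assumptions made for the example, not part of any particular platform.

```python
import random
import statistics

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_normalizer(values):
    """Return a function that standardizes values with the given mean and std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0   # avoid division by zero
    return lambda x: (x - mean) / std

data = [float(x) for x in range(100)]        # toy feature values (illustrative)
train, val, test = train_val_test_split(data)
normalize = fit_normalizer(train)            # statistics come from the training set only
train_norm = [normalize(x) for x in train]
val_norm = [normalize(x) for x in val]       # validation data reuses the training statistics
```

Fitting the normalizer on the training split only keeps information from the validation and test sets out of the modeling step.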

References: 

- AWS discusses their definition of the [ML Workflow](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html)
- Google Cloud Platform (GCP) and their definition of the [ML Workflow](https://cloud.google.com/ml-engine/docs/tensorflow/ml-solutions-overview)
- Microsoft Azure on their definition of the [ML Workflow](https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml)

## Cloud Computing

Can be thought of as transforming an IT product into a service. 

> Using an internet-connected device to log into a cloud computing service to access an IT resource. These IT resources are stored in the cloud provider's data center. 

Other cloud services than cloud storage: 

- Cloud applications, databases, virtual machines, SageMaker, etc. 

#### Why use cloud computing?

We opt to use cloud computing services due to the time and cost constraints of building our own capacity (see the capacity utilization graph below). The graph shows schematically how cloud computing compares to traditional infrastructure in relation to customer demand.

<img src="../images/curve3.png">

Building up or scaling down infrastructure takes time and money, whereas cloud computing scales easily with current demand. Because both excess capacity and insufficient capacity are costly, it is economically reasonable to rely on cloud computing services. 
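
The trade-off can be made concrete with a small calculation. All numbers below are hypothetical, and the sketch assumes the same unit price for owned and cloud capacity, which is a simplification chosen only to isolate the effect of idle and missing capacity.

```python
demand = [20, 35, 50, 80, 120, 90, 60, 40]   # hypothetical demand per period
fixed_capacity = 80                           # capacity of self-built infrastructure
unit_cost = 1.0                               # assumed identical for both options

# Fixed infrastructure is paid for in every period, whether it is used or not.
fixed_cost = fixed_capacity * unit_cost * len(demand)
idle_capacity = sum(max(fixed_capacity - d, 0) for d in demand)
unmet_demand = sum(max(d - fixed_capacity, 0) for d in demand)

# Elastic (cloud) capacity follows demand, so only what is used is paid for.
elastic_cost = sum(d * unit_cost for d in demand)
```

With these numbers the fixed setup pays for idle capacity in quiet periods and still cannot serve the peaks, while the elastic setup covers all demand.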

- AWS: [What is Cloud Computing?](https://aws.amazon.com/de/what-is-cloud-computing/)

Benefits: 

1. Reduced investments and proportional costs (cost reduction)
2. Increased scalability (providing simplified capacity planning)
3. Increased availability and reliability (providing organizational agility)

Risks:

1. (Potential) Increase in Security Vulnerabilities
2. Reduced Operational Governance Control (over cloud resources)
3. Limited Portability Between Cloud Providers
4. Multi-regional Compliance and Legal Issues

### Deployment to Production

- Integrate machine learning model into an existing production environment 
- Model needs to be provided to those responsible for deployment. 

In the following we will assume that the machine learning model was developed in Python. 

Three primary methods used to transfer a model from the modeling component to the deployment component (least to most commonly used): 

- Python model is recoded into the programming language of the production environment
- Model is coded in Predictive Model Markup Language (PMML) or Portable Format for Analytics (PFA)
- Python model is converted into a format that can be used in the production environment (e.g. SageMaker).
 - Use libraries and methods that convert a model built with PyTorch, TensorFlow, Scikit-Learn, etc. into an intermediate standard format, such as the [Open Neural Network Exchange](https://onnx.ai/) (ONNX) format.
 - This standard format can be converted into the software native of the production environment. 
 
The last one is the easiest and fastest way to move a Python model from modeling directly to deployment: 

- Typical way to move models into the production environment
- Technologies like *containers*, *endpoints* and *APIs* (Application Programming Interfaces) also help ease the work required for deploying a model into a production environment. 
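
The idea behind the third method, converting a trained model into a neutral format that the production side can load, can be sketched as follows. JSON is used here purely as a stand-in for a real interchange format such as ONNX, and the model, its parameter values, and the `toy-linear-v1` format tag are hypothetical.

```python
import json

# "Modeling" side: a trained linear model, represented by its learned parameters.
trained_model = {"weights": [0.5, -1.2, 2.0], "bias": 0.1}   # hypothetical values

def export_model(model):
    """Serialize the model parameters to a language-neutral JSON string."""
    return json.dumps({"format": "toy-linear-v1", "params": model})

def load_model(serialized):
    """'Production' side: rebuild a predict function from the serialized form."""
    params = json.loads(serialized)["params"]
    weights, bias = params["weights"], params["bias"]

    def predict(features):
        return sum(w * x for w, x in zip(weights, features)) + bias

    return predict

predict = load_model(export_model(trained_model))
prediction = predict([1.0, 2.0, 3.0])
```

The production side never imports the training code; it only needs to understand the serialized format, which is the property a standard format is meant to provide.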

In the past, development was typically handled by analysts, whereas operations (deployment) was handled by the software developers responsible for the production environment. 

Recently, this division between development and operations has softened, enabling analysts to handle certain aspects of deployment and allowing faster updates to faltering models. 

Advances in cloud services, like [SageMaker](https://aws.amazon.com/sagemaker/) and [ML Engine](https://cloud.google.com/ml-engine/), and deployment technologies, like containers and REST APIs, allow analysts to take on the responsibilities of deployment more easily. 

### Production Environments

- Endpoint: Interface to the model
- The interface (endpoint) facilitates communication between the model and the application.

<img src="../images/endpoint2.png">

One way to think of the **endpoint** that acts as this interface: 

- the **endpoint** itself is like a function call
- the **function** itself would be the model and
- the **Python program** is the application

Similar to the example above:

- **Endpoint** accepts user data as the **input** and **returns** the model's prediction based upon this input through the endpoint (similar to a function call)
- In the example, the user data is the input argument and the prediction is the returned value from the function call. 
- The **application**, here the **Python program**, displays the model's prediction to the application user. 

The endpoint itself is just the interface between the model and the application. 

- The interface enables users to get predictions from the deployed model based on their user data. 
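
The function-call analogy can be written out directly. In this sketch the "model" is a hypothetical hand-written rule rather than a trained model, and all names are chosen for illustration.

```python
def model(user_data):
    """Stand-in for the deployed model (a hypothetical rule, not a trained model)."""
    return "positive" if sum(user_data) > 0 else "negative"

def application(user_data):
    """The application: passes user data through the endpoint and shows the result."""
    prediction = model(user_data)      # this call plays the role of the endpoint
    return f"Model prediction: {prediction}"

message = application([0.4, -0.1, 0.3])
```

The application only knows how to call the model and what comes back; it does not need to know how the model computes its prediction.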

#### How does the endpoint (interface) facilitate communication between application and model?

Application and model communicate through the endpoint (interface). The endpoint is an Application Programming Interface (API). 

- API: set of rules that enable programs (here the application and the model) to communicate with each other

Here, the *API* uses a **RE**presentational **S**tate **T**ransfer (**REST**) architecture, which provides a framework for the set of rules and constraints that must be adhered to for communication between programs. 

- Hypertext Transfer Protocol (HTTP): application protocol for distributed, collaborative, hypermedia information systems. Foundation of data communication for WWW. 
- **REST API** is one that uses HTTP requests and responses to enable communication between the application and the model through the endpoint (interface). 
- **HTTP request** and **HTTP response** are communications sent between the application and model. 

#### HTTP request

The HTTP request sent from the application to the model consists of four parts: 

- Endpoint: The endpoint takes the form of a Uniform Resource Locator (URL), aka web address
- HTTP method: One of the four **HTTP methods** shown below. For deployment of our application we'll use the **POST method**.
- HTTP Headers: The **headers** will contain additional information (like the data format within the message) that's passed to the receiving program. 
- Message (Data or Body): The final part is the **message** (data or body); for deployment this will contain the user's data, which is input into the model. 
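
The four parts can be seen by building (but not sending) a request with Python's standard `urllib.request.Request`. The endpoint URL and the CSV payload below are hypothetical placeholders.

```python
from urllib.request import Request

endpoint_url = "https://example.com/models/demo/invocations"  # hypothetical endpoint
body = "5.1,3.5,1.4,0.2".encode("utf-8")                       # user data as one CSV row

request = Request(
    url=endpoint_url,                       # 1. endpoint (URL)
    method="POST",                          # 2. HTTP method
    headers={"Content-Type": "text/csv"},   # 3. headers: format of the message
    data=body,                              # 4. message (data or body)
)

# Collect the four parts back from the request object for inspection.
# Note: urllib stores header names in capitalized form ("Content-type").
parts = {
    "endpoint": request.full_url,
    "method": request.get_method(),
    "headers": dict(request.header_items()),
    "message": request.data,
}
```

Passing `request` to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint is not real.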

<img src="../images/httpmethods.png">

#### HTTP response

Sent from model to your application and is composed of three parts: 

- HTTP Status Code: If the user's data sent in the **message** was successfully received and processed, the status code should start with a 2 (e.g. 200)
- HTTP Headers: The headers will contain additional information, like the format of the data within the message, that's passed to the receiving program. 
- Message (Data or Body): What's returned as the data within the message is the prediction that's provided by the model. 

The prediction is then presented to the application user through the application. The endpoint is the interface that enables communication between the application and the model using a **REST API**. 
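
A request and response exchange of this kind can be simulated without any network. In the sketch below, `model_endpoint` and `read_prediction` are hypothetical helpers: the first plays the model side and returns the three response parts, the second plays the application side and checks the status code before using the message.

```python
import json

def model_endpoint(method, headers, body):
    """Hypothetical model side: returns (status_code, headers, message)."""
    if method != "POST":
        return 405, {"Content-Type": "text/plain"}, "method not allowed"
    if headers.get("Content-Type") != "text/csv":
        return 415, {"Content-Type": "text/plain"}, "unsupported media type"
    features = [float(v) for v in body.split(",")]
    prediction = sum(features) / len(features)   # placeholder "model"
    return 200, {"Content-Type": "application/json"}, json.dumps({"prediction": prediction})

def read_prediction(status, headers, body):
    """Application side: check the status code before trusting the message."""
    if not 200 <= status < 300:
        raise RuntimeError(f"request failed with status {status}: {body}")
    if headers.get("Content-Type") != "application/json":
        raise ValueError("unexpected response format")
    return json.loads(body)["prediction"]

status, headers, body = model_endpoint("POST", {"Content-Type": "text/csv"}, "1.0,2.0,3.0")
prediction = read_prediction(status, headers, body)
```

The response headers tell the application how to decode the message, which is why the application checks them before parsing.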

#### What's the application's responsibility?

- Format the user's data so that it can be put into the HTTP request message and used by the model
- Translate predictions from the HTTP response message into a form that's easy for the application's users to understand. 

Information included in the HTTP messages sent between **application** and **model**: 

- User's data will need to be in a CSV or JSON format with a specific ordering of the data. Ordering depends on the used model. 
- Often predictions will be returned in CSV or JSON format with a specific ordering of the returned predictions. Ordering depends on the used model. 
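
The formatting responsibility can be sketched with the standard `csv` and `json` modules. The feature names, their ordering, and the `"features"` key in the JSON payload are assumptions for illustration; a real model defines its own expected layout.

```python
import csv
import io
import json

FEATURE_ORDER = ["age", "income", "tenure"]   # hypothetical ordering expected by a model

def to_csv_row(user_data):
    """Format user data as one CSV row in the order the model expects."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([user_data[name] for name in FEATURE_ORDER])
    return buffer.getvalue().rstrip("\r\n")

def to_json(user_data):
    """Format the same data as a JSON document with the same ordering."""
    return json.dumps({"features": [user_data[name] for name in FEATURE_ORDER]})

def parse_csv_predictions(body):
    """Turn a CSV response message back into a list of numbers."""
    return [float(value) for value in next(csv.reader(io.StringIO(body)))]

user = {"income": 52000, "age": 41, "tenure": 3}   # keys arrive in arbitrary order
csv_payload = to_csv_row(user)
json_payload = to_json(user)
predictions = parse_csv_predictions("0.1,0.9")
```

Keeping the ordering in a single constant means the request formatting cannot silently drift away from what the model expects.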

## Containers

So far, we have two primary programs, the **model** and the **application**, that communicate with each other through the **endpoint (interface)**.

<img src="../images/endpoint3.png">

- What is the **model**? The model is the Python model that's created, trained, and evaluated in the modeling component of the machine learning workflow.
- What is the **application**? The application is a web or software application that enables users to use the model to retrieve predictions.

Both the model and the application require a computing environment. One way to create this environment is to use **containers**. Containers are created using a script that contains instructions on which software packages, libraries, and other computing attributes are needed in order to run a software application, in our case either the model or the application. 

####  But what is a container?

> A container can be thought of as a standardized collection/bundle of software that is to be used for the specific purpose of running an application. 

A common container software is [Docker](https://www.docker.com). 

### Containers, explained

Shipping container analogy:

- Shipping container can contain a wide variety of products
- Structure of a shipping container provides the ability to hold different types of products

Docker containers:

- Can contain all types of different software. 
- Structure of a Docker **container** enables the **container** to be created, saved, used, and deleted through a set of common tools. 
- The common tool set works with **any container** regardless of the software the **container** contains. 

The image below shows three containers running three different applications

<img src="../images/container.png">

This architecture provides the following advantages:

- Isolates the application, which increases security
- Requires only the software needed to run the application, which uses computational resources more efficiently and allows for faster application deployment. 
- Makes application creation, replication, deletion, and maintenance easier and the same across all applications that are deployed using containers. 
- Provides a simpler and more secure way to replicate, save, and share containers. 

A container script file is used to create a container. 

- Can easily be shared with others, provides a simple method to replicate a particular container. 
- The container script is simply the set of instructions (algorithm) used to create a container. For *Docker*, these files are called *dockerfiles*. 

<img src="../images/container2.png">

- Container engine uses a container script to create a container for an application to run within.
- These container script files can be stored in repositories, which provide a simple means to share and replicate containers. 
- Docker: [Docker Hub](https://hub.docker.com/explore/) is Docker's official registry for storing and sharing the container images built from dockerfiles. 
- Example of a dockerfile: [Link](https://github.com/pytorch/pytorch/blob/master/docker/pytorch/Dockerfile)
 - The dockerfile creates a docker container with Python 3.6 and PyTorch installed.

## Characteristics of Deployment and Modeling

#### What is Deployment?

Method that integrates a machine learning model into an existing production environment so that the model can be used to make decisions or predictions based upon data input into this model. 

#### What is a production environment?

A production environment can be thought of as a web, mobile, or other software application that is currently being used by many people and must respond quickly to those users' requests. 

### Characteristics of modeling

#### Hyperparameters

In ML, a hyperparameter is a parameter whose value cannot be estimated from the data: 

- Not learned through the estimators. 
- Must be set by the developer
- Hyperparameter tuning is an important part of model training. 
- Cloud platform machine learning services often provide methods that allow for automatic hyperparameter tuning for use with model training
- Without an automatic hyperparameter tuning option, one alternative is to use methods from the scikit-learn Python library for hyperparameter tuning ([link](https://scikit-learn.org/stable/modules/grid_search.html#)).
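
The core loop behind grid-search tuning is small enough to write by hand. The sketch below tunes a single hyperparameter (an L2 penalty) for a one-weight linear model on made-up data; it is a simplified illustration, and scikit-learn's `GridSearchCV` automates the same idea with cross-validation.

```python
def fit(xs, ys, penalty):
    """Closed-form ridge fit of y = w * x; `penalty` is the hyperparameter."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (illustrative): the weight is learned from the training set,
# while the penalty is chosen by comparing scores on the validation set.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5.0, 6.0], [10.1, 11.8]

grid = [0.0, 0.1, 1.0, 10.0]   # candidate hyperparameter values set by the developer
scores = {p: mse(fit(train_x, train_y, p), val_x, val_y) for p in grid}
best_penalty = min(scores, key=scores.get)
```

The weight `w` is estimated from the data, whereas `penalty` has to be supplied from outside, which is exactly the distinction drawn above.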

### Characteristics of Deployment

#### Model Versioning

- Save the model version as part of the model's metadata in a database
- The deployment platform should indicate a deployed model's version. 
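
One possible shape for such a version record is sketched below with an in-memory SQLite database. The table layout, artifact names, and accuracy values are hypothetical and only meant to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE model_versions ("
    "version INTEGER PRIMARY KEY, artifact TEXT, val_accuracy REAL, deployed INTEGER)"
)

def register(artifact, val_accuracy):
    """Store a new model version and return its version number."""
    cursor = conn.execute(
        "INSERT INTO model_versions (artifact, val_accuracy, deployed) VALUES (?, ?, 0)",
        (artifact, val_accuracy),
    )
    return cursor.lastrowid

def deploy(version):
    """Mark exactly one version as the deployed one."""
    conn.execute("UPDATE model_versions SET deployed = (version = ?)", (version,))

def deployed_version():
    """Report which version is currently deployed, or None."""
    row = conn.execute("SELECT version FROM model_versions WHERE deployed = 1").fetchone()
    return row[0] if row else None

v1 = register("model-v1.bin", 0.91)   # hypothetical artifact names and scores
v2 = register("model-v2.bin", 0.94)
deploy(v2)
```

Storing evaluation metrics next to the version number makes it possible to explain later why a given version was deployed.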

#### Model Monitoring

- Monitor the performance of the model
- Application may need to be updated
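
A minimal monitoring sketch, assuming that true labels eventually become available for past predictions: it tracks accuracy over a sliding window and flags when it falls below a threshold. The class name, window size, and threshold are illustrative choices.

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` labelled predictions."""

    def __init__(self, window, threshold):
        self.outcomes = deque(maxlen=window)   # old outcomes drop out automatically
        self.threshold = threshold

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_update(self):
        acc = self.accuracy()
        return acc is not None and acc < self.threshold

monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 0), (0, 1)]:
    monitor.record(predicted, actual)
```

After these five observations only the last four are kept, of which one is correct, so the monitor reports that the model needs attention.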

#### Model Updating and Routing

Another characteristic: 

- Ability to update deployed model
- If the monitoring process shows that performance metrics are not met, the model requires updating. 
- If the data-generating process changes, collect the new data to update the model
- Routing: To allow comparison of performance between the deployed model variants, routing should be supported.
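
Routing between model variants can be sketched as weighted random selection. The variant names, weights, and stand-in prediction functions below are hypothetical; managed platforms expose this as configuration rather than code.

```python
import random

def make_router(variants, seed=None):
    """variants maps a variant name to (traffic_weight, predict_function)."""
    rng = random.Random(seed)
    names = list(variants)
    weights = [variants[name][0] for name in names]

    def route(user_data):
        # Pick one variant per request, in proportion to its traffic weight.
        name = rng.choices(names, weights=weights, k=1)[0]
        return name, variants[name][1](user_data)

    return route

variants = {
    "model-a": (0.9, lambda x: sum(x)),          # current model: about 90% of traffic
    "model-b": (0.1, lambda x: sum(x) * 1.1),    # candidate model: about 10%
}
route = make_router(variants, seed=42)

counts = {"model-a": 0, "model-b": 0}
for _ in range(1000):
    name, _prediction = route([1.0, 2.0])
    counts[name] += 1
```

Returning the variant name with each prediction lets the monitoring step attribute performance to the variant that produced it.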

#### Model Predictions

Two common types of predictions are provided by the deployed model. 

- On-demand predictions (online, real-time, synchronous predictions) 
 - Predictions are returned in the response from the request. Often, these requests and responses are done through an API using JSON or XML formatted strings.
 - Commonly used to provide real-time, online responses based upon a deployed model. 
- Batch predictions (asynchronous, batch-based predictions)
 - Suited to a high volume of requests submitted periodically, where latency is not an issue. 
 - A batch request points to a specifically formatted data file (or a file of requests) and returns the predictions to a file. Cloud services require these files to be stored in the cloud provider's storage. 
 - Batch predictions are commonly used to help make business decisions (e.g. for weekly reports). 
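
The difference between the two prediction types can be sketched with the same placeholder model used two ways. The in-memory `io.StringIO` objects stand in for files in cloud storage, and the "model" is an illustrative mean of the features.

```python
import csv
import io

def predict(features):
    """Placeholder model: mean of the features (illustrative only)."""
    return sum(features) / len(features)

def on_demand(features):
    """Synchronous: one request in, one prediction returned immediately."""
    return predict(features)

def batch(input_file, output_file):
    """Batch: read many rows from a file and write the predictions to a file."""
    writer = csv.writer(output_file)
    for row in csv.reader(input_file):
        writer.writerow([predict([float(v) for v in row])])

single = on_demand([1.0, 3.0])

batch_input = io.StringIO("1.0,3.0\n2.0,4.0\n")   # stands in for a file in cloud storage
batch_output = io.StringIO()
batch(batch_input, batch_output)
batch_predictions = batch_output.getvalue().split()
```

The on-demand path returns its result to the caller, while the batch path communicates only through files, which is why latency matters far less for it.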

<img src="../images/mlworkflow.png">

## Comparing Cloud Providers

Focus on [Amazon's SageMaker](https://aws.amazon.com/sagemaker/). Similar to SageMaker is [Google's ML Engine](https://cloud.google.com/ml-engine/). 

### Amazon Web Services (AWS)

Amazon's cloud service to build, train, and deploy ML models. 

Advantages: 

- Use of any programming language or software framework for building, training, and deploying a machine learning model in AWS
- [Built-in algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html) - Various built-in algorithms, e.g. 
 - for discrete classification or quantitative analysis using [linear learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) or 
 - [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html)
 - item recommendations using [factorization machine](https://docs.aws.amazon.com/sagemaker/latest/dg/fact-machines.html), 
 - grouping based upon attributes using [K-Means](https://docs.aws.amazon.com/sagemaker/latest/dg/k-means.html), 
 - an algorithm for [image classification](https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html)
 - Time Series Analysis with [DeepAR](https://docs.aws.amazon.com/de_de/sagemaker/latest/dg/deepar.html)
- Custom Algorithms - Different programming languages and software frameworks that can be used to develop custom algorithms
 - [PyTorch](https://docs.aws.amazon.com/sagemaker/latest/dg/pytorch.html), [TensorFlow](https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html), [Apache MXNet](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html), and [Chainer](https://docs.aws.amazon.com/sagemaker/latest/dg/chainer.html)
- [Own algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html) - Use your own algorithm when it isn't included within the built-in or custom algorithms above

In addition, the use of [Jupyter Notebooks](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html) is enabled and there are the following additional features and automated tools that make modeling and deployment easier:

- [Automatic Model Tuning: SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html) - Feature for hyperparameter tuning of built-in and custom algorithms. In addition, SageMaker provides evaluation metrics for built-in algorithms
- [Monitoring Models in Sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-overview.html) - Features to monitor your deployed models. One can choose how much traffic to route to each deployed model (model variant). 
 - More information on routing: [here](https://docs.aws.amazon.com/sagemaker/latest/dg/API_ProductionVariant.html) and [here](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateEndpointConfig.html)
- Type of Predictions - SageMaker allows for [On-demand](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-test-model.html) predictions, where each prediction request can contain one to many requests. SageMaker also allows for [Batch](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html) predictions, and request data size limits are based upon S3 object size limits. 

### Google Cloud Platform (GCP)

[Google Cloud Platform (GCP) ML Engine](https://cloud.google.com/ml-engine/) is Google's cloud service. **Similarities** and **differences:** 

- Prediction costs: [ML Engine pricing](https://cloud.google.com/ml-engine/docs/pricing#node-hour) vs. Sagemaker pricing.
- Ability to explore and process data: Jupyter Notebooks are not available within ML Engine
 - To use Jupyter Notebooks within GCP, one would use [Datalab](https://cloud.google.com/datalab/docs/), which can be used to explore and transform raw data into clean data for analysis and processing. 
 - [DataFlow](https://cloud.google.com/dataflow/docs/) can be used to deploy batch and streaming data processing pipelines
 - AWS also has data processing and transformation pipeline services: [AWS Glue](https://aws.amazon.com/glue/) and [AWS Data Pipeline](https://aws.amazon.com/datapipeline/)
- Machine Learning Software: [Google's ML Engine](https://cloud.google.com/ml-engine/) has less flexibility in available software frameworks for building, training, and deploying machine learning models in GCP, compared to Amazon's SageMaker. 

The two available software frameworks for modeling within **ML Engine**: 

- [Google's TensorFlow](https://cloud.google.com/ml-engine/docs/tensorflow/) - Keras is a higher-level API written in Python that runs on top of TF. 
 - [TensorFlow examples](https://cloud.google.com/ml-engine/docs/tensorflow/samples)
 - [Keras example](https://cloud.google.com/ml-engine/docs/tensorflow/samples#census-keras)
- [Google's Scikit-learn](https://cloud.google.com/ml-engine/docs/scikit/) and [XGBoost Python package](https://xgboost.readthedocs.io/en/latest/python/index.html) can be used together for creating, training, and deploying machine learning models. 
 - In [Google's example](https://cloud.google.com/ml-engine/docs/scikit/training-xgboost) XGBoost is used for modeling and Scikit-learn is used for processing the data. 

Flexibility in Modeling and Deployment

- [Automatic Model Tuning](https://cloud.google.com/ml-engine/docs/tensorflow/hyperparameter-tuning-overview)
- [Monitoring Models](https://cloud.google.com/ml-engine/docs/tensorflow/monitor-training)
- Type of predictions - ML Engine allows for [Online](https://cloud.google.com/ml-engine/docs/tensorflow/online-predict) predictions, where each prediction request can contain one to many requests. ML Engine also allows for [Batch](https://cloud.google.com/ml-engine/docs/tensorflow/batch-predict) predictions. For more information: [Online and Batch predictions](https://cloud.google.com/ml-engine/docs/tensorflow/online-vs-batch-prediction)

### Other frameworks

- Microsoft Azure
 - [Azure AI](https://azure.microsoft.com/en-us/overview/ai-platform/#platform)
 - [Azure Machine Learning Studio](https://azure.microsoft.com/en-us/services/machine-learning-studio/)
- [Paperspace](https://www.paperspace.com/ml) - simply provides GPU-backed virtual machines with industry standard software tools
 - Claims to provide more powerful and less expensive virtual machines than AWS, GCP or Azure
- [Cloud Foundry](https://www.cloudfoundry.org/) - open source cloud application platform

## Summary - Cloud Computing

- Cloud computing - Transforming an IT product into a service
- Deployment - Making model available for predictions through applications

## Cloud Computing Defined

<img src="../images/nistcloud.png">

The graphic above is from the National Institute of Standards and Technology (NIST); its definition of cloud computing has three levels: 

- Service Models
- Deployment Models
- Essential Characteristics

### Service Models

#### Software as a Service (SaaS)

<img src="../images/cloud_saas.png">

The yellow dashed line in the graphic shows that with SaaS, the only customer responsibilities are those attributed to a "user"; all other responsibilities are placed on the cloud provider. 

Software as a product (e.g. a physical copy like a CD) has become rare. 

Other examples of SaaS:

- email applications
- storage applications

#### Platform as a Service (PaaS)

<img src="../images/cloud_paas.png">

Examples: 

- Services that allow customers to easily build, host, monitor, and scale their applications using the provider's platform. 
- e.g. build and host an e-commerce website. 

#### Infrastructure as a Service (IaaS)

With IaaS, the customer takes on most responsibilities beyond those associated with running secure data centers and maintaining the hardware and software that enable IaaS. 

<img src="../images/cloud_iaas.png">
    
Examples:

- AWS, Rackspace

IaaS enables the customer to provision computer processing, storage, networks, and other fundamental computing resources.

### Deployment Models of Cloud Computing

<img src="../images/deploymentmodels.png">

### Essential Characteristics

<img src="../images/essentialcharacteristics.png">

## ...

(left out the second optional part)
