# ML Architecture
![](images/architecture.PNG)

In this section we will build the components related to the development environment. As shown in the figure, we will work on:
1. Training the Model
2. Building Feature Extractor
3. Building APIs for connecting ML services to the world wide web. 

# Environment Configuration

This module lists the things required to start our ML model deployment.
1. GitHub account
2. Git Bash terminal
3. Forking the repository from [link]
4. Creating a virtual environment
5. Installing a text editor

Below are the series of steps that can be followed to configure them.

1. Create a GitHub account
2. Install the Git Bash terminal from [https://git-scm.com/downloads]
3. Go to the command prompt and configure your name and email address
    * git config --global user.name "your name"
    * git config --global user.email youremailaddress@x.com
    <br>
Verify the configuration from the command prompt by typing:
    * git config user.name
    * git config user.email
    
4. Fork a repository
5. Point the origin remote at your fork, so that pull requests are opened against your repo instead of the original repo
    * git remote set-url origin [link]
6. Create a branch 
    * git checkout -b test-branch-2 
7. Make a commit
    * git commit --allow-empty -m "opening project"
8. Push the branch, then open the pull request on GitHub
    * git push origin test-branch-2
    
9. Create a virtual environment: go into the required folder and run
    * python -m venv deploy
    * dir (to check that the deploy folder was created)
10. Activate your virtual environment (the environment name here is deploy)
    * deploy\Scripts\activate
11. Deactivate your virtual environment
    * deactivate
12. Install the requirements file
    * pip install -r requirements.txt
13. Select any text editor
    * Sublime
    * Vim
    * Emacs
    * PyCharm
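
After step 10 it can be useful to confirm from Python itself that the virtual environment is active. The snippet below is a minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside an environment created by `python -m venv`, sys.prefix points at
    # the environment folder while sys.base_prefix points at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the first line printed is False after activation, the prompt is probably still using the system interpreter.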

# Building Our Regression Package

## Directory

![](images/dir_structure.PNG)

The above files can be categorized by the task they perform. We can broadly divide our work into the following categories:

1. Package Building
2. Versioning and Logging
3. Preprocessing
4. Feature Engineering
5. Building ML Pipeline
6. Model training
7. Model Prediction
8. Utility Modules 

## Saving the dataset for training and testing

1. Download the train.csv and test.csv files from the [Kaggle Competition link](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
2. Save train.csv and test.csv files in the datasets folder of your directory

## Versioning and Logging of deployed model/package

Important for:
1. Reproducibility. We can store information about:
    * Input data
    * Time frame in which predictions were made
    
2. Clues for debugging
3. Conducting audits to meet regulatory requirements on the predictions we are making.



**Version file**
<br>
The version file contains the version in the format [Major.Minor.Patch], e.g. 0.1.0.
<br>
**init.py**
<br>
Sets the package version by reading it from the version file.

**Config/logging_config.py**

1. get_console_handler()<br>
   For logging to the console
2. get_file_handler()<br>
    For logging to a file<br>
3. A format for the log metadata, which contains the datetime, the logging function, etc.
4. get_logger()<br>
    For calling the logger from different modules.

Check **data_management.py** file for usage

5. errors.py<br>
    Custom errors that give us more specific errors at logging time.

Check **features.py** file for usage
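
The pieces listed above could look roughly like the sketch below, built on the standard `logging` module. The function names come from the list; the format string, the log file location, the logger name and the `InvalidModelInputError` class are assumptions made for illustration.

```python
import logging
import sys
import tempfile
from pathlib import Path

# Assumed format: timestamp, logger name, level, calling function, message.
FORMATTER = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s"
)
LOG_FILE = Path(tempfile.gettempdir()) / "ml_models_demo.log"  # hypothetical path


def get_console_handler() -> logging.Handler:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(FORMATTER)
    return handler


def get_file_handler() -> logging.Handler:
    handler = logging.FileHandler(LOG_FILE, encoding="utf-8")
    handler.setFormatter(FORMATTER)
    handler.setLevel(logging.WARNING)  # keep only warnings and errors on disk
    return handler


def get_logger(logger_name: str) -> logging.Logger:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        logger.addHandler(get_console_handler())
        logger.addHandler(get_file_handler())
    logger.propagate = False
    return logger


class InvalidModelInputError(Exception):
    """Hypothetical custom error that names the exact problem in the logs."""


logger = get_logger("regression_model.demo")
try:
    raise InvalidModelInputError("variable 'area' contains non-positive values")
except InvalidModelInputError as exc:
    logger.warning("custom error caught: %s", exc)
```

Any module can then call `get_logger(__name__)` and share the same handlers and format.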



## Creating and loading package

We will take a top-down approach: we will first install the built regression package and then go into the details of how to build it.

1. requirements.txt<br>
    List of libraries to be installed
2. setup.py<br>
    Lists all the details about the package
    
**To build package**<br>
python packages\regression_model\setup.py sdist bdist_wheel


sdist - Source distribution<br>
bdist_wheel - Wheel distribution<br>
*Note: modern pip prefers a wheel distribution when one is available.*

Check the dist folder to find the created package
![](images/package.PNG)

**To install package locally**<br>
pip install -e packages\regression_model

![](images/loading_pckg.PNG)


## Running training pipeline

![](images/training.PNG)

**Running training pipeline**<br>
We can see the role of the logger in the output
![](images/train_pipeline.PNG)

The pickle file is saved under the trained_models folder

## Preprocessing

**preprocessors.py**<br>
Lists all the preprocessing tasks involved in this model building exercise.

* Categorical Imputer
* Numerical Imputer
* Temporal Variable estimator
* RareLabel Categorical Encoder
* Categorical Encoder
* Drop Unnecessary Features

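
To make the idea of a preprocessing step concrete, here is a dependency-free sketch of a categorical imputer with the usual fit/transform interface. It works on plain dictionaries so that it runs without extra libraries; the real module presumably operates on data frames, and the column name `garage_type` and the fill label are made up for the example.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


class CategoricalImputer:
    """Replace missing values in the chosen categorical columns with a label."""

    def __init__(self, variables: List[str], fill_value: str = "Missing") -> None:
        self.variables = variables
        self.fill_value = fill_value

    def fit(self, rows: List[Row]) -> "CategoricalImputer":
        # Nothing to learn for a constant fill; fit exists so the step plugs
        # into a pipeline that calls fit() and then transform().
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        imputed = []
        for row in rows:
            new_row = dict(row)  # copy so the caller's data is left untouched
            for name in self.variables:
                if new_row.get(name) is None:
                    new_row[name] = self.fill_value
            imputed.append(new_row)
        return imputed


rows = [{"garage_type": "Attached"}, {"garage_type": None}]
imputed_rows = CategoricalImputer(["garage_type"]).fit(rows).transform(rows)
```

The other preprocessors in the list can follow the same two-method shape, which is what lets them be chained later.
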
**validation.py**<br>
This module validates inputs. It is another layer of checking and safety that makes sure any values coming into our model are handled in a way that allows us to continue with prediction.

*Check predict module for usage*
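
As a rough illustration of such a safety layer, the sketch below drops rows with missing required fields and reports why, so prediction can continue on the remaining rows. The function signature and the field names are assumptions, not the module's actual API.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def validate_inputs(rows: List[Row], required: List[str]) -> Tuple[List[Row], List[str]]:
    """Split rows into those safe to score and messages for those that are not."""
    valid_rows: List[Row] = []
    errors: List[str] = []
    for index, row in enumerate(rows):
        missing = [name for name in required if row.get(name) is None]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
        else:
            valid_rows.append(row)
    return valid_rows, errors


valid_rows, errors = validate_inputs(
    [{"area": 120.0, "year_built": 1999}, {"area": None, "year_built": 2005}],
    required=["area", "year_built"],  # hypothetical field names
)
```

Returning the errors alongside the valid rows lets the caller report problems without aborting the whole request.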

## Feature Engineering

**1. features.py**<br>
Lists all the feature engineering work. Here we have just a single feature engineering task, a log transformation. However, this module can be far more complex than this example, for instance:

* Accessing a database to pick up precalculated features.
* Calling a third-party API to gather information, e.g. weather.
* Running a separate model to generate features that become inputs to our current model.

Feature engineering can be a very complicated part of your application, and it could indeed be imported as a totally separate package with its own versioning.
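
A log transformation step might be sketched as follows. This version uses only the standard library and raises a custom error for values whose logarithm is undefined; the error class name and the column names are hypothetical.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


class InvalidModelInputError(Exception):
    """Hypothetical custom error for values the transformation cannot handle."""


def log_transform(rows: List[Row], variables: List[str]) -> List[Row]:
    """Return copies of the rows with the natural log applied to `variables`."""
    transformed = []
    for row in rows:
        new_row = dict(row)
        for name in variables:
            if new_row[name] <= 0:
                raise InvalidModelInputError(
                    f"cannot take the log of a non-positive value in '{name}'"
                )
            new_row[name] = math.log(new_row[name])
        transformed.append(new_row)
    return transformed


logged = log_transform([{"area": math.e, "rooms": 3.0}], variables=["area"])
```

Raising a specific error here is what makes the log message point straight at the offending variable.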




## ML Pipeline

A common reason for ML models to break in production is a lack of reproducibility between the offline and online environments. Hence, it is necessary to collate all the preprocessing tasks into a pipeline that can be leveraged both at training time and at inference time.

The current pipeline performs the following tasks:
1. Categorical Imputer
2. Numerical Imputer
3. Temporal Variable
4. Rare Label Encoder
5. Categorical Encoder
6. Log transform
7. Drop Features
8. MinMax Scaler
9. Model Selection
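
The sketch below shows, in plain Python, why a single pipeline object helps: the steps are fitted once on training data and the same fitted steps are replayed at inference time. The three classes are simplified stand-ins written for this example and only mimic the fit/transform convention; they are not the project's pipeline.

```python
from typing import Any, List, Optional, Tuple


class FillMissing:
    """Replace None with a constant value."""

    def __init__(self, fill_value: float) -> None:
        self.fill_value = fill_value

    def fit(self, values: List[Optional[float]]) -> "FillMissing":
        return self

    def transform(self, values: List[Optional[float]]) -> List[float]:
        return [self.fill_value if value is None else value for value in values]


class MinMaxScaler:
    """Scale numbers to [0, 1] using the range seen during fit."""

    def fit(self, values: List[float]) -> "MinMaxScaler":
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values: List[float]) -> List[float]:
        span = (self.high - self.low) or 1.0  # avoid dividing by zero
        return [(value - self.low) / span for value in values]


class Pipeline:
    """Run named steps in order; each step sees the previous step's output."""

    def __init__(self, steps: List[Tuple[str, Any]]) -> None:
        self.steps = steps

    def fit(self, data: List[Any]) -> "Pipeline":
        for _, step in self.steps:
            data = step.fit(data).transform(data)
        return self

    def transform(self, data: List[Any]) -> List[Any]:
        for _, step in self.steps:
            data = step.transform(data)
        return data


pipeline = Pipeline([("imputer", FillMissing(0.0)), ("scaler", MinMaxScaler())])
pipeline.fit([10.0, None, 20.0])           # training time
online = pipeline.transform([15.0, None])  # inference time, same fitted steps
```

Because the scaler remembers the training range, online inputs are scaled exactly as they were offline.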

## Training

**train_pipeline.py**<br>
   This module trains the model, leveraging all the modules defined above.
   
   Major tasks performed in this module:
   1. Load Dataset
   2. Train-Test Split
   3. Transformation on Target Variable
   4. Running Pipeline
   5. Saving Pipeline
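
The five tasks can be walked through with a toy example. Everything below is invented for illustration: the dataset, the split helper, the "model" (just the mean of the log target) and the file name. It only shows the order of operations, not the real training code.

```python
import math
import pickle
import random
import tempfile
from pathlib import Path
from statistics import mean
from typing import List, Tuple

Pair = Tuple[float, float]  # (feature, sale price)


def train_test_split(rows: List[Pair], test_fraction: float = 0.25, seed: int = 0):
    """Shuffle reproducibly and cut the rows into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# 1. Load dataset (made-up numbers instead of train.csv)
dataset = [(float(i), 100_000.0 + 1_000.0 * i) for i in range(8)]

# 2. Train-test split
train, test = train_test_split(dataset)

# 3. Transformation on the target variable (log of the price)
y_train = [math.log(price) for _, price in train]

# 4. Run the "pipeline": a stand-in model that memorises the mean log price
fitted = {"mean_log_price": mean(y_train)}

# 5. Save the fitted object as a pickle file
save_path = Path(tempfile.gettempdir()) / "regression_model_demo.pkl"
with save_path.open("wb") as handle:
    pickle.dump(fitted, handle)

# Loading it back and undoing the log transform gives prices again.
with save_path.open("rb") as handle:
    restored = pickle.load(handle)
predicted_prices = [math.exp(restored["mean_log_price"]) for _ in test]
```

Note that predictions made on the log scale have to be exponentiated before they are reported as prices.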



## Predict

## Utility modules

1. **data_management.py**<br>
Lists all the utility functions required in model training.
2. **config.py**<br>
Configuration file with all the parameters

# Good practices

1. Use version control
2. Write tests! Unit, integration, and acceptance tests
3. Trunk based development and peer reviews
4. Understand your system dependencies
5. Use CI/CD


# REST API

Representational State Transfer (REST) Application Programming Interface (API)

Serving our model through an API has the following advantages:
1. Serve predictions on the fly to multiple clients (websites, phones, other APIs, etc.)
2. Separate model development from the client-facing layer
3. Combine multiple models at different API endpoints
4. Scale by adding more instances of the API application behind a load balancer

We will build our API using the Flask microframework.

Alternatives to look at:
1. Django
2. Pyramid
3. Bottle
4. Tornado
5. API Star etc

Before building the API for our model, let us understand what an API is.

### API 

An API is an application programming interface. It is a set of rules that allow programs to talk to each other. The developer creates the API on the server and allows the client to talk to it.

### REST

REST determines what the API looks like. It is a set of rules that developers follow when they create their API. One of these rules states that you should be able to get a piece of data (called a **resource**) when you link to a specific URL.

Each URL is called a **request** while the data sent back to you is called a **response**.


**JSON (JavaScript Object Notation)** is a common format for sending and requesting data through a REST API.

### REQUEST

A request is made up of four things:

1. **The endpoint** - The endpoint (or route) is the URL you request.
    1. **path** - determines the resource you’re requesting. Think of it like an automatic answering machine that asks you to press 1 for a service, press 2 for another service, 3 for yet another service and so on.
2. **The method** - The method is the type of request you send to the server. You can choose from the five types below:
    1. **GET**   - Reads a resource on a server
    2. **POST**  - Creates a new resource on a server
    3. **PUT and PATCH**  - Update a resource on a server
    4. **DELETE** - Deletes a resource from a server.
3. **The headers** - Headers are used to provide information to both the client and the server, e.g. for authentication and for describing the body content.
4. **The data (or body)** - The data (sometimes called “body” or “message”) contains the information you want to send to the server.
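
The four parts map directly onto the arguments of `urllib.request.Request` from the standard library. The sketch below only builds the request object and does not send it; the `/predict` path and the payload fields are hypothetical.

```python
import json
from urllib.request import Request

payload = {"area": 120.0, "year_built": 1999}  # hypothetical input fields

request = Request(
    url="http://127.0.0.1:5000/predict",           # the endpoint (path is assumed)
    data=json.dumps(payload).encode("utf-8"),      # the data (or body), as JSON
    headers={"Content-Type": "application/json"},  # the headers
    method="POST",                                 # the method
)

print(request.get_method(), request.full_url)
```

Passing the object to `urllib.request.urlopen` would actually send it, which requires the API to be running.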

# Building Our API Package

![](images/api_package_content.PNG)

## requirements.txt

**Note: comment out the neural network and other unneeded packages.**

Lists the packages required to build our API package.

Run the requirements file<br>
pip install -r packages\ml_api\requirements.txt

Ignore the neural network package and error for now

## run.py

It is the entry point that starts Flask.

**create_app()** is defined under **api\app.py** and creates the Flask API and the blueprint. Right now the blueprint creates multiple endpoints, which are defined under **api\controller.py**.



## controller.py

Lists the different endpoints exposed by this API.

1. Health
2. Version
    1. Model Version
    2. API Version
3. Regression Prediction
    1. Get the json data
    2. Validate the input format of the data
    3. Make predictions
    4. Send predictions, version and errors as JSON back to the client
4. NN Prediction
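
The four steps of the regression prediction endpoint can be sketched without any web framework, which keeps the flow easy to follow. In this illustration `make_prediction` is a made-up placeholder for the model package, and the `area` field, the version string and the error messages are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple

API_VERSION = "0.1.0"  # assumed version string, for illustration only


def make_prediction(rows: List[Dict[str, Any]]) -> List[float]:
    """Placeholder for the model package: a made-up linear rule on 'area'."""
    return [1000.0 * row["area"] for row in rows]


def handle_prediction_request(body: str) -> Tuple[str, int]:
    """Return (JSON response body, HTTP status code) for a raw request body."""
    # 1. Get the JSON data
    try:
        rows = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"errors": ["request body is not valid JSON"]}), 400
    if not isinstance(rows, list):
        return json.dumps({"errors": ["expected a JSON list of rows"]}), 400

    # 2. Validate the input format of the data
    valid_rows, errors = [], []
    for index, row in enumerate(rows):
        if isinstance(row, dict) and isinstance(row.get("area"), (int, float)):
            valid_rows.append(row)
        else:
            errors.append(f"row {index}: 'area' must be a number")

    # 3. Make predictions on the rows that passed validation
    predictions = make_prediction(valid_rows)

    # 4. Send predictions, version and errors back as JSON
    response = {"predictions": predictions, "version": API_VERSION, "errors": errors}
    return json.dumps(response), 200


response_body, status_code = handle_prediction_request('[{"area": 2}, {"area": "big"}]')
```

In the Flask controller the same logic would sit inside a route function, with the body taken from the incoming request.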

## validation.py


Lists the schema details of the data and the validation methods required when running validations.

## Run the flask api

**To test the running of flask webapp**

- cd packages
- cd ml_api
- set FLASK_APP=run.py
- python run.py

![](images/run_flask.PNG)
The Flask app is running at 127.0.0.1:5000

Check the **health endpoint** 

    http://127.0.0.1:5000/health



![](images/health_endpoint.PNG)
![](images/version.PNG)

## config.py

Similar to the config file used in the regression package for the logging format and setup code.

It has:
1. Loggers with the right handlers
2. Config objects to set particular Flask properties.<br>
    See usage in app.py<br>
    flask_app.config.from_object(config_object)
![](images/config.PNG)
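
A minimal sketch of config objects and `from_object` is shown below. It requires Flask to be installed; the class names and the two settings are illustrative and may differ from the ones in the project's config.py.

```python
from flask import Flask


class Config:
    """Base settings shared by every environment."""
    DEBUG = False
    TESTING = False


class DevelopmentConfig(Config):
    DEBUG = True


class TestingConfig(Config):
    TESTING = True


def create_app(config_object: type) -> Flask:
    flask_app = Flask(__name__)
    # from_object copies only the UPPERCASE attributes into flask_app.config.
    flask_app.config.from_object(config_object)
    return flask_app


app = create_app(TestingConfig)
```

Switching environments then means passing a different class to `create_app`, with no changes to the application code.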



## tests

1. **tests/conftest.py**

Creates test fixtures, which can later be passed as arguments to tests, where they supply their return values.

They can be used to test any endpoint.

2. **tests/test_validation.py**

Tests for the validation.py file

3. **tests/test_controller.py**

Tests the different configured endpoints:

1. Health endpoint
2. Version endpoint
3. Regression endpoint
4. NN endpoint

Let's test our ***health*** endpoint

**Note: comment out all other tests for now except the health endpoint test**<br>
tests/test_controller.py
![](images/test_controller.PNG)


Run - pytest packages/ml_api/tests

![](images/tests.PNG)


Under the test controller you can find:

- test_prediction_endpoint_predictions

Note that all the heavy lifting has been kept out of the API and lives in the regression package, so the model package also contains everything needed for testing. That way we rule out a scenario where we update our model but fail to update our test data in the API.

# FLASK Crash Course

1. @ - Decorators are used in Flask to define an endpoint or route. Here we are defining a health endpoint that we can access using HTTP **GET**.

![](images/define_endpoint.PNG)

2. Blueprint - Blueprints are like a code template that records operations to execute when registered on the application.

![](images/blueprint.PNG)
3. Register the blueprint (the template) with the application
![](images/register.PNG)
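
The three ideas above fit together in a few lines. This is a minimal sketch that requires Flask to be installed; the blueprint name `prediction_app` and the JSON returned by the health endpoint are assumptions, since the screenshots carry the project's actual code.

```python
from flask import Blueprint, Flask, jsonify

# A blueprint records routes now and applies them when it is registered.
prediction_app = Blueprint("prediction_app", __name__)  # hypothetical name


@prediction_app.route("/health", methods=["GET"])
def health():
    """Health endpoint reachable with HTTP GET."""
    return jsonify({"status": "ok"})  # assumed response body


def create_app() -> Flask:
    flask_app = Flask(__name__)
    flask_app.register_blueprint(prediction_app)  # attach the recorded routes
    return flask_app


# Flask's built-in test client calls the endpoint without starting a server.
client = create_app().test_client()
response = client.get("/health")
```

The same test client is what a pytest fixture would typically hand to the endpoint tests.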

# Continuous Integration and Continuous Deployment

![](images/CICD.PNG)


CI/CD is about automating the stages of development.


The CI/CD pipeline in the case of machine learning model deployment:

![](images/CICD pipeline.PNG)


## Prerequisite - Creating a GitHub repo

Create a git repository and link your local deployment code to it.

Steps:
1. Go to GitHub and create a repo
2. Go to the git terminal -> Deployment folder
3. Make this folder a git repo by typing
    - git init
    
4. Commit the repo
    - git add .
    - git commit -m "update"
    
5. Set up the remote
    - Copy the https link from the GitHub repo
    - git remote
    - git remote add origin [link]
    - git push origin master 

## Circle CI

We will be building our CI/CD pipeline on CircleCI - a CI/CD platform.

Features:
1. Hosted platform, i.e. we rely on their servers
2. Easy GitHub integration
3. Can take up 1 free project

Alternatives:
1. Jenkins
2. Travis CI
3. Bamboo
4. Gitlab CI
5. Team City

## Setup Circle CI

1. Log in to Circle CI using your GitHub account.
2. Add a project "Deployment" from your GitHub repo.

# Reference

1. [Git CheatSheet](https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet)
2. [Git course](https://www.pluralsight.com/courses/code-school-git-real)
3. [Working with forks](https://stackoverflow.com/questions/25545613/how-can-i-push-to-my-fork-from-a-clone-of-the-original-repo)
4. [Testing](https://landing.google.com/sre/sre-book/chapters/testing-reliability/)
5. [Trunk based Development](https://trunkbaseddevelopment.com/)
6. Fluent Python
7. The DevOps Handbook<BR>
**PACKAGING**    
8. [Python Packaging](https://packaging.python.org/)
9. [Python Versioning](https://packaging.python.org/guides/single-sourcing-package-version/)
10. [Python Logging](https://docs.python.org/3/library/logging.html)
11. [Python packaging and PyPI](https://www.youtube.com/watch?v=na0hQI5Ep5E)
12. [Setuptools documentation](https://setuptools.readthedocs.io/en/latest/)
13. [Wheel Documentation](https://wheel.readthedocs.io/en/stable/)
14. [Pytest Documentation](https://docs.pytest.org/en/latest/)<BR>
**REST API**
15. [REST API Principles](https://restfulapi.net/rest-architectural-constraints/)
16. [REST API Walkthrough](https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/)
17. [Flask Tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
18. [Web Frameworks](https://github.com/vinta/awesome-python#web-frameworks)