# ML Architecture
![](images/architecture.PNG)

In this section we will build the components related to the development environment. As shown in the figure, we will work on:
1. Training the Model
2. Building Feature Extractor
3. Building APIs for connecting ML services to the world wide web. 

# Environment Configuration

This module lists the things required to start our ML model deployment.
1. GitHub account
2. Git Bash terminal
3. Forking the repository from [link]
4. Creating a virtual environment
5. Installing a text editor

Follow the steps below to configure them.

1. Create a GitHub account
2. Install the Git Bash terminal from [https://git-scm.com/downloads]
3. Open the Command Prompt and configure your name and email address
    * git config --global user.name "your name"
    * git config --global user.email youremailaddress@x.com
    <br>
Verify the configuration on cmd by typing:
    * git config user.name
    * git config user.email
    
4. Fork a repository
5. Point origin at your fork so that pull requests are opened against your repo instead of the original repo
    * git remote set-url origin [link]
6. Create a branch 
    * git checkout -b test-branch-2 
7. Make a commit
    * git commit --allow-empty -m "opening project"
8. Push the branch and open a pull request
    * git push origin test-branch-2
    
9. Create a virtual environment. Go into the required folder and run:
    * python -m venv deploy
    * dir (to check that the deploy folder was created)
10. Activate your virtual environment (here the environment name is deploy)
    * deploy\Scripts\activate
11. Deactivate your Virtual Environment
    * deactivate
12. Install the requirements file
    * pip install -r requirements.txt
13. Select any text editor
    * Sublime
    * Vim
    * Emacs
    * PyCharm
    
Note: On Windows, pytest can be very buggy. Make sure your versions are in line with those listed below:

1. pytest==4.0.2
2. py==1.7.0
3. pluggy==0.8.1
4. attrs==19.1.0
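
One way to confirm these pins is a small script run inside the activated environment. The sketch below is illustrative only: the helper name `find_mismatches` is an assumption, and it relies on `importlib.metadata` from the standard library (Python 3.8+).

```python
from importlib import metadata

# Versions listed in the note above.
PINNED = {"pytest": "4.0.2", "py": "1.7.0", "pluggy": "0.8.1", "attrs": "19.1.0"}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed)} for every package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty output means every pinned package matches.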

# Commit-1-train_model

![](images/dir1.png)

## Saving the dataset for training and testing

1. Download the train.csv and test.csv files from the [Kaggle Competition link](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data)
2. Save train.csv and test.csv files in the datasets folder of your directory

## Preprocessing

**preprocessors.py**<br>
Lists all the preprocessing tasks involved in this model-building exercise:

* Categorical Imputer
* Numerical Imputer
* Temporal Variable Estimator
* Rare Label Categorical Encoder
* Categorical Encoder
* Drop Unnecessary Features
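
To make the shape of one such preprocessor concrete, here is a minimal sketch of a categorical imputer that follows the `fit`/`transform` convention. It is not the code from preprocessors.py: the class body, the `"Missing"` label and the sample column names are assumptions, and it works on a list of dicts to stay dependency-free, whereas the project itself may well use pandas DataFrames and scikit-learn base classes.

```python
class CategoricalImputer:
    """Replace missing values in the given categorical columns with a label."""

    def __init__(self, variables, fill_value="Missing"):
        self.variables = list(variables)
        self.fill_value = fill_value

    def fit(self, rows, y=None):
        # Nothing to learn; present so the step follows the fit/transform protocol.
        return self

    def transform(self, rows):
        transformed = []
        for row in rows:
            new_row = dict(row)  # copy so the caller's data is not mutated
            for variable in self.variables:
                if new_row.get(variable) is None:
                    new_row[variable] = self.fill_value
            transformed.append(new_row)
        return transformed


# Illustrative rows only.
rows = [{"Alley": None, "LotArea": 8450}, {"Alley": "Pave", "LotArea": 9600}]
imputed = CategoricalImputer(["Alley"]).fit(rows).transform(rows)
```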


## ML Pipeline

A common reason ML models break in production is a lack of reproducibility between the offline and online environments. Hence, it is necessary to collate all the preprocessing tasks into a pipeline that can be leveraged both at training time and at inference time.

The current pipeline performs the following tasks:
1. Categorical Imputer
2. Numerical Imputer
3. Temporal Variable
4. Rare Label Encoder
5. Categorical Encoder
6. Log transform
7. Drop Features
8. MinMax Scaler
9. Model Selection
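
The idea of fitting the steps once and reusing the same fitted object at inference can be sketched as follows. This is a dependency-free stand-in, not the project's pipeline.py: the class names are hypothetical, only two of the steps are shown, and scikit-learn's `Pipeline` provides the same chaining behaviour out of the box.

```python
import math


class FillMissing:
    """Numerical imputer: replace None with the mean learned during fit."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        self.mean_ = sum(present) / len(present)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class LogTransform:
    """Stateless log transform."""

    def fit(self, values):
        return self

    def transform(self, values):
        return [math.log(v) for v in values]


class SimplePipeline:
    """Apply each step in order; fit once, then reuse for inference."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for _, step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for _, step in self.steps:
            values = step.transform(values)
        return values


pipeline = SimplePipeline([("impute", FillMissing()), ("log", LogTransform())])
pipeline.fit([1.0, None, 3.0])       # training time: the imputer learns mean = 2.0
online = pipeline.transform([None])  # inference time: the learned mean is reused
```

Because the imputer's mean is stored on the fitted object, the online transformation cannot drift from what was used in training.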

## Training

**train_pipeline.py**<br>
   This module trains the model using all the modules defined above.
   
   Major tasks performed in this module:
   1. Load Dataset
   2. Train-Test Split
   3. Transformation on Target Variable
   4. Running Pipeline
   5. Saving Pipeline
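
The five tasks above can be sketched end to end as below. This is a toy outline rather than the actual train_pipeline.py: the split helper, the placeholder "model" (a dict holding the mean of the target), the log transform on the target and the file name are all assumptions made for illustration.

```python
import math
import pickle
import random
import tempfile
from pathlib import Path


def train_test_split(X, y, test_size=0.1, seed=0):
    """Shuffle row indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def fit_placeholder_model(X_train, y_train):
    """Placeholder for the real pipeline: remembers the mean of the target."""
    return {"mean_log_target": sum(y_train) / len(y_train)}


def run_training(X, y, save_path):
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    y_train = [math.log(value) for value in y_train]  # transform the target
    model = fit_placeholder_model(X_train, y_train)   # "run the pipeline"
    with open(save_path, "wb") as handle:
        pickle.dump(model, handle)                    # save the fitted object
    return model


# Toy data: ten rows, one feature, constant target.
save_path = Path(tempfile.mkdtemp()) / "regression_model.pkl"
run_training([[i] for i in range(10)], [100.0] * 10, save_path)
```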


## Running train_pipeline

Run train_pipeline.py, which will do the following:

1. Load dataset
2. Split into train and test
3. Do Feature Processing
4. Train Model
5. Save model under trained_model directory


# Commit-2-Predict

## Restructuring in directories

This is important when we are packaging and publishing our regression model. The relevance will become clear as we go ahead.
![](images/dir2.png)

## Utility modules

Our train_pipeline has been restructured to use **data_management.py** and **config.py**

1. **data_management.py**<br>
Contains all the utility functions required in model training.
2. **config.py**<br>
Configuration file with all the parameters
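
As a rough idea of what such a configuration module can hold, consider the sketch below. Every name and value here is an assumption chosen for illustration (folder names, file names and the example column list); the real config.py may differ.

```python
from pathlib import Path

# Directory layout (assumed folder names, relative to the project root).
PACKAGE_ROOT = Path("regression_model")
DATASET_DIR = PACKAGE_ROOT / "datasets"
TRAINED_MODEL_DIR = PACKAGE_ROOT / "trained_model"

# File names (assumed).
TRAINING_DATA_FILE = "train.csv"
TESTING_DATA_FILE = "test.csv"
PIPELINE_SAVE_FILE = "regression_model.pkl"

# Target column of the Kaggle house prices dataset.
TARGET = "SalePrice"

# Example feature list; the actual selection is project-specific.
CATEGORICAL_VARS_WITH_NA = ["MasVnrType", "BsmtQual"]


def training_data_path():
    """Utility in the spirit of data_management.py: where the training data lives."""
    return DATASET_DIR / TRAINING_DATA_FILE
```

Keeping paths and feature lists in one module means the training and prediction code import the same values instead of repeating literals.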

## Addition in the requirements.txt

Install pytest

## Addition of Python Path to Environment Variables

Make sure you add the regression_model parent directory to your PYTHONPATH.

Go to Environment Variables and add a new user variable:

**Name** - PYTHONPATH <br>
**PATH** - C:\Users\u6yuv\Documents\Courses\dev\ml-package\regression_model;

## Run train_pipeline again

Rerun the training script to ensure the .pkl file is present.

For this, we will create a training file as shown below

![](images/run_train_pipeline.png)

![](images/run_train_pipeline_terminal.png)

or add the path to sys.path in the existing train_pipeline and run it from the regression_model parent directory
![](images/run_train_pipeline_parent_code.png)

![](images/run_train_pipeline_parent.png)
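
For readers without access to the screenshots, the `sys.path` approach can be sketched like this. The helper name `ensure_on_path` is hypothetical and the exact directory to add depends on your layout.

```python
import sys
from pathlib import Path


def ensure_on_path(directory, path_list=sys.path):
    """Prepend directory to the import search path if it is not already there."""
    directory = str(Path(directory).resolve())
    if directory not in path_list:
        path_list.insert(0, directory)
    return directory


# Inside train_pipeline.py this would typically be called with the parent
# directory of the package, for example:
# ensure_on_path(Path(__file__).resolve().parent.parent)
```

Unlike the PYTHONPATH environment variable, this only affects the running process, so it has to be executed before the package imports.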

## Running tests

We added a test_predict file which will test for a single prediction

![](images/test_predict.png)

## Running the test

![](images/test1.png)

# Data Validation

## Some more additions to config

config file

![](images/config_update.png)

## Modification in pipeline.py

Since we now use the config file to interact with our ML module, references to variables used in the pipeline have to be passed through the config file.

<b>change-</b> 
config.CATEGORICAL_VARS_WITH_NA

![](images/pipeline_update.png)

## preprocessors.py

In the earlier commit, preprocessors.py was under the regression_model directory, but now we are moving it under the processing directory.

![](images/shifting_preprocessors.png)

## Validation script

Add a validation script that applies the same validation rules as our training pipeline.

Our test data has NA values, which need to be handled before making predictions.


![](images/validation.png)
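
A minimal sketch of such a validation step is shown below. The rule it applies (drop any row with a missing value in a feature that must be present) is an assumption about what the script does, and the function and feature names are hypothetical.

```python
def validate_inputs(rows, required_features):
    """Keep only rows that have a value for every required feature."""
    return [
        row
        for row in rows
        if all(row.get(feature) is not None for feature in required_features)
    ]


# Illustrative input: the second row is missing a required value.
rows = [
    {"LotArea": 8450, "GarageCars": 2},
    {"LotArea": 9600, "GarageCars": None},
]
validated = validate_inputs(rows, required_features=["LotArea", "GarageCars"])
```

Running the same check before prediction keeps inputs the pipeline never saw during training from reaching the model.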

## Adding validation in prediction.py

Add the validation rules, as our test data has NA values.

![](images/val_predict.png)

## Adding test for multiple rows

![](images/test2_multiple.png)

## Train pipeline again

We have changed the directory structure, so to make sure everything works correctly we run the training pipeline again.

![](images/train_pipeline_again.png)

## Running test for multiple rows

![](images/test_predict3.png)

# Feature Engineering

Until now, our preprocessors.py file had a mix of both preprocessing operations and feature engineering. We will separate the feature engineering operations from the preprocessing operations to draw a clear distinction between the two. This is important for the following reasons:

1. In our current assignment we are dealing with just one feature engineering step, i.e. log transformation. However, this module can be far more complex than this example, e.g.:
    * Accessing a database to pick up precalculated features.
    * Calling a third-party API to gather information, e.g. weather.
    * Running a separate model to generate features that become inputs to our current model.
2. The features can be a very complicated section of your application and could indeed be imported as a totally separate package with its own versioning.

Hence we will create a separate file, **features.py**, that contains all the feature engineering work.

## Update preprocessors.py

It will no longer contain the log transformation operation.

## Add features.py

It will contain all the operations related to feature engineering.

![](images/features.png)
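
As an illustration of what a feature engineering step in features.py might look like, here is a sketch of a log transformer following the `fit`/`transform` convention. The class body and the sample column name are assumptions, and the code works on a list of dicts to stay dependency-free.

```python
import math


class LogTransformer:
    """Apply a natural log to the selected numerical variables."""

    def __init__(self, variables):
        self.variables = list(variables)

    def fit(self, rows, y=None):
        # Stateless step: nothing is learned from the training data.
        return self

    def transform(self, rows):
        transformed = []
        for row in rows:
            new_row = dict(row)  # copy so the input is left untouched
            for variable in self.variables:
                new_row[variable] = math.log(new_row[variable])
            transformed.append(new_row)
        return transformed


rows = [{"LotArea": 8450.0, "Street": "Pave"}]
logged = LogTransformer(["LotArea"]).fit(rows).transform(rows)
```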

## Update pipeline.py

Things to add: 

1. Import features.py file
2. Change the log transformer in the pipeline

![](images/pipeline_update1.png)

## Run training again

![](images/train_pipeline_again1.png)

## Run tests again

![](images/test_predict4.png)

# Versioning and Logging

Versioning and logging play a really important role in production machine learning systems.
They are very important for reproducibility because they record, for example, which inputs went into making a given prediction and when certain predictions were made. They also give us clues for tracking down bugs in our code, help us meet regulatory requirements, and allow us to conduct audits on the machine learning predictions we are making.

## Adding version file

VERSION file - 0.1.0 [MAJOR.MINOR.PATCH]

![](images/version.png)

## Read the version file in __init__

![](images/read_version.png)
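
Reading the version can be sketched as follows, assuming the VERSION file sits next to `__init__.py` and contains a single line such as `0.1.0`. The helper name `read_version` is hypothetical.

```python
from pathlib import Path


def read_version(package_root):
    """Return the contents of the VERSION file without surrounding whitespace."""
    return (Path(package_root) / "VERSION").read_text().strip()


# In __init__.py this would typically be used as:
# __version__ = read_version(Path(__file__).resolve().parent)
```

Keeping the version in one file gives the package, the build script and the API a single place to read it from.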

## Adding logging config

![](images/logging_config.png)
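
A logging configuration along these lines can be sketched with the standard `logging` module. The format string, the function names and the choice of a console handler are assumptions, not a copy of the file in the screenshot.

```python
import logging
import sys

# Timestamp, logger name, level, location and message on each line.
FORMATTER = logging.Formatter(
    "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s"
)


def get_console_handler():
    """Build a handler that writes formatted records to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(FORMATTER)
    return handler


def get_logger(name, level=logging.DEBUG):
    """Return a configured logger; repeated calls do not add duplicate handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        logger.addHandler(get_console_handler())
    logger.propagate = False  # avoid double output through the root logger
    return logger
```

Each module can then call `get_logger(__name__)` and log through the returned object.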

## Adding logging in multiple files

1. data_management.py

![](images/data_management_log.png)

2. train_pipeline.py

3. pipeline.py 

4. predict.py

## Add errors.py


![](images/errors.png)

## Use custom errors in features.py

![](images/features_custom_error.png)
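
The pattern of raising a package-specific error from a feature step can be sketched as below. The class names `BaseError` and `InvalidModelInputError`, the helper `check_positive` and the error message are hypothetical; the check itself reflects that a log transform is undefined for zero or negative values.

```python
class BaseError(Exception):
    """Base class for errors raised by the package."""


class InvalidModelInputError(BaseError):
    """Raised when the input data cannot be processed by the model."""


def check_positive(rows, variables):
    """Raise InvalidModelInputError if any selected variable is <= 0."""
    for variable in variables:
        if any(row[variable] <= 0 for row in rows):
            raise InvalidModelInputError(
                f"Variable {variable!r} contains zero or negative values; "
                "the log transform cannot be applied."
            )
    return rows
```

Callers can catch `BaseError` to handle every error from the package in one place, or the specific subclass when only bad input should be handled.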

## Update config.py

![](images/config_update1.png)

## Run train_pipeline again

![](images/train_pipeline_version.png)

## Run test

![](images/test_predict5.png)

# Package building

## requirements.txt

![](images/requirements.png)

## setup.py

![](images/setup.png)

## MANIFEST.in

![](images/manifest.png)

## Install the requirements

![](images/install_requirements.png)

## Build the source distribution and wheel distribution

![](images/run_setup.png)

## Import the package and test it

![](images/install_library.png)

![](images/package_test.png)

# Serving Our Model-Introduction

## A little about REST APIs

### API
An API is an application programming interface. It is a set of rules that allow programs to talk to each other. The developer creates the API on the server and allows the client to talk to it.

### REST

REST determines what the API looks like. It is a set of rules that developers follow when they create their API. One of these rules states that you should be able to get a piece of data (called a **resource**) when you link to a specific URL.

Each URL is called a **request** while the data sent back to you is called a **response**.


**JSON (JavaScript Object Notation)** is a common format for sending and requesting data through a REST API.

### REQUEST

A request is made up of four things:

1. **The endpoint** - The endpoint (or route) is the URL you request.
    1. **path** - determines the resource you’re requesting. Think of it like an automatic answering machine that asks you to press 1 for a service, press 2 for another service, 3 for yet another service and so on.
2. **The method** - The method is the type of request you send to the server. You can choose from the five types below:
    1. **GET**   - Read a resource on a server
    2. **POST**  - Create a new resource on a server
    3. **PUT and PATCH**  - Update a resource on a server
    4. **DELETE** - Delete a resource from a server
3. **The headers** - Headers are used to provide information to both the client and server, e.g. for authentication and for describing the body content.
4. **The data (or body)** - The data (sometimes called “body” or “message”) contains the information you want to send to the server.
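
The four parts can be seen together in a request object built with the standard library. The sketch below only constructs the request and does not send it; the host, port, path and payload are assumptions chosen for illustration.

```python
import json
from urllib import request


def build_prediction_request(base_url, payload):
    """Assemble endpoint, method, headers and body into one request object."""
    return request.Request(
        url=f"{base_url}/v1/predict/regression",        # the endpoint (path)
        data=json.dumps(payload).encode("utf-8"),       # the data (body), as JSON
        headers={"Content-Type": "application/json"},   # the headers
        method="POST",                                  # the method
    )


req = build_prediction_request("http://localhost:5000", [{"LotArea": 8450}])
```

Passing `req` to `urllib.request.urlopen` would send it once a server is listening at that address.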

## Advantages of Model Serving via REST API

1. Serve predictions on the fly to multiple clients
2. Decouple our model development from the client facing layer
3. Potentially combine multiple models at different API endpoints
4. Scale by adding more instances of the application behind the load balancer

## Testing API skeleton

Adding ml-api folder 
![](images/api_skeleton.png)

### Adding requirements.txt

![](images/api_requirements.png)

Install the required dependencies
![](images/api_requirements_execute.png)

### app.py

Create the Flask app
![](images/create_flask_app.png)

### controller.py

The API created above is reached through an endpoint. The endpoint is defined in the controller.py file.

![](images/controller.png)
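
A minimal sketch of an app factory plus a health endpoint is given below, assuming Flask is installed. The blueprint name, the `/health` route and the `"ok"` response body are assumptions and may not match the screenshots exactly.

```python
from flask import Blueprint, Flask

# Blueprint grouping the API routes (the name is an assumption).
prediction_app = Blueprint("prediction_app", __name__)


@prediction_app.route("/health", methods=["GET"])
def health():
    """Simple liveness check."""
    return "ok"


def create_app():
    """Create the Flask app and attach the routes defined on the blueprint."""
    app = Flask(__name__)
    app.register_blueprint(prediction_app)
    return app
```

Defining routes on a blueprint keeps controller.py independent of how and where the app object is created.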

### run.py

This is the entrypoint that starts Flask
![](images/api_run.png)

**Tell Flask which entrypoint starts the API**

**Run the Flask app**

![](images/run_api1.png)



### Test the endpoint

![](images/health_app.png)

# API Config and Logging

Addition of the following:
1. Config file
2. Updates to app.py and controller.py to use config and logging
3. Testing module

![](images/api_log_config.png)

## config.py
![](images/api_config.png)

Note the multiple Config objects at the end of the file; they are used to set Flask properties when creating the app.

![](images/api_config_object.png)
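
The config-object pattern can be sketched with plain classes. The attribute values and the `SERVER_PORT` setting are assumptions; `DEBUG` and `TESTING` are standard Flask configuration keys.

```python
class Config:
    """Settings shared by every environment."""
    DEBUG = False
    TESTING = False
    SERVER_PORT = 5000


class DevelopmentConfig(Config):
    """Local development: enable the debugger and reloader."""
    DEBUG = True


class TestingConfig(Config):
    """Used by the test suite."""
    TESTING = True


class ProductionConfig(Config):
    """Production keeps the safe defaults from Config."""
```

When creating the app, one of these classes can be loaded with `app.config.from_object(DevelopmentConfig)`; Flask copies only the uppercase attributes.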

## tests/conftest.py

Fixtures are defined here for reusability.
Read about pytest fixtures.

![](images/api_fixtures.png)

## test_controller.py

Pass pytest fixtures to test a particular endpoint.
![](images/test_fixtures.png)

## Update app.py and controller.py

1. Added the logging mechanism, as we did in the regression model
2. The app is now configured from the config object

![](images/update_app.png)

![](images/controller_update.png)

## Run test

![](images/api_test.png)

# API Prediction Endpoint

## Update in controller.py

Addition of a predict/regression endpoint that will serve POST requests.

Here v1 refers to the version of the endpoint serving predictions and has nothing to do with the model or API version.

Here we are leveraging the regression model package's **make prediction** utility to ensure there is just one copy of the utility that is used everywhere. It also helps keep our API lightweight, as we take all the feature engineering/processing tasks out of the API.

![](images/predict_regression.png)
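
The shape of such an endpoint can be sketched as below, assuming Flask is installed. The `make_prediction` function here is only a stand-in returning dummy values so the example runs on its own; in the project it would be imported from the regression model package. The response keys are also assumptions.

```python
from flask import Blueprint, Flask, jsonify, request


def make_prediction(input_data):
    """Stand-in for the regression model package's prediction utility."""
    return {"predictions": [0.0 for _ in input_data], "version": "0.1.0"}


prediction_app = Blueprint("prediction_app", __name__)


@prediction_app.route("/v1/predict/regression", methods=["POST"])
def predict():
    # The API only parses the request and formats the response;
    # all processing happens inside make_prediction.
    json_data = request.get_json()
    result = make_prediction(input_data=json_data)
    return jsonify(
        {"predictions": result["predictions"], "version": result["version"]}
    )


def create_app():
    app = Flask(__name__)
    app.register_blueprint(prediction_app)
    return app
```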

## Update in test_controller.py

We have added **test_prediction_endpoint_returns_prediction** function for our prediction.

Again we are leveraging regression model package to get the testing dataset.

![](images/predict_regression_test.png)

## Test the prediction endpoint

![](images/test_controller.png)

# API Versioning

## Adding the version file

![](images/api_version.png)

## Adding endpoint for version checking - controller.py

![](images/controller_version.png)

## Adding test for version checking - test_controller.py

![](images/test_controller_version.png)

## Executing test for version

![](images/api_version_test.png)

## Checking the api version and model version

![](images/version_check.png)

![](images/version_check_browser.png)

# References

1. [Git CheatSheet](https://www.atlassian.com/git/tutorials/atlassian-git-cheatsheet)
2. [Git course](https://www.pluralsight.com/courses/code-school-git-real)
3. [Working with forks](https://stackoverflow.com/questions/25545613/how-can-i-push-to-my-fork-from-a-clone-of-the-original-repo)
4. [Testing](https://landing.google.com/sre/sre-book/chapters/testing-reliability/)
5. [Trunk based Development](https://trunkbaseddevelopment.com/)
6. Fluent Python
7. The DevOps Handbook<BR>
**PACKAGING**    
8. [Python Packaging](https://packaging.python.org/)
9. [Python Versioning](https://packaging.python.org/guides/single-sourcing-package-version/)
10. [Python Logging](https://docs.python.org/3/library/logging.html)
11. [Python packaging and PyPI](https://www.youtube.com/watch?v=na0hQI5Ep5E)
12. [Setuptools documentation](https://setuptools.readthedocs.io/en/latest/)
13. [Wheel Documentation](https://wheel.readthedocs.io/en/stable/)
14. [Pytest Documentation](https://docs.pytest.org/en/latest/)<BR>
**REST API**
15. [REST API Principles](https://restfulapi.net/rest-architectural-constraints/)
16. [REST API Walkthrough](https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/)
17. [Flask Tutorial](https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world)
18. [Web Frameworks](https://github.com/vinta/awesome-python#web-frameworks)