# Serving your model exercise. Part 1 - Flask

## Intro
As a reminder, the code related to model prediction usually runs in 3 different places:
1. **Training computer / server** - where we train our model and save it
2. **Inference server** - a server that listens for REST API requests and makes predictions / inferences with the model that was trained on the training server. Potentially, we could have many such servers.
3. **Client** - a client application (browser, mobile app etc.) that needs a prediction and requests it from the **inference server** over HTTP using a REST API

There are 2 directions for **sending data**:
1. Data sent via a REST API from the **client** to the **Inference server**
1. Prediction result that's sent from the **Inference server** back to the **client**

There are 2 ways to send the data from **client** to the **Inference server**:
1. As parameters of the URL
1. As a body of the HTTP request

The prediction that's returned from the **Inference server** to the **client** will always be in the body of the HTTP response (limitation of the HTTP protocol).  However, it could be in different formats - a regular string, an HTML page, or a JSON file.

In this exercise we will learn about 2 combinations of the above:
1. Sending data as **parameters of the request** (as part of the URL) and receiving data as a **regular string** - useful for small and short data as inputs to the model, and when the prediction is short / simple.  We will use it to implement a **single prediction API**.
1. Sending data in the HTTP body as a **JSON file**, and getting back the prediction as a **JSON file** - more relevant when the data has many features / complicated features, and when the prediction response is itself slightly longer / more complicated.  We will use it to implement a **multiple predictions API**.

Of course, the format of the data sent to the server and the format of the data received back from it are independent of each other, so you could have other variations.
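To make the two request styles concrete, the sketch below builds (but does not send) one request of each kind with the `requests` library and prints what would go over the wire. The endpoint paths and the feature names `age` and `income` are hypothetical placeholders, not part of the exercise.

```python
import requests

BASE_URL = "http://localhost:5000"  # assumed address of a local inference server

# Style 1: the inputs travel in the URL as query parameters, with no body.
get_request = requests.Request(
    "GET", f"{BASE_URL}/predict_single", params={"age": 42, "income": 50000}
).prepare()

# Style 2: the inputs travel in the HTTP body, serialized as JSON.
post_request = requests.Request(
    "POST", f"{BASE_URL}/predict_multiple", json=[{"age": 42, "income": 50000}]
).prepare()

print("GET url:  ", get_request.url)
print("GET body: ", get_request.body)
print("POST url: ", post_request.url)
print("POST type:", post_request.headers["Content-Type"])
print("POST body:", post_request.body)
```

Because the requests are only prepared, this runs without any server. It is meant as a way to inspect where the data ends up in each style, not as the client code for the exercise.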

## 1. Getting to a trained model
- Choose one of the models you trained in one of the previous exercises, or any other model. **Do not take something from the many Flask examples online!**  **For easy debugging** - it's better to use a model with a small number of features, where the feature values are not long arrays (you can also take a small subset of the features of an existing model).
- Specify where the dataset can be downloaded from (to be used when checking the exercise)
- State briefly what the business problem is and what you are predicting
- Preprocess the data and split it into train and test datasets
- Train the model - how well your model predicts (accuracy / $R^2$) is not of big importance here
- Do a few predictions with the model locally

This code will run on the **training server**
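As a rough illustration of the steps above, here is a minimal sketch that uses scikit-learn's bundled iris dataset and a logistic regression purely as stand-ins. Both choices are assumptions made so the example runs on its own. Your own dataset, preprocessing and model should replace them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in dataset: 150 observations, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Train a simple classifier; the model quality is not the point here.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate and do a few local predictions.
accuracy = model.score(X_test, y_test)
sample_predictions = model.predict(X_test[:5])

print(f"test accuracy: {accuracy:.2f}")
print("first predictions:", sample_predictions)
print("true labels:      ", y_test[:5])
```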

In [None]:
# your code here

## 2. Save your model, predict with saved model

In this notebook, simulate the code that runs during training on the **training server**:
- Using `pickle`, save your model to disk. Reference: https://scikit-learn.org/stable/modules/model_persistence.html
- Save the test dataset to file.  What's a good format(s) for saving datasets?

In [None]:
# your code here

In this notebook, simulate the code that runs during inference on the **inference server**:
- Load the model again with `pickle`.
- Read the test dataset file, and perform some predictions
- Compare the predictions received before saving the model, and after reading a saved model.  Show that you get the same results. 
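The save / load round trip described above could look roughly like the sketch below. It again uses the iris dataset and a logistic regression as stand-ins, writes to a temporary folder, and stores the test features as CSV with NumPy. The file names and the CSV choice are assumptions for illustration, not the required answer to the format question.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in training step (replace with your own model and data).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
predictions_before = model.predict(X_test)

work_dir = Path(tempfile.mkdtemp())  # temporary folder, only for this sketch
model_path = work_dir / "model.pkl"  # hypothetical file names
test_path = work_dir / "X_test.csv"

# "Training server": persist the model and the test features.
with open(model_path, "wb") as f:
    pickle.dump(model, f)
np.savetxt(test_path, X_test, delimiter=",")

# "Inference server": load both back and predict again.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)
X_loaded = np.loadtxt(test_path, delimiter=",")
predictions_after = loaded_model.predict(X_loaded)

# The predictions before saving and after loading should match.
same = np.array_equal(predictions_before, predictions_after)
print("identical predictions:", same)
```

Keep in mind that a pickle file should only be loaded from a source you trust, and with library versions compatible with the ones used when it was saved.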

In [None]:
# your code here

## 3. Serve your model - using URL parameters

Now we are done with the **training server**, since we have the saved model.  From now on, only the **inference server** and **client** code is relevant.

Let's create the **inference server** that answers REST API requests with predictions:

- Using `flask`, create a PyCharm project and implement the following prediction API:
- **Single prediction API** that receives inputs as parameters (no body), and returns a single prediction as a string / text.  
- Example: http://localhost:5000/predict_single?key1=value1&key2=value2 (replace `key1`, `value1` etc. names with your relevant feature names and values) that would return the class label (example: `0` / `1`)
- **Important:** For efficiency purposes, consider what's the best place in your code to put the code that reads the model.  Why?
- **Important:** In general, take runtime efficiency into account.  Your API might be called a large number of times per second, and you will be paying for more inference servers if your code is not efficient.
- Copy your **inference server** code also here for reference
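One possible shape for such a server is sketched below. It is not the reference solution: the iris model, the feature names in `FEATURES`, and the in-memory `MODEL_BYTES` (standing in for the pickle file from step 2) are all assumptions made so the example is self-contained.

```python
import pickle

from flask import Flask, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in for the model file saved in step 2, so the sketch runs on its own.
X, y = load_iris(return_X_y=True)
MODEL_BYTES = pickle.dumps(LogisticRegression(max_iter=1000).fit(X, y))

app = Flask(__name__)

# Unpickled once at startup and reused by every request.
# With a real file this would be: pickle.load(open("model.pkl", "rb"))
model = pickle.loads(MODEL_BYTES)

# Hypothetical feature names, in the column order the model was trained on.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


@app.route("/predict_single")
def predict_single():
    """Read the features from the URL parameters and return the label as text."""
    try:
        row = [float(request.args[name]) for name in FEATURES]
    except (KeyError, ValueError):
        return "missing or non-numeric feature", 400
    prediction = model.predict([row])[0]
    return str(int(prediction))
```

To serve it locally you could add `app.run(port=5000)` at the end of the file or use the `flask run` command. Flask's built-in `app.test_client()` is a convenient way to try the route without starting a server.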

In [None]:
# your code here

## 4. Consume your model with python

#### Simulate client requests for inference / prediction:
Assume your client also runs Python code, and not only your training and inference servers (in a real-world scenario, your client code will often not be in Python).
Use the Python `requests` module from here to request a prediction from the inference server, as the client would.  To pass parameters with the `requests` module, use the `params` parameter of the `requests.get` API.

**Print input and output of the prediction.**

**Warning**: don't get used to seeing it in a Jupyter notebook.  This code will usually run inside a **client application**
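A client-side call could be wrapped in a small helper like the one below. This is a sketch under assumptions: the `/predict_single` path matches the example URL above, while the function name, the base URL argument and the timeout value are illustrative choices.

```python
import requests


def request_single_prediction(base_url, features):
    """Send the features as URL parameters and return the prediction text."""
    response = requests.get(
        f"{base_url}/predict_single",
        params=features,  # requests encodes the dict into ?key1=value1&key2=value2
        timeout=5,        # avoid hanging forever if the server is down
    )
    response.raise_for_status()  # raise on 4xx / 5xx instead of returning an error page
    return response.text
```

With the inference server running, calling `request_single_prediction("http://localhost:5000", {"key1": "value1", "key2": "value2"})` with your own feature names, and then printing both the features and the returned text, covers the input / output printing asked for above.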

In [None]:
# your code here

## 5. Serve your model - using JSON files

- Using `flask`, add code to your previous PyCharm file **with the inference code** to create the following prediction API (in addition to the **Single prediction API** done above):
- **Multiple prediction API** that receives many observations to predict on as a JSON file in the body, and returns a JSON file with the predictions.
- Your **JSON** file format has to be efficient and clear, and has to follow JSON syntax:
  - JSON file is a nested structure of potentially multiple dictionaries and lists 
  - JSON file tip: Use lists, every member in the list can be a dictionary of all the features.  
  - JSON file tip: Do not put indexes of predictions into the JSON files, indexes of predictions can be easily computed with Python code later 
  - JSON files are sometimes slightly verbose, but are extremely human readable.  Just looking at your JSON files of input and output, is it possible to understand what were the observations in input and what were the predictions in output?
  - See https://www.json.org/json-en.html for JSON format
- Think about efficiency of your code - your REST API might be called a huge number of times, with a huge number of observations every time.  Can part of the code be done only once?  Can you predict on everything together? Can you do less or cheaper data conversions?
- Example of URL that will be used to predict: http://localhost:5000/car-price
- Reference for working with JSONs in Flask: https://pythonise.com/series/learning-flask/working-with-json-in-flask
- Do you need a GET or a POST type of REST API call? Does it change what you did in step 3?  Conceptually, would you say it makes sense to use GET or POST types for predictions?
- Copy your **inference server** code also here for reference 
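A minimal sketch of such an endpoint follows, again with the iris model as a stand-in. The route name `/predict_multiple`, the feature names, and the `{"predictions": [...]}` response shape are assumptions chosen for illustration, so treat it as one possible design rather than the expected one.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in for the model file saved in step 2, so the sketch runs on its own.
X, y = load_iris(return_X_y=True)
MODEL_BYTES = pickle.dumps(LogisticRegression(max_iter=1000).fit(X, y))

app = Flask(__name__)
model = pickle.loads(MODEL_BYTES)  # unpickled once, not once per request

# Hypothetical feature names, in the column order the model was trained on.
FEATURES = ("sepal_length", "sepal_width", "petal_length", "petal_width")


@app.route("/predict_multiple", methods=["POST"])
def predict_multiple():
    """Expect a JSON list of feature dictionaries and return a JSON list of labels."""
    observations = request.get_json(silent=True)
    if not isinstance(observations, list) or not observations:
        return jsonify(error="expected a non-empty JSON list of observations"), 400
    try:
        rows = [[float(obs[name]) for name in FEATURES] for obs in observations]
    except (KeyError, TypeError, ValueError):
        return jsonify(error=f"each observation needs numeric {list(FEATURES)}"), 400
    # One predict call for the whole batch instead of one call per observation.
    predictions = model.predict(rows).tolist()
    return jsonify(predictions=predictions)
```

The input here is a list of dictionaries, one per observation, and the output keeps the same order, so no explicit indexes are needed in either JSON document.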

In [2]:
# your code here

Use Python `requests` module from here to make a prediction, and **print the input, and the output** of the prediction (or part of it if it's too large).  

**Hint:** to pass a JSON file to the `requests` module, use `json` parameter of the `requests.post` API.

**Warning**: don't get used to seeing it in a Jupyter notebook.  This code will usually run inside a **client application**
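On the client side, the JSON variant could be sketched as below. The `/predict_multiple` path and the `"predictions"` key in the response are hypothetical and must match whatever your own server defines.

```python
import requests


def request_predictions(base_url, observations):
    """POST a list of feature dictionaries as JSON and return the list of predictions."""
    response = requests.post(
        f"{base_url}/predict_multiple",
        json=observations,  # requests serializes this and sets the JSON Content-Type
        timeout=10,
    )
    response.raise_for_status()  # raise on 4xx / 5xx responses
    return response.json()["predictions"]
```

Printing `observations` next to the returned list (or only the first few items if they are long) shows the input and output side by side.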

In [2]:
# your code here

## 6. Submit a zip file with:
1. This notebook
2. Your Python inference server file