## Machine Learning Tutorial 5: Save And Load Trained Model

Training a machine learning model can be time-consuming, especially when the training dataset is large. Instead of retraining every time, you can train the model once, save it to a file, and later load it back in its trained state.

The `pickle` and `joblib` modules can both be used for this purpose. `joblib` is generally more efficient with large numpy arrays, so it is preferred when the trained model holds many numpy objects.

# Without saving
<img src="img/model-step.png" alt="Model Step" width="500"/>

# With saving
<img src="img/saved-model-step.png" alt="Saved Model Step" width="500"/>

In [3]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [4]:
df = pd.read_csv("C:\\Users\\Vaishob\\Downloads\\homeprices.csv")
df.head()

Unnamed: 0,area,price
0,2600,550000
1,3000,565000
2,3200,610000
3,3600,680000
4,4000,725000


In [5]:
model = linear_model.LinearRegression()
model.fit(df[['area']], df.price)

In [6]:
model.coef_

array([135.78767123])

In [7]:
model.intercept_

180616.43835616432

In [8]:
model.predict([[5000]])



array([859554.79452055])

# Approaches to Saving Model to File:
### **Approach 1: Using Python Pickle**

The `pickle` module is used for serializing and deserializing Python objects, meaning it can convert Python objects into a byte stream (which can be saved to a file or transmitted over a network) and then later reconstruct the object from the byte stream.

* **Pickling**: The model is serialized and saved to a file in binary format using `pickle.dump()`
* **Unpickling**: The model is deserialized from the file and restored to its original state using `pickle.load()`
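
The byte-stream round trip described above can be seen without touching the file system. The following is a minimal sketch that uses `pickle.dumps()` and `pickle.loads()`, the in-memory counterparts of `pickle.dump()` and `pickle.load()`. The `params` dictionary is a hypothetical stand-in for a trained model, not part of the original notebook.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable Python object works.
params = {"coef": [135.78767123], "intercept": 180616.43835616432}

# Pickling: object -> bytes (pickle.dump does the same, but writes to a file).
payload = pickle.dumps(params)

# Unpickling: bytes -> a new, equal object (pickle.load reads from a file instead).
restored = pickle.loads(payload)

print(type(payload).__name__)  # the serialized form is a bytes object
print(restored == params)      # the restored copy compares equal to the original
```

The restored object is a separate copy with the same contents, which is why a model loaded from a file behaves like the one that was saved.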

In [9]:
import pickle

In [10]:
# `wb` means "write" & "binary"
with open('model_pickle','wb') as file:
    pickle.dump(model,file)

If the above code runs correctly, the model file should appear in your current directory (in binary format).

**Check if your model loads**

In [11]:
# `rb` means "read" & "binary"
with open('model_pickle','rb') as file:
    mp = pickle.load(file)

In [12]:
mp.coef_

array([135.78767123])

In [13]:
mp.intercept_

180616.43835616432

In [14]:
mp.predict([[5000]])



array([859554.79452055])

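Now I can share this saved model file with anyone else, and they can load it on their own machine to make predictions just as I did. Their predictions should match mine, since the model's parameters are preserved exactly as they were after training. Two caveats apply: the file should be loaded with a scikit-learn version compatible with the one that saved it, and pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.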
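
One way to confirm that a reloaded model matches the original is to compare their predictions directly. The sketch below assumes the same five rows shown in the `df.head()` output earlier, inlined so it does not depend on the CSV file, and writes to a temporary directory instead of the current one. The variable names are illustrative.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import linear_model

# Same five rows as the homeprices table, inlined so the sketch is self-contained.
areas = np.array([[2600], [3000], [3200], [3600], [4000]])
prices = np.array([550000, 565000, 610000, 680000, 725000])

model = linear_model.LinearRegression().fit(areas, prices)

# Save and reload inside a temporary directory (an assumption made for this
# sketch; the notebook writes to the current directory instead).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_pickle")
    with open(path, "wb") as file:
        pickle.dump(model, file)
    with open(path, "rb") as file:
        loaded = pickle.load(file)

# Compare the original and reloaded models on the training inputs.
same_predictions = np.allclose(model.predict(areas), loaded.predict(areas))
same_coefficients = np.allclose(model.coef_, loaded.coef_)
print(same_predictions, same_coefficients)
```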

### **Approach 2: Using joblib**

The `joblib` library is particularly useful for saving and loading large datasets, machine learning models, and other objects, especially when dealing with large numpy arrays or when speed is a concern.

* **Saving Models**: `joblib.dump()` is used to save models or other large objects to a file.
* **Loading Models**: `joblib.load()` is used to load models or objects from a file
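
For objects dominated by large numpy arrays, `joblib.dump()` also accepts a `compress` argument that trades some speed for a smaller file. The sketch below is illustrative only: the `weights` array is a hypothetical stand-in for a large model, and the file name and compression level are assumptions rather than choices made in this notebook.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a model that holds a large numpy array.
weights = np.arange(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.joblib")
    # compress takes an integer from 0 (off) to 9; 3 is a moderate setting.
    written = joblib.dump(weights, path, compress=3)
    size_on_disk = os.path.getsize(path)
    restored = joblib.load(path)

print(written == [path])                   # dump returns the list of files written
print(size_on_disk < weights.nbytes)       # compressed file vs. in-memory size
print(np.array_equal(restored, weights))   # contents survive the round trip
```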

In [15]:
import joblib

In [16]:
joblib.dump(model, 'model_joblib')

['model_joblib']

If the above code runs correctly, the model file should appear in your current directory.

**Check if your model loads**

In [17]:
mj = joblib.load('model_joblib')

In [18]:
mj.coef_

array([135.78767123])

In [19]:
mj.intercept_

180616.43835616432

In [20]:
mj.predict([[5000]])



array([859554.79452055])

***Note:*** In this tutorial, we saved and loaded a model using the `pickle` and `joblib` modules. We used linear regression here, but the same approach works for other scikit-learn models.
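
As a rough illustration of that last point, the sketch below saves and reloads a different estimator with `joblib`. `DecisionTreeRegressor` is chosen here only as an example and does not appear elsewhere in this tutorial. The data is the same five rows shown earlier, inlined, and the file is written to a temporary directory.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same five rows as the homeprices table, inlined for a self-contained sketch.
areas = np.array([[2600], [3000], [3200], [3600], [4000]])
prices = np.array([550000, 565000, 610000, 680000, 725000])

# A different estimator, picked only to show the workflow is unchanged.
tree = DecisionTreeRegressor(random_state=0).fit(areas, prices)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tree_joblib")
    joblib.dump(tree, path)
    tree_loaded = joblib.load(path)

# The reloaded tree should give the same predictions as the original.
matches = np.array_equal(tree.predict(areas), tree_loaded.predict(areas))
print(matches)
```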