In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/headbraincsv/headbrain.csv


# In this tutorial, let’s learn how to save and load a machine learning model in Python with scikit-learn.

Once we create a machine learning model, our job doesn’t end there. We can save the trained model to disk and reuse it later without retraining. Either the pickle or the joblib library can be used for this purpose. In both, the dump method serializes (saves) the model and the load method restores the saved model so it can be used again. The two libraries are called in much the same way, although their parameters are not identical: pickle.dump() writes to an open binary file object, while joblib.dump() accepts either a file name or a file object. Now let’s demonstrate how to do it.

Syntax of the dump() method:

pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)

Parameters:

obj: The Python object to pickle.
file: The file the pickled object is written to. It must have a write() method that accepts bytes, for example a file opened in binary write mode or an io.BytesIO buffer.
protocol: The pickle protocol version to use, given as an integer. If omitted, pickle.DEFAULT_PROTOCOL is used.
fix_imports: If True and the protocol is lower than 3, pickle tries to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. True is the default value. This parameter is keyword-only.

Syntax of the load() method:

pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)

The load() method reads the pickled representation of an object from the open file object file and returns the reconstituted object hierarchy specified therein.
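
As a minimal sketch of how these two calls fit together, the snippet below pickles a plain dictionary to a file and reads it back. The dictionary, the file name example.pkl, and the temporary directory are placeholders chosen for illustration and are not part of this notebook.

```python
import os
import pickle
import tempfile

# hypothetical stand-in for any picklable Python object
settings = {"test_size": 0.2, "random_state": 0, "columns": ["Gender", "Age Range"]}

# write into a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.pkl")

    # 'wb' is required because pickle writes bytes
    with open(path, "wb") as f:
        pickle.dump(settings, f, protocol=pickle.HIGHEST_PROTOCOL)

    # 'rb' is required because pickle reads bytes
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == settings)  # True: the content is the same
print(restored is settings)  # False: load() builds a new object
```

Opening the file in a with block closes it as soon as the dump or load finishes, which is why the later cells can reopen the same file safely.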

In [19]:
# import packages 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression 
from sklearn import metrics 
import joblib 
import pickle

In [10]:
# import the dataset 
dataset = pd.read_csv('/kaggle/input/headbraincsv/headbrain.csv')

In [12]:
dataset.head()

Unnamed: 0,Gender,Age Range,Head Size(cm^3),Brain Weight(grams)
0,1,1,4512,1530
1,1,1,3738,1297
2,1,1,4261,1335
3,1,1,3777,1282
4,1,1,4177,1590


In [13]:
# features: all columns except the last one
X = dataset.iloc[:, :-1].values
# target: the last column, Brain Weight(grams)
Y = dataset.iloc[:, -1].values

In [14]:
# train test split 
X_train, X_test, y_train, y_test = train_test_split( 
    X, Y, test_size=0.2, random_state=0) 


In [15]:
# create a linear regression model 
regressor = LinearRegression() 
regressor.fit(X_train, y_train) 

# Example: Saving and loading models using joblib
Joblib is part of the SciPy ecosystem and provides tools for pipelining Python jobs. It also provides utilities for efficiently saving and loading Python objects that contain NumPy data structures. This can be helpful for machine learning models that have a lot of parameters or that need to store the complete dataset, such as k-nearest neighbors. Let’s look at a simple example where we save and load a linear regression model. The same steps are repeated later with the pickle library.

In [16]:
# save the model
filename = 'linear_model_2.sav'
with open(filename, 'wb') as f:  # the with block closes the file after writing
    joblib.dump(regressor, f)


The .sav extension is commonly used for saved models, but you can use any extension you prefer.

regressor: This is the trained machine learning model that you want to save.
open(filename, 'wb'): This opens a file with the specified filename in write-binary mode ('wb'). This mode is used because models are saved as binary files.
joblib.dump(): This function serializes (saves) the regressor object to the file opened in the previous step.

In summary, the code saves the trained model (regressor) to a file named linear_model_2.sav using the joblib library. This allows you to store the model on disk and load it later without having to retrain it.
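
joblib.dump() also accepts a file name directly, in which case it opens and closes the file itself, and it takes an optional compress argument. The sketch below illustrates this with a small dictionary of NumPy arrays standing in for a model. The names params and params.joblib, the array values, and the compression level 3 are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np

# hypothetical stand-in for a fitted model: a dict holding NumPy arrays
params = {"coef": np.array([0.5, -1.2, 0.26]), "intercept": np.array([325.0])}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.joblib")

    # pass the path itself; compress=3 trades a little speed for a smaller file
    joblib.dump(params, path, compress=3)

    # joblib.load takes the same path and rebuilds the object
    restored = joblib.load(path)

print(np.array_equal(restored["coef"], params["coef"]))
```

Passing the path keeps the call to a single line, while passing an open file object, as in the cell above, gives you control over where the bytes go.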

In [17]:
# load the model
with open(filename, 'rb') as f:
    load_model = joblib.load(f)

# evaluate the loaded model on the test set
y_pred = load_model.predict(X_test)
print('root mean squared error : ', np.sqrt( 
    metrics.mean_squared_error(y_test, y_pred))) 

root mean squared error :  71.23878018173228
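
For reference, the value printed above is the root mean squared error over the n test samples, where y_i is the true value and ŷ_i is the prediction of the loaded model. This is the standard definition and corresponds to applying np.sqrt to metrics.mean_squared_error:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Because the target column is Brain Weight(grams), the error is expressed in grams.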


# Example: Saving and loading models using pickle
pickle is the standard Python module for serializing objects. With pickling, a trained machine learning model can be serialized (encoded) into a byte stream and saved to a file. Later, when you want to use the model to produce new predictions, you can load this file and deserialize (decode) the model. The example below reuses the linear regression model fitted on the training data above. The dump() method takes the model and an open file and writes the serialized model to that file. After the model is loaded back with the load() method, the test data is used to make predictions, and the root mean squared error metric is used to evaluate them.

In [20]:
# save the model
filename = 'linear_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(regressor, f)

# load the model
with open(filename, 'rb') as f:
    load_model = pickle.load(f)
  
y_pred = load_model.predict(X_test) 
print('root mean squared error : ', np.sqrt( 
    metrics.mean_squared_error(y_test, y_pred)))

root mean squared error :  71.23878018173228
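
Both examples print the same error, which suggests that the reloaded models behave like the original. A more direct check is to compare predictions before and after serialization. The sketch below does this in memory with pickle.dumps() and pickle.loads() on synthetic data. The random data, the coefficients, and the variable names are assumptions for illustration and are not the head and brain dataset.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# synthetic regression data (assumption: stands in for the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_demo, y_demo)

# serialize to bytes and back without touching the disk
restored = pickle.loads(pickle.dumps(model))

# the restored model should reproduce the original predictions and coefficients
same_predictions = np.allclose(model.predict(X_demo), restored.predict(X_demo))
same_coefficients = np.array_equal(model.coef_, restored.coef_)
print(same_predictions, same_coefficients)
```

The same comparison can be applied to a model restored with joblib.load().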
