# Intel® Extension for Scikit-learn Linear Regression for Concrete dataset

In [1]:
from timeit import default_timer as timer
from sklearn import metrics
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import requests
import warnings
import os
from IPython.display import HTML
warnings.filterwarnings('ignore')


### About Data

Dataset : Concrete Compressive Strength

Domain : Material manufacturing

Description: The actual concrete compressive strength (MPa) for a given mixture at a specific age (days) was determined in the laboratory. The data is in raw form (not scaled). It has 8 quantitative input variables, 1 quantitative output variable, and 5407 instances (observations).

Objective : Model the strength of high-performance concrete using machine learning

#### Dataset Link
https://www.kaggle.com/code/pritech/predicting-the-strength-of-concrete/input

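### Load the data

Read the CSV file downloaded from the dataset link above; the path is relative, so the file is expected in the current working directory.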

In [2]:
data = pd.read_csv('concrete dataset(train).csv')

In [3]:
data.head(5)

|   | id | CementComponent | BlastFurnaceSlag | FlyAshComponent | WaterComponent | SuperplasticizerComponent | CoarseAggregateComponent | FineAggregateComponent | AgeInDays | Strength |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 525.0 | 0.0 | 0.0 | 186.0 | 0.0 | 1125.0 | 613.0 | 3 | 10.38 |
| 1 | 1 | 143.0 | 169.0 | 143.0 | 191.0 | 8.0 | 967.0 | 643.0 | 28 | 23.52 |
| 2 | 2 | 289.0 | 134.7 | 0.0 | 185.7 | 0.0 | 1075.0 | 795.3 | 28 | 36.96 |
| 3 | 3 | 304.0 | 76.0 | 0.0 | 228.0 | 0.0 | 932.0 | 670.0 | 365 | 39.05 |
| 4 | 4 | 157.0 | 236.0 | 0.0 | 192.0 | 0.0 | 935.4 | 781.2 | 90 | 74.19 |


In [4]:
data.columns

Index(['id', 'CementComponent', 'BlastFurnaceSlag', 'FlyAshComponent',
       'WaterComponent', 'SuperplasticizerComponent',
       'CoarseAggregateComponent', 'FineAggregateComponent', 'AgeInDays',
       'Strength'],
      dtype='object')

In [5]:
data.shape

(5407, 10)

In [6]:
x = data[['CementComponent', 'BlastFurnaceSlag', 'FlyAshComponent',
       'WaterComponent', 'SuperplasticizerComponent',
       'CoarseAggregateComponent', 'FineAggregateComponent', 'AgeInDays']]
y = data['Strength']

Split the data into train and test sets, holding out 10% of the rows for testing

In [7]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
x_train.shape, x_test.shape, y_train.shape, y_test.shape

((4866, 8), (541, 8), (4866,), (541,))
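The shapes above follow from how `train_test_split` sizes the split: with a fractional `test_size`, scikit-learn rounds the number of test rows up (`ceil(test_size * n_samples)`) and gives the remainder to the training set. A minimal check of that arithmetic for the 5407 rows; `split_sizes` is a hypothetical helper written only to reproduce the rule:

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test count up."""
    n_test = math.ceil(test_size * n_samples)  # 0.1 * 5407 = 540.7 -> 541
    n_train = n_samples - n_test               # remaining rows go to training
    return n_train, n_test

print(split_sizes(5407, 0.1))  # (4866, 541)
```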

### Normalize the data

In [8]:
from sklearn.preprocessing import MinMaxScaler
# Scale each input feature to the [0, 1] range; the target (Strength) stays in MPa
scaler_x = MinMaxScaler()

In [9]:
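# Fit the scaler on the training set only, then apply it to both sets,
# so no information from the test set leaks into preprocessing
scaler_x.fit(x_train)
x_train = scaler_x.transform(x_train)
x_test = scaler_x.transform(x_test)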
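Min-max scaling maps each column through `(x - min) / (max - min)`, with `min` and `max` learned from the training data. The NumPy sketch below shows that formula and why test values can land outside `[0, 1]` once the training statistics are reused; the helper names are hypothetical and the toy arrays are made up for illustration:

```python
import numpy as np

def min_max_fit(X_train):
    """Learn the per-column minimum and range from the training data only."""
    X_train = np.asarray(X_train, dtype=float)
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def min_max_transform(X, col_min, col_range):
    """Apply (x - min) / (max - min) using the statistics learned on the training set."""
    return (np.asarray(X, dtype=float) - col_min) / col_range

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[12.0, 15.0]])
col_min, col_range = min_max_fit(train)
print(min_max_transform(train, col_min, col_range))  # each training column spans 0 to 1
print(min_max_transform(test, col_min, col_range))   # 1.2 and 0.25: the first value exceeds 1
```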


### Patch original Scikit-learn with Intel® Extension for Scikit-learn
Intel® Extension for Scikit-learn (previously known as daal4py) contains drop-in replacement functionality for the stock Scikit-learn package. You can take advantage of the performance optimizations of Intel® Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

In [10]:
from sklearnex import patch_sklearn
# Patch before importing Scikit-learn estimators so the optimized versions are picked up
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
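The estimator has to be imported after `patch_sklearn()`, and reimported after `unpatch_sklearn()`, because of how Python binds names: `from module import Name` grabs whatever object the module exposes at that moment, so a later replacement of the module attribute does not affect names already imported. The sketch below illustrates only that general mechanism with a throwaway module; `fake_sklearn`, `StockModel`, and `FastModel` are hypothetical, and this is not a description of how sklearnex is implemented internally:

```python
import sys
import types

# Build a throwaway module standing in for a library whose classes can be swapped
fake = types.ModuleType("fake_sklearn")

class StockModel:
    name = "stock"

class FastModel:
    name = "fast"

fake.Model = StockModel
sys.modules["fake_sklearn"] = fake

from fake_sklearn import Model as ModelBefore  # binds the class present right now

# "Patch": replace the attribute on the module object
fake.Model = FastModel

from fake_sklearn import Model as ModelAfter   # a fresh import sees the replacement

print(ModelBefore.name)  # stock (the earlier binding is unchanged)
print(ModelAfter.name)   # fast
```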


Intel® Extension for Scikit-learn patching affects the performance of specific Scikit-learn functionality. Refer to the [list of supported algorithms and parameters](https://intel.github.io/scikit-learn-intelex/algorithms.html) for details. When unsupported parameters are used, the package falls back to the original Scikit-learn. If the patching does not cover your scenarios, [submit an issue on GitHub](https://github.com/intel/scikit-learn-intelex/issues).

Training of the LinearRegression algorithm with Intel® Extension for Scikit-learn for the Concrete dataset

In [11]:
from sklearn.linear_model import LinearRegression

params = {
    "positive": True,       # constrain all coefficients to be non-negative
    "fit_intercept": True,  # also fit an intercept (bias) term
}
start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"

'Intel® extension for Scikit-learn time: 0.01 s'
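With `positive=True`, `LinearRegression` solves a least squares problem whose coefficients are constrained to be non-negative (scikit-learn uses a non-negative least squares solver for this), while `fit_intercept=True` adds an unconstrained bias term. The sketch below solves the same optimization problem with a different, simpler technique, projected gradient descent on centered data, using only NumPy. The function name `nonneg_linear_fit`, the iteration count, and the synthetic data are illustrative assumptions; this is not the solver that scikit-learn or the extension actually runs:

```python
import numpy as np

def nonneg_linear_fit(X, y, n_iter=5000):
    """Least squares with non-negative coefficients and a free intercept.

    Projected gradient descent: take a gradient step on
    0.5 * ||Xc @ w - yc||^2, then clip negative coefficients to zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centering X and y lets the intercept be recovered afterwards
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Step size 1/L, where L (largest singular value squared) is the gradient's Lipschitz constant
    step = 1.0 / np.linalg.norm(Xc, ord=2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ w - yc)
        w = np.maximum(w - step * grad, 0.0)  # projection onto w >= 0
    intercept = y_mean - x_mean @ w
    return w, intercept

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([1.5, 0.0, 2.0]) + 3.0  # true coefficients are already non-negative
w, b = nonneg_linear_fit(X, y)
print(np.round(w, 3), round(float(b), 3))
```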

Predict and evaluate the LinearRegression model trained with Intel® Extension for Scikit-learn

In [12]:
y_predict = model.predict(x_test)
mse_metric_opt = metrics.mean_squared_error(y_test, y_predict)
f'Patched Scikit-learn MSE: {mse_metric_opt}'

'Patched Scikit-learn MSE: 208.1449035991619'
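`metrics.mean_squared_error` averages the squared differences between true and predicted values, so the result is in squared units of the target (MPa² here). Its square root, the RMSE, is back in MPa and easier to read: an MSE of about 208 corresponds to an RMSE of roughly 14.4 MPa. A minimal standard-library sketch of both quantities; `mse` and `rmse` are hypothetical helper names:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (units: target units squared)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

print(mse([10.0, 20.0, 30.0], [12.0, 20.0, 27.0]))  # (4 + 0 + 9) / 3 = 4.333...
print(math.sqrt(208.1449035991619))                 # about 14.43 MPa for the MSE reported above
```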

### Train the same algorithm with the original Scikit-learn
To cancel the optimizations, we call *unpatch_sklearn* and reimport the LinearRegression class.

In [13]:
from sklearnex import unpatch_sklearn
unpatch_sklearn()

Training of the LinearRegression algorithm with the original Scikit-learn library for the Concrete dataset

In [14]:
from sklearn.linear_model import LinearRegression

start = timer()
model = LinearRegression(**params).fit(x_train, y_train)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 0.01 s'
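Both single-shot timings round to 0.01 s, which is too short for a single `timer()` call to measure reliably, so a speedup computed from them on a dataset this small can be dominated by noise. A common remedy is to repeat the measurement and keep the best run. A minimal standard-library sketch; `best_time` and `work` are hypothetical, and in this notebook `fn` would be something like `lambda: LinearRegression(**params).fit(x_train, y_train)`:

```python
import timeit

def best_time(fn, repeat=5, number=10):
    """Return the best average time per call (seconds) over `repeat` rounds of `number` calls."""
    timings = timeit.repeat(fn, repeat=repeat, number=number)
    return min(timings) / number

def work():
    # Stand-in workload; replace with the model-fitting call you want to time
    return sum(i * i for i in range(10_000))

print(f"best per-call time: {best_time(work):.6f} s")
```

Taking the minimum rather than the mean filters out runs slowed by unrelated background activity.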

Predict and evaluate the LinearRegression model trained with the original Scikit-learn

In [15]:
y_predict = model.predict(x_test)
mse_metric_original = metrics.mean_squared_error(y_test, y_predict)
f'Original Scikit-learn MSE: {mse_metric_original}'

'Original Scikit-learn MSE: 208.1449035991619'
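Identical MSE values are a good sign, but comparing the two models' predictions element by element is a stricter check that the patched and original estimators agree. A minimal NumPy sketch; it assumes you keep both prediction arrays under separate names (hypothetical `y_pred_patched` and `y_pred_original`) instead of overwriting `y_predict` as the cells above do, and the tolerances are illustrative:

```python
import numpy as np

def predictions_agree(pred_a, pred_b, rtol=1e-6, atol=1e-8):
    """Return (agree, max_abs_diff) for two prediction arrays of the same shape."""
    a = np.asarray(pred_a, dtype=float)
    b = np.asarray(pred_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    max_abs_diff = float(np.max(np.abs(a - b))) if a.size else 0.0
    return bool(np.allclose(a, b, rtol=rtol, atol=atol)), max_abs_diff

# Hypothetical usage with the two prediction arrays kept separately:
# agree, diff = predictions_agree(y_pred_patched, y_pred_original)
print(predictions_agree([10.0, 20.0], [10.0, 20.0 + 1e-9]))
```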

In [16]:
HTML(f"<h3>Compare MSE metric of patched Scikit-learn and original</h3>"
     f"MSE metric of patched Scikit-learn: {mse_metric_opt} <br>"
     f"MSE metric of unpatched Scikit-learn: {mse_metric_original} <br>"
     f"Metrics ratio: {mse_metric_opt/mse_metric_original} <br>"
     f"<h3>With Scikit-learn-intelex patching you can:</h3>"
     f"<ul>"
     f"<li>Use your Scikit-learn code for training and prediction with minimal changes (a couple of lines of code);</li>"
     f"<li>Run training and prediction of Scikit-learn models faster;</li>"
     f"<li>Get similar model quality;</li>"
     f"<li>Get a <strong>{(train_unpatched/train_patched):.1f}</strong>x training speedup.</li>"
     f"</ul>")