1.	Officeworks is a leading retail chain in Australia, with numerous outlets around the country. The manager would like to improve the customer experience by offering online price predictions to customers who want to sell their laptops. To support this, the manager has asked us to build a model that is both sustainable and sufficiently accurate. Apply Lasso and Ridge regression models to the dataset and predict the price from the other attributes. Tabulate the R squared, RMSE, and correlation values.

Importing requirements :

In [35]:
import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

In [2]:
df = pd.read_csv("Computer_Data (1).csv")

In [3]:
df.head()

,Unnamed: 0,price,speed,hd,ram,screen,cd,multi,premium,ads,trend
0,1,1499,25,80,4,14,no,no,yes,94,1
1,2,1795,33,85,2,14,no,no,yes,94,1
2,3,1595,25,170,4,15,no,no,yes,94,1
3,4,1849,25,170,8,14,no,no,no,94,1
4,5,3295,33,340,16,14,no,no,yes,94,1


Dropping Unnecessary Columns :

In [4]:
df.drop('Unnamed: 0', axis = 'columns', inplace = True)

Splitting Data into Dependent and Independent Features :

In [5]:
x = df.drop('price',axis = 1)
y = df['price']

Splitting Data into Categorical and Numerical Data :

In [6]:
cat_col = x.select_dtypes(include = 'object')
num_col = x.select_dtypes(exclude = 'object')

In [7]:
cat_col.head()

,cd,multi,premium
0,no,no,yes
1,no,no,yes
2,no,no,yes
3,no,no,no
4,no,no,yes


Working with Categorical Columns:

In [9]:
oh = OneHotEncoder(drop = 'first')

In [12]:
data1 = oh.fit_transform(cat_col).toarray()

In [14]:
data1

array([[0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       ...,
       [1., 0., 1.],
       [1., 0., 1.],
       [1., 0., 1.]])

In [15]:
data1 = pd.DataFrame(data1, columns = oh.get_feature_names_out(cat_col.columns))

In [16]:
data1

,cd_yes,multi_yes,premium_yes
0,0.0,0.0,1.0
1,0.0,0.0,1.0
2,0.0,0.0,1.0
3,0.0,0.0,0.0
4,0.0,0.0,1.0
...,...,...,...
6254,0.0,0.0,1.0
6255,1.0,1.0,1.0
6256,1.0,0.0,1.0
6257,1.0,0.0,1.0


Scaling Numerical Features :

In [17]:
sc = StandardScaler()

In [18]:
data2 = sc.fit_transform(num_col)

In [19]:
data2

array([[-1.27675206, -1.30199424, -0.76135926, -0.67259069, -1.7012186 ,
        -1.89588625],
       [-0.89860955, -1.28265396, -1.1165581 , -0.67259069, -1.7012186 ,
        -1.89588625],
       [-1.27675206, -0.95386919, -0.76135926,  0.43232929, -1.7012186 ,
        -1.89588625],
       ...,
       [ 2.26833397,  3.03022861,  2.79062911,  0.43232929, -2.43622475,
         2.42247623],
       [ 2.26833397,  1.67640897,  1.36983376,  0.43232929, -2.43622475,
         2.42247623],
       [ 2.26833397,  1.67640897,  1.36983376,  2.64216924, -2.43622475,
         2.42247623]])

In [36]:
data2 = pd.DataFrame(data2, columns = sc.get_feature_names_out(num_col.columns))

In [37]:
data2

,speed,hd,ram,screen,ads,trend
0,-1.276752,-1.301994,-0.761359,-0.672591,-1.701219,-1.895886
1,-0.898610,-1.282654,-1.116558,-0.672591,-1.701219,-1.895886
2,-1.276752,-0.953869,-0.761359,0.432329,-1.701219,-1.895886
3,-1.276752,-0.953869,-0.050962,-0.672591,-1.701219,-1.895886
4,-0.898610,-0.296300,1.369834,-0.672591,-1.701219,-1.895886
...,...,...,...,...,...,...
6254,2.268334,0.430895,-0.050962,0.432329,-2.436225,2.422476
6255,0.661228,1.676409,1.369834,0.432329,-2.436225,2.422476
6256,2.268334,3.030229,2.790629,0.432329,-2.436225,2.422476
6257,2.268334,1.676409,1.369834,0.432329,-2.436225,2.422476
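
As a quick check of what `StandardScaler` computes, the sketch below compares it with the manual formula z = (x - mean) / std on a few illustrative values (the numbers are made up for the example). It assumes the population standard deviation, which is NumPy's default (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative values only; a single numeric column.
values = np.array([[25.0], [33.0], [50.0], [66.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(values)

# Manual standardisation: subtract the column mean, divide by the
# population standard deviation (np.std uses ddof=0 by default).
manual = (values - values.mean(axis=0)) / values.std(axis=0)

print(np.allclose(scaled, manual))
```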


Concatenating Processed Data :

In [38]:
X = pd.concat([data1, data2], axis = 'columns')

In [39]:
X.head()

,cd_yes,multi_yes,premium_yes,speed,hd,ram,screen,ads,trend
0,0.0,0.0,1.0,-1.276752,-1.301994,-0.761359,-0.672591,-1.701219,-1.895886
1,0.0,0.0,1.0,-0.89861,-1.282654,-1.116558,-0.672591,-1.701219,-1.895886
2,0.0,0.0,1.0,-1.276752,-0.953869,-0.761359,0.432329,-1.701219,-1.895886
3,0.0,0.0,0.0,-1.276752,-0.953869,-0.050962,-0.672591,-1.701219,-1.895886
4,0.0,0.0,1.0,-0.89861,-0.2963,1.369834,-0.672591,-1.701219,-1.895886


Splitting Data for Training and Testing of Model :

In [40]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
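
Two things are worth noting about the steps so far: the encoder and scaler were fitted on the full dataset before the split, so test rows influence the scaling statistics, and the split has no fixed seed, so the metrics change from run to run. One possible alternative, sketched here on synthetic stand-in data (the `demo` frame, its coefficients, and `random_state=42` are assumptions for illustration only), is to split first and wrap the preprocessing in a `Pipeline` so that it is fitted on the training rows alone.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the laptop data (not the real dataset).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "speed": rng.choice([25, 33, 50, 66], size=n),
    "ram": rng.choice([2, 4, 8, 16], size=n),
    "premium": rng.choice(["yes", "no"], size=n),
})
demo["price"] = (
    1000 + 10 * demo["speed"] + 50 * demo["ram"] + rng.normal(0, 50, size=n)
)

features = demo.drop(columns="price")
target = demo["price"]

# Split first, with a fixed seed so the run is reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.25, random_state=42
)

# Encoder and scaler are fitted inside the pipeline, on training rows only.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(drop="first"), ["premium"]),
    ("num", StandardScaler(), ["speed", "ram"]),
])
pipeline = Pipeline([("preprocess", preprocess), ("model", Ridge())])
pipeline.fit(X_tr, y_tr)

test_r2 = pipeline.score(X_te, y_te)
print(f"Test R2: {test_r2:.3f}")
```

The same pipeline object can later be applied to new laptops with `pipeline.predict(...)`, which reuses the training-set encoding and scaling.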

In [26]:
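def model_eval(true, pred):
    """Return MSE, MAE, RMSE and R2 for the true and predicted values."""
    mse = mean_squared_error(true, pred)
    mae = mean_absolute_error(true, pred)
    rmse = np.sqrt(mse)  # RMSE is the square root of the MSE
    r2 = r2_score(true, pred)
    return mse, mae, rmse, r2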

In [51]:
models = {
    'Lasso': Lasso(),
    'Ridge': Ridge()
}

# Fit each model on the training split and report metrics on the test split.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    # Evaluating the models :
    mse, mae, rmse, r2 = model_eval(y_test, pred)

    print('-------------------------------------------------\n')
    print("{} Model Has :".format(name))
    print('Mean Squared Error : {}'.format(mse))
    print('Mean Absolute Error : {}'.format(mae))
    print('Root Mean Squared Error : {}'.format(rmse))
    print('R2 Score : {} \n'.format(r2))

    print('---------------------------------------------------\n')

-------------------------------------------------

Lasso Model Has :
Mean Squared Error : 81123.37729107836
Mean Absolute Error : 204.79968171284156
Root Mean Squared Error : 284.8216587464485
R2 Score : 0.7830831089594806 

---------------------------------------------------

-------------------------------------------------

Ridge Model Has :
Mean Squared Error : 81044.29040740924
Mean Absolute Error : 205.08261687550888
Root Mean Squared Error : 284.6827890958799
R2 Score : 0.7832945804427996 

---------------------------------------------------
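
The task also asks for the R squared, RMSE, and correlation values in tabular form. A minimal sketch of one way to build such a table follows; it runs on synthetic data (`X_demo`, `y_demo`, and the fixed `random_state` are assumptions for illustration), and "correlation" is taken here to mean the Pearson correlation between actual and predicted prices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in features and target (not the real dataset).
rng = np.random.default_rng(1)
X_demo = pd.DataFrame(rng.normal(size=(300, 3)), columns=["f1", "f2", "f3"])
y_demo = (
    2000 + 300 * X_demo["f1"] - 150 * X_demo["f2"]
    + rng.normal(0, 100, size=300)
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

rows = []
for name, model in {"Lasso": Lasso(), "Ridge": Ridge()}.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rows.append({
        "Model": name,
        "R squared": r2_score(y_te, pred),
        "RMSE": np.sqrt(mean_squared_error(y_te, pred)),
        # Pearson correlation between actual and predicted values.
        "Correlation": np.corrcoef(y_te, pred)[0, 1],
    })

summary = pd.DataFrame(rows).set_index("Model")
print(summary.round(4))
```

Replacing the synthetic split with `X_train`, `X_test`, `y_train`, and `y_test` from the notebook should produce the corresponding table for the laptop data.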

