
# **Regression Model: Regularization Technique (Ridge)**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
df = pd.read_csv("cleanprocessed_data.csv", index_col='datetime', parse_dates=True)
df.head()

datetime,month,holiday,workingday,temp,humidity,windspeed,rentals,hour_0,hour_1,hour_2,...,hour_22,hour_23,weather_1,weather_2,weather_3,weather_4,season_1,season_2,season_3,season_4
2011-01-01 00:00:00,1,0,0,0.22449,0.81,0.0,16,1,0,0,...,0,0,1,0,0,0,1,0,0,0
2011-01-01 01:00:00,1,0,0,0.204082,0.8,0.0,40,0,1,0,...,0,0,1,0,0,0,1,0,0,0
2011-01-01 02:00:00,1,0,0,0.204082,0.8,0.0,32,0,0,1,...,0,0,1,0,0,0,1,0,0,0
2011-01-01 03:00:00,1,0,0,0.22449,0.75,0.0,13,0,0,0,...,0,0,1,0,0,0,1,0,0,0
2011-01-01 04:00:00,1,0,0,0.22449,0.75,0.0,1,0,0,0,...,0,0,1,0,0,0,1,0,0,0


In [3]:
df.columns

Index(['month', 'holiday', 'workingday', 'temp', 'humidity', 'windspeed',
       'rentals', 'hour_0', 'hour_1', 'hour_2', 'hour_3', 'hour_4', 'hour_5',
       'hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10', 'hour_11', 'hour_12',
       'hour_13', 'hour_14', 'hour_15', 'hour_16', 'hour_17', 'hour_18',
       'hour_19', 'hour_20', 'hour_21', 'hour_22', 'hour_23', 'weather_1',
       'weather_2', 'weather_3', 'weather_4', 'season_1', 'season_2',
       'season_3', 'season_4'],
      dtype='object')

In [4]:
X = df[['temp','hour_0', 'hour_1', 'hour_2', 'hour_3', 'hour_4', 'hour_5', 'hour_6',
       'hour_7', 'hour_8', 'hour_9', 'hour_10', 'hour_11', 'hour_12',
       'hour_13', 'hour_14', 'hour_15', 'hour_16', 'hour_17', 'hour_18',
       'hour_19', 'hour_20', 'hour_21', 'hour_22', 'hour_23','weather_1',
       'weather_2', 'weather_3', 'weather_4', 'season_1', 'season_2',
       'season_3', 'season_4']]

In [5]:
X.columns

Index(['temp', 'hour_0', 'hour_1', 'hour_2', 'hour_3', 'hour_4', 'hour_5',
       'hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10', 'hour_11', 'hour_12',
       'hour_13', 'hour_14', 'hour_15', 'hour_16', 'hour_17', 'hour_18',
       'hour_19', 'hour_20', 'hour_21', 'hour_22', 'hour_23', 'weather_1',
       'weather_2', 'weather_3', 'weather_4', 'season_1', 'season_2',
       'season_3', 'season_4'],
      dtype='object')

In [6]:
y = df[['rentals']]

# **Create Training & Testing Data Set**

In [7]:
from sklearn import metrics
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state=100)

In [9]:
from sklearn.linear_model import Ridge

# **More Advanced Performance Evaluation Method**

**k-Fold Cross Validation**

In [10]:
from sklearn.model_selection import cross_val_score

In [11]:
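# Search alpha over the integers 0..49 and keep the value with the best
# mean 10-fold cross-validated score (R^2, the default scorer for Ridge).
# Note: range() rejects a float step, so a finer grid would need np.arange.
maximum_accuracy = 0
alpha = 0
for i in range(0, 50, 1):
    lm = Ridge(alpha=i)
    accuracy = cross_val_score(estimator=lm, X=X_train, y=y_train, cv=10)
    if accuracy.mean() > maximum_accuracy:
        maximum_accuracy = accuracy.mean()
        alpha = i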
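
The same search can be run over a finer, fractional grid. The sketch below is one possible way to do it, assuming synthetic stand-in data (`X_demo`, `y_demo`) and a hypothetical helper `best_ridge_alpha`, neither of which is part of the notebook. It starts the best score at negative infinity so that the search still works if every cross-validated R² is below zero.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for X_train / y_train (assumption: not the bike-rental data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=200)


def best_ridge_alpha(X, y, alphas, cv=10):
    """Hypothetical helper: return (alpha, mean CV R^2) for the best-scoring alpha."""
    best_alpha, best_score = None, -np.inf
    for a in alphas:
        scores = cross_val_score(Ridge(alpha=a), X, y, cv=cv)
        if scores.mean() > best_score:
            best_alpha, best_score = float(a), float(scores.mean())
    return best_alpha, best_score


# np.arange accepts a float step, unlike the built-in range().
alphas = np.arange(0.1, 5.0, 0.1)
best_alpha, best_score = best_ridge_alpha(X_demo, y_demo, alphas)
print(best_alpha, round(best_score, 3))
```

On the real data, `X_train` and `y_train` would take the place of the synthetic arrays.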

**Optimal Value of α**

In [12]:
print(alpha)

1


**Building Ridge Model to Predict**

In [13]:
lm = Ridge(alpha = alpha)
lm.fit(X_train, y_train)

Ridge(alpha=1)

**With the best value of alpha (the Ridge regularization parameter) chosen and the model fitted, we generate predictions on the `X_test` set.**

In [14]:
predictions = lm.predict(X_test)
predictions

array([[108.95726345],
       [358.20224422],
       [ 10.11578867],
       ...,
       [ 30.59837321],
       [430.08847263],
       [ 77.12618212]])

**MAE**

In [15]:
metrics.mean_absolute_error(y_test,predictions)

70.7592685262279

**MSE**

In [16]:
metrics.mean_squared_error(y_test,predictions)

9007.103697380724

**RMSE**

In [17]:
np.sqrt(metrics.mean_squared_error(y_test,predictions))

94.90576219271792
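
To make the three error metrics above concrete, here is a small worked sketch on four made-up values (the arrays `y_true` and `y_hat` are illustrative, not taken from the rental data). It computes MAE, MSE, and RMSE by hand with NumPy and checks them against `sklearn.metrics`.

```python
import numpy as np
from sklearn import metrics

# Toy values chosen so the arithmetic is easy to follow (assumption: illustrative only).
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

errors = y_true - y_hat          # [1, 0, -2, -1]
mae = np.mean(np.abs(errors))    # (1 + 0 + 2 + 1) / 4 = 1.0
mse = np.mean(errors ** 2)       # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = np.sqrt(mse)              # sqrt(1.5), about 1.2247

# The hand-computed values agree with scikit-learn's implementations.
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_hat))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_hat))
print(mae, mse, rmse)
```

RMSE is in the same units as the target (rentals per hour here), which makes it easier to interpret than MSE.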

**Predictions using the RidgeCV Model (Built-in Cross-Validation)**

In [18]:
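from sklearn.linear_model import RidgeCV

# Candidate alphas from 0.1 up to (but not including) 100, in steps of 0.1
step1 = 0.1
# With cv left at its default, RidgeCV picks alpha by efficient leave-one-out CV
clf = RidgeCV(alphas=np.arange(0.1, 100, step=step1)).fit(X_train, y_train)
# R^2 on the training data
clf.score(X_train, y_train)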

0.6255547963095036
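
The score above is computed on the training data, so it is not directly comparable with the test-set error metrics. A minimal sketch of reporting R² on held-out data follows, assuming synthetic stand-in arrays (`X_demo`, `y_demo`) and a plain `Ridge(alpha=1.0)` in place of the notebook's fitted model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the bike-rental data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(400, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=1.0, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=100
)

model = Ridge(alpha=1.0).fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)             # R^2 on data the model has seen
test_r2 = r2_score(y_te, model.predict(X_te))  # R^2 on held-out data
print(round(train_r2, 3), round(test_r2, 3))
```

In the notebook, the equivalent call would be `clf.score(X_test, y_test)`.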

In [19]:
clf.alpha_

0.8
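
`RidgeCV` and the manual loop need not agree on alpha, partly because they search different grids and partly because `RidgeCV` uses an efficient form of leave-one-out cross-validation when `cv` is not set, whereas the loop used 10 folds. The sketch below, on synthetic stand-in data (an assumption, not the rental data), shows how passing `cv=10` would make the built-in search use the same fold count.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in data (assumption: not the bike-rental data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([2.0, -1.0, 0.0, 4.0]) + rng.normal(scale=1.0, size=300)

alphas = np.arange(0.1, 10.0, 0.1)

# Default (cv=None): efficient leave-one-out cross-validation.
loo_model = RidgeCV(alphas=alphas).fit(X_demo, y_demo)

# Explicit 10-fold cross-validation, matching the manual loop's fold count.
kfold_model = RidgeCV(alphas=alphas, cv=10).fit(X_demo, y_demo)

print(loo_model.alpha_, kfold_model.alpha_)
```

The two selected alphas may differ slightly; as the notebook's own results suggest, small differences in alpha tend to have little effect on the test-set error.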

In [20]:
y_pred=clf.predict(X_test)
y_pred

array([[108.90165834],
       [358.31290557],
       [ 10.02014204],
       ...,
       [ 30.47405766],
       [430.23963085],
       [ 77.03230806]])

In [21]:
metrics.mean_absolute_error(y_test,y_pred)

70.76006383663305

In [22]:
metrics.mean_squared_error(y_test,y_pred)

9007.224450054511

In [23]:
np.sqrt(metrics.mean_squared_error(y_test,y_pred))

94.90639836204149