# Multiple Linear Regression

- Notation used in this notebook (source: lecture slides from DeepLearning.AI's Machine Learning Specialization on Coursera):

![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

- We use vectorization (NumPy operations such as `np.dot` instead of explicit Python loops) to speed up code execution.
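
As a small illustration of the vectorization point, the sketch below computes the same prediction `w . x + b` twice: once with an explicit loop and once with `np.dot`. The function names and the numbers are made up for this example and are not part of the notebook's dataset.

```python
import numpy as np

def predict_loop(w, b, x):
    """Compute w . x + b one feature at a time."""
    f = 0.0
    for j in range(len(w)):
        f += w[j] * x[j]
    return f + b

def predict_vectorized(w, b, x):
    """Compute w . x + b with a single NumPy call."""
    return np.dot(w, x) + b

# Illustrative values only, not taken from the dataset.
w = np.array([1.0, 2.5, -3.3])
b = 4.0
x = np.array([10.0, 20.0, 30.0])

loop_result = predict_loop(w, b, x)
vectorized_result = predict_vectorized(w, b, x)
print(loop_result, vectorized_result)
```

Both calls return the same value up to floating-point rounding; the vectorized form pushes the loop into compiled NumPy code, which is where the speed-up comes from on larger arrays.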

## Gradient Descent

![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)
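
The slide images are not available as text here. For reference, the standard formulation that the code in this notebook appears to follow (a squared-error cost divided by $2m$, as in `compute_cost_function`) can be sketched as below. The notation is an assumption and may differ from the slides; $m$ is the number of training examples, $n$ the number of features, and $\alpha$ the learning rate.

```latex
\begin{aligned}
f_{\vec{w},b}(\vec{x}) &= \vec{w} \cdot \vec{x} + b \\
J(\vec{w},b) &= \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 \\
\frac{\partial J}{\partial w_j} &= \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 1, \dots, n \\
\frac{\partial J}{\partial b} &= \frac{1}{m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right) \\
w_j &:= w_j - \alpha \frac{\partial J}{\partial w_j}, \qquad b := b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
```

All $w_j$ and $b$ are updated simultaneously, using gradients computed from the parameter values of the previous iteration.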

- Scaling the features to comparable ranges makes Gradient Descent converge faster.
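
A minimal sketch of z-score scaling in plain NumPy, assuming the population standard deviation (`ddof=0`), which is the convention `StandardScaler` uses. The helper name `zscore_scale` is hypothetical; the demo values are the first three rows of the R&D Spend and Administration columns shown further down.

```python
import numpy as np

def zscore_scale(X):
    """Scale each column to zero mean and unit (population) standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)  # ddof=0 by default
    return (X - mu) / sigma, mu, sigma

# Three rows of two features, used purely for illustration.
X_demo = np.array([[165349.2, 136897.8],
                   [162597.7, 151377.59],
                   [153441.51, 101145.55]])

X_scaled, mu, sigma = zscore_scale(X_demo)
print(X_scaled.mean(axis=0))  # close to 0 for every column
print(X_scaled.std(axis=0))   # close to 1 for every column
```

Returning `mu` and `sigma` alongside the scaled array makes it possible to apply the same transformation to new observations before predicting.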

> **NOTE:** For simplicity, I won't split the dataset into training and test sets. The entire dataset is used for both training and evaluation, since the goal of this notebook is only to implement Multiple Linear Regression.

In [1]:
# load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# load dataset
dataset = pd.read_csv('startups_data.csv')
dataset.head()

|   | R&D Spend | Administration | Marketing Spend | State | Profit |
|---|---|---|---|---|---|
| 0 | 165349.2 | 136897.8 | 471784.1 | New York | 192261.83 |
| 1 | 162597.7 | 151377.59 | 443898.53 | California | 191792.06 |
| 2 | 153441.51 | 101145.55 | 407934.54 | Florida | 191050.39 |
| 3 | 144372.41 | 118671.85 | 383199.62 | New York | 182901.99 |
| 4 | 142107.34 | 91391.77 | 366168.42 | Florida | 166187.94 |


In [3]:
# get feature matrix & target vector
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [4]:
print(type(X))
print(X.shape)
print(X)

<class 'numpy.ndarray'>
(50, 4)
[[165349.2 136897.8 471784.1 'New York']
 [162597.7 151377.59 443898.53 'California']
 [153441.51 101145.55 407934.54 'Florida']
 [144372.41 118671.85 383199.62 'New York']
 [142107.34 91391.77 366168.42 'Florida']
 [131876.9 99814.71 362861.36 'New York']
 [134615.46 147198.87 127716.82 'California']
 [130298.13 145530.06 323876.68 'Florida']
 [120542.52 148718.95 311613.29 'New York']
 [123334.88 108679.17 304981.62 'California']
 [101913.08 110594.11 229160.95 'Florida']
 [100671.96 91790.61 249744.55 'California']
 [93863.75 127320.38 249839.44 'Florida']
 [91992.39 135495.07 252664.93 'California']
 [119943.24 156547.42 256512.92 'Florida']
 [114523.61 122616.84 261776.23 'New York']
 [78013.11 121597.55 264346.06 'California']
 [94657.16 145077.58 282574.31 'New York']
 [91749.16 114175.79 294919.57 'Florida']
 [86419.7 153514.11 0.0 'New York']
 [76253.86 113867.3 298664.47 'California']
 [78389.47 153773.43 299737.29 'New York']
 [73994.56 122782

In [5]:
print(type(y))
print(y.shape)
print(y)

<class 'numpy.ndarray'>
(50,)
[192261.83 191792.06 191050.39 182901.99 166187.94 156991.12 156122.51
 155752.6  152211.77 149759.96 146121.95 144259.4  141585.52 134307.35
 132602.65 129917.04 126992.93 125370.37 124266.9  122776.86 118474.03
 111313.02 110352.25 108733.99 108552.04 107404.34 105733.54 105008.31
 103282.38 101004.64  99937.59  97483.56  97427.84  96778.92  96712.8
  96479.51  90708.19  89949.14  81229.06  81005.76  78239.91  77798.83
  71498.49  69758.98  65200.33  64926.08  49490.75  42559.73  35673.41
  14681.4 ]


In [6]:
# Encode 'State' column (categorical variable)
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(
    transformers = [('encoder', OneHotEncoder(), [3])],
    remainder = 'passthrough'
)

X = ct.fit_transform(X)
print(X)

[[0.0 0.0 1.0 165349.2 136897.8 471784.1]
 [1.0 0.0 0.0 162597.7 151377.59 443898.53]
 [0.0 1.0 0.0 153441.51 101145.55 407934.54]
 [0.0 0.0 1.0 144372.41 118671.85 383199.62]
 [0.0 1.0 0.0 142107.34 91391.77 366168.42]
 [0.0 0.0 1.0 131876.9 99814.71 362861.36]
 [1.0 0.0 0.0 134615.46 147198.87 127716.82]
 [0.0 1.0 0.0 130298.13 145530.06 323876.68]
 [0.0 0.0 1.0 120542.52 148718.95 311613.29]
 [1.0 0.0 0.0 123334.88 108679.17 304981.62]
 [0.0 1.0 0.0 101913.08 110594.11 229160.95]
 [1.0 0.0 0.0 100671.96 91790.61 249744.55]
 [0.0 1.0 0.0 93863.75 127320.38 249839.44]
 [1.0 0.0 0.0 91992.39 135495.07 252664.93]
 [0.0 1.0 0.0 119943.24 156547.42 256512.92]
 [0.0 0.0 1.0 114523.61 122616.84 261776.23]
 [1.0 0.0 0.0 78013.11 121597.55 264346.06]
 [0.0 0.0 1.0 94657.16 145077.58 282574.31]
 [0.0 1.0 0.0 91749.16 114175.79 294919.57]
 [0.0 0.0 1.0 86419.7 153514.11 0.0]
 [1.0 0.0 0.0 76253.86 113867.3 298664.47]
 [0.0 0.0 1.0 78389.47 153773.43 299737.29]
 [0.0 1.0 0.0 73994.56 122782.75 3
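
In the output above, the three one-hot columns come first and the numeric columns follow, because `ColumnTransformer` emits the transformed columns before the `remainder='passthrough'` ones. `OneHotEncoder` orders string categories alphabetically by default, which is why `'New York'` maps to `[0, 0, 1]`. The NumPy-only sketch below reproduces that ordering on a few hand-written values; it is an illustration, not what scikit-learn runs internally.

```python
import numpy as np

# A few state labels, written out by hand for illustration.
states = np.array(['New York', 'California', 'Florida', 'New York'])

# np.unique returns the distinct labels in sorted order.
categories = np.unique(states)

# Compare every label against every category to build the indicator matrix.
one_hot = (states[:, None] == categories[None, :]).astype(float)

print(categories)
print(one_hot)
```

Each row contains exactly one `1.0`, in the column of its category: California first, Florida second, New York third.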

In [10]:
# feature scaling, this will help in gradient descent
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X[:, 3:] = scaler.fit_transform(X[:, 3:])

In [11]:
# verify z-score normalization: each scaled column should have mean ~0 and std 1
for i in range(3, X.shape[1]):
    print(f'Column {i}: Mean = {X[:, i].mean()}; Std = {X[:, i].std()}; Min = {X[:, i].min()}; Max = {X[:, i].max()}')

Column 3: Mean = 3.1974423109204507e-16; Std = 1.0; Min = -1.6223620208201208; Max = 2.016411493158463
Column 4: Mean = 1.783573289060314e-15; Std = 1.0; Min = -2.525994015794806; Max = 2.210140504801784
Column 5: Mean = -5.3734794391857575e-16; Std = 1.0; Min = -1.743126975815244; Max = 2.1539430885717437


In [12]:
# take a look at the transformed dataset
print(X)

[[0.0 0.0 1.0 2.016411493158463 0.5607529145307771 2.1539430885717437]
 [1.0 0.0 0.0 1.9558603364325031 1.0828065830760836 1.923600395642144]
 [0.0 1.0 0.0 1.7543637361407838 -0.7282570276886139 1.6265276693147557]
 [0.0 0.0 1.0 1.5547836905426 -0.09636463069766295 1.4222102362410107]
 [0.0 1.0 0.0 1.5049372036935105 -1.0799193536742695 1.2815277086174903]
 [0.0 0.0 1.0 1.2798000145910104 -0.7762390705391548 1.2542104579362325]
 [1.0 0.0 0.0 1.3400664059278975 0.9321472084702611 -0.6881499302965387]
 [0.0 1.0 0.0 1.2450566565302281 0.8719800111141467 0.932185978099574]
 [0.0 0.0 1.0 1.03036886075798 0.9869521013922136 0.8308869091888099]
 [1.0 0.0 0.0 1.091819207112805 -0.45664024606220305 0.7761074398639786]
 [0.0 1.0 0.0 0.6203982479442911 -0.3875990892467969 0.14980726713928416]
 [1.0 0.0 0.0 0.5930854179840066 -1.0655395950477557 0.3198336232146922]
 [0.0 1.0 0.0 0.4432598716517343 0.21544906370150557 0.320617441445128]
 [1.0 0.0 0.0 0.40207760283447386 0.5101789529969049 0.3439567

In [13]:
print(y)

[192261.83 191792.06 191050.39 182901.99 166187.94 156991.12 156122.51
 155752.6  152211.77 149759.96 146121.95 144259.4  141585.52 134307.35
 132602.65 129917.04 126992.93 125370.37 124266.9  122776.86 118474.03
 111313.02 110352.25 108733.99 108552.04 107404.34 105733.54 105008.31
 103282.38 101004.64  99937.59  97483.56  97427.84  96778.92  96712.8
  96479.51  90708.19  89949.14  81229.06  81005.76  78239.91  77798.83
  71498.49  69758.98  65200.33  64926.08  49490.75  42559.73  35673.41
  14681.4 ]


In [17]:
class MultipleLinearRegression():
    
    def __init__(self):
        self.w = np.array([0]) # will be changed later
        self.b = 0
        self.n = 0 # number of features
    
    def predict(self, x_new):
        '''
        Input: x_new -> 1D Numpy array of a single observation.
        '''
        f = np.dot(self.w, x_new) + self.b
        return f
    
    def predict_multiple(self, x_new):
        '''
        Input: x_new -> 2D Numpy array of multiple observations, with n columns.
        '''
        y_pred = list(map(lambda x: self.predict(x), x_new))
        return np.array(y_pred)
    
    def compute_cost_function(self, X, y):
        m = len(X)
        squared_error = 0
        for i in range(m):
            x_i_vector = X[i]
            y_actual_i = y[i]
            y_hat_i = self.predict(x_i_vector)
            error = y_hat_i - y_actual_i
            squared_error += (error ** 2)
        squared_error /= (2 * m)
        return squared_error
    
    def compute_gradient(self, X, y):
        m = len(X)
        dj_dw_vector = np.zeros(self.n)
        dj_db = 0
        for j in range(self.n):
            dj_dw_j = 0
            for i in range(m):
                x_i_vector = X[i]
                x_i_j = X[i][j]
                y_actual_i = y[i]
                y_hat_i = self.predict(x_i_vector)
                error = (y_hat_i - y_actual_i)
                dj_dw_j += (error * x_i_j)
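                # NOTE: this statement sits inside the feature loop, so each error
                # is added n times and dj_db is n times the true bias gradient
                dj_db += error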
            dj_dw_vector[j] = dj_dw_j
        dj_dw_vector /= m
        dj_db /= m
        return dj_dw_vector, dj_db
    
    def run_gradient_descent(self, X, y, alpha = 0.001, num_itr = int(1e6), threshold = 0.001):
        '''
        Input: X -> 2D numpy array of shape (m, n) representing input features.
        Input: y -> 1D numpy array of shape (m,) representing target.
        '''
        m = X.shape[0]
        # initialise all parameters to zero
        self.w = np.zeros(self.n)
        self.b = 0
        prev_w = np.zeros(self.n)
        prev_b = 0
        for i in range(num_itr):
            dj_dw_vector, dj_db = self.compute_gradient(X, y)
            self.w = self.w - (alpha * dj_dw_vector)
            self.b = self.b - (alpha * dj_db)
            squared_error = self.compute_cost_function(X, y)
            print(f'Iteration {i + 1}: Cost Function Value = {squared_error}')
#             if i == 0 or (i + 1) % 1000 == 0:
#                 print(f'Iteration {i + 1}: Cost Function Value = {squared_error}')
            if i == num_itr - 1:
                print('Final Cost:', squared_error)
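            # NOTE: `or` stops the loop as soon as either w or b stabilises;
            # `and` would require both to have stopped changing
            if (abs(prev_w - self.w) <= threshold).all() or abs(prev_b - self.b) <= threshold: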
                print('Terminating Gradient Descent because parameters are almost not changing')
                print('Final Cost:', squared_error)
                break
            prev_w, prev_b = self.w, self.b
        print(f'Learned Parameters: w = {self.w}; b = {self.b}')
                
    def fit(self, X, y):
        self.n = X.shape[1]
        self.run_gradient_descent(X, y)
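
The gradient computation in the class above uses nested Python loops. A vectorized variant could look like the sketch below. This is an alternative written for illustration, not the code that produced the outputs in this notebook: the function names, the learning rate, and the synthetic data are assumptions. It adds each error to the bias gradient once per example and stops only when both `w` and `b` have stabilised.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Return (dJ/dw, dJ/db) for the cost J = sum(err**2) / (2m)."""
    m = X.shape[0]
    errors = X @ w + b - y        # shape (m,)
    dj_dw = X.T @ errors / m      # shape (n,)
    dj_db = errors.sum() / m      # scalar, each error counted once
    return dj_dw, dj_db

def gradient_descent(X, y, alpha=0.1, num_itr=5000, threshold=1e-9):
    """Batch gradient descent; stops early once both w and b barely change."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(num_itr):
        dj_dw, dj_db = compute_gradient_vectorized(X, y, w, b)
        w_new = w - alpha * dj_dw
        b_new = b - alpha * dj_db
        converged = (np.all(np.abs(w_new - w) <= threshold)
                     and abs(b_new - b) <= threshold)
        w, b = w_new, b_new
        if converged:
            break
    return w, b

# Synthetic data (not the startups dataset): y = 3*x1 - 2*x2 + 5, no noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

w_fit, b_fit = gradient_descent(X_demo, y_demo)
print(w_fit, b_fit)
```

To try this on the notebook's data, `X` would probably need to be cast with `X.astype(float)` first, since the printed output suggests `ColumnTransformer` returned an object-typed array.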

In [18]:
model = MultipleLinearRegression()
model.fit(X, y)

Iteration 1: Cost Function Value = 6987786760.305285
Iteration 2: Cost Function Value = 6907118370.161627
Iteration 3: Cost Function Value = 6827446046.32355
Iteration 4: Cost Function Value = 6748757289.861691
Iteration 5: Cost Function Value = 6671039759.395163
Iteration 6: Cost Function Value = 6594281269.103106
Iteration 7: Cost Function Value = 6518469786.761467
Iteration 8: Cost Function Value = 6443593431.804399
Iteration 9: Cost Function Value = 6369640473.410288
Iteration 10: Cost Function Value = 6296599328.611858
Iteration 11: Cost Function Value = 6224458560.430146
Iteration 12: Cost Function Value = 6153206876.032077
Iteration 13: Cost Function Value = 6082833124.911242
Iteration 14: Cost Function Value = 6013326297.091701
Iteration 15: Cost Function Value = 5944675521.354413
Iteration 16: Cost Function Value = 5876870063.486106
Iteration 17: Cost Function Value = 5809899324.550206
Iteration 18: Cost Function Value = 5743752839.179641
Iteration 19: Cost Function Value = 56

Iteration 415: Cost Function Value = 284783299.62176764
Iteration 416: Cost Function Value = 283788985.1751414
Iteration 417: Cost Function Value = 282801797.44898444
Iteration 418: Cost Function Value = 281821664.6890783
Iteration 419: Cost Function Value = 280848515.98326147
Iteration 420: Cost Function Value = 279882281.251025
Iteration 421: Cost Function Value = 278922891.2332346
Iteration 422: Cost Function Value = 277970277.481983
Iteration 423: Cost Function Value = 277024372.35057235
Iteration 424: Cost Function Value = 276085108.9836173
Iteration 425: Cost Function Value = 275152421.3072768
Iteration 426: Cost Function Value = 274226244.019606
Iteration 427: Cost Function Value = 273306512.5810301
Iteration 428: Cost Function Value = 272393163.20493674
Iteration 429: Cost Function Value = 271486132.84838825
Iteration 430: Cost Function Value = 270585359.2029488
Iteration 431: Cost Function Value = 269690780.6856267
Iteration 432: Cost Function Value = 268802336.4299308
Iterati

Iteration 820: Cost Function Value = 122114323.31695136
Iteration 821: Cost Function Value = 121956725.06059319
Iteration 822: Cost Function Value = 121799640.95146048
Iteration 823: Cost Function Value = 121643068.93238583
Iteration 824: Cost Function Value = 121487006.95688199
Iteration 825: Cost Function Value = 121331452.98906061
Iteration 826: Cost Function Value = 121176405.00355229
Iteration 827: Cost Function Value = 121021860.9854273
Iteration 828: Cost Function Value = 120867818.93011667
Iteration 829: Cost Function Value = 120714276.84333514
Iteration 830: Cost Function Value = 120561232.74100338
Iteration 831: Cost Function Value = 120408684.64917284
Iteration 832: Cost Function Value = 120256630.6039495
Iteration 833: Cost Function Value = 120105068.6514199
Iteration 834: Cost Function Value = 119953996.84757696
Iteration 835: Cost Function Value = 119803413.25824699
Iteration 836: Cost Function Value = 119653315.95901729
Iteration 837: Cost Function Value = 119503703.0351

Iteration 1100: Cost Function Value = 92537669.33786003
Iteration 1101: Cost Function Value = 92469408.53518371
Iteration 1102: Cost Function Value = 92401333.42875221
Iteration 1103: Cost Function Value = 92333443.38169573
Iteration 1104: Cost Function Value = 92265737.7594789
Iteration 1105: Cost Function Value = 92198215.92989224
Iteration 1106: Cost Function Value = 92130877.26304209
Iteration 1107: Cost Function Value = 92063721.13134138
Iteration 1108: Cost Function Value = 91996746.90950064
Iteration 1109: Cost Function Value = 91929953.97451816
Iteration 1110: Cost Function Value = 91863341.70567127
Iteration 1111: Cost Function Value = 91796909.48450673
Iteration 1112: Cost Function Value = 91730656.69483186
Iteration 1113: Cost Function Value = 91664582.72270538
Iteration 1114: Cost Function Value = 91598686.9564283
Iteration 1115: Cost Function Value = 91532968.78653492
Iteration 1116: Cost Function Value = 91467427.60578398
Iteration 1117: Cost Function Value = 91402062.809

Iteration 1285: Cost Function Value = 82504973.0664288
Iteration 1286: Cost Function Value = 82462271.82662958
Iteration 1287: Cost Function Value = 82419670.19593516
Iteration 1288: Cost Function Value = 82377167.84632967
Iteration 1289: Cost Function Value = 82334764.45095363
Iteration 1290: Cost Function Value = 82292459.68409956
Iteration 1291: Cost Function Value = 82250253.22120771
Iteration 1292: Cost Function Value = 82208144.73886228
Iteration 1293: Cost Function Value = 82166133.91478677
Iteration 1294: Cost Function Value = 82124220.42784022
Iteration 1295: Cost Function Value = 82082403.95801294
Iteration 1296: Cost Function Value = 82040684.1864224
Iteration 1297: Cost Function Value = 81999060.79530908
Iteration 1298: Cost Function Value = 81957533.46803269
Iteration 1299: Cost Function Value = 81916101.88906786
Iteration 1300: Cost Function Value = 81874765.74400008
Iteration 1301: Cost Function Value = 81833524.71952182
Iteration 1302: Cost Function Value = 81792378.503

Iteration 1717: Cost Function Value = 70206473.21426845
Iteration 1718: Cost Function Value = 70187201.41608526
Iteration 1719: Cost Function Value = 70167955.98932038
Iteration 1720: Cost Function Value = 70148736.86073087
Iteration 1721: Cost Function Value = 70129543.95732312
Iteration 1722: Cost Function Value = 70110377.20635125
Iteration 1723: Cost Function Value = 70091236.5353169
Iteration 1724: Cost Function Value = 70072121.8719678
Iteration 1725: Cost Function Value = 70053033.14429738
Iteration 1726: Cost Function Value = 70033970.28054358
Iteration 1727: Cost Function Value = 70014933.20918803
Iteration 1728: Cost Function Value = 69995921.85895538
Iteration 1729: Cost Function Value = 69976936.15881222
Iteration 1730: Cost Function Value = 69957976.03796647
Iteration 1731: Cost Function Value = 69939041.4258663
Iteration 1732: Cost Function Value = 69920132.25219935
Iteration 1733: Cost Function Value = 69901248.44689201
Iteration 1734: Cost Function Value = 69882389.9401

In [19]:
y_pred = model.predict_multiple(X)

In [20]:
pd.DataFrame({
    'Actual': y,
    'Predicted': y_pred
})

|   | Actual | Predicted |
|---|---|---|
| 0 | 192261.83 | 191891.171058 |
| 1 | 191792.06 | 187538.441801 |
| 2 | 191050.39 | 174265.063121 |
| 3 | 182901.99 | 169077.401029 |
| 4 | 166187.94 | 162600.62186 |
| 5 | 156991.12 | 158091.202482 |
| 6 | 156122.51 | 138105.441587 |
| 7 | 155752.6 | 157527.40385 |
| 8 | 152211.77 | 151751.894284 |
| 9 | 149759.96 | 146785.166966 |


In [21]:
from sklearn.metrics import r2_score

print(r2_score(y, y_pred))

0.9135797685177178
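
For reference, the score reported by `r2_score` is the coefficient of determination, $R^2 = 1 - SS_{res} / SS_{tot}$. A hand-rolled version is sketched below on toy numbers; `r2_manual` is a hypothetical helper name and the values are not from the dataset.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only.
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.0, 7.5, 9.0])

r2_demo = r2_manual(y_true_demo, y_pred_demo)
print(r2_demo)
```

Here `SS_res` is 0.5 and `SS_tot` is 20, so the score is 0.975. A value of 1.0 means the predictions match the targets exactly, while 0.0 corresponds to always predicting the mean of `y`.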
