In [1]:
from IPython.display import Image

# Stole this image from the following website, hey thanks!
Image(url="https://cdn-images-1.medium.com/max/1800/1*uLHXR8LKGDucpwUYHx3VaQ.png")

# Multiple Linear Regression in Python

### Generate DataSet
First, we'll create a small example dataset of monthly interest rates, unemployment rates, and stock index prices for 2016 and 2017:

In [2]:
from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
                }

df = DataFrame(Stock_Market,
               columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])

### Pull out variables of interest
Next we'll pull out the variables we want to test. You can use more than two predictor (X) variables, but we'll keep it simple for this example.

In [3]:
x = df[['Interest_Rate','Unemployment_Rate']]
y = df['Stock_Index_Price']

### sklearn: 
We'll first fit our model and make predictions with sklearn: 

In [5]:
# with sklearn
from sklearn.linear_model import LinearRegression
regression_model = LinearRegression()
# Fit the data(train the model)
regression_model.fit(x, y)

# Predict
y_predicted = regression_model.predict(x)

print('Intercept: \n', regression_model.intercept_)  # pull out intercept
print('Coefficients: \n', regression_model.coef_)  # pull out coefficients

Intercept: 
 1798.403977625855
Coefficients: 
 [ 345.54008701 -250.14657137]


This output includes the intercept and coefficients. You can use this information to build the multiple linear regression equation as follows, where X1 is the interest rate and X2 is the unemployment rate:

`Stock_Index_Price = (Intercept) + (Interest_Rate coef)*X1 + (Unemployment_Rate coef)*X2`

And once you plug in the numbers:

`Stock_Index_Price = (1798.4040) + (345.5401)*X1 + (-250.1466)*X2`
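As a cross-check, the same intercept and coefficients can be recovered without sklearn by solving the least-squares problem directly. The sketch below is not part of the original notebook: it assumes NumPy is available, re-enters the two predictor columns and the target by hand, and uses illustrative variable names.

```python
import numpy as np

# Same data as the DataFrame above, re-entered so the block is self-contained
interest_rate = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                          2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
unemployment_rate = np.array([5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                              6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1])
stock_index_price = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                              1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719], dtype=float)

# Design matrix: a column of ones (for the intercept) plus the two predictors
design = np.column_stack([np.ones(len(interest_rate)), interest_rate, unemployment_rate])

# Ordinary least squares solution: [intercept, interest coef, unemployment coef]
solution, *_ = np.linalg.lstsq(design, stock_index_price, rcond=None)
intercept, coef_interest, coef_unemployment = solution

print("Intercept:", intercept)
print("Coefficients:", coef_interest, coef_unemployment)
```

The printed values should agree with the sklearn output above up to floating-point rounding.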

In [29]:
# prediction with sklearn
New_Interest_Rate = 2.75
New_Unemployment_Rate = 5.3
print('Predicted Stock Index Price: \n', regression_model.predict([[New_Interest_Rate, New_Unemployment_Rate]]))

Predicted Stock Index Price: 
 [1422.86238865]


The cell above predicts the stock index price for a new observation:

- Interest Rate = 2.75 (i.e., X1 = 2.75)
- Unemployment Rate = 5.3 (i.e., X2 = 5.3)

If you plug these values into the regression equation, you get the same result as the sklearn prediction:

`Stock_Index_Price = (1798.4040) + (345.5401)*(2.75) + (-250.1466)*(5.3) = 1422.86`
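The arithmetic above can also be checked in a few lines of plain Python. This is a minimal sketch: the helper name `predict_stock_index_price` is hypothetical, and the constants are the rounded values from the equation rather than the full-precision sklearn attributes.

```python
# Rounded intercept and coefficients taken from the regression equation
INTERCEPT = 1798.4040
COEF_INTEREST_RATE = 345.5401
COEF_UNEMPLOYMENT_RATE = -250.1466


def predict_stock_index_price(interest_rate, unemployment_rate):
    """Evaluate the regression equation for a single observation."""
    return (INTERCEPT
            + COEF_INTEREST_RATE * interest_rate
            + COEF_UNEMPLOYMENT_RATE * unemployment_rate)


prediction = predict_stock_index_price(2.75, 5.3)
print(round(prediction, 2))
```

Because the constants are rounded to four decimals, the result matches the sklearn prediction to about two decimal places rather than exactly.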

### Statsmodels
Now we'll fit the same model with statsmodels. As the summary shows, the estimates match the sklearn results:

In [30]:
# with statsmodels
X = sm.add_constant(x)  # add an intercept column; sm.OLS does not include one by default

model = sm.OLS(y, X).fit()
predictions = model.predict(X)

print(model.summary())

                            OLS Regression Results                            
Dep. Variable:      Stock_Index_Price   R-squared:                       0.898
Model:                            OLS   Adj. R-squared:                  0.888
Method:                 Least Squares   F-statistic:                     92.07
Date:                Tue, 19 Feb 2019   Prob (F-statistic):           4.04e-11
Time:                        15:22:10   Log-Likelihood:                -134.61
No. Observations:                  24   AIC:                             275.2
Df Residuals:                      21   BIC:                             278.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1798.4040    899.24

We cannot easily plot the results on a standard two-dimensional chart: with two predictors and one response, the fitted model is a plane in three dimensions.
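One way to inspect a multi-predictor fit without a 3-D plot is to compare each actual value with its fitted value and summarize the agreement with R-squared. The sketch below is an addition, not part of the original notebook: it assumes NumPy is available, refits the model with `numpy.linalg.lstsq` rather than sklearn or statsmodels, and uses illustrative variable names.

```python
import numpy as np

# Same data as the DataFrame above, re-entered so the block is self-contained
interest_rate = np.array([2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                          2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75])
unemployment_rate = np.array([5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                              6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1])
actual = np.array([1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                   1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719], dtype=float)

# Fit by ordinary least squares with an explicit intercept column
design = np.column_stack([np.ones(len(actual)), interest_rate, unemployment_rate])
solution, *_ = np.linalg.lstsq(design, actual, rcond=None)

# Fitted values and residuals for every row of the dataset
fitted = design @ solution
residuals = actual - fitted

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Show the first few (actual, fitted) pairs side by side
for a, f in list(zip(actual, fitted))[:5]:
    print(f"actual={a:7.1f}  fitted={f:8.2f}")
print("R-squared:", round(r_squared, 3))
```

The R-squared value should be close to the 0.898 reported in the statsmodels summary, and the `actual` and `fitted` arrays can be passed to `plt.scatter` for a two-dimensional view of how well the model fits.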