## ***Machine Learning in Python***
### Linear Regression

Linear regression is a predictive model that assumes a linear relationship between the dependent variable (the variable we are trying to predict) and one or more independent variables (the inputs).

For example, you may use linear regression to predict a student's marks (the dependent variable) based on attendance and internal grades (the input variables).

In simple linear regression, only one independent (input) variable is used to predict the dependent variable: `Y = C + M*X`

- Y = dependent variable (output)
- C = constant (y-intercept)
- M = slope of the regression line (the effect that X has on Y)
- X = independent variable (input)
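
As a minimal sketch of how C and M can be estimated, the snippet below applies the standard least-squares formulas for a single input. The data and the function name `fit_simple_linear` are hypothetical and chosen only for illustration; the points lie exactly on a line, so the fit should recover its constant and slope.

```python
# Toy data (hypothetical): y is exactly 2 + 3*x, so the fit should recover C=2, M=3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 8.0, 11.0, 14.0, 17.0]

def fit_simple_linear(xs, ys):
    """Return (C, M) for the least-squares line Y = C + M*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    c = mean_y - m * mean_x
    return c, m

C, M = fit_simple_linear(xs, ys)
print(f"C = {C:.2f}, M = {M:.2f}")  # C = 2.00, M = 3.00
```

With real, noisy data the points will not lie exactly on the line, and C and M are then the values that minimize the sum of squared vertical distances to it.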

In practice, the dependent variable often depends on several independent variables. For such models (assuming linearity), we can use multiple linear regression: `Y = C + M1*X1 + M2*X2 + ...`
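
The multiple-input case can be sketched with NumPy's least-squares solver. This is an illustration under stated assumptions, not the approach used later in this notebook: the data are hypothetical and generated exactly from `Y = 1 + 2*X1 - 3*X2`, so the solver should recover those three numbers.

```python
import numpy as np

# Toy data (hypothetical), generated exactly from Y = 1 + 2*X1 - 3*X2.
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
Y = 1.0 + 2.0 * X1 - 3.0 * X2

# Design matrix: a column of ones (for the constant C) followed by the inputs.
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution for the coefficients [C, M1, M2].
coef, residuals, rank, _ = np.linalg.lstsq(A, Y, rcond=None)
C, M1, M2 = coef
print(f"C = {C:.2f}, M1 = {M1:.2f}, M2 = {M2:.2f}")
```

The column of ones is what lets the model estimate a constant term; without it the fitted plane would be forced through the origin.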


In [8]:
from pandas import DataFrame
import statsmodels.api as sm


In [9]:
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
                }




In [13]:
df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])
df

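| | Year | Month | Interest_Rate | Unemployment_Rate | Stock_Index_Price |
|---|---|---|---|---|---|
| 0 | 2017 | 12 | 2.75 | 5.3 | 1464 |
| 1 | 2017 | 11 | 2.5 | 5.3 | 1394 |
| 2 | 2017 | 10 | 2.5 | 5.3 | 1357 |
| 3 | 2017 | 9 | 2.5 | 5.3 | 1293 |
| 4 | 2017 | 8 | 2.5 | 5.4 | 1256 |
| 5 | 2017 | 7 | 2.5 | 5.6 | 1254 |
| 6 | 2017 | 6 | 2.5 | 5.5 | 1234 |
| 7 | 2017 | 5 | 2.25 | 5.5 | 1195 |
| 8 | 2017 | 4 | 2.25 | 5.5 | 1159 |
| 9 | 2017 | 3 | 2.25 | 5.6 | 1167 |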


In [14]:
X = df[['Interest_Rate','Unemployment_Rate']]
# Two input variables are selected above, so this is a multiple linear regression.
# For a single input, use X = df['Interest_Rate']; to use more inputs, add their column names to the list.
Y = df['Stock_Index_Price']



In [15]:
X = sm.add_constant(X) #adding a constant



In [16]:
model = sm.OLS(Y,X).fit()
predictions = model.predict(X)

print_model = model.summary()
print(print_model)



                            OLS Regression Results                            
Dep. Variable:      Stock_Index_Price   R-squared:                       0.898
Model:                            OLS   Adj. R-squared:                  0.888
Method:                 Least Squares   F-statistic:                     92.07
Date:                Sat, 12 Oct 2019   Prob (F-statistic):           4.04e-11
Time:                        15:46:51   Log-Likelihood:                -134.61
No. Observations:                  24   AIC:                             275.2
Df Residuals:                      21   BIC:                             278.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1798.4040    899.24

#### Interpreting the Regression Results

Several components of the results are worth a closer look:

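##### Adj. R-squared
reflects the fit of the model, adjusted for the number of input variables. R-squared values range from 0 to 1, where a higher value generally indicates a better fit, assuming certain conditions are met. The adjusted value penalizes input variables that do not improve the fit.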
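
As a quick check, the adjusted value can be reproduced from the other entries in the summary using the standard formula `1 - (1 - R²) * (n - 1) / (n - p - 1)`, where n is the number of observations and p the number of input variables. The function name below is hypothetical; the inputs are the rounded values printed in the summary.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values read from the summary: R-squared 0.898, 24 observations, Df Model 2.
adj = adjusted_r_squared(0.898, 24, 2)
print(round(adj, 3))  # 0.888, matching the Adj. R-squared entry
```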
##### const coefficient
is your Y-intercept. It means that if both Interest_Rate and Unemployment_Rate are zero, then the expected output (i.e., Y) equals the const coefficient.
##### Interest_Rate coefficient
represents the change in the output Y due to a change of one unit in the interest rate (everything else held constant).
##### Unemployment_Rate coefficient
represents the change in the output Y due to a change of one unit in the unemployment rate (everything else held constant).
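
The phrase "everything else held constant" can be made concrete with a small sketch. The coefficients below are hypothetical placeholders chosen for easy arithmetic, not the values fitted in this notebook, and `predict` is an illustrative helper rather than a statsmodels call.

```python
# Hypothetical coefficients for illustration only; they are NOT the values
# fitted in this notebook.
CONST, B_INTEREST, B_UNEMPLOYMENT = 100.0, 20.0, -5.0

def predict(interest_rate, unemployment_rate):
    """Evaluate Y = const + b1*Interest_Rate + b2*Unemployment_Rate."""
    return CONST + B_INTEREST * interest_rate + B_UNEMPLOYMENT * unemployment_rate

base = predict(2.0, 6.0)
plus_one_interest = predict(3.0, 6.0)      # interest rate up by one unit
plus_one_unemployment = predict(2.0, 7.0)  # unemployment rate up by one unit

print(plus_one_interest - base)      # 20.0, the Interest_Rate coefficient
print(plus_one_unemployment - base)  # -5.0, the Unemployment_Rate coefficient
```

Changing one input by one unit while leaving the other fixed moves the prediction by exactly that input's coefficient, whatever the starting values are.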
##### std err
is the standard error of each coefficient estimate. The lower it is, the more precisely the coefficient is estimated.
##### P>|t|
is your p-value. A p-value of less than 0.05 is conventionally considered statistically significant.
##### Confidence Interval
is shown in the [0.025, 0.975] columns. For each coefficient, it gives a 95% confidence interval: a range of plausible values for the true coefficient.
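
The std err, t, and confidence interval columns are linked by simple arithmetic. The sketch below assumes the usual OLS construction (t = coefficient / std err, interval = coefficient ± critical value × std err) and uses a hypothetical coefficient and standard error, not values from this notebook. The critical value is the two-sided 95% point of Student's t for 21 degrees of freedom, which is the Df Residuals value in the summary.

```python
# Hypothetical coefficient and standard error, for illustration only.
coef = 5.0
std_err = 2.0

# Two-sided 95% critical value of Student's t with 21 degrees of freedom
# (Df Residuals in the summary); approximately 2.0796.
T_CRIT_21 = 2.0796

t_stat = coef / std_err              # the value shown in the "t" column
lower = coef - T_CRIT_21 * std_err   # the "[0.025" column
upper = coef + T_CRIT_21 * std_err   # the "0.975]" column

print(f"t = {t_stat:.2f}")
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

Here the absolute t value exceeds the critical value, so the interval excludes zero and the corresponding p-value would be below 0.05; the two checks always agree.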

