# The Multiple Regression Model (Second Example)

### Intro and objectives


### In this lab you will learn:
1. Examples of multiple regression models.
2. How to fit multiple regression models in Python.


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to fit a multiple regression model.
* Worked examples of multiple regression models
* How to interpret the results

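```python
# "!pip install" is Jupyter/Colab shell syntax; run `pip install wooldridge` in a terminal instead when using a plain .py script
!pip install wooldridge

import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```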



# Example 2. Determinants of hourly salary




#### In this case we fit a multiple linear regression model to predict hourly wage from education, experience and tenure (years with the current employer)

$$ wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + u $$
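`smf.ols` estimates $\beta_0, \dots, \beta_3$ by ordinary least squares, that is, by minimizing the sum of squared residuals. The sketch below illustrates this with NumPy on synthetic data, not WAGE1. The "true" coefficients, the variable ranges and the noise level are made-up assumptions, chosen only so we can check that OLS recovers them. It computes the estimates twice: once with `np.linalg.lstsq`, and once by solving the normal equations $(X'X)\hat\beta = X'y$. Both should agree.

```python
import numpy as np

# Synthetic data (NOT the WAGE1 sample): coefficients and ranges are made up
rng = np.random.default_rng(0)
n = 526
educ = rng.integers(8, 19, size=n).astype(float)        # years of education, 8..18
exper = rng.integers(1, 52, size=n).astype(float)       # years of experience, 1..51
tenure = np.floor(rng.uniform(0.0, 1.0, size=n) * exper)  # tenure never exceeds exper
true_beta = np.array([-3.0, 0.6, 0.02, 0.17])           # assumed [b0, b1, b2, b3]
u = rng.normal(0.0, 1.0, size=n)                        # error term

# Design matrix: a column of ones for the intercept, then the three regressors
X = np.column_stack([np.ones(n), educ, exper, tenure])
wage = X @ true_beta + u

# OLS via least squares: minimizes ||wage - X b||^2
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Same estimates from the normal equations (X'X) b = X'y
beta_normal_eq = np.linalg.solve(X.T @ X, X.T @ wage)

print(np.round(beta_hat, 3))
print(np.round(beta_normal_eq, 3))
```

With real data, the formula interface builds $X$ (including the intercept column) for you, so you never assemble it by hand.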



### Using the data in WAGE1, with $n = 526$ individuals

```python
# Load the WAGE1 dataset as a pandas DataFrame
salaries = woo.dataWoo('wage1')
```


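```python
salaries.head()
```

| | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.1 | 11 | 2 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.131402 | 4 | 0 |
| 1 | 3.24 | 12 | 22 | 2 | 0 | 1 | 1 | 3 | 1 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1.175573 | 484 | 4 |
| 2 | 3.0 | 11 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1.098612 | 4 | 0 |
| 3 | 6.0 | 8 | 44 | 28 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1.791759 | 1936 | 784 |
| 4 | 5.3 | 12 | 7 | 2 | 0 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.667707 | 49 | 4 |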


```python
salaries.describe()
```

| | wage | educ | exper | tenure | nonwhite | female | married | numdep | smsa | northcen | ... | trcommpu | trade | services | profserv | profocc | clerocc | servocc | lwage | expersq | tenursq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | ... | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 | 526.0 |
| mean | 5.896103 | 12.562738 | 17.01711 | 5.104563 | 0.102662 | 0.479087 | 0.608365 | 1.043726 | 0.722433 | 0.250951 | ... | 0.043726 | 0.287072 | 0.10076 | 0.258555 | 0.36692 | 0.1673 | 0.140684 | 1.623268 | 473.435361 | 78.15019 |
| std | 3.693086 | 2.769022 | 13.57216 | 7.224462 | 0.303805 | 0.500038 | 0.48858 | 1.261891 | 0.448225 | 0.433973 | ... | 0.20468 | 0.452826 | 0.301298 | 0.438257 | 0.482423 | 0.373599 | 0.348027 | 0.531538 | 616.044772 | 199.434664 |
| min | 0.53 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.634878 | 1.0 | 0.0 |
| 25% | 3.33 | 12.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.202972 | 25.0 | 0.0 |
| 50% | 4.65 | 12.0 | 13.5 | 2.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.536867 | 182.5 | 4.0 |
| 75% | 6.88 | 14.0 | 26.0 | 7.0 | 0.0 | 1.0 | 1.0 | 2.0 | 1.0 | 0.75 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.928619 | 676.0 | 49.0 |
| max | 24.98 | 18.0 | 51.0 | 44.0 | 1.0 | 1.0 | 1.0 | 6.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.218076 | 2601.0 | 1936.0 |


```python
type(salaries)
```

```
pandas.core.frame.DataFrame
```

```python
# We specify a multiple linear regression model (intercept plus three regressors)
# and use the salaries DataFrame (WAGE1) as the empirical dataset
reg = smf.ols(formula='wage ~ educ + exper + tenure', data=salaries)
```

```python
# We fit the model by ordinary least squares
results = reg.fit()
```


```python
results.params
```

```
Intercept   -2.872735
educ         0.598965
exper        0.022340
tenure       0.169269
dtype: float64
```

## Based on these estimates, the fitted model is:

$$ \widehat{wage} = -2.873 + 0.599\, educ + 0.022\, exper + 0.169\, tenure $$
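To make the fitted equation concrete, the sketch below plugs the estimated coefficients from `results.params` into the equation to get a predicted hourly wage. The worker profile (12 years of education, 10 years of experience, 5 years of tenure) is hypothetical and chosen only for illustration. The helper name `predicted_wage` is also hypothetical.

```python
# Coefficients copied from results.params above (WAGE1 fit)
params = {
    "Intercept": -2.872735,
    "educ": 0.598965,
    "exper": 0.022340,
    "tenure": 0.169269,
}


def predicted_wage(educ, exper, tenure):
    """Fitted hourly wage from the estimated equation (no error term)."""
    return (params["Intercept"]
            + params["educ"] * educ
            + params["exper"] * exper
            + params["tenure"] * tenure)


# Hypothetical worker: 12 years of education, 10 of experience, 5 of tenure
print(round(predicted_wage(12, 10, 5), 2))  # → 5.38
```

Inside the notebook, `results.predict(pd.DataFrame({'educ': [12], 'exper': [10], 'tenure': [5]}))` (after `import pandas as pd`) should return the same value. A formula-based results object applies the model formula to the new data frame.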


## How do we interpret the equation?

#### Based on the fitted model, holding the other two regressors fixed, we conclude:

#### 1. One more year of education raises the predicted hourly wage by about 0.60 dollars (0.599)

#### 2. One more year of (potential) experience raises the predicted hourly wage by about 0.022 dollars

#### 3. One more year of tenure (years with the current employer) raises the predicted hourly wage by about 0.169 dollars
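Because the model is linear, the change in the predicted wage is $\Delta\widehat{wage} = 0.599\,\Delta educ + 0.022\,\Delta exper + 0.169\,\Delta tenure$. The intercept cancels, and the effect of a change does not depend on the worker's starting values. The sketch below encodes that rule using the coefficients from `results.params`. The helper `wage_change` is a hypothetical name used only here.

```python
# Slope coefficients copied from results.params above
b_educ, b_exper, b_tenure = 0.598965, 0.022340, 0.169269


def wage_change(d_educ=0, d_exper=0, d_tenure=0):
    """Change in predicted hourly wage for given changes in the regressors.

    The intercept cancels, so only the slopes matter; because the model is
    linear, the result is the same whatever the starting values are.
    """
    return b_educ * d_educ + b_exper * d_exper + b_tenure * d_tenure


print(round(wage_change(d_educ=1), 3))    # → 0.599  (one more year of education)
print(round(wage_change(d_exper=1), 3))   # → 0.022  (one more year of experience)
print(round(wage_change(d_tenure=1), 3))  # → 0.169  (one more year of tenure)
print(round(wage_change(d_educ=4), 3))    # → 2.396  (four more years of education)
```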

## t-tests of statistical significance


```python
print(results.summary())
```

```
                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.306
Model:                            OLS   Adj. R-squared:                  0.302
Method:                 Least Squares   F-statistic:                     76.87
Date:                Thu, 17 Nov 2022   Prob (F-statistic):           3.41e-41
Time:                        06:33:57   Log-Likelihood:                -1336.8
No. Observations:                 526   AIC:                             2682.
Df Residuals:                     522   BIC:                             2699.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.8727      0.729     -3.941      0.0
```

### Based on the table above (5% significance level):

#### educ is statistically significant, as its p-value is extremely low (p-value below 0.001)

#### exper is NOT statistically significant at the 5% level, since its p-value (0.064) is larger than 0.05; it is, however, significant at the 10% level

#### tenure is statistically significant, as its p-value is extremely low (p-value below 0.001)
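Each t statistic in the summary is the coefficient divided by its standard error. The p-value is the two-sided tail probability of that statistic under $H_0: \beta_j = 0$. The sketch below reproduces the t statistic of the Intercept row, the only row fully shown above, and then applies the decision rule to the p-value reported for `exper`. It approximates the t distribution with 522 degrees of freedom by the standard normal (via `math.erfc`), which is an assumption, though a very close one for this many degrees of freedom. `2 * scipy.stats.t.sf(abs(t_stat), 522)` would give the exact t-based value. The helper `significant` is hypothetical.

```python
import math

# Intercept row of the summary table: coefficient and standard error.
# Residual degrees of freedom: n - k - 1 = 526 - 3 - 1 = 522.
coef, std_err = -2.8727, 0.729

# t statistic for H0: beta = 0
t_stat = coef / std_err
print(round(t_stat, 3))  # → -3.941

# Two-sided p-value using the normal approximation to t(522):
# erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|))
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(p_approx < 0.001)  # → True


def significant(p_value, alpha=0.05):
    """Hypothetical helper: reject H0 when the p-value is below alpha."""
    return p_value < alpha


# p-value reported for exper
print(significant(0.064))              # → False (not significant at 5%)
print(significant(0.064, alpha=0.10))  # → True (significant at 10%)
```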

