# The Multiple Regression Model (First Example)

### Intro and objectives


### In this lab you will see:
1. Examples of multiple regression models.
2. How to fit multiple regression models in Python.


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to fit a multiple regression model.
* Worked examples of multiple regression models
* How to interpret the results obtained

In [1]:
!pip install wooldridge

import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns




# Example 1. Determinants of college GPA

#### The variables in GPA1 include the college grade point average (colGPA), high school GPA (hsGPA), and achievement test score (ACT) for a sample of 141 students from a large university; both college and high school GPAs are on a four-point scale.


#### In this case we fit a multiple linear regression model to predict college GPA from high school GPA and achievement test score.

$ colGPA=\beta_0+\beta_1*hsGPA+\beta_2*ACT+u $
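To make concrete what fitting this equation means, here is a minimal sketch using only NumPy. It does not use the GPA1 data: the values of `hs_gpa`, `act`, the "true" coefficients in `beta` and the noise `u` are all synthetic assumptions, chosen only to mimic the form of the model. Ordinary least squares picks the coefficients that minimise the sum of squared residuals, which makes the residuals orthogonal to every regressor.

```python
import numpy as np

# Synthetic data that only mimics the form of the model (NOT the GPA1 sample)
rng = np.random.default_rng(0)
n = 141
hs_gpa = rng.uniform(2.0, 4.0, size=n)            # hypothetical high school GPAs
act = rng.integers(16, 34, size=n).astype(float)  # hypothetical ACT scores
beta = np.array([1.0, 0.5, 0.01])                 # assumed "true" beta_0, beta_1, beta_2
u = rng.normal(0.0, 0.3, size=n)                  # unobserved error term

# Design matrix: a column of ones for the intercept, then the two regressors
X = np.column_stack([np.ones(n), hs_gpa, act])
col_gpa = X @ beta + u

# Least squares estimates of (beta_0, beta_1, beta_2)
beta_hat, *_ = np.linalg.lstsq(X, col_gpa, rcond=None)
residuals = col_gpa - X @ beta_hat

print("estimated coefficients:", beta_hat)
print("X'residuals (should be ~0):", X.T @ residuals)
```

The estimates will not equal the assumed coefficients exactly, because the sample is finite and noisy; the point is only that they are obtained from the data through a least squares fit.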



### Using the data in GPA1 where n=141 individuals

In [2]:
collegePerformance = woo.dataWoo('gpa1')


In [3]:
collegePerformance.head()

Unnamed: 0,age,soph,junior,senior,senior5,male,campus,business,engineer,colGPA,...,greek,car,siblings,bgfriend,clubs,skipped,alcohol,gradMI,fathcoll,mothcoll
0,21,0,0,1,0,0,0,1,0,3.0,...,0,1,1,0,0,2.0,1.0,1,0,0
1,21,0,0,1,0,0,0,1,0,3.4,...,0,1,0,1,1,0.0,1.0,1,1,1
2,20,0,1,0,0,0,0,1,0,3.0,...,0,1,1,0,1,0.0,1.0,1,1,1
3,19,1,0,0,0,1,1,1,0,3.5,...,0,0,1,0,0,0.0,0.0,0,0,0
4,20,0,1,0,0,0,0,1,0,3.6,...,0,1,1,1,0,0.0,1.5,1,1,0


In [4]:
collegePerformance.describe()

Unnamed: 0,age,soph,junior,senior,senior5,male,campus,business,engineer,colGPA,...,greek,car,siblings,bgfriend,clubs,skipped,alcohol,gradMI,fathcoll,mothcoll
count,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0,...,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0,141.0
mean,20.886525,0.021277,0.382979,0.503546,0.092199,0.524823,0.170213,0.794326,0.035461,3.056738,...,0.319149,0.77305,0.93617,0.475177,0.602837,1.076241,1.901064,0.87234,0.588652,0.539007
std,1.271064,0.144819,0.487846,0.50177,0.290337,0.501164,0.377159,0.405634,0.185601,0.37231,...,0.467809,0.420353,0.245321,0.501164,0.491055,1.088882,1.374701,0.3349,0.493832,0.500253
min,19.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.2,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,20.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,2.8,...,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0
50%,21.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,3.0,...,0.0,1.0,1.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0
75%,21.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,3.3,...,1.0,1.0,1.0,1.0,1.0,2.0,3.0,1.0,1.0,1.0
max,30.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0,...,1.0,1.0,1.0,1.0,1.0,5.0,7.0,1.0,1.0,1.0


In [5]:
type(collegePerformance)

pandas.core.frame.DataFrame

In [6]:
# We specify a simple linear model: colGPA as a function of hsGPA and ACT
# We pass collegePerformance (the GPA1 data) as the empirical dataset

reg = smf.ols(formula='colGPA ~ hsGPA + ACT', data=collegePerformance)

In [7]:
# We fit the model
results = reg.fit()


In [9]:
results.params

Intercept    1.286328
hsGPA        0.453456
ACT          0.009426
dtype: float64

## Based on the estimates above, the fitted model is:

$ \widehat{colGPA}=1.29+0.45*hsGPA+0.0094*ACT $


## How do we interpret the equation?

#### Based on the fitted model, we conclude:

#### 1. Holding ACT fixed, one additional point of high school GPA is associated with a 0.453-point higher college GPA.

#### 2. Equivalently, if two students have the same ACT score but student A's hsGPA is one point higher, we predict student A's colGPA to be 0.453 points higher than student B's.

#### 3. Holding high school GPA fixed, another 10 points on the ACT are associated with only about 0.094 points of college GPA (10 × 0.0094), less than one-tenth of a point.
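The arithmetic behind these interpretations can be checked with a short sketch. The coefficients are copied from the `results.params` output above; the helper `predict_colgpa` and the example student profiles (an ACT of 24, an hsGPA of 3.0) are hypothetical, introduced only for illustration.

```python
# Coefficients copied from the results.params output above
params = {"Intercept": 1.286328, "hsGPA": 0.453456, "ACT": 0.009426}

def predict_colgpa(hs_gpa, act):
    """Predicted college GPA from the fitted equation (hypothetical helper)."""
    return params["Intercept"] + params["hsGPA"] * hs_gpa + params["ACT"] * act

# Same ACT, hsGPA one point higher: the difference equals the hsGPA coefficient
effect_hs = predict_colgpa(3.5, 24) - predict_colgpa(2.5, 24)

# Same hsGPA, ACT ten points higher: the difference is 10 times the ACT coefficient
effect_act10 = predict_colgpa(3.0, 30) - predict_colgpa(3.0, 20)

print(round(effect_hs, 3))     # 0.453
print(round(effect_act10, 3))  # 0.094
```

Because the model is linear, these differences do not depend on the particular values held fixed; any common ACT score (or common hsGPA) gives the same result.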

## t-test of statistical significance


In [10]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Thu, 17 Nov 2022   Prob (F-statistic):           1.53e-06
Time:                        06:19:14   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.2863      0.341      3.774      0.0

### Based on the previous table:

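#### hsGPA is statistically significant: its p-value is extremely low (reported as 0.000).

#### ACT is not statistically significant at conventional levels (1%, 5%, or 10%): its p-value is far from zero (p-value: 0.383), so we cannot reject the null hypothesis that its coefficient is zero once hsGPA is held fixed.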
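The decision rule behind a significance statement is to compare each p-value with a chosen significance level. The sketch below applies that rule to the two p-values quoted above; the 5% level (`alpha`) is an assumption made for illustration. In the notebook itself, the same numbers should be available through the `results.pvalues` attribute of the fitted statsmodels results rather than typed in by hand.

```python
# p-values as quoted in the text above (rounded as reported)
p_values = {"hsGPA": 0.000, "ACT": 0.383}

alpha = 0.05  # assumed significance level; 0.01 and 0.10 are also common choices

# A coefficient is statistically significant when its p-value is below alpha
significant = {name: p < alpha for name, p in p_values.items()}

for name, is_sig in significant.items():
    verdict = "significant" if is_sig else "not significant"
    print(f"{name}: p-value = {p_values[name]:.3f} -> {verdict} at the {alpha:.0%} level")
```

A small p-value means the estimated coefficient would be unlikely if the true coefficient were zero; a large one, such as 0.383, means the data are compatible with a zero coefficient.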