# Interacting Dummy and Regular Variables (First Example)

### Intro and objectives

### In this lab you will learn:
1. examples of simple regression models with dummy variables interacting with regular ones.
2. how to fit simple regression models in Python.


## What I hope you'll get out of this lab
* The feeling that you'll "know where to start" when you need to fit a simple regression model and include several dummy variables interacting with regular ones.
* Examples of simple regression models
* How to interpret the results obtained

In [1]:
!pip install wooldridge
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np



# Example 1. Log Hourly Wage, marital status and gender

## In this example we want to test for differences in the return to education across gender.

### Using the data in WAGE1 where n=526 individuals

In [2]:
Wages = woo.dataWoo('wage1')


In [3]:
Wages.head()

Unnamed: 0,wage,educ,exper,tenure,nonwhite,female,married,numdep,smsa,northcen,...,trcommpu,trade,services,profserv,profocc,clerocc,servocc,lwage,expersq,tenursq
0,3.1,11,2,0,0,1,0,2,1,0,...,0,0,0,0,0,0,0,1.131402,4,0
1,3.24,12,22,2,0,1,1,3,1,0,...,0,0,1,0,0,0,1,1.175573,484,4
2,3.0,11,2,0,0,0,0,2,0,0,...,0,1,0,0,0,0,0,1.098612,4,0
3,6.0,8,44,28,0,0,1,0,1,0,...,0,0,0,0,0,1,0,1.791759,1936,784
4,5.3,12,7,2,0,0,1,1,0,0,...,0,0,0,0,0,0,0,1.667707,49,4


In [4]:
Wages.describe()

Unnamed: 0,wage,educ,exper,tenure,nonwhite,female,married,numdep,smsa,northcen,...,trcommpu,trade,services,profserv,profocc,clerocc,servocc,lwage,expersq,tenursq
count,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0,...,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0,526.0
mean,5.896103,12.562738,17.01711,5.104563,0.102662,0.479087,0.608365,1.043726,0.722433,0.250951,...,0.043726,0.287072,0.10076,0.258555,0.36692,0.1673,0.140684,1.623268,473.435361,78.15019
std,3.693086,2.769022,13.57216,7.224462,0.303805,0.500038,0.48858,1.261891,0.448225,0.433973,...,0.20468,0.452826,0.301298,0.438257,0.482423,0.373599,0.348027,0.531538,616.044772,199.434664
min,0.53,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.634878,1.0,0.0
25%,3.33,12.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.202972,25.0,0.0
50%,4.65,12.0,13.5,2.0,0.0,0.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.536867,182.5,4.0
75%,6.88,14.0,26.0,7.0,0.0,1.0,1.0,2.0,1.0,0.75,...,0.0,1.0,0.0,1.0,1.0,0.0,0.0,1.928619,676.0,49.0
max,24.98,18.0,51.0,44.0,1.0,1.0,1.0,6.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,3.218076,2601.0,1936.0


In [5]:
type(Wages)

pandas.core.frame.DataFrame

In [6]:
Wages.columns

Index(['wage', 'educ', 'exper', 'tenure', 'nonwhite', 'female', 'married',
       'numdep', 'smsa', 'northcen', 'south', 'west', 'construc', 'ndurman',
       'trcommpu', 'trade', 'services', 'profserv', 'profocc', 'clerocc',
       'servocc', 'lwage', 'expersq', 'tenursq'],
      dtype='object')

## Let's interact female with educ:
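
Written out, the specification estimated in the next cell is as follows. The symbols $\delta_0$ and $\delta_1$ match the hypotheses tested later in this lab; the $\beta$ labels and the error term $u$ are notation assumed here for illustration.

$$\log(wage) = \beta_0 + \delta_0\,female + \beta_1\,educ + \delta_1\,female \cdot educ + \beta_2\,exper + \beta_3\,exper^2 + \beta_4\,tenure + \beta_5\,tenure^2 + u$$

Under this parameterization the return to one more year of education is $\beta_1$ for men and $\beta_1 + \delta_1$ for women, while $\delta_0$ is the female-male difference in log wage at $educ = 0$.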

In [10]:
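# We specify a linear model for log(wage), using Wages as the empirical dataset.
# Note: in the formula syntax, female*educ already expands to female + educ + female:educ,
# so listing female and educ separately is redundant but harmless.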

reg = smf.ols(formula='np.log(wage) ~female+educ+female*educ+exper+np.power(exper,2)+tenure+np.power(tenure,2)', data=Wages)

In [11]:
# We fit the model
results = reg.fit()


In [12]:
b = results.params
print(f'b: \n{b}\n')

b: 
Intercept              0.388806
female                -0.226789
educ                   0.082369
female:educ           -0.005565
exper                  0.029337
np.power(exper, 2)    -0.000580
tenure                 0.031897
np.power(tenure, 2)   -0.000590
dtype: float64
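
As a rough aid to reading these estimates, the sketch below computes the implied return to education for each gender and the female-male log-wage gap at one education level. The coefficients are copied by hand from the printed output; the helper `female_log_wage_gap` and the choice of the sample mean of `educ` as the evaluation point are illustrative assumptions, not part of the original lab.

```python
import math

# Coefficients copied from the printed estimates above (rounded as shown there).
b_female = -0.226789
b_educ = 0.082369
b_female_educ = -0.005565

# Return to one more year of education (change in log wage), by gender.
return_men = b_educ
return_women = b_educ + b_female_educ


def female_log_wage_gap(educ_years):
    """Estimated female-male difference in log(wage) at a given education
    level, holding exper and tenure fixed (hypothetical helper)."""
    return b_female + b_female_educ * educ_years


# Illustrative evaluation point: the sample mean of educ from Wages.describe().
mean_educ = 12.562738
gap_log = female_log_wage_gap(mean_educ)
# Exact percentage difference implied by a gap measured in logs.
gap_pct = 100 * (math.exp(gap_log) - 1)

print(f"return to educ, men:   {return_men:.4f}")
print(f"return to educ, women: {return_women:.4f}")
print(f"log-wage gap at educ={mean_educ:.2f}: {gap_log:.4f} ({gap_pct:.1f}%)")
```

These are point estimates only; whether the difference between the two slopes is statistically meaningful is a separate question that the tests on $\delta_1$ address.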



In [13]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:           np.log(wage)   R-squared:                       0.441
Model:                            OLS   Adj. R-squared:                  0.433
Method:                 Least Squares   F-statistic:                     58.37
Date:                Sun, 20 Nov 2022   Prob (F-statistic):           1.67e-61
Time:                        20:30:43   Log-Likelihood:                -260.49
No. Observations:                 526   AIC:                             537.0
Df Residuals:                     518   BIC:                             571.1
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
Intercept               0.3888    

## How do we interpret the equation?

#### The R-squared is relatively large (R-squared = 0.441).
#### The F-statistic is large (58.37) with a p-value close to zero. Therefore the regressors are jointly statistically significant.






# Regarding the hypothesis:

$H_0:\delta_1=0$

#### The t-statistic for $\delta_1$, the coefficient on $female*educ$, is -0.426 with a p-value of 0.670, so it is not statistically significant at conventional levels. The 95% confidence interval [-0.031, 0.020] contains zero, so we cannot reject $H_0$: there is no evidence that the return to education differs by gender.
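
The reported numbers can be cross-checked with a little arithmetic. The sketch below backs out the standard error from the coefficient and t-statistic and rebuilds the confidence interval. The critical value 1.9646 is an assumed approximation to the two-sided 5% t critical value with 518 residual degrees of freedom, and the inputs are rounded, so the result is only approximate.

```python
# Values reported in the regression summary for the female:educ coefficient.
coef = -0.005565
t_stat = -0.426

# Back out the standard error from coef / t (approximate: both inputs are rounded).
std_err = coef / t_stat

# Assumption: two-sided 5% critical value of the t distribution with 518
# residual degrees of freedom, approximately 1.9646 (close to the normal 1.96).
t_crit = 1.9646

ci_lower = coef - t_crit * std_err
ci_upper = coef + t_crit * std_err

print(f"std err ~ {std_err:.4f}")
print(f"95% CI ~ [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The rebuilt interval agrees with the one in the summary to three decimals and straddles zero, which is the same information the p-value of 0.670 conveys.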

# Regarding the joint hypothesis:

$H_0:\delta_0=0, \delta_1=0$

In [14]:
# automated F test:
hypotheses = ['female = 0', 'female:educ = 0']
ftest = results.f_test(hypotheses)
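# np.squeeze handles both return types: older statsmodels versions return a 1x1 array, newer ones a scalar
fstat = float(np.squeeze(ftest.statistic))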
fpval = ftest.pvalue

In [15]:
print(f'F statistic: {fstat}\n')
print(f'F p-value: {fpval}\n')

F statistic: 34.32554896206902

F p-value: 1.0023440921149353e-14
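
For reference, with the nonrobust covariance used here, the statistic reported by `f_test` for exclusion restrictions like these is equivalent to the usual F statistic comparing restricted and unrestricted sums of squared residuals. The notation below ($SSR_r$, $SSR_{ur}$, $q$) is assumed for illustration; the values $q = 2$ and 518 residual degrees of freedom come from the hypotheses and the regression summary.

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}, \qquad q = 2,\quad n-k-1 = 526 - 7 - 1 = 518$$

Here the restricted model drops both `female` and `female:educ`, and under the null hypothesis the statistic follows an $F_{2,518}$ distribution.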



Based on the previous result:
The F-statistic is large (34.33) and its associated p-value is almost zero.
Therefore we reject the joint null hypothesis that female and female*educ both have zero coefficients in the log-wage equation.
Gender therefore DOES HAVE an impact on wages. The joint test alone does not show that the return to education differs by gender, however: the interaction coefficient is small (-0.0056) and individually insignificant (p-value 0.670), so this sample gives no evidence that women's return to education differs from men's.
