# Logarithmic and Quadratic Functional Forms (Example)

### Intro and objectives
#### Understand the application of logarithmic and quadratic functional forms in linear regression models

### In this lab you will learn:
1. examples of logarithmic and quadratic functional forms applied to multiple regression models.
2. how to fit advanced regression models in Python.


## What I hope you'll get out of this lab
* How to fit more advanced linear regression models
* How to interpret the results obtained

In [1]:
!pip install wooldridge
import wooldridge as woo
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np



# Example 1. Housing prices and air pollution

#### For a sample of 506 communities in the Boston area, we estimate a model relating median housing price (price) in the community to various community characteristics: nox is the amount of nitrogen oxide in the air, in parts per million; dist is a weighted distance of the community from five employment centers, in miles; rooms is the average number of rooms in houses in the community; and stratio is the average student-teacher ratio of schools in the community.



#### To study the relationship between housing prices and external factors, we postulate the following model, which is quadratic in rooms:

$ \log(price)=\beta_0+\beta_1\log(nox)+\beta_2\log(dist)+\beta_3 rooms+\beta_4 rooms^2+\beta_5 stratio+u $



### Using the data in HPRICE2, where n=506 communities

In [23]:
hprice2 = woo.dataWoo('hprice2')


In [3]:
hprice2.head()



In [24]:
hprice2.describe()

Unnamed: 0,price,crime,nox,rooms,dist,radial,proptax,stratio,lowstat,lprice,lnox,lproptax
count,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0,506.0
mean,22511.509881,3.611536,5.549783,6.284051,3.795751,9.549407,40.823715,18.459289,12.701482,9.941057,1.693091,5.931405
std,9208.856171,8.590247,1.158395,0.702594,2.106137,8.707259,16.853711,2.16582,7.238066,0.409255,0.20141,0.396367
min,5000.0,0.006,3.85,3.56,1.13,1.0,18.700001,12.6,1.73,8.517193,1.348073,5.231109
25%,16850.0,0.082,4.49,5.8825,2.1,4.0,27.9,17.4,6.9225,9.732093,1.501853,5.631212
50%,21200.0,0.2565,5.38,6.21,3.21,5.0,33.0,19.1,11.36,9.961757,1.682688,5.799093
75%,24999.0,3.677,6.24,6.62,5.1875,24.0,66.599998,20.200001,17.0575,10.126591,1.83098,6.50129
max,50001.0,88.975998,8.71,8.78,12.13,24.0,71.099998,22.0,39.07,10.819798,2.164472,6.566672


In [5]:
type(hprice2)

pandas.core.frame.DataFrame

In [26]:
# We specify a model that is linear in parameters, with logarithmic and quadratic terms
# We use hprice2 as the empirical dataset

reg = smf.ols(formula='np.log(price) ~ np.log(nox)+np.log(dist)+rooms+np.power(rooms,2)+stratio', data=hprice2)

In [27]:
# We fit the model
results = reg.fit()


In [28]:
b = results.params
print(f'b: \n{b}\n')

b: 
Intercept             13.385477
np.log(nox)           -0.901682
np.log(dist)          -0.086781
rooms                 -0.545113
np.power(rooms, 2)     0.062261
stratio               -0.047590
dtype: float64



## Based on the previous output, we have fitted the following model:

$ \widehat{\log(price)}=13.39-0.902\log(nox)-0.087\log(dist)-0.545\,rooms+0.062\,rooms^2-0.048\,stratio $
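
As a rough illustration of how the fitted equation can be used, the sketch below plugs the sample medians from the `describe()` output (nox = 5.38, dist = 3.21, rooms = 6.21, stratio = 19.1) into the coefficients printed by `results.params`. The function name `predict_log_price` is made up for this example. Exponentiating the predicted log price is only a naive approximation of the price level, which tends to understate the expected price.

```python
import math

# Coefficients copied from the printed results.params output
coef = {
    "Intercept": 13.385477,
    "log_nox": -0.901682,
    "log_dist": -0.086781,
    "rooms": -0.545113,
    "rooms_sq": 0.062261,
    "stratio": -0.047590,
}

def predict_log_price(nox, dist, rooms, stratio):
    """Fitted value of log(price) for one community (hypothetical helper)."""
    return (
        coef["Intercept"]
        + coef["log_nox"] * math.log(nox)
        + coef["log_dist"] * math.log(dist)
        + coef["rooms"] * rooms
        + coef["rooms_sq"] * rooms ** 2
        + coef["stratio"] * stratio
    )

# Sample medians taken from the describe() table
log_price_hat = predict_log_price(nox=5.38, dist=3.21, rooms=6.21, stratio=19.1)

# Naive back-transformation to dollars (ignores the retransformation bias)
naive_price = math.exp(log_price_hat)

print(f"predicted log(price): {log_price_hat:.3f}")
print(f"naive predicted price: {naive_price:,.0f}")
```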


## Let's compute t-tests of individual statistical significance and the F-test of overall statistical significance

In [30]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:          np.log(price)   R-squared:                       0.603
Model:                            OLS   Adj. R-squared:                  0.599
Method:                 Least Squares   F-statistic:                     151.8
Date:                Fri, 18 Nov 2022   Prob (F-statistic):           7.89e-98
Time:                        16:49:28   Log-Likelihood:                -31.806
No. Observations:                 506   AIC:                             75.61
Df Residuals:                     500   BIC:                             101.0
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             13.3855      0

### The F-statistic of overall significance (151.8) and its very low p-value (7.89e-98) mean that the explanatory variables are jointly significant.

### According to the t-statistics and related p-values, all explanatory variables are individually significant.
### The R-squared is moderately large (0.603), especially for an econometric dataset.
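
The overall F-statistic and the adjusted R-squared can both be recovered from the R-squared, the sample size n, and the number of regressors k. The sketch below applies the standard textbook formulas to the values in the summary (R-squared = 0.603, n = 506, k = 5). Because the reported R-squared is rounded to three decimals, the result only approximately matches the reported 151.8. The helper names are illustrative.

```python
def overall_f_stat(r2, n, k):
    """F-statistic for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def adjusted_r2(r2, n, k):
    """R-squared penalized for the number of regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Values read from the regression summary
r2, n, k = 0.603, 506, 5

f_stat = overall_f_stat(r2, n, k)
adj_r2 = adjusted_r2(r2, n, k)

print(f"F-statistic: {f_stat:.1f}")
print(f"Adj. R-squared: {adj_r2:.3f}")
```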

## How do we interpret the equation?

#### This estimated equation implies that nox, distance to employment centers, and the student-teacher ratio all have negative effects on house prices, holding the other factors fixed.

#### A 1% increase in nox decreases house prices by about 0.902% (the coefficient on log(nox) is an elasticity).

#### A one-unit increase in the student-teacher ratio (stratio) decreases house prices by about 4.8%.
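
These percentage interpretations rely on the approximation that 100 times a change in log(price) is roughly the percentage change in price. A minimal sketch of the exact calculation, using the coefficients from the printed `results.params` output, is given below. The function names are hypothetical and only serve this example.

```python
import math

# Coefficients copied from the printed results.params output
b_log_nox = -0.901682
b_stratio = -0.047590

def exact_pct_change_log_log(elasticity, pct_increase):
    """Exact % change in y when x rises by pct_increase % (log-log term)."""
    return 100 * ((1 + pct_increase / 100) ** elasticity - 1)

def exact_pct_change_log_level(coef, delta_x=1.0):
    """Exact % change in y when x rises by delta_x units (log-level term)."""
    return 100 * (math.exp(coef * delta_x) - 1)

nox_effect = exact_pct_change_log_log(b_log_nox, pct_increase=1.0)
stratio_effect = exact_pct_change_log_level(b_stratio, delta_x=1.0)

print(f"1% more nox: {nox_effect:.3f}% change in price")
print(f"+1 stratio:  {stratio_effect:.3f}% change in price")
```

For changes this small, the exact figures stay close to the approximate ones read directly off the coefficients.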

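#### Because rooms enters as a quadratic, its effect depends on the number of rooms: the approximate percentage change in price from one more room is 100*(-0.545+2(0.062)*rooms) = -54.5+12.4*rooms.
#### Starting at five rooms, one more room increases price by about -54.5+12.4(5)=7.5%.
#### Starting at six rooms, one more room increases price by about -54.5+12.4(6)=19.9%. This is a very strong increasing effect.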
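
With a quadratic term, the approximate effect of one more room is the derivative of the fitted equation with respect to rooms, that is, the rooms coefficient plus twice the rooms-squared coefficient times rooms. The sketch below evaluates it with the rounded coefficients (-0.545 and 0.062) and also reports the turning point where the effect changes sign. The function names are illustrative and not part of any library.

```python
# Rounded coefficients from the fitted equation
b_rooms = -0.545
b_rooms_sq = 0.062

def marginal_effect_pct(rooms):
    """Approximate % change in price from one more room, starting at `rooms`."""
    return 100 * (b_rooms + 2 * b_rooms_sq * rooms)

def turning_point():
    """Number of rooms at which the marginal effect switches from negative to positive."""
    return -b_rooms / (2 * b_rooms_sq)

for r in (5, 6):
    print(f"starting at {r} rooms: {marginal_effect_pct(r):.1f}%")

print(f"turning point: {turning_point():.2f} rooms")
```

Below the turning point the estimated effect of an extra room is negative. The `describe()` output shows a 25th percentile of about 5.88 rooms, so most communities in the sample lie above the turning point.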

# Feel free to improve the model by adding additional regressors and other functional forms