# 18.4 - Interpreting Estimated Coefficients

# Question 1

Suppose that we would like to know how much families in the US are spending on recreation annually. We've estimated the following model:

$$\text{expenditure} = 873 + 0.0012\,\text{annual\_income} + 0.00002\,\text{annual\_income}^2 - 223.57\,\text{have\_kids}$$

Here expenditure is the annual spending on recreation in US dollars, annual_income is the annual income in US dollars, and have_kids is a dummy variable indicating families with children. Interpret the estimated coefficients. What additional statistics should be given to make sure that your interpretations make sense statistically? Write up your answer.

For every \$1 increase in a family's income, expenditure changes by the derivative of the model with respect to annual_income, which is 0.0012 + 0.00004\*annual_income. So if a family's income rises from \$50,000 to \$50,001, the family will spend about 0.0012 + 0.00004\*50,000 = \$2.0012 more on recreation. An extra \$2 of recreation spending per extra dollar of income is implausible, so the model probably needs some tweaking.
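
The derivative and the exact one-dollar change can be checked numerically. This is a small sketch that hard-codes the two income coefficients from the estimated model; the names `B_INCOME`, `B_INCOME_SQ`, `marginal_effect` and `exact_change` are illustrative, not part of the original analysis.

```python
# Income coefficients copied from the estimated recreation-spending model.
B_INCOME = 0.0012
B_INCOME_SQ = 0.00002


def marginal_effect(income: float) -> float:
    """Derivative of expenditure w.r.t. annual_income: 0.0012 + 2 * 0.00002 * income."""
    return B_INCOME + 2 * B_INCOME_SQ * income


def exact_change(income: float, delta: float = 1.0) -> float:
    """Exact change in predicted expenditure when income rises by `delta` dollars."""
    before = B_INCOME * income + B_INCOME_SQ * income ** 2
    after = B_INCOME * (income + delta) + B_INCOME_SQ * (income + delta) ** 2
    return after - before


print(round(marginal_effect(50_000), 4))  # 2.0012 (derivative at $50,000)
print(round(exact_change(50_000), 4))     # 2.0012 (exact effect of $50,000 -> $50,001)
```

For a one-dollar step the derivative and the exact difference agree to four decimal places, so using the derivative as "the effect of one more dollar" is safe here.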

The intercept implies that a family with \$0 of income and no kids spends \$873 a year on recreation, but it is unlikely that the model is accurate at such a low level of annual income.

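Holding income constant, families with kids spend, on average, \$223.57 less on recreation per year than families without kids.

None of these interpretations is meaningful unless the coefficients are statistically distinguishable from zero. To check that, we would also need each coefficient's standard error (or, equivalently, its t-statistic, p-value, or confidence interval), and ideally measures of overall fit such as R-squared and the F-statistic.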
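
Writing the whole model as a function makes the intercept and dummy interpretations concrete. This is a sketch built only from the four published coefficients; `predicted_spending` is an illustrative name.

```python
def predicted_spending(annual_income: float, have_kids: int) -> float:
    """Predicted annual recreation spending (US$) from the estimated model."""
    return (873
            + 0.0012 * annual_income
            + 0.00002 * annual_income ** 2
            - 223.57 * have_kids)


# Intercept: zero income, no kids.
print(predicted_spending(0, 0))  # 873.0

# The have_kids dummy shifts the prediction by the same amount at every income level.
for income in (20_000, 50_000, 100_000):
    gap = predicted_spending(income, 0) - predicted_spending(income, 1)
    print(income, round(gap, 2))  # gap is 223.57 each time

# Predicted level for a childless family earning $50,000.
print(round(predicted_spending(50_000, 0), 2))  # 50933.0
```

The last line shows the model predicting roughly \$50,933 of recreation spending for a family that earns \$50,000, another sign that the quadratic income term is too large for this model to be taken at face value.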

# Question 2

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
import statsmodels.api as sm
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

# Connection settings for the course PostgreSQL database
postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'weatherinszeged'

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))

# Load the Szeged weather table, then release the connection
weather = pd.read_sql_query('select * from weatherinszeged', con=engine)

engine.dispose()
```

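```python
# Target: how much warmer or colder it feels than the measured temperature
y = weather['apparenttemperature'] - weather['temperature']
X = weather[['humidity', 'windspeed']]

# No constant is added, so the model is fit through the origin
# (statsmodels therefore reports an uncentered R-squared)
results = sm.OLS(y, X).fit()
results.summary()
```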

|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | y | R-squared (uncentered): | 0.425 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.425 |
| Method: | Least Squares | F-statistic: | 35700.0 |
| Date: | Tue, 19 Nov 2019 | Prob (F-statistic): | 0.0 |
| Time: | 23:45:00 | Log-Likelihood: | -176750.0 |
| No. Observations: | 96453 | AIC: | 353500.0 |
| Df Residuals: | 96451 | BIC: | 353500.0 |
| Df Model: | 2 |   |   |
| Covariance Type: | nonrobust |   |   |

|   | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| humidity | -0.4873 | 0.010 | -47.338 | 0.000 | -0.507 | -0.467 |
| windspeed | -0.0772 | 0.001 | -126.510 | 0.000 | -0.078 | -0.076 |

|   |   |   |   |
|---|---|---|---|
| Omnibus: | 9577.682 | Durbin-Watson: | 0.228 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 12669.324 |
| Skew: | -0.867 | Prob(JB): | 0.0 |
| Kurtosis: | 3.378 | Cond. No. | 27.2 |


Both variables are statistically significant (p < 0.001).

The sign of the windspeed coefficient makes sense: when it is windier, it feels colder relative to the actual temperature. I would expect humidity to have the opposite sign, though, since more humid days seem to feel warmer, at least in the summer.

Humidity is recorded as a fraction between 0 and 1, so a one-percentage-point (0.01) increase in humidity makes it feel about 0.0049 degrees Celsius colder, relative to the actual temperature.

The windspeed coefficient indicates that for every 1 km/h increase in wind speed, the apparent temperature falls by about 0.077 degrees, again relative to the actual temperature.
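
The unit conversions behind these two interpretations are easy to get wrong, so here is a short sketch that applies them to the coefficients from the first model. It assumes, as the summary statistics suggest, that humidity is stored as a fraction and windspeed in km/h; the variable names are illustrative.

```python
# Coefficients from the first model (no interaction term, no intercept).
B_HUMIDITY = -0.4873
B_WINDSPEED = -0.0772

# humidity is stored as a fraction, so one percentage point is a change of 0.01
humidity_effect = B_HUMIDITY * 0.01
# windspeed is in km/h, so a one-unit change is 1 km/h
wind_effect = B_WINDSPEED * 1

print(round(humidity_effect, 4))  # -0.0049
print(round(wind_effect, 4))      # -0.0772

# The model is additive, so effects of simultaneous changes simply add up:
# humidity +10 points and wind +10 km/h together.
combined = 10 * humidity_effect + 10 * wind_effect
print(round(combined, 2))  # -0.82
```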

```python
weather.describe()
```

|   | temperature | apparenttemperature | humidity | windspeed | windbearing | visibility | loudcover | pressure | humidity\*windspeed |
|---|---|---|---|---|---|---|---|---|---|
| count | 96453.0 | 96453.0 | 96453.0 | 96453.0 | 96453.0 | 96453.0 | 96453.0 | 96453.0 | 96453.0 |
| mean | 11.932678 | 10.855029 | 0.734899 | 10.81064 | 187.509232 | 10.347325 | 0.0 | 1003.235956 | 7.640729 |
| std | 9.551546 | 10.696847 | 0.195473 | 6.913571 | 107.383428 | 4.192123 | 0.0 | 116.969906 | 5.034842 |
| min | -21.822222 | -27.716667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 4.688889 | 2.311111 | 0.6 | 5.8282 | 116.0 | 8.3398 | 0.0 | 1011.9 | 3.820852 |
| 50% | 12.0 | 12.0 | 0.78 | 9.9659 | 180.0 | 10.0464 | 0.0 | 1016.45 | 6.701464 |
| 75% | 18.838889 | 18.838889 | 0.89 | 14.1358 | 290.0 | 14.812 | 0.0 | 1021.09 | 10.21384 |
| max | 39.905556 | 39.344444 | 1.0 | 63.8526 | 359.0 | 16.1 | 0.0 | 1046.38 | 43.346835 |


Including the interaction of humidity and windspeed:

```python
# Add the humidity x windspeed interaction as a new column
weather['humidity*windspeed'] = weather['humidity'] * weather['windspeed']

y = weather['apparenttemperature'] - weather['temperature']
X = weather[['humidity', 'windspeed', 'humidity*windspeed']]

# Still no constant term, as in the first model
results = sm.OLS(y, X).fit()
results.summary()
```

|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | y | R-squared (uncentered): | 0.533 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.533 |
| Method: | Least Squares | F-statistic: | 36770.0 |
| Date: | Tue, 19 Nov 2019 | Prob (F-statistic): | 0.0 |
| Time: | 23:46:32 | Log-Likelihood: | -166700.0 |
| No. Observations: | 96453 | AIC: | 333400.0 |
| Df Residuals: | 96450 | BIC: | 333400.0 |
| Df Model: | 3 |   |   |
| Covariance Type: | nonrobust |   |   |

|   | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| humidity | 0.2820 | 0.011 | 26.590 | 0.000 | 0.261 | 0.303 |
| windspeed | 0.0958 | 0.001 | 74.776 | 0.000 | 0.093 | 0.098 |
| humidity\*windspeed | -0.3038 | 0.002 | -149.513 | 0.000 | -0.308 | -0.300 |

|   |   |   |   |
|---|---|---|---|
| Omnibus: | 4919.327 | Durbin-Watson: | 0.265 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 9471.445 |
| Skew: | -0.381 | Prob(JB): | 0.0 |
| Kurtosis: | 4.333 | Cond. No. | 38.0 |


All coefficients, including the interaction term, are statistically significant (p < 0.001).

The signs of the humidity and windspeed coefficients both changed to positive, but the interaction coefficient is negative. For most values in our data, the partial derivative with respect to either variable is still negative.
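
The claim that the derivatives are negative "for most of the values in our data" can be made concrete by solving for the value of the other variable at which each partial derivative changes sign, then comparing it with the 25th percentiles reported by `weather.describe()` above. A minimal sketch, with the coefficients typed in by hand from the summary table:

```python
# Coefficients from the interaction model (humidity, windspeed, humidity*windspeed).
b_h, b_w, b_hw = 0.2820, 0.0958, -0.3038

# d(y)/d(windspeed) = b_w + b_hw * humidity: negative once humidity exceeds this cutoff
humidity_cutoff = -b_w / b_hw
# d(y)/d(humidity) = b_h + b_hw * windspeed: negative once windspeed exceeds this cutoff
windspeed_cutoff = -b_h / b_hw

# 25th percentiles taken from the weather.describe() output
q25 = {"humidity": 0.6, "windspeed": 5.8282}

print(round(humidity_cutoff, 3), humidity_cutoff < q25["humidity"])    # 0.315 True
print(round(windspeed_cutoff, 3), windspeed_cutoff < q25["windspeed"])  # 0.928 True
```

Both cutoffs lie below the 25th percentile of the corresponding variable, so each partial derivative is negative for at least three quarters of the observations.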

The change in the dependent variable due to a one-unit change in windspeed is 0.0958 - 0.3038 \* humidity. The average humidity in our dataset is 0.735, so the average change is -0.127 degrees per km/h.

The change in the dependent variable due to a one-unit change in humidity is 0.282 - 0.3038 \* windspeed. The average windspeed is 10.8 km/h, so the average change is -3.00. Because humidity is on a scale of 0 to 1, a one-percentage-point (0.01) increase in humidity is associated with a change of about -0.03 in the dependent variable.
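
The two average effects can be reproduced from the coefficients and the sample means. Because each marginal effect is linear in the other variable, its average over all rows equals its value at the other variable's mean, which is what this sketch exploits (coefficients and means are typed in from the tables above):

```python
b_h, b_w, b_hw = 0.2820, 0.0958, -0.3038
mean_humidity, mean_windspeed = 0.734899, 10.81064  # from weather.describe()

# Average marginal effects, evaluated at the mean of the other variable.
avg_effect_wind = b_w + b_hw * mean_humidity
avg_effect_humidity = b_h + b_hw * mean_windspeed

print(round(avg_effect_wind, 3))             # -0.127 per 1 km/h of wind
print(round(avg_effect_humidity, 2))         # -3.0 per full-scale (0 -> 1) change in humidity
print(round(avg_effect_humidity * 0.01, 3))  # -0.03 per percentage point of humidity
```

With the fitted `results` and `weather` objects in memory, the per-row version is `results.params['windspeed'] + results.params['humidity*windspeed'] * weather['humidity']`; its `.mean()` should match the first figure up to rounding of the coefficients.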

# Question 3

```python
# Reuse the connection settings from above, pointing at the house-prices database
postgres_db = 'houseprices'

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))

house = pd.read_sql_query('select * from houseprices', con=engine)

engine.dispose()

y = house['saleprice']

X = house[['grlivarea', 'totalbsmtsf', 'fullbath', 'halfbath', 'overallqual', 'overallcond', 'yearbuilt', 'garagearea']]
X = sm.add_constant(X)  # unlike Question 2, this model includes an intercept

results = sm.OLS(y, X).fit()

results.summary()
```



|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | saleprice | R-squared: | 0.774 |
| Model: | OLS | Adj. R-squared: | 0.773 |
| Method: | Least Squares | F-statistic: | 621.3 |
| Date: | Wed, 20 Nov 2019 | Prob (F-statistic): | 0.0 |
| Time: | 00:03:02 | Log-Likelihood: | -17458.0 |
| No. Observations: | 1460 | AIC: | 34930.0 |
| Df Residuals: | 1451 | BIC: | 34980.0 |
| Df Model: | 8 |   |   |
| Covariance Type: | nonrobust |   |   |

|   | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.078e+06 | 1.04e+05 | -10.322 | 0.000 | -1.28e+06 | -8.73e+05 |
| grlivarea | 56.7268 | 3.607 | 15.727 | 0.000 | 49.651 | 63.802 |
| totalbsmtsf | 27.7002 | 3.157 | 8.773 | 0.000 | 21.507 | 33.894 |
| fullbath | -4205.0317 | 2830.361 | -1.486 | 0.138 | -9757.068 | 1347.005 |
| halfbath | -2034.5941 | 2580.532 | -0.788 | 0.431 | -7096.565 | 3027.377 |
| overallqual | 1.947e+04 | 1161.885 | 16.761 | 0.000 | 1.72e+04 | 2.18e+04 |
| overallcond | 6557.8129 | 986.728 | 6.646 | 0.000 | 4622.247 | 8493.379 |
| yearbuilt | 494.5793 | 53.807 | 9.192 | 0.000 | 389.031 | 600.128 |
| garagearea | 44.6874 | 6.115 | 7.308 | 0.000 | 32.692 | 56.683 |

|   |   |   |   |
|---|---|---|---|
| Omnibus: | 569.028 | Durbin-Watson: | 1.979 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 83414.289 |
| Skew: | -0.722 | Prob(JB): | 0.0 |
| Kurtosis: | 40.002 | Cond. No. | 293000.0 |


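All of our variables except fullbath and halfbath are statistically significant at the 5% level (their p-values are 0.138 and 0.431), so we drop those two and re-estimate the model.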
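
The variable selection in this step can be written as a filter on the reported p-values. This sketch types in the rounded p-values from the table above and assumes the conventional 5% significance level; `ALPHA`, `pvalues`, `insignificant` and `kept` are illustrative names.

```python
# p-values as reported in the first house-price summary (rounded to 3 decimals).
pvalues = {
    "const": 0.000, "grlivarea": 0.000, "totalbsmtsf": 0.000,
    "fullbath": 0.138, "halfbath": 0.431, "overallqual": 0.000,
    "overallcond": 0.000, "yearbuilt": 0.000, "garagearea": 0.000,
}
ALPHA = 0.05  # assumed significance level

insignificant = [name for name, p in pvalues.items() if p >= ALPHA]
kept = [name for name, p in pvalues.items() if p < ALPHA and name != "const"]

print(insignificant)  # ['fullbath', 'halfbath']
print(kept)           # the six regressors used in the reduced model
```

With a fitted statsmodels result, `results.pvalues` is a pandas Series indexed by the same names, so `results.pvalues[results.pvalues >= 0.05].index` yields the same list directly.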

```python
y = house['saleprice']

# Same model without the two insignificant bathroom variables
X = house[['grlivarea', 'totalbsmtsf', 'overallqual', 'overallcond', 'yearbuilt', 'garagearea']]
X = sm.add_constant(X)

results = sm.OLS(y, X).fit()

results.summary()
```

|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | saleprice | R-squared: | 0.774 |
| Model: | OLS | Adj. R-squared: | 0.773 |
| Method: | Least Squares | F-statistic: | 827.9 |
| Date: | Wed, 20 Nov 2019 | Prob (F-statistic): | 0.0 |
| Time: | 00:04:03 | Log-Likelihood: | -17459.0 |
| No. Observations: | 1460 | AIC: | 34930.0 |
| Df Residuals: | 1453 | BIC: | 34970.0 |
| Df Model: | 6 |   |   |
| Covariance Type: | nonrobust |   |   |

|   | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.007e+06 | 9.14e+04 | -11.019 | 0.000 | -1.19e+06 | -8.28e+05 |
| grlivarea | 53.1172 | 2.544 | 20.879 | 0.000 | 48.127 | 58.108 |
| totalbsmtsf | 29.2147 | 2.841 | 10.285 | 0.000 | 23.643 | 34.787 |
| overallqual | 1.933e+04 | 1158.020 | 16.692 | 0.000 | 1.71e+04 | 2.16e+04 |
| overallcond | 6597.9506 | 985.828 | 6.693 | 0.000 | 4664.153 | 8531.749 |
| yearbuilt | 456.6546 | 46.745 | 9.769 | 0.000 | 364.960 | 548.349 |
| garagearea | 45.2792 | 6.097 | 7.426 | 0.000 | 33.319 | 57.240 |

|   |   |   |   |
|---|---|---|---|
| Omnibus: | 544.391 | Durbin-Watson: | 1.974 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 78000.53 |
| Skew: | -0.637 | Prob(JB): | 0.0 |
| Kurtosis: | 38.785 | Cond. No. | 256000.0 |


After excluding the statistically insignificant variables, the remaining point estimates are approximately the same, and R-squared is unchanged at 0.774.
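
To put "approximately the same" in numbers, this sketch computes the percentage change in each shared coefficient between the full and reduced models, using the values typed in from the two coefficient tables above.

```python
# Coefficients (excluding the intercept) from the full and reduced house-price models.
full = {"grlivarea": 56.7268, "totalbsmtsf": 27.7002, "overallqual": 1.947e4,
        "overallcond": 6557.8129, "yearbuilt": 494.5793, "garagearea": 44.6874}
reduced = {"grlivarea": 53.1172, "totalbsmtsf": 29.2147, "overallqual": 1.933e4,
           "overallcond": 6597.9506, "yearbuilt": 456.6546, "garagearea": 45.2792}

# Percentage change from the full model to the reduced model.
pct_change = {name: 100 * (reduced[name] - full[name]) / full[name] for name in full}

for name, pct in pct_change.items():
    print(f"{name:12s} {pct:+.1f}%")
```

Every coefficient moves by less than 8%, with the largest shifts in grlivarea and yearbuilt.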

According to our model, holding the other variables constant:

House selling price increases $53 for every square foot of above-ground living area.

House price increases $29 for every square foot added to the basement.

House price increases $19,330 for every 1-point increase in overall quality (10-point scale).

House price increases $6,598 for every 1-point increase in overall condition (10-point scale).

A house built one year later sells for $457 more.

Every extra square foot of garage space increases selling price by $45.

All these amounts sound fairly reasonable to me.
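
These per-unit effects combine additively into a predicted sale price. A minimal sketch using the rounded coefficients from the reduced model's summary table; the house described by `base` is hypothetical, so the predicted level is only illustrative. The difference between two houses that differ by one quality point, however, reproduces the overallqual coefficient exactly.

```python
# Coefficients from the reduced house-price model (as rounded in the summary table).
COEFS = {
    "const": -1.007e6, "grlivarea": 53.1172, "totalbsmtsf": 29.2147,
    "overallqual": 1.933e4, "overallcond": 6597.9506,
    "yearbuilt": 456.6546, "garagearea": 45.2792,
}


def predict_price(house: dict) -> float:
    """Linear prediction: intercept plus the sum of coefficient * feature value."""
    return COEFS["const"] + sum(COEFS[k] * v for k, v in house.items())


# Hypothetical house; these feature values are illustrative, not taken from the data.
base = {"grlivarea": 1500, "totalbsmtsf": 1000, "overallqual": 6,
        "overallcond": 5, "yearbuilt": 1975, "garagearea": 480}
better = {**base, "overallqual": 7}  # same house, one quality point higher

print(round(predict_price(base)))
print(round(predict_price(better) - predict_price(base)))  # 19330
```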