# 1. Interpretation and significance
Suppose that you would like to know how much families in the US are spending on recreation annually. Use the following estimated model:

$$\text{expenditure} = 873 + 0.0012\,\text{annual\_income} + 0.00002\,\text{annual\_income}^2 - 223.57\,\text{have\_kids}$$

Here, expenditure is the annual spending on recreation in US dollars, annual_income is the annual income in US dollars, and have_kids is a dummy variable indicating the families with children. Interpret the estimated coefficients. What additional statistics can be given in order to ensure that your interpretations make sense statistically? Write up your answer.

In [None]:
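# intercept (873) = predicted expenditure for a family with zero income and no kids
# annual_income and annual_income^2 = income enters as a quadratic, so the two coefficients must be read together:
#   a $1 increase in income changes expenditure by 0.0012 + 2 * 0.00002 * annual_income dollars,
#   so the effect of income grows as income rises instead of staying constant
# have_kids = holding income fixed, families with kids spend $223.57 less per year on recreation than families without kids
# standard errors, t-statistics, p-values and confidence intervals for each coefficient are needed
#   to check that these interpretations are statistically meaningful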
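
Because income enters the model as a quadratic, its effect depends on the income level at which it is evaluated. The sketch below evaluates the estimated equation directly, assuming the second income term is annual_income squared. The function names and the $50,000 income level are illustrative choices, not part of the exercise.

```python
# Coefficients copied from the estimated model above.
INTERCEPT = 873.0
B_INCOME = 0.0012
B_INCOME_SQ = 0.00002
B_HAVE_KIDS = -223.57


def predicted_expenditure(annual_income, have_kids):
    """Predicted annual recreation spending in US dollars."""
    return (
        INTERCEPT
        + B_INCOME * annual_income
        + B_INCOME_SQ * annual_income ** 2
        + B_HAVE_KIDS * have_kids
    )


def marginal_effect_of_income(annual_income):
    """Derivative of predicted expenditure with respect to annual_income."""
    return B_INCOME + 2 * B_INCOME_SQ * annual_income


income = 50_000  # hypothetical income level, chosen only for illustration

# Change in predicted spending per extra dollar of income at this income level.
print(marginal_effect_of_income(income))

# Gap between families with and without kids at the same income.
print(predicted_expenditure(income, 1) - predicted_expenditure(income, 0))
```

The gap between the two family types is the have_kids coefficient at every income level, since the model has no interaction between income and have_kids.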

# 2. Weather model
First, load the dataset from the weatherinszeged table from Thinkful's database.
Build a linear regression model where your target variable is the difference between the apparenttemperature and the temperature. As explanatory variables, use humidity and windspeed. Now, estimate your model using OLS. 
Are the estimated coefficients statistically significant? 
Are the signs of the estimated coefficients in line with your previous expectations? 
Interpret the estimated coefficients. What are the relations between the target and the explanatory variables?

Next, add the interaction of humidity and windspeed to the model above and estimate the model using OLS. 
Are the coefficients statistically significant? 
Did the signs of the estimated coefficients for humidity and windspeed change? Interpret the estimated coefficients.

In [6]:
%reload_ext nb_black
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import plotly
import seaborn as sns

%matplotlib inline
from scipy import stats

from sklearn import linear_model
import statsmodels.api as sm
from sklearn.model_selection import train_test_split, cross_val_score

In [2]:
postgres_user = "dsbc_student"
postgres_pw = "7*.8G9QH21"
postgres_host = "142.93.121.174"
postgres_port = "5432"
postgres_db = "weatherinszeged"
conn_str = f"postgresql://{postgres_user}:{postgres_pw}@{postgres_host}:{postgres_port}/{postgres_db}"
query = """
SELECT *
FROM weatherinszeged
"""
df = pd.read_sql_query(query, con=conn_str)
df.head()

Unnamed: 0,date,summary,preciptype,temperature,apparenttemperature,humidity,windspeed,windbearing,visibility,loudcover,pressure,dailysummary
0,2006-03-31 22:00:00+00:00,Partly Cloudy,rain,9.472222,7.388889,0.89,14.1197,251.0,15.8263,0.0,1015.13,Partly cloudy throughout the day.
1,2006-03-31 23:00:00+00:00,Partly Cloudy,rain,9.355556,7.227778,0.86,14.2646,259.0,15.8263,0.0,1015.63,Partly cloudy throughout the day.
2,2006-04-01 00:00:00+00:00,Mostly Cloudy,rain,9.377778,9.377778,0.89,3.9284,204.0,14.9569,0.0,1015.94,Partly cloudy throughout the day.
3,2006-04-01 01:00:00+00:00,Partly Cloudy,rain,8.288889,5.944444,0.83,14.1036,269.0,15.8263,0.0,1016.41,Partly cloudy throughout the day.
4,2006-04-01 02:00:00+00:00,Mostly Cloudy,rain,8.755556,6.977778,0.83,11.0446,259.0,15.8263,0.0,1016.51,Partly cloudy throughout the day.


In [5]:
# create new column for the difference between apparenttemperature and temperature
df["temp_diff"] = df.apparenttemperature - df.temperature

# feature vars are humidity and windspeed; target is temp_diff
df = df[["temp_diff", "humidity", "windspeed"]]
df.head()

Unnamed: 0,temp_diff,humidity,windspeed
0,-2.083333,0.89,14.1197
1,-2.127778,0.86,14.2646
2,0.0,0.89,3.9284
3,-2.344444,0.83,14.1036
4,-1.777778,0.83,11.0446


### OLS Model

In [7]:
# create model vars
X = df[["humidity", "windspeed"]]
y = df.temp_diff

# create train and test vars
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [8]:
# create linear regression model object
X_train_const = sm.add_constant(X_train)
model = sm.OLS(y_train, X_train_const).fit()
model.summary()

0,1,2,3
Dep. Variable:,temp_diff,R-squared:,0.287
Model:,OLS,Adj. R-squared:,0.287
Method:,Least Squares,F-statistic:,14580.0
Date:,"Tue, 08 Dec 2020",Prob (F-statistic):,0.0
Time:,15:26:25,Log-Likelihood:,-127930.0
No. Observations:,72339,AIC:,255900.0
Df Residuals:,72336,BIC:,255900.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.4381,0.024,100.037,0.000,2.390,2.486
humidity,-3.0254,0.028,-109.104,0.000,-3.080,-2.971
windspeed,-0.1193,0.001,-152.782,0.000,-0.121,-0.118

0,1,2,3
Omnibus:,3056.469,Durbin-Watson:,1.991
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3612.253
Skew:,-0.485,Prob(JB):,0.0
Kurtosis:,3.507,Cond. No.,88.5


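> The estimated coefficients are all statistically significant (p < 0.001).
> Humidity is recorded on a 0 to 1 scale, so a 1 percentage point (0.01) increase in humidity lowers the difference between apparenttemperature and temperature by about 0.03 degrees, holding windspeed constant. Moving humidity from 0 to 1 lowers it by about 3.03 degrees.
> Each 1-unit increase in windspeed lowers the difference by about 0.12 degrees, holding humidity constant.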
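
To make these interpretations concrete, the fitted equation can be evaluated by hand. This is a minimal sketch that hard-codes the coefficients from the summary table above. The function name `predict_temp_diff` is illustrative, and because `train_test_split` draws a random split, rerunning the notebook will give slightly different coefficients.

```python
# Coefficients copied from the OLS summary above (model without interaction).
CONST = 2.4381
B_HUMIDITY = -3.0254
B_WINDSPEED = -0.1193


def predict_temp_diff(humidity, windspeed):
    """Predicted apparenttemperature minus temperature, in degrees."""
    return CONST + B_HUMIDITY * humidity + B_WINDSPEED * windspeed


# First row of the data: humidity 0.89, windspeed 14.1197.
base = predict_temp_diff(0.89, 14.1197)

# Effect of a 1 percentage point (0.01) rise in humidity, windspeed held fixed.
humidity_step = predict_temp_diff(0.90, 14.1197) - base

# Effect of a 1-unit rise in windspeed, humidity held fixed.
wind_step = predict_temp_diff(0.89, 15.1197) - base

print(round(base, 3), round(humidity_step, 4), round(wind_step, 4))
```

The two step sizes equal the humidity coefficient divided by 100 and the windspeed coefficient, which is the arithmetic behind the interpretation above.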

### Interaction Term

In [10]:
# create new column for humidity and windspeed interaction term
df["int_term"] = df.humidity * df.windspeed
df.head()

Unnamed: 0,temp_diff,humidity,windspeed,int_term
0,-2.083333,0.89,14.1197,12.566533
1,-2.127778,0.86,14.2646,12.267556
2,0.0,0.89,3.9284,3.496276
3,-2.344444,0.83,14.1036,11.705988
4,-1.777778,0.83,11.0446,9.167018


In [11]:
# create model vars
X = df[["humidity", "windspeed", "int_term"]]
y = df.temp_diff

# create train and test vars
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [12]:
# create linear regression model object
X_train_const = sm.add_constant(X_train)
model = sm.OLS(y_train, X_train_const).fit()
model.summary()

0,1,2,3
Dep. Variable:,temp_diff,R-squared:,0.341
Model:,OLS,Adj. R-squared:,0.341
Method:,Least Squares,F-statistic:,12460.0
Date:,"Tue, 08 Dec 2020",Prob (F-statistic):,0.0
Time:,15:38:04,Log-Likelihood:,-125130.0
No. Observations:,72339,AIC:,250300.0
Df Residuals:,72335,BIC:,250300.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.1027,0.039,2.659,0.008,0.027,0.178
humidity,0.1534,0.050,3.092,0.002,0.056,0.251
windspeed,0.0895,0.003,31.564,0.000,0.084,0.095
int_term,-0.2959,0.004,-76.408,0.000,-0.303,-0.288

0,1,2,3
Omnibus:,3844.435,Durbin-Watson:,2.003
Prob(Omnibus):,0.0,Jarque-Bera (JB):,7474.08
Skew:,-0.392,Prob(JB):,0.0
Kurtosis:,4.366,Cond. No.,193.0
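
The significance columns in the summary can be checked by hand from each coefficient and its standard error. The sketch below uses a normal approximation, which should be close here given the 72,335 residual degrees of freedom, whereas statsmodels uses the t distribution. The helper names are illustrative, and since the table reports rounded standard errors the recomputed intervals only match to about two decimal places.

```python
import math

Z_975 = 1.96  # approximate 97.5th percentile of the standard normal


def confidence_interval(coef, std_err, z=Z_975):
    """Approximate 95% confidence interval for a coefficient."""
    return coef - z * std_err, coef + z * std_err


def two_sided_p_value(t_stat):
    """Two-sided p-value under the normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# (coef, std err, t) for two rows of the summary table above.
rows = {
    "const": (0.1027, 0.039, 2.659),
    "humidity": (0.1534, 0.050, 3.092),
}

for name, (coef, std_err, t_stat) in rows.items():
    low, high = confidence_interval(coef, std_err)
    p_value = two_sided_p_value(t_stat)
    print(name, round(low, 3), round(high, 3), round(p_value, 3))
```

A coefficient is significant at the 5% level when its p-value is below 0.05, or equivalently when its 95% interval excludes zero.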


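> The interaction term is statistically significant, as are the other coefficients (all p < 0.01).
> The signs of the humidity and windspeed coefficients both changed from negative to positive.
> Each of these two coefficients now gives the effect of its variable when the other variable equals zero. The negative interaction coefficient means that the effect of an increase in windspeed or humidity on the temperature difference becomes more negative as the other variable increases.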
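
With an interaction term, the effect of each variable is a function of the other one: the effect of humidity is 0.1534 - 0.2959 * windspeed, and the effect of windspeed is 0.0895 - 0.2959 * humidity. The sketch below hard-codes the coefficients from the summary table above and finds where each effect crosses zero, then evaluates both effects at the values in the first row of the data. The function and variable names are illustrative, and a rerun of the random train/test split would shift the numbers slightly.

```python
# Coefficients copied from the interaction-model summary above.
B_HUMIDITY = 0.1534
B_WINDSPEED = 0.0895
B_INTERACTION = -0.2959


def humidity_effect(windspeed):
    """Change in temp_diff per 1-unit rise in humidity at a given windspeed."""
    return B_HUMIDITY + B_INTERACTION * windspeed


def windspeed_effect(humidity):
    """Change in temp_diff per 1-unit rise in windspeed at a given humidity."""
    return B_WINDSPEED + B_INTERACTION * humidity


# Values of the other variable at which each marginal effect crosses zero.
windspeed_crossover = -B_HUMIDITY / B_INTERACTION
humidity_crossover = -B_WINDSPEED / B_INTERACTION
print(round(windspeed_crossover, 3), round(humidity_crossover, 3))

# Both effects at the first row of the data (humidity 0.89, windspeed 14.1197).
print(round(humidity_effect(14.1197), 3), round(windspeed_effect(0.89), 3))
```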
---