### Interpreting Coefficients

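It is important not only to be able to fit complex linear models, but also to know which of their coefficients can be interpreted.

In this notebook, you will fit a few different models and, where possible, use the quizzes below to match the appropriate interpretation to each coefficient.

In some cases, the coefficients of your linear regression model would not be kept in practice because they lack statistical significance, but that is not the focus of this notebook. **This notebook is strictly about making sure you are comfortable interpreting coefficients when they are interpretable at all.**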

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Load the house price data and preview the first rows
df = pd.read_csv('./house_prices.csv')
df.head()


,house_id,neighborhood,area,bedrooms,bathrooms,style,price
0,1112,B,1188,3,2,ranch,598291
1,491,B,3512,5,3,victorian,1744259
2,5952,B,1134,3,2,ranch,571669
3,3525,A,1940,4,2,ranch,493675
4,5108,B,2208,6,4,victorian,1101539


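We will fit a number of different models to this dataset throughout the notebook. Each model has a quiz question that asks you to match the interpretations of the model coefficients to the corresponding values. If a coefficient has no "nice" interpretation, that is also one of the options!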

### Model 1

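`1.` For the first model, fit a model to predict `price` using the `neighborhood`, `style`, and `area` of a home. Use the output to match the correct values to the corresponding interpretations in quiz 1 below. Don't forget to add an intercept! You will also need to build dummy variables, and remember to drop one column from each set of dummies when you fit the linear model. It may be easiest to connect your interpretations to the values in the first quiz if you use neighborhood **C** and home style **lodge** as the baselines.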
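
The reason for dropping one dummy column can be seen on a toy example: when an intercept is present, the full set of dummy columns sums to the intercept column, so the design matrix loses rank. The sketch below uses a small made-up `toy` frame (not the house data) and `numpy.linalg.matrix_rank` to illustrate this; the variable names are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the idea.
toy = pd.DataFrame({'neighborhood': ['A', 'B', 'C', 'B', 'A', 'C']})
dummies = pd.get_dummies(toy['neighborhood'], dtype=int)

intercept = np.ones(len(toy))

# Intercept plus ALL three dummies: A + B + C equals the intercept column,
# so the four columns are linearly dependent.
full = np.column_stack([intercept, dummies[['A', 'B', 'C']].to_numpy()])

# Intercept plus two dummies: C becomes the baseline category.
reduced = np.column_stack([intercept, dummies[['A', 'B']].to_numpy()])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)

print(full.shape[1], rank_full)        # number of columns vs. rank
print(reduced.shape[1], rank_reduced)  # number of columns vs. rank
```

With the baseline column dropped, the number of columns matches the rank, and each remaining dummy coefficient is read as a difference from the baseline category.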

### Model 2

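`2.` Now let's try a second model for predicting price. This time, use `area` and `area squared` to predict price, along with the `style` of the home. You will again need to use dummy variables and add an intercept to the model. Use the results of your model to answer quiz questions 2 and 3.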

In [2]:
pd.get_dummies(df['style'])

,lodge,ranch,victorian
0,0,1,0
1,0,0,1
2,0,1,0
3,0,1,0
4,0,0,1
5,1,0,0
6,0,0,1
7,0,0,1
8,0,1,0
9,0,0,1


In [3]:
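# Add an intercept column and 0/1 dummy columns for both categorical variables.
# dtype=int keeps the dummies numeric so they can be passed to sm.OLS
# (recent pandas versions return booleans by default).
df['intercept'] = 1
df[['A', 'B', 'C']] = pd.get_dummies(df['neighborhood'], dtype=int)
df[['lodge', 'ranch', 'victorian']] = pd.get_dummies(df['style'], dtype=int)
df.head(6)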

,house_id,neighborhood,area,bedrooms,bathrooms,style,price,intercept,A,B,C,lodge,ranch,victorian
0,1112,B,1188,3,2,ranch,598291,1,0,1,0,0,1,0
1,491,B,3512,5,3,victorian,1744259,1,0,1,0,0,0,1
2,5952,B,1134,3,2,ranch,571669,1,0,1,0,0,1,0
3,3525,A,1940,4,2,ranch,493675,1,1,0,0,0,1,0
4,5108,B,2208,6,4,victorian,1101539,1,0,1,0,0,0,1
5,7507,C,1785,4,2,lodge,455235,1,0,0,1,1,0,0


In [5]:
# Model 1: neighborhood C and style lodge are the baselines (their dummy columns are left out)
lm = sm.OLS(df['price'], df[['intercept', 'A', 'B', 'ranch', 'victorian', 'area']])
results = lm.fit()
results.summary()

Dep. Variable:,price,R-squared:,0.919
Model:,OLS,Adj. R-squared:,0.919
Method:,Least Squares,F-statistic:,13720.0
Date:,"Fri, 03 Aug 2018",Prob (F-statistic):,0.0
Time:,05:43:29,Log-Likelihood:,-80348.0
No. Observations:,6028,AIC:,160700.0
Df Residuals:,6022,BIC:,160700.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,-1.983e+05,5540.744,-35.791,0.000,-2.09e+05,-1.87e+05
A,-194.2464,4965.459,-0.039,0.969,-9928.324,9539.832
B,5.243e+05,4687.484,111.844,0.000,5.15e+05,5.33e+05
ranch,-1974.7032,5757.527,-0.343,0.732,-1.33e+04,9312.111
victorian,-6262.7365,6893.293,-0.909,0.364,-1.98e+04,7250.586
area,348.7375,2.205,158.177,0.000,344.415,353.060

Omnibus:,114.369,Durbin-Watson:,2.002
Prob(Omnibus):,0.0,Jarque-Bera (JB):,139.082
Skew:,0.271,Prob(JB):,6.29e-31
Kurtosis:,3.509,Cond. No.,11200.0
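
As a rough illustration of how this output can be read, the sketch below copies the rounded coefficients and p-values from the summary table into plain dictionaries and rebuilds a prediction by hand. The helper `predict_price` and the 0.05 significance cutoff are assumptions made for this example; they are not part of the notebook.

```python
# Rounded coefficients and p-values copied from the Model 1 summary table.
coefs = {
    'intercept': -1.983e+05,
    'A': -194.2464,
    'B': 5.243e+05,
    'ranch': -1974.7032,
    'victorian': -6262.7365,
    'area': 348.7375,
}
p_values = {
    'intercept': 0.000, 'A': 0.969, 'B': 0.000,
    'ranch': 0.732, 'victorian': 0.364, 'area': 0.000,
}


def predict_price(area, neighborhood='C', style='lodge'):
    """Hypothetical helper: rebuild a fitted value from the coefficients.

    The baseline categories (neighborhood C, style lodge) add nothing
    beyond the intercept because their dummy columns were dropped.
    """
    price = coefs['intercept'] + coefs['area'] * area
    price += coefs.get(neighborhood, 0.0)
    price += coefs.get(style, 0.0)
    return price


# With area and style held fixed, the predicted gap between a neighborhood B
# home and a neighborhood C home is exactly the B coefficient.
gap_b_vs_c = predict_price(2000, 'B', 'ranch') - predict_price(2000, 'C', 'ranch')

# Coefficients whose p-value falls below the assumed 0.05 cutoff.
significant = [name for name, p in p_values.items() if p < 0.05]
print(significant)
```

Read this way, the `area` coefficient says that each additional unit of area is associated with a predicted price about 348.74 higher, holding neighborhood and style fixed. The `A`, `ranch`, and `victorian` coefficients have large p-values, so those categories are not clearly distinguishable from their baselines in this fit.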


In [4]:
# Select the numeric columns before averaging so non-numeric columns do not raise an error
df.groupby('style')[['price', 'area']].mean()

style,price,area
lodge,305017.6,848.978031
ranch,575131.8,1611.31798
victorian,1046083.0,2980.95996


In [5]:
df['area_squared'] = df['area']*df['area']
df.head()

,house_id,neighborhood,area,bedrooms,bathrooms,style,price,intercept,A,B,C,lodge,ranch,victorian,area_squared
0,1112,B,1188,3,2,ranch,598291,1,0,1,0,0,1,0,1411344
1,491,B,3512,5,3,victorian,1744259,1,0,1,0,0,0,1,12334144
2,5952,B,1134,3,2,ranch,571669,1,0,1,0,0,1,0,1285956
3,3525,A,1940,4,2,ranch,493675,1,1,0,0,0,1,0,3763600
4,5108,B,2208,6,4,victorian,1101539,1,0,1,0,0,0,1,4875264


In [6]:
# Model 2: area, area squared, and style (lodge is the baseline style)
lm2 = sm.OLS(df['price'], df[['intercept', 'area', 'area_squared', 'ranch', 'victorian']])
results2 = lm2.fit()
results2.summary()

Dep. Variable:,price,R-squared:,0.678
Model:,OLS,Adj. R-squared:,0.678
Method:,Least Squares,F-statistic:,3173.0
Date:,"Tue, 22 Jan 2019",Prob (F-statistic):,0.0
Time:,11:12:27,Log-Likelihood:,-84516.0
No. Observations:,6028,AIC:,169000.0
Df Residuals:,6023,BIC:,169100.0
Df Model:,4,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,1.855e+04,1.26e+04,1.467,0.142,-6229.316,4.33e+04
area,334.0146,13.525,24.696,0.000,307.501,360.528
area_squared,0.0029,0.002,1.283,0.199,-0.002,0.007
ranch,9917.2547,1.27e+04,0.781,0.435,-1.5e+04,3.48e+04
victorian,2509.3956,1.53e+04,0.164,0.870,-2.75e+04,3.25e+04

Omnibus:,375.22,Durbin-Watson:,2.009
Prob(Omnibus):,0.0,Jarque-Bera (JB):,340.688
Skew:,0.519,Prob(JB):,1.05e-74
Kurtosis:,2.471,Cond. No.,43300000.0
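
Because `area` and `area_squared` always change together, the `area` coefficient in this model cannot be read as "the change in price per unit of area". Under the fitted quadratic, the implied slope with respect to area is $b_{area} + 2\,b_{area\_squared} \cdot area$, which depends on the area itself. The sketch below evaluates that expression with the rounded coefficients from the table; the function name `marginal_effect_of_area` is an assumption made for this example.

```python
# Rounded coefficients copied from the Model 2 summary table.
b_area = 334.0146
b_area_squared = 0.0029


def marginal_effect_of_area(area):
    """Hypothetical helper: slope of predicted price with respect to area.

    Derivative of b_area * area + b_area_squared * area**2.
    """
    return b_area + 2 * b_area_squared * area


# The implied slope changes with the size of the home,
# so no single number describes "the effect of area".
for area in (1000, 2000, 3000):
    print(area, round(marginal_effect_of_area(area), 2))
```

The `ranch` and `victorian` coefficients can still be read as differences from the lodge baseline at a fixed area. Note, though, that the `area_squared` term has a p-value of 0.199 in the table, so the curvature itself is not strongly supported by this fit.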
