# Chapter 10: Instrumental Variables
![Instrumental variables workflow](../pic/10-1-工具变量法流程.png)

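#### (1) Fit a baseline OLS regression
Finding: the estimated return to education is implausibly high, which suggests omitted-variable bias. Consider adding the omitted variable "ability".
- The coefficient on s is 10.26%, far from what is observed in practice.
- Use iq as a proxy for ability.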
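To illustrate why omitting ability can inflate the schooling coefficient, here is a minimal simulation sketch. The data-generating process, the coefficient values (0.06 and 0.10), and the helper `ols` are illustrative assumptions; none of them come from the grilic data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (NOT the grilic data):
# ability raises both schooling and log wages.
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
lnw = 1.0 + 0.06 * s + 0.10 * ability + rng.normal(scale=0.3, size=n)

def ols(y, *regressors):
    """Return OLS coefficients, with the intercept first."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(lnw, s)[1]            # ability omitted
b_long = ols(lnw, s, ability)[1]    # ability included

print(f"return to schooling, ability omitted:  {b_short:.4f}")
print(f"return to schooling, ability included: {b_long:.4f}")
```

In this setup the short regression attributes part of the ability effect to schooling, so its coefficient sits above the assumed true value of 0.06.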

In [119]:
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

# Load the grilic data set
grilic = pd.read_stata('../2_Data/Data-2e/grilic.dta')

dependent = grilic['lnw']                               # dependent variable
exog = grilic[['s', 'expr', 'tenure', 'rns', 'smsa']]   # exogenous regressors
exog = sm.add_constant(exog)
endog = grilic['iq']                                    # endogenous regressor
instruments = grilic[['med', 'kww']]                    # excluded instruments
exog_iq = grilic[['s', 'expr', 'tenure', 'rns', 'smsa', 'iq']]  # regressors for OLS with iq
exog_iq = sm.add_constant(exog_iq)

In [120]:
# With no endogenous regressors and no instruments, IV2SLS reduces to OLS
res_ols = IV2SLS(dependent, exog, None, None).fit()


#### (2) Add IQ (iq) as a proxy for ability and re-run OLS
Finding: the return to education is still high, at 9.28%.

In [121]:
# OLS again, now with iq among the regressors
res_ols_iq = IV2SLS(dependent, exog_iq, None, None).fit()


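#### (3) Because iq measures ability with error, introduce instruments and estimate by 2SLS with robust standard errors
Instruments:
- med: mother's years of education
- kww: score on the kww test

Both are positively correlated with iq, and both are assumed to be exogenous.

Finding: the return to education drops to 6.08% and is statistically significant.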
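The two stages behind 2SLS can be sketched by hand with NumPy. This is a simplified illustration on simulated data, not the grilic data: the data-generating process, the variables `z1` and `z2`, and the helper `ols` are assumptions made for the example. The standard errors from a manual second stage would be wrong, so only coefficients are compared.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data-generating process (NOT the grilic data).
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
iq = ability + rng.normal(size=n)                # proxy measured with error
z1 = ability + rng.normal(scale=0.5, size=n)     # instrument 1 (assumed exogenous)
z2 = ability + rng.normal(scale=0.5, size=n)     # instrument 2 (assumed exogenous)
lnw = 1.0 + 0.06 * s + 0.10 * ability + rng.normal(scale=0.3, size=n)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
X = np.column_stack([const, s, iq])        # regressors, iq is the noisy proxy
Z = np.column_stack([const, s, z1, z2])    # exogenous regressors plus instruments

# Stage 1: project the mismeasured regressor on all exogenous variables.
iq_hat = Z @ ols(iq, Z)

# Stage 2: replace iq by its fitted value and run OLS.
b_2sls = ols(lnw, np.column_stack([const, s, iq_hat]))
b_ols = ols(lnw, X)

print("OLS  coefficients (const, s, iq):", np.round(b_ols, 4))
print("2SLS coefficients (const, s, iq):", np.round(b_2sls, 4))
```

In this simulation OLS with the noisy proxy still overstates the schooling coefficient, while the two-stage estimate moves toward the assumed value of 0.06, which mirrors the direction of the change reported above.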

In [122]:
iv_model = IV2SLS(dependent=dependent,
                exog=exog,
                endog=endog,
                instruments=instruments
                )
res_iv = iv_model.fit()

In [123]:
print(res_iv.summary)

                          IV-2SLS Estimation Summary                          
Dep. Variable:                    lnw   R-squared:                      0.2775
Estimator:                    IV-2SLS   Adj. R-squared:                 0.2718
No. Observations:                 758   F-statistic:                    370.04
Date:                Fri, Apr 26 2024   P-value (F-stat)                0.0000
Time:                        19:58:31   Distribution:                  chi2(6)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          3.2180     0.3984     8.0781     0.0000      2.4373      3.9988
s              0.0608     0.0190     3.2073     0.00


#### (4) Overidentification tests of instrument exogeneity

In [124]:
print(res_iv.wooldridge_overid)
print('=======================')
print(res_iv.sargan)
print('=======================')
print(res_iv.anderson_rubin)
print('=======================')
print(res_iv.basmann)

Wooldridge's score test of overidentification
H0: Model is not overidentified.
Statistic: 0.1515
P-value: 0.6972
Distributed: chi2(1)
Sargan's test of overidentification
H0: The model is not overidentified.
Statistic: 0.1300
P-value: 0.7185
Distributed: chi2(1)
Anderson-Rubin test of overidentification
H0: The model is not overidentified.
Statistic: 0.1299
P-value: 0.7185
Distributed: chi2(1)
Basmann's test of overidentification
H0: The model is not overidentified.
Statistic: 0.1286
P-value: 0.7199
Distributed: chi2(1)
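With p-values around 0.7 in the output above, none of the four tests rejects the exogeneity of the instruments. As a rough illustration of what such a test computes, the sketch below implements the textbook n·R² form of the Sargan statistic on simulated data. The data, the helpers `project`, `sargan`, and `chi2_sf_1df`, and the deliberately invalid instrument are assumptions for this example; the code is not claimed to reproduce the library's implementation.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical data (NOT the grilic data): iq is a noisy proxy for ability.
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
iq = ability + rng.normal(size=n)
z1 = ability + rng.normal(scale=0.5, size=n)
z2 = ability + rng.normal(scale=0.5, size=n)
lnw = 1.0 + 0.06 * s + 0.10 * ability + rng.normal(scale=0.3, size=n)

const = np.ones(n)
X = np.column_stack([const, s, iq])        # regressors (iq treated as endogenous)
Z = np.column_stack([const, s, z1, z2])    # exogenous regressors plus instruments

def project(A, B):
    """Fitted values from regressing B on the columns of A."""
    return A @ np.linalg.lstsq(A, B, rcond=None)[0]

def sargan(y, X, Z):
    """Return (statistic, degrees of freedom) for the n * R^2 Sargan test."""
    beta = np.linalg.lstsq(project(Z, X), y, rcond=None)[0]  # 2SLS coefficients
    resid = y - X @ beta                                     # residuals use the actual X
    fitted = project(Z, resid)                               # regress residuals on Z
    stat = len(y) * (fitted @ fitted) / (resid @ resid)
    return stat, Z.shape[1] - X.shape[1]

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

stat_ok, df = sargan(lnw, X, Z)
# Make z2 invalid by letting it enter the wage equation directly.
stat_bad, _ = sargan(lnw + 0.2 * z2, X, Z)

print(f"valid instruments:  stat={stat_ok:.3f}, df={df}, p={chi2_sf_1df(stat_ok):.3f}")
print(f"z2 made invalid:    stat={stat_bad:.3f}, df={df}, p={chi2_sf_1df(stat_bad):.3g}")
```

The degrees of freedom equal the number of instruments minus the number of endogenous regressors, which is 1 both here and in the chapter's model.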


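#### (5) Weak-instrument test: correlation between the instruments and the endogenous regressor

By default, the first-stage results are reported with robust standard errors.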

In [130]:
print(res_iv.first_stage.diagnostics)
print(res_iv.first_stage.individual)  # full first-stage results
## Both instruments are significantly different from zero.

    rsquared  partial.rsquared  shea.rsquared     f.stat    f.pval   f.dist
iq    0.3066          0.038229       0.038229  27.091564  0.000001  chi2(2)
{'iq':                             OLS Estimation Summary                            
Dep. Variable:                     iq   R-squared:                      0.3066
Estimator:                        OLS   Adj. R-squared:                 0.3001
No. Observations:                 758   F-statistic:                    337.72
Date:                Fri, Apr 26 2024   P-value (F-stat)                0.0000
Time:                        20:06:38   Distribution:                  chi2(7)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
---------------------------------------------------

The formal test requires the non-robust first-stage F statistic, which is obtained by setting `cov_type='unadjusted'`.

In [126]:
res_iv_unadj = iv_model.fit(cov_type='unadjusted')  # fit once with non-robust covariance
print(res_iv_unadj.first_stage.diagnostics)  # first-stage diagnostics
print('==============================================================================')
print(res_iv_unadj.first_stage.individual)  # full first-stage results

    rsquared  partial.rsquared  shea.rsquared     f.stat        f.pval  \
iq    0.3066          0.038229       0.038229  15.064763  3.849954e-07   

      f.dist  
iq  F(2,750)  
{'iq':                             OLS Estimation Summary                            
Dep. Variable:                     iq   R-squared:                      0.3066
Estimator:                        OLS   Adj. R-squared:                 0.3001
No. Observations:                 758   F-statistic:                    335.16
Date:                Fri, Apr 26 2024   P-value (F-stat)                0.0000
Time:                        19:58:32   Distribution:                  chi2(7)
Cov. Estimator:            unadjusted                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------
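The non-robust first-stage F statistic reported above (15.06 with an F(2,750) distribution) tests whether the excluded instruments are jointly zero in the first stage; a common rule of thumb treats values above 10 as evidence against weak instruments. The sketch below computes that statistic from restricted and unrestricted residual sums of squares. The simulated data and the helpers `rss` and `first_stage_f` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000

# Hypothetical data (NOT the grilic data).
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
iq = ability + rng.normal(size=n)          # endogenous regressor

def rss(y, X):
    """Residual sum of squares from regressing y on X."""
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

def first_stage_f(endog, exog, instr):
    """Classical F statistic for H0: all excluded instruments have zero coefficients."""
    full = np.column_stack([exog, instr])
    q = instr.shape[1]                     # number of excluded instruments
    dof = len(endog) - full.shape[1]       # residual degrees of freedom
    rss_r = rss(endog, exog)               # restricted: instruments left out
    rss_u = rss(endog, full)               # unrestricted: instruments included
    return ((rss_r - rss_u) / q) / (rss_u / dof)

exog = np.column_stack([np.ones(n), s])
strong = np.column_stack([ability + rng.normal(scale=0.5, size=n) for _ in range(2)])
weak = np.column_stack([0.02 * ability + rng.normal(size=n) for _ in range(2)])

f_strong = first_stage_f(iq, exog, strong)
f_weak = first_stage_f(iq, exog, weak)
print(f"first-stage F, strong instruments: {f_strong:.2f}")
print(f"first-stage F, weak instruments:   {f_weak:.2f}")
```

With 758 observations, a constant, five exogenous regressors, and two instruments, the same formula has 2 and 750 degrees of freedom, matching the distribution shown in the output.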

#### (6) As a robustness check, re-estimate with LIML

A kappa of 1 means the LIML estimates are essentially the same as the 2SLS estimates.
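The role of kappa can be seen from the k-class form of the estimator: setting kappa to 1 gives 2SLS exactly, and LIML uses the smallest eigenvalue of a particular matrix ratio. The sketch below follows the textbook formulas on simulated data. The data and the helpers `residualize` and `k_class` are assumptions for illustration, and the code is not claimed to match the library's internals.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Hypothetical data (NOT the grilic data).
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
iq = ability + rng.normal(size=n)
z1 = ability + rng.normal(scale=0.5, size=n)
z2 = ability + rng.normal(scale=0.5, size=n)
lnw = 1.0 + 0.06 * s + 0.10 * ability + rng.normal(scale=0.3, size=n)

exog = np.column_stack([np.ones(n), s])    # included exogenous regressors
endog = iq.reshape(-1, 1)                  # endogenous regressor
instr = np.column_stack([z1, z2])          # excluded instruments

def residualize(A, B):
    """Residuals from regressing each column of B on the columns of A."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]

Z = np.column_stack([exog, instr])
X = np.column_stack([exog, endog])
W = np.column_stack([lnw, endog])

# kappa is the smallest eigenvalue of (W' M_Z W)^{-1} (W' M_exog W).
W_x = residualize(exog, W)
W_z = residualize(Z, W)
eigs = np.linalg.eigvals(np.linalg.solve(W_z.T @ W_z, W_x.T @ W_x))
kappa = float(np.min(eigs.real))

X_z = residualize(Z, X)                    # M_Z X

def k_class(k):
    """k-class estimator: solve X'(I - k M_Z) X b = X'(I - k M_Z) y."""
    lhs = X.T @ X - k * (X_z.T @ X_z)
    rhs = X.T @ lnw - k * (X_z.T @ lnw)
    return np.linalg.solve(lhs, rhs)

b_2sls = k_class(1.0)      # kappa = 1 is 2SLS
b_liml = k_class(kappa)    # LIML uses the estimated kappa

print(f"kappa = {kappa:.6f}")
print("2SLS:", np.round(b_2sls, 4))
print("LIML:", np.round(b_liml, 4))
```

Because kappa is never below 1 and approaches 1 when the overidentifying restrictions fit well, a value near 1 implies LIML and 2SLS coefficients that are nearly identical.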

In [127]:
from linearmodels.iv import IVLIML

iv_liml_model = IVLIML(dependent=dependent, exog=exog, endog=endog, instruments=instruments)
res_iv_liml = iv_liml_model.fit()

print(res_iv_liml.summary)

                          IV-LIML Estimation Summary                          
Dep. Variable:                    lnw   R-squared:                      0.2768
Estimator:                    IV-LIML   Adj. R-squared:                 0.2710
No. Observations:                 758   F-statistic:                    369.62
Date:                Fri, Apr 26 2024   P-value (F-stat)                0.0000
Time:                        19:58:32   Distribution:                  chi2(6)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          3.2150     0.4001     8.0345     0.0000      2.4307      3.9993
s              0.0606     0.0190     3.1857     0.00

#### (7) Test whether the regressor is endogenous

In [128]:
print(res_iv.wu_hausman())
print(res_iv.durbin())

Wu-Hausman test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 3.8719
P-value: 0.0495
Distributed: F(1,750)
Durbin test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 3.8931
P-value: 0.0485
Distributed: chi2(1)
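Both p-values above are just under 0.05, so the exogeneity of iq is rejected at the 5% level. With a single endogenous regressor, the regression-based form of the Wu-Hausman test adds the first-stage residual to the structural equation and tests its coefficient; the F statistic is the squared t statistic. The sketch below shows this on simulated data, where the data-generating process and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Hypothetical data (NOT the grilic data): measurement error makes iq endogenous.
ability = rng.normal(size=n)
s = 12 + 2.0 * ability + rng.normal(size=n)
iq = ability + rng.normal(size=n)
z1 = ability + rng.normal(scale=0.5, size=n)
z2 = ability + rng.normal(scale=0.5, size=n)
lnw = 1.0 + 0.06 * s + 0.10 * ability + rng.normal(scale=0.3, size=n)

const = np.ones(n)
X = np.column_stack([const, s, iq])
Z = np.column_stack([const, s, z1, z2])

# First stage: residual of the suspect regressor after projecting on Z.
v_hat = iq - Z @ np.linalg.lstsq(Z, iq, rcond=None)[0]

# Augmented regression: structural equation plus the first-stage residual.
X_aug = np.column_stack([X, v_hat])
beta = np.linalg.lstsq(X_aug, lnw, rcond=None)[0]
resid = lnw - X_aug @ beta
dof = n - X_aug.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(X_aug.T @ X_aug)   # classical covariance matrix

t_stat = beta[-1] / np.sqrt(cov[-1, -1])        # t statistic on v_hat
f_stat = t_stat ** 2                            # compare with F(1, dof)
print(f"coefficient on first-stage residual: {beta[-1]:.4f}")
print(f"Wu-Hausman F statistic: {f_stat:.2f} with (1, {dof}) degrees of freedom")
```

Applied to the chapter's model, the same degrees-of-freedom count gives 758 minus 8 estimated coefficients, that is F(1,750), as in the output.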



#### (8) Report the results

In [131]:
from linearmodels.iv.results import compare
from collections import OrderedDict

od = OrderedDict()
od['ols_without_iq'] = res_ols
od['ols_with_iq'] = res_ols_iq
od['2sls'] = res_iv
od['liml_iq'] = res_iv_liml

print(compare(od,stars=True))

                               Model Comparison                              
                     ols_without_iq   ols_with_iq          2sls       liml_iq
-----------------------------------------------------------------------------
Dep. Variable                   lnw           lnw           lnw           lnw
Estimator                       OLS           OLS       IV-2SLS       IV-LIML
No. Observations                758           758           758           758
Cov. Est.                    robust        robust        robust        robust
R-squared                    0.3521        0.3600        0.2775        0.2768
Adj. R-squared               0.3478        0.3548        0.2718        0.2710
F-statistic                  423.58        435.33        370.04        369.62
P-value (F-stat)             0.0000        0.0000        0.0000        0.0000
const                     4.1037***     3.8952***     3.2180***     3.2150***
                           (46.996)      (33.756)      (8.0781) 