Can't Find an IV? Constructing Instrumental Variables from Heteroskedasticity / 开普饭

Author: 江鑫 (Jiang Xin), Anhui University
Email: jiangxin199566@foxmail.com

1. Background
2. Theory
3. Caveats
4. Implementation
5. References
6. Related Posts

1. Background

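In empirical research, instrumental variables (IV) are an important tool for dealing with endogeneity, but finding a suitable instrument is often difficult. Lewbel (2012) shows how, when no suitable external instrument is available, valid instruments can be constructed from heteroskedasticity.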

2. Theory

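Consider the following model, in which $Y_1$ and $Y_2$ are endogenous variables, $X$ is a vector of exogenous covariates, and the error terms $\varepsilon_1$ and $\varepsilon_2$ may be correlated. The parameters of interest are $\gamma_1$ and the vector $\beta_1$:

$$Y_1 = X'\beta_1 + Y_2\gamma_1 + \varepsilon_1 \tag{1}$$

$$Y_2 = X'\beta_2 + \varepsilon_2 \tag{2}$$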

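Standard IV estimation relies on an element of $X$ that appears in the $Y_2$ equation but not in the $Y_1$ equation. Lewbel's (2012) heteroskedasticity-based identification removes this requirement that conventional IV satisfy an exclusion restriction. Specifically, the method uses the information contained in the heteroskedasticity of $\varepsilon_2$ to construct valid instruments for $Y_2$.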

Assumptions of the standard regression model:

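$\beta_1$, $\beta_2$ and $\gamma_1$ are fixed constants (note in particular that $Y_2$ is a treatment variable whose treatment effect $\gamma_1$ is assumed to be homogeneous);
Standard exogeneity: $E(X\varepsilon_1) = 0$, $E(X\varepsilon_2) = 0$, and $E(XX')$ is nonsingular.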

The key additional assumption of the Lewbel (2012) method:

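$\operatorname{cov}(Z, \varepsilon_1\varepsilon_2) = 0$ and $\operatorname{cov}(Z, \varepsilon_2^2) \neq 0$, where $Z$ is a vector of observed exogenous variables that is either equal to $X$ or a subset of the elements of $X$, and $\bar{Z}$ denotes the mean of $Z$.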
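A small simulation can make these two moment conditions concrete. The sketch below is not taken from Lewbel (2012). It assumes a hypothetical design with a single scalar $Z \sim \text{Uniform}(0, 2)$, errors $\varepsilon_1 = U + V_1$ and $\varepsilon_2 = U + V_2$ built from a common factor $U$ that is independent of $Z$, and a $V_2$ whose standard deviation equals $Z$. In this design $\operatorname{cov}(Z,\varepsilon_1\varepsilon_2) = 0$ and $\operatorname{cov}(Z,\varepsilon_2^2) = \operatorname{cov}(Z, Z^2) = 2/3$, so the sample analogues should land close to those values.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    abar = sum(a) / n
    bbar = sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)


def simulate_errors(n, seed=0):
    """Hypothetical DGP: eps1 = U + V1, eps2 = U + V2, with Var(V2 | Z) = Z**2."""
    rng = random.Random(seed)
    z, eps1, eps2 = [], [], []
    for _ in range(n):
        zi = rng.uniform(0.0, 2.0)
        u = rng.gauss(0.0, 1.0)          # common factor, independent of Z
        v1 = rng.gauss(0.0, 1.0)         # homoskedastic
        v2 = zi * rng.gauss(0.0, 1.0)    # standard deviation grows with Z
        z.append(zi)
        eps1.append(u + v1)
        eps2.append(u + v2)
    return z, eps1, eps2


z, eps1, eps2 = simulate_errors(50_000)
cov_z_e1e2 = sample_cov(z, [a * b for a, b in zip(eps1, eps2)])
cov_z_e2sq = sample_cov(z, [b * b for b in eps2])
print(f"cov(Z, e1*e2) = {cov_z_e1e2:.3f}")  # population value 0
print(f"cov(Z, e2^2)  = {cov_z_e2sq:.3f}")  # population value 2/3
```

In real data the errors are unobserved, so these sample covariances are only available in a simulation like this one.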

The Lewbel (2012) method can be summarized in two steps:

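Step 1: Estimate equation (2) by OLS to obtain $\hat{\beta}_2$ and the residuals $\hat{\varepsilon}_2 = Y_2 - X'\hat{\beta}_2$;
Step 2: Let $Z$ consist of some or all of the elements of $X$ (excluding the constant) and construct the instruments $(Z - \bar{Z})\hat{\varepsilon}_2$. Then add these generated instruments to equation (1) and estimate $\beta_1$ and $\gamma_1$ by 2SLS, with $X$ serving as its own instruments.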
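The two steps can be sketched in a few lines for the simplest case, $X = (1, Z)$ with a single scalar $Z$. This is an illustrative implementation under stated assumptions, not the ivreg2h code: the data-generating process (true $\gamma_1 = -0.5$, $\beta_1 = (2, 1)$, $V_2$ with standard deviation $Z$) and the helper names `ols_on_z`, `lewbel_2sls` and `ols_gamma1` are all hypothetical. Because there is only one generated instrument, 2SLS reduces, after partialling $X$ out (Frisch-Waugh-Lovell), to a simple IV ratio. Naive OLS is shown for comparison.

```python
import random


def ols_on_z(y, z):
    """OLS of y on a constant and a scalar z; returns (intercept, slope, residuals)."""
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    slope = szy / szz
    intercept = ybar - slope * zbar
    resid = [yi - intercept - slope * zi for zi, yi in zip(z, y)]
    return intercept, slope, resid


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lewbel_2sls(y1, y2, z):
    """Two-step Lewbel estimator for X = (1, Z) with a single scalar Z."""
    # Step 1: OLS of Y2 on X; keep the residuals e2hat.
    _, _, e2hat = ols_on_z(y2, z)
    # Step 2: generated instrument W = (Z - Zbar) * e2hat.
    zbar = sum(z) / len(z)
    w = [(zi - zbar) * ei for zi, ei in zip(z, e2hat)]
    # 2SLS of Y1 on (X, Y2) with instruments (X, W): partial X out of W, Y1, Y2.
    _, _, w_t = ols_on_z(w, z)
    _, _, y1_t = ols_on_z(y1, z)
    _, _, y2_t = ols_on_z(y2, z)
    gamma1 = dot(w_t, y1_t) / dot(w_t, y2_t)
    # beta1 solves X'(Y1 - X beta1 - Y2 gamma1) = 0: OLS of Y1 - gamma1 * Y2 on X.
    b0, b1, _ = ols_on_z([a - gamma1 * b for a, b in zip(y1, y2)], z)
    return gamma1, b0, b1


def ols_gamma1(y1, y2, z):
    """Naive OLS coefficient on Y2 from regressing Y1 on (1, Z, Y2)."""
    _, _, y1_t = ols_on_z(y1, z)
    _, _, y2_t = ols_on_z(y2, z)
    return dot(y2_t, y1_t) / dot(y2_t, y2_t)


# Hypothetical DGP: gamma1 = -0.5, beta1 = (2.0, 1.0), beta2 = (1.0, 0.5).
rng = random.Random(42)
y1, y2, z = [], [], []
for _ in range(20_000):
    zi = rng.uniform(0.0, 2.0)
    u = rng.gauss(0.0, 1.0)                      # common factor -> endogeneity
    v1 = rng.gauss(0.0, 1.0)
    v2 = zi * rng.gauss(0.0, 1.0)                # Var(V2 | Z) = Z**2
    y2i = 1.0 + 0.5 * zi + (u + v2)              # equation (2)
    y1i = 2.0 + 1.0 * zi - 0.5 * y2i + (u + v1)  # equation (1)
    z.append(zi)
    y1.append(y1i)
    y2.append(y2i)

gamma1_iv, b0_iv, b1_iv = lewbel_2sls(y1, y2, z)
print(f"Lewbel 2SLS gamma1 = {gamma1_iv:.3f}")
print(f"OLS gamma1         = {ols_gamma1(y1, y2, z):.3f}")  # plim is -0.5 + 3/7 here
```

In this design OLS is pulled toward zero because $\operatorname{cov}(\varepsilon_1, \varepsilon_2) = \operatorname{var}(U) = 1$, while the generated instrument is uncorrelated with $\varepsilon_1$ yet correlated with $Y_2$ through the heteroskedasticity of $V_2$.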

3. Caveats

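Because the key assumptions $\operatorname{cov}(Z, \varepsilon_1\varepsilon_2) = 0$ and $\operatorname{cov}(Z, \varepsilon_2^2) \neq 0$ are hard to justify directly, Lewbel (2012) proposes three sufficient conditions that can be assessed in their place.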

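A1: The error terms $\varepsilon_1$ and $\varepsilon_2$ have the following structure:

$$\varepsilon_1 = \alpha_1 U + V_1, \qquad \varepsilon_2 = \alpha_2 U + V_2$$

where $\alpha_1$ and $\alpha_2$ are constants, and $U$, $V_1$ and $V_2$ are unobserved error terms that are mutually independent conditional on $Z$. The interpretation of A1 is that $Y_2$ is endogenous because it contains an error term, $U$, that appears in both equations. This assumption cannot be tested directly, so its plausibility should be argued from economic or econometric theory. For example, if $Y_1$ is an individual's wage and $Y_2$ is their educational attainment, $U$ could be unobserved ability, which affects both $Y_1$ and $Y_2$. $V_1$ then captures all factors that affect wages but not education, and $V_2$ all factors that affect education but not wages.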
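To see what the factor structure in A1 implies, assume for this derivation that $U$, $V_1$ and $V_2$ have mean zero conditional on $Z$ and are mutually independent given $Z$, so every cross term such as $UV_1$, $UV_2$ or $V_1V_2$ has conditional mean zero. Expanding the error products then gives:

$$
\begin{aligned}
\varepsilon_1\varepsilon_2 &= \alpha_1\alpha_2 U^2 + \alpha_1 U V_2 + \alpha_2 U V_1 + V_1 V_2, \\
E(\varepsilon_1\varepsilon_2) &= \alpha_1\alpha_2 E(U^2), \\
\operatorname{cov}(Z, \varepsilon_1\varepsilon_2) &= \alpha_1\alpha_2 \operatorname{cov}(Z, U^2), \\
\varepsilon_2^2 &= \alpha_2^2 U^2 + 2\alpha_2 U V_2 + V_2^2, \\
\operatorname{cov}(Z, \varepsilon_2^2) &= \alpha_2^2 \operatorname{cov}(Z, U^2) + \operatorname{cov}(Z, V_2^2).
\end{aligned}
$$

The second line shows why $Y_2$ is endogenous whenever $\alpha_1\alpha_2 \neq 0$ and $U$ is nondegenerate. If, in addition, $\operatorname{cov}(Z, U^2) = 0$, the third line is zero and the last reduces to $\operatorname{cov}(Z, V_2^2)$, so the key assumption holds exactly when that remaining term is nonzero.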

A2: $U^2$ is uncorrelated with $Z$, i.e. $\operatorname{cov}(Z, U^2) = 0$.

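A2 states that $U$ is homoskedastic. Evidence on whether it holds can be obtained by applying the Pagan and Hall (1983) heteroskedasticity test to equation (1) (in Stata, for example, with the user-written ivhettest command).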

A3: $V_2^2$ is correlated with $Z$, i.e. $\operatorname{cov}(Z, V_2^2) \neq 0$.

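A3 requires the error term of equation (2) to be heteroskedastic, which ensures that the constructed instruments are correlated with $Y_2$.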
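Although $V_2$ is unobserved, an observable counterpart of A3 can be checked: under the factor structure of A1 together with A2, $\operatorname{cov}(Z, \varepsilon_2^2)$ reduces to $\operatorname{cov}(Z, V_2^2)$, and $\varepsilon_2$ is estimated consistently by the OLS residuals of equation (2). The sketch below swaps in a Koenker-type (studentized) Breusch-Pagan test, a common choice that is not prescribed by the original post: regress $\hat{\varepsilon}_2^2$ on a constant and $Z$ and compare $n R^2$ with a $\chi^2(1)$ distribution. The single scalar $Z$, the two simulated designs (one heteroskedastic, one homoskedastic) and all helper names are hypothetical.

```python
import math
import random


def ols_residuals(y, z):
    """Residuals from OLS of y on a constant and a scalar z."""
    n = len(y)
    zbar, ybar = sum(z) / n, sum(y) / n
    slope = (sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
             / sum((zi - zbar) ** 2 for zi in z))
    intercept = ybar - slope * zbar
    return [yi - intercept - slope * zi for zi, yi in zip(z, y)]


def koenker_bp(resid, z):
    """Koenker (studentized) Breusch-Pagan LM test with one scalar regressor z.

    Regresses resid**2 on a constant and z and returns (LM, p-value), where
    LM = n * R^2 and the chi-squared(1) p-value is erfc(sqrt(LM / 2)).
    """
    n = len(resid)
    e_sq = [r * r for r in resid]
    zbar, ebar = sum(z) / n, sum(e_sq) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    see = sum((ei - ebar) ** 2 for ei in e_sq)
    sze = sum((zi - zbar) * (ei - ebar) for zi, ei in zip(z, e_sq))
    lm = n * sze * sze / (szz * see)  # R^2 of a simple regression = squared correlation
    return lm, math.erfc(math.sqrt(lm / 2.0))


def simulate_eq2(n, heteroskedastic, seed):
    """Hypothetical equation (2): Y2 = 1 + 0.5 Z + U + V2 with Z ~ Uniform(0, 2)."""
    rng = random.Random(seed)
    z, y2 = [], []
    for _ in range(n):
        zi = rng.uniform(0.0, 2.0)
        sd_v2 = zi if heteroskedastic else 1.0  # A3 holds only in the first design
        y2.append(1.0 + 0.5 * zi + rng.gauss(0.0, 1.0) + sd_v2 * rng.gauss(0.0, 1.0))
        z.append(zi)
    return z, y2


z_het, y2_het = simulate_eq2(5_000, heteroskedastic=True, seed=7)
lm_het, p_het = koenker_bp(ols_residuals(y2_het, z_het), z_het)
z_hom, y2_hom = simulate_eq2(5_000, heteroskedastic=False, seed=7)
lm_hom, p_hom = koenker_bp(ols_residuals(y2_hom, z_hom), z_hom)
print(f"heteroskedastic V2: LM = {lm_het:.1f}, p = {p_het:.4g}")
print(f"homoskedastic V2:   LM = {lm_hom:.1f}, p = {p_hom:.4g}")
```

A large statistic in the first design is what A3 needs; in the second design the generated instruments would carry essentially no information about $Y_2$.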

4. Implementation

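Install the required commands:

ssc install center, replace
ssc install bcuse, replace
ssc install ranktest, replace   // required by ivreg2
ssc install ivreg2, replace     // ivreg2h is built on ivreg2
ssc install ivreg2h, replace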

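Load the data and center the covariates:

bcuse engeldat                    // load the example dataset
center age-twocars, prefix(z_)    // create demeaned copies z_age ... z_twocars

Estimate the model, taking 2SLS as the example:

* 2SLS with lrinc as the external instrument for lrtotexp
ivreg2h foodshare z_* (lrtotexp = lrinc), small robust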

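Regression results. ivreg2h reports three sets of estimates: standard IV using only the external instrument lrinc, IV using only the instruments generated from Z, and IV using both the generated and the external instruments.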

Standard IV Results

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =      854
                                                      F( 13,   840) =    12.41
                                                      Prob > F      =   0.0000
Total (centered) SS     =  9.637457679                Centered R2   =   0.2904
Total (uncentered) SS   =  78.91341406                Uncentered R2 =   0.9133
Residual SS             =  6.838888132                Root MSE      =   .09023

------------------------------------------------------------------------------
             |               Robust
   foodshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lrtotexp |     -0.086      0.020    -4.33   0.000       -0.125      -0.047
       z_age |     -0.014      0.007    -1.98   0.048       -0.028      -0.000
      z_age2 |      0.022      0.007     3.21   0.001        0.008       0.035
     z_agesp |      0.000      0.003     0.14   0.890       -0.006       0.007
    z_agesp2 |     -0.001      0.003    -0.21   0.836       -0.006       0.005
    z_spwork |     -0.013      0.008    -1.52   0.130       -0.029       0.004
        z_s1 |     -0.004      0.009    -0.48   0.631       -0.022       0.013
        z_s2 |     -0.015      0.009    -1.76   0.079       -0.032       0.002
        z_s3 |     -0.013      0.009    -1.43   0.153       -0.030       0.005
    z_washer |     -0.000      0.009    -0.02   0.986       -0.018       0.018
   z_gasheat |      0.007      0.007     1.05   0.296       -0.006       0.020
    z_onecar |     -0.033      0.010    -3.39   0.001       -0.052      -0.014
   z_twocars |     -0.050      0.013    -3.83   0.000       -0.076      -0.024
       _cons |      0.336      0.012    27.60   0.000        0.312       0.360
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             91.532
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):              211.280
                         (Kleibergen-Paap rk Wald F statistic):        219.969
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
                      z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrinc
------------------------------------------------------------------------------

IV with Generated Instruments only
Instruments created from Z:
z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3 z_washer z_gasheat z_onecar z_twocars

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =      854
                                                      F( 13,   840) =    10.08
                                                      Prob > F      =   0.0000
Total (centered) SS     =  9.637457679                Centered R2   =   0.2469
Total (uncentered) SS   =  78.91341406                Uncentered R2 =   0.9080
Residual SS             =  7.257858456                Root MSE      =   .09295

------------------------------------------------------------------------------
             |               Robust
   foodshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lrtotexp |     -0.055      0.059    -0.94   0.347       -0.171       0.060
       z_age |     -0.015      0.008    -1.94   0.053       -0.030       0.000
      z_age2 |      0.023      0.007     3.09   0.002        0.008       0.037
     z_agesp |      0.001      0.004     0.34   0.735       -0.006       0.008
    z_agesp2 |      0.000      0.003     0.01   0.990       -0.006       0.006
    z_spwork |     -0.014      0.009    -1.52   0.128       -0.033       0.004
        z_s1 |     -0.003      0.009    -0.38   0.701       -0.021       0.014
        z_s2 |     -0.015      0.009    -1.68   0.093       -0.032       0.002
        z_s3 |     -0.011      0.009    -1.23   0.217       -0.029       0.007
    z_washer |     -0.000      0.009    -0.02   0.984       -0.019       0.018
   z_gasheat |      0.004      0.009     0.42   0.674       -0.014       0.022
    z_onecar |     -0.038      0.014    -2.65   0.008       -0.067      -0.010
   z_twocars |     -0.061      0.025    -2.47   0.014       -0.109      -0.013
       _cons |      0.318      0.035     9.03   0.000        0.249       0.387
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              7.200
                                                   Chi-sq(12) P-val =   0.8441
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):                2.266
                         (Kleibergen-Paap rk Wald F statistic):          0.892
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    21.01
                                         10% maximal IV relative bias    11.52
                                         20% maximal IV relative bias     6.53
                                         30% maximal IV relative bias     4.75
                                         10% maximal IV size             43.27
                                         15% maximal IV size             23.24
                                         20% maximal IV size             16.35
                                         25% maximal IV size             12.82
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        12.913
                                                   Chi-sq(11) P-val =   0.2991
------------------------------------------------------------------------------
Instrumented:         lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
                      z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrtotexp_z_age_g lrtotexp_z_age2_g lrtotexp_z_agesp_g
                      lrtotexp_z_agesp2_g lrtotexp_z_spwork_g lrtotexp_z_s1_g
                      lrtotexp_z_s2_g lrtotexp_z_s3_g lrtotexp_z_washer_g
                      lrtotexp_z_gasheat_g lrtotexp_z_onecar_g
                      lrtotexp_z_twocars_g
------------------------------------------------------------------------------

IV with Generated Instruments and External Instruments

Testing Orthogonality of Instruments created from Z:
z_age z_age2 z_agesp z_agesp2 z_spwork z_s1
z_s2 z_s3 z_washer z_gasheat z_onecar z_twocars

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =      854
                                                      F( 13,   840) =    12.70
                                                      Prob > F      =   0.0000
Total (centered) SS     =  9.637457679                Centered R2   =   0.2891
Total (uncentered) SS   =  78.91341406                Uncentered R2 =   0.9132
Residual SS             =  6.851665184                Root MSE      =   .09031

------------------------------------------------------------------------------
             |               Robust
   foodshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lrtotexp |     -0.085      0.019    -4.48   0.000       -0.122      -0.048
       z_age |     -0.014      0.007    -1.98   0.048       -0.028      -0.000
      z_age2 |      0.022      0.007     3.21   0.001        0.008       0.035
     z_agesp |      0.000      0.003     0.15   0.882       -0.006       0.007
    z_agesp2 |     -0.001      0.003    -0.20   0.842       -0.006       0.005
    z_spwork |     -0.013      0.008    -1.52   0.129       -0.029       0.004
        z_s1 |     -0.004      0.009    -0.48   0.634       -0.022       0.013
        z_s2 |     -0.015      0.009    -1.75   0.080       -0.032       0.002
        z_s3 |     -0.013      0.009    -1.43   0.154       -0.030       0.005
    z_washer |     -0.000      0.009    -0.02   0.986       -0.018       0.018
   z_gasheat |      0.007      0.007     1.03   0.305       -0.006       0.020
    z_onecar |     -0.033      0.010    -3.40   0.001       -0.053      -0.014
   z_twocars |     -0.051      0.013    -3.86   0.000       -0.076      -0.025
       _cons |      0.336      0.012    28.99   0.000        0.313       0.358
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):            101.566
                                                   Chi-sq(13) P-val =   0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               18.043
                         (Kleibergen-Paap rk Wald F statistic):         17.632
Stock-Yogo weak ID test critical values:  5% maximal IV relative bias    21.10
                                         10% maximal IV relative bias    11.52
                                         20% maximal IV relative bias     6.49
                                         30% maximal IV relative bias     4.71
                                         10% maximal IV size             45.64
                                         15% maximal IV size             24.42
                                         20% maximal IV size             17.14
                                         25% maximal IV size             13.41
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):        16.221
                                                   Chi-sq(12) P-val =   0.1813
-orthog- option:
Hansen J statistic (eqn. excluding suspect orthog. conditions):         16.046
                                                   Chi-sq(11) P-val =   0.1394
C statistic (exogeneity/orthogonality of suspect instruments):           0.175
                                                   Chi-sq(1) P-val =    0.6758
Instruments tested:   lrinc
------------------------------------------------------------------------------
Instrumented:         lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
                      z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrinc lrtotexp_z_age_g lrtotexp_z_age2_g
                      lrtotexp_z_agesp_g lrtotexp_z_agesp2_g lrtotexp_z_spwork_g
                      lrtotexp_z_s1_g lrtotexp_z_s2_g lrtotexp_z_s3_g
                      lrtotexp_z_washer_g lrtotexp_z_gasheat_g
                      lrtotexp_z_onecar_g lrtotexp_z_twocars_g
------------------------------------------------------------------------------

5. References

Lewbel A. Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models[J]. Journal of Business & Economic Statistics, 2012, 30(1): 67-80.
Baum C F, Lewbel A. Advice on Using Heteroskedasticity-Based Identification[J]. Stata Journal, 2019, 19(4): 757-767.
张楠, 高梦媛, 寇璇. 卫生公平的文化壁垒——跨方言区流动降低了公共卫生服务可及性吗[J]. 财贸经济, 2021, 42(2): 36-50. (in Chinese)
温兴祥. 本地非农就业对农村居民家庭消费的影响——基于CHIP农村住户调查数据的实证研究[J]. 中国经济问题, 2019(3): 95-107. (in Chinese)
计量经济圈 (Econometrics Circle) post: 基于异方差解决内生性问题方法的使用建议 (advice on using heteroskedasticity-based methods to address endogeneity). (in Chinese)

6. Related Posts

Note: the list of posts below was produced with the Stata command:
lianxh 工具变量, m
To install the latest version of the lianxh command:
ssc install lianxh, replace

Topic: Stata commands

New Stata command pdslasso: how to select among many control variables and instruments?

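Topic: IV-GMM

Where is the IV? Ingenious instrumental variables
twostepweakiv: How weak are weak instruments?
How to handle multiple (weak) instruments: IV-mivreg?
IV: What if the instrument does not satisfy exogeneity?
IV: Finite-sample unbiased estimation when the sign of the first-stage coefficient is known
IV: Can lags of an endogenous variable be used as instruments?
Stata: Instrumental variables (IV) estimation is not that hard!
IV estimation: Instruments can still be used when they are not exogenous!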

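Topic: Endogeneity and causal inference

Instrumental variables (IV): The exclusion restriction and a reading of classic papers

Can't Find an IV? Constructing Instrumental Variables from Heteroskedasticity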