Can't Find an IV? Constructing Instruments from Heteroskedasticity
Author: 江鑫 (Jiang Xin, Anhui University)
Email: jiangxin199566@foxmail.com
Contents
1. Background
2. Theory
3. Caveats
4. Implementation
5. References
6. Related Posts
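1. Background
In empirical research, instrumental variables (IV) are a key tool for dealing with endogeneity, but a suitable instrument is often hard to find. Lewbel (2012) shows how to construct instruments from heteroskedasticity when no suitable external instrument is available.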
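2. Theory
Consider the following model, where $Y_1$ and $Y_2$ are endogenous variables, $X$ is a vector of exogenous covariates, and the errors $\varepsilon_1$ and $\varepsilon_2$ may be correlated. The goal is to estimate $\gamma$ and the vector $\beta$.

$$Y_1 = X'\beta + Y_2\gamma + \varepsilon_1 \tag{1}$$

$$Y_2 = X'\alpha + \varepsilon_2 \tag{2}$$

Standard IV estimation relies on an element of $X$ that appears in the $Y_2$ equation but not in the $Y_1$ equation. The heteroskedasticity-based identification of Lewbel (2012) removes the requirement that such an exclusion restriction hold. Specifically, it uses the information contained in the heteroskedasticity of $\varepsilon_2$ to construct valid instruments for $Y_2$.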
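Assumptions of the standard regression model:
$\gamma$ and $\beta$ are fixed constants (in particular, when $Y_2$ is a treatment variable, the treatment effect is assumed to be homogeneous);
the standard exogeneity assumptions hold, namely $E(X\varepsilon_1) = 0$, $E(X\varepsilon_2) = 0$, and $E(XX')$ is nonsingular.
The key additional assumption of the Lewbel (2012) method:
$\mathrm{Cov}(Z, \varepsilon_1\varepsilon_2) = 0$ and $\mathrm{Cov}(Z, \varepsilon_2^2) \neq 0$, where the vector of exogenous variables $Z$ is either $X$ itself or a subset of the elements of $X$, and $\bar{Z}$ denotes the mean of $Z$.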
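A short derivation may help show why the first condition yields a valid instrument. It is only a sketch: it treats $\bar{Z}$ as the population mean $E(Z)$ and $\varepsilon_2$ as known, and it concerns the product $(Z-\bar{Z})\varepsilon_2$, which is the instrument built in the two steps described next.

```latex
\begin{aligned}
E\big[(Z-\bar{Z})\,\varepsilon_2\,\varepsilon_1\big]
  &= E(Z\,\varepsilon_1\varepsilon_2) - E(Z)\,E(\varepsilon_1\varepsilon_2) \\
  &= \mathrm{Cov}(Z,\ \varepsilon_1\varepsilon_2) \;=\; 0 .
\end{aligned}
```

The product is therefore orthogonal to $\varepsilon_1$ even though $\varepsilon_2$ itself may be correlated with $\varepsilon_1$. The second condition, $\mathrm{Cov}(Z, \varepsilon_2^2) \neq 0$, is what makes the product informative about $Y_2$.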
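The Lewbel (2012) method can be summarized in two steps:
Estimate equation (2) by OLS to obtain the estimate $\hat{\alpha}$ and the residuals $\hat{\varepsilon}_2 = Y_2 - X'\hat{\alpha}$;
Let $Z$ be some or all elements of $X$ (excluding the constant) and construct the instruments $(Z - \bar{Z})\hat{\varepsilon}_2$. Then use the constructed instruments for $Y_2$ in equation (1) and estimate $\gamma$ and $\beta$ by 2SLS.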
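The two steps can be sketched by hand in Stata. This is a minimal illustration, not the internals of any packaged command: it assumes `bcuse`, `center`, `ivreg2`, and `ranktest` are installed from SSC, uses the `engeldat` data that appears in the implementation section, and the names `e2hat` and `iv_*` are made up for the example.

```stata
* Assumption: ssc install bcuse / center / ivreg2 / ranktest has been run
bcuse engeldat, clear
center age-twocars, prefix(z_)        // z_* = Z - mean(Z)

* Step 1: OLS of the endogenous regressor on X; keep the residuals
regress lrtotexp z_*
predict double e2hat, residuals

* Step 2: generated instruments (Z - Zbar) * e2hat, one per element of Z
foreach v of varlist z_* {
    generate double iv_`v' = `v' * e2hat
}

* 2SLS of equation (1) using only the generated instruments
ivreg2 foodshare z_* (lrtotexp = iv_*), small robust
```

The point estimates should be close to the "generated instruments only" block that `ivreg2h` reports, since both follow the same construction; small differences in reported statistics are possible.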
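3. Caveats
Because the key assumptions $\mathrm{Cov}(Z, \varepsilon_1\varepsilon_2) = 0$ and $\mathrm{Cov}(Z, \varepsilon_2^2) \neq 0$ are hard to verify directly, Lewbel (2012) gives three sufficient conditions that can be examined in their place.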
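A1: The errors $\varepsilon_1$ and $\varepsilon_2$ have the following structure:

$$\varepsilon_1 = cU + V_1, \qquad \varepsilon_2 = U + V_2$$

where $c$ is a constant and $U$, $V_1$, $V_2$ are unobserved error terms that are mutually independent conditional on $Z$. Under A1, $Y_2$ is endogenous because it contains the error component $U$, which appears in both equations. This assumption cannot be tested directly, so it should be justified by economic or econometric theory. For example, if $Y_1$ is an individual's wage and $Y_2$ is the individual's years of education, then $U$ could be unobserved ability, which affects both $Y_1$ and $Y_2$; $V_1$ collects all factors that affect wages but not education, and $V_2$ collects all factors that affect education but not wages.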
A2: $U^2$ is uncorrelated with $Z$.
Assumption A2 says that $U$ is homoskedastic. Whether it holds can be examined by applying the Pagan and Hall (1983) test to equation (1).
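A3: $V_2^2$ is correlated with $Z$.
This assumption says that the error term of equation (2) is heteroskedastic, which ensures that the constructed instruments are correlated with $Y_2$.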
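To make the link between A1 to A3 and the key assumption explicit, here is a short derivation. It is a sketch under one added assumption not stated above: besides being mutually independent conditional on $Z$, the terms $U$, $V_1$, $V_2$ each have conditional mean zero given $Z$, so that every cross-product such as $UV_2$ has zero covariance with $Z$.

```latex
\begin{aligned}
\varepsilon_1\varepsilon_2 &= (cU + V_1)(U + V_2)
   = cU^2 + cUV_2 + UV_1 + V_1V_2 \\
\mathrm{Cov}(Z,\ \varepsilon_1\varepsilon_2) &= c\,\mathrm{Cov}(Z,\ U^2) = 0
   \quad \text{(by A2)} \\[4pt]
\varepsilon_2^2 &= U^2 + 2UV_2 + V_2^2 \\
\mathrm{Cov}(Z,\ \varepsilon_2^2) &= \mathrm{Cov}(Z,\ U^2) + \mathrm{Cov}(Z,\ V_2^2)
   = \mathrm{Cov}(Z,\ V_2^2) \neq 0
   \quad \text{(by A2 and A3)}
\end{aligned}
```

In words, heteroskedasticity must come from the equation-specific component $V_2$, while the shared component $U$ must be homoskedastic.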
4. Implementation
Install the commands:
ssc install center, replace
ssc install bcuse, replace
ssc install ivreg2h, replace
Load the data:
. bcuse engeldat // load the dataset
. center age-twocars, prefix(z_) // center the covariates (subtract the mean)
. * Two-stage least squares as an example
. ivreg2h foodshare z_* (lrtotexp = lrinc), small robust
Regression results:
Standard IV Results
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 854
F( 13, 840) = 12.41
Prob > F = 0.0000
Total (centered) SS = 9.637457679 Centered R2 = 0.2904
Total (uncentered) SS = 78.91341406 Uncentered R2 = 0.9133
Residual SS = 6.838888132 Root MSE = .09023
------------------------------------------------------------------------------
| Robust
foodshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lrtotexp | -0.086 0.020 -4.33 0.000 -0.125 -0.047
z_age | -0.014 0.007 -1.98 0.048 -0.028 -0.000
z_age2 | 0.022 0.007 3.21 0.001 0.008 0.035
z_agesp | 0.000 0.003 0.14 0.890 -0.006 0.007
z_agesp2 | -0.001 0.003 -0.21 0.836 -0.006 0.005
z_spwork | -0.013 0.008 -1.52 0.130 -0.029 0.004
z_s1 | -0.004 0.009 -0.48 0.631 -0.022 0.013
z_s2 | -0.015 0.009 -1.76 0.079 -0.032 0.002
z_s3 | -0.013 0.009 -1.43 0.153 -0.030 0.005
z_washer | -0.000 0.009 -0.02 0.986 -0.018 0.018
z_gasheat | 0.007 0.007 1.05 0.296 -0.006 0.020
z_onecar | -0.033 0.010 -3.39 0.001 -0.052 -0.014
z_twocars | -0.050 0.013 -3.83 0.000 -0.076 -0.024
_cons | 0.336 0.012 27.60 0.000 0.312 0.360
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 91.532
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 211.280
(Kleibergen-Paap rk Wald F statistic): 219.969
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrinc
------------------------------------------------------------------------------
IV with Generated Instruments only
Instruments created from Z:
z_age z_age2 z_agesp z_agesp2 z_spwork z_s1
z_s2 z_s3 z_washer z_gasheat z_onecar z_twocars
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 854
F( 13, 840) = 10.08
Prob > F = 0.0000
Total (centered) SS = 9.637457679 Centered R2 = 0.2469
Total (uncentered) SS = 78.91341406 Uncentered R2 = 0.9080
Residual SS = 7.257858456 Root MSE = .09295
------------------------------------------------------------------------------
| Robust
foodshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lrtotexp | -0.055 0.059 -0.94 0.347 -0.171 0.060
z_age | -0.015 0.008 -1.94 0.053 -0.030 0.000
z_age2 | 0.023 0.007 3.09 0.002 0.008 0.037
z_agesp | 0.001 0.004 0.34 0.735 -0.006 0.008
z_agesp2 | 0.000 0.003 0.01 0.990 -0.006 0.006
z_spwork | -0.014 0.009 -1.52 0.128 -0.033 0.004
z_s1 | -0.003 0.009 -0.38 0.701 -0.021 0.014
z_s2 | -0.015 0.009 -1.68 0.093 -0.032 0.002
z_s3 | -0.011 0.009 -1.23 0.217 -0.029 0.007
z_washer | -0.000 0.009 -0.02 0.984 -0.019 0.018
z_gasheat | 0.004 0.009 0.42 0.674 -0.014 0.022
z_onecar | -0.038 0.014 -2.65 0.008 -0.067 -0.010
z_twocars | -0.061 0.025 -2.47 0.014 -0.109 -0.013
_cons | 0.318 0.035 9.03 0.000 0.249 0.387
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 7.200
Chi-sq(12) P-val = 0.8441
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 2.266
(Kleibergen-Paap rk Wald F statistic): 0.892
Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 21.01
10% maximal IV relative bias 11.52
20% maximal IV relative bias 6.53
30% maximal IV relative bias 4.75
10% maximal IV size 43.27
15% maximal IV size 23.24
20% maximal IV size 16.35
25% maximal IV size 12.82
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 12.913
Chi-sq(11) P-val = 0.2991
------------------------------------------------------------------------------
Instrumented: lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrtotexp_z_age_g lrtotexp_z_age2_g lrtotexp_z_agesp_g
lrtotexp_z_agesp2_g lrtotexp_z_spwork_g lrtotexp_z_s1_g
lrtotexp_z_s2_g lrtotexp_z_s3_g lrtotexp_z_washer_g
lrtotexp_z_gasheat_g lrtotexp_z_onecar_g lrtotexp_z_twocars_g
------------------------------------------------------------------------------
IV with Generated Instruments and External Instruments
Testing Orthogonality of Instruments created from Z:
z_age z_age2 z_agesp z_agesp2 z_spwork z_s1
z_s2 z_s3 z_washer z_gasheat z_onecar z_twocars
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs = 854
F( 13, 840) = 12.70
Prob > F = 0.0000
Total (centered) SS = 9.637457679 Centered R2 = 0.2891
Total (uncentered) SS = 78.91341406 Uncentered R2 = 0.9132
Residual SS = 6.851665184 Root MSE = .09031
------------------------------------------------------------------------------
| Robust
foodshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lrtotexp | -0.085 0.019 -4.48 0.000 -0.122 -0.048
z_age | -0.014 0.007 -1.98 0.048 -0.028 -0.000
z_age2 | 0.022 0.007 3.21 0.001 0.008 0.035
z_agesp | 0.000 0.003 0.15 0.882 -0.006 0.007
z_agesp2 | -0.001 0.003 -0.20 0.842 -0.006 0.005
z_spwork | -0.013 0.008 -1.52 0.129 -0.029 0.004
z_s1 | -0.004 0.009 -0.48 0.634 -0.022 0.013
z_s2 | -0.015 0.009 -1.75 0.080 -0.032 0.002
z_s3 | -0.013 0.009 -1.43 0.154 -0.030 0.005
z_washer | -0.000 0.009 -0.02 0.986 -0.018 0.018
z_gasheat | 0.007 0.007 1.03 0.305 -0.006 0.020
z_onecar | -0.033 0.010 -3.40 0.001 -0.053 -0.014
z_twocars | -0.051 0.013 -3.86 0.000 -0.076 -0.025
_cons | 0.336 0.012 28.99 0.000 0.313 0.358
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 101.566
Chi-sq(13) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 18.043
(Kleibergen-Paap rk Wald F statistic): 17.632
Stock-Yogo weak ID test critical values: 5% maximal IV relative bias 21.10
10% maximal IV relative bias 11.52
20% maximal IV relative bias 6.49
30% maximal IV relative bias 4.71
10% maximal IV size 45.64
15% maximal IV size 24.42
20% maximal IV size 17.14
25% maximal IV size 13.41
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments): 16.221
Chi-sq(12) P-val = 0.1813
-orthog- option:
Hansen J statistic (eqn. excluding suspect orthog. conditions): 16.046
Chi-sq(11) P-val = 0.1394
C statistic (exogeneity/orthogonality of suspect instruments): 0.175
Chi-sq(1) P-val = 0.6758
Instruments tested: lrinc
------------------------------------------------------------------------------
Instrumented: lrtotexp
Included instruments: z_age z_age2 z_agesp z_agesp2 z_spwork z_s1 z_s2 z_s3
z_washer z_gasheat z_onecar z_twocars
Excluded instruments: lrinc lrtotexp_z_age_g lrtotexp_z_age2_g
lrtotexp_z_agesp_g lrtotexp_z_agesp2_g lrtotexp_z_spwork_g
lrtotexp_z_s1_g lrtotexp_z_s2_g lrtotexp_z_s3_g
lrtotexp_z_washer_g lrtotexp_z_gasheat_g
lrtotexp_z_onecar_g lrtotexp_z_twocars_g
------------------------------------------------------------------------------
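In the generated-instruments-only output above, the Cragg-Donald F statistic is 2.266 and the Kleibergen-Paap LM test has a p-value of 0.8441, which suggests that the generated instruments are weak in this sample. A natural follow-up is to check how much heteroskedasticity the first-stage errors actually show (condition A3), and to run the Pagan-Hall test mentioned under A2. The sketch below is one possible way to do this and is not part of the original workflow; it assumes `ivreg2`, `ranktest`, and `ivhettest` are installed from SSC.

```stata
* Assumption: ssc install ivreg2 / ranktest / ivhettest has been run
bcuse engeldat, clear
center age-twocars, prefix(z_)

* A3: Breusch-Pagan / Koenker test for heteroskedasticity of the
* first-stage (equation 2) errors with respect to the Z variables
regress lrtotexp z_*
estat hettest z_*, iid

* Pagan-Hall heteroskedasticity test after IV estimation of equation (1)
ivreg2 foodshare z_* (lrtotexp = lrinc)
ivhettest
```

A clear rejection of homoskedasticity in the first regression is what the method needs; a weak or absent rejection is consistent with the low first-stage F statistics seen above.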
5. References
Lewbel A. Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models[J]. Journal of Business & Economic Statistics, 2012, 30(1): 67-80.
Baum C F, Lewbel A. Advice on using heteroskedasticity-based identification[J]. Stata Journal, 2019, 19(4): 757-767.
张楠, 高梦媛, 寇璇. 卫生公平的文化壁垒——跨方言区流动降低了公共卫生服务可及性吗[J]. 财贸经济, 2021, 42(02): 36-50.
温兴祥. 本地非农就业对农村居民家庭消费的影响——基于CHIP农村住户调查数据的实证研究[J]. 中国经济问题, 2019(03): 95-107.
计量经济圈 (blog post): 基于异方差解决内生性问题方法的使用建议
6. Related Posts
Note: the list of posts below was generated with the following Stata command:
lianxh 工具变量, m
To install the latest version of the lianxh command:
ssc install lianxh, replace
Topic: Stata commands
Stata新命令-pdslasso:众多控制变量和工具变量如何挑选?
Topic: IV-GMM
IV在哪里?奇思妙想的工具变量
twostepweakiv:弱工具变量有多弱?
多个(弱)工具变量如何应对-IV-mivreg?
IV:工具变量不满足外生性怎么办?
IV-工具变量法:第一阶段系数符号确定时的小样本无偏估计
IV:可以用内生变量的滞后项做工具变量吗?
Stata: 工具变量法 (IV) 也不难呀!
IV-估计:工具变量不外生时也可以用!
Topic: Endogeneity and causal inference
工具变量-IV:排他性约束及经典文献解读