Placebo and Parallel Trends Tests for Multi-Period DID
❝
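This post walks through the steps of a placebo test for multi-period DID. To get the accompanying data, reply 20200628 to the official account.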
❞
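In the conventional DID model, every unit is treated at the same time, so a placebo test only needs to draw a fixed number of units at random and label them the treatment group. In a multi-period DID, however, the policy date differs across units, and that approach no longer applies.
The fix is to draw a random period from the sample window for each unit and use it as that unit's policy date. For example, this post provides data on 30 Chinese provinces for 2000-2018; in the multi-period DID, each of the 30 provinces is assigned a randomly drawn year between 2000 and 2018 as its policy year.
First, let us look at the multi-period DID estimate under the actual policy dates: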
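To make the idea concrete, here is a minimal sketch on a toy panel rather than the post's data. The variable names `fake_year` and `DT_placebo` and the seed are illustrative assumptions, and `runiformint()` requires Stata 14 or later.

```stata
* Toy panel standing in for the real data: 30 units observed 2000-2018
clear
set seed 20200628                        // arbitrary seed, for reproducibility only
set obs 30
gen id = _n

* Draw one pseudo policy year per unit, uniformly from 2000-2018
gen fake_year = runiformint(2000, 2018)  // hypothetical variable name

* Expand to a unit-year panel; fake_year is copied to every row of the unit
expand 19
bysort id: gen year = 1999 + _n

* Placebo treatment indicator: 1 from the pseudo policy year onward
gen DT_placebo = (year >= fake_year)
```

Each unit gets its own random start year, which is what distinguishes the multi-period placebo from the single-date case.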
Multi-period DID estimation
cd ×××××××××
use 数据0.dta, clear
xtset id year
* Generate the unit-by-time treatment indicator
* (id 26 does not appear in the list, so its DT is always 0)
* The /// line continuations require running this from a do-file
gen DT = ((id == 1 & year >= 2005) | (id == 2 & year >= 2005) | (id == 3 & year >= 2006) | ///
    (id == 4 & year >= 2006) | (id == 5 & year >= 2006) | (id == 6 & year >= 2006) | ///
    (id == 7 & year >= 2006) | (id == 8 & year >= 2006) | (id == 9 & year >= 2005) | ///
    (id == 10 & year >= 2003) | (id == 11 & year >= 2004) | (id == 12 & year >= 2006) | ///
    (id == 13 & year >= 2006) | (id == 14 & year >= 2006) | (id == 15 & year >= 2005) | ///
    (id == 16 & year >= 2006) | (id == 17 & year >= 2006) | (id == 18 & year >= 2006) | ///
    (id == 19 & year >= 2002) | (id == 20 & year >= 2006) | (id == 21 & year >= 2003) | ///
    (id == 22 & year >= 2006) | (id == 23 & year >= 2006) | (id == 24 & year >= 2006) | ///
    (id == 25 & year >= 2006) | (id == 27 & year >= 2005) | (id == 28 & year >= 2006) | ///
    (id == 29 & year >= 2006) | (id == 30 & year >= 2006))
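The long condition above can also be written more compactly by first storing each unit's policy year in a helper variable. The sketch below is one way to do that, not the author's code: `policy_year` is a hypothetical variable name, and the skeleton panel only stands in for the real data so that the block runs on its own.

```stata
* Skeleton panel standing in for the real data: 30 ids, years 2000-2018
clear
set obs 30
gen id = _n
expand 19
bysort id: gen year = 1999 + _n

* policy_year is a hypothetical helper variable (not in the original data)
gen policy_year = .
replace policy_year = 2002 if id == 19
replace policy_year = 2003 if inlist(id, 10, 21)
replace policy_year = 2004 if id == 11
replace policy_year = 2005 if inlist(id, 1, 2, 9, 15, 27)
replace policy_year = 2006 if inlist(id, 3, 4, 5, 6, 7, 8, 12, 13, 14, 16) ///
    | inlist(id, 17, 18, 20, 22, 23, 24, 25, 28, 29, 30)

* id 26 keeps a missing policy_year, i.e. it is never treated
gen DT = (year >= policy_year) & !missing(policy_year)
```

Keeping the policy year as a variable also makes it easy to compute event time later, for example as `year - policy_year`.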
Multi-period DID regression:
xtreg y DT x1-x6, fe
Results:
Fixed-effects (within) regression               Number of obs     =        570
Group variable: id                              Number of groups  =         30

R-sq:                                           Obs per group:
     within  = 0.5624                                         min =         19
     between = 0.0578                                         avg =       19.0
     overall = 0.2567                                         max =         19

                                                F(7,533)          =      97.87
corr(u_i, Xb)  = -0.3584                        Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          DT |   .0792811   .0104967     7.55   0.000     .0586611     .099901
          x1 |   .7772515   .4184583     1.86   0.064    -.0447783    1.599281
          x2 |   .0796133   .0644417     1.24   0.217    -.0469776    .2062043
          x3 |   .2698871   .0823515     3.28   0.001     .1081137    .4316605
          x4 |   .1934262   .1931832     1.00   0.317    -.1860676    .5729201
          x5 |  -.7536294   .2024724    -3.72   0.000    -1.151371   -.3558877
          x6 |  -1.154312   .2885104    -4.00   0.000    -1.721069   -.5875551
       _cons |   .8809552   .0650585    13.54   0.000     .7531526    1.008758
-------------+----------------------------------------------------------------
     sigma_u |  .10451545
     sigma_e |  .06805517
         rho |  .70224943   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(29, 533) = 26.25                    Prob > F = 0.0000
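The regression above controls for unit fixed effects only. A common variant also adds year fixed effects and clusters the standard errors by unit. The sketch below shows that syntax on simulated data; the adoption years, the true effect of 0.08, and every other number in it are made up for illustration and are not results from the post's data.

```stata
* Simulated panel: 30 units, 2000-2018, staggered adoption (all values hypothetical)
clear
set seed 12345
set obs 30
gen id = _n
gen policy_year = runiformint(2002, 2006)   // hypothetical adoption years
gen u_i = rnormal(0, 0.1)                    // unit-specific effect
expand 19
bysort id: gen year = 1999 + _n
gen DT = (year >= policy_year)

* Outcome with a true effect of 0.08, a common time trend, and noise
gen y = 0.08*DT + u_i + 0.01*(year - 2000) + rnormal(0, 0.05)

xtset id year
* Unit and year fixed effects, standard errors clustered by unit
xtreg y DT i.year, fe vce(cluster id)
```

With the real data, the same idea would amount to adding `i.year` and `vce(cluster id)` to the original command; whether that is appropriate depends on the application.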
Parallel trends test
* Parallel trends test
* current = 1 in the year in which each unit's policy starts
gen current = ((id == 1 & year == 2005) | (id == 2 & year == 2005) | (id == 3 & year == 2006) | ///
    (id == 4 & year == 2006) | (id == 5 & year == 2006) | (id == 6 & year == 2006) | ///
    (id == 7 & year == 2006) | (id == 8 & year == 2006) | (id == 9 & year == 2005) | ///
    (id == 10 & year == 2003) | (id == 11 & year == 2004) | (id == 12 & year == 2006) | ///
    (id == 13 & year == 2006) | (id == 14 & year == 2006) | (id == 15 & year == 2005) | ///
    (id == 16 & year == 2006) | (id == 17 & year == 2006) | (id == 18 & year == 2006) | ///
    (id == 19 & year == 2002) | (id == 20 & year == 2006) | (id == 21 & year == 2003) | ///
    (id == 22 & year == 2006) | (id == 23 & year == 2006) | (id == 24 & year == 2006) | ///
    (id == 25 & year == 2006) | (id == 27 & year == 2005) | (id == 28 & year == 2006) | ///
    (id == 29 & year == 2006) | (id == 30 & year == 2006))
* Leads (pre) and lags (post) of the policy-start indicator
gen pre6 = f6.current
gen pre5 = f5.current
gen pre4 = f4.current
gen pre3 = f3.current
gen pre2 = f2.current
gen pre1 = f.current
gen post1 = l.current
gen post2 = l2.current
gen post3 = l3.current
gen post4 = l4.current
gen post5 = l5.current
gen post6 = l6.current
gen post7 = l7.current
* Leads and lags are missing at the edges of the panel; set them to 0
foreach v in pre1 pre2 pre3 pre4 pre5 pre6 post1 post2 post3 post4 post5 post6 post7 {
    replace `v' = 0 if `v' == .
}
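* Event-study regression: three leads, the policy-start year, and seven lags
* (pre4-pre6 are generated above but not included here)
xtreg y pre3 pre2 pre1 current post1 post2 post3 post4 post5 post6 post7 x1-x6, fe
* Plot the coefficients with 95% confidence intervals
* (coefplot is a community-contributed command: ssc install coefplot)
coefplot, keep(pre3 pre2 pre1 current post1 post2 post3 post4 post5 post6 post7) ///
    vertical addplot(line @b @at) yline(0) levels(95)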
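Besides inspecting the plot, one can also test the lead coefficients jointly. The sketch below illustrates this with `test` on simulated data; the adoption years and the pure-noise outcome are assumptions made so the block runs on its own, and only three leads and lags are built to keep it short.

```stata
* Simulated panel: 30 units, 2000-2018 (all values hypothetical)
clear
set seed 12345
set obs 30
gen id = _n
gen policy_year = runiformint(2004, 2008)   // hypothetical adoption years
expand 19
bysort id: gen year = 1999 + _n
xtset id year

* Policy-start indicator with three leads and three lags
gen current = (year == policy_year)
forvalues k = 1/3 {
    gen pre`k'  = f`k'.current
    gen post`k' = l`k'.current
    replace pre`k'  = 0 if missing(pre`k')
    replace post`k' = 0 if missing(post`k')
}

gen y = rnormal()        // pure-noise outcome, so no pre-trend by construction
quietly xtreg y pre3 pre2 pre1 current post1 post2 post3, fe

* H0: all three lead coefficients are zero
test pre3 pre2 pre1
```

With the real data, `test pre3 pre2 pre1` can be run directly after the event-study regression; failing to reject the null is consistent with, but does not prove, parallel pre-trends.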
Placebo test
Create matrices to store the results
mat b = J(500,1,0)
mat se = J(500,1,0)
mat p = J(500,1,0)
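Sampling procedure, option 1:
Randomly draw 30 values from the variable year and assign them, in order, as the policy years of the 30 provinces.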
forvalues i = 1/500 {
    use 数据0.dta, clear
    xtset id year
    sample 30, count                  // keep 30 randomly chosen observations
    keep year
    mkmat year, matrix(sampleyear)    // store the drawn years in a matrix for easy lookup
    use 数据0.dta, clear
    xtset id year
    gen treat = 0
    * Generate the unit-by-time treatment indicator
    foreach j of numlist 1/30 {
        replace treat = 1 if (id == `j' & year >= sampleyear[`j',1])
    }
    qui xtreg y treat x1-x6, fe
    * Store the coefficient, standard error, and two-sided p-value
    mat b[`i',1] = _b[treat]
    mat se[`i',1] = _se[treat]
    mat p[`i',1] = 2*ttail(e(df_r),abs(_b[treat]/_se[treat]))
}
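Sampling procedure, option 2:
Unlike option 1, this option first groups the data by province and then draws one year at random from the year variable within each province as that province's policy year. This approach is more sensible and is the recommended one.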
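* Note: this block uses a different data set (data.dta, 25 units) and different
* variable names (Year, IS, DID, ALED, FIN2, FDI, AFIN) from the example above;
* adapt the file name, variable names, and number of units to your own data
forvalues i = 1/500 {
    use data.dta, clear
    xtset id Year
    bsample 1, strata(id)             // group by id and draw one year at random per group
    sort id                           // make sure row j of the matrix belongs to id j
    keep Year
    save matchyear.dta, replace
    mkmat Year, matrix(sampleyear)
    use data.dta, clear
    xtset id Year
    gen DID = 0
    foreach j of numlist 1/25 {
        replace DID = 1 if (id == `j' & Year >= sampleyear[`j',1])
    }
    qui xtreg IS DID ALED FIN2 FDI AFIN, re
    * Store the coefficient, standard error, and two-sided p-value
    mat b[`i',1] = _b[DID]
    mat se[`i',1] = _se[DID]
    scalar df_r = e(N) - e(df_m) - 1
    mat p[`i',1] = 2*ttail(df_r,abs(_b[DID]/_se[DID]))
}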
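The loop above looks up each unit's drawn year by its row position in a matrix, which only works if ids run from 1 to N without gaps and the rows are in id order. An alternative is to merge the drawn years back onto the panel by `id`. The sketch below shows this on a toy panel; `fake_year`, the tempfile names, and the placeholder outcome are illustrative assumptions.

```stata
* Toy panel standing in for the real data: 30 units, 2000-2018
clear
set seed 12345
set obs 30
gen id = _n
expand 19
bysort id: gen year = 1999 + _n
gen y = rnormal()                 // placeholder outcome
tempfile panel draws
save `panel'

* Draw one random row per id; its year becomes that id's pseudo policy year
bsample 1, strata(id)
keep id year
rename year fake_year             // hypothetical variable name
save `draws'

* Attach the drawn year to every row of the matching id
use `panel', clear
merge m:1 id using `draws', nogenerate
gen DID = (year >= fake_year)
```

Inside a placebo loop, the draw-and-merge steps would replace the `mkmat` and `foreach j` steps, with the regression and the `mat b[...]` lines unchanged.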
Plotting
Convert the matrices to variables and plot
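* Turn the result matrices into the variables coef1, se1, and pvalue1
svmat b, names(coef)
svmat se, names(se)
svmat p, names(pvalue)
* Keep only the 500 rows that hold placebo results
drop if pvalue1 == .
label var pvalue1 "p-value"
label var coef1 "Estimated coefficient"
* Scatter of p-values against placebo coefficients, overlaid with their kernel density
twoway (scatter pvalue1 coef1, xline(0 -0.03, lwidth(0.2) lp(shortdash)) ///
    xlabel(-0.05(0.01)0.1, grid) xtitle("Estimated coefficient") ytitle("p-value") ///
    msymbol(smcircle_hollow) mcolor(orange) legend(off)) ///
    (kdensity coef1, title("Placebo test"))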
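A numerical summary can accompany the plot: the share of placebo coefficients that are at least as large in absolute value as the actual estimate. The sketch below is an assumption about how one might compute it, not part of the original workflow. The `coef1` values here are random placeholders standing in for the 500 placebo coefficients, and `b_actual` is set to the DT coefficient reported in the baseline regression.

```stata
* Placeholder draws standing in for the 500 placebo coefficients in coef1
clear
set seed 12345
set obs 500
gen coef1 = rnormal(0, 0.02)      // hypothetical values, not real results

* Actual estimate: DT coefficient from the baseline regression output
scalar b_actual = .0792811

* Share of placebo coefficients at least as extreme as the actual estimate
count if abs(coef1) >= abs(b_actual)
scalar p_emp = r(N)/_N
display "Share of placebo estimates at least as extreme: " p_emp
```

With real placebo results, drop the first four commands and run the rest directly on the `coef1` variable created by `svmat`. A small share suggests the actual estimate is unlikely to arise from randomly assigned policy dates.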