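Stata 17: Zero-Inflated Ordered Logit Model
Introduction
Stata's new ziologit command fits zero-inflated ordered logit models.
Ordered logit regression is used to model ordered categorical responses, such as symptom severity recorded as none, mild, moderate, or severe. For such an ordered outcome, larger values indicate higher levels, but the numeric values themselves carry no meaning.
In some settings, the data contain more zeros (or more values in the lowest category) than a traditional ordered logit model would predict. A zero may represent the absence of a trait, while the remaining values represent increasing levels of that trait. Many zeros may be observed, some because individuals do not have the trait and some because individuals have the trait but exhibit its lowest level. For example:
In a study of alcohol consumption, some people report no drinking because they never drink, while others may report no drinking because they did not drink during the survey period.
In a clinical trial of a treatment intended to shrink tumors, the outcome is recorded as no improvement, partial remission, or complete remission. A person may show no improvement because the tumor is resistant to treatment, or because the tumor is treatable but had not shrunk at the time of measurement. The distinction matters because treatable tumors are good candidates for higher doses.
In such contexts, you can use a zero-inflated ordered logit (ZIOL) model. The ZIOL model assumes that outcomes in the lowest category arise from two processes, a logit model and an ordered logit model, and it allows a different set of predictors for each.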
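The two-process idea can be written out as a short formula. The block below is a sketch of the usual zero-inflated ordered logit formulation for a four-level outcome coded 0 to 3; the symbols (z and gamma for the logit part, x and beta for the ordered logit part, kappa for the cutpoints, Lambda for the logistic function) are notation chosen here for illustration and do not come from the article.

```latex
% Assumed notation: \Lambda(t) = 1/(1 + e^{-t}) is the logistic CDF,
% z\gamma is the linear predictor of the logit (susceptibility) part,
% x\beta is the linear predictor of the ordered logit part,
% \kappa_1 < \kappa_2 < \kappa_3 are the cutpoints.
\begin{align*}
\Pr(y = 0 \mid x, z) &= \bigl[1 - \Lambda(z\gamma)\bigr]
      + \Lambda(z\gamma)\,\Lambda(\kappa_1 - x\beta) \\
\Pr(y = h \mid x, z) &= \Lambda(z\gamma)\,
      \bigl[\Lambda(\kappa_{h+1} - x\beta) - \Lambda(\kappa_h - x\beta)\bigr],
      \qquad h = 1, 2 \\
\Pr(y = 3 \mid x, z) &= \Lambda(z\gamma)\,\bigl[1 - \Lambda(\kappa_3 - x\beta)\bigr]
\end{align*}
```

The first line shows where the extra zeros come from: a zero is observed either because the individual is not susceptible, or because the individual is susceptible but falls in the lowest category of the ordered logit part.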
Let's see it work
Let's use fictional data on daily cigarette consumption. The codebook command shows us the four levels of cigarette consumption.
use https://www.stata-press.com/data/r17/tobacco
codebook tobacco
The result is:
. use https://www.stata-press.com/data/r17/tobacco
(Fictional tobacco consumption data)

. ed

. desc

Contains data from https://www.stata-press.com/data/r17/tobacco.dta
 Observations:        15,000                  Fictional tobacco consumption data
    Variables:             7                  20 Apr 2020 14:47
------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
------------------------------------------------------------------------------------
tobacco         byte    %27.0g     tobaclbl   Tobacco usage
education       byte    %10.0g                Amount of formal schooling (in years)
income          double  %10.0g                Annual income (in $10,000s)
parent          byte    %17.0g     parlbl     Whether parents smoked
female          byte    %10.0g     femlbl     Female
age             double  %10.0g                Age (in decades)
religion        byte    %19.0g     religlbl   Religion prohibits smoking
------------------------------------------------------------------------------------
Sorted by:

. codebook tobacco

------------------------------------------------------------------------------------
tobacco                                                                Tobacco usage
------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: tobaclbl

                 Range: [0,3]                         Units: 1
         Unique values: 4                         Missing .: 0/15,000

            Tabulation: Freq.   Numeric  Label
                        9,469         0  0 cigarettes
                        3,806         1  1–7 cigarettes/day
                        1,050         2  8–12 cigarettes/day
                          675         3  >12 cigarettes/day
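More than half of the respondents report smoking zero cigarettes. A zero can be reported for two reasons: the respondent is a genuine nonsmoker, or the respondent is susceptible to smoking but did not smoke during the data collection period. A traditional ordered logit model cannot distinguish between these two sources of zero consumption. The ZIOL model lets us model the probability of being susceptible to smoking in addition to the level of consumption.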
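To make the excess of zeros concrete, one could tabulate the outcome and fit a conventional ordered logit as a baseline. The block below is only a sketch; the stored-estimates name ol is an arbitrary choice made here, not something used in the article.

```stata
* Sketch: inspect the share of zeros and fit a baseline ordered logit
use https://www.stata-press.com/data/r17/tobacco, clear

* Frequency and percentage of each consumption level
* (9,469 of 15,000 observations, about 63%, are in the 0 category)
tabulate tobacco

* Conventional ordered logit: every zero is treated as coming
* from the same single process
ologit tobacco education income i.female

* Keep the results for later reference ("ol" is a hypothetical name)
estimates store ol
```

The baseline model has no separate equation for susceptibility, so the effect of a predictor such as income on the zeros cannot differ in sign from its effect on the higher categories.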
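Model estimation
We use ziologit to fit the ZIOL model. We model the level of cigarette consumption as a function of years of education (education), annual income in units of $10,000 (income), and sex (female). We specify the inflate() option to model the probability of being susceptible to smoking as a function of the respondent's education, income, and whether the respondent's parents smoked (parent).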
ziologit tobacco education income i.female, inflate(income education i.parent)
The result is:
. ziologit tobacco education income i.female, inflate(income education i.parent)

Iteration 0:   log likelihood = -15977.364  (not concave)
Iteration 1:   log likelihood =  -13149.83  (not concave)
Iteration 2:   log likelihood = -12467.245
Iteration 3:   log likelihood = -11039.218
Iteration 4:   log likelihood = -9929.2298
Iteration 5:   log likelihood = -9715.1143
Iteration 6:   log likelihood = -9703.2464
Iteration 7:   log likelihood = -9703.2168
Iteration 8:   log likelihood = -9703.2168

Zero-inflated ordered logit regression                  Number of obs = 15,000
                                                        Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                             Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   .5090816   .0094838    53.68   0.000     .4904938    .5276695
      income |    .583636   .0114401    51.02   0.000     .5612139    .6060581
             |
      female |
     Female  |  -.5307721   .0580736    -9.14   0.000    -.6445943   -.4169499
-------------+----------------------------------------------------------------
inflate      |
      income |  -.1279677     .00705   -18.15   0.000    -.1417856   -.1141499
   education |  -.1412459   .0049693   -28.42   0.000    -.1509855   -.1315062
             |
      parent |
    Smoking  |   1.187864   .0529432    22.44   0.000     1.084097     1.29163
       _cons |   2.617219   .1156891    22.62   0.000     2.390473    2.843966
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------
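The first section of the table, labeled tobacco, reports the results of the ordered logit model for cigarette consumption. The second section, labeled inflate, reports the results of the logit model for the probability of being susceptible to smoking.
To make the first two sections easier to interpret, we ask ziologit to display odds ratios instead of coefficients.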
ziologit, or
The result is:
. ziologit, or

Zero-inflated ordered logit regression                  Number of obs = 15,000
                                                        Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                             Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   1.663763   .0157788    53.68   0.000     1.633122    1.694978
      income |   1.792544   .0205068    51.02   0.000     1.752799    1.833191
             |
      female |
     Female  |   .5881507    .034156    -9.14   0.000     .5248755     .659054
-------------+----------------------------------------------------------------
inflate      |
      income |   .8798818   .0062032   -18.15   0.000     .8678073    .8921242
   education |   .8682758   .0043147   -28.42   0.000     .8598602    .8767738
             |
      parent |
    Smoking  |   3.280066   .1736572    22.44   0.000     2.956768    3.638714
       _cons |   13.69758   1.584661    22.62   0.000     10.91866    17.18378
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 2 equations to odds ratios.
Note: _cons estimates baseline odds.
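For a $10,000 increase in annual income, the odds of being susceptible to smoking are multiplied by 0.88 (a 12% decrease), whereas among those who are susceptible, the odds of being in a higher consumption category are multiplied by 1.79 (a 79% increase). This suggests that wealthier people are less likely to smoke, but those who do smoke tend to smoke more.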
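The odds ratios are simply the exponentiated coefficients, so the figures quoted here can be recovered by hand. The sketch below assumes the ziologit model fit earlier is still the active estimation result and that its two equations are named tobacco and inflate, as the section labels in the output table suggest.

```stata
* Sketch: recover the income odds ratios from the stored coefficients
* (assumes the ziologit results above are the active estimates)

* Ordered logit equation: exp(.583636), about 1.79
display exp(_b[tobacco:income])

* Inflation (susceptibility) equation: exp(-.1279677), about 0.88
display exp(_b[inflate:income])

* The same effects expressed as percent changes in the odds
display 100*(exp(_b[tobacco:income]) - 1)
display 100*(exp(_b[inflate:income]) - 1)
```

An odds ratio below 1 means the odds fall as the predictor rises, which is why the same income variable can lower susceptibility while raising consumption among those who are susceptible.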
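But what do these results mean in terms of the probabilities of different smoking behaviors? Suppose we want to know how cigarette consumption relates to income. We use the margins command for this. We estimate the expected probability of each level of cigarette consumption for people with annual incomes of $0, $50,000, $100,000, $150,000, and $200,000.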
margins, at(income=(0(5)20))
The result is:
. margins, at(income=(0(5)20))

Predictive margins                                      Number of obs = 15,000
Model VCE: OIM

1._predict: Pr(tobacco=0), predict(pmargin outcome(0))
2._predict: Pr(tobacco=1), predict(pmargin outcome(1))
3._predict: Pr(tobacco=2), predict(pmargin outcome(2))
4._predict: Pr(tobacco=3), predict(pmargin outcome(3))

1._at: income =  0
2._at: income =  5
3._at: income = 10
4._at: income = 15
5._at: income = 20

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_predict#_at |
        1 1  |   .7428698   .0044443   167.15   0.000     .7341591    .7515805
        1 2  |   .6190759   .0038733   159.83   0.000     .6114843    .6266675
        1 3  |   .5168462   .0052057    99.29   0.000     .5066433    .5270492
        1 4  |    .526699   .0092168    57.15   0.000     .5086344    .5447636
        1 5  |   .6340465   .0138387    45.82   0.000     .6069232    .6611697
        2 1  |   .2121431   .0034296    61.86   0.000     .2054211    .2188651
        2 2  |   .2792459   .0033861    82.47   0.000     .2726092    .2858826
        2 3  |   .3042245   .0040212    75.65   0.000     .2963431     .312106
        2 4  |   .2226386   .0050478    44.11   0.000     .2127452     .232532
        2 5  |   .0633686   .0047963    13.21   0.000     .0539681    .0727692
        3 1  |   .0372614   .0014098    26.43   0.000     .0344983    .0400245
        3 2  |   .0737865   .0019981    36.93   0.000     .0698702    .0777027
        3 3  |   .1146585   .0029075    39.44   0.000     .1089599    .1203572
        3 4  |   .1351544   .0041403    32.64   0.000     .1270395    .1432693
        3 5  |    .138638   .0052133    26.59   0.000     .1284201    .1488559
        4 1  |   .0077257   .0005647    13.68   0.000     .0066189    .0088324
        4 2  |   .0278917   .0011614    24.01   0.000     .0256153     .030168
        4 3  |   .0642707    .002228    28.85   0.000     .0599038    .0686376
        4 4  |    .115508   .0045623    25.32   0.000     .1065661      .12445
        4 5  |   .1639469   .0085572    19.16   0.000      .147175    .1807188
------------------------------------------------------------------------------
We have estimated many expected probabilities. Visualizing the results with a margins plot is helpful.
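The command that draws this plot is not shown in the text. A minimal sketch, assuming the ziologit results are still active, is to rerun margins quietly and call marginsplot; the noci option (which suppresses the confidence intervals) is a choice made here to keep the four lines readable, not necessarily what was used for the original figure.

```stata
* Sketch: plot the predicted probability of each consumption level by income
* (assumes the ziologit model above is the active estimation result)
quietly margins, at(income=(0(5)20))

* One line per outcome level; noci drops the confidence-interval bars
marginsplot, noci
```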

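The probability of smoking zero cigarettes decreases as annual income rises up to $100,000 and then increases again. The probability of smoking 1-7 cigarettes per day is highest at an annual income of $100,000 and lowest at $200,000.
Now we examine the relationship between income and susceptibility to smoking. We add the predict(ps) option to margins to request estimates of the predicted probability of susceptibility.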
quietly margins, predict(ps) at(income=(0(5)20))
marginsplot
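The result is:
[Figure: margins plot of the predicted probability of susceptibility to smoking at each income level]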
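When income is zero, four out of five respondents are susceptible to smoking. The probability of being susceptible decreases as income rises; at an annual income of $200,000, only slightly more than one-third of respondents are susceptible. This is consistent with interpreting income as a proxy for health awareness.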
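The same susceptibility probability can also be stored for each observation rather than averaged by margins. The sketch below assumes that the ps statistic passed to margins through predict() is available directly in predict after ziologit; the variable name p_susc is a hypothetical name introduced here.

```stata
* Sketch: per-observation predicted probability of susceptibility
* (assumes the ziologit model above is the active estimation result;
*  p_susc is a hypothetical variable name)
predict double p_susc, ps

* Overall distribution of the predicted susceptibility
summarize p_susc

* Mean predicted susceptibility by whether the parents smoked
tabstat p_susc, by(parent) statistics(mean)
```

Unlike the margins results, these predictions use each respondent's observed income rather than fixing income at a common value.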
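Next, we use margins to focus on the respondents who are susceptible to smoking. By specifying the pcond1 statistic for each outcome level, we compute the probability of each level of tobacco conditional on susceptibility. As before, the computation is done at five income levels and displayed in a margins plot.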
quietly margins, predict(pcond1 outcome(0)) predict(pcond1 outcome(1)) predict(pcond1 outcome(2)) predict(pcond1 outcome(3)) at(income=(0(5)20))
marginsplot
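The result is:
[Figure: margins plot of the probability of each tobacco level, conditional on susceptibility, at each income level]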
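When annual income is zero, more than half of those susceptible to smoking report not smoking, and those who do smoke are most likely to smoke only a few cigarettes per day. As income rises, the probability of zero consumption falls. The higher the annual income, the greater the probability of being a heavy smoker. This suggests that, among smokers, cigarettes behave as what economists call a normal good, one for which demand rises as income rises.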
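The three kinds of probability used in this example (the marginal probabilities, the susceptibility probability, and the probabilities conditional on susceptibility) are tied together by a simple identity for each individual. The block below is a sketch of that relationship, assuming pcond1 is the probability conditional on susceptibility as described in the text; the symbols s and h are notation introduced here.

```latex
% Assumed notation: s = 1 denotes a susceptible individual,
% h indexes the consumption level (0 to 3).
\begin{align*}
\Pr(y = 0) &= \bigl[1 - \Pr(s = 1)\bigr] + \Pr(s = 1)\,\Pr(y = 0 \mid s = 1) \\
\Pr(y = h) &= \Pr(s = 1)\,\Pr(y = h \mid s = 1), \qquad h = 1, 2, 3
\end{align*}
```

These identities hold observation by observation. The values reported by margins are averages over the sample, so the averaged susceptibility and conditional probabilities need not multiply exactly to the averaged marginal probabilities.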
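This example shows that income affects cigarette consumption in more than one way. The ziologit command makes it possible to model both susceptibility to smoking and smoking intensity, giving a better understanding of the factors that influence smoking behavior.
Full code and output for this article: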
. use https://www.stata-press.com/data/r17/tobacco
(Fictional tobacco consumption data)
. ed
. desc
Contains data from https://www.stata-press.com/data/r17/tobacco.dta
Observations: 15,000 Fictional tobacco consumption data
Variables: 7 20 Apr 2020 14:47
-------------------------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------------------------------------------------------------
tobacco byte %27.0g tobaclbl Tobacco usage
education byte %10.0g Amount of formal schooling (in years)
income double %10.0g Annual income (in $10,000s)
parent byte %17.0g parlbl Whether parents smoked
female byte %10.0g femlbl Female
age double %10.0g Age (in decades)
religion byte %19.0g religlbl Religion prohibits smoking
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by:
. codebook tobacco
-------------------------------------------------------------------------------------------------------------------------------------
tobacco Tobacco usage
-------------------------------------------------------------------------------------------------------------------------------------
Type: Numeric (byte)
Label: tobaclbl
Range: [0,3] Units: 1
Unique values: 4 Missing .: 0/15,000
Tabulation: Freq. Numeric Label
9,469 0 0 cigarettes
3,806 1 1–7 cigarettes/day
1,050 2 8–12 cigarettes/day
675 3 >12 cigarettes/day
. ziologit tobacco education income i.female, inflate(income education i.parent)
Iteration 0: log likelihood = -15977.364 (not concave)
Iteration 1: log likelihood = -13149.83 (not concave)
Iteration 2: log likelihood = -12467.245
Iteration 3: log likelihood = -11039.218
Iteration 4: log likelihood = -9929.2298
Iteration 5: log likelihood = -9715.1143
Iteration 6: log likelihood = -9703.2464
Iteration 7: log likelihood = -9703.2168
Iteration 8: log likelihood = -9703.2168
Zero-inflated ordered logit regression Number of obs = 15,000
Wald chi2(3) = 3147.70
Log likelihood = -9703.2168 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
tobacco | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco |
education | .5090816 .0094838 53.68 0.000 .4904938 .5276695
income | .583636 .0114401 51.02 0.000 .5612139 .6060581
|
female |
Female | -.5307721 .0580736 -9.14 0.000 -.6445943 -.4169499
-------------+----------------------------------------------------------------
inflate |
income | -.1279677 .00705 -18.15 0.000 -.1417856 -.1141499
education | -.1412459 .0049693 -28.42 0.000 -.1509855 -.1315062
|
parent |
Smoking | 1.187864 .0529432 22.44 0.000 1.084097 1.29163
_cons | 2.617219 .1156891 22.62 0.000 2.390473 2.843966
-------------+----------------------------------------------------------------
/cut1 | 5.85957 .104449 5.654853 6.064286
/cut2 | 11.14187 .1945483 10.76056 11.52318
/cut3 | 14.3632 .2495117 13.87417 14.85224
------------------------------------------------------------------------------
. ziologit, or
Zero-inflated ordered logit regression Number of obs = 15,000
Wald chi2(3) = 3147.70
Log likelihood = -9703.2168 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
tobacco | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco |
education | 1.663763 .0157788 53.68 0.000 1.633122 1.694978
income | 1.792544 .0205068 51.02 0.000 1.752799 1.833191
|
female |
Female | .5881507 .034156 -9.14 0.000 .5248755 .659054
-------------+----------------------------------------------------------------
inflate |
income | .8798818 .0062032 -18.15 0.000 .8678073 .8921242
education | .8682758 .0043147 -28.42 0.000 .8598602 .8767738
|
parent |
Smoking | 3.280066 .1736572 22.44 0.000 2.956768 3.638714
_cons | 13.69758 1.584661 22.62 0.000 10.91866 17.18378
-------------+----------------------------------------------------------------
/cut1 | 5.85957 .104449 5.654853 6.064286
/cut2 | 11.14187 .1945483 10.76056 11.52318
/cut3 | 14.3632 .2495117 13.87417 14.85224
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 2 equations to odds ratios.
Note: _cons estimates baseline odds.
. margins, at(income=(0(5)20))
Predictive margins Number of obs = 15,000
Model VCE: OIM
1._predict: Pr(tobacco=0), predict(pmargin outcome(0))
2._predict: Pr(tobacco=1), predict(pmargin outcome(1))
3._predict: Pr(tobacco=2), predict(pmargin outcome(2))
4._predict: Pr(tobacco=3), predict(pmargin outcome(3))
1._at: income = 0
2._at: income = 5
3._at: income = 10
4._at: income = 15
5._at: income = 20
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_predict#_at |
1 1 | .7428698 .0044443 167.15 0.000 .7341591 .7515805
1 2 | .6190759 .0038733 159.83 0.000 .6114843 .6266675
1 3 | .5168462 .0052057 99.29 0.000 .5066433 .5270492
1 4 | .526699 .0092168 57.15 0.000 .5086344 .5447636
1 5 | .6340465 .0138387 45.82 0.000 .6069232 .6611697
2 1 | .2121431 .0034296 61.86 0.000 .2054211 .2188651
2 2 | .2792459 .0033861 82.47 0.000 .2726092 .2858826
2 3 | .3042245 .0040212 75.65 0.000 .2963431 .312106
2 4 | .2226386 .0050478 44.11 0.000 .2127452 .232532
2 5 | .0633686 .0047963 13.21 0.000 .0539681 .0727692
3 1 | .0372614 .0014098 26.43 0.000 .0344983 .0400245
3 2 | .0737865 .0019981 36.93 0.000 .0698702 .0777027
3 3 | .1146585 .0029075 39.44 0.000 .1089599 .1203572
3 4 | .1351544 .0041403 32.64 0.000 .1270395 .1432693
3 5 | .138638 .0052133 26.59 0.000 .1284201 .1488559
4 1 | .0077257 .0005647 13.68 0.000 .0066189 .0088324
4 2 | .0278917 .0011614 24.01 0.000 .0256153 .030168
4 3 | .0642707 .002228 28.85 0.000 .0599038 .0686376
4 4 | .115508 .0045623 25.32 0.000 .1065661 .12445
4 5 | .1639469 .0085572 19.16 0.000 .147175 .1807188
------------------------------------------------------------------------------
. quietly margins, predict(ps) at(income=(0(5)20))
. marginsplot