Stata 17: The Zero-Inflated Ordered Logit Model

Introduction

Stata's new ziologit command fits zero-inflated ordered logit models.

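Ordered logit regression is used to model ordered categorical responses, such as symptom severity recorded as none, mild, moderate, or severe. For such ordinal outcomes, larger values indicate higher levels, but the numeric values themselves carry no meaning.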

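In some settings, the data contain more zeros (or more values in the lowest category) than a traditional ordered logit model would predict. A zero may represent the absence of a trait, while the remaining values represent increasing levels of that trait. Many zeros may then be observed, some because individuals do not have the trait and some because individuals have the trait but exhibit its lowest level. For example:

In a study of alcohol consumption, some people report no drinking because they never drink, while others report no drinking because they happened not to drink during the survey period.

In a clinical trial of a treatment intended to shrink tumors, the outcome is recorded as no improvement, partial remission, or complete remission. A patient may show no improvement because the tumor is resistant to the treatment, or because the tumor is treatable but had not shrunk by the time of measurement. The distinction matters because treatable tumors are good candidates for a higher dose.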

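In settings like these, you can use a zero-inflated ordered logit (ZIOL) model. The ZIOL model assumes that outcomes in the lowest category arise from two processes, a logit model and an ordered logit model, and it allows a different set of predictors for each.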
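One common way to write such a model makes the two sources of zeros explicit. The notation below is an assumption introduced for illustration and does not come from the article: $s = 1$ marks an individual who has the trait, $\mathbf{z}$ and $\mathbf{x}$ are the predictors of the logit and ordered logit parts with coefficients $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$, $\kappa_1 < \dots < \kappa_H$ are the cutpoints for outcome levels $0, 1, \dots, H$, and $\Lambda(t) = e^{t}/(1 + e^{t})$.

```latex
\begin{align*}
\Pr(s = 1 \mid \mathbf{z}) &= \Lambda(\mathbf{z}\boldsymbol{\gamma}) \\
\Pr(y = 0 \mid \mathbf{x}, \mathbf{z}) &= \bigl[1 - \Lambda(\mathbf{z}\boldsymbol{\gamma})\bigr]
  + \Lambda(\mathbf{z}\boldsymbol{\gamma})\,\Lambda(\kappa_1 - \mathbf{x}\boldsymbol{\beta}) \\
\Pr(y = h \mid \mathbf{x}, \mathbf{z}) &= \Lambda(\mathbf{z}\boldsymbol{\gamma})
  \bigl[\Lambda(\kappa_{h+1} - \mathbf{x}\boldsymbol{\beta}) - \Lambda(\kappa_h - \mathbf{x}\boldsymbol{\beta})\bigr],
  \quad h = 1, \dots, H, \quad \kappa_{H+1} = +\infty
\end{align*}
```

The first term of $\Pr(y = 0)$ is the excess of zeros from individuals without the trait; the second term is the zeros from individuals who have the trait but show its lowest level.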

Let's see it work

We use fictional data on daily cigarette consumption. The codebook command shows the four levels of cigarette consumption.

use https://www.stata-press.com/data/r17/tobacco
codebook tobacco

The output is:

. use https://www.stata-press.com/data/r17/tobacco
(Fictional tobacco consumption data)

. ed

. desc

Contains data from https://www.stata-press.com/data/r17/tobacco.dta
 Observations:        15,000                  Fictional tobacco consumption data
    Variables:             7                  20 Apr 2020 14:47
-------------------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------------------
tobacco         byte    %27.0g     tobaclbl   Tobacco usage
education       byte    %10.0g                Amount of formal schooling (in years)
income          double  %10.0g                Annual income (in $10,000s)
parent          byte    %17.0g     parlbl     Whether parents smoked
female          byte    %10.0g     femlbl     Female
age             double  %10.0g                Age (in decades)
religion        byte    %19.0g     religlbl   Religion prohibits smoking
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by:

. codebook tobacco

-------------------------------------------------------------------------------------------------------------------------------------
tobacco                                                                                                                 Tobacco usage
-------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: tobaclbl

                 Range: [0,3]                         Units: 1
         Unique values: 4                         Missing .: 0/15,000

            Tabulation: Freq.   Numeric  Label
                        9,469         0  0 cigarettes
                        3,806         1  1–7 cigarettes/day
                        1,050         2  8–12 cigarettes/day
                          675         3  >12 cigarettes/day

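More than half of the respondents (9,469 of 15,000) report smoking no cigarettes. A zero can arise for two reasons: the respondent is a permanent nonsmoker, or the respondent is susceptible to smoking but did not smoke during the data-collection period. A traditional ordered logit model cannot distinguish these two sources of zero consumption. The ZIOL model lets us model the probability of being susceptible to smoking in addition to the level of consumption.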
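As a rough illustration of the contrast, the sketch below fits a conventional ordered logit and the ZIOL specification used later in this article, then lists their information criteria side by side. The comparison step and the stored-estimate names ol and ziol are assumptions added here for illustration, not part of the original example, and the criteria are only an informal guide.

```stata
use https://www.stata-press.com/data/r17/tobacco, clear

* Conventional ordered logit: a single equation, no separate susceptibility process
ologit tobacco education income i.female
estimates store ol

* ZIOL model: ordered logit for consumption plus a logit for susceptibility
ziologit tobacco education income i.female, inflate(income education i.parent)
estimates store ziol

* Log likelihood, AIC, and BIC of the two fits in one table
estimates stats ol ziol
```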

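Model estimation

We use ziologit to fit the ZIOL model. We model the level of cigarette consumption as a function of years of education (education), income in $10,000s (income), and sex (female). We specify the inflate() option to model the probability of being a smoker as a function of the respondent's education, income, and whether the respondent's parents smoked (parent).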

ziologit tobacco education income i.female, inflate(income education i.parent)

The output is:

. ziologit tobacco education income i.female, inflate(income education i.parent)

Iteration 0:   log likelihood = -15977.364  (not concave)
Iteration 1:   log likelihood =  -13149.83  (not concave)
Iteration 2:   log likelihood = -12467.245
Iteration 3:   log likelihood = -11039.218
Iteration 4:   log likelihood = -9929.2298
Iteration 5:   log likelihood = -9715.1143
Iteration 6:   log likelihood = -9703.2464
Iteration 7:   log likelihood = -9703.2168
Iteration 8:   log likelihood = -9703.2168

Zero-inflated ordered logit regression                 Number of obs =  15,000
                                                       Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   .5090816   .0094838    53.68   0.000     .4904938    .5276695
      income |    .583636   .0114401    51.02   0.000     .5612139    .6060581
             |
      female |
     Female  |  -.5307721   .0580736    -9.14   0.000    -.6445943   -.4169499
-------------+----------------------------------------------------------------
inflate      |
      income |  -.1279677     .00705   -18.15   0.000    -.1417856   -.1141499
   education |  -.1412459   .0049693   -28.42   0.000    -.1509855   -.1315062
             |
      parent |
    Smoking  |   1.187864   .0529432    22.44   0.000     1.084097     1.29163
       _cons |   2.617219   .1156891    22.62   0.000     2.390473    2.843966
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------

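The first section of the table, labeled tobacco, reports the results of the ordered logit model for cigarette consumption. The second section, labeled inflate, reports the results of the logit model for the probability of being a smoker.

To make the first two sections easier to interpret, we ask ziologit to display odds ratios instead of coefficients.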

ziologit, or

The output is:

. ziologit, or

Zero-inflated ordered logit regression                 Number of obs =  15,000
                                                       Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   1.663763   .0157788    53.68   0.000     1.633122    1.694978
      income |   1.792544   .0205068    51.02   0.000     1.752799    1.833191
             |
      female |
     Female  |   .5881507    .034156    -9.14   0.000     .5248755     .659054
-------------+----------------------------------------------------------------
inflate      |
      income |   .8798818   .0062032   -18.15   0.000     .8678073    .8921242
   education |   .8682758   .0043147   -28.42   0.000     .8598602    .8767738
             |
      parent |
    Smoking  |   3.280066   .1736572    22.44   0.000     2.956768    3.638714
       _cons |   13.69758   1.584661    22.62   0.000     10.91866    17.18378
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 2 equations to odds ratios.
Note: _cons estimates baseline odds.

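A $10,000 increase in annual income multiplies the odds of being susceptible to smoking by 0.88 (a 12% decrease), but among those who are susceptible it multiplies the odds of higher consumption by 1.79 (a 79% increase). This suggests that wealthier people are less likely to smoke, but those who do smoke tend to smoke more.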
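The arithmetic behind these figures can be checked by hand, because each odds ratio is the exponential of the corresponding coefficient. The short sketch below types the two income coefficients from the first table as literals; the results should agree with the reported odds ratios up to rounding.

```stata
* Odds ratio = exp(coefficient); coefficients copied from the ziologit output
* Income in the inflate (susceptibility) equation
display exp(-.1279677)
* Income in the tobacco (consumption) equation
display exp(.583636)

* Percentage change in the odds for a one-unit ($10,000) increase in income
display 100*(exp(-.1279677) - 1)
display 100*(exp(.583636) - 1)
```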

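But what do these results mean in terms of the probability of exhibiting different smoking behaviors? Suppose we want to know how cigarette consumption relates to income. For this we use the margins command. We estimate the expected probability of each level of cigarette consumption for people with annual incomes of $0, $50,000, $100,000, $150,000, and $200,000. Because income is measured in $10,000s, these correspond to income values of 0, 5, 10, 15, and 20.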

margins, at(income=(0(5)20))

The output is:

. margins, at(income=(0(5)20))

Predictive margins                                      Number of obs = 15,000
Model VCE: OIM

1._predict: Pr(tobacco=0), predict(pmargin outcome(0))
2._predict: Pr(tobacco=1), predict(pmargin outcome(1))
3._predict: Pr(tobacco=2), predict(pmargin outcome(2))
4._predict: Pr(tobacco=3), predict(pmargin outcome(3))

1._at: income =  0
2._at: income =  5
3._at: income = 10
4._at: income = 15
5._at: income = 20

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_predict#_at |
        1 1  |   .7428698   .0044443   167.15   0.000     .7341591    .7515805
        1 2  |   .6190759   .0038733   159.83   0.000     .6114843    .6266675
        1 3  |   .5168462   .0052057    99.29   0.000     .5066433    .5270492
        1 4  |    .526699   .0092168    57.15   0.000     .5086344    .5447636
        1 5  |   .6340465   .0138387    45.82   0.000     .6069232    .6611697
        2 1  |   .2121431   .0034296    61.86   0.000     .2054211    .2188651
        2 2  |   .2792459   .0033861    82.47   0.000     .2726092    .2858826
        2 3  |   .3042245   .0040212    75.65   0.000     .2963431     .312106
        2 4  |   .2226386   .0050478    44.11   0.000     .2127452     .232532
        2 5  |   .0633686   .0047963    13.21   0.000     .0539681    .0727692
        3 1  |   .0372614   .0014098    26.43   0.000     .0344983    .0400245
        3 2  |   .0737865   .0019981    36.93   0.000     .0698702    .0777027
        3 3  |   .1146585   .0029075    39.44   0.000     .1089599    .1203572
        3 4  |   .1351544   .0041403    32.64   0.000     .1270395    .1432693
        3 5  |    .138638   .0052133    26.59   0.000     .1284201    .1488559
        4 1  |   .0077257   .0005647    13.68   0.000     .0066189    .0088324
        4 2  |   .0278917   .0011614    24.01   0.000     .0256153     .030168
        4 3  |   .0642707    .002228    28.85   0.000     .0599038    .0686376
        4 4  |    .115508   .0045623    25.32   0.000     .1065661      .12445
        4 5  |   .1639469   .0085572    19.16   0.000      .147175    .1807188
------------------------------------------------------------------------------

We have estimated many expected probabilities. It helps to visualize the results with marginsplot.
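A minimal sketch of that step follows. It repeats the estimation quietly so that it runs on its own; the article does not show the exact plotting command or options at this point, so the bare marginsplot call is an assumption.

```stata
use https://www.stata-press.com/data/r17/tobacco, clear
quietly ziologit tobacco education income i.female, inflate(income education i.parent)

* Predicted probability of each consumption level at the five income values
quietly margins, at(income=(0(5)20))

* Plot the margins just computed, with their confidence intervals
marginsplot
```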

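The probability of smoking 0 cigarettes falls as annual income rises up to $100,000 and then increases again. The probability of smoking 1–7 cigarettes per day is highest at an annual income of $100,000 and lowest at $200,000.

Now we examine the relationship between income and susceptibility to smoking. We add the predict(ps) option to margins to request estimates of the predicted probability of susceptibility.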

quietly margins, predict(ps) at(income=(0(5)20))

marginsplot

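The result is:

At zero income, four out of five respondents are susceptible to smoking. The probability of being a smoker decreases as income increases, and at an annual income of $200,000 only slightly more than one-third of respondents are susceptible. This supports the interpretation that income can act as a proxy for health awareness.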
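The same statistic can also be obtained for each observation rather than averaged at fixed income values. The sketch below assumes that predict accepts the ps statistic used in the margins call above; the variable name p_susc is hypothetical and chosen only for illustration.

```stata
use https://www.stata-press.com/data/r17/tobacco, clear
quietly ziologit tobacco education income i.female, inflate(income education i.parent)

* Observation-level predicted probability of being susceptible to smoking
predict double p_susc, ps

* Distribution of the predicted susceptibility across the sample
summarize p_susc, detail
```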

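Next we use margins to focus on subjects who are susceptible to smoking. By specifying the statistic pcond1 for each outcome level, we compute the probability of each level of tobacco conditional on susceptibility. As before, the calculation is carried out at five income levels and displayed with marginsplot.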
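Written out, conditioning on susceptibility removes the logit part and leaves an ordinary ordered logit probability. The notation is an assumption introduced for illustration: $s = 1$ denotes a susceptible individual, $\mathbf{x}\boldsymbol{\beta}$ is the linear predictor of the tobacco equation, $\kappa_1 < \kappa_2 < \kappa_3$ are the estimated cutpoints (with $\kappa_0 = -\infty$ and $\kappa_4 = +\infty$), and $\Lambda(t) = e^{t}/(1 + e^{t})$.

```latex
\Pr(y = h \mid s = 1, \mathbf{x})
  = \Lambda(\kappa_{h+1} - \mathbf{x}\boldsymbol{\beta}) - \Lambda(\kappa_h - \mathbf{x}\boldsymbol{\beta}),
  \qquad h = 0, 1, 2, 3
```

Under this form, only the predictors in the tobacco equation (education, income, and female) enter the conditional probabilities; parent affects them only through susceptibility.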

quietly margins, predict(pcond1 outcome(0)) predict(pcond1 outcome(1)) predict(pcond1 outcome(2)) predict(pcond1 outcome(3)) at(income=(0(5)20))

marginsplot

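The result is:

At zero annual income, more than half of those susceptible to smoking report smoking no cigarettes, and those who do smoke are most likely to smoke only a few cigarettes per day. As income increases, the probability of zero consumption falls. The higher the annual income, the greater the probability of being a heavy smoker. This suggests that, among smokers, cigarettes behave like what economists call a normal good: something for which demand rises as income rises.

As this example shows, income affects cigarette consumption in more than one way. The ziologit command makes it possible to model both susceptibility to smoking and smoking intensity, giving a better understanding of the factors that influence smoking behavior.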


Code summary:


. use https://www.stata-press.com/data/r17/tobacco
(Fictional tobacco consumption data)

. ed

. desc

Contains data from https://www.stata-press.com/data/r17/tobacco.dta
 Observations:        15,000                  Fictional tobacco consumption data
    Variables:             7                  20 Apr 2020 14:47
-------------------------------------------------------------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------------------------------------------------------------
tobacco         byte    %27.0g     tobaclbl   Tobacco usage
education       byte    %10.0g                Amount of formal schooling (in years)
income          double  %10.0g                Annual income (in $10,000s)
parent          byte    %17.0g     parlbl     Whether parents smoked
female          byte    %10.0g     femlbl     Female
age             double  %10.0g                Age (in decades)
religion        byte    %19.0g     religlbl   Religion prohibits smoking
-------------------------------------------------------------------------------------------------------------------------------------
Sorted by:

. codebook tobacco

-------------------------------------------------------------------------------------------------------------------------------------
tobacco                                                                                                                 Tobacco usage
-------------------------------------------------------------------------------------------------------------------------------------

                  Type: Numeric (byte)
                 Label: tobaclbl

                 Range: [0,3]                         Units: 1
         Unique values: 4                         Missing .: 0/15,000

            Tabulation: Freq.   Numeric  Label
                        9,469         0  0 cigarettes
                        3,806         1  1–7 cigarettes/day
                        1,050         2  8–12 cigarettes/day
                          675         3  >12 cigarettes/day

. ziologit tobacco education income i.female, inflate(income education i.parent)

Iteration 0:   log likelihood = -15977.364  (not concave)
Iteration 1:   log likelihood =  -13149.83  (not concave)
Iteration 2:   log likelihood = -12467.245  
Iteration 3:   log likelihood = -11039.218  
Iteration 4:   log likelihood = -9929.2298  
Iteration 5:   log likelihood = -9715.1143  
Iteration 6:   log likelihood = -9703.2464  
Iteration 7:   log likelihood = -9703.2168  
Iteration 8:   log likelihood = -9703.2168

Zero-inflated ordered logit regression                 Number of obs =  15,000
                                                       Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   .5090816   .0094838    53.68   0.000     .4904938    .5276695
      income |    .583636   .0114401    51.02   0.000     .5612139    .6060581
             |
      female |
     Female  |  -.5307721   .0580736    -9.14   0.000    -.6445943   -.4169499
-------------+----------------------------------------------------------------
inflate      |
      income |  -.1279677     .00705   -18.15   0.000    -.1417856   -.1141499
   education |  -.1412459   .0049693   -28.42   0.000    -.1509855   -.1315062
             |
      parent |
    Smoking  |   1.187864   .0529432    22.44   0.000     1.084097     1.29163
       _cons |   2.617219   .1156891    22.62   0.000     2.390473    2.843966
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------

. ziologit, or

Zero-inflated ordered logit regression                 Number of obs =  15,000
                                                       Wald chi2(3)  = 3147.70
Log likelihood = -9703.2168                            Prob > chi2   =  0.0000

------------------------------------------------------------------------------
     tobacco | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
tobacco      |
   education |   1.663763   .0157788    53.68   0.000     1.633122    1.694978
      income |   1.792544   .0205068    51.02   0.000     1.752799    1.833191
             |
      female |
     Female  |   .5881507    .034156    -9.14   0.000     .5248755     .659054
-------------+----------------------------------------------------------------
inflate      |
      income |   .8798818   .0062032   -18.15   0.000     .8678073    .8921242
   education |   .8682758   .0043147   -28.42   0.000     .8598602    .8767738
             |
      parent |
    Smoking  |   3.280066   .1736572    22.44   0.000     2.956768    3.638714
       _cons |   13.69758   1.584661    22.62   0.000     10.91866    17.18378
-------------+----------------------------------------------------------------
       /cut1 |    5.85957    .104449                      5.654853    6.064286
       /cut2 |   11.14187   .1945483                      10.76056    11.52318
       /cut3 |    14.3632   .2495117                      13.87417    14.85224
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first 2 equations to odds ratios.
Note: _cons estimates baseline odds.

. margins, at(income=(0(5)20))

Predictive margins                                      Number of obs = 15,000
Model VCE: OIM

1._predict: Pr(tobacco=0), predict(pmargin outcome(0))
2._predict: Pr(tobacco=1), predict(pmargin outcome(1))
3._predict: Pr(tobacco=2), predict(pmargin outcome(2))
4._predict: Pr(tobacco=3), predict(pmargin outcome(3))

1._at: income =  0
2._at: income =  5
3._at: income = 10
4._at: income = 15
5._at: income = 20

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
_predict#_at |
        1 1  |   .7428698   .0044443   167.15   0.000     .7341591    .7515805
        1 2  |   .6190759   .0038733   159.83   0.000     .6114843    .6266675
        1 3  |   .5168462   .0052057    99.29   0.000     .5066433    .5270492
        1 4  |    .526699   .0092168    57.15   0.000     .5086344    .5447636
        1 5  |   .6340465   .0138387    45.82   0.000     .6069232    .6611697
        2 1  |   .2121431   .0034296    61.86   0.000     .2054211    .2188651
        2 2  |   .2792459   .0033861    82.47   0.000     .2726092    .2858826
        2 3  |   .3042245   .0040212    75.65   0.000     .2963431     .312106
        2 4  |   .2226386   .0050478    44.11   0.000     .2127452     .232532
        2 5  |   .0633686   .0047963    13.21   0.000     .0539681    .0727692
        3 1  |   .0372614   .0014098    26.43   0.000     .0344983    .0400245
        3 2  |   .0737865   .0019981    36.93   0.000     .0698702    .0777027
        3 3  |   .1146585   .0029075    39.44   0.000     .1089599    .1203572
        3 4  |   .1351544   .0041403    32.64   0.000     .1270395    .1432693
        3 5  |    .138638   .0052133    26.59   0.000     .1284201    .1488559
        4 1  |   .0077257   .0005647    13.68   0.000     .0066189    .0088324
        4 2  |   .0278917   .0011614    24.01   0.000     .0256153     .030168
        4 3  |   .0642707    .002228    28.85   0.000     .0599038    .0686376
        4 4  |    .115508   .0045623    25.32   0.000     .1065661      .12445
        4 5  |   .1639469   .0085572    19.16   0.000      .147175    .1807188
------------------------------------------------------------------------------

. quietly margins, predict(ps) at(income=(0(5)20))

. marginsplot
