Standardized rates in R / 开普饭

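1. Introduction

The standardized rate is a common measure in epidemiology. When comparison groups differ in their composition by age, sex, or similar variables, comparing crude rates directly is prone to bias, so the rates are usually standardized (standardization) before they are compared.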

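Core idea: apply one designated standard population structure to remove differences in population composition (age, sex, etc.) between regions, that is, compute the overall rate each region would have under the standard structure. The standard population should be representative, reasonably stable, and large, for example the national, provincial, or world population, and its reference period should match or be close to that of the data being standardized.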

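There are two main standardization methods: the direct method and the indirect method.
The direct method takes the structure of a standard population (for example the national population, the provincial population, or the pooled population of the groups) and recomputes the expected rate of each group under that structure, which gives the standardized rate. It requires each group's population structure and age-specific rates (prevalence, mortality, etc.), plus the structure of the standard population.
The indirect method applies standard age-specific rates (prevalence, mortality, incidence, etc.) to each group's own population structure to compute the expected rate, which gives the standardized rate. It requires each group's population structure and the age-specific rates of the standard population.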

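Key points:

	Direct method	Indirect method
Idea	Adjust the cause, i.e. the difference in population structure	Adjust the outcome, i.e. the death rate (or other rate)
Implementation	Use the standard population structure to adjust the death rate	Use the standard death rates to adjust the crude death rate
Calculation	Adjusted rate = sum over age groups of (standard population age proportion × observed age-specific death rate)	Adjusted rate = observed crude death rate × total standard death rate / total expected death rate

Notes:
Total standard death rate = sum over age groups of (standard population age proportion × age-specific standard death rate)
Total expected death rate = sum over age groups of (age-specific standard death rate × observed population age proportion)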
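The table and notes above can be restated in symbols. The notation is introduced here only for illustration and does not come from the original post: $w_i$ is the share of age group $i$ in the standard population, $r_i$ and $R_i$ are the age-specific rates in the study and standard populations, $n_i$ is the study population in age group $i$, and $O$ is the total number of observed cases.

$$
\begin{aligned}
\text{Direct:}\quad & \mathrm{DSR} = \sum_i w_i\, r_i \\
\text{Indirect:}\quad & E = \sum_i R_i\, n_i, \qquad \mathrm{SMR} = \frac{O}{E}, \qquad \mathrm{ISR} = \Big(\sum_i w_i\, R_i\Big)\cdot \mathrm{SMR}
\end{aligned}
$$

Dividing both $O$ and $E$ by the study population total $\sum_i n_i$ recovers the form used in the table: crude rate × total standard rate / total expected rate.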

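2. Direct method: computing the age-adjusted standardized rate

Suppose we have the number of HIV infections and the population size for each of 11 age groups in a region. We compute the age-standardized HIV rate for that region.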

library(tidyverse)
library(epitools)

df <- tibble(
  age_group = c("<1", "1-4", "5-14", "15-24", "25-34", "35-44",
                "45-54", "55-64", "65-74", "75-84", "85+"),
  case = c(141, 926, 1253, 1080, 1869, 4891,
           14956, 30888, 41725, 26501, 5928),
  pop = c(1784033, 7065148, 15658730, 10482916, 9939972, 10563872,
          9114202, 6850263, 4702482, 1874619, 330915),
  standard_pop = c(906897, 3794573, 10003544, 10629526, 9465330, 8249558,
                   7294330, 5022499, 2920220, 1019504, 142532)
)

DT::datatable(df)

2.1 Crude HIV infection rate (crude rates)

CrudeRate = case / pop; it can be computed with mutate.

# 1. Direct method: age-adjusted standardized rate ----
# 1.1 Crude HIV infection rate (crude rates) ----
# CrudeRate = case / pop, computed with mutate
df %>% mutate(CrudeRate = case / pop)
# A tibble: 11 x 5
#    age_group  case      pop standard_pop CrudeRate
#  1 <1          141  1784033       906897 0.0000790
#  2 1-4         926  7065148      3794573 0.000131
#  3 5-14       1253 15658730     10003544 0.0000800
#  4 15-24      1080 10482916     10629526 0.000103
#  5 25-34      1869  9939972      9465330 0.000188
#  6 35-44      4891 10563872      8249558 0.000463
#  7 45-54     14956  9114202      7294330 0.00164
#  8 55-64     30888  6850263      5022499 0.00451
#  9 65-74     41725  4702482      2920220 0.00887
# 10 75-84     26501  1874619      1019504 0.0141
# 11 85+        5928   330915       142532 0.0179

2.2 Age-standardized HIV rate (adjusting the rates)

First use standard_pop, the standard population, to compute each age group's share of the total. The standard population can follow a provincial standard or the WHO standard; its only role here is to supply the proportion of the total population in each age group.

(1) Compute the proportion of each age group

prop.table computes each age group's proportion; check that the proportions sum to 1.

(2) Compute the age-adjusted rates

Multiply each age group's crude rate (CrudeRate, not the raw case count) by that group's proportion. Each product is one age group's contribution, and the age-standardized HIV rate is the sum of these contributions.

# 1.2 Age-standardized HIV rate (adjusting the rates) ----
# Proportions come from standard_pop (e.g. a provincial or WHO standard)
a <- df %>%
  mutate(CrudeRate = case / pop,
         proportion = prop.table(standard_pop),
         Adjust_rates = CrudeRate * proportion)
a
# A tibble: 11 x 7
#    age_group  case      pop standard_pop CrudeRate proportion Adjust_rates
#  1 <1          141  1784033       906897 0.0000790    0.0153    0.00000121
#  2 1-4         926  7065148      3794573 0.000131     0.0638    0.00000837
#  3 5-14       1253 15658730     10003544 0.0000800    0.168     0.0000135
#  4 15-24      1080 10482916     10629526 0.000103     0.179     0.0000184
#  5 25-34      1869  9939972      9465330 0.000188     0.159     0.0000299
#  6 35-44      4891 10563872      8249558 0.000463     0.139     0.0000642
#  7 45-54     14956  9114202      7294330 0.00164      0.123     0.000201
#  8 55-64     30888  6850263      5022499 0.00451      0.0845    0.000381
#  9 65-74     41725  4702482      2920220 0.00887      0.0491    0.000436
# 10 75-84     26501  1874619      1019504 0.0141       0.0171    0.000242
# 11 85+        5928   330915       142532 0.0179       0.00240   0.0000429


# Crude rate per 100,000
100000 * (sum(a$case) / sum(a$pop))

# Age-adjusted rate per 100,000
100000 * sum(a$Adjust_rates)

This gives a crude rate of 166.0874 and an age-adjusted rate of 143.9176, both per 100,000.
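As a cross-check, the direct method is simply a weighted mean of the age-specific crude rates, with the standard population as weights. The sketch below uses base R only and copies the data from the example above; the variable names are chosen for illustration.

```r
# Data copied from the HIV example above
case <- c(141, 926, 1253, 1080, 1869, 4891, 14956, 30888, 41725, 26501, 5928)
pop <- c(1784033, 7065148, 15658730, 10482916, 9939972, 10563872,
         9114202, 6850263, 4702482, 1874619, 330915)
standard_pop <- c(906897, 3794573, 10003544, 10629526, 9465330, 8249558,
                  7294330, 5022499, 2920220, 1019504, 142532)

crude_rate <- case / pop  # age-specific crude rates

# Overall crude rate per 100,000 (total cases over total population)
crude_per_100k <- 1e5 * sum(case) / sum(pop)

# Direct standardization: weighted mean with standard-population weights
adj_per_100k <- 1e5 * weighted.mean(crude_rate, w = standard_pop)

# Same quantity written as in the text: sum of rate x proportion
adj_by_hand <- 1e5 * sum(crude_rate * prop.table(standard_pop))

print(round(c(crude = crude_per_100k, adjusted = adj_per_100k), 4))
```

Both forms should reproduce the 166.0874 and 143.9176 reported above, since weighted.mean normalizes the weights in the same way prop.table does.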

2.3 Confidence interval for the standardized rate

Using epitools:

# ageadjust.direct: Age standardization by direct method, with exact confidence intervals
# Original function written by TJ Aragon, based on Anderson, 1998.
# Function re-written and improved by MP Fay, based on Fay 1998.
# The age-group proportions of the standard population serve as the weights
asr <- ageadjust.direct(count = df$case, pop = df$pop, stdpop = df$standard_pop)
round(100000 * asr, 2)  # rate per 100,000 per year
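To see where such an interval can come from, here is a minimal base R sketch of a gamma-based interval for a directly standardized rate, in the spirit of Fay and Feuer. To my understanding this is close to what ageadjust.direct reports, but treat it as an illustration under that assumption, not as the package's code; direct_gamma_ci is a hypothetical helper name.

```r
# Hypothetical helper: gamma-based CI for a directly standardized rate
direct_gamma_ci <- function(count, pop, stdpop, conf.level = 0.95) {
  alpha <- 1 - conf.level
  w <- stdpop / sum(stdpop)        # standard-population weights
  dsr <- sum(w * count / pop)      # directly standardized rate
  v <- sum(w^2 * count / pop^2)    # variance, assuming Poisson counts
  wm <- max(w / pop)               # largest weight per person
  lci <- qgamma(alpha / 2, shape = dsr^2 / v, scale = v / dsr)
  uci <- qgamma(1 - alpha / 2,
                shape = (dsr + wm)^2 / (v + wm^2),
                scale = (v + wm^2) / (dsr + wm))
  c(adj.rate = dsr, lci = lci, uci = uci)
}

# Data copied from the HIV example above
case <- c(141, 926, 1253, 1080, 1869, 4891, 14956, 30888, 41725, 26501, 5928)
pop <- c(1784033, 7065148, 15658730, 10482916, 9939972, 10563872,
         9114202, 6850263, 4702482, 1874619, 330915)
standard_pop <- c(906897, 3794573, 10003544, 10629526, 9465330, 8249558,
                  7294330, 5022499, 2920220, 1019504, 142532)

ci <- direct_gamma_ci(case, pop, standard_pop)
print(round(1e5 * ci, 2))  # per 100,000
```

With counts this large the interval is narrow; for small counts the gamma interval is deliberately asymmetric around the adjusted rate.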

2.4 Example 2

# Example 2
## Data from Fleiss, 1981, p. 249
population <- c(230061, 329449, 114920, 39487, 14208, 3052,
                72202, 326701, 208667, 83228, 28466, 5375,
                15050, 175702, 207081, 117300, 45026, 8660,
                2293, 68800, 132424, 98301, 46075, 9834,
                327, 30666, 123419, 149919, 104088, 34392,
                319933, 931318, 786511, 488235, 237863, 61313)
population <- matrix(population, 6, 6, dimnames = list(
  c("Under 20", "20-24", "25-29", "30-34", "35-39", "40 and over"),
  c("1", "2", "3", "4", "5+", "Total")))
population
count <- c(107, 141, 60, 40, 39, 25,
           25, 150, 110, 84, 82, 39,
           3, 71, 114, 103, 108, 75,
           1, 26, 64, 89, 137, 96,
           0, 8, 63, 112, 262, 295,
           136, 396, 411, 428, 628, 530)
count <- matrix(count, 6, 6, dimnames = list(
  c("Under 20", "20-24", "25-29", "30-34", "35-39", "40 and over"),
  c("1", "2", "3", "4", "5+", "Total")))
count
#               1   2   3   4  5+ Total
# Under 20    107  25   3   1   0   136
# 20-24       141 150  71  26   8   396
# 25-29        60 110 114  64  63   411
# 30-34        40  84 103  89 112   428
# 35-39        39  82 108 137 262   628
# 40 and over  25  39  75  96 295   530

### Use average population as standard
standard <- apply(population[, -6], 1, mean)  # same as population[, 6] / 5
standard

### This recreates Table 1 of Fay and Feuer, 1997
re <- round(10^5 * t(sapply(1:5, function(x) {
  ageadjust.direct(count[, x], population[, x], stdpop = standard)
})), 1)
rownames(re) <- 1:5
re  # one row per column 1 to 5; columns crude.rate, adj.rate, lci, uci

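3. Indirect method

Here we add one more column, standard_case: the number of infections in each age group of the standard population (standard_pop). This is equivalent to having HIV case counts by age group for two regions. The standard population's age-specific rates are applied to the study region's pop to obtain the expected number of cases, from which the adjusted rate is computed.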

df$standard_case=c(45, 201, 320, 670, 1126, 3160, 9723, 17935, 22179, 13461, 2238)


## Indirect age standardization with ageadjust.indirect
asr <- ageadjust.indirect(count = df$case, pop = df$pop,
                          stdcount = df$standard_case, stdpop = df$standard_pop)
round(asr$sir, 2)  # standardized incidence ratio
#  observed       exp       sir       lci       uci
# 130158.00 109126.69      1.19      1.19      1.20
round(100000 * asr$rate, 1)  # rate per 100,000 per year
# crude.rate   adj.rate        lci        uci
#      166.1      142.6      141.8      143.3

## The result depends on which population is taken as the standard, so state it explicitly.
## Here the roles are swapped: df's own population serves as the standard.
asr <- ageadjust.indirect(count = df$standard_case, pop = df$standard_pop,
                          stdcount = df$case, stdpop = df$pop)
round(asr$sir, 2)  # standardized incidence ratio
round(100000 * asr$rate, 1)  # rate per 100,000 per year
# crude.rate   adj.rate        lci        uci
#      119.5      137.9      136.9      139.0
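The point estimates of the first call can be reproduced by hand, which makes the indirect method's logic explicit: expected cases come from applying the standard population's age-specific rates to the study population. This is a base R sketch with illustrative variable names; it covers only the point estimates, not the confidence limits.

```r
# Data copied from the example above
case <- c(141, 926, 1253, 1080, 1869, 4891, 14956, 30888, 41725, 26501, 5928)
pop <- c(1784033, 7065148, 15658730, 10482916, 9939972, 10563872,
         9114202, 6850263, 4702482, 1874619, 330915)
standard_case <- c(45, 201, 320, 670, 1126, 3160, 9723, 17935, 22179, 13461, 2238)
standard_pop <- c(906897, 3794573, 10003544, 10629526, 9465330, 8249558,
                  7294330, 5022499, 2920220, 1019504, 142532)

std_rate <- standard_case / standard_pop  # age-specific rates in the standard
expected <- sum(std_rate * pop)           # cases expected in the study population
observed <- sum(case)
sir <- observed / expected                # standardized incidence ratio

# Indirectly adjusted rate: crude rate of the standard population times the SIR
std_crude <- sum(standard_case) / sum(standard_pop)
adj_rate <- std_crude * sir

print(round(c(observed = observed, expected = expected, sir = sir), 2))
print(round(1e5 * c(crude = observed / sum(pop), adjusted = adj_rate), 1))
```

These values should agree with the observed, exp, sir, crude.rate, and adj.rate columns printed above.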

Further reading:
(1) Standardized rates: https://blog.csdn.net/xiaohukun/article/details/76603387
(2) Computing age-adjusted rates in R: https://www.jianshu.com/p/08faaa41fab6
(3) Age Adjusted Rates - Steps for Calculating: https://rpubs.com/bpoulin-CUNY/321735
