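In August 2019 I published the first translation anywhere online of the statistical guidelines of the well-known New England Journal of Medicine. Two years on, I have reviewed and revised that translation so that more readers can study and discuss it.
The New England Journal of Medicine (NEJM) ranks first among clinical research journals and is familiar to every researcher and student in medicine and related fields. It sets out clear requirements for statistical analysis and reporting in both clinical trials and observational studies. Statistical standards for medical research papers in China currently lag far behind, so the NEJM guidelines are well worth careful study.
To give readers a complete picture of the NEJM statistical analysis guidelines, I (郑老师) have translated the latest version, drawing on my own experience in statistics. Anyone interested is welcome to read on. The text has three parts: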
I. General requirements
II. Requirements for clinical trials
III. Requirements for observational studies
I. General requirements:
The Methods section of all manuscripts should contain a brief description of sample size and power considerations for the study, as well as a brief description of the methods for primary and secondary analyses.
Translator's note: The Methods section of every manuscript should briefly describe the sample size and statistical power considerations, and the methods used for the primary and secondary analyses.

The Methods section of all manuscripts should include a description of how missing data have been handled. Unless missingness is rare, a complete case analysis is generally not acceptable as the primary analysis and should be replaced by methods that are appropriate, given the missingness mechanism. Multiple imputation or inverse probability case weights can be used when data are missing at random; model-based methods may be more appropriate when missingness may be informative. For the Journal’s general approach to the handling of missing data in clinical trials please see Ware et al (N Engl J Med 2012;367:1353–1354).
Translator's note: The Methods section of every manuscript should describe how missing data were handled. Unless missing values are rare, a complete-case analysis is generally not acceptable as the primary analysis; the method should instead be chosen according to the missingness mechanism. Multiple imputation or inverse probability weighting can be used when data are missing at random. When missingness may be informative (for example, data missing not at random), model-based methods may be more appropriate. For the Journal's approach to missing data, see Ware et al (N Engl J Med 2012;367:1353–1354).

Significance tests should be accompanied by confidence intervals for estimated effect sizes, measures of association, or other parameters of interest. The confidence intervals should be adjusted to match any adjustment made to significance levels in the corresponding test.
Translator's note: Significance tests should be accompanied by confidence intervals for effect sizes, measures of association, or other parameters of interest. When the significance level is adjusted (for example, for pairwise comparisons), the confidence level of the intervals should be adjusted to match rather than left at 95%.

Unless one-sided tests are required by study design, such as in noninferiority clinical trials, all reported P values should be two-sided. In general, P values larger than 0.01 should be reported to two decimal places, and those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P<0.001.
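As an illustration of the rounding rule just stated, here is a minimal Python sketch. The function name `format_p_value` and the handling of a value of exactly 0.01 (reported to two decimal places) are assumptions made for this example; the guideline itself specifies neither.

```python
def format_p_value(p: float) -> str:
    """Format a two-sided P value using the rounding rule quoted above.

    Assumption for this sketch: a value of exactly 0.01 is treated like
    the "larger than 0.01" case and shown with two decimal places.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.001:
        # Smaller than 0.001: report only the threshold.
        return "P<0.001"
    if p < 0.01:
        # Between 0.001 and 0.01: three decimal places.
        return f"P={p:.3f}"
    # 0.01 or larger: two decimal places.
    return f"P={p:.2f}"
```

For example, `format_p_value(0.0042)` gives `P=0.004`. The sketch does not cover the exceptions to this policy.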
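Notable exceptions to this policy include P values arising from tests associated with stopping rules in clinical trials or from genome-wide association studies.
Translator's note: Apart from designs that require one-sided tests (such as noninferiority trials), all reported P values should be two-sided. P values larger than 0.01 should generally be reported to two decimal places (personally I think three is also fine, since it keeps tables tidy), those between 0.01 and 0.001 to three decimal places, and those smaller than 0.001 as P<0.001. Exceptions are allowed in some settings, such as P values tied to stopping rules in clinical trials or P values from genome-wide association studies.

Results should be presented with no more precision than is of scientific value and is meaningful given the available sample size. For example, measures of association, such as odds ratios, should ordinarily be reported to two significant digits. Results derived from models should be limited to the appropriate number of significant digits.
Translator's note: Results should be reported with no more precision than is scientifically valuable and meaningful for the sample size. For example, measures of association such as odds ratios should ordinarily be reported to two significant digits (2.3 rather than 2.347). Results from models, such as regression coefficients, standard errors, test statistics, and confidence limits, should likewise be limited to an appropriate number of significant digits.

II. For clinical trials:
Original and final protocols and statistical analysis plans (SAPs) should be submitted along with the manuscript, as well as a table of amendments made to the protocol and SAP indicating the date of the change and its content.
Translator's note: The original and final protocols and the corresponding statistical analysis plans (SAPs) should be submitted with the manuscript, together with a table of the amendments made to the protocol and SAP during the study, giving the date and content of each change.

The analyses of the primary outcome in manuscripts reporting results of clinical trials should match the analyses prespecified in the original protocol, except in unusual circumstances. Analyses that do not conform to the protocol should be justified in the Methods section of the manuscript. The editors may ask for additional analyses that are not specified in the protocol.
Translator's note: Except in unusual circumstances, the primary outcome analyses of a clinical trial should follow the analyses prespecified in the original protocol and SAP. Any analysis that departs from the protocol should be justified in the Methods section. The editors may also ask for additional analyses that are not specified in the protocol.

When comparing outcomes in two or more groups in confirmatory analyses, investigators should use the testing procedures specified in the protocol and SAP to control overall type I error, for example, Bonferroni adjustments or prespecified hierarchical procedures. P values adjusted for multiplicity should be reported when appropriate and labeled as such in the manuscript.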
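The two procedures named here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are invented for the example, the interval uses a normal (Wald) approximation, and the hierarchical procedure is taken to be a simple fixed-sequence test in which each comparison is tested at the full significance level in its prespecified order. A real trial would follow whatever its protocol and SAP define.

```python
from statistics import NormalDist


def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for m comparisons.

    Returns the adjusted P values, the confidence level that matches the
    adjusted significance level, and the two-sided normal critical value.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one P value")
    adjusted = [min(1.0, p * m) for p in p_values]   # cap at 1
    per_test_alpha = alpha / m
    ci_level = 1.0 - per_test_alpha                  # e.g. 98.3% for m = 3
    z_crit = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)
    return adjusted, ci_level, z_crit


def wald_interval(estimate, std_error, z_crit):
    """Normal-approximation interval at the level implied by z_crit."""
    return estimate - z_crit * std_error, estimate + z_crit * std_error


def fixed_sequence(p_values_in_order, alpha=0.05):
    """Assumed hierarchical rule: test in the prespecified order and stop
    at the first nonsignificant comparison. Returns the P values for the
    significant comparisons that precede that stopping point."""
    reportable = []
    for p in p_values_in_order:
        if p > alpha:
            break            # this and all later comparisons are not tested
        reportable.append(p)
    return reportable
```

With three comparisons, `bonferroni` returns a matching confidence level of about 98.3% instead of 95%, which is the kind of adjustment to interval width that the general requirements ask for when significance levels are adjusted.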
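In hierarchical testing procedures, P values should be reported only until the last comparison for which the P value was statistically significant. P values for the first nonsignificant comparison and for all comparisons thereafter should not be reported. For prespecified exploratory analyses, investigators should use methods for controlling false discovery rate described in the SAP, for example, Benjamini–Hochberg procedures.
Translator's note: In confirmatory analyses comparing two or more groups, investigators should use the procedures specified in the protocol and SAP to control the overall type I error, such as Bonferroni adjustments or prespecified hierarchical procedures (for example, sequential testing). P values adjusted for multiplicity should be reported and labeled as such. With a hierarchical procedure, P values are reported only up to the last statistically significant comparison; the P value for the first nonsignificant comparison and for every comparison after it is not reported. (What does this mean? In a confirmatory trial the comparisons are made in the order fixed by the design. With three groups, for example, group 1 might first be compared with group 2; if that is significant, group 1 is compared with group 3; if that is not significant, groups 2 and 3 are not compared at all. So P values are reported only up to the last significant comparison.) For prespecified exploratory analyses, the false discovery rate should be controlled with the method described in the SAP, such as the Benjamini–Hochberg procedure.

When no method to adjust for multiplicity of inferences or controlling false discovery rate was specified in the protocol or SAP of a clinical trial, the report of all secondary and exploratory endpoints should be limited to point estimates of treatment effects with 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
Translator's note: If the protocol or SAP of a clinical trial does not specify how multiplicity will be handled or how the false discovery rate will be controlled, all secondary and exploratory endpoints should be reported only as point estimates of the treatment effect with 95% confidence intervals. The Methods section should then state that the interval widths have not been adjusted for multiplicity and that the inferences drawn may not be reproducible, and no P values should be reported for these analyses. (This is an important change in the latest version of the NEJM guidelines: P values are no longer to be reported for analyses whose multiplicity handling was not prespecified.)

Please see Wang et al (N Engl J Med 2007;357:2189–2194) on recommended methods for analyzing subgroups. When the SAP prespecifies an analysis of certain subgroups, that analysis should conform to the method described in the SAP. If the study team believes a post hoc analysis of subgroups is important, the rationale for conducting that analysis should be stated.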
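Post hoc analyses should be clearly labeled as post hoc in the manuscript.
Translator's note: See Wang et al (N Engl J Med 2007;357:2189–2194) for recommended methods of subgroup analysis. When the SAP prespecifies a subgroup analysis, that analysis must follow the SAP. If the study team considers a post hoc subgroup analysis important, the rationale must be stated, and the manuscript must label such results clearly as post hoc.

Forest plots are often used to present results from an analysis of the consistency of a treatment effect across subgroups of factors of interest. Such plots can be a useful display of estimated treatment effects across subgroups, and the editors recommend that they be included for important subgroups. If subgroups are small, however, formal inferences about the homogeneity of treatment effects may not be feasible. A list of P values for treatment by subgroup interactions is subject to the problems of multiplicity and has limited value for inference. Therefore, in most cases, no P values for interaction should be provided in the forest plots.
Translator's note: Forest plots are often used to show how consistent a treatment effect is across subgroups. They display the estimated effect in each subgroup well, and the editors recommend including them for important subgroups. When subgroups are small, however, formal inference about the homogeneity of treatment effects may not be feasible. A list of P values for treatment-by-subgroup interactions suffers from multiplicity and is of limited inferential value, so in most cases interaction P values should not be shown in forest plots.

If significance tests of safety outcomes (when not primary outcomes) are reported along with the treatment-specific estimates, no adjustment for multiplicity is necessary. Because information contained in the safety endpoints may signal problems within specific organ classes, the editors believe that the type I error rates larger than 0.05 are acceptable. Editors may request that P values be reported for comparisons of the frequency of adverse events among treatment groups, regardless of whether such comparisons were prespecified in the SAP.
Translator's note: If significance tests for safety outcomes (when they are not primary outcomes) are reported alongside the treatment-specific estimates, no adjustment for multiplicity is needed. Because safety endpoints may signal problems in specific organ classes, the editors consider type I error rates above 0.05 acceptable; in other words, a somewhat higher false-positive rate is tolerated. The editors may request P values for comparisons of adverse event frequencies between treatment groups, whether or not those comparisons were prespecified in the SAP.

When possible, the editors prefer that absolute event counts or rates be reported before relative risks or hazard ratios. The goal is to provide the reader with both the actual event frequency and the relative frequency.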
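To make the distinction between absolute and relative measures concrete, the following Python sketch computes both from a two-by-two table. The function name and the counts in the usage note are hypothetical, chosen only for illustration. The odds ratio is included so that it can be compared with the relative risk when events are common.

```python
def two_by_two_summary(events_trt, n_trt, events_ctl, n_ctl):
    """Absolute and relative effect measures for two groups.

    Inputs are event counts and group sizes (hypothetical in this sketch).
    """
    if n_trt <= 0 or n_ctl <= 0:
        raise ValueError("group sizes must be positive")
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    if not (0.0 < risk_trt < 1.0 and 0.0 < risk_ctl < 1.0):
        raise ValueError("ratios are undefined when a risk is 0 or 1")
    odds_trt = risk_trt / (1.0 - risk_trt)
    odds_ctl = risk_ctl / (1.0 - risk_ctl)
    return {
        "risk_treatment": risk_trt,                # absolute rate, treatment
        "risk_control": risk_ctl,                  # absolute rate, control
        "risk_difference": risk_trt - risk_ctl,    # absolute effect
        "relative_risk": risk_trt / risk_ctl,      # relative effect
        "odds_ratio": odds_trt / odds_ctl,         # shown for comparison only
    }
```

With hypothetical counts of 60 events in 100 treated patients and 40 in 100 controls, the absolute rates are 60% and 40%, the relative risk is 1.5, and the odds ratio is 2.25. Reporting the absolute rates first lets the reader see how much of that difference is real frequency rather than scale.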
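Odds ratios should be avoided, as they may overestimate the relative risks in many settings and be misinterpreted.
Translator's note: Where possible, the editors prefer that absolute event counts or rates be reported before relative risks or hazard ratios, so that the reader sees both the actual event frequency and the relative frequency. Odds ratios should be avoided because they can overestimate the relative risk in many settings and be misinterpreted.

Authors should provide a flow diagram in CONSORT format. The editors also encourage authors to submit all the relevant information included in the CONSORT checklist. Although all of this information may not be published with the manuscript, it should be provided in either the manuscript or a supplementary appendix at the time of submission. The CONSORT statement, checklist, and flow diagram are available on the CONSORT website.
Translator's note: Authors should provide a flow diagram in CONSORT format, and the editors encourage them to submit all the information in the CONSORT checklist. Information that will not be published with the article should still be provided at submission, either in the manuscript or in a supplementary appendix. The CONSORT statement, checklist, and flow diagram are available on the CONSORT website.

III. For observational studies:
The validity of findings from observational studies depends on several important assumptions, including those relating to sample selection, measured and unmeasured confounding, and the adequacy of methods used to control for confounding. The Methods section of observational studies should describe how these and other relevant issues were managed in the design and analysis.
Translator's note: The validity of findings from observational studies rests on several important assumptions, including those about sample selection, measured and unmeasured confounding, and the adequacy of the methods used to control confounding. The Methods section must therefore describe how these and related issues were handled in the design and analysis.

If an observational study included a prespecified SAP with a description of hypotheses to be tested, a signed and dated version of that plan should be included with the manuscript submission. The Journal encourages authors to deposit SAPs for observational studies in one of the online repositories designed for this purpose.
Translator's note: If an observational study has a prespecified SAP describing the hypotheses to be tested, a signed and dated version of that plan should be submitted with the manuscript. The Journal encourages authors to deposit SAPs for observational studies in an online repository designed for the purpose.

When appropriate, observational studies should use prespecified accepted methods for controlling family-wise error rate or false discovery rate when multiple tests are conducted.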
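As one example of false discovery rate control, the Benjamini–Hochberg procedure named earlier in the clinical trial section can be written as a short function that returns adjusted P values. This is a minimal sketch of the standard step-up procedure; the function name is an assumption for this example, and the guideline still requires that the chosen method be prespecified.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    For the i-th smallest of m P values the raw adjustment is p * m / i;
    taking a running minimum from the largest rank downward keeps the
    adjusted values monotone in the original P values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0                                    # also caps values at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A comparison is then declared a discovery when its adjusted value is at or below the false discovery rate chosen in advance, for example 0.05.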
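In manuscripts reporting observational studies without a prespecified method for error control, summary statistics should be limited to point estimates and 95% confidence intervals. In such cases, the Methods section should note that the widths of the intervals have not been adjusted for multiplicity and that the inferences drawn may not be reproducible. No P values should be reported for these analyses.
Translator's note: When multiple tests are conducted in an observational study, prespecified methods should be used where appropriate to control the family-wise error rate or the false discovery rate. If no such method was prespecified, results should be limited to point estimates and 95% confidence intervals, and P values should not be reported, because conclusions based on P values without type I error correction often cannot be reproduced and are therefore unreliable.

If no prespecified analysis plan exists, the Methods section should provide an outline for the planned method of analysis, including
o Eligibility criteria for the selection of cases and method of sampling from the data, with a diagram as appropriate.
o A description of the association or causal effect to be estimated and the rationale for this choice.
o The prespecified method of analysis to draw inference about treatment or exposure effect or association.
Translator's note: If there is no prespecified analysis plan, the Methods section should outline the planned analysis, including:
o the eligibility criteria for study subjects and the method of sampling from the data, ideally with a flow diagram;
o the association or causal effect to be estimated and the rationale for that choice;
o the prespecified statistical method for drawing inference about the treatment or exposure effect or association.

Studies reporting the effect of a treatment or exposure should show the distribution of potential confounders and other variables, stratified by exposure or intervention group. When the analysis depends on the confounders being balanced by exposure group, differences between groups should be summarized with point estimates and 95% confidence intervals when appropriate.
Translator's note: Studies reporting the effect of a treatment or exposure should show the distribution of potential confounders and other variables by exposure or intervention group. When the analysis depends on the confounders being balanced across groups, the between-group differences should be reported as point estimates with 95% confidence intervals where appropriate.

Complex models and their diagnostics can often be best described in a supplementary appendix.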
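Authors are encouraged to conduct analysis that quantifies potential sensitivity to bias from unmeasured confounding; absent that, authors must provide a discussion of potential biases induced by unmeasured confounders.
Translator's note: Detailed results from complex models and their diagnostics can go in a supplementary appendix. Authors are encouraged to run sensitivity analyses that quantify potential bias from unmeasured confounding; failing that, they must discuss the potential biases from unmeasured confounders.

Authors are encouraged to retest findings in a similar but independent study or studies to assess the robustness of their findings.
Translator's note: Authors are encouraged to try to replicate their findings in a similar but independent study or studies to confirm the robustness of the results.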