使用R包genefu来根据基因集进行表达谱分类

学习使用genefu这个包,首先需要安装它!source("http://bioconductor.org/biocLite.R")options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")biocLite("genefu")教程略微有点复杂:https://rdrr.io/bioc/genefu/f/inst/doc/genefu.pdf这个包自带了5个乳腺癌芯片数据集,需要了解他们,可以看我在生信技能树发的教程:https://vip.biotrainee.com/d/689-5包里面自带的数据集也非常多,可以在https://rdrr.io/bioc/genefu/man/ 查看列表,比较重要的如下:gene70Function to compute the 70 genes prognosis profile (GENE70)…gene76Function to compute the Relapse Score as published by Wang et…mod1Gene modules published in Desmedt et al. 2008mod2Gene modules published in Wirapati et al. 2008scmgene.robustSubtype Clustering Model using only ESR1, ERBB2 and AURKA…scmod1.robustSubtype Clustering Model using ESR1, ERBB2 and AURKA modules…scmod2.robustSubtype Clustering Model using ESR1, ERBB2 and AURKA modules…sig.endoPredictSignature used to compute the endoPredict signature as…sig.gene70Signature used to compute the 70 genes prognosis profile…sig.gene76Signature used to compute the Relapse Score (GENE76) as…sig.geniusGene Expression progNostic Index Using Subtypes (GENIUS) as…sig.ggiGene expression Grade Index (GGI) as published in Sotiriou et…sig.oncotypedxSignature used to compute the OncotypeDX signature as…pam50PAM50 classifier for identification of breast cancer…pik3cagsFunction to compute the PIK3CA gene signature (PIK3CA-GS)上面列出的这些数据集都是可以打开看的:library(genefu)data(pam50)data(pam50.scale)data(pam50.robust)data(scmod2.robust)pam50str(scmod2.robust, max.level=1)最重要的功能就是根据已知基因集来对乳腺癌进行分子分型所有的分型都是用molecular.subtyping函数,预背了很多可以进行乳腺癌进行分子分型的基因集,比如大名鼎鼎的PAM50,下面是演示:rm(list = ls())library(breastCancerMAINZ)library(breastCancerTRANSBIG)library(breastCancerUPP)library(breastCancerUNT)library(breastCancerNKI)data(transbig)dd=transbigcinfo <- colnames(pData(dd))cinfoddata <- t(exprs(dd))dannot <- featureData(dd)@dataddemo <- phenoData(dd)@datalibrary(genefu)SubtypePredictions<-molecular.subtyping(sbt.model = "scmod2",data = ddata,annot = dannot,do.mapping = TRUE)table(SubtypePredictions$subtype)ddemo$SCMOD2<-SubtypePredictions$subtypePAM50Preds<-molecular.subtyping(sbt.model = "pam50",data=ddata,annot=dannot,do.mapping=TRUE)table(PAM50Preds$subtype)ddemo$PAM50<-PAM50Preds$subtype分类结果如下:> table(SubtypePredictions$subtype)ER+/HER2- High Prolif  ER+/HER2- Low Prolif             ER-/HER2-                 HER2+50                    77                    49                    22> table(PAM50Preds$subtype)Basal   Her2   LumB   LumA Normal45     26     45     78      4需要懂这个数据集才能理解,还需要有一点乳腺癌研究背景知识。成功分类后的信息,就可以用来做生存分析# http://www.inside-r.org/r-doc/survival/survfit.coxphlibrary(survival)data.for.survival.SCMOD2 <- ddemo[,c("e.os", "t.os", "SCMOD2","age")]data.for.survival.PAM50 <- ddemo[,c("e.os", "t.os", "PAM50","age")]# Remove patients with missing survival informationdata.for.survival.SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),]data.for.survival.PAM50 <- data.for.survival.PAM50[complete.cases(data.for.survival.PAM50),]days.per.month <- 30.4368days.per.year <- 365.242data.for.survival.PAM50$months_to_death <- data.for.survival.PAM50$t.os / days.per.monthdata.for.survival.PAM50$vital_status <- data.for.survival.PAM50$e.os == "1"surv.obj.PAM50 <- survfit(Surv(data.for.survival.PAM50$months_to_death,data.for.survival.PAM50$vital_status) ~ data.for.survival.PAM50$PAM50)data.for.survival.SCMOD2$months_to_death <- data.for.survival.SCMOD2$t.os / days.per.monthdata.for.survival.SCMOD2$vital_status <- data.for.survival.SCMOD2$e.os == "1"surv.obj.SCMOD2 <- survfit(Surv(data.for.survival.SCMOD2$months_to_death,data.for.survival.SCMOD2$vital_status) ~ data.for.survival.SCMOD2$SCMOD2)message("KAPLAN-MEIR CURVE - USING PAM50")plot(main = "Surival Curves SCMOD2", surv.obj.SCMOD2,col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3,xlab = "Time (months)",ylab = "Probability of Survival")legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c"),legend = c("Basal","Her2","LumA","LumB"),bty = "n")plot(main = "Surival Curves PAM50", surv.obj.PAM50,col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3,xlab = "Time (months)",ylab = "Probability of Survival")legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),legend = c("Basal","Her2","LumA","LumB","Normal"),bty = "n")也可以把这两个分类画在一张图上,或者添加统计学显著指标,你们自己尝试一下哦,我在生信技能树论坛也多次提到过,就不再赘述咯。

提示一下,可以使用 Cross-validated Partial Likelihood (cvpl) 模型来检验两个分类方法的预后判断情况。更多分类标准前面我们提到过,这个包最大的优点就是内置了一系列分类指标,如下;Subtype Clustering Model using just the AURKA gene: scmgene.robust()Subtype Clustering Model using just the ESR1 gene: scmgene.robust() Subtype Clustering Model using just the ERBB2 gene: scmgene.robust()NPI: npi()GGI: ggi()GENIUS: genius()EndoPredict: endoPredict()OncotypeDx: oncotypedx()TamR: tamr()GENE70: gene70()PIK3CA: pik3cags()rorS: rorS()基本上是,可以想怎么分类,就怎么分类了。

(0)

相关推荐