MethylMix识别甲基化驱动基因
六月份的一个学徒自己有甲基化项目数据要处理,所以开始自学一些R包了,甲基化芯片数据处理我是有视频课程的:
首先需要阅读我在生信技能树的甲基化系列教程,目录如下:
01-甲基化的一些基础知识.pdf 02-甲基化芯片的一般分析流程.pdf 03-甲基化芯片数据下载的多种技巧.pdf 04-甲基化芯片数据下载如何读入到R里面.pdf 05-甲基化芯片数据的一些质控指标.pdf 06-甲基化信号值矩阵差异分析哪家强.pdf 07-甲基化芯片信号值矩阵差异分析的标准代码.pdf 08-TCGA数据库的各个癌症甲基化芯片数据重新分析.pdf 09-TCGA数据库的癌症甲基化芯片数据重分析.pdf 10-TCGA数据辅助甲基化区域的功能研究.pdf 11-按基因在染色体上的顺序画差异甲基化热图.pdf 850K甲基化芯片数据的分析.pdf 使用DSS包多种方式检验差异甲基化信号区域.pdf
然后就可以看我在B站免费分享的视频课程《甲基化芯片(450K或者850K)数据处理 》
教学视频免费在:https://www.bilibili.com/video/BV177411U7oj 课程配套思维导图:https://mubu.com/doc/1cwlFgcXMg
介绍
当联合甲基化和转录组数据时,有不少文献直接用交集求出基因。但还有个更科学的方法------MethlMix包通过Beta混合模型(beta mixture model)识别低/高甲基化状态,并通过与对应基因的表达水平的相关性分析从而找到疾病中DNA甲基化驱动的基因
参考资料
代码:https://www.bioconductor.org/packages/release/bioc/vignettes/MethylMix/inst/doc/vignettes.html
https://www.bioconductor.org/packages/release/bioc/manuals/MethylMix/man/MethylMix.pdf
R包文献:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129298/
使用此包的文献:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8231008/ https://onlinelibrary.wiley.com/doi/10.1002/mc.23300 https://www.frontiersin.org/articles/10.3389/fcell.2020.576996/full
rm(list = ls())
options(stringsAsFactors = F)
#BiocManager::install("MethylMix")
library("MethylMix")
getData功能能够同时下载DNA甲基化数据(27K和450K)以及基因转录组数据,并自动进行预处理,包括缺失值处理以及批次校正,以及探针到映射基因的注释。 R包文件中有详细代码,但因为运行时间较长,就不从getData开始处理,直接用了R包原有的数据
library(doParallel)
data(METcancer)
data(METnormal)
data(GEcancer)
head(METcancer[, 1:4])#肿瘤甲基化位点
## TCGA.02.0001 TCGA.02.0003 TCGA.02.0006 TCGA.02.0007
## ERBB2 0.066976 0.056493 0.16083 0.059322
## FAAH 0.251890 0.193620 0.29216 0.856140
## FNDC3B 0.691420 0.563600 0.54231 0.306230
## FOXD1 0.084173 0.091946 0.19121 0.752810
## HOXD10 0.511220 0.505300 0.50494 0.717150
## IRX2 0.308940 0.622350 0.50465 0.725770
head(METnormal[, 1:4])#正常甲基化位点
## X4321207019_B X4321207026_C X4321207042_E X4321207042_K
## ERBB2 0.14834 0.105520 0.088934 0.084731
## FAAH 0.21961 0.279150 0.196890 0.220890
## FNDC3B 0.76382 0.784840 0.796010 0.752130
## FOXD1 0.13591 0.074815 0.085960 0.088573
## HOXD10 0.20063 0.166590 0.162870 0.200730
## IRX2 0.12246 0.087822 0.073123 0.104640
head(GEcancer[, 1:4])#肿瘤转录组数据
## TCGA.02.0001 TCGA.02.0003 TCGA.02.0006 TCGA.02.0007
## ERBB2 -0.191840 -1.24440 -0.026743 1.524600
## FAAH -0.075087 -0.70167 -0.580130 -1.237400
## FNDC3B 0.033780 -0.64668 -0.356230 -0.050206
## FOXD1 0.102790 0.11901 0.155030 -1.002300
## HOXD10 2.371300 3.07750 1.547000 2.330500
## IRX2 -1.575300 -3.83290 -1.599900 -4.688700
MethylMix
MethylationDrivers:得到driver gene(差异甲基化,转录调控)
NrComponents: 每个driver gene的对应的甲基化状态;
MixtureStates: 每个driver gene的差异甲基化值(即肿瘤样本与正常样本之间的平均甲基化的差值);
MethylationStates: 以样本为列,driver gene为行的差异甲基化值的矩阵;
Classifications: Matrix with integers indicating to which mixture component each cancer sample was assigned to, for each gene
Models:每个driver gene的beta mixture model参数;
MethylMixResults <- MethylMix(METcancer, GEcancer, METnormal)
## Found 251 samples with both methylation and expression data.
## Correlating methylation data with gene expression...
##
## Found 9 transcriptionally predictive genes.
##
## Starting Beta mixture modeling.
## Running Beta mixture model on 9 genes and on 251 samples.
## ERBB2 : 2 components are best.
## FAAH : 2 components are best.
## FOXD1 : 2 components are best.
## ME1 : 2 components are best.
## MGMT : 2 components are best.
## OAS1 : 2 components are best.
## SOX10 : 2 components are best.
## TRAF6 : 2 components are best.
## ZNF217 : 2 components are best.
MethylMixResults$MethylationDrivers
## [1] "ERBB2" "FAAH" "FOXD1" "ME1" "MGMT" "OAS1" "SOX10" "TRAF6"
## [9] "ZNF217"
MethylMixResults$NrComponents
## ERBB2 FAAH FOXD1 ME1 MGMT OAS1 SOX10 TRAF6 ZNF217
## 2 2 2 2 2 2 2 2 2
MethylMixResults$MixtureStates
## $ERBB2
## [,1]
## [1,] 0.000000
## [2,] 0.318555
##
## $FAAH
## [,1]
## [1,] 0.0000000
## [2,] 0.3973068
##
## $FOXD1
## [,1]
## [1,] 0.0000000
## [2,] 0.2217989
##
## $ME1
## [,1]
## [1,] 0.1418296
## [2,] 0.5834757
##
## $MGMT
## [,1]
## [1,] 0.0000000
## [2,] 0.3141419
##
## $OAS1
## [,1]
## [1,] -0.3391054
## [2,] 0.0000000
##
## $SOX10
## [,1]
## [1,] 0.0000000
## [2,] 0.2076643
##
## $TRAF6
## [,1]
## [1,] 0.0000000
## [2,] 0.2452776
##
## $ZNF217
## [,1]
## [1,] -0.1724214
## [2,] 0.1875210
MethylMixResults$MethylationStates[, 1:5]
## TCGA.02.0001 TCGA.02.0003 TCGA.02.0006 TCGA.02.0007 TCGA.02.0009
## ERBB2 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## FAAH 0.0000000 0.0000000 0.0000000 0.3973068 0.3973068
## FOXD1 0.0000000 0.0000000 0.2217989 0.2217989 0.2217989
## ME1 0.5834757 0.5834757 0.5834757 0.5834757 0.1418296
## MGMT 0.0000000 0.0000000 0.3141419 0.0000000 0.0000000
## OAS1 -0.3391054 0.0000000 -0.3391054 -0.3391054 0.0000000
## SOX10 0.2076643 0.2076643 0.2076643 0.2076643 0.2076643
## TRAF6 0.2452776 0.0000000 0.0000000 0.0000000 0.0000000
## ZNF217 -0.1724214 -0.1724214 -0.1724214 -0.1724214 -0.1724214
MethylMixResults$Classifications[, 1:5]
## TCGA.02.0001 TCGA.02.0003 TCGA.02.0006 TCGA.02.0007 TCGA.02.0009
## ERBB2 1 1 1 1 1
## FAAH 1 1 1 2 2
## FOXD1 1 1 2 2 2
## ME1 2 2 2 2 1
## MGMT 1 1 2 1 1
## OAS1 1 2 1 1 2
## SOX10 2 2 2 2 2
## TRAF6 2 1 1 1 1
## ZNF217 1 1 1 1 1
MethylMix_ModelGeneExpression
输出甲基化和基因表达呈显著负相关的的基因名
# CovariateData:一般不需要设置,如果样本来源于不同的组织类型,则可以设置
MethylMix_ModelGeneExpression(METcancer, GEcancer, CovariateData = NULL)
## Found 251 samples with both methylation and expression data.
## Correlating methylation data with gene expression...
##
## Found 9 transcriptionally predictive genes.
## [1] "ERBB2" "FAAH" "FOXD1" "ME1" "MGMT" "OAS1" "SOX10" "TRAF6"
## [9] "ZNF217"
# 输出甲基化和基因表达呈显著负相关的的基因名
# 可得到结果与MethylMix结果一样,不过后者可延伸出更多信息
MethylMix_PlotModel
输出不同状态甲基化位点的直方图和散点图
MethylMix_PlotModel("ERBB2", MethylMixResults, METcancer)
## $MixtureModelPlot
##
## $CorrelationPlot
## NULL
MethylMix_PlotModel("ERBB2", MethylMixResults, METcancer, GEcancer,METnormal)
## $MixtureModelPlot
##
## $CorrelationPlot
MethylMix_Predict 新数据集预测
this function predicts the mixture component for each new sample and driver gene
newMETData <- matrix(runif(length(METcancer)), nrow = nrow(METcancer))
rownames(newMETData) <- rownames(METcancer)
colnames(newMETData) <- paste0("sample", 1:ncol(METcancer))
head(newMETData)[,1:3]
## sample1 sample2 sample3
## ERBB2 0.9118238 0.2956229 0.5851311
## FAAH 0.4588131 0.9194034 0.9343991
## FNDC3B 0.1361638 0.5685186 0.9688875
## FOXD1 0.1810451 0.4918403 0.1564577
## HOXD10 0.9357963 0.5313914 0.4468645
## IRX2 0.1206382 0.1920351 0.1692510
# newMETData 行为基因或cpg,要与MethylMixResults有对应,但样本可不同
predictions <- MethylMix_Predict(newMETData, MethylMixResults)
predictions[,1:3]
## sample1 sample2 sample3
## ERBB2 2 2 2
## FAAH 2 2 2
## FOXD1 2 2 2
## ME1 1 2 2
## MGMT 2 1 2
## OAS1 1 2 2
## SOX10 1 2 1
## TRAF6 2 2 2
## ZNF217 2 2 2