R绘图笔记 | 火山图的绘制
关于绘图,前面介绍了一些:
R绘图笔记 | 多数据系列的箱型图与带抖动散点的多数据系列箱型图
这里介绍火山图的绘制,火山图常常出现在芯片、转录组、蛋白组、代谢组等组学检测技术的结果中,并且通常伴随热图一起出现,文章中也很常见。
一.数据处理
如果你想获取该数据用于自己练习,下面是获取数据的地址:
https://docs.qq.com/sheet/DV3ZwZWl5UURjUGJi
数据格式是这样的。
二.绘图
1.读入数据
data <- read.csv("BioInfoNotesData4.csv",stringsAsFactors = F)
head(data)
我们画火山图,只需要其中的log2FC和FDR就可以了。在绘图之前,我们需要对FDR进行转换,将它的值变成-log10,如果有0,会产生Inf,需要去除。这样的话可以拉开差异表达基因之间的间距。在纵坐标上才可以很好的显示出来。上面的数据已经处理了。后面只需要取相应的列就行。
data <- data[,c(1,3,7)]
data$logP <- -log10(data$FDR)
colnames(data) <- c("Gene","Log2FC","FDR","logP")
head(data)
2.绘图
这里先介绍ggscatter函数绘图。
ggscatter(data, x, y, combine = FALSE, merge = FALSE,
color = "black", fill = "lightgray", palette = NULL, shape = 19,
size = 2, point = TRUE, rug = FALSE, title = NULL, xlab = NULL,
ylab = NULL, facet.by = NULL, panel.labs = NULL,
short.panel.labs = TRUE, add = c("none", "reg.line", "loess"),
add.params = list(), conf.int = FALSE, conf.int.level = 0.95,
fullrange = FALSE, ellipse = FALSE, ellipse.level = 0.95,
ellipse.type = "norm", ellipse.alpha = 0.1,
ellipse.border.remove = FALSE, mean.point = FALSE,
mean.point.size = ifelse(is.numeric(size), 2 * size, size),
star.plot = FALSE, star.plot.lty = 1, star.plot.lwd = NULL,
label = NULL, font.label = c(12, "plain"), font.family = "",
label.select = NULL, repel = FALSE, label.rectangle = FALSE,
cor.coef = FALSE, cor.coeff.args = list(), cor.method = "pearson",
cor.coef.coord = c(NULL, NULL), cor.coef.size = 4, ggp = NULL,
show.legend.text = NA, ggtheme = theme_pubr(), ...)
data:就是绘图的数据,x、y分别指定轴变量。palette是调色板,其实很多参数和之前的图形绘制是相通的。我不介绍,你自己查看一下帮助文档,后面用到的参数我会介绍。
ggscatter(data,x="Log2FC",y="logP") + theme_test()
但是这样太丑了,我们还是对数据分组,然后处理一下。一是高低表达的差异显示出来,二是突出极显著或者差异倍数大的基因。
data$GeneLab <- ""
sigDownGene <- data$Gene[(data$Log2FC < -5 & data$FDR < 0.005)]
sigUpGene <- data$Gene[(data$Log2FC > 4&data$FDR < 0.001)]
sigGene <- c(sigDownGene,sigUpGene)
data$GeneLab[match(sigGene,data$Gene)] <- sigGene
head(data)
在图中要显示点的名称,我们可以通过label设置,但点那么多,我只显示自己想要的,可以增加一列,不显示的全都是空值,要显示的有值(这里是基因名称),repel参数避免要显示的基因名称重叠,不好看。
ggscatter(data,x="Log2FC",y="logP",color="Group",
shape = 16,
label = data$GeneLab,
font.label=7,
xlab = "log2(Fold Change)",
ylab = "-log(pValue)",
#rug= T,
repel=T,
size = 1) + theme_test()
我还是觉得丑,可通过theme设置主题。
ggscatter(data,x="Log2FC",y="logP",color="Group",
shape = 16,
palette = c("#7B68EE", "#E0E0E0", "#FF4500"),
label = data$GeneLab,
font.label=7,
xlab = "log2(Fold Change)",
ylab = "log(pValue)",
#rug= T,
repel=T,
size = 1)+
theme(legend.title = element_blank(),
legend.text = element_text(size = 8, face = "bold"),
legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = "pt"),
legend.direction = "horizontal",
legend.position = c(0.5,0.93),
panel.background = element_rect(fill = "transparent",colour = "black"),
plot.background = element_rect(fill = "transparent",colour = "black"))
除了上面的函数以外,有一些包里面自带的火山图绘制函数,有的图也很不错。比如TCGAbiolinks包,这个包是用来分析TCGA数据库的数据的,很好用,该包的TCGAVisualize_volcano函数绘制的火山图也很好看。
library(TCGAbiolinks)##
volFig <- pdf("vol.pdf",width=10,height=6)
TCGAVisualize_volcano(data$Log2FC, data$FDR,title = "",
filename = volFig, xlab = "logFC",
names = data$Gene, show.names = "highlighted",
x.cut = 1, y.cut = 0.01,
highlight = data$Gene[which(abs(data$Log2FC) >=4.5)],
highlight.color = "orange")
dev.off()
这些绘图函数都是基于ggplot2包的,所以可以自定义很多元素。
volFig <- pdf("vol.pdf",width=10,height=8)
TCGAVisualize_volcano(data$Log2FC, data$FDR,title = "",
filename = volFig, xlab = "logFC",
names = data$Gene, show.names = "highlighted",
x.cut = 1, y.cut = 0.01,
highlight = data$Gene[which(abs(data$Log2FC) >=4.5)],
highlight.color = "orange") +
theme(legend.title = element_blank(),
legend.text = element_text(size = 8, face = "bold"),
legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = "pt"),
legend.direction = "horizontal",
legend.position = c(0.2,0.93),
panel.background = element_rect(fill = "transparent",colour = "black"),
plot.background = element_rect(fill = "transparent",colour = "black"))
dev.off()
参考资料:
数据来源:TCGA数据库