Enhance your heatmaps with ggheatmap!
Motivation
Usage
Parameters
Visualization
Comprehensive examples
Closing remarks
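Motivation
The most commonly used R packages for drawing heatmaps are pheatmap and ComplexHeatmap (in my personal experience). Their rich feature sets cover almost every plotting need a researcher is likely to have. At the same time, the flexibility and quality of ggplot2 are beyond question, which is why many plotting packages operate on ggplot2 objects. We developed ggheatmap as a ggplot2-based heatmap package, mainly to make heatmaps easy to combine with other plots and to connect them smoothly with ggplot2 objects.
Link: https://github.com/XiaoLuo-boy/ggheatmap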
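Usage
Parameters
data: expression matrix (data.frame or matrix)
color: heatmap colors; generating them with colorRampPalette is recommended
legendName: title of the main heatmap legend; defaults to "Express"
scale: how the data are standardized ("none", "row" or "column")
cluster_rows: whether to cluster rows
cluster_cols: whether to cluster columns
dist_method: method passed to dist; defaults to "euclidean" (one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski")
hclust_method: method passed to hclust; defaults to "complete" (one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid")
text_angle_rows: angle of the row labels
text_angle_cols: angle of the column labels
text_color_rows: color of the row labels
text_color_cols: color of the column labels
text_face_rows: font face of the row labels
text_face_cols: font face of the column labels
text_just_rows: justification of the row labels (hjust, vjust)
text_just_cols: justification of the column labels (hjust, vjust)
text_show_rows: row labels to display selectively
text_show_cols: column labels to display selectively
text_position_rows: position of the row labels
text_position_cols: position of the column labels
annotation_cols: column annotation data
annotation_rows: row annotation data
annotation_color: annotation colors
annotation_width: annotation width
show_cluster_cols: whether to show the column dendrogram
show_cluster_rows: whether to show the row dendrogram
cluster_num: number of groups to split the dendrograms into (rows, columns)
tree_height_rows: height of the row dendrogram
tree_height_cols: height of the column dendrogram
tree_color_rows: colors of the row dendrogram
tree_color_cols: colors of the column dendrogram
levels_rows: order of the row labels
levels_cols: order of the column labels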
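The color parameter is described as best generated with colorRampPalette. As a minimal sketch using only base R (grDevices), this is how such a palette can be built. The three anchor colors are the ones passed to pheatmap elsewhere in this post, and the variable name heat_colors is only illustrative. The final ggheatmap call is left as a comment because it assumes the package is installed and that df already exists.

```r
# colorRampPalette() returns a function; calling it with n yields n hex colors
# interpolated between the anchor colors.
heat_colors <- colorRampPalette(c("#0073c2", "white", "#efc000"))(100)

length(heat_colors)    # number of colors generated
head(heat_colors, 3)   # first few hex codes, starting at the blue end

# Hypothetical usage, assuming ggheatmap is installed and df is a numeric matrix:
# ggheatmap(df, color = heat_colors, legendName = "Express")
```

More steps give a smoother gradient; fewer steps give visibly banded colors.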
Visualization
Simulated data
devtools::install_github("XiaoLuo-boy/ggheatmap")
library(ggheatmap)
library(pheatmap)
library(aplot)
set.seed(123)
df <- matrix(runif(600,0,10),ncol = 12)
colnames(df) <- paste("sample",1:12,sep = "")
rownames(df) <- sapply(1:50, function(x)paste(sample(LETTERS,3,replace = F),collapse = ""))
head(df)
#      sample1   sample2  sample3  sample4
# PIK 2.875775 0.4583117 5.999890 8.474532
# KSJ 7.883051 4.4220007 3.328235 4.975273
# WBF 4.089769 7.9892485 4.886130 3.879090
# TNY 8.830174 1.2189926 9.544738 2.464490
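Note: the data are randomly simulated. set.seed(123) makes a run repeatable, but the values can still differ from those shown here, for example under a different R version. To check a result, draw the same data with pheatmap and compare the two plots; your figures will not necessarily match the ones in this post.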
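The simulated row names are three random letters each, so nothing guarantees that all 50 are distinct. The sketch below, in base R only, checks for repeats and repairs them if needed. The variable name gene_ids is illustrative, and because it draws names without the preceding runif call, the names it produces will not match the ones printed above.

```r
set.seed(123)
gene_ids <- sapply(1:50, function(x) {
  paste(sample(LETTERS, 3, replace = FALSE), collapse = "")
})

# anyDuplicated() returns 0 when every name is unique,
# otherwise the index of the first repeated name.
if (anyDuplicated(gene_ids) > 0) {
  # make.unique() appends ".1", ".2", ... to repeated names
  gene_ids <- make.unique(gene_ids)
}

length(gene_ids)
```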
Example 1. Row and column clustering
Note: by default, neither rows nor columns are clustered.
ggheatmap(df,cluster_rows = T,cluster_cols = T)
pheatmap(df,color = colorRampPalette(c( "#0073c2","white","#efc000"))(100))
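According to the parameter list, dist_method and hclust_method are the methods handed to dist and hclust. The sketch below reproduces that step in base R to show where a clustered row and column order comes from. It assumes the package clusters rows on the matrix and columns on its transpose, which the source does not spell out; mat and the gene names are stand-ins for the simulated data.

```r
set.seed(123)
mat <- matrix(runif(600, 0, 10), ncol = 12)
colnames(mat) <- paste0("sample", 1:12)
rownames(mat) <- paste0("gene", 1:50)   # illustrative names

# Rows: pairwise distances between rows, then hierarchical clustering
row_tree <- hclust(dist(mat, method = "euclidean"), method = "complete")
# Columns: transpose first so that samples become the observations
col_tree <- hclust(dist(t(mat), method = "euclidean"), method = "complete")

# Leaf order of each dendrogram, expressed as row and column names
row_order <- rownames(mat)[row_tree$order]
col_order <- colnames(mat)[col_tree$order]
col_order
```

Changing either method argument changes the trees and therefore the order in which rows and columns would be drawn.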
Example 2. Data standardization
Note: by default, the data are not standardized.
ggheatmap(df,cluster_rows = T,cluster_cols = T,scale = "column")
pheatmap(df,scale = "column",color = colorRampPalette(c( "#0073c2","white","#efc000"))(100))
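To make the effect of the scale argument concrete, here is a base R sketch of column-wise and row-wise standardization. It assumes that "column" and "row" mean z-scoring (subtract the mean, divide by the standard deviation), as pheatmap does; the source does not state ggheatmap's exact formula, so treat this as an illustration rather than the package's implementation.

```r
set.seed(123)
mat <- matrix(runif(600, 0, 10), ncol = 12)

# Column scaling: scale() centers and scales each column
col_scaled <- scale(mat)

# Row scaling: transpose, scale the (former) rows as columns, transpose back
row_scaled <- t(scale(t(mat)))

range(colMeans(col_scaled))      # column means are numerically zero
range(apply(col_scaled, 2, sd))  # column standard deviations are one
```

After scaling, colors compare values within a column (or row) rather than across the whole matrix, which is why the scaled heatmap looks different from the unscaled one.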
Example 3. Legend title
Note: the default legend title is "Express"; setting legendName = NULL removes the title.
ggheatmap(df,cluster_rows = T,cluster_cols = T,legendName = "score")
ggheatmap(df,cluster_rows = T,cluster_cols = T,legendName = NULL)
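Example 4. Row and column labels
Adjusting position
Note: labels can be rotated to any angle and adjusted horizontally and vertically. The side of the heatmap on which the labels are drawn can also be changed.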
ggheatmap(df,cluster_rows = T,cluster_cols = T,text_angle_rows = 330,text_just_rows = c(1,0))
ggheatmap(df,cluster_rows = T,cluster_cols = T,text_angle_cols = 45,text_just_cols = c(1,1))
ggheatmap(df,cluster_rows = T,cluster_cols = T,text_position_cols = "top")
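Label color and font face
Note: the label color or font face can be a single value, or a vector with one value per label, which makes it possible to single out specific labels, for example to highlight marker genes.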
ggheatmap(df,cluster_rows = T,cluster_cols = T,text_color_cols = "red")
ggheatmap(df,cluster_rows = T,cluster_cols = T,text_color_cols = c(rep("black",6),"red",rep("black",5)))
ggheatmap(df,cluster_rows = T,cluster_cols = T,
          text_color_rows = c("red",rep("black",49)),
          text_face_rows = c("bold",rep("italic",49)))
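Writing the color vector by position, as in c(rep("black",6),"red",rep("black",5)), becomes error-prone with many labels. A small base R sketch for building the same vector from label names follows. The names sample_names and highlight are illustrative, and the sketch assumes the vector is matched to labels in the original column order; the source does not say how positions map to labels after clustering, so check the resulting plot.

```r
sample_names <- paste0("sample", 1:12)
highlight <- c("sample7")   # hypothetical label of interest

# "red" for highlighted labels, "black" for the rest
label_colors <- ifelse(sample_names %in% highlight, "red", "black")
# The same idea works for font faces
label_faces <- ifelse(sample_names %in% highlight, "bold", "plain")

label_colors

# Hypothetical usage, assuming ggheatmap is installed and df has these column names:
# ggheatmap(df, text_color_cols = label_colors, text_face_cols = label_faces)
```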
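Showing selected labels
Note: ggheatmap can display only selected labels. Pass row or column names that exist in the input data to show just the genes you want.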
text_rows <- sample(rownames(df),3)
text_rows# "XDV" "WFH" "ALR"
ggheatmap(df,scale = "row",cluster_rows = T,cluster_cols = T,
          text_show_rows = text_rows)
Specifying the order of row and column labels
Note: this setting only has an effect when the rows or columns are not clustered.
ggheatmap(df,cluster_rows = F,cluster_cols = F)
ggheatmap(df,cluster_rows = F,cluster_cols = F,
          levels_cols = c(paste("sample",1:12,sep = "")))
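Example 5. Visualizing the dendrograms
Setting dendrogram height
Note: the height settings only matter when the corresponding dendrogram is shown via show_cluster_cols or show_cluster_rows.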
ggheatmap(df,cluster_rows = T,cluster_cols = T,show_cluster_cols = F)
ggheatmap(df,cluster_rows = T,cluster_cols = T,tree_height_rows = 0.05,tree_height_cols = 0.05)
Setting dendrogram colors
Note: as above, the dendrogram must be shown via show_cluster_cols or show_cluster_rows.
ggheatmap(df,cluster_rows = T,cluster_cols = T,cluster_num = c(5,4))
ggheatmap(df,cluster_rows = T,cluster_cols = T,cluster_num = c(5,4),
          tree_color_rows = c("#3B4992FF","#EE0000FF","#008B45FF","#631879FF","#008280FF"),
          tree_color_cols = c("#0073C2FF", "#EFC000FF" ,"#868686FF", "#CD534CFF"))
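The call above supplies five row colors and four column colors for cluster_num = c(5,4). As a rough illustration of what splitting a dendrogram into k groups means, the base R sketch below uses cutree. That cluster_num behaves like cutree's k is an assumption on my part, and mat with its names stands in for the simulated data.

```r
set.seed(123)
mat <- matrix(runif(600, 0, 10), ncol = 12)
rownames(mat) <- paste0("gene", 1:50)      # illustrative names
colnames(mat) <- paste0("sample", 1:12)

# Cut the row tree into 5 groups and the column tree into 4 groups
row_groups <- cutree(hclust(dist(mat)), k = 5)
col_groups <- cutree(hclust(dist(t(mat))), k = 4)

table(row_groups)   # how many genes fall into each row group
table(col_groups)   # how many samples fall into each column group

# One color per group: the lengths should agree with the group counts
tree_color_rows <- c("#3B4992FF", "#EE0000FF", "#008B45FF", "#631879FF", "#008280FF")
tree_color_cols <- c("#0073C2FF", "#EFC000FF", "#868686FF", "#CD534CFF")
length(tree_color_rows) == length(unique(row_groups))
```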
Annotations
Note: annotations are specified in much the same way as in pheatmap.
row_metaData <- data.frame(exprtype=sample(c("Up","Down"),50,replace = T),
                           genetype=sample(c("Metabolism","Immune","None"),50,replace = T))
rownames(row_metaData) <- rownames(df)
col_metaData <- data.frame(tissue=sample(c("Normal","Tumor"),12,replace = T),
                           risklevel=sample(c("High","Low"),12,replace = T))
rownames(col_metaData) <- colnames(df)
exprcol <- c("#EE0000FF","#008B45FF" )
names(exprcol) <- c("Up","Down")
genecol <- c("#EE7E30","#5D9AD3","#D0DFE6FF")
names(genecol) <- c("Metabolism","Immune","None")
tissuecol <- c("#98D352","#FF7F0E")
names(tissuecol) <- c("Normal","Tumor")
riskcol <- c("#EEA236FF","#46B8DAFF")
names(riskcol) <- c("High","Low")
col <- list(exprtype=exprcol,genetype=genecol,tissue=tissuecol,risklevel=riskcol)
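# Note: the result is stored in a variable that is also named ggheatmap.
# Later calls to ggheatmap() still work, because R looks for a function when a
# name is called; Example2 below reuses this object.
ggheatmap<- ggheatmap(df,cluster_rows = T,cluster_cols = T,scale = "row",
                      cluster_num = c(5,3),
                      tree_color_rows = c("#3B4992FF","#EE0000FF","#008B45FF","#631879FF","#008280FF"),
                      tree_color_cols = c("#1F77B4FF","#FF7F0EFF","#2CA02CFF"),
                      annotation_rows = row_metaData,
                      annotation_cols = col_metaData,
                      annotation_color = col
)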
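Two things are easy to get wrong when preparing annotations: the annotation row names must line up with the matrix names, and each annotation level needs a named color. The sketch below checks both in base R before any plotting. It rebuilds a small row annotation with illustrative names (gene_names, row_meta, ann_colors) rather than reusing the objects above, so it runs on its own.

```r
set.seed(123)
gene_names <- paste0("gene", 1:50)   # stand-in for rownames(df)

row_meta <- data.frame(
  exprtype = sample(c("Up", "Down"), 50, replace = TRUE),
  genetype = sample(c("Metabolism", "Immune", "None"), 50, replace = TRUE),
  row.names = gene_names
)

ann_colors <- list(
  exprtype = c(Up = "#EE0000FF", Down = "#008B45FF"),
  genetype = c(Metabolism = "#EE7E30", Immune = "#5D9AD3", None = "#D0DFE6FF")
)

# 1. Annotation rows must be in the same order as the matrix rows
names_match <- identical(rownames(row_meta), gene_names)

# 2. Every level that occurs in an annotation column needs a named color
missing_colors <- lapply(names(row_meta), function(v) {
  setdiff(unique(as.character(row_meta[[v]])), names(ann_colors[[v]]))
})
names(missing_colors) <- names(row_meta)

names_match
missing_colors   # empty character vectors mean nothing is missing
```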
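Comprehensive examples
Scenario: suppose that, when drawing a heatmap, there are too many genes to label them all, and you also want to show how the genes you are studying are expressed across samples.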
Example1
text_rows <- sample(rownames(df),5)
text_rows# "WFH" "TAC" "MLZ" "KSJ" "BPW"
ggheatmap(df,scale = "row",cluster_rows = T,cluster_cols = T,cluster_num = c(2,3),
          text_show_rows = text_rows,
          text_face_rows = "bold",
          text_color_rows = "red",
          text_angle_cols = 45,
          text_just_cols = c(1,1),
          tree_color_rows = c( "#3B4992FF","#EE0000FF"),
          tree_color_cols = c("#1F77B4FF","#FF7F0EFF","#2CA02CFF")
)
Example2
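library(ggplot2)  # provides ggplot(), aes(), geom_point(), theme(), guides()
dat <- data.frame(marker=sample(c(1,NA),50,replace = T),
                  gene=rownames(df),
                  shape=sample(c("T","F"),50,replace = T))
p <- ggplot(dat,aes(x=1,y=gene,size=marker,color=shape,shape=shape))+
  geom_point()+theme_classic()+
  scale_color_manual(values = c("#D2691E","#1E87D2"))+
  theme(line = element_blank(),axis.text = element_blank(),axis.title = element_blank())+
  guides(size = "none")  # guides(size = FALSE) is deprecated in recent ggplot2 versions
# Equivalent to ggheatmap %>% insert_right(p, width = 0.1), without needing the pipe operator
insert_right(ggheatmap,p,width = 0.1)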
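Closing remarks
The strengths of this package are: 1. fairly flexible label settings; 2. integration of heatmaps with the ggplot2 plotting system, which helps when composing figures; 3. a simple, easy-to-use interface; 4. improved dendrogram visualization.
The package depends on ggplot2, ggpubr, aplot, factoextra, grDevices, stats, tibble, tidyr and other R packages. As an R enthusiast and a bioinformatics beginner, I am sincerely grateful to the developers who came before me for their generous work. In the spirit of "respect knowledge, honor original work", I obtained permission from Teacher Yu (余老师) for the use of aplot.
In addition, as a medical student I ran into many real difficulties while learning bioinformatics and R. I am especially grateful to Teacher Jianming (健明) and the other teachers of the "生信技能树" team for everything they have done for bioinformatics beginners. Their tutorials, online discussion groups and open courses all show how much they care about those who come after them.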