标准是需要因地制宜的
根据inferCNV结果进行打分
主要是读取一个文件 infercnv.observations.txt , 然后根据里面的的数值进行归类,阈值是 0.3,0.7,1.3,1.5,2 分别代表拷贝数缺失或者扩展的程度界限。其中位于 0.7-1.5之间就是拷贝数正常。全部的代码如下:
cnv_table <- read.table("inferCNV_output/infercnv.observations.txt", header=T)
# Score cells based on their CNV scores
# Replicate the table
cnv_score_table <- as.matrix(cnv_table)
cnv_score_mat <- as.matrix(cnv_table)
# Scoring
cnv_score_table[cnv_score_mat > 0 & cnv_score_mat < 0.3] <- "A" #complete loss. 2pts
cnv_score_table[cnv_score_mat >= 0.3 & cnv_score_mat < 0.7] <- "B" #loss of one copy. 1pts
cnv_score_table[cnv_score_mat >= 0.7 & cnv_score_mat < 1.3] <- "C" #Neutral. 0pts
cnv_score_table[cnv_score_mat >= 1.3 & cnv_score_mat <= 1.5] <- "D" #addition of one copy. 1pts
cnv_score_table[cnv_score_mat > 1.5 & cnv_score_mat <= 2] <- "E" #addition of two copies. 2pts
cnv_score_table[cnv_score_mat > 2] <- "F" #addition of more than two copies. 2pts
# Check
table(cnv_score_table[,1])
# Replace with score
cnv_score_table_pts <- cnv_table
rm(cnv_score_mat)
#
cnv_score_table_pts[cnv_score_table == "A"] <- 2
cnv_score_table_pts[cnv_score_table == "B"] <- 1
cnv_score_table_pts[cnv_score_table == "C"] <- 0
cnv_score_table_pts[cnv_score_table == "D"] <- 1
cnv_score_table_pts[cnv_score_table == "E"] <- 2
cnv_score_table_pts[cnv_score_table == "F"] <- 2
可以看到,这个 阈值是 0.3,0.7,1.3,1.5,2 实际上是我们示例数据集里面的,是一个基于Smart-seq2的单细胞转录组数据哦。
Smart-seq2和10X的代码差异
其实仔细读文档, 就可以看到:cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
# cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
infercnv_obj2 = infercnv::run(infercnv_obj,
cutoff=0.1,
out_dir= "infercnv_output", # dir is auto-created for storing outputs
cluster_by_groups=F, # cluster
hclust_method="ward.D2", plot_steps=F)
infercnv_obj1 = infercnv::run(infercnv_obj,
cutoff= 1 , #
out_dir= "Smart-seq2", # dir is auto-created for storing outputs
cluster_by_groups=F, # cluster
hclust_method="ward.D2", plot_steps=F)
这两个代码,跑出来的图表是有差异的哦!如下所示:
如果我们的数据就来源于 10X技术,那么使用 cutoff=1 works well for Smart-seq2 就确实是不适合的,但是呢,在计算cnv_score的时候,很明显阈值 0.3,0.7,1.3,1.5,2 需要因地制宜了。
那么阈值修改为多少合适呢
这个问题非常有意思,具体的话,我需要摸索后才能获得,如果大家有比较好的解决方案也欢迎交流,发邮件给我或者留言均可,我的邮箱地址是 jmzeng1314@163.com