硬核干货：如果样本量不一一样多，或者不是一一对应关系，如何做差异？相关？ / 开普饭

写在前面

因为我们测定的样本不总是可以匹配上的，但是最少的样本也不能太少，我们测定的三个样本做相关其实还是被质疑，但是测定了5或者6个重复，这个时候直接将样本比较多的样本过滤掉比较少的样本不就可以了吗？这样做一次是没有什么意义的，但是如果将这个过程重复多次，让每个重复都使用的上，就会大大增加结果的可行性。当然审稿人就可以承认了。

例如下面这个文章

参考原文 ismej Metabolic regulation of the maize rhizobiome by benzoxazinoids

作者信息：T. E. Anne Cotton1,2 ● Pierre Pétriacq1,2,3,5 ● Duncan D. Cameron1,2 ● Moaed Al Meselmani1,2,4 ● Roland Schwarzenbacher1,2 ● Stephen A. Rolfe 1,2 ● Jurriaan Ton 1,2

发表：2019年

作者亲自书写：

The correlations are calcuated on an average (as the data are not paired) but this will give a false sense of precision in some interactions. One way to cope with this is to randomly assign ions to specific OTU counts and calculate the correlation. If we do this multiple times and then take the average (and SD) we have a better measure of the true correlation.

To see what occurs by chance we can assign the ions randomly across all of the samples rather than just within a treatment. Some false positives will occur by chance but we can then assess whether we get more in specific interactions.

Algorithm

8 bacterial samples

6 metabolomic samples

Within a treatment assign 6 of the metabolomic samples at random to the bacterial samples

Calculate the correlation coefficient

Repeat this process x times and calculate the mean value

那么这个过程是如何实现的呢，下面我带大家一起来实现这个过程：

构建随机OTU矩阵

library(tidyverse)library(vegan)library (plyr)# 构建OTU表格 otu<-matrix(sample(1:500, 100),ncol=16,nrow = 300) row.names(otu) = paste("ASV_",1:nrow(otu),sep = "") colnames(otu) = c(paste("A",1:4,sep = ""),paste("B",1:4,sep = ""),paste("C",1:4,sep = ""),paste("D",1:4,sep = "")) head(otu)## A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 D1 D2 D3 D4 ## ASV_1 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 42 ## ASV_2 259 259 259 259 259 259 259 259 259 259 259 259 259 259 259 259 ## ASV_3 407 407 407 407 407 407 407 407 407 407 407 407 407 407 407 407 ## ASV_4 272 272 272 272 272 272 272 272 272 272 272 272 272 272 272 272 ## ASV_5 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 209 ## ASV_6 304 304 304 304 304 304 304 304 304 304 304 304 304 304 304 304#--相对丰度标准化 otunorm = t(t(otu)/colSums(otu)) otunorm = as.data.frame(t(otunorm))

otunorm$Group = c(rep("A",4),rep("B",4),rep("C",4),rep("D",4))

构建代谢组随机矩阵

代谢组表格和otu表格处理相同，但是有的重复为3个，有的重复为5个，不仅仅自己不统一重复数量，同时和otu的重复数量也不一致。

# 构建代谢组矩阵 #

GC<-matrix(sample(1:500, 100),ncol=16,nrow = 100) row.names(GC) = paste("GC_",1:nrow(GC),sep = "") colnames(GC) = c(paste("A",1:3,sep = ""),paste("B",1:5,sep = ""),paste("C",1:3,sep = ""),paste("D",1:5,sep = "")) head(GC)## A1 A2 A3 B1 B2 B3 B4 B5 C1 C2 C3 D1 D2 D3 D4 D5 ## GC_1 186 186 186 186 186 186 186 186 186 186 186 186 186 186 186 186 ## GC_2 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 171 ## GC_3 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 ## GC_4 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 321 ## GC_5 419 419 419 419 419 419 419 419 419 419 419 419 419 419 419 419 ## GC_6 199 199 199 199 199 199 199 199 199 199 199 199 199 199 199 199#--相对丰度标准化 GCnorm = t(t(GC)/colSums(GC)) # mapGC = data.frame(ID =c(paste("A",1:3,sep = ""),paste("B",1:5,sep = ""),paste("C",1:3,sep = ""),paste("D",1:5,sep = "")), # Group = c(rep("A",3),rep("B",5),rep("C",3),rep("D",5))) GCnorm = as.data.frame(t(GCnorm))

GCnorm$Group = c(rep("A",3),rep("B",5),rep("C",3),rep("D",5))

提取各自的重复数量

#make the code robust to allow different sample depths otu_reps<-data.frame(dplyr::count(otunorm,Group)) met_reps<-data.frame(dplyr::count(GCnorm,Group)) #get the minimum number otu_min_reps<-min(otu_reps$n) otu_min_reps## [1] 4met_min_reps<-min(met_reps$n) met_min_reps## [1] 3min_reps=min(otu_min_reps,met_min_reps) min_reps## [1] 3

准备运行参数

相关参数和运行结果存储对象，顺便做一个进度条

depth=min_reps repeats=10 cormethod = "spearman" pb <- txtProgressBar(min=1,max=repeats,style=3,width=50)

cor_list<-vector("list",repeats) a = 1 R = c() p = c()for (a in 1:repeats){ Sys.sleep(0.1) setTxtProgressBar(pb,a) #this has two effects - it randomises the order and reduces the sample number to depth otu_select<-data.frame(otunorm %>% group_by(Group) %>% sample_n(depth)) otu_select$Group = NULL distotu <- vegan::vegdist(otu_select)

met_select<-data.frame(GCnorm %>% group_by(Group) %>% sample_n(depth)) met_select$Group = NULL distGC <- vegan::vegdist(otu_select,"euclid") res <- vegan::mantel(distotu, distGC, method="spearman") p[a] = res$signif R[a] = res$statistic }## | | | 0%result = data.frame(R_mean = mean(R),R_sd = sd(R),p_mean = mean(p),p_sd = sd(p))

这里的R值和p值都是NA，因为通过随机矩阵并没有差异，所以距离矩阵都是0，使得两个模拟数据不存在相关或者差异。

result## R_mean R_sd p_mean p_sd ## 1 NA NA NA NA

根际互作生物学研究室简介

根际互作生物学研究室是沈其荣教授土壤微生物与有机肥团队下的一个关注于根际互作的研究小组。本小组由袁军副教授带领，主要关注：1.植物和微生物互作在抗病过程中的作用；2 环境微生物大数据整合研究；3 环境代谢组及其与微生物过程研究体系开发和应用。团队在过去三年中在 isme J， Microbiome， PCE，SBB，Horticulture Research等期刊上发表了多篇文章。欢迎关注微生信生物公众号对本研究小组进行了解。

团队工作及其成果（点击查看）

了解交流合作

团队成员邮箱袁军：unyuan@njau.edu.cn；文涛：2018203048@njau.edu.cn

硬核干货：如果样本量不一一样多，或者不是一一对应关系，如何做差异？相关？

写在前面

参考原文 ismej Metabolic regulation of the maize rhizobiome by benzoxazinoids

构建随机OTU矩阵

构建代谢组随机矩阵

提取各自的重复数量

准备运行参数

根际互作生物学研究室 简介

团队工作及其成果 （点击查看）

了解 交流 合作

相关推荐

根际互作生物学研究室简介

团队工作及其成果（点击查看）

了解交流合作