Load in the data

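The ability to simultaneously measure multiple types of data from the same cell, known as multimodal analysis, represents a new and exciting frontier for single-cell genomics. For example, CITE-seq enables the simultaneous measurement of the transcriptome and cell-surface proteins from the same cell. Other exciting technologies, such as those from 10x Genomics, allow paired measurement of scRNA-seq and scATAC-seq. Seurat 4.0 is designed to seamlessly store, analyze, and explore diverse multimodal single-cell datasets.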

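Here, we analyze a dataset of 8,617 cord blood mononuclear cells (CBMCs), in which transcriptomic measurements are paired with abundance estimates for 13 surface proteins, whose levels are quantified with DNA-barcoded antibodies. First, we load in two count matrices: one for the RNA measurements and one for the antibody-derived tags (ADT). You can download the ADT file^[1] and the RNA file^[2] here.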

library(Seurat)
library(ggplot2)
library(patchwork)

# Load in the RNA UMI matrix
# Note that this dataset also contains ~5% of mouse cells, which we can use as negative controls
# for the protein measurements. For this reason, the gene expression matrix has HUMAN_ or MOUSE_
# appended to the beginning of each gene.
cbmc.rna <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz", sep = ",",
    header = TRUE, row.names = 1))

# To make life a bit easier going forward, we're going to discard all but the top 100 most
# highly expressed mouse genes, and remove the 'HUMAN_' prefix from the human gene names
cbmc.rna <- CollapseSpeciesExpressionMatrix(cbmc.rna)
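
What this collapse step does can be pictured with a small base-R sketch of the behaviour the comments describe: keep every `HUMAN_` gene, keep only the mouse genes with the largest total counts, and strip the `HUMAN_` prefix. `collapse_species_sketch()` and the toy matrix are hypothetical; this is an illustration, not Seurat's source code, and the real `CollapseSpeciesExpressionMatrix()` may treat edge cases (for example genes carrying neither prefix) differently.

```r
# Hypothetical helper illustrating the idea behind CollapseSpeciesExpressionMatrix();
# not Seurat's implementation.
collapse_species_sketch <- function(counts, prefix = "HUMAN_", controls = "MOUSE_",
                                    n_controls = 100) {
  genes <- rownames(counts)
  primary <- grep(paste0("^", prefix), genes, value = TRUE)
  control <- grep(paste0("^", controls), genes, value = TRUE)
  # rank control (mouse) genes by their total counts across all cells, keep the top ones
  control_totals <- rowSums(counts[control, , drop = FALSE])
  top_control <- names(head(sort(control_totals, decreasing = TRUE), n_controls))
  kept <- counts[c(primary, top_control), , drop = FALSE]
  # strip the species prefix; only the primary (human) genes carry it
  rownames(kept) <- sub(paste0("^", prefix), "", rownames(kept))
  kept
}

# Toy matrix: 3 human genes, 3 mouse genes, 2 cells (invented values)
toy <- matrix(c(5, 0, 2, 9, 1, 4,
                3, 1, 0, 7, 0, 2),
              nrow = 6,
              dimnames = list(c("HUMAN_CD3E", "HUMAN_CD19", "HUMAN_CD14",
                                "MOUSE_Actb", "MOUSE_Cd3e", "MOUSE_Gapdh"),
                              c("cell1", "cell2")))
collapsed <- collapse_species_sketch(toy, n_controls = 2)
rownames(collapsed)  # CD3E, CD19, CD14, MOUSE_Actb, MOUSE_Gapdh
```

With `n_controls = 2`, the least-expressed mouse gene (`MOUSE_Cd3e`, 1 count in total) is dropped, while all human genes survive.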

# Load in the ADT UMI matrix
cbmc.adt <- as.sparse(read.csv(file = "../data/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz", sep = ",",
    header = TRUE, row.names = 1))

# Note that since measurements were made in the same cells, the two matrices have identical
# column names
all.equal(colnames(cbmc.rna), colnames(cbmc.adt))
## [1] TRUE
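
The loading code relies on the layout of the CSV files: features (genes or antibodies) are rows, cells are columns, and both files use the same cell barcodes as column names. A minimal sketch of that layout and of the column-name check, using small in-memory CSV strings with made-up barcodes and values, and `Matrix::Matrix(..., sparse = TRUE)` in place of Seurat's `as.sparse()`:

```r
library(Matrix)  # recommended package shipped with R; provides sparse matrices

# Toy CSV text in the same layout as the GEO files: first column = feature name,
# remaining columns = cell barcodes (all values invented for illustration)
rna_csv <- "gene,AAAC,AAAG,AAAT
HUMAN_CD3E,4,0,1
HUMAN_CD19,0,3,0
MOUSE_Actb,0,0,7"
adt_csv <- "ab,AAAC,AAAG,AAAT
CD3,120,8,15
CD19,5,98,3"

# Read one CSV into a sparse features x cells matrix
read_counts <- function(text) {
  df <- read.csv(text = text, header = TRUE, row.names = 1)
  Matrix(as.matrix(df), sparse = TRUE)
}

rna <- read_counts(rna_csv)
adt <- read_counts(adt_csv)

# Both modalities were measured in the same cells, so the column names must match
# exactly (same barcodes, same order) before ADT can be stored as a second assay
stopifnot(identical(colnames(rna), colnames(adt)))
dim(rna)
```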

Set up a Seurat object, add the RNA and protein data

Now we create a Seurat object, and add the ADT data as a second assay.

# creates a Seurat object based on the scRNA-seq data
cbmc <- CreateSeuratObject(counts = cbmc.rna)

# We can see that by default, the cbmc object contains an assay storing RNA measurements
Assays(cbmc)
## [1] "RNA"

# create a new assay to store ADT information
adt_assay <- CreateAssayObject(counts = cbmc.adt)

# add this assay to the previously created Seurat object
cbmc[["ADT"]] <- adt_assay

# Validate that the object now contains multiple assays
Assays(cbmc)
## [1] "RNA" "ADT"

# Extract a list of features measured in the ADT assay
rownames(cbmc[["ADT"]])
##  [1] "CD3"    "CD4"    "CD8"    "CD45RA" "CD56"   "CD16"   "CD10"   "CD11c" 
##  [9] "CD14"   "CD19"   "CD34"   "CCR5"   "CCR7"

# Note that we can easily switch back and forth between the two assays to specify the default
# for visualization and analysis

# List the current default assay
DefaultAssay(cbmc)
## [1] "RNA"

# Switch the default to ADT
DefaultAssay(cbmc) <- "ADT"
DefaultAssay(cbmc)
## [1] "ADT"

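Cluster cells on the basis of their scRNA-seq profiles

The steps below represent a quick clustering of the CBMCs based on the scRNA-seq data. For more detail on individual steps or more advanced options, see our PBMC clustering guided tutorial^[3].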

# Note that all operations below are performed on the RNA assay
# Set and verify that the default assay is RNA
DefaultAssay(cbmc) <- "RNA"
DefaultAssay(cbmc)
## [1] "RNA"

# perform visualization and clustering steps
cbmc <- NormalizeData(cbmc)
cbmc <- FindVariableFeatures(cbmc)
cbmc <- ScaleData(cbmc)
cbmc <- RunPCA(cbmc, verbose = FALSE)
cbmc <- FindNeighbors(cbmc, dims = 1:30)
cbmc <- FindClusters(cbmc, resolution = 0.8, verbose = FALSE)
cbmc <- RunUMAP(cbmc, dims = 1:30)
DimPlot(cbmc, label = TRUE)
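
With its default settings, `NormalizeData()` applies Seurat's "LogNormalize" method: each cell's counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default), and transformed with the natural-log `log1p`. A base-R sketch of that formula on a toy dense matrix; `log_normalize_sketch()` is a hypothetical name, and Seurat's own code works on sparse matrices:

```r
# Hypothetical re-implementation of the LogNormalize formula, for illustration only
log_normalize_sketch <- function(counts, scale_factor = 1e4) {
  # divide every column (cell) by its total counts, rescale, then take log(1 + x)
  per_cell <- sweep(counts, MARGIN = 2, STATS = colSums(counts), FUN = "/")
  log1p(per_cell * scale_factor)
}

# Toy features x cells matrix (invented values)
counts <- matrix(c(10, 0, 90,
                   1, 1, 2),
                 nrow = 3,
                 dimnames = list(c("CD3E", "CD19", "CD14"), c("cell1", "cell2")))
norm <- log_normalize_sketch(counts)
round(norm, 3)
```

Because every cell is rescaled to the same total, `cell2` (only 4 counts) ends up on the same scale as `cell1` (100 counts).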

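Visualize multiple modalities side-by-side

Now that we have obtained clusters from the scRNA-seq profiles, we can visualize the expression of either protein or RNA molecules in our dataset. Importantly, Seurat provides a couple of ways to switch between modalities and to specify which modality you are interested in analyzing or visualizing. This is particularly important because, in some cases, the same feature can be present in multiple modalities: for example, this dataset contains independent measurements of the B cell marker CD19 at both the protein and the RNA level.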

# Normalize ADT data
DefaultAssay(cbmc) <- "ADT"
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2)
DefaultAssay(cbmc) <- "RNA"

# Note that the following command is an alternative but returns the same result
cbmc <- NormalizeData(cbmc, normalization.method = "CLR", margin = 2, assay = "ADT")
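
CLR (centered log-ratio) normalization divides each count by a geometric mean before taking a log, which suits ADT counts whose scale differs from antibody to antibody. Below is a sketch of the formula as we understand Seurat applies it, `log1p(x / exp(mean(log1p(x))))`. It assumes that `margin = 2` means the formula is applied separately to each column (cell) of the features x cells matrix and `margin = 1` to each row (feature); the helper name is hypothetical, and Seurat's internals (sparse input, edge cases) may differ.

```r
# Hypothetical helper; assumes margin = 2 applies the formula to each column (cell)
# and margin = 1 to each row (feature) of a features x cells matrix.
clr_sketch <- function(counts, margin = 2) {
  clr <- function(x) {
    # exp(mean(log1p(x))) is the geometric mean of (x + 1) across the vector
    log1p(x / exp(mean(log1p(x))))
  }
  out <- apply(counts, MARGIN = margin, FUN = clr)
  # apply() stores each result as a column, so row-wise results must be transposed back
  if (margin == 1) t(out) else out
}

# Toy ADT counts: 3 proteins x 2 cells (invented values)
adt <- matrix(c(100, 10, 1,
                50, 50, 0),
              nrow = 3,
              dimnames = list(c("CD3", "CD19", "CD14"), c("cell1", "cell2")))
adt_clr <- clr_sketch(adt, margin = 2)
round(adt_clr, 3)
```

A zero count stays at 0 after the transform, and proteins with equal counts in the same cell receive equal CLR values.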
# Now, we will visualize CD19 levels for RNA and protein. By setting the default assay, we can
# visualize one or the other
DefaultAssay(cbmc) <- "ADT"
p1 <- FeaturePlot(cbmc, "CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
DefaultAssay(cbmc) <- "RNA"
p2 <- FeaturePlot(cbmc, "CD19") + ggtitle("CD19 RNA")

# place plots side-by-side
p1 | p2

# Alternately, we can use specific assay keys to specify a specific modality
# Identify the key for the RNA and protein assays
Key(cbmc[["RNA"]])
## [1] "rna_"
Key(cbmc[["ADT"]])
## [1] "adt_"

# Now, we can include the key in the feature name, which overrides the default assay
p1 <- FeaturePlot(cbmc, "adt_CD19", cols = c("lightgrey", "darkgreen")) + ggtitle("CD19 protein")
p2 <- FeaturePlot(cbmc, "rna_CD19") + ggtitle("CD19 RNA")
p1 | p2
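
The key mechanism can be pictured as a lookup rule: if a feature name starts with one of the assay keys (here `rna_` or `adt_`), the prefix selects the assay; otherwise the default assay is used. The sketch below models a Seurat object as a plain named list of matrices; the `keys` vector and `fetch_feature()` are hypothetical stand-ins written only to illustrate this rule, not Seurat's internals (in Seurat itself, functions such as `FetchData()` accept keyed names like `adt_CD19`).

```r
# Toy "object": one features x cells matrix per assay (invented values)
assays <- list(
  RNA = matrix(c(0, 3, 1, 0), nrow = 2,
               dimnames = list(c("CD19", "CD3E"), c("cell1", "cell2"))),
  ADT = matrix(c(250, 4, 12, 300), nrow = 2,
               dimnames = list(c("CD19", "CD3"), c("cell1", "cell2")))
)
keys <- c(RNA = "rna_", ADT = "adt_")  # hypothetical key table mirroring Key()

# Hypothetical lookup: an explicit key wins, otherwise fall back to the default assay
fetch_feature <- function(assays, keys, feature, default_assay = "RNA") {
  for (assay in names(keys)) {
    key <- keys[[assay]]
    if (startsWith(feature, key)) {
      return(assays[[assay]][substring(feature, nchar(key) + 1), ])
    }
  }
  assays[[default_assay]][feature, ]
}

fetch_feature(assays, keys, "adt_CD19")                     # protein: 250, 12
fetch_feature(assays, keys, "CD19")                         # RNA (the default): 0, 1
fetch_feature(assays, keys, "CD19", default_assay = "ADT")  # protein: 250, 12
```

One consequence of a prefix rule like this is that a feature whose own name happens to begin with a key would be misread, which is one reason keys end in an underscore.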

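Identify cell surface markers for scRNA-seq clusters

We can leverage our paired CITE-seq measurements to help annotate clusters derived from scRNA-seq, and to identify both protein and RNA markers.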

# as we know that CD19 is a B cell marker, we can identify cluster 6 as expressing CD19 on the
# surface
VlnPlot(cbmc, "adt_CD19")

# we can also identify alternative protein and RNA markers for this cluster through differential
# expression (cluster numbering can change between runs, so set ident.1 to the cluster that is
# CD19+ in your own violin plot)
adt_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "ADT")
rna_markers <- FindMarkers(cbmc, ident.1 = 5, assay = "RNA")


head(adt_markers)
##                p_val avg_log2FC pct.1 pct.2     p_val_adj
## CD10   1.161293e-206  0.4512418     1     1 1.509680e-205
## CCR7   2.052649e-189  0.2835441     1     1 2.668443e-188
## CD34   9.647958e-188  0.4379917     1     1 1.254234e-186
## CCR5   4.601039e-150  0.2871257     1     1 5.981350e-149
## CD45RA  6.699498e-86 -2.2198583     1     1  8.709348e-85
## CD14    3.093576e-62 -0.7499958     1     1  4.021649e-61
head(rna_markers)
##               p_val avg_log2FC pct.1 pct.2 p_val_adj
## AC109351.1        0  0.3203893 0.265 0.005         0
## CTD-2090I13.1     0  2.0024376 0.972 0.062         0
## DCAF5             0  0.6637418 0.619 0.055         0
## DYNLL2            0  2.0387603 0.984 0.094         0
## FAM186B           0  0.3000479 0.244 0.002         0
## HIST2H2AB         0  1.3104432 0.812 0.013         0
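
The columns of these tables can be reproduced in miniature. `FindMarkers()` uses a Wilcoxon rank-sum test by default; `pct.1` and `pct.2` are the fractions of cells inside and outside the cluster with a non-zero value; and `p_val_adj` is a Bonferroni correction over all features in the assay. The ADT table is consistent with this: each `p_val_adj` equals `p_val` x 13, the number of ADT features (1.161293e-206 x 13 ≈ 1.509680e-205). A base-R sketch for a single feature using `wilcox.test()` and `p.adjust()`; the helper name and Poisson toy data are hypothetical, and `avg_log2FC` is left out because Seurat computes it in its own way.

```r
# Hypothetical single-feature marker test mirroring the columns of FindMarkers() output
marker_test_sketch <- function(values, in_cluster, n_features) {
  x <- values[in_cluster]   # cells in the cluster of interest (ident.1)
  y <- values[!in_cluster]  # all remaining cells
  p <- wilcox.test(x, y)$p.value
  data.frame(
    p_val = p,
    pct.1 = mean(x > 0),  # fraction of cluster cells with a non-zero value
    pct.2 = mean(y > 0),  # same fraction among all other cells
    # Bonferroni correction over every feature in the assay, capped at 1
    p_val_adj = p.adjust(p, method = "bonferroni", n = n_features)
  )
}

set.seed(1)
cd19 <- c(rpois(20, lambda = 40), rpois(80, lambda = 2))  # toy CD19 protein counts
in_b_cluster <- rep(c(TRUE, FALSE), times = c(20, 80))
res <- marker_test_sketch(cd19, in_b_cluster, n_features = 13)
res
```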

Additional visualizations of multimodal data

# Draw ADT scatter plots (like biaxial plots for FACS). Note that you can even 'gate' cells if
# desired by using HoverLocator and FeatureLocator
FeatureScatter(cbmc, feature1 = "adt_CD19", feature2 = "adt_CD3")
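
The interactive tools mentioned above select cells by drawing on the plot. A common non-interactive alternative, swapped in here for illustration, is a fixed-threshold gate: a boolean filter on two normalized ADT values. The cut-offs (1.5) and the toy values below are arbitrary assumptions, not recommended gates.

```r
# Toy CLR-normalized protein values for six cells (invented numbers)
adt_cd19 <- c(c1 = 2.8, c2 = 0.2, c3 = 3.1, c4 = 0.1, c5 = 2.0, c6 = 0.3)
adt_cd3  <- c(c1 = 0.3, c2 = 2.9, c3 = 0.2, c4 = 3.3, c5 = 2.1, c6 = 0.1)

# Hypothetical thresholds; in practice they would be read off the biaxial scatter plot
gate_cd19 <- 1.5
gate_cd3  <- 1.5

# CD19+ CD3- quadrant, analogous to a B-cell gate on a FACS biaxial plot
b_like <- names(adt_cd19)[adt_cd19 > gate_cd19 & adt_cd3 < gate_cd3]
# CD19- CD3+ quadrant, analogous to a T-cell gate
t_like <- names(adt_cd19)[adt_cd19 < gate_cd19 & adt_cd3 > gate_cd3]
b_like  # "c1" "c3"
t_like  # "c2" "c4"
```

Cell `c5` is double-positive and `c6` double-negative, so neither lands in these two gates. In a real analysis, the selected barcodes could be passed to Seurat's `subset(cbmc, cells = b_like)`.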

# view relationship between protein and RNA
FeatureScatter(cbmc, feature1 = "adt_CD3", feature2 = "rna_CD3E")

FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8")

# Let's look at the raw (non-normalized) ADT counts. You can see the values are quite high,
# particularly in comparison to RNA values. This is due to the significantly higher protein copy
# number in cells, which significantly reduces 'drop-out' in ADT data
FeatureScatter(cbmc, feature1 = "adt_CD4", feature2 = "adt_CD8", slot = "counts")
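
The drop-out claim can be checked numerically by comparing, for each feature, the fraction of cells with a zero count in each modality, and the protein/RNA relationship by a rank correlation. A sketch with invented toy counts for five cells; in the real data you would pass raw count vectors taken from, for example, `GetAssayData(cbmc, assay = "ADT", slot = "counts")` in Seurat 4. The helper name is hypothetical.

```r
# Toy raw counts for the same five cells (invented numbers)
rna_cd3e <- c(0, 2, 0, 1, 0)
adt_cd3  <- c(310, 402, 12, 288, 9)

# Hypothetical helper: share of cells in which the feature was not detected
zero_fraction <- function(x) mean(x == 0)

zero_fraction(rna_cd3e)  # 0.6: CD3E mRNA missed in 3 of 5 cells
zero_fraction(adt_cd3)   # 0: the protein is detected in every cell

# Rank-based agreement between modalities, robust to their very different scales
cor(adt_cd3, rna_cd3e, method = "spearman")
```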

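Loading data from 10x Genomics multimodal experiments

Seurat is also able to analyze data from multimodal 10x Genomics experiments processed using CellRanger v3. As an example, we recreate the plots above using a dataset of 7,900 peripheral blood mononuclear cells (PBMCs), freely available from 10x Genomics here^[4].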

pbmc10k.data <- Read10X(data.dir = "../data/pbmc10k/filtered_feature_bc_matrix/")
rownames(x = pbmc10k.data[["Antibody Capture"]]) <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "",
    x = rownames(x = pbmc10k.data[["Antibody Capture"]]))
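
The `gsub()` call above strips the antibody-name suffixes so that features can be referred to as `adt_CD3`, `adt_CD8a`, and so on. Note that `[control_]*` is a character class: it matches any run of the characters c, o, n, t, r, l and underscore (which covers `control_`), not the literal word. A quick check on illustrative names; these names are examples chosen for the demonstration, not read from the real file.

```r
ab_names <- c("CD3_TotalSeqB", "CD8a_TotalSeqB", "IgG1_control_TotalSeqB")
cleaned <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = ab_names)
cleaned  # "CD3" "CD8a" "IgG1"

# Because it is a character class, another spelling built from the same letters also matches
ctrl_cleaned <- gsub(pattern = "_[control_]*TotalSeqB", replacement = "", x = "CD4_ctrl_TotalSeqB")
ctrl_cleaned  # "CD4"
```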


pbmc10k <- CreateSeuratObject(counts = pbmc10k.data[["Gene Expression"]], min.cells = 3, min.features = 200)

pbmc10k <- NormalizeData(pbmc10k)

pbmc10k[["ADT"]] <- CreateAssayObject(pbmc10k.data[["Antibody Capture"]][, colnames(x = pbmc10k)])
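
`min.cells = 3` and `min.features = 200` filter only the RNA data, dropping rarely detected genes and low-complexity cells. This is why the ADT matrix is subset with `[, colnames(x = pbmc10k)]` before being added: every assay in the object must describe the same cells. A base-R sketch of that bookkeeping on a toy matrix with smaller thresholds; the order used here (cells filtered first, then genes) is an assumption of this sketch.

```r
set.seed(42)
# Toy RNA counts: 50 genes x 6 cells; cell "c6" is made nearly empty on purpose
rna <- matrix(rpois(300, lambda = 1), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("c", 1:6)))
rna[, "c6"] <- 0
rna[1, "c6"] <- 1
# Toy ADT counts for the same six cells
adt <- matrix(rpois(12, lambda = 50), nrow = 2,
              dimnames = list(c("CD3", "CD19"), paste0("c", 1:6)))

min_features <- 10  # keep cells with at least this many detected genes
min_cells <- 2      # keep genes detected in at least this many of the kept cells

keep_cells <- colSums(rna > 0) >= min_features
rna_f <- rna[, keep_cells, drop = FALSE]
keep_genes <- rowSums(rna_f > 0) >= min_cells
rna_f <- rna_f[keep_genes, , drop = FALSE]

# Align the ADT matrix to the cells that survived RNA filtering, in the same order
adt_f <- adt[, colnames(rna_f), drop = FALSE]
colnames(adt_f)
```

Indexing by `colnames()` rather than by position keeps the two modalities matched even if the filter removes cells from the middle of the matrix.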

pbmc10k <- NormalizeData(pbmc10k, assay = "ADT", normalization.method = "CLR")

plot1 <- FeatureScatter(pbmc10k, feature1 = "adt_CD19", feature2 = "adt_CD3", pt.size = 1)
plot2 <- FeatureScatter(pbmc10k, feature1 = "adt_CD4", feature2 = "adt_CD8a", pt.size = 1)
plot3 <- FeatureScatter(pbmc10k, feature1 = "adt_CD3", feature2 = "CD3E", pt.size = 1)
(plot1 + plot2 + plot3) & NoLegend()

Links

[1] ADT file: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz

[2] RNA file: ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz

[3] Tutorial: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html

[4] 10x Genomics dataset: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_10k_protein_v3

Joint analysis of multimodal data