Seurat3 Tutorial: MDS as a Custom Dimensional Reduction Method / 开普饭

Seurat - Dimensional Reduction Vignette

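As we know, a defining characteristic of single-cell transcriptome data is that it is sparse and high-dimensional. For this reason, Seurat provides several dimensional reduction methods, chiefly PCA, t-SNE, and UMAP. In practice, of course, there are many more dimensional reduction methods than these three.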

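So what do we do if we want to apply a different dimensional reduction method to our data? Today we walk through adding multidimensional scaling (MDS) to a Seurat object. MDS looks for a low-dimensional representation in which the distances between samples in the original space are preserved. We use it here to explore the general approach to dimensional reduction and to get to know Seurat's data structures better.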
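To make the "preserve the distances" idea concrete, here is a minimal sketch in base R. It uses made-up toy data (not scRNA-seq data) and the `cmdscale` function that this tutorial relies on later; the variable names are only for illustration.

```r
# Toy data: 6 "samples" described by 3 features (random numbers, for illustration only)
set.seed(42)
x <- matrix(rnorm(6 * 3), nrow = 6, dimnames = list(paste0("sample", 1:6), NULL))

# Pairwise Euclidean distances in the original 3-dimensional space
d_orig <- dist(x)

# Classical MDS down to 2 dimensions, and the distances in that reduced space
mds_2d <- cmdscale(d_orig, k = 2)
d_2d <- dist(mds_2d)

# With k equal to the original dimensionality, the distances are recovered
# up to floating-point error
mds_3d <- cmdscale(d_orig, k = 3)
d_3d <- dist(mds_3d)

max_err_2d <- max(abs(d_orig - d_2d))  # some distortion is expected here
max_err_3d <- max(abs(d_orig - d_3d))  # numerically zero
```

Dropping to two dimensions loses some information, so `max_err_2d` is generally larger than `max_err_3d`; MDS simply keeps that distortion as small as it can for the chosen `k`.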

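What, you still haven't got your head around PCA, t-SNE, and UMAP, and now you want to know what MDS means? Take a look at the notes 运来哥 wrote during his previous love affair:

Numerical Ecology Notes || Unconstrained Ordination | NMDS [1]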

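Dimensional Reduction Structure in Seurat3

In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction procedure is stored as a DimReduc object in the object@reductions slot, as an element of a named list. These reductions can be accessed with the [[ operator, using the name of the desired reduction. For example, after running a principal component analysis with RunPCA, object[['pca']] will contain the PCA results. By adding new elements to the list, users can add additional, custom dimensional reductions. Each stored dimensional reduction contains the following slots: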

cell.embeddings: Stores the coordinates for each cell in low-dimensional space.
feature.loadings: Stores the weight for each feature along each dimension of the embedding.
feature.loadings.projected: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes), and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. Note that the cell embeddings remain unchanged after projection, but there are now feature loadings for all features.
stdev: The standard deviations of each dimension. Most often used with PCA (storing the square roots of the eigenvalues of the covariance matrix) and can be useful when looking at the drop-off in the amount of variance that is explained by each successive dimension.
key: Sets the column names for the cell.embeddings and feature.loadings matrices. For example, for PCA, the column names are PC_1, PC_2, etc., so the key is "PC_".
jackstraw: Stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
misc: Bonus slot to store any other information you might want
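The description of stdev above (square roots of the eigenvalues of the covariance matrix) can be checked on a small example. This is only a sketch in base R using `prcomp` on random toy data; it illustrates the relationship itself and is not meant to reproduce how Seurat's RunPCA computes its values.

```r
# Toy data: 50 observations x 4 features (random numbers, for illustration only)
set.seed(1)
x <- matrix(rnorm(50 * 4), nrow = 50)

# PCA with base R; prcomp centers the columns by default
pca <- prcomp(x)

# Eigenvalues of the sample covariance matrix, largest first
eig <- eigen(cov(x), symmetric = TRUE)$values
sd_from_eigen <- sqrt(eig)

# Proportion of variance explained by each successive component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(all.equal(pca$sdev, sd_from_eigen))
```

Plotting `var_explained` against the component index gives the familiar "elbow" view of how quickly the explained variance drops off.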

To access these slots, we provide the Embeddings, Loadings, and Stdev functions:

    library(Seurat)
    pbmc_small[["pca"]]

    A dimensional reduction object with key PC_
     Number of dimensions: 19
     Projected dimensional reduction calculated: TRUE
     Jackstraw run: TRUE
     Computed using assay: RNA

Let's take a look using the corresponding functions:

    > head(Embeddings(pbmc_small, reduction = "pca")[, 1:5])  # PCA coordinates of each cell
                          PC_1       PC_2       PC_3      PC_4       PC_5
    ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
    CATGGCCTGTGCAT -0.02602702 -0.3466795  0.6651668 0.4182900  0.5853204
    GAACCTGATGAACC -0.45650250  0.1795811  1.3175907 2.0137210 -0.4818851
    TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
    AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
    TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
    > head(Loadings(pbmc_small, reduction = "pca")[, 1:5])  # loading of each gene on each principal component
                  PC_1        PC_2        PC_3        PC_4         PC_5
    PPBP    0.33832535  0.04095778  0.02926261  0.03111034 -0.090420744
    IGLL5  -0.03504289  0.05815335 -0.29906272  0.54744454  0.214603428
    VDAC3   0.11990482 -0.10994433 -0.02386025  0.06015126 -0.809207588
    CD1C   -0.04690284  0.19835522 -0.35090617 -0.51112169 -0.130306281
    AKR1C3 -0.03894635 -0.42880452  0.08845847 -0.27274386  0.087791646
    PF4     0.34392057  0.02474860 -0.02519515 -0.01231411 -0.006725932
    > head(Stdev(pbmc_small, reduction = "pca"))  # standard deviations
    [1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531

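Seurat provides RunPCA (pca) and RunTSNE (tsne), which represent dimensional reduction techniques commonly applied to scRNA-seq data. When you use these functions, all slots are filled automatically.

We also allow users to add the results of a custom dimensional reduction technique that is computed separately (for example, multidimensional scaling (MDS) or zero-inflated factor analysis). All you need is a matrix containing the coordinates of each cell in low-dimensional space, as shown below.

Storing a Custom Dimensional Reduction Calculation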

Classical (Metric) Multidimensional Scaling
Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).
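For readers curious about what classical MDS does under the hood, the textbook recipe is: square the distances, double-center them, and take the leading eigenvectors scaled by the square roots of their eigenvalues. The sketch below follows that recipe in base R. `classical_mds` is a hypothetical helper written only for this illustration (it assumes the first `k` eigenvalues are positive); in practice you would simply call `cmdscale`.

```r
# Hypothetical helper: classical (metric) MDS from a distance object.
# Assumes the leading k eigenvalues are positive, as they are for Euclidean distances.
classical_mds <- function(d, k = 2) {
  d2 <- as.matrix(d)^2                      # squared pairwise distances
  n <- nrow(d2)
  j <- diag(n) - matrix(1 / n, n, n)        # centering matrix
  b <- -0.5 * j %*% d2 %*% j                # double-centered matrix
  e <- eigen(b, symmetric = TRUE)           # eigenvalues in decreasing order
  coords <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  rownames(coords) <- rownames(d2)
  coords
}

# Toy data: 8 "cells" x 5 features (random numbers, for illustration only)
set.seed(7)
x <- matrix(rnorm(8 * 5), nrow = 8, dimnames = list(paste0("cell", 1:8), NULL))
d <- dist(x)

manual <- classical_mds(d, k = 2)
ref <- cmdscale(d, k = 2)

# Each axis is only defined up to sign, so compare absolute values
max_diff <- max(abs(abs(manual) - abs(ref)))
```

On this toy input the two results should agree up to the sign of each axis, which is arbitrary in any eigenvector-based method.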

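Although it is not part of the Seurat package, multidimensional scaling (MDS) is easy to run in R. If you are interested in running MDS and storing the output in your Seurat object: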

    # Before running MDS, we first calculate a distance matrix between all pairs of cells. Here we
    # use a simple euclidean distance metric on all genes, using scale.data as input
    d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
    # Run the MDS procedure, k determines the number of dimensions
    mds <- cmdscale(d = d, k = 2)

    head(mds)
                         [,1]       [,2]
    ATGCCAGAACGACT 0.77403708 -0.8996461
    CATGGCCTGTGCAT 0.02602702 -0.3466795
    GAACCTGATGAACC 0.45650250  0.1795811
    TGACTGGATTCTCA 0.81163243 -1.3795340
    AGTCAGACTGCACA 0.77403708 -0.8996461
    TCTGATACACGTGT 0.77403708 -0.8996461

    # cmdscale returns the cell embeddings, we first label the columns to ensure downstream
    # consistency
    colnames(mds) <- paste0("MDS_", 1:2)
    # We will now store this as a custom dimensional reduction called 'mds'
    pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))

    pbmc_small
    An object of class Seurat
    230 features across 80 samples within 1 assay
    Active assay: RNA (230 features)
     3 dimensional reductions calculated: pca, tsne, mds

Our object now contains the mds reduction. Let's visualize it just as we would pca, tsne, or umap:

    # We can now use this as you would any other dimensional reduction in all downstream functions
    DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)

    pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
    MDS_ 1
    Positive:  HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3
           PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR
    Negative:  SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB
           VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1
    MDS_ 2
    Positive:  HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR
           PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY
    Negative:  GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP
           SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1
    Warning message:
    In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print,  :
      Only 2 dimensions have been computed.

    # Display the results as a heatmap
    DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)

    VlnPlot(pbmc_small, features = "MDS_1")

See how the MDS_1 dimension correlates with the PC_1 dimension:

    # See how the first MDS dimension is correlated with the first PC dimension
    FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")
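In the printed output earlier, the first six MDS_1 values equal the PC_1 values with the sign flipped, and the MDS_2 values equal PC_2. That is what one would expect whenever classical MDS is run on Euclidean distances of the same matrix that PCA was computed on: the two give the same coordinates up to the sign of each axis. Here is a minimal sketch of that relationship in base R, on random toy data and with `prcomp` standing in for Seurat's PCA (an assumption made purely for illustration).

```r
# Toy data: 30 "cells" x 6 features (random numbers, for illustration only)
set.seed(123)
x <- matrix(rnorm(30 * 6), nrow = 30)

# PCA scores (prcomp centers the columns by default)
pca_scores <- prcomp(x)$x[, 1:2]

# Classical MDS on Euclidean distances between the same rows
mds_coords <- cmdscale(dist(x), k = 2)

# Correlation between matching axes; expected to be +1 or -1 up to rounding
cor_dim1 <- cor(mds_coords[, 1], pca_scores[, 1])
cor_dim2 <- cor(mds_coords[, 2], pca_scores[, 2])
```

With a non-Euclidean distance (or a non-metric variant such as NMDS) the two embeddings would no longer coincide, which is where MDS starts to offer something PCA does not.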

    FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")

References

[1] Numerical Ecology Notes || Unconstrained Ordination | NMDS: https://www.jianshu.com/p/39021ec7d1dd
[2] Dimensional Reduction Vignette: https://satijalab.org/seurat/v3.0/dim_reduction_vignette.html
