为什么CD4阳性T细胞并不是表达CD4最多的 / 开普饭

大家已经跟着我们跑了很多次我们对官方 pbmc3k 例子，只需要自己按照如下所示链接下载 pbmc3k_filtered_gene_bc_matrices.tar.gz 并且解压即可，然后使用 Seurat 包里面的 Read10X 函数读取解压好的文件夹路径：

标准代码如下所示：

library(Seurat) # https://cf.10xgenomics.com/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz ## Load the PBMC dataset pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")


## Initialize the Seurat object with the raw (non-normalized data).

pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", 

                           min.cells = 3, min.features = 200)

## Identification of mithocondrial genes

pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
## Filtering cells following standard QC criteria.

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & 

                 percent.mt < 5)
## Normalizing the data

pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", 

                      scale.factor = 10000)
pbmc <- NormalizeData(pbmc)
## Identify the 2000 most highly variable genes

pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
## In addition we scale the data

all.genes <- rownames(pbmc)

pbmc <- ScaleData(pbmc, features = all.genes)

pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc), verbose = FALSE) pbmc <- FindNeighbors(pbmc, dims = 1:10, verbose = FALSE) pbmc <- FindClusters(pbmc, resolution = 0.5, verbose = FALSE) pbmc <- RunUMAP(pbmc, dims = 1:10, umap.method = "uwot", metric = "cosine") table(pbmc$seurat_clusters) # pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25, verbose = FALSE) DimPlot(pbmc, reduction = "umap", group.by = 'seurat_clusters', label = TRUE, pt.size = 0.5) DotPlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"), group.by = 'seurat_clusters') ## Assigning cell type identity to clusters new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", "CD8 T", "FCGR3A+ Mono", "NK", "DC", "Platelet") names(new.cluster.ids) <- levels(pbmc) pbmc <- RenameIdents(pbmc, new.cluster.ids) DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend() pbmc$cluster_by_counts=Idents(pbmc) table(pbmc$cluster_by_counts)

虽然是进行初步生物学命名，看起来合情合理：

初步生物学命名

但是我在检查CD4基因表达量的时候，发现了很有意思的现象：

各个细胞亚群，都是有CD4基因表达的

可以看到各个细胞亚群，都是有CD4基因表达的，我们虽然命名了 Naive CD4 T和Memory CD4 T"，但是它们并没有特异性的高表达CD4基因哦！

上面的可视化代码如下所示：

sce=pbmc sce$celltype=Idents(sce)

p1=FeaturePlot(sce,'CD4') p2=DimPlot(sce, reduction = "umap", label = TRUE, repel = T,pt.size = 0.5) + NoLegend() p3=VlnPlot(sce,'CD4',group.by = 'celltype') library(patchwork) p1+p2 p1+p3

这个时候有粉丝提问，能不能在第一幅图umap里面，加上第二幅图FeaturePlot看CD4基因表达信息。文献出处是：《IL-11 is a crucial determinant of cardiovascular fibrosis》

如下所示，可以看到作者其实就是想展现IL-11这个基因呢，在其中一个fibroblasts细胞亚群里面是表达量比较高！

其中一个fibroblasts细胞亚群里面是表达IL-11这个基因

我查了一下， Seurat 包里面确实没有这个函数，不过 Seurat 包绘制的图形都是ggplot体系，所以比较容易自定义。

其实上面的图就是在umap上面叠加FeaturePlot信息，我给出来的代码如下所示：

p2=DimPlot(sce, reduction = "umap", label = TRUE, repel = T,pt.size = 0.5) + NoLegend() pos=sce@reductions$umap@cell.embeddings pos=pos[sce@assays$RNA@counts['CD4',]>1,] head(pos) library(ggplot2) p2+geom_point(aes(x=UMAP_1,y=UMAP_2), shape = 21, colour = "black", fill = "blue", size = 0.5, data = as.data.frame(pos))

效果如下所示：

在umap上面叠加FeaturePlot信息

首先需要对seurat对象有所理解

代码并不多，首先需要对seurat对象有所理解！单细胞数据看起来种类很多，有CEL-seq、MARS-seq、Drop-seq、Chromium 10x和SMART-seq的fastq数据。但是最终都是得到表达量矩阵哦，大家通常是5个R包，分别是: scater,monocle,Seurat,scran,M3Drop，需要熟练掌握它们的对象，：一些单细胞转录组R包的对象而且分析流程也大同小异：

step1: 创建对象
step2: 质量控制
step3: 表达量的标准化和归一化
step4: 去除干扰因素(多个样本整合)
step5: 判断重要的基因
step6: 多种降维算法
step7: 可视化降维结果
step8: 多种聚类算法
step9: 聚类后找每个细胞亚群的标志基因
step10: 继续分类

如果你也对10x单细胞转录组感兴趣，参考我们的《明码标价》专栏里面的单细胞内容

单细胞转录组数据分析的标准降维聚类分群，并且进行生物学注释后的结果。可以参考前面的例子：人人都能学会的单细胞聚类分群注释，我们演示了第一层次的分群。

如果你对单细胞数据分析还没有基础认知，可以看基础10讲：

其次需要对ggplot语法有所了解

一张统计图就是从数据到几何对象（点、线、条形等）的图形属性（颜色、形状、大小等）的一个映射。

✦ 数据（Data），最基础的是可视化的数据和一系列图形映射（aesthetic mappings），该映射描述了数据中的变量如何映射到可见的图形属性。
✦ 几何对象（Geometric objects, geoms）代表在图中实际看到的点、线、多边形等。
✦ 统计转换（Statistical trassformations, stats）是对数据进行某种汇总，例如将数据分组创建直方图，或将一个二维的关系用线性模型进行解释。
✦ 标度（Scales）是将数据的取值映射到图形空间，例如用颜色、大小或形状来表示不同的取值，展现标度的常见做法是绘制图例和坐标轴。
✦ 坐标系（Coordinate system, coord）描述数据是如何映射到图形所在的平面，同时提供看图所需的坐标轴和网格线。
✦ 分面（faceting）如何将数据分解为子集，以及如何对子集作图并展示。
✦ 主题（theme）控制细节显示，例如字体大小和图形的背景色。

ggplot2作者亲自写的书

链接：https://ggplot2-book.org/facet.html

书名是：ggplot2: Elegant Graphics for Data Analysis 作者：Hadley Wickham

This is the online version of work-in-progress 3rd edition of “ggplot2: elegant graphics for data analysis”

虽然这本书有对应的中文译本，但是时间上相对滞后，建议直接看这个在线实时更新版本。

Getting started

1 Introduction
2 Getting started with ggplot2
3 Frequently asked questions
II Toolbox

Introduction

4 Individual geoms
5 Collective geoms
6 Statistical summaries
7 Maps
8 Annotations
9 Arranging plots

III The Grammar

10 Mastering the grammar
11 Build a plot layer by layer
12 Scales, axes and legends
13 Coordinate systems
14 Facetting
15 Themes

IV Extending ggplot2

16 Programming with ggplot2
17 ggplot2 internals
18 Writing ggplot2 extensions
19 Extension Case Study: Springs, Part 1
References

看完你一定会觉得不虚此行！至少花十天时间哦。

知识点参考卡片（速记表，小抄）

链接：https://ggplot2.tidyverse.org/reference/

内容如下：

Plot basics
Layer: geoms
Layer: stats
Layer: position adjustment
Layer: annotations
Aesthetics
Scales
Guides: axes and legends
Facetting
Facetting: labels
Coordinate systems
Themes
Programming with ggplot2
Extending ggplot2
Vector helpers
Data
Autoplot and fortify

读这个知识点参考卡片，可以检验你ggplot2语法的记忆程度。

sthda网站的ggplot核心图表示例

链接：http://www.sthda.com/english/wiki/ggplot2-essentials

书籍本身提供售卖，价格是17欧元，不过内容都是电子化了，大家直接网页浏览，就是免费的哈！

内容：

qplot(): Quick plot with ggplot2

Scatter plots
Bar plot
Box plot, violin plot and dot plot
Histogram and density plots

Box plots

Change box plot line colors
Change box plot fill colors
Basic box plots
Box plot with dots
Change box plot colors by groups
Change the legend position
Change the order of items in the legend
Box plot with multiple groups
Functions: geom_boxplot(), stat_boxplot(), stat_summary()

··· 中间省略 25个章节

Rotate a plot: flip and reverse

Horizontal plot : coord_flip()
Reverse y axis
Functions: coord_flip(), scale_x_reverse(), scale_y_reverse()

Faceting: split a plot into a matrix of panels

Facet with one variable
Facet with two variables
Facet scales
Facet labels
facet_wrap
Functions: facet_grid(), facet_wrap(), label_both(), label_bquote(), label_parsed()

内容之丰富，起码需要五天左右时间完全follow下来。

还包括以下扩展包：

factoextra - Extract and Visualize the outputs of a multivariate analysis: PCA (Principal Component Analysis), CA (Correspondence Analysis), MCA (Multiple Correspondence Analysis) and clustering analyses.
easyggplot2: Perform and customize easily a plot with ggplot2: box plot, dot plot, strip chart, violin plot, histogram, density plot, scatter plot, bar plot, line plot, etc, …
ggplot2 - Easy way to mix multiple graphs on the same page
ggplot2: Correlation matrix heatmap. Functions: geom_raster() and geom_tile()
ggfortify: Allow ggplot2 to handle some popular R packages. These include plotting 1) Matrix; 2) Linear Model and Generalized Linear Model; 3) Time Series; 4) PCA/Clustering; 5) Survival Curve; 6) Probability distribution
GGally: GGally extends ggplot2 for visualizing correlation matrix, scatterplot plot matrix, survival plot and more.
ggRandomForests: Graphical analysis of random forests with the randomForestSRC and ggplot2 packages.
ggdendro: Create dendrograms and tree diagrams using ggplot2
ggmcmc: Tools for Analyzing MCMC Simulations from Bayesian Inference
ggthemes: Package with additional ggplot2 themes and scales
Theme used to create journal ready figures easily

ggplot2之所以备受推崇，就是因为它已经成为了一个生态，层出不穷的新奇想法会在它的基础上面生长起来。

绘图菜谱

链接：http://www.cookbook-r.com/Graphs/

这个有中文翻译版本，务必直接下单购买，放在书桌旁边随时翻阅。

内容：

Bar and line graphs (ggplot2)
Plotting means and error bars (ggplot2)
Plotting distributions (ggplot2) - Histograms, density curves, boxplots
Scatterplots (ggplot2)
Titles (ggplot2)
Axes (ggplot2) - Control axis text, labels, and grid lines.
Legends (ggplot2)
Lines (ggplot2) - Add lines to a graph.
Facets (ggplot2) - Slice up data and graph the subsets together in a grid.
Multiple graphs on one page (ggplot2)
Colors (ggplot2)

学了那么多语法，就在菜谱里面把握细节吧！

最后一个是 https://stackoverflow.com/

你会发现，你想实现的各种稀奇古怪的绘图需求，只需要你能使用英文描述出来，就是能找到答案的！

用谷歌搜索来使用ggplot2做可视化（上）：https://mp.weixin.qq.com/s/WN4TSMNjH4b6vZgYVjaRvQ
用谷歌搜索来使用ggplot2做可视化（下）：https://mp.weixin.qq.com/s/_Q16zDZgCr3XoO0r3wqRkw

如果我说，全部学完，需要一年的时间，不知道你还是否愿意入坑呢？

不过，如果你是R语言都没有掌握好，那么可能需要先学习我给初学者的六步系统入门R语言，知识点路线图如下：

了解常量和变量概念
加减乘除等运算（计算器）
多种数据类型（数值，字符，逻辑，因子）
多种数据结构（向量，矩阵，数组，数据框，列表）
文件读取和写出
简单统计可视化
无限量函数学习

为什么CD4阳性T细胞并不是表达CD4最多的