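How to determine the number of PCs for cell clustering
Preparation
How the official Seurat tutorial determines the number of PCs (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html)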
library(Seurat)

load(file = 'Cluster_seurat.Rdata') # loads the object data.filt
seurat_data <- data.filt
Method 1: the DimHeatmap function
# Explore heatmaps of the PCs, six at a time
DimHeatmap(seurat_data, dims = 1:6, cells = 500, balanced = TRUE)
DimHeatmap(seurat_data, dims = 7:12, cells = 500, balanced = TRUE)
Method 2: the ElbowPlot function
# Plot the elbow plot for the first 30 PCs
ElbowPlot(object = seurat_data, ndims = 30)
Method 3: the JackStrawPlot function
# JackStraw is slow, especially on large datasets
seurat_data <- JackStraw(object = seurat_data, dims = 50)
seurat_data <- ScoreJackStraw(seurat_data, dims = 1:50)
JackStrawPlot(object = seurat_data, dims = 1:50)
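The JackStraw plot is read by eye, but the per-PC p-values can also be turned into a number. The following is a minimal sketch in base R, not part of the original workflow: `n_significant_pcs` is a hypothetical helper, the 0.05 cut-off is an assumption, and `toy_p` holds made-up p-values for illustration only. With a real object, the p-values should be available after ScoreJackStraw via `JS(seurat_data[["pca"]], slot = "overall")` (column `Score`); treat that accessor as an assumption to verify against your Seurat version.

```r
# Hypothetical helper: count the leading PCs whose JackStraw p-value is below
# alpha, stopping at the first PC that is not significant.
n_significant_pcs <- function(p_values, alpha = 0.05) {
  not_sig <- which(p_values >= alpha)
  if (length(not_sig) == 0) {
    return(length(p_values))  # every PC passed the cut-off
  }
  not_sig[1] - 1
}

# Made-up p-values for seven PCs (illustration only, not real results)
toy_p <- c(1e-50, 1e-30, 1e-12, 1e-4, 0.2, 0.01, 0.6)
n_significant_pcs(toy_p)
```

Stopping at the first non-significant PC is a conservative choice: PC 6 in the toy vector is below the cut-off but is not counted, because PC 5 already failed.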
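The three methods above only give a rough range for the number of PCs, and clustering results differ considerably depending on how many PCs are used, so a more specific number is needed. The author proposes three criteria for setting the PC threshold:
The cumulative contribution of the principal components is greater than 90%
The PC itself contributes less than 5% of the variation
The difference in variation between two consecutive PCs is less than 0.1%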
library(ggplot2)

# Determine the percent of variation associated with each PC
# (computed here from the PC standard deviations)
pct <- seurat_data[["pca"]]@stdev / sum(seurat_data[["pca"]]@stdev) * 100

# Calculate cumulative percents for each PC
cumu <- cumsum(pct)

# First PC where the cumulative percent exceeds 90% and the PC itself contributes less than 5%
co1 <- which(cumu > 90 & pct < 5)[1]
co1

# Difference in percent of variation between each PC and the next one;
# co2 is the PC right after the last drop of more than 0.1%
co2 <- sort(which((pct[1:(length(pct) - 1)] - pct[2:length(pct)]) > 0.1), decreasing = TRUE)[1] + 1
co2

# Use the minimum of the two values
pcs <- min(co1, co2)
pcs

# Create a data frame with the values
plot_df <- data.frame(pct = pct, cumu = cumu, rank = 1:length(pct))

# Elbow plot to visualize
ggplot(plot_df, aes(cumu, pct, label = rank, color = rank > pcs)) +
  geom_text() +
  geom_vline(xintercept = 90, color = "grey") +
  geom_hline(yintercept = min(pct[pct > 5]), color = "grey") +
  theme_bw()
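The calculation above can be wrapped in a function so it can be tried on any vector of PC standard deviations without a Seurat object. This is a minimal sketch in base R: `choose_pcs`, its argument names, and `toy_stdev` are hypothetical and chosen for illustration; the default cut-offs (90, 5, 0.1) are the ones used above.

```r
# Hypothetical helper: apply the cut-offs described above to a vector of
# PC standard deviations and return the suggested number of PCs.
choose_pcs <- function(stdev, cum_cut = 90, pct_cut = 5, diff_cut = 0.1) {
  pct  <- stdev / sum(stdev) * 100   # percent contributed by each PC
  cumu <- cumsum(pct)                # cumulative percent

  # First PC past the cumulative cut-off that itself contributes little
  co1 <- which(cumu > cum_cut & pct < pct_cut)[1]

  # PC right after the last drop larger than diff_cut
  drops <- pct[-length(pct)] - pct[-1]
  idx   <- which(drops > diff_cut)
  co2   <- if (length(idx) > 0) max(idx) + 1 else NA

  min(co1, co2, na.rm = TRUE)
}

# Made-up standard deviations for 20 PCs (illustration only)
toy_stdev <- c(10, 8, 6, 4, 2, rep(1, 15))
choose_pcs(toy_stdev)
```

For the toy vector the last drop larger than 0.1% is between PC 5 and PC 6, while the cumulative cut-off is only reached at PC 16, so the function returns 6. With a real object, `seurat_data[["pca"]]@stdev` would be passed in place of `toy_stdev`.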
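Inspect the highly variable genes associated with each PC. If known marker genes of a rare cell type appear in a particular PC, we can keep all PCs from 1 up to that PC.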
# Print the most variable genes driving each PC
print(x = seurat_data[["pca"]], dims = 1:25, nfeatures = 5)
PC_ 1
Positive: NEIL1, LTB, KLF2, TP53INP1, CD27
Negative: TYMS, MKI67, PCLAF, RRM2, NUSAP1
PC_ 2
Positive: GZMA, ARL4C, PRF1, CST7, GZMM
Negative: SLC35E3, ID3, PRDX1, TOP2B, RPLP0
PC_ 3
Positive: HBA2, HBB, HBA1, AHSP, HBD
Negative: RPS18, RPL18A, RPS2, RPSA, RPL37A
PC_ 4
Positive: IGLL1, SLC35E3, PCDH9, CD38, F13A1
Negative: CCL17, HMBS, BLVRB, AQP1, CD36
PC_ 5
Positive: GYPC, RPS18, RPS2, C1QTNF4, RPL18A
Negative: MNDA, LYZ, S100A9, S100A8, FCN1
PC_ 6
Positive: PLK1, CDC20, CENPA, HMMR, CENPE
Negative: GINS2, MCM6, HELLS, MCM4, MCM3
PC_ 7
Positive: GYPC, C1QTNF4, LIMS1, NRIP1, S100A9
Negative: SPIB, TAGLN2, MS4A1, IGLC6, PTPRC
PC_ 8
Positive: FCGR3A, GZMB, SPON2, KLRF1, MYOM2
Negative: CCR7, CD3G, CD3D, IL7R, GPR183
PC_ 9
Positive: CCL17, LTB, TMEM154, CCND2, HSPA12B
Negative: ACTG1, LGALS1, IGLL1, CCDC81, TOP2B
PC_ 10
Positive: AHNAK, VIM, EMP1, LMNA, CD27
Negative: MT1X, CCL17, FTL, HSP90B1, NSMCE1
PC_ 11
Positive: NEIL1, LTB, FTH1, CFD, CST3
Negative: LCN2, RETN, S100A8, LTF, CAMP
PC_ 12
Positive: RPS12, RPLP1, RPL18A, EEF1B2, RPS5
Negative: HNRNPU, NCL, AHNAK, AC245060.5, EMP1
PC_ 13
Positive: CD3D, TRAC, CD3G, IGLC6, CD27
Negative: MARCH1, MS4A1, BANK1, ADAM28, LINC02397
PC_ 14
Positive: SCIMP, SRGN, GUSB, SHISA2, MARCH1
Negative: MS4A1, ZNF608, ENAM, CCND2, CCL17
PC_ 15
Positive: ATF5, HSPA5, PSAT1, PHGDH, MARCH1
Negative: NT5E, GIMAP4, TP53INP1, SHISA2, DBI
PC_ 16
Positive: ACSM3, IGLC6, SHISA2, REXO2, MT1X
Negative: CD82, GCHFR, PRDX1, UBASH3B, PTGDR
PC_ 17
Positive: MARCKSL1, FTH1, S100A1, CRIP2, EMP2
Negative: HSP90B1, HSPA5, UBASH3B, PPIB, FKBP5
PC_ 18
Positive: MARCH1, H3F3A, CALM2, ACTB, PRDX1
Negative: HSP90B1, ATF5, HSPA5, MT-ND6, CANX
PC_ 19
Positive: TRGC2, LGALS1, KLRG1, CCL5, PTMS
Negative: CCR7, TXK, FCER1G, CD7, TCF7
PC_ 20
Positive: PIM1, SOCS3, ADGRE5, RGCC, EPHA4
Negative: LRMP, BANK1, MS4A1, CLEC4E, NME1
PC_ 21
Positive: CCR7, CMTM2, S100A11, LRMP, TXK
Negative: TRGC2, RPS12, KLRG1, LCN6, RPS18
PC_ 22
Positive: CTGF, PMAIP1, FOS, KLF6, FOSB
Negative: FUT7, SLC9A3R2, LCN6, PPP1R14A, EMP3
PC_ 23
Positive: ATF5, PSAT1, HSP90B1, PHGDH, HSPA5
Negative: CTHRC1, NSMCE1, MAP1A, IGLL1, BTNL9
PC_ 24
Positive: SERINC2, LST1, NAMPT, MT1X, SLC25A37
Negative: SHISA2, DEPP1, GADD45A, PSTPIP2, CD33
PC_ 25
Positive: CDKN1C, RHOB, BATF3, CX3CR1, SERPINA1
Negative: FOS, ALDH2, MGST1, MPO, FOSB
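Scanning a printout like this for a marker gene can also be done programmatically. The sketch below, in base R, searches a gene-by-PC loadings matrix for the first PC whose top genes include a given marker. `first_pc_with_marker` is a hypothetical helper, the gene names and values in `toy_loadings` are placeholders rather than real results, and ranking by absolute loading is a simplification of the separate positive and negative lists printed above. With a real object, the matrix could presumably be obtained with `Loadings(seurat_data, reduction = "pca")`.

```r
# Hypothetical helper: return the index of the first PC whose n_top genes
# (ranked by absolute loading) include the marker, or NA if none does.
first_pc_with_marker <- function(loadings, marker, n_top = 5) {
  for (pc in seq_len(ncol(loadings))) {
    ranked <- names(sort(abs(loadings[, pc]), decreasing = TRUE))
    top    <- ranked[seq_len(min(n_top, length(ranked)))]
    if (marker %in% top) {
      return(pc)
    }
  }
  NA_integer_
}

# Placeholder loadings: 4 genes x 3 PCs (illustration only)
toy_loadings <- matrix(
  c(0.9, 0.1, 0.0,
    0.1, 0.8, 0.1,
    0.0, 0.1, 0.9,
    0.2, 0.3, 0.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                  c("PC_1", "PC_2", "PC_3"))
)

first_pc_with_marker(toy_loadings, "GENE_C", n_top = 1)
```

In the toy matrix GENE_C only dominates the third PC, so the call returns 3, which would suggest keeping PCs 1 to 3.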