Expression matrices don't have to be uploaded to GEO or ArrayExpress
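While systematically cataloguing single-cell transcriptome atlas projects recently, I came across an interesting way of sharing data: the 2018 mouse single-cell atlas. The paper is titled "Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris", and the link is: https://www.nature.com/articles/s41586-018-0590-4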
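Authors: Tabula Muris Consortium · 2018 · Cited 480 times as of 2021-06-11
The paper describes Tabula Muris, an open-access resource built by researchers at Stanford University, the Chan Zuckerberg Biohub, and the University of California, San Francisco. It comprises single-cell transcriptome profiles of more than 100,000 cells from 20 mouse organs and tissues, together with comparisons of gene expression across tissues and cell types.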
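Sharing via figshare
FigShare lets researchers upload figures, multimedia, posters, papers (including preprints), multi-file collections, and datasets, offering a mode of file sharing that conventional academic publishing does not provide. Data are shared under Creative Commons licenses, which reduces copyright disputes and lets scientists worldwide access and share information.
The paper gives two data-sharing links:
10.6084/m9.figshare.5715040 for FACS/Smartseq2
10.6084/m9.figshare.5715025 for 10X data.
Because the dataset is so well known, ready-made data objects are also available in R through Bioconductor: https://bioconductor.org/packages/devel/data/experiment/vignettes/TabulaMurisData/inst/doc/TabulaMurisData.html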
suppressPackageStartupMessages({
library(ExperimentHub)
library(SingleCellExperiment)
library(TabulaMurisData)
})
#> snapshotDate(): 2021-05-05
eh <- ExperimentHub()
#> snapshotDate(): 2021-05-05
query(eh, "TabulaMurisData")
#> ExperimentHub with 2 records
#> # snapshotDate(): 2021-05-05
#> # retrieve records with, e.g., 'object[["EH1617"]]'
#>
#> title
#> EH1617 | TabulaMurisDroplet
#> EH1618 | TabulaMurisSmartSeq2
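The listing above hints that a record can be pulled out with `object[["EH1617"]]`. Here is a minimal sketch of doing that. The helper name `fetch_record` is my own invention, not part of ExperimentHub or TabulaMurisData, and the real download (which needs network access and the installed packages) is left commented out so the snippet can be tried offline with a stand-in list.

```r
# Hypothetical helper: pull one record out of a hub-like object by its ID.
# `hub` defaults to a live ExperimentHub, but anything that supports `[[`
# (for example a plain named list) works, which makes offline testing easy.
fetch_record <- function(id, hub = ExperimentHub::ExperimentHub()) {
  stopifnot(is.character(id), length(id) == 1L)
  hub[[id]]
}

# Real use (downloads and caches the data on the first call):
# droplet   <- fetch_record("EH1617")  # TabulaMurisDroplet
# smartseq2 <- fetch_record("EH1618")  # TabulaMurisSmartSeq2
# dim(droplet)                         # genes x cells
# colnames(SummarizedExperiment::colData(droplet))  # per-cell annotations

# Offline demonstration with a stand-in list instead of a real hub:
fake_hub <- list(EH1617 = "droplet placeholder",
                 EH1618 = "smartseq2 placeholder")
print(fetch_record("EH1618", hub = fake_hub))
```

Because the `hub` argument is only evaluated when it is actually needed, passing your own object never touches the network.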
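As you can see, there are again two separate expression matrices. They follow the SummarizedExperiment school rather than the Seurat school, so the objects have their own conventions, and there is a dedicated web tool as well: (2018). “iSEE: Interactive SummarizedExperiment Explorer.” F1000Research, 7, 741. doi: 10.12688/f1000research.14966.1.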
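Some data are shared purely as article supplements
For example, the paper published 2021 Mar 11 (doi: 10.1016/j.ccell.2021.02.013), titled "Progressive immune dysfunction with advancing disease stage in renal cell carcinoma", provides its data only as supplementary files: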
supplementary Data S1: Data S1.
ScRNA-seq raw count matrix (part 1 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figure 1–6, S1–3, and S5.
NIHMS1692222-supplement-supplementary_Data_S1.zip (143M). This is a zip archive; it decompresses to a CSV file of more than 5 GB with over 30,000 gene rows.
GUID: 217E8B40-EB49-4FF5-AEF5-57BBBA4DAE61
supplementary Data S2: Data S2.
ScRNA-seq raw count matrix (part 2 of 2), after quality control filtering, with genes as rows and cell barcodes as columns, related to Figure 1–6, S1–3, and S5.
NIHMS1692222-supplement-supplementary_Data_S2.csv (1.7G). This one has only a little over 10,000 gene rows.
GUID: 34477B69-0F73-4D9A-B926-66981E1D5D4A
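Files of this size are awkward to open just to see how many genes and cells they hold. A rough sketch in base R for counting rows and columns by streaming the file in chunks follows. It assumes a plain comma-separated layout with one header line whose first field labels the gene column, and no quoted commas or embedded newlines; I have not verified that the actual supplementary files follow that layout. `csv_dims` is a made-up helper name.

```r
# Count data rows (genes) and data columns (cells) of a large CSV without
# loading it into memory. Assumes: one header line, first field is the gene
# column, no quoted commas, no embedded newlines.
csv_dims <- function(path, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  if (length(header) == 0L) stop("empty file: ", path)
  n_fields <- length(strsplit(header, ",", fixed = TRUE)[[1]])
  n_rows <- 0L
  repeat {
    chunk <- readLines(con, n = chunk_lines)
    if (length(chunk) == 0L) break
    n_rows <- n_rows + sum(nzchar(chunk))  # ignore blank lines
  }
  c(genes = n_rows, cells = n_fields - 1L)
}

# Toy file standing in for the real supplement: 3 genes x 2 cells.
toy <- tempfile(fileext = ".csv")
writeLines(c(",cellA,cellB", "Gene1,0,3", "Gene2,5,0", "Gene3,1,1"), toy)
print(csv_dims(toy))
```

On the real files you would pass the path of the decompressed CSV instead of `toy`; a smaller `chunk_lines` may be kinder to memory, since each line of a wide matrix is long.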
The paper describes its single-cell experimental design clearly: We performed single-cell RNA and T cell receptor sequencing (scRNA-seq/scTCR-seq) on 164,722 individual cells from tumor and adjacent non-tumor tissue in patients with ccRCC across disease stages – early, locally advanced, and advanced/metastatic.
To my disappointment, though, the CSV files provided as supplements are incomplete!!!
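One way to look into a mismatch like this is to compare the gene IDs of the two parts. The sketch below assumes that the two files split the cells by column and therefore ought to share the same gene rows; that is my guess about the intended layout, not something the paper states. `read_gene_ids` and `compare_gene_sets` are hypothetical helpers, and the same plain-CSV assumptions apply (one header line, gene ID in the first field, no commas inside gene IDs).

```r
# Stream the first comma-separated field (the gene ID) of every data row.
read_gene_ids <- function(path, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  invisible(readLines(con, n = 1L))  # skip the header line
  ids <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_lines)
    if (length(chunk) == 0L) break
    chunk <- chunk[nzchar(chunk)]            # drop blank lines
    ids <- c(ids, sub(",.*$", "", chunk))    # keep text before first comma
  }
  gsub('"', "", ids, fixed = TRUE)           # strip optional quoting
}

# Report which gene IDs appear in only one of the two files.
compare_gene_sets <- function(path_a, path_b) {
  a <- read_gene_ids(path_a)
  b <- read_gene_ids(path_b)
  list(only_in_a = setdiff(a, b),
       only_in_b = setdiff(b, a),
       shared    = intersect(a, b))
}

# Toy stand-ins for "part 1" and "part 2".
part1 <- tempfile(fileext = ".csv")
part2 <- tempfile(fileext = ".csv")
writeLines(c(",c1,c2", "GeneA,1,0", "GeneB,0,2", "GeneC,3,3"), part1)
writeLines(c(",c3,c4", "GeneA,4,0", "GeneB,0,5"), part2)
str(compare_gene_sets(part1, part2))
```

If `only_in_a` comes back with thousands of genes on the real files, the second part is missing rows rather than just carrying a different set of cells.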
Why not just upload the data properly to GEO or ArrayExpress?