

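I decided to write a dedicated tutorial promoting ArrayExpress because a reader recently emailed me a WGCNA question, and the paper he cited as an example was: Identification of hub genes and pathways associated with bladder cancer based on co-expression network analysis, published in Oncol Lett. 2017 Jul. The analysis strategy is thoroughly conventional, and bladder cancer is one of the cancer types in TCGA, so I instinctively assumed it was a TCGA data-mining paper. When I actually looked, though, the data had been downloaded from ArrayExpress, and two datasets were used: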

  • The dataset E-MTAB-1940 included 4 controls (samples from normal bladders) and 82 cases (samples from BC tissue);
  • The dataset E-GEOD-3167 included 14 controls and 46 cases.


  • Subsequently, the data were screened by the feature filter method of the genefilter package.
  • Each probe was mapped to one gene using getSYMBOL, which is a function in the annotate package, and the probe was discarded if it did not match any genes.
  • The two expression datasets were merged and synthetically analyzed using Batch Mean-centering, a merged data method (19), following adaptation according to Support Vector Machines, through the inSilicoMerging package (20).
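
To make the merging step above more concrete, here is a minimal sketch of the idea behind batch mean-centering: within each dataset, every gene is shifted so that its mean across that dataset's samples is zero, and the centered datasets are then combined. This is not the inSilicoMerging implementation (an R package). The function names `batch_mean_center` and `merge_batches`, the toy expression values, and the choice to keep only genes present in every dataset are all assumptions made for illustration.

```python
from statistics import mean


def batch_mean_center(batch):
    """Center each gene within a single batch (dataset).

    batch: dict mapping gene symbol -> list of expression values,
           one value per sample in this batch.
    Returns a new dict in which every gene has mean 0 across the batch.
    """
    centered = {}
    for gene, values in batch.items():
        gene_mean = mean(values)
        centered[gene] = [v - gene_mean for v in values]
    return centered


def merge_batches(batches):
    """Mean-center every batch, then concatenate the samples.

    Assumption for this sketch: only genes present in all batches are kept.
    """
    centered = [batch_mean_center(b) for b in batches]
    shared = set(centered[0])
    for b in centered[1:]:
        shared &= set(b)
    # Samples from batch 1 come first, then batch 2, and so on.
    return {g: [v for b in centered for v in b[g]] for g in sorted(shared)}


# Toy data, made up for illustration (not values from either dataset).
batch_a = {"TP53": [5.0, 7.0, 6.0], "FGFR3": [2.0, 4.0, 3.0], "KRT5": [1.0, 1.0, 1.0]}
batch_b = {"TP53": [10.0, 12.0], "FGFR3": [8.0, 6.0]}

merged = merge_batches([batch_a, batch_b])
for gene, values in merged.items():
    print(gene, values)
```

In the toy data the second batch is measured on a visibly higher scale than the first. After centering, each gene has mean zero within each batch, so that per-batch offset no longer dominates the merged matrix. Note that this also removes any real difference in average expression between the batches, which matters when the case/control ratio differs as much as it does between the two datasets here.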

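We have in fact covered the ArrayExpress database before, in the apprentice assignment "8. Processing raw microarray data from the ArrayExpress database, with 3D PCA plots and clustering heatmaps". That post explains clearly that oligo::read.celfiles can read raw Affymetrix CEL files, and it is very simple to use.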

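Once you have the expression matrix, the standard downstream workflow (differential expression analysis, volcano plots, heatmaps, and so on) is covered in the series of posts on mining public expression microarray databases that I wrote on 生信技能树 several years ago. Reading through those should be enough.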



