





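The data this paper generated itself come only from aCGH arrays, which yield nothing but CNV signal, so the dataset is rather thin. The authors therefore combined it with public microarray expression matrices from the CCLE database for a multi-omics integrative analysis. The article was published back in 2014: Molecular Integrative Clustering of Asian Gastric Cell Lines Revealed Two Distinct Chemosensitivity Clusters. What the group did themselves: array comparative genomic hybridization (aCGH) on 18 Asian gastric cell lines.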


  • **step1:** Affymetrix U133 Plus2 DNA microarray gene expressions of 27 gastric cancer cell lines (Kato-III, IM95, SNU-620, SNU-16, OCUM-1, NUGC-4, 2313287, HUG1N, MKN45, NCIN87, KE39, AGS, SNU-5, SNU-216, NUGC-3, NUGC-2, MKN74, MKN7, RERFGC1B, GCIY, KE97, Fu97, SH10TC, MKN1, SNU-1, Hs746 T, HGC27) were downloaded from Cancer Cell Line Encyclopedia (CCLE) [16] in March 2013.

  • step2: Robust Multi-array Average (RMA) normalization was performed. The principal component analysis plot showed no obvious batch effect.

  • step3: The normalized data is then collapsed by taking the probe sets with highest gene expression.

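The first three steps produce the **mRNA expression matrix** for the 27 gastric cancer cell lines: download the CEL files, normalize with RMA, and, for genes covered by several probe sets, keep the probe set with the highest expression. This matrix is used in all downstream analysis.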
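
The probe-set collapsing in step3 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes RMA has already been run (e.g. in R/Bioconductor), that "highest expression" means the highest mean across cell lines, and that a probe-to-gene annotation is available. The function name `collapse_to_genes` and its arguments are hypothetical.

```python
import pandas as pd


def collapse_to_genes(expr: pd.DataFrame, probe_to_gene: pd.Series) -> pd.DataFrame:
    """Collapse a probe-set x cell-line matrix to one row per gene.

    expr          : RMA-normalized log2 expression, indexed by probe-set ID.
    probe_to_gene : gene symbol per probe-set ID (hypothetical annotation table).
    Assumption: the representative probe set is the one with the highest
    mean expression across all cell lines.
    """
    # Drop probe sets that have no gene annotation.
    mapping = probe_to_gene.reindex(expr.index).dropna()
    expr = expr.loc[mapping.index]

    # Mean expression of every probe set across cell lines.
    mean_expr = expr.mean(axis=1)

    # For each gene, the ID of its most highly expressed probe set.
    best_probe = mean_expr.groupby(mapping).idxmax()

    # Keep those rows and relabel them by gene symbol.
    collapsed = expr.loc[best_probe.values]
    collapsed.index = best_probe.index
    return collapsed
```

Calling `collapse_to_genes(expr, probe_to_gene)` on the 27-cell-line matrix would give a gene x cell-line table; other collapsing rules (median, highest variance) are equally common, and the paper's wording does not pin down which statistic was used.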

  • **step4:** Unsupervised hierarchical clustering (1-Spearman distance, average linkage) was performed on the cell lines using the aCGH data.
    • Hierarchical clustering of the 18 Asian gastric cell lines
    • Putative driver genes of which copy number aberrations correlated to mRNA gene expression were identified to determine subtypes or clusters that are driven by different mechanisms. This was done using Mann Whitney U-test with p<0.05, and Spearman Correlation Coefficient test with Rho >0.6.
    • Select the genes whose expression tracks their copy number
  • step5: We then performed consensus clustering[17] on the gene expression data of the 27 gastric cancer cell lines from CCLE using these putative driver genes. We selected k=2 as it gives sufficiently stable similarity matrix.
    • Differential analysis and functional enrichment
  • step6: In order to assign new samples to this integrative cluster, significance analysis of microarray (SAM) [18] with threshold q<2.0 was used to generate subtype signature based on the mRNA expression data of the 1762 genes from the 27 gastric cancer cell lines in CCLE.
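
The clustering named in step4 (1-Spearman distance, average linkage) can be sketched with SciPy. This is only an illustration under stated assumptions: the input is taken to be a probe x cell-line matrix of aCGH log ratios, and the function name `spearman_average_linkage` and the `n_clusters` cut are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def spearman_average_linkage(cn: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster cell lines (columns) of a probe x cell-line copy-number matrix.

    Distance between two cell lines is 1 - Spearman rho of their profiles;
    the tree is built with average linkage and cut into n_clusters groups.
    """
    # Spearman correlation = Pearson correlation of the per-column ranks.
    ranks = np.apply_along_axis(rankdata, 0, cn)
    rho = np.corrcoef(ranks, rowvar=False)

    # Convert to a distance matrix and clean up floating-point noise.
    dist = 1.0 - rho
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)

    # linkage() expects the condensed (upper-triangle) form.
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

The returned array holds one cluster label per cell line. Unsupervised clustering normally inspects the dendrogram rather than fixing the number of clusters in advance, so the `maxclust` cut here is just a convenient way to read off groups.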

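In other words, the CNV signal is clustered first and used to obtain putative driver genes (genes whose copy number and expression change together). The expression data for those genes are then clustered again, splitting the cell lines into two groups, and SAM is run on the two groups to find differentially expressed genes.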
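
A rough sketch of the putative driver gene filter (Mann-Whitney U p<0.05 and Spearman rho>0.6) is given below. The notes above do not say which groups the Mann-Whitney test compares, so this sketch assumes it compares expression between cell lines with and without a copy number aberration call for that gene, on cell lines profiled on both platforms. The function `putative_drivers` and the `altered` mask are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu, spearmanr


def putative_drivers(cn, expr, genes, altered, rho_min=0.6, p_max=0.05):
    """Return genes whose expression follows their copy number.

    cn, expr : gene x cell-line arrays on the same (matched) cell lines.
    altered  : boolean gene x cell-line array, True where the gene carries a
               copy number aberration call (assumed to be given).
    """
    hits = []
    for i, gene in enumerate(genes):
        # Correlation between copy number and mRNA expression.
        rho, _ = spearmanr(cn[i], expr[i])

        # Expression in altered vs. unaltered cell lines.
        with_cna = expr[i][altered[i]]
        without_cna = expr[i][~altered[i]]
        if len(with_cna) < 2 or len(without_cna) < 2:
            continue  # too few samples in one group to test
        _, p = mannwhitneyu(with_cna, without_cna, alternative="two-sided")

        if rho > rho_min and p < p_max:
            hits.append(gene)
    return hits
```

The resulting gene list is what would then be passed to consensus clustering on the CCLE expression matrix. No multiple-testing correction is applied here because the quoted methods mention only the raw p<0.05 cutoff.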


  • **step7:** ssGSEA (single sample GSEA) was used to estimate pathway activities of the gastric cancer cell line in the Molecular Signature Database v3.1 (Msigdb v3.1) [19], [20]. The pathway activities are represented in enrichment scores which were rank normalized to [0.0, 1.0].
  • **step8:** SAM analysis was performed with threshold q<0.2, and fold change >2.0 (for up-regulated pathways), or <0.5 (for down-regulated pathways) to obtain subtype-specific pathways from the 27 gastric cell lines in CCLE.
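
The rank normalization in step7 and the fold-change cutoffs in step8 can be illustrated as below. This is a sketch only: it assumes the enrichment scores are already computed (ssGSEA itself is not implemented here), that ranking is done per pathway across cell lines, and it leaves out the SAM q-value step entirely. The names `rank_normalize`, `fold_change_calls`, and `in_ic1` are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def rank_normalize(scores: np.ndarray) -> np.ndarray:
    """Rescale each pathway's enrichment scores to [0, 1] by rank.

    scores : pathway x cell-line array (needs at least two cell lines).
    Assumption: ranks are taken per pathway (row) across cell lines.
    """
    ranks = np.apply_along_axis(rankdata, 1, scores)
    n = scores.shape[1]
    return (ranks - 1.0) / (n - 1.0)


def fold_change_calls(norm, in_ic1, up=2.0, down=0.5, eps=1e-9):
    """Label each pathway 'up', 'down' or 'ns' in IC1 relative to IC2.

    in_ic1 : boolean array, True for cell lines assigned to cluster IC1.
    eps    : small constant to avoid division by zero (an added assumption).
    """
    mean_ic1 = norm[:, in_ic1].mean(axis=1)
    mean_ic2 = norm[:, ~in_ic1].mean(axis=1)
    fold = (mean_ic1 + eps) / (mean_ic2 + eps)
    return np.where(fold > up, "up", np.where(fold < down, "down", "ns"))
```

In the workflow described above, a pathway would count as subtype-specific only if it also passed the SAM q<0.2 threshold; the fold-change labels here cover just the second half of that criterion.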


  • Cells in IC1 have enrichment of genes associated with oxidative phosphorylation and mitochondria functions.
  • Gastric cells in IC2 are enriched for genes involved in cell signalling.

The final conclusion is, once again, a platitude: In conclusion, combination of aCGH and gene expression analysis to identify potential candidate oncogenes or tumor suppressor genes is a powerful and proven approach that has been reported in other cancer studies.







