Using ESTIMATE to infer tumor purity from the stromal and immune cell content of transcriptomic expression data
The method was published in 2013:
"Inferring tumour purity and stromal and immune cell admixture from expression data."
Nature Communications doi:10.1038/ncomms3612.
ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) is a tool for predicting tumor purity and the presence of infiltrating stromal/immune cells in tumor tissues from gene expression data. The ESTIMATE algorithm is based on single-sample Gene Set Enrichment Analysis (ssGSEA) and generates three scores:
1) stromal score (that captures the presence of stroma in tumor tissue),
2) immune score (that represents the infiltration of immune cells in tumor tissue), and
3) estimate score (that infers tumor purity).
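To illustrate what a single-sample enrichment score is, the sketch below computes a simplified ssGSEA-style score for one sample. It ranks the genes within the sample, walks down the ranked list, and integrates the difference between a rank-weighted running sum over gene-set members and a plain running sum over non-members. The function name `ssgsea_score` and the default weight `alpha = 0.25` are assumptions for illustration; this is not the estimate package's own code, and its values will not match `estimateScore` output.

```r
# Simplified ssGSEA-style enrichment score for ONE sample (illustrative sketch,
# not the estimate package's implementation).
# expr: named numeric vector of expression values, names are gene symbols
# gene_set: character vector of gene symbols
ssgsea_score <- function(expr, gene_set, alpha = 0.25) {
  ranks <- rank(expr, ties.method = "average")   # within-sample rank normalisation
  ord <- order(ranks, decreasing = TRUE)         # highest-expressed genes first
  in_set <- names(expr)[ord] %in% gene_set
  if (!any(in_set) || all(in_set)) {
    stop("gene_set must contain some, but not all, genes in expr")
  }
  w <- ranks[ord]^alpha                          # rank-based weights for members
  p_hit <- cumsum(ifelse(in_set, w, 0)) / sum(w[in_set])  # running sum over members
  p_miss <- cumsum(!in_set) / sum(!in_set)       # running sum over non-members
  sum(p_hit - p_miss)                            # integrate the difference
}

expr <- c(A = 10, B = 9, C = 8, D = 1, E = 2, F = 3)
ssgsea_score(expr, c("A", "B", "C"))   # set at the top of the ranking: positive
ssgsea_score(expr, c("D", "E", "F"))   # set at the bottom of the ranking: negative
```

A gene set concentrated among a sample's most highly expressed genes gets a high score, which is why a tumor sample rich in stromal or immune cells scores high on the corresponding signature.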
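The authors have already processed all of the TCGA data.
Computing the three scores only requires each sample's expression profile, and the website presents the scores for all TCGA tumor types.
The complete results can therefore be downloaded directly from the website.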
The R package
Install it as follows:
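library(utils)
# ESTIMATE is distributed through R-Forge rather than CRAN
rforge <- "http://r-forge.r-project.org"
install.packages("estimate", repos=rforge, dependencies=TRUE)
library(estimate)
help(package="estimate")   # browse the package's functions and documentation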
Run the example data bundled with the package
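library(estimate)
# Path to the example ovarian cancer expression matrix shipped with the package
OvarianCancerExpr <- system.file("extdata", "sample_input.txt",
                                 package="estimate")
read.table(OvarianCancerExpr)[1:4,1:4]   # peek at the input
# Keep only the package's common genes (10,412 of them, hence the file name)
# and write the matrix out in GCT format
filterCommonGenes(input.f=OvarianCancerExpr,
                  output.f="OV_10412genes.gct",
                  id="GeneSymbol")
# Compute the stromal, immune and ESTIMATE scores (plus purity for affymetrix data)
estimateScore(input.ds = "OV_10412genes.gct",
              output.ds="OV_estimate_score.gct",
              platform="affymetrix")
# Plot the purity estimate for sample s516
plotPurity(scores="OV_estimate_score.gct", samples="s516",
           platform="affymetrix")
# Read the scores back: skip the two GCT header lines, use the first column
# as row names, drop the Name/Description columns, and transpose so that
# samples are rows and scores are columns
scores=read.table("OV_estimate_score.gct",skip = 2,header = TRUE)
rownames(scores)=scores[,1]
scores=t(scores[,3:ncol(scores)])
scores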
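The score file read above is a GCT file: tab-delimited text with a `#1.2` version line, a "rows columns" line, and then a table whose first two columns are Name and Description, which is why it is read with `skip = 2` and why the scores start in column 3. The base-R sketch below writes and reads that layout for a small matrix; `write_gct` and `read_gct` are hypothetical helper names for illustration, not functions from the estimate package, and the exact contents estimate writes into the Description column are not assumed here.

```r
# Write a numeric genes x samples matrix in GCT 1.2 layout (hypothetical helper).
write_gct <- function(mat, file) {
  stopifnot(is.matrix(mat), !is.null(rownames(mat)), !is.null(colnames(mat)))
  con <- file(file, open = "w")
  on.exit(close(con))
  writeLines("#1.2", con)                                  # format version line
  writeLines(paste(nrow(mat), ncol(mat), sep = "\t"), con) # dimensions line
  df <- data.frame(Name = rownames(mat), Description = rownames(mat),
                   mat, check.names = FALSE)
  write.table(df, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read a GCT file back into a matrix with genes as row names (hypothetical helper).
read_gct <- function(file) {
  df <- read.table(file, skip = 2, header = TRUE, sep = "\t",
                   check.names = FALSE, quote = "", comment.char = "")
  mat <- as.matrix(df[, -(1:2), drop = FALSE])  # drop Name and Description
  rownames(mat) <- df$Name
  mat
}

m <- matrix(c(1.5, 2, 3, 4.25), nrow = 2,
            dimnames = list(c("DCN", "LUM"), c("s1", "s2")))
tmp <- tempfile(fileext = ".gct")
write_gct(m, tmp)
readLines(tmp, n = 2)          # the two header lines
all.equal(read_gct(tmp), m)    # round trip preserves values and names
```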
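As you can see, the code is short. filterCommonGenes reads the expression matrix from the txt file, keeps the common genes, and converts it to GCT format; estimateScore then computes the three scores (and, for Affymetrix data, tumor purity) from the GCT input and saves them to a local file. The values are: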
StromalScore ImmuneScore ESTIMATEScore TumorPurity
s516 -281.81487 171.5411 -110.2737 0.8316075
s518 -426.14692 105.3890 -320.7580 0.8483668
s519 -57.14977 -365.2374 -422.3871 0.8561698
s520 1938.82379 2339.0707 4277.8944 0.3314725
s521 -671.64710 147.6183 -524.0288 0.8637832
s522 1458.13837 1176.8159 2634.9543 0.5472110
s523 -268.89216 -928.4953 -1197.3875 0.9092887
s525 973.42289 1320.0869 2293.5098 0.5884565
s526 552.64161 2162.4612 2715.1029 0.5373262
s527 -709.33568 1312.8416 603.5059 0.7689656
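Two relationships can be checked against this table. ESTIMATEScore is the sum of StromalScore and ImmuneScore (for s516: -281.81487 + 171.5411 = -110.2737), and for Affymetrix data purity is obtained from ESTIMATEScore with the cosine fit TumorPurity = cos(0.6049872018 + 0.0001467884 × ESTIMATEScore). The sketch below recomputes both columns for three samples from the table; the function name `estimate_from_scores` is hypothetical, and the constants apply only to the affymetrix platform.

```r
# Recompute ESTIMATEScore and TumorPurity from the stromal and immune scores.
# The cosine constants are the Affymetrix-specific purity fit.
estimate_from_scores <- function(stromal, immune) {
  estimate <- stromal + immune                            # ESTIMATEScore
  purity <- cos(0.6049872018 + 0.0001467884 * estimate)   # Affymetrix purity fit
  data.frame(ESTIMATEScore = estimate, TumorPurity = purity)
}

# Scores copied from the table above
res <- estimate_from_scores(
  stromal = c(s516 = -281.81487, s520 = 1938.82379, s523 = -268.89216),
  immune  = c(s516 =  171.5411,  s520 = 2339.0707,  s523 = -928.4953)
)
res
```

The recomputed values agree with the table to within rounding, and the fit makes the direction explicit: the more stromal and immune signal a sample carries, the lower its estimated purity.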
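Finally, plotPurity reads the saved score file and visualizes the selected samples (here s516).
In fact, what most papers that use this package actually need are the two gene sets it defines, stromal and immune (141 genes each). The lists are: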
StromalSignature estimate DCN PAPPA SFRP4 THBS2 LY86 CXCL14 FOXF1 COL10A1 ACTG2 APBB1IP SH2D1A SULF1 MSR1 C3AR1 FAP PTGIS ITGBL1 BGN CXCL12 ECM2 FCGR2A MS4A4A WISP1 COL1A2 MS4A6A EDNRA VCAM1 GPR124 SCUBE2 AIF1 HEPH LUM PTGER3 RUNX1T1 CDH5 PIK3R5 RAMP3 LDB2 COX7A1 EDIL3 DDR2 FCGR2B LPPR4 COL15A1 AOC3 ITIH3 FMO1 PRKG1 PLXDC1 VSIG4 COL6A3 SGCD COL3A1 F13A1 OLFML1 IGSF6 COMP HGF GIMAP5 ABCA6 ITGAM MAF ITM2A CLEC7A ASPN LRRC15 ERG CD86 TRAT1 COL8A2 TCF21 CD93 CD163 GREM1 LMOD1 TLR2 ZEB2 C1QB KCNJ8 KDR CD33 RASGRP3 TNFSF4 CCR1 CSF1R BTK MFAP5 MXRA5 ISLR ARHGAP28 ZFPM2 TLR7 ADAM12 OLFML2B ENPP2 CILP SIGLEC1 SPON2 PLXNC1 ADAMTS5 SAMSN1 CH25H COL14A1 EMCN RGS4 PCDH12 RARRES2 CD248 PDGFRB C1QA COL5A3 IGF1 SP140 TFEC TNN ATP8B4 ZNF423 FRZB SERPING1 ENPEP CD14 DIO2 FPR1 IL18R1 HDC TXNDC3 PDE2A RSAD2 ITIH5 FASLG MMP3 NOX4 WNT2 LRRC32 CXCL9 ODZ4 FBLN2 EGFL6 IL1B SPON1 CD200
ImmuneSignature estimate LCP2 LSP1 FYB PLEK HCK IL10RA LILRB1 NCKAP1L LAIR1 NCF2 CYBB PTPRC IL7R LAPTM5 CD53 EVI2B SLA ITGB2 GIMAP4 MYO1F HCLS1 MNDA IL2RG CD48 AOAH CCL5 LTB GMFG GIMAP6 GZMK LST1 GPR65 LILRB2 WIPF1 CD37 BIN2 FCER1G IKZF1 TYROBP FGL2 FLI1 IRF8 ARHGAP15 SH2B3 TNFRSF1B DOCK2 CD2 ARHGEF6 CORO1A LY96 LYZ ITGAL TNFAIP3 RNASE6 TGFB1 PSTPIP1 CST7 RGS1 FGR SELL MICAL1 TRAF3IP3 ITGA4 MAFB ARHGDIB IL4R RHOH HLA-DPA1 NKG7 NCF4 LPXN ITK SELPLG HLA-DPB1 CD3D CD300A IL2RB ADCY7 PTGER4 SRGN CD247 CCR7 MSN ALOX5AP PTGER2 RAC2 GBP2 VAV1 CLEC2B P2RY14 NFKBIA S100A9 IFI30 MFSD1 RASSF2 TPP1 RHOG CLEC4A GZMB PVRIG S100A8 CASP1 BCL2A1 HLA-E KLRB1 GNLY RAB27A IL18RAP TPST2 EMP3 GMIP LCK IL32 PTPRCAP LGALS9 CCDC69 SAMHD1 TAP1 GBP1 CTSS GZMH ADAM8 GLRX PRF1 CD69 HLA-B HLA-DMA CD74 KLRK1 PTPRE HLA-DRA VNN2 TCIRG1 RABGAP1L CSTA ZAP70 HLA-F HLA-G CD52 CD302 CD27
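Each signature line is laid out like a GMT gene-set record: the set name, a description field (here `estimate`), then the gene symbols. To reuse the two signatures outside the package, they can be parsed into a named list and checked against the genes present in your own expression matrix. The sketch below does this in base R; `parse_signatures` is a hypothetical helper, the two input lines are shortened for brevity (paste the full lines in practice), and the expression matrix is made-up random data.

```r
# Parse space-separated "Name description gene1 gene2 ..." lines into a named list.
parse_signatures <- function(lines) {
  fields <- strsplit(trimws(lines), "[[:space:]]+")
  sets <- lapply(fields, function(f) unique(f[-(1:2)]))  # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)     # first field is the name
  sets
}

# Shortened example lines (the full lists are given above)
sig_lines <- c("StromalSignature estimate DCN PAPPA SFRP4 THBS2",
               "ImmuneSignature estimate LCP2 LSP1 FYB PLEK")
sigs <- parse_signatures(sig_lines)

# Made-up expression matrix: genes as rows, samples as columns
set.seed(1)
expr <- matrix(rnorm(12), nrow = 6,
               dimnames = list(c("DCN", "SFRP4", "LCP2", "PLEK", "GAPDH", "ACTB"),
                               c("s1", "s2")))

# How many genes of each signature are present in the matrix?
sapply(sigs, function(g) sum(g %in% rownames(expr)))
```

Checking the overlap first matters because signature genes missing from a platform silently shrink the set used for scoring.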