批量IGV截图【直播】我的基因组83 / 开普饭

下载代码是：

mkdir -p ~/reference/gtf/gencode
cd ~/reference/gtf/gencode
## https://www.gencodegenes.org/releases/current.html
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.2wayconspseudos.gtf.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.long_noncoding_RNAs.gtf.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.polyAs.gtf.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gtf.gz
## https://www.gencodegenes.org/releases/25lift37.html
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.annotation.gtf.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.metadata.HGNC.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.metadata.EntrezGene.gz
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/GRCh37_mapping/gencode.v25lift37.metadata.RefSeq.gz

代码是：

zcat gencode.v25.long_noncoding_RNAs.gtf.gz |perl -alne '{next unless $F[2] eq "gene" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >lncRNA.hg38.position
zcat gencode.v25.2wayconspseudos.gtf.gz |perl -alne '{next unless $F[2] eq "transcript" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >pseudos.hg38.position
zcat gencode.v25.annotation.gtf.gz| grep protein_coding |perl -alne '{next unless $F[2] eq "gene" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >protein_coding.hg38.position
zcat gencode.v25.annotation.gtf.gz|perl -alne '{next unless $F[2] eq "gene" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >allGene.hg38.position
zcat gencode.v25lift37.annotation.gtf.gz | grep protein_coding |perl -alne '{next unless $F[2] eq "gene" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >protein_coding.hg19.position
zcat gencode.v25lift37.annotation.gtf.gz | perl -alne '{next unless $F[2] eq "gene" ;/gene_name \"(.*?)\";/; print "$F[0]\t$F[3]\t$F[4]\t$1" }' >allGene.hg19.position

PS：这里面有一个小问题，gencode里面的数据有着HAVANA和ENSEMBL的区别，尤其是在hg38里面，需要区别对待！

批量IGV截图脚本

这里我想把我的全基因组重测续数据载入到IGV并且根据所有的基因查看并且截图保存。其中IGV脚本很简单，格式如下：

参考：https://software.broadinstitute.org/software/igv/batch

就是进入基因的区域，然后截图保存即可。但是要注意IGV对bam文件的分辨率上线是37kb。但是人类的近2万个编码蛋白的基因里面有接近一半都超过了这个上线，所以这些基因都需要截多张图！

制作这个IGV脚本的代码如下：

cat ~/reference/gtf/gencode/protein_coding.hg19.position| \
perl -alne '{if (($F[2]-$F[1])<30000){print "goto $F[0]:$F[1]-$F[2]\nsnapshot $F[3].png"} \
else{$n=int(($F[2]-$F[1])/30000)+1;foreach (1..$n){$start=$F[1]+($_-1)*30000;$end=$start+30000;print "goto $F[0]:$start-$end\nsnapshot $F[3]_$_.png" }}}' \
>igv_all_gene_snapshot.txt
goto chr1:7745384-7775384
snapshot CAMTA1_31.png
goto chr1:7775384-7805384
snapshot CAMTA1_32.png
goto chr1:7805384-7835384
snapshot CAMTA1_33.png
goto chr1:7831329-7841492
snapshot VAMP3.png
goto chr1:7844380-7874380
snapshot PER3_1.png
goto chr1:7874380-7904380
snapshot PER3_2.png
goto chr1:7904380-7934380
snapshot PER3_3.png
goto chr1:7903143-7913572
snapshot UTS2.png
goto chr1:7975954-8003225
snapshot TNFRSF9.png
goto chr1:8014351-8044351
snapshot PARK7_1.png
goto chr1:8044351-8074351
snapshot PARK7_2.png
goto chr1:8064464-8086368
snapshot ERRFI1.png

比如CAMTA1这个基因长约1M，所以会被切割成33个片段才出图。

这个IGV脚本运行指导流程是：

运行结果如下：