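Why is STAR's speed so strange?
FASTQ files of similar size, yet completely different run times!
If you are passing by, please check your own RNA-seq data and test your STAR runs to see whether you can reproduce this behavior. Feel free to email me to discuss; my address is jmzeng1314 at 163 mail.
This sample took about 3 hours: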
1.5G Dec 1 23:45 /MK2305-4LA_1.fq.gz
1.7G Dec 1 23:47 /MK2305-4LA_2.fq.gz
MK2305-4LA /MK2305-4LA_1.fq.gz /MK2305-4LA_2.fq.gz
May 23 20:22:34 ..... started STAR run
May 23 20:22:35 ..... loading genome
May 23 20:25:15 ..... started 1st pass mapping
May 23 21:10:43 ..... finished 1st pass mapping
May 23 21:10:44 ..... inserting junctions into the genome indices
May 23 21:13:32 ..... started mapping
May 23 23:14:01 ..... finished successfully
The following sample took about 16 hours:
1.9G Dec 2 10:17 /MK2313-3RA_1.fq.gz
2.1G Dec 2 10:02 /MK2313-3RA_2.fq.gz
MK2313-3RA /MK2313-3RA_1.fq.gz /MK2313-3RA_2.fq.gz
May 23 23:56:03 ..... started STAR run
May 23 23:56:03 ..... loading genome
May 23 23:58:30 ..... started 1st pass mapping
May 24 05:05:39 ..... finished 1st pass mapping
May 24 05:05:41 ..... inserting junctions into the genome indices
May 24 05:08:35 ..... started mapping
May 24 15:42:39 ..... finished successfully
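To see where the extra time goes, the timestamps in the two logs above can be turned into per-phase durations. The following is only a sketch: the helper name `elapsed` is made up for illustration, it assumes GNU `date` (for the `-d` option), and it assumes both timestamps fall in the same year, since the STAR log lines do not record one.

```shell
#!/usr/bin/env bash
# elapsed START END
# Print the time between two STAR log timestamps such as "May 23 20:22:34".
# Assumption: GNU date is available, and both stamps belong to the same year.
elapsed() {
    local start end secs
    start=$(date -u -d "$1" +%s)   # seconds since the epoch, parsed as UTC
    end=$(date -u -d "$2" +%s)
    secs=$((end - start))
    printf '%dh%02dm%02ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Timestamps copied from the two logs above.
echo "MK2305-4LA 1st pass: $(elapsed 'May 23 20:25:15' 'May 23 21:10:43')"
echo "MK2305-4LA 2nd pass: $(elapsed 'May 23 21:13:32' 'May 23 23:14:01')"
echo "MK2313-3RA 1st pass: $(elapsed 'May 23 23:58:30' 'May 24 05:05:39')"
echo "MK2313-3RA 2nd pass: $(elapsed 'May 24 05:08:35' 'May 24 15:42:39')"
```

Genome loading and junction insertion take a few minutes in both runs, so the slowdown is in the two mapping passes themselves: the first pass goes from about 45 minutes to over 5 hours, and the second from about 2 hours to over 10 hours.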
Judging from the output file sizes, there is no obvious difference either:
2.3G May 23 23:26 MK2305-4LA.bam
2.4M May 23 23:27 MK2305-4LA.bam.bai
71M May 23 23:13 MK2305-4LA_Chimeric.out.junction
587M May 23 23:13 MK2305-4LA_Chimeric.out.sam
20M May 23 23:49 MK2305-4LA.counts.txt
332 May 23 23:49 MK2305-4LA.counts.txt.summary
439 May 23 23:28 MK2305-4LA.flagstat
1.9K May 23 23:13 MK2305-4LA_Log.final.out
24K May 23 23:13 MK2305-4LA_Log.out
15K May 23 23:13 MK2305-4LA_Log.progress.out
8.1M May 23 23:13 MK2305-4LA_SJ.out.tab
2.8G May 24 15:58 MK2313-3RA.bam
2.5M May 24 15:59 MK2313-3RA.bam.bai
108M May 24 15:42 MK2313-3RA_Chimeric.out.junction
886M May 24 15:42 MK2313-3RA_Chimeric.out.sam
20M May 24 16:27 MK2313-3RA.counts.txt
332 May 24 16:27 MK2313-3RA.counts.txt.summary
439 May 24 16:00 MK2313-3RA.flagstat
1.9K May 24 15:42 MK2313-3RA_Log.final.out
24K May 24 15:42 MK2313-3RA_Log.out
42K May 24 15:42 MK2313-3RA_Log.progress.out
8.2M May 24 15:42 MK2313-3RA_SJ.out.tab
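File sizes may hide differences that the mapping statistics show, so one thing worth trying is to put the key fields of the two `Log.final.out` files next to each other. This is a minimal sketch: the function name `star_stats` is hypothetical, and the field labels are the ones I would expect to find in STAR's `Log.final.out`; adjust the pattern if your STAR version words them differently.

```shell
#!/usr/bin/env bash
# star_stats FILE...
# Print a few summary fields from each STAR Log.final.out file given.
# The labels in the pattern are assumed to match STAR's summary log wording.
star_stats() {
    local f
    for f in "$@"; do
        echo "== $f"
        grep -E 'Number of input reads|Average input read length|Mapping speed|Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped: too short|% of chimeric reads' "$f" \
            | sed 's/^[[:space:]]*//'   # drop the leading padding STAR adds
    done
}

# Usage, with the file names from the listing above:
# star_stats MK2305-4LA_Log.final.out MK2313-3RA_Log.final.out
```

If the two samples differ a lot in input read count, mapping speed, or the share of multi-mapped or chimeric reads, that would be a more useful lead than the compressed file sizes.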
My STAR command is:
# Only align samples that do not have a BAM file yet.
if [ ! -f "$sample.bam" ]; then
#$bin_star --runThreadN 5 --genomeLoad LoadAndKeep --genomeDir $star_index --readFilesCommand zcat --readFilesIn $analysis_dir/clean/${fq1_base}_val_1.fq.gz $analysis_dir/clean/${fq2_base}_val_2.fq.gz --outFileNamePrefix ${sample}_
$bin_star --runThreadN 5 \
--genomeDir $star_index \
--twopassMode Basic --outReadsUnmapped None \
--chimSegmentMin 12 \
--alignIntronMax 100000 \
--chimSegmentReadGapMax 3 \
--alignSJstitchMismatchNmax 5 -1 5 5 \
--readFilesCommand zcat \
--readFilesIn $analysis_dir/clean/${fq1_base}_val_1.fq.gz \
$analysis_dir/clean/${fq2_base}_val_2.fq.gz \
--outFileNamePrefix ${sample}_
fi