单细胞基因组拷贝数变异流程 / 开普饭

主要是上游流程，读文章拿到数据后走标准的比对流程，计算覆盖度测序深度，文章是（2020年4月份）第16周（总第112周）- 单细胞基因组测序表明TNBC的CNV发展是爆发式的 http://www.bio-info-trainee.com/4381.html

软件环境安装

首先安装conda

# https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/ # https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/ wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Miniconda3-latest-Linux-x86_64.sh source ~/.bashrc ## 安装好conda后需要设置镜像。 conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda conda config --set show_channel_urls yes

隔离一个CNV的conda环境来配置软件

conda create -n cnv source activate cnv # fastqc multiqc trim-galore deeptools qualimap conda install -y -c bioconda bwa samtools bedtools sambamba sra-tools bowtie2 samblaster

下载参考基因组

这里一步到位下载bowtie2的参考基因组:http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.bowtie_index.tar.gz # 参考基因组可以不需要下载，上面的索引就足够做比对了。 # wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

下载文章数据

首先读文章拿到ID号列表，如下：

$head SRR_Acc_List.txt SRR3080554 SRR3080555 SRR3080553 SRR3080549 SRR3080552 SRR3080613 SRR3080616 SRR3080632 SRR3080615 SRR3080617

然后使用prefetch批量下载，如果你在中国大陆，可能需要加速，自己在生信技能树公众号内搜索如何加速下载sra文件吧。

cat SRR_Acc_List.txt |while read id;do ( prefetch $id -X 100G );done

接着解压下载到的sra文件，这个步骤也可以加速，或者并行。

$dump -A $sample -O $analysis_dir --gzip --split-3 sra/$srr.sra

质控可以继续使用 trim_galore 软件。

使用bowtie2和samblaster一步到位的干净比对

命令很简单

bowtie2 -x $index -U $id | samblaster -e -d $sample.disc.sam -s $sample.split.sam | samtools view -Sb - > $sample.clean.bam # 然后根据文章需要对bam文件进行过滤。 # 发现没有进行sort

批量运行

number1=$1 number2=$2 index=/home/yb77613/reference/index/bowtie/hg38


ls /home/yb77613/data/public/scDNA_TNBC/clean/*gz |while read id

do

echo $id

sample=$(basename $id "_trimmed.fq.gz")

echo $sample;
if((i%$number1==$number2))

    then

        bowtie2 -x $index -U $id | samblaster -e -d $sample.disc.sam -s $sample.split.sam | samtools view -Sb -q 1  - > $sample.clean.bam

    fi  ## end for  number1
i=$((i+1))

done ## end for $config_file

对bam文件批量构建索引

# 构建 index的bam需要先sort ls *.clean.bam| while read id;do(samtools sort -o ${id%%.*}_sort.bam $id);done ls *.bam |xargs -i samtools index {}


## 实际上可以直接

bowtie2 -x $index -U $id | samblaster -e -d $sample.disc.sam -s $sample.split.sam | samtools view -Sb -q 1  | samtools sort -o ${id%%.*}_sort.bam -

# 其实还可以去除PCR重复 sambamba markdup --overflow-list-size 600000 --tmpdir='./' -r $sample.bam ${sample}_rmd.bam

对bam文件按照200kb窗口计算表达量

参考生信菜鸟团博客：http://www.bio-info-trainee.com/4452.html

# 首先拿到基因组坐标染色体坐标 pip install pyfaidx faidx hg38.fasta -i chromsizes > sizes.genome # 然后使用 bedtools 工具基因组染色体坐标进行窗口划分 bedtools makewindows -g sizes.genome -w 200000 > 200k.bed # 再依据窗口根据参考基因组进行GC含量计算。 bedtools nuc -fi hg38.fa -bed 200k.bed | cut -f 1-3,5 > 200k_gc.bed # 对原始bam文件进行计算 ls ../alignment/*bam |while read id;do (bedtools multicov -bams $id -bed 200k.bed > $(basename $id .bam)_200K_counts.txt );done # 对干净的bam文件进行计算

单细胞基因组拷贝数变异流程

软件环境安装

隔离一个CNV的conda环境来配置软件

下载参考基因组

下载文章数据

使用bowtie2和samblaster一步到位的干净比对

对bam文件按照200kb窗口计算表达量

导入R做CNV分析

第二期单细胞视频笔记汇总

免疫治疗

发育

单细胞基因组拷贝数变异流程

软件环境安装

隔离一个CNV的conda环境来配置软件

下载参考基因组

下载文章数据

使用bowtie2和samblaster一步到位的干净比对

对bam文件按照200kb窗口计算表达量

导入R做CNV分析

第二期单细胞视频笔记汇总

免疫治疗

发育

相关推荐