Modern Statistics for Modern Biology / 开普饭

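I came across an interesting book, Modern Statistics for Modern Biology. The Chinese rendering of the title, 《现代生物学所需要的现代统计学》, is my own translation.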

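I am recommending it mainly because so many readers have written to our 生信技能树 (Bioinformatics Skill Tree) account asking how to shore up their biology and statistics background, and Modern Statistics for Modern Biology happens to cover both. It can be read online at: https://www.huber.embl.de/msmb/index.html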

The whole book also comes with companion code:

source("https://www.huber.embl.de/msmb/install_packages.R")

Data

Zipped data directory; download the archive yourself from https://www.huber.embl.de/msmb/data.tar.gz
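A minimal sketch of fetching and unpacking that archive from within R, assuming network access. The helper name `fetch_msmb_data` and the default directory `msmb_data` are hypothetical and chosen only for illustration; `download.file()` and `untar()` are base R functions.

```r
# Hypothetical helper: download the book's data archive and unpack it.
# The function name and default directory are illustrative assumptions.
fetch_msmb_data <- function(dest_dir = "msmb_data",
                            url = "https://www.huber.embl.de/msmb/data.tar.gz") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  archive <- file.path(dest_dir, basename(url))
  # mode = "wb" keeps the binary archive intact on Windows
  download.file(url, destfile = archive, mode = "wb")
  # untar() can read gzip-compressed tarballs
  untar(archive, exdir = dest_dir)
  invisible(dest_dir)
}

# Usage (needs network access):
# fetch_msmb_data()
```

If the download times out, raising the limit with `options(timeout = ...)` before calling the helper may help.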

Code

Rfiles folder, available at: https://www.huber.embl.de/msmb/code/

Chapter list:

Home
Book supplements
Physical Copy
Introduction
1 Generative Models for Discrete Data
2 Statistical Modeling
3 High Quality Graphics in R
4 Mixture Models
5 Clustering
6 Testing
7 Multivariate Analysis
8 High-Throughput Count Data
9 Multivariate methods for heterogeneous data
10 Networks and Trees
11 Image data
12 Supervised Learning
13 Design of High Throughput Experiments and their Analyses
Statistical Concordance
Acknowledgements
References

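The book is indeed very detailed, with plenty of figures and code. Chapter 8, for example, covers processing expression count matrices from high-throughput sequencing data: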

Goals of this chapter
Some core concepts
Count data
Modeling count data
A basic analysis
Critique of default choices and possible modifications
Multi-factor designs and linear models
Generalized linear models
Two-factor analysis of the pasilla data
Further statistical concepts
Summary of this chapter
Further reading
Exercises

It uses the fruit fly expression count matrix and sample grouping information from the R package pasilla:

# Locate the count table shipped with the pasilla package and read it as a matrix
fn = system.file("extdata", "pasilla_gene_counts.tsv",
                 package = "pasilla", mustWork = TRUE)
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))

# Read the sample annotation (condition and library type for each sample)
annotationFile = system.file("extdata", "pasilla_sample_annotation.csv",
                             package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno

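Then, based on the grouping, set up the comparison and run the differential expression analysis with the DESeq2 package, as in the code below: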

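library("dplyr")
# Turn condition and library type into factors; "untreated" is the reference level
pasillaSampleAnno = mutate(pasillaSampleAnno,
  condition = factor(condition, levels = c("untreated", "treated")),
  type = factor(sub("-.*", "", type), levels = c("single", "paired")))

# Align annotation rows with the columns of counts
# (in the book, the file column carries an "fb" suffix that is stripped first)
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))

library("DESeq2")
pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData   = pasillaSampleAnno[mt, ],
  design    = ~ condition)
class(pasilla)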
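The snippet above indexes the annotation with `mt` so that its rows line up with the columns of `counts`. Here is a minimal sketch of that alignment on made-up data. It assumes `mt` is built with `match()` after stripping a suffix such as `fb` from the file names. The toy sample names and the `anno` table are invented for illustration and are not the real pasilla annotation.

```r
# Toy data (made up for illustration; not the real pasilla annotation)
count_cols <- c("untreated1", "treated1", "treated2")
anno <- data.frame(
  file = c("treated1fb", "treated2fb", "untreated1fb"),
  type = c("single-read", "paired-end", "single-read"),
  stringsAsFactors = FALSE
)

# Strip everything from the first hyphen on: "single-read" -> "single"
anno$type <- factor(sub("-.*", "", anno$type), levels = c("single", "paired"))

# For each count column, find the annotation row that describes it
idx <- match(count_cols, sub("fb$", "", anno$file))
stopifnot(!any(is.na(idx)))

# Reorder the annotation so row i describes count column i
anno_sorted <- anno[idx, ]
print(anno_sorted$file)
```

Checking for `NA` in the index matters because `match()` returns `NA` without complaint for any column that has no annotation row.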


pasilla = DESeq(pasilla)

res = results(pasilla)
res[order(res$padj), ] %>% head
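Sorting by adjusted p-value is usually followed by picking out the significant genes. The sketch below does both on an invented results table, since DESeq2 can report `NA` adjusted p-values for some genes. The table `res_toy` and the 0.1 cutoff are assumptions for illustration, not values from the book's analysis.

```r
# Toy results table (values invented for illustration)
res_toy <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.1, -0.3, -1.8, 0.9),
  padj = c(0.002, 0.8, NA, 0.04),
  stringsAsFactors = FALSE
)

# order() places NA values last by default, so the smallest padj comes first
res_sorted <- res_toy[order(res_toy$padj), ]

# which() drops the NA comparisons instead of producing NA rows
sig <- res_toy[which(res_toy$padj < 0.1), ]

print(res_sorted$gene)
print(sig$gene)
```

Filtering with `res_toy$padj < 0.1` directly, without `which()`, would produce a row of `NA`s for each gene whose adjusted p-value is missing.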

Super convenient, isn't it!
