Modern Statistics for Modern Biology

I came across an interesting book, Modern Statistics for Modern Biology. The Chinese title, 《现代生物学所需要的现代统计学》, is my own translation.

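I am recommending it mainly because so many readers have messaged the 《生信技能树》 account asking how to brush up on both biology and statistics, and Modern Statistics for Modern Biology happens to cover both. It can be read online at: https://www.huber.embl.de/msmb/index.html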

The whole book also comes with companion code:

source("https://www.huber.embl.de/msmb/install_packages.R")

Data

  • Zipped data directory (download the archive yourself): https://www.huber.embl.de/msmb/data.tar.gz

Code

  • Rfiles folder: https://www.huber.embl.de/msmb/code/

Table of contents:

  • Home
  • Book supplements
  • Physical Copy
  • Introduction
  • 1 Generative Models for Discrete Data
  • 2 Statistical Modeling
  • 3 High Quality Graphics in R
  • 4 Mixture Models
  • 5 Clustering
  • 6 Testing
  • 7 Multivariate Analysis
  • 8 High-Throughput Count Data
  • 9 Multivariate methods for heterogeneous data
  • 10 Networks and Trees
  • 11 Image data
  • 12 Supervised Learning
  • 13 Design of High Throughput Experiments and their Analyses
  • Statistical Concordance
  • Acknowledgements
  • References

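The book really is very detailed, with plenty of figures and code. Chapter 8, for example, covers processing count matrices from high-throughput sequencing data: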

  • Goals of this chapter
  • Some core concepts
  • Count data
  • Modeling count data
  • A basic analysis
  • Critique of default choices and possible modifications
  • Multi-factor designs and linear models
  • Generalized linear models
  • Two-factor analysis of the pasilla data
  • Further statistical concepts
  • Summary of this chapter
  • Further reading
  • Exercises

The chapter uses the fruit fly count matrix and sample grouping information that ship with the R package pasilla:

fn = system.file("extdata", "pasilla_gene_counts.tsv",
   package = "pasilla", mustWork = TRUE)
counts = as.matrix(read.csv(fn, sep = "\t", row.names = "gene_id"))
 
annotationFile = system.file("extdata",
  "pasilla_sample_annotation.csv",
  package = "pasilla", mustWork = TRUE)
pasillaSampleAnno = readr::read_csv(annotationFile)
pasillaSampleAnno
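
Before both objects go into DESeq2, the annotation rows need to be in the same order as the columns of the count matrix. The sketch below illustrates that alignment step on a small made-up example. All `toy_` names and values are invented for illustration, and the `fb` suffix on the file names is an assumption about what the pasilla annotation looks like.

```r
# Toy count matrix: 3 genes x 4 samples (values are made up)
toy_counts <- matrix(
  c(10, 0, 5,  12, 1, 7,  30, 2, 9,  28, 0, 11),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("treated1", "treated2", "untreated1", "untreated2")))

# Toy annotation, deliberately in a different row order,
# with an assumed "fb" suffix on the file names
toy_anno <- data.frame(
  file      = c("untreated2fb", "treated1fb", "untreated1fb", "treated2fb"),
  condition = c("untreated", "treated", "untreated", "treated"),
  stringsAsFactors = FALSE)

# For each matrix column, look up the annotation row that describes it
toy_mt <- match(colnames(toy_counts), sub("fb$", "", toy_anno$file))
stopifnot(!any(is.na(toy_mt)))  # every sample must have an annotation row

# Reorder the annotation so that row i describes column i of the matrix
toy_anno_aligned <- toy_anno[toy_mt, ]
toy_anno_aligned
```

Stopping on any `NA` in the match result is a cheap safeguard: a sample without an annotation row would otherwise silently end up with the wrong group label.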

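Next, set up the comparison from the grouping information. With the DESeq2 package, the differential expression analysis takes just the following code: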

library("dplyr")
pasillaSampleAnno = mutate(pasillaSampleAnno,
condition = factor(condition, levels = c("untreated", "treated")),
type = factor(sub("-.*", "", type), levels = c("single", "paired")))
 
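library("DESeq2")
# `mt` was not defined in this excerpt; it is assumed here to map each column
# of `counts` to its annotation row via the `file` column (minus the "fb" suffix)
mt = match(colnames(counts), sub("fb$", "", pasillaSampleAnno$file))
stopifnot(!any(is.na(mt)))

pasilla = DESeqDataSetFromMatrix(
  countData = counts,
  colData   = pasillaSampleAnno[mt, ],
  design    = ~ condition)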
class(pasilla)

pasilla = DESeq(pasilla)

res = results(pasilla)
res[order(res$padj), ] %>% head

Isn't that super convenient!
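
The last line of the analysis sorts the results by `padj`, the p-value adjusted for multiple testing. By default DESeq2 uses the Benjamini-Hochberg method for this, combined with independent filtering. The base R sketch below shows only the adjustment and the sorting, on four made-up p-values (the `toy_` names are hypothetical, and the filtering step is not reproduced).

```r
# Made-up raw p-values for four hypothetical genes
toy_p <- c(geneA = 0.03, geneB = 0.001, geneC = 0.5, geneD = 0.01)

# Benjamini-Hochberg adjustment from base R's stats package
toy_padj <- p.adjust(toy_p, method = "BH")

# Collect into a table and sort by adjusted p-value, smallest first
toy_res <- data.frame(pvalue = toy_p, padj = toy_padj,
                      row.names = names(toy_p))
toy_sorted <- toy_res[order(toy_res$padj), ]
head(toy_sorted)
```

Sorting on the adjusted rather than the raw p-value puts the genes that survive the multiple-testing correction at the top of the table.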
