Applied Statistics for High-Throughput Biology

Syllabus: Applied Statistics for High-Throughput Biology


Levi Waldron, PhD
Associate Professor of Biostatistics
City University of New York Graduate School of Public Health and Health Policy
New York, NY, U.S.A.

Hangouts: lwaldron.research
Skype: levi.waldron


Please come to the first class with the following installed:

Please create an account at, and use it to introduce yourself at


This course will provide biologists and bioinformaticians with practical statistical and data analysis skills to perform rigorous analysis of high-throughput biological data. The course assumes some familiarity with genomics and with R programming, but does not assume prior statistical training. It covers the statistical concepts necessary to design experiments and analyze high-dimensional data generated by genomic technologies, including: exploratory data analysis, linear modeling, analysis of categorical variables, principal components analysis, and batch effects.


Related Resources


Each day will include a hands-on lab session that students should attempt in full.

Session detail by day

All course materials will be available from

  1. introduction
    • random variables
    • distributions
    • hypothesis testing for one or two samples (t-test, Wilcoxon test, etc.)
    • hypothesis testing for categorical variables (Fisher's exact test, chi-square test)
    • data manipulation using dplyr
  2. linear modeling
    • linear and generalized linear modeling
    • model matrix and model formulae
    • multiple testing
  3. unsupervised analysis
    • graphics for exploratory data analysis
    • distance in high dimensions
    • principal components analysis and multidimensional scaling
    • unsupervised clustering
    • batch effects
  4. multi'omic data analysis lab session
    • core data classes in Bioconductor: GRanges, SummarizedExperiment, RaggedExperiment, MultiAssayExperiment
    • creating a MultiAssayExperiment
    • subsetting, reshaping, growing, and extraction of a MultiAssayExperiment
    • plotting, correlation, and other statistical analyses
    • multi'omics lab code and html