timyers/r-learning-resources-for-genomics

R Learning Resources for Genomics


The R Learning Resources for Genomics repository is a curated collection of free resources to help the aspiring computational biologist learn about the R programming language. R is a free and open source language for statistical analyses and graphics. Biology and medicine generate more data than ever before. Data analysis skills and an understanding of computational genomics are more important than ever.

Illustration by Allison Horst - CC BY 4.0

Getting Started with R

  • RStudio Education - RStudio provides free and open source tools for R, including educational tools. Choose your R learning path: beginner, intermediate, expert. Resources include cheatsheets, books and tutorials.
  • How to Get Started with R: A Video - In this brief YouTube video, you will find all the essentials you need to get started with the R Programming Language. Make sure to download R and RStudio to your local machine first. Brought to you by the Duke Center for Computational Thinking.
  • R for Data Science (R4DS) - This book by Hadley Wickham will teach you how to do data science with R. You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. It’s designed to take you from knowing nothing about R or the tidyverse (see below) to having all the basic tools of data science at your fingertips. Find exercise solutions for R4DS here.
  • A Succinct Intro to R - This online book by Steve Haroz is a short introduction to the R language. It assumes you are familiar with programming concepts but want to learn R.
  • Introduction to Data Science - This book by Rafael A. Irizarry, which started out as the class notes used in the HarvardX Data Science Series, introduces concepts and skills that can help you tackle real-world data analysis challenges.
  • Data Science in R: A Gentle Introduction - This online book is structured as a series of walk-through lessons in R that will have you doing real data science in no time.
  • An(other) introduction to R - An(other) gentle introduction to R and how you can use it to work with data.
  • An Introduction to R - A basic introduction to R. Official CRAN documentation.
  • Swirl - Learn R, in R. Swirl teaches you R programming and data science interactively, at your own pace, and right in the R console.
  • Data Flair - The tutorials are grouped by skill level (beginner, intermediate, expert).
  • Computational Genomics with R - The aim of this online book is to provide the fundamentals for data analysis for genomics. It contains practical and theoretical aspects of computational genomics. Since computational genomics is interdisciplinary, this book aims to be accessible for biologists, medical scientists, computer scientists and people from other quantitative backgrounds.
  • R for Reproducible Scientific Analysis - The goal of R for Reproducible Scientific Analysis is to teach novice programmers to write modular code and best practices for using R for data analysis.
  • R Programming Examples - This web resource contains examples on basic concepts of R programming.
  • Using R for Common Scientific Tasks - R tutorial for common scientific data analysis and visualization tasks. View slides related to this tutorial here: (https://raw.githack.com/etmckinley/Coffey-Lab-R-Tutorial/main/R-tutorial-xaringnan.html).
  • R Graphical User Interface Comparison - A comparison of graphical user interfaces for R, posted on February 9, 2022 by Bob Muenchen on R-bloggers.
  • How to read an R help page - "How to read an R help page" taken from Data Visualization - A practical introduction by Kieran Healy.

Learn the Tidyverse

  • Tidyverse - The tidyverse is a collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
  • R for Data Science (R4DS) - This book by Hadley Wickham will teach you how to do data science with R. You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. It’s designed to take you from knowing nothing about R or the tidyverse to having all the basic tools of data science at your fingertips. Find exercise solutions for R4DS here.
  • Teaching the tidyverse in 2021 - A blog post by Mine Çetinkaya-Rundel with updated advice on teaching the tidyverse in 2021.
  • RStudio Cheatsheets - RStudio cheatsheets make it easy to use some of our favorite R packages.
  • Statistical Inference via Data Science: A ModernDive into R and the Tidyverse - “Help! I’m new to R and RStudio and I need to learn them! What do I do?” If you’re asking yourself this, this book is for you.
  • ggplot2: elegant graphics for data science - This book by Hadley Wickham goes into greater depth on the ggplot2 visualisation system. ggplot2 is an R package for producing statistical, or data, graphics.
  • The Evolution of a ggplot - In this blog post, Cédric Scherer shows you how to turn a default ggplot into a plot that visualizes information in an appealing and easily understandable way.
  • ggplot Wizardry Hands-On - A Step-by-Step tutorial: Tricks and secrets for a beautiful plot in R, by Cédric Scherer.
  • Tidymodels - The tidymodels framework is a collection of packages for modeling and machine learning using tidyverse principles.
  • Tidy Modeling with R - This book is a guide to using a new collection of software in the R programming language for model building.
  • ggside - The R package ggside expands on the ggplot2 package. This package allows the user to add graphical information about one of the main panel’s axes.
  • Principal Component Analysis (PCA) with tidyverse - A blog post by Benjamin Nowak to show how to use tidyverse tools and syntax to perform PCA.
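
To give a feel for the shared grammar the Tidyverse entry describes, here is a minimal sketch using dplyr verbs on the built-in `mtcars` dataset. It assumes the dplyr package is installed (the block is skipped otherwise), and the name `mpg_by_cyl` is only illustrative; it is not taken from any of the resources listed above.

```r
# Assumption: dplyr is installed. If it is not, the block does nothing.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Each verb takes a data frame and returns a data frame,
  # so the steps chain together with the pipe.
  mpg_by_cyl <- mtcars %>%
    group_by(cyl) %>%                      # one group per cylinder count
    summarise(mean_mpg = mean(mpg),        # average fuel economy per group
              n_cars   = n()) %>%          # number of cars per group
    arrange(desc(mean_mpg))                # highest average first

  print(mpg_by_cyl)
}
```

The same pattern of small verbs joined by a pipe carries over to most tidyverse packages, which is why the books above usually start with it.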

Beyond the basics

  • What They Forgot to Teach You About R - Free online resource by Jenny Bryan and Jim Hester. The material is based on in-person workshops that focused on building holistic and project-oriented workflows that address the most common sources of friction in data analysis, outside of doing the statistical analysis itself.
  • Advanced R (2nd Edition) - This is the website for the 2nd edition of Advanced R, a book in Chapman & Hall’s R Series. The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as it helps you to understand why R works the way it does.
  • R Packages - The goal of this book, by Hadley Wickham and Jenny Bryan, is to teach you how to develop R packages so that you can write your own.
  • Big Book of R - An online book by Oscar Baruffa and contributors that lists 200+ R books, most of them free.
  • R for Reproducible Scientific Analysis - The goal of R for Reproducible Scientific Analysis is to teach novice programmers to write modular code and best practices for using R for data analysis.
  • “Do More with R” video tutorials - An article by Sharon Machlis about quick video tips on useful things you can do in R. Most videos are shorter than 10 minutes.
  • Project-oriented workflow - An article by Jenny Bryan that includes advice on setting up your R life to maximize effectiveness and reduce frustration.
  • Supervised Machine Learning for Text Analysis in R - This book serves as a thorough introduction to prediction and modeling with text, along with detailed practical examples.
  • JavaScript for R - Did you know R works just as well with JavaScript?! This book delves into the various ways both the R and JavaScript languages can work together.
  • tidyquery - SQL and R - The tidyquery package runs SQL queries on R data frames. For an introduction to tidyquery and queryparser, watch the recording of the talk “Bridging the Gap between SQL and R” from rstudio::conf(2020).
  • Geocomputation with R - Geocomputation with R is for people who want to analyze, visualize and model geographic data.
  • Data Visualization with R - This book helps you create the most popular visualizations - from quick and dirty plots to publication-ready graphs.
  • Data Visualization - A practical introduction - A treasure trove of R-dataviz help by Kieran Healy.
  • gghighlight: An R package for improved visualization - Highlight lines and points in a ggplot2 data visualization object. An introductory YouTube video is available.
  • Connecting to databases using R - A resource from RStudio for connecting to databases.
  • Awesome R Package Development - A curated list of awesome tools to assist R 📦 development.
  • Apache Arrow - Apache Arrow contains a set of technologies that enable big data systems to process and move data fast. The arrow package for R can:
    • Read and write Parquet files.
    • Read and write Feather files.
    • Analyze, process, and write multi-file, larger-than-memory datasets. Click here for vignette.
    • Much, much more.
    In order to learn how to use `arrow` in `R`, refer to this [documentation specific for the `R` environment](https://arrow.apache.org/docs/r/).
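
As a small illustration of the Parquet bullet above, the following sketch writes a data frame to a Parquet file and reads it back with `arrow::write_parquet()` and `arrow::read_parquet()`. It assumes the arrow package is installed (the block is skipped otherwise). The `toy_counts` table and its values are made up purely for the example.

```r
# Assumption: the arrow package is installed. If it is not, the block does nothing.
if (requireNamespace("arrow", quietly = TRUE)) {
  # Hypothetical toy table standing in for real data.
  toy_counts <- data.frame(
    sample = c("s1", "s2", "s3"),
    reads  = c(120L, 98L, 143L),
    stringsAsFactors = FALSE
  )

  # Write to a temporary Parquet file, then read it back into R.
  parquet_path <- tempfile(fileext = ".parquet")
  arrow::write_parquet(toy_counts, parquet_path)
  round_trip <- as.data.frame(arrow::read_parquet(parquet_path))

  print(round_trip)
}
```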

Finding Help with R


Continuous Learning

  • TidyTuesday - A weekly social data project in R. Every Monday a new dataset is released on the TidyTuesday GitHub page for participants to clean, wrangle, tidy, and plot.
  • Twitter for R programmers - The R community is very active on Twitter. You can learn a lot about the language, about new approaches to problems, and make new friends. This online book will show you how.
  • R Weekly - R is growing very quickly, and there are lots of great blogs, tutorials and other formats of resources coming out every day. R Weekly wants to keep track of these great things in the R community and make it more accessible to everyone.
  • Keeping up to date with R news - Following R news helps you learn about new tools and their applications.
  • RStudio Books - In addition to software tools, RStudio has also authored many books, some already highlighted in this curated list. Another tool for your toolbox.

Genomics

  • Computational Genomics with R - The aim of this online book is to provide the fundamentals for data analysis for genomics. It contains practical and theoretical aspects of computational genomics. Since computational genomics is interdisciplinary, this book aims to be accessible for biologists, medical scientists, computer scientists and people from other quantitative backgrounds.
  • bigstatsr - A package that provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory thanks to memory-mapping to binary files on disk.
  • bigsnpr - An R package for the analysis of massive SNP arrays, primarily designed for human genetics. It enhances the features of package bigstatsr for the purpose of analyzing genotype data. Here is a short demonstration of bigsnpr and bigstatsr.
  • Computing Polygenic Risk Scores (PRS) using LDpred2 - A tutorial to show how to compute polygenic risk scores (PRS) using LDpred2.
  • Polygenic Risk Score (PRS) Tutorial - This tutorial provides a step-by-step guide to performing basic polygenic risk score (PRS) analyses.
  • Manhattan plot in R: a review - An article about creating a Manhattan plot with R.
  • Seurat - R toolkit for single cell genomics.
  • Orchestrating Single-Cell Analysis with Bioconductor - The goal of this book is to provide a solid foundation in the usage of Bioconductor tools for single-cell RNA-seq analysis by walking through various steps of typical workflows using example datasets.
  • Analysis of single cell RNA-seq data - An online course (2021) taught through the University of Cambridge Bioinformatics training unit. This course discusses some of the questions that can be addressed using scRNA-seq as well as the available computational and statistical methods.
  • Assigning cell types with SingleR - The Bioconductor package SingleR implements an automatic annotation method for single-cell RNA sequencing (scRNA-seq) data.
  • ANALYSIS OF SINGLE CELL RNA-SEQ DATA - A course (2019) from the Broad Institute that is aimed at researchers and technical workers who are or will be analyzing single cell RNA-seq data.
  • recount3 - recount3 is an online resource consisting of RNA-seq gene, exon, and exon-exon junction counts as well as coverage bigWig files for 8,679 and 10,088 different studies for human and mouse, respectively.
  • Survival Analysis in R - This tutorial provides an introduction to survival analysis and to conducting a survival analysis in R. It was originally presented at the Memorial Sloan Kettering Cancer Center R-Presenters series on August 30, 2018 and updated March 2019.
  • Doing Meta Analysis in R - This book serves as an accessible introduction into how meta-analyses can be conducted in R.
  • The Epidemiologist R Handbook - R for applied epidemiology and public health. Written by epidemiologists, for epidemiologists.
  • chromoMap - chromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes.
  • maftools: Summarize, Analyze and Visualize MAF Files - This Bioconductor R package attempts to summarize, analyze, annotate and visualize Mutation Annotation Format (MAF) files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.
  • plotgardener - plotgardener is a coordinate-based, genomic visualization package for R.
  • 42basepairs - File browser for public genomic data - A better way to explore genomics data in the cloud.
  • 9 tools for interactive exploring single-cell RNAseq data - A list of tools by Ming "Tommy" Tang, dated 2022-11-07.
  • 8 Resources to study Transcription factor binding, enhancers and histone modification distribution - A list of resources by Ming "Tommy" Tang, dated 2022-11-15.
  • Orchestrating Single-Cell Analysis with Bioconductor - Teaches users common workflows for the analysis of single-cell RNA-seq (scRNA-seq) data.
  • ggplate - The ggplate package enables users to create simple plots of biological culture plates as well as microplates.
  • CuratedAtlasQueryR - CuratedAtlasQueryR is a query interface that allows the programmatic exploration and retrieval of the harmonised, curated and reannotated CELLxGENE single-cell Human Cell Atlas. Data can be retrieved at cell, sample, or dataset levels based on filtering criteria.
  • Orchestrating Hi-C analysis with Bioconductor - The primary aim of this book is to introduce the R user to Hi-C analysis. This book starts with key concepts important for the analysis of chromatin conformation capture and then presents Bioconductor tools that can be leveraged to process, analyze, explore and visualize Hi-C data. By Jacques Serizay.
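
The polygenic risk score tutorials listed above build on one simple idea: a basic PRS is a weighted sum of an individual's effect-allele dosages, with per-SNP effect sizes as weights. The base R sketch below shows only that arithmetic. The genotype matrix, the effect sizes, and the helper `compute_prs()` are all hypothetical, and the sketch does not implement LDpred2 or any of the quality-control steps those tutorials cover.

```r
# Hypothetical toy genotypes: rows are individuals, columns are SNPs,
# entries are dosages (0, 1 or 2 copies of the effect allele).
genotypes <- matrix(
  c(0, 1, 2,
    1, 1, 0,
    2, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), c("snp1", "snp2", "snp3"))
)

# Hypothetical per-SNP effect sizes (for example, GWAS betas).
effect_sizes <- c(snp1 = 0.10, snp2 = -0.05, snp3 = 0.20)

# Weighted sum of dosages per individual: PRS_i = sum_j beta_j * G_ij.
compute_prs <- function(dosages, betas) {
  stopifnot(identical(colnames(dosages), names(betas)))  # SNPs must line up
  drop(dosages %*% betas)
}

prs <- compute_prs(genotypes, effect_sizes)
print(prs)
```

The check on column names matters in practice: a score computed with misaligned SNPs is silently wrong, which is one reason the tutorials spend so much time on data preparation.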

Statistics


Reproducibility

  • R for Reproducible Scientific Analysis - The goal of R for Reproducible Scientific Analysis is to teach novice programmers to write modular code and best practices for using R for data analysis.
  • Reproducible Research Data and Project Management in R - A workshop that discusses R and RStudio tools and conventions that offer a powerful framework for making modern, open, reproducible and collaborative computational workflows more accessible to researchers.
  • Packaging Data Analytical Work Reproducibly Using R - The purpose of this article is to show how the R package can be a suitable template for organising files into a research compendium to enhance the reproducibility of research.
  • The Turing Way - A community and handbook to reproducible, ethical and collaborative data science. The Turing Way also maintains a Zero-to-Binder tutorial for 3 common languages, Julia, Python, and R. For more information about Binder, see Getting Started with Binder below.
  • rrtools - Tools for Writing Reproducible Research in R. This package documents the key steps and provides convenient functions for quickly creating a new research compendium.
  • rcompendium - rcompendium makes it easy to create R packages or research compendia with a predefined files/folders structure.
  • targets: pipeline tool for R - The targets package is a Make-like pipeline toolkit for statistics and data science in R. With targets, you can maintain a reproducible workflow that skips costly runtime for tasks that are already performed. The user manual includes a walkthrough chapter.
  • Reproducible analysis and Research Transparency - Transparency, open sharing, and reproducibility are core values of science, but not always part of daily practice. This workshop (2017) provided an overview of current status in reproducible analysis in order to provide transparency in research.
  • orderly - orderly is a package designed to help make analysis more reproducible. Its principal aim is to automate a series of basic steps in the process of writing analyses.
  • How to name files - Slide deck from Jenny Bryan about the why's and how's of naming files.
  • groundhog - Write R scripts that are reproducible using the groundhog package.
  • renv - The renv package helps you create reproducible environments for R projects.
  • Draw me a project - A great blog post by Maëlle Salmon about reproducibility.
  • Docker - Get started with Docker.
  • A Docker tutorial for reproducible research. - This is an introduction to Docker designed for participants with knowledge about R and RStudio.
  • An Introduction to Docker for R Users - A quick introduction on using Docker for reproducibility in R, by Colin Fay.
  • Generating Dockerfiles for Reproducible Research with R - The R package containerit aims to make reproducible and archivable research with containers easy.
  • Transparent reproducible R environment with Docker + renv - A quick introduction to setting up a transparent reproducible R environment with Docker + renv, by Elio Campitelli.
  • Conducting reproducible research with Docker (Part 1 of 3) - A blog post by Derek Powell in which he describes how to produce statistically and computationally reproducible research using Docker.
  • Singularity - Singularity is an alternative container platform to Docker. You can build a container on your laptop and then run it on many of the largest HPC clusters in the world, local university or company clusters, a single server, in the cloud, or on a workstation down the hall. Additional details and help are also available here.
  • Get Started with Binder - This page will help you get started building your own repositories and sharing them with Binder. A Binder is a code repository that contains (1) code or content that you'd like people to be able to run (e.g. R script or Jupyter Notebook) and (2) configuration files used by Binder to build the environment to run your code.
  • Containerize conda - Instructions about how to package an existing environment into a Docker or Singularity container which should be more portable and can also easily be integrated into a fully reproducible data analysis workflow.
  • conda-pack - Conda-pack is a command line tool for creating archives of conda environments that can be installed on other systems and locations. A tool like conda-pack is necessary because conda environments are not relocatable. Simply moving an environment to a different directory can render it partially or completely inoperable. conda-pack addresses this challenge by building archives from original conda package sources and reproducing conda’s own relocation logic.
  • CRAN Task View: Reproducible Research - The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified. Packages in R for this purpose can be split into groups for: literate programming, pipeline toolkits, package reproducibility, project workflows, code/data formatting tools, format convertors, and object caching.
  • R Workflow: Reproducible Biomedical Research Using Quarto - This online book was written to foster best practices in reproducible data documentation and manipulation, statistical analysis, graphics, and reporting. By Frank E Harrell Jr, Department of Biostatistics, School of Medicine, Vanderbilt University.
  • 12 resources for reproducible computational research - A list of resources by Ming "Tommy" Tang, article dated 2022-11-09.
  • Packaging data analytical work reproducibly using R (and friends) - Using real-world examples, the authors show how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.
  • Posit Public Package Manager - Provides standard mirrors of CRAN, Bioconductor, and PyPI, and can track changes over time or freeze packages to specific versions, to help ensure reproducibility and ease collaboration.
  • pracpac - Practical R Packaging with Docker.
  • Data Skills for Reproducible Research - This book provides an overview of skills needed for reproducible and open research using the statistical programming language R and tidyverse packages.
  • Building reproducible analytical pipelines with R - The aim of this book is to teach you how to use some of the best practices from software engineering and DevOps to make your projects robust, reliable and reproducible. By Bruno Rodrigues (2023 October).
  • rig - The R Installation Manager - Makes it easier to switch between multiple versions of R on one computer.
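
Many of the tools in this section rest on the same basic step: recording which R version and which package versions an analysis used. The base R sketch below illustrates only that idea. The helper `snapshot_versions()` and the output file name are hypothetical, and this is not how renv, groundhog, or any other listed tool works internally; those packages also handle installing and restoring the recorded versions.

```r
# Record the installed version of each package in a data frame.
# `snapshot_versions` is a hypothetical helper written for this sketch.
snapshot_versions <- function(packages) {
  data.frame(
    package = packages,
    version = vapply(
      packages,
      function(p) as.character(utils::packageVersion(p)),
      character(1)
    ),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Packages this imaginary analysis depends on (all ship with R).
used_packages <- c("stats", "utils", "graphics")

snapshot  <- snapshot_versions(used_packages)
r_version <- R.version.string

# Save the record next to the analysis output (here: a temporary file).
versions_file <- file.path(tempdir(), "package-versions.csv")
write.csv(snapshot, versions_file, row.names = FALSE)

print(r_version)
print(snapshot)
```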

Markdown & R Markdown

  • R Markdown from RStudio - Getting started with R Markdown.
  • R Markdown Cookbook - R Markdown is a powerful tool for combining analysis and reporting into the same document. This book provides practical and short examples to show the interesting and useful usage of R Markdown.
  • R Markdown: The Definitive Guide - An online reference book for R Markdown. It provides a detailed reference on the built-in R Markdown output formats of the rmarkdown package, as well as several other extension packages.
  • Visual R Markdown - RStudio v1.4 includes a new visual markdown editing mode. It provides improved productivity for composing longer-form articles and analyses with R Markdown.
  • Up and running with officedown - A blog article by Alison Hill that describes the officedown package, which allows users to write Word and PowerPoint documents using R Markdown.
  • bookdown - An R package to facilitate writing books and long-form articles/reports with R Markdown. The home page includes a list of featured books.
  • List of featured books written with bookdown - A list of featured books published to bookdown.org. Click here for a list generated automatically and roughly ordered by date.
  • RMarkdown Tips and Tricks - A collection of tweets by Indrajeet Patil containing tips and tricks related to R Markdown.

Anti-R

  • jamovi - A free and open-source point-and-click statistical program. jamovi is built on top of the R statistical language making it easy to access the code underneath for your analysis.
  • JASP - JASP is an open-source statistical package that uses a graphical interface to facilitate both Bayesian and Frequentist analyses.
  • sandbox.bio - An interactive tool for learning bioinformatics command-line tools like bedtools, bowtie2, and samtools right from inside your web browser.
  • Polygenic Risk Score (PRS) Tutorial - This tutorial provides a step-by-step guide to performing basic polygenic risk score (PRS) analyses.
  • R Graphical User Interface Comparison - A comparison of graphical user interfaces for R, posted on February 9, 2022 by Bob Muenchen on R-bloggers.
  • GrENEPipe - Snakemake pipeline for variant calling from raw sample sequences. One command to run the whole pipeline with many tools to choose from for each step.
  • iDEP: Integrated Differential Expression and Pathway analysis - Web interface for end-to-end bulk RNA-seq analysis. Use it for basic QC thru pathway analysis and more. See the helpful YouTube tutorial video.
  • ICARUS: Interactive single Cell RNA-seq Analysis with R shiny Using Seurat - A web application designed to guide the user through single cell RNA-seq analysis using the Seurat scRNA-seq analysis toolkit via a tutorial style interface. Its purpose is to allow the user to interactively visualize single cell RNA-seq data without the requirement of previous R programming knowledge.
  • reticulate - The reticulate package provides a comprehensive set of tools for interoperability between Python and R.
  • Data Science at the Command Line - This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist, by Jeroen Janssens.

Shiny


Git and Version Control

  • Git is a useful tool for version control and GitHub sits on top of Git and supports collaborative and distributed working.
  • Pro Git - What is "version control" and why should we care? This free online book by Scott Chacon and Ben Straub explains it all. Dead tree versions are available.
  • Excuse me, do you have a moment to talk about version control? - This article by Jennifer Bryan describes the use of the version control system Git and the hosting site GitHub for statistical and data scientific workflows. Special attention is given to projects that use the statistical language R and, optionally, R Markdown documents.
  • Happy Git and GitHub for the useR - An online book that introduces Git, GitHub and version control.
  • How to Write a Git Commit Message - Commit messages matter. Here's how to write them well, by Chris Beams.
  • The Pro Git Book - The entire Pro Git book, written by Scott Chacon and Ben Straub and published by Apress.
  • GitHub actions with R - This book introduces GitHub actions, which help you automate tasks within your software development life cycle. More information available from GitHub Actions Documentation.
  • gert - The gert package is a simple git client based on ‘libgit2’. What this means for R users is that you can work with local and remote Git repositories from the comfort of R!
  • Enhanced support for citations on GitHub - GitHub now has built-in support for CITATION.cff files. This new feature enables academics and researchers to let people know how to correctly cite their work, especially in academic publications/materials.
  • GitHub actions with R - Trigger GitHub actions that automate steps after GitHub interactions, such as when you push, pull, submit a pull request, or write an issue.
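
The gert entry above mentions working with Git repositories from within R. As a rough sketch of what that looks like, the block below creates a throwaway repository in a temporary directory, stages one file, and commits it. It assumes the gert package is installed (the block is skipped otherwise). The author name, email address, and file contents are placeholders.

```r
# Assumption: the gert package is installed. If it is not, the block does nothing.
if (requireNamespace("gert", quietly = TRUE)) {
  # Create an empty repository in a temporary directory.
  repo_dir <- file.path(tempdir(), "gert-demo")
  dir.create(repo_dir, showWarnings = FALSE)
  gert::git_init(repo_dir)

  # Add a file to the working tree and stage it.
  writeLines("# Demo project", file.path(repo_dir, "README.md"))
  gert::git_add("README.md", repo = repo_dir)

  # Commit with an explicit, placeholder signature so the example
  # does not depend on a global Git user configuration.
  demo_author <- gert::git_signature("Example User", "user@example.org")
  gert::git_commit("Add README", author = demo_author, repo = repo_dir)

  # Inspect the history: one row per commit.
  commit_log <- gert::git_log(repo = repo_dir)
  print(commit_log[, c("commit", "message")])
}
```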

Other

  • Making Your Code Citable - If you’re a researcher writing software, this guide will show you how to make the work you share on GitHub citable by archiving one of your GitHub repositories and assigning a Digital Object Identifier (DOI) with the data archiving tool Zenodo.
  • Enhanced support for citations on GitHub - GitHub now has built-in support for CITATION.cff files. This new feature (2021) enables academics and researchers to let people know how to correctly cite their work, especially in academic publications/materials.
  • grateful - The goal of grateful is to make it very easy to cite the R packages used in any report or publication.
  • Twitter for Scientists - This book by Daniel S. Quintana, University of Oslo, will walk you through the ins and outs of using Twitter as a scientist to share your work. With Twitter and social media, the opportunity to share your work with everyone is available. People need to know your work exists before they can read it.
  • tidyquery - SQL and R - The tidyquery package runs SQL queries on R data frames. For an introduction to tidyquery and queryparser, watch the recording of the talk “Bridging the Gap between SQL and R” from rstudio::conf(2020).
  • Doing Meta Analysis in R - This book serves as an accessible introduction into how meta-analyses can be conducted in R.
  • Building a team of internal R packages - A blog post by Emily Riederer.
  • The Story Behind rspatialdata - A blog post by Dilinie Seimon & Varsha Ujjinni Vijay Kumar about rspatialdata, a repository of data sources and simple tutorials on how to retrieve and visualize spatial data using R.
  • Building a Data Package - Building an R package to make datasets readily available.
  • Awesome R package development - A curated list of awesome tools to assist with R 📦 development.
  • Deep Exploratory Data Analysis (EDA) in R - Exploratory Data Analysis is an important first step in data science. In this excellent post, Yury Zablotski provides the simplest and most effective ways to explore data in R.
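
Several entries in this section deal with citing software. Independently of those tools, base R can already produce citation information for R itself and for installed packages through `citation()`, and `toBibtex()` converts the result into a BibTeX entry. The short sketch below uses only functions that ship with R; the variable names are illustrative.

```r
# Citation for R itself.
r_citation <- citation()

# Citation for an installed package (here one that ships with R).
stats_citation <- citation("stats")

# Convert to BibTeX so it can be pasted into a reference manager.
r_bibtex <- toBibtex(r_citation)

print(r_citation)
print(r_bibtex)
```

Packages such as grateful automate this step across all the packages a report uses.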

Miscellaneous


Contributing

All are welcome to contribute to this curated list. The best way is by using the GitHub issues system. Pull Requests for contributions or suggestions are also welcome.

Here are some simple ways in which you can contribute:

  • Correct typos.

  • Report and fix any links that do not work.

  • Suggest a new learning resource that could benefit others.

  • Propose removal of a learning resource that is out of date.

Please note that this list is released with a Contributor Code of Conduct. By participating you agree to abide by its terms.

Thank you!
