2. Package implementation

yocra3 edited this page Nov 4, 2020 · 5 revisions

Introduction

The epimutacions package should contain all the functions required to run a full epimutation analysis.

Input data

The input of epimutacions will be a GenomicRatioSet, a class implemented in the minfi package to manage DNA methylation data. This class extends SummarizedExperiment, so it keeps DNA methylation measurements coordinated with phenotype data and CpG annotation.
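To make the idea of coordinated data concrete, here is a minimal base-R sketch of the three aligned parts. It is only a stand-in: the object names (`betas`, `pheno`, `annot`) and all values are made up, and a real analysis would hold these parts inside a GenomicRatioSet, from which minfi's `getBeta()` accessor returns the beta matrix.

```r
# Toy stand-in for the three coordinated parts of a GenomicRatioSet.
# All names and values are hypothetical and only illustrate the layout.
set.seed(1)
cpgs    <- sprintf("cg%03d", 1:6)
samples <- paste0("S", 1:4)

# Beta values: rows are CpGs, columns are samples, values in [0, 1]
betas <- matrix(runif(length(cpgs) * length(samples)),
                nrow = length(cpgs),
                dimnames = list(cpgs, samples))

# Phenotype data: one row per sample, same order as the columns of betas
pheno <- data.frame(status = c("case", "control", "control", "control"),
                    row.names = samples, stringsAsFactors = FALSE)

# CpG annotation: one row per CpG, same order as the rows of betas
annot <- data.frame(chromosome = "chr1",
                    position = seq(1000, by = 100, length.out = length(cpgs)),
                    row.names = cpgs, stringsAsFactors = FALSE)

# Selecting samples by phenotype keeps the measurements aligned by name
controls <- rownames(pheno)[pheno$status == "control"]
betas_controls <- betas[, controls, drop = FALSE]
dim(betas_controls)
```

A container class enforces this alignment automatically; with plain objects, as here, it has to be maintained by hand through matching row and column names.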

Epimutations workflow

A typical epimutation analysis will comprise three steps:

  1. Definition of epimutations
  2. Annotation of epimutations
  3. Visualization of epimutations

Definition of epimutations

The epimutacions package will allow the user to detect epimutations using a variety of algorithms. A more in-depth description of these algorithms can be found on the wiki page "1. Approaches to epimutations' detection". A main function that incorporates the different algorithms will be implemented. This function will have the following features:

  • Take a matrix of beta values as input
  • Allow missing values (coded as NA) and ignore them during computation. This will reduce the number of false positives due to deletions.
  • Report results in a tibble with the following columns:
    • Epi_ID: epimutation ID
    • samp_ID: Sample ID
    • chromosome
    • start: Start position
    • end: End position
    • length: length of the epimutation
    • N_CpGs: number of CpGs comprising the epimutation
    • CpG_ids: IDs of the CpGs included in the epimutation. This column will be useful for plotting.
    • Additional columns depending on the method (e.g. p-value, adjusted p-value, magnitude estimate...) could also be included.

Each algorithm should be encapsulated in its own function to improve code efficiency.
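The design above can be sketched in base R as follows. Everything here is an assumption made for illustration: the function names `find_epimutations()` and `detect_zscore()`, the arguments `z_cut` and `min_cpgs`, and the z-score rule itself, which is a simple stand-in and not one of the algorithms described in the approaches page. The sketch also assumes that CpGs are sorted by position on a single chromosome, and it returns a data.frame, which `tibble::as_tibble()` could convert to a tibble.

```r
# Hypothetical detection method: flags CpGs where the proband deviates from
# the reference samples by more than z_cut standard deviations.
detect_zscore <- function(betas, proband, z_cut = 3) {
  ref <- betas[, colnames(betas) != proband, drop = FALSE]
  mu  <- rowMeans(ref, na.rm = TRUE)        # NAs in the reference are ignored
  sdv <- apply(ref, 1, sd, na.rm = TRUE)
  z   <- (betas[, proband] - mu) / sdv
  unname(!is.na(z) & abs(z) > z_cut)        # a missing proband value never flags
}

# Hypothetical main function: dispatches to one method, then reports runs of
# at least min_cpgs consecutive flagged CpGs as epimutations.
find_epimutations <- function(betas, annot, proband,
                              method = "zscore", min_cpgs = 3) {
  methods <- list(zscore = detect_zscore)   # one function per algorithm
  stopifnot(method %in% names(methods))
  flagged <- methods[[method]](betas, proband)
  runs   <- rle(flagged)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  hits   <- which(runs$values & runs$lengths >= min_cpgs)
  if (length(hits) == 0) return(data.frame())
  do.call(rbind, lapply(seq_along(hits), function(k) {
    idx <- starts[hits[k]]:ends[hits[k]]
    pos <- annot$position[idx]
    data.frame(Epi_ID = paste0("epi_", k), samp_ID = proband,
               chromosome = annot$chromosome[idx[1]],
               start = min(pos), end = max(pos),
               length = max(pos) - min(pos) + 1,
               N_CpGs = length(idx),
               CpG_ids = paste(rownames(betas)[idx], collapse = ","),
               stringsAsFactors = FALSE)
  }))
}

# Made-up data: 20 CpGs, 10 samples, proband S1 hypermethylated at cg08-cg12
cpgs  <- sprintf("cg%02d", 1:20)
betas <- outer(1:20, 1:10, function(i, j) 0.5 + 0.01 * ((i + j) %% 5 - 2))
dimnames(betas) <- list(cpgs, paste0("S", 1:10))
betas[8:12, "S1"] <- 0.9
betas["cg05", "S3"] <- NA                   # missing reference value
betas["cg15", "S1"] <- NA                   # missing proband value
annot <- data.frame(chromosome = "chr1",
                    position = seq(1000, by = 100, length.out = 20),
                    row.names = cpgs, stringsAsFactors = FALSE)

res <- find_epimutations(betas, annot, proband = "S1")
print(res)
```

Keeping the methods in a named list means a new algorithm only needs a new function with the same arguments and one extra list entry, while the output formatting stays in a single place.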

Annotation of epimutations

Additional information about each epimutation should be added to the results. This information will be independent of the statistical method, so a stand-alone function to annotate the results will be implemented. Some suggested fields:

  • Proximal gene
  • Gene position (i.e. promoter, body, 5'UTR...)
  • OMIM entries for proximal gene
  • CpG islands
  • Imprinted regions
  • cRE: overlap with cis-regulatory elements from ENCODE
  • chromatin state: overlap with chromatin states from ENCODE
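As a rough illustration of a stand-alone annotation step, the sketch below adds only the first suggested field, the proximal gene. The function name `annotate_epimutations()`, the output column `proximal_gene`, and the gene table are all hypothetical; a real implementation would draw gene coordinates from an annotation resource rather than a hand-written table.

```r
# Hypothetical stand-alone annotator: adds the nearest gene to each epimutation.
# It only needs the coordinate columns, so it works for any detection method.
annotate_epimutations <- function(epi, genes) {
  epi$proximal_gene <- vapply(seq_len(nrow(epi)), function(i) {
    g <- genes[genes$chromosome == epi$chromosome[i], ]
    if (nrow(g) == 0) return(NA_character_)
    # Gap is 0 when the intervals overlap, else the distance between them
    gap <- pmax(g$start - epi$end[i], epi$start[i] - g$end, 0)
    g$gene[which.min(gap)]
  }, character(1))
  epi
}

# Made-up gene table and detection results
genes <- data.frame(gene = c("GENE_A", "GENE_B", "GENE_C"),
                    chromosome = c("chr1", "chr1", "chr2"),
                    start = c(1500, 9000, 100),
                    end   = c(2500, 9500, 200),
                    stringsAsFactors = FALSE)
epi <- data.frame(Epi_ID = c("epi_1", "epi_2", "epi_3"),
                  chromosome = c("chr1", "chr1", "chrX"),
                  start = c(1700, 8000, 500),
                  end   = c(2100, 8200, 900),
                  stringsAsFactors = FALSE)

annotated <- annotate_epimutations(epi, genes)
print(annotated)
```

Each further field (CpG islands, imprinted regions, and so on) could follow the same pattern: one overlap lookup per resource, each adding one column.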

Visualization of epimutations

The results from the previous steps should be visualized to allow manual inspection of the epimutations. Plots should include two parts:

  • Methylation of the proband compared with the rest of the population. The plot could also show CpGs beyond the boundaries of the epimutation.
  • Genomic context of the epimutation: this plot can be produced with the Gviz package and might include:
    • Proximal genes: genes close to the epimutation
    • Regulatory elements: regulatory elements proximal to the epimutation
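The first part of the plot can be sketched with base R graphics. This is a simplified stand-in, not the planned Gviz-based figure, and it does not draw the genomic context track. The function name `plot_epimutation()` and its `flank` argument (the number of extra CpGs shown on each side) are assumptions, as is the toy data.

```r
# Hypothetical plotting helper: proband versus reference samples around one
# epimutation, extended by `flank` CpGs on each side.
plot_epimutation <- function(betas, annot, proband, cpg_ids, flank = 2) {
  idx <- match(cpg_ids, rownames(betas))
  win <- max(1, min(idx) - flank):min(nrow(betas), max(idx) + flank)
  pos <- annot$position[win]
  ref <- betas[win, colnames(betas) != proband, drop = FALSE]
  # Reference samples in grey
  matplot(pos, ref, type = "l", lty = 1, col = "grey70", ylim = c(0, 1),
          xlab = "Position (bp)", ylab = "Beta value")
  # Shade the epimutation itself
  rect(min(annot$position[idx]), 0, max(annot$position[idx]), 1,
       col = adjustcolor("orange", alpha.f = 0.2), border = NA)
  # Proband in red on top
  lines(pos, betas[win, proband], col = "red", lwd = 2)
  legend("bottomright", legend = c(proband, "reference"),
         col = c("red", "grey70"), lty = 1, lwd = c(2, 1))
  invisible(rownames(betas)[win])           # CpGs actually shown
}

# Made-up data: proband S1 hypermethylated at cg08-cg12
cpgs  <- sprintf("cg%02d", 1:20)
betas <- outer(1:20, 1:10, function(i, j) 0.5 + 0.01 * ((i + j) %% 5 - 2))
dimnames(betas) <- list(cpgs, paste0("S", 1:10))
betas[8:12, "S1"] <- 0.9
annot <- data.frame(chromosome = "chr1",
                    position = seq(1000, by = 100, length.out = 20),
                    row.names = cpgs, stringsAsFactors = FALSE)

shown <- plot_epimutation(betas, annot, proband = "S1",
                          cpg_ids = sprintf("cg%02d", 8:12))
```

Passing the CpG IDs directly is what makes the CpG_ids column of the results table convenient for plotting.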

Additional tasks

The previous steps describe the core functionality of the epimutacions package. However, we should also work on other aspects during the Biohackathon:

  • Documentation: each function should be documented in roxygen2 format.
  • Testing: appropriate testing code based on the testthat framework should be provided for most of the functions (if not all).
  • Vignette: a vignette showing how to apply epimutacions to a new dataset should be developed.
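As a small example of the first two points, this is what a roxygen2-documented function and its testthat test might look like. The helper `interval_length()` is hypothetical and exists only to show the format. In a package, the test would live in its own file under tests/testthat/; here it is guarded so that the snippet still runs when testthat is not installed.

```r
#' Length of a genomic interval
#'
#' Hypothetical helper used only to illustrate the documentation format.
#'
#' @param start Numeric vector of start positions (1-based, inclusive).
#' @param end Numeric vector of end positions (1-based, inclusive).
#' @return Numeric vector with the number of bases in each interval.
#' @examples
#' interval_length(1700, 2100)
#' @export
interval_length <- function(start, end) {
  stopifnot(all(end >= start))
  end - start + 1
}

# Matching unit test, as it might appear in a file under tests/testthat/
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("interval_length counts both endpoints", {
    testthat::expect_equal(interval_length(1700, 2100), 401)
    testthat::expect_error(interval_length(10, 5))
  })
}
```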

Resources

In this Google Drive link, you will find some useful data for working on this project.