2. Package implementation
The epimutacions package should contain all the functions required to run a full epimutation analysis.
The input of epimutacions will be a GenomicRatioSet, an object implemented in the minfi package to manage DNA methylation data. This object is an extension of SummarizedExperiment, so it coordinates DNA methylation measurements with phenotype data and CpG annotation.
A typical epimutation analysis will comprise three steps:
- Definition of epimutations
- Annotation of epimutations
- Visualization of epimutations
The epimutacions package will allow the user to detect epimutations using a variety of algorithms. A more in-depth description of these algorithms can be found in 1. Approaches to epimutations' detection. A main function will be implemented that incorporates the different algorithms. This function will have the following features:
- Have as input a matrix of beta values
- Allow missing data (identified with NAs) and ignore it during computation. This step will reduce the number of false-positives due to deletions.
- Report results in a tibble with the following columns:
  - Epi_ID: epimutation ID
  - samp_ID: Sample ID
  - chromosome
  - start: Start position
  - end: End position
  - length: length of the epimutation
  - N_CpGs: number of CpGs comprising the epimutation
  - CpG_ids: ids of the CpGs included in the epimutation. This column will be useful for plotting
  - Additional columns depending on the method (e.g. p-value, adjusted p-value, magnitude estimate...) could also be included.
Each algorithm should be encapsulated in its own function to keep the code modular and easier to maintain.
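The design above (a main function that dispatches to per-algorithm functions, skips missing values and reports one row per epimutation) can be sketched as follows. The package itself would be written in R; this is only a language-neutral illustration in Python. The function names, the `threshold` and `min_cpgs` parameters, the dict-based inputs and the z-score rule are all assumptions made for the example, not the algorithms described in 1. Approaches to epimutations' detection.

```python
import math
from statistics import mean, stdev


def is_missing(value):
    """Treat None and NaN as missing (the analogue of NA in R)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def zscore_outliers(betas, proband, threshold):
    """Hypothetical method: flag CpGs where the proband deviates from the rest.

    betas maps CpG id -> {sample id: beta value}. Missing values are skipped.
    """
    flagged = set()
    for cpg, values in betas.items():
        case = values.get(proband)
        controls = [v for s, v in values.items()
                    if s != proband and not is_missing(v)]
        if is_missing(case) or len(controls) < 2:
            continue  # not enough observed data for this CpG
        sd = stdev(controls)
        if sd > 0 and abs(case - mean(controls)) / sd > threshold:
            flagged.add(cpg)
    return flagged


# Each algorithm lives in its own function; the main function only dispatches.
METHODS = {"zscore": zscore_outliers}


def detect_epimutations(betas, annotation, proband, method="zscore",
                        threshold=3.0, min_cpgs=3):
    """Return one row (dict) per run of at least min_cpgs consecutive flagged CpGs.

    annotation is a list of (cpg_id, chromosome, position) tuples,
    sorted by chromosome and position.
    """
    flagged = METHODS[method](betas, proband, threshold)
    rows, run = [], []

    def close_run():
        # Emit the current run as a result row if it is long enough.
        if len(run) >= min_cpgs:
            start, end = run[0][2], run[-1][2]
            rows.append({
                "Epi_ID": f"epi_{proband}_{len(rows) + 1}",
                "samp_ID": proband,
                "chromosome": run[0][1],
                "start": start,
                "end": end,
                "length": end - start + 1,
                "N_CpGs": len(run),
                "CpG_ids": ",".join(c[0] for c in run),
            })
        run.clear()

    for cpg in annotation:
        if cpg[0] in flagged and (not run or run[-1][1] == cpg[1]):
            run.append(cpg)
        else:
            close_run()
            if cpg[0] in flagged:  # flagged CpG on a new chromosome
                run.append(cpg)
    close_run()
    return rows
```

Adding a new algorithm then only requires writing one more function with the same signature and registering it in `METHODS`; the code that builds the result rows stays untouched.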
Additional information about each epimutation should be added to the results. This information is independent of the statistical method, so a stand-alone function to annotate the results will be implemented. Some suggested fields to include:
- Proximal gene
- Gene position (i.e. promoter, body, 5'UTR...)
- OMIM entries for proximal gene
- CpG islands
- Imprinted regions
- cRE: overlap with cis-regulatory elements from ENCODE
- chromatin state: overlap with chromatin states from ENCODE
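Several of the fields above (CpG islands, imprinted regions, cRE, chromatin state) come down to checking whether an epimutation overlaps a set of genomic intervals. A minimal sketch of such a stand-alone annotation step is shown below in Python, as a language-neutral illustration only. The `tracks` structure, the column names and the function names are hypothetical; fields such as the proximal gene would need a nearest-feature search instead of an overlap test and are not covered here.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two closed intervals on the same chromosome share a base."""
    return start_a <= end_b and start_b <= end_a


def annotate_epimutations(rows, tracks):
    """Return copies of the result rows with one extra column per track.

    rows are dicts with at least "chromosome", "start" and "end".
    tracks maps a column name (e.g. "CpG_island") to a list of
    (chromosome, start, end, label) features. Hypothetical structure.
    """
    annotated = []
    for row in rows:
        new_row = dict(row)  # leave the detection results untouched
        for column, features in tracks.items():
            hits = [label for chrom, start, end, label in features
                    if chrom == row["chromosome"]
                    and overlaps(row["start"], row["end"], start, end)]
            new_row[column] = ";".join(hits) if hits else None
        annotated.append(new_row)
    return annotated
```

Because the function only reads the position columns, it works on the output of any detection method.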
The results from the previous steps should be visualized to allow manual inspection of the epimutations. Each plot should include two parts:
- Methylation of the proband compared with the rest of the population. The plotted CpGs could extend beyond the epimutation itself.
- Genetic context of the epimutation: this plot can be produced with the Gviz package and might involve:
  - Proximal genes: genes close to the epimutation
  - Regulatory elements: regulatory elements proximal to the epimutation.
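For the methylation part of the plot, the CpGs to draw can be chosen by extending the epimutation by a flanking distance on each side. The sketch below, in Python as a language-neutral illustration, assumes a hypothetical `flank` parameter and the same `(cpg_id, chromosome, position)` annotation tuples as a plain list; it only selects the data and does not draw anything.

```python
def cpgs_in_plot_window(row, annotation, flank=1000):
    """Select CpGs within the epimutation extended by `flank` bases per side.

    Returns (cpg_id, position, inside_epimutation) tuples, so CpGs that
    belong to the epimutation can be highlighted in the plot.
    """
    window_start = max(0, row["start"] - flank)
    window_end = row["end"] + flank
    return [(cpg, pos, row["start"] <= pos <= row["end"])
            for cpg, chrom, pos in annotation
            if chrom == row["chromosome"] and window_start <= pos <= window_end]
```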
The previous steps describe the core functionality of the epimutacions package. However, we should also work on other aspects during the Biohackathon:
- Documentation: each function should be documented following the roxygen2 format.
- Testing: appropriate testing code based on the testthat framework should be provided for most of the functions (if not all).
- Vignette: a vignette exemplifying how to apply epimutacions to a new dataset should be developed.
In this Google Drive link, you will find some useful data for working on this project.