yoonsikp/cnv-pathogenicity-prediction

Predicting the Pathogenicity of Copy Number Variations

Copy number variations (CNVs) are one class among the wide variety of genetic modifications that occur in humans. It remains difficult for researchers to predict the effect a CNV will have on an individual, because CNVs exhibit a spectrum of phenotypic effects ranging from benign to pathogenic and even beneficial. This project aims to detect pathogenic CNVs while safely discarding CNVs that are confidently predicted to be benign.
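
The idea of discarding only confidently benign CNVs can be illustrated with a minimal sketch. It assumes a trained model has already produced a probability of pathogenicity for each CNV. The function name `triage_cnvs`, the cutoff of 0.05, and the example scores are all hypothetical and are not taken from the project's notebooks.

```python
def triage_cnvs(scores, benign_cutoff=0.05):
    """Split CNV ids into those kept for review and those discarded as benign.

    scores maps a CNV identifier to a predicted probability of pathogenicity.
    benign_cutoff is an illustrative value, not one used by the project.
    """
    kept, discarded = [], []
    for cnv_id, p_pathogenic in scores.items():
        if p_pathogenic < benign_cutoff:
            # Confidently benign: safe to drop from further analysis.
            discarded.append(cnv_id)
        else:
            # Pathogenic or uncertain: keep for review.
            kept.append(cnv_id)
    return kept, discarded


# Made-up scores, for illustration only.
example_scores = {"cnv_a": 0.01, "cnv_b": 0.40, "cnv_c": 0.97}
kept, discarded = triage_cnvs(example_scores)
```

A low cutoff makes the filter conservative: uncertain CNVs such as `cnv_b` stay in the kept set rather than being dropped as benign.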

This repository contains the code and datasets required to replicate the results of the project. The libraries used for Feature Extraction can also be repurposed for any project that works with genomic regions aligned to the hg19 reference genome. Every top-level folder contains a descriptive README. The following links are example notebooks from the project.

Feature Extraction

Libraries

Feature Extraction

Model Training

Logistic Regression

Neural Network

XGBoost

Model Testing

Logistic Regression

Neural Network

XGBoost

Presentation

Final Presentation Excerpt

Requirements

This project depends on Python 3. The Python 3 libraries needed are listed in requirements.txt.
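
One possible way to install the dependencies from Python itself is sketched below. It assumes `requirements.txt` sits in the current working directory and that `pip` is available for the running interpreter. The helper names `pip_install_command` and `install_requirements` are hypothetical and are not part of the repository.

```python
import subprocess
import sys


def pip_install_command(requirements_path="requirements.txt"):
    """Build the pip command for the interpreter that is running this script."""
    if sys.version_info.major < 3:
        raise RuntimeError("This project depends on Python 3.")
    # Using sys.executable keeps the install tied to the active interpreter.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def install_requirements(requirements_path="requirements.txt"):
    """Run pip and raise CalledProcessError if the installation fails."""
    subprocess.run(pip_install_command(requirements_path), check=True)


# Usage, from the repository root:
# install_requirements()
```

Running the same `pip install -r requirements.txt` command directly in a terminal is equivalent. The wrapper only makes the Python 3 check explicit.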
