Using Machine Learning to Predict Pathogenicity of Copy Number Variations
0_Initial_Dataset
1_Gene_Annotations
2_Feature_Extraction
3_Model_Training
4_Model_Testing
5_Misc
BCB330_Presentation_Excerpt.pdf
LICENSE
README.md
requirements.txt


Predicting the Pathogenicity of Copy Number Variations

Copy number variations (CNVs) are a subset of the wide variety of genetic modifications that occur in humans. Predicting the effect a CNV will have on an individual remains difficult. CNVs exhibit a spectrum of phenotypic effects, from pathogenic through benign to beneficial. This project aims to detect pathogenic CNVs while safely discarding CNVs that are confidently predicted to be benign.
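As a rough illustration of "safely discarding" confident benign predictions, the sketch below splits CNVs by a predicted probability of pathogenicity. The function name `triage_cnvs`, the `benign_cutoff` value of 0.05, and the example scores are all hypothetical and are not taken from the project's notebooks; the actual models and thresholds may differ.

```python
from typing import List, Tuple


def triage_cnvs(
    scores: List[Tuple[str, float]],
    benign_cutoff: float = 0.05,  # hypothetical threshold, not from the project
) -> Tuple[List[str], List[str]]:
    """Split CNV ids into (discarded, retained) lists.

    A CNV is discarded only when its predicted probability of being
    pathogenic falls below `benign_cutoff`, i.e. when the model is
    confident that it is benign. Everything else is retained for review.
    """
    discarded, retained = [], []
    for cnv_id, p_pathogenic in scores:
        if p_pathogenic < benign_cutoff:
            discarded.append(cnv_id)
        else:
            retained.append(cnv_id)
    return discarded, retained


# Made-up model outputs: (CNV id, predicted probability of pathogenicity).
example_scores = [("cnv_a", 0.01), ("cnv_b", 0.40), ("cnv_c", 0.97)]
discarded, retained = triage_cnvs(example_scores)
```

Keeping the cutoff low trades a larger review set for a smaller chance of discarding a truly pathogenic CNV.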

This repository contains the code and datasets needed to replicate the project's results. The Feature Extraction libraries can also be reused in any project that works with genomic regions aligned to the hg19 reference genome. Every top-level folder contains a descriptive README. The links below point to example notebooks from the project.
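To give a sense of what a region-based feature can look like, here is a minimal sketch that counts how many annotated genes a CNV overlaps. It is not the repository's library code: the `Region` type, the helper names, the half-open coordinate convention, and the coordinates themselves are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A genomic interval (hypothetical type; half-open, 0-based)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlap_bp(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def count_overlapping_genes(cnv: Region, genes: List[Region]) -> int:
    """Count the genes that share at least one base pair with the CNV."""
    return sum(1 for gene in genes if overlap_bp(cnv, gene) > 0)


# Made-up coordinates, for illustration only.
cnv = Region("chr1", 1000, 5000)
genes = [
    Region("chr1", 500, 1500),   # overlaps the CNV by 500 bp
    Region("chr1", 5000, 6000),  # adjacent, but shares no bases
    Region("chr2", 1000, 5000),  # different chromosome
]
n_genes = count_overlapping_genes(cnv, genes)
```

A linear scan like this is fine for small inputs; large annotation sets would usually call for an interval index.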

Feature Extraction

Libraries

Feature Extraction

Model Training

Logistic Regression

Neural Network

XGBoost

Model Testing

Logistic Regression

Neural Network

XGBoost

Presentation

Final Presentation Excerpt

Requirements

This project requires Python 3. The required Python libraries are listed in requirements.txt.
