Skip to content

ulasisik/balancing-selection

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BaSe

BaSe (abbreviation for Balancing Selection ) is a supervised machine learning algorithm, implementing both regular artifical neural network (ANN) and convolutional neural network (CNN), to predict selection and distinguish it between incomplete sweep and balancing selection. It includes 3 sequantial tests:

  • Test 1: Neutrality vs Selection
  • Test 2: Incomplete sweep vs Balancing selection
  • Test 3: Overdominance vs Negative frequency dependent selection

Each test performs a binary classification. The first test is to determine if the target allele is under selection. Here, the selection class is a mixture of both balancing and positive selection scenarios. If the target allele is under selection, the second test is to determine whether it is balancing selection or incomplete sweep. If it is a balancing selection, the third test distinguishes different types of balancing selection: overdominance and negative frequency dependent selection.

To generate training data, we performed simulations using SLiM simulation program. To train ANN, a number of potentially informative summary statistics are calculated for each simulation to represent output sequence, with the help of scikit-allel package. To train CNN, simulation outputs are converted into images. Then, trained models are used to perform prediction on real data.

Usage

Please look at the Simulations/README.md for a quick tutorial on how to generate simulations. The jupyter notebooks Tutorials/ANN_training.ipybn and Tutorials/CNN_training.ipybn provide examples on how to use BaSe to train ANN and CNN, respectively. Then, the jupyter notebook Tutorials/Prediction.ipybn demonstrate how to use trained models for prediction on real data.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published