wendlerc/ssl_genopheno

Self-supervised learning for genotype-phenotype mappings

A genotype-phenotype mapping associates genotypes with phenotypes, e.g., HIV genotypes with their resistance to a certain drug. In this project, we pretrain a neural network on the genotypes and thereby achieve better genotype-phenotype regression performance in the low-data regime.

The following plot summarizes our results.

[Figure: summary plot]

  • random: we use a randomly initialized fully connected neural network to extract features for a linear regression model,
  • frozen_3: we use a pretrained neural network to extract features for a linear regression model,
  • trained: we train a neural network to solve the regression task,
  • tuned_3: we finetune the pretrained neural network on the regression task.
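The feature-extraction settings above can be illustrated with a minimal NumPy sketch. This is not the project's code: the toy data, the layer sizes, and the helper names `extract_features` and `add_bias` are assumptions made for illustration. It shows the "random" setting, where a fixed, randomly initialized layer produces features for a linear regression; in the "frozen" setting the weights would instead come from a pretrained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not the project's data): 200 genotypes encoded as
# binary vectors of length 50, with a synthetic phenotype value for each.
X = rng.integers(0, 2, size=(200, 50)).astype(float)
y = X @ rng.normal(size=50) + 0.1 * rng.normal(size=200)


def extract_features(X, W, b):
    """Apply one fixed hidden layer with ReLU; W and b are never updated."""
    return np.maximum(X @ W + b, 0.0)


def add_bias(F):
    """Append a constant column so the linear model can fit an intercept."""
    return np.hstack([F, np.ones((F.shape[0], 1))])


# "random" setting: weights are drawn once and kept fixed.
# For "frozen", W and b would be loaded from a pretrained network instead.
W = rng.normal(scale=1.0 / np.sqrt(50), size=(50, 64))
b = np.zeros(64)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

F_train = add_bias(extract_features(X_train, W, b))
F_test = add_bias(extract_features(X_test, W, b))

# Linear regression on the extracted features (ordinary least squares).
coef, *_ = np.linalg.lstsq(F_train, y_train, rcond=None)
train_mse = float(np.mean((F_train @ coef - y_train) ** 2))
test_mse = float(np.mean((F_test @ coef - y_test) ** 2))
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

The "trained" and "tuned_3" settings differ in that the network weights themselves are updated on the regression task, which this sketch does not cover.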

To compute confidence intervals, we repeated each experiment 10 times.
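One common way to turn 10 repeated runs into a confidence interval is a Student's t interval around the mean. The sketch below assumes that approach; the README does not say which method was actually used. The scores are made-up placeholder values, not project results, and `mean_confidence_interval` is a hypothetical helper.

```python
import math
import statistics


def mean_confidence_interval(values, t_crit=2.262):
    """Return the mean and the half-width of a two-sided 95% interval.

    t_crit=2.262 is the 0.975 quantile of Student's t with 9 degrees of
    freedom, i.e. it is only appropriate for exactly 10 runs.
    """
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, t_crit * sem


# Placeholder scores from 10 hypothetical repetitions (not real results).
scores = [0.61, 0.58, 0.64, 0.60, 0.59, 0.63, 0.62, 0.57, 0.65, 0.61]
mean, half_width = mean_confidence_interval(scores)
print(f"{mean:.3f} +/- {half_width:.3f}")
```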

How to run

First, install dependencies

# clone project
git clone git@github.com:wendlerc/ssl_genopheno.git

# install project
cd ssl_genopheno
conda create -n ssl_genopheno python=3.9
conda activate ssl_genopheno
pip install -r requirements.txt
conda install jupyter

Next, pretrain a neural network with either

python genotype/pretrain.py --wandb_name pretrained 

or

python genotype/pretrain_vicreg.py --wandb_name pretrained_vicreg

Finetune the resulting model using either

python genotype/regression.py --wandb_pretrained pretrained

or

python project/cfo/regression_vicreg.py --wandb_pretrained pretrained_vicreg

Evaluate a frozen pretrained model using

python project/cfo/regression_sklearn.py --wandb_pretrained pretrained
