Asymptotic MK calculations accounting for background selection and weak adaptation
Included in this repository are the scripts we used to run simulations, infer parameters, and evaluate our method, as described in Uricchio, Petrov & Enard (https://www.biorxiv.org/content/10.1101/427633v2).
Most of the scripts (especially those we used to perform approximate Bayesian computation, ABC) are currently set up to run on the Stanford cluster.
We are working on software that will be more generally usable.
The data that we used to parameterize our model are available in the 'data' folder, in the file called 'mk_with_positions_BGS.txt'. The genomic data correspond to the 1000 Genomes Phase 3, specifically for the 661 individuals from African populations.
The columns in this file are:
1st column is Ensembl Gene ID.
2nd column is pN, 3rd column is the corresponding derived frequencies.
4th column is pS, 5th column is the corresponding derived frequencies.
6th column is DN, 7th column is DS.
8th column is the chromosome number.
9th and 10th columns are genomic coordinates. Note that these are not exactly the coordinates of the CDS; rather, they correspond to the regions closest to the CDS start and end in the McVicker et al. (2009, PLoS Genetics) background selection dataset (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000471).
11th and 12th columns are the McVicker background selection (BGS) estimates for the start and end of the gene, respectively; the 13th (last) column is a naive average of the two. Many genes fall within a single region of BGS estimates, but some straddle breakpoints.
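A minimal Python sketch of a reader for this file follows. It is not part of the repository, and it rests on assumptions the column list above does not confirm: the file is tab-delimited with no header row, counts and coordinates are integers, and columns 3 and 5 hold comma-separated frequencies (empty when the count is zero). Check the actual file before relying on it. The names `GeneRecord`, `parse_line`, `read_records`, and `standard_mk_alpha` are hypothetical. For orientation only, it also computes the classic, non-asymptotic MK estimate α = 1 − (DS·pN)/(DN·pS) pooled over genes; this is a standard baseline, not the asymptotic, BGS-aware method of the paper.

```python
# Hypothetical reader for data/mk_with_positions_BGS.txt (a sketch).
# ASSUMPTIONS, not confirmed by the README: tab-delimited, no header row,
# integer counts and coordinates, and columns 3 and 5 holding comma-separated
# derived frequencies (an empty field when the corresponding count is zero).
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneRecord:
    gene_id: str           # column 1: Ensembl Gene ID
    pn: int                # column 2: pN
    pn_freqs: List[float]  # column 3: derived frequencies of the pN sites
    ps: int                # column 4: pS
    ps_freqs: List[float]  # column 5: derived frequencies of the pS sites
    dn: int                # column 6: DN
    ds: int                # column 7: DS
    chrom: str             # column 8: chromosome
    start: int             # column 9: start of the closest McVicker region
    end: int               # column 10: end of the closest McVicker region
    bgs_start: float       # column 11: McVicker BGS estimate at the gene start
    bgs_end: float         # column 12: McVicker BGS estimate at the gene end
    bgs_mean: float        # column 13: naive average of columns 11 and 12


def _freqs(field: str) -> List[float]:
    # Assumed encoding: comma-separated numbers; an empty field yields [].
    return [float(x) for x in field.split(",") if x.strip()]


def parse_line(line: str) -> GeneRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 columns, got {len(f)}")
    return GeneRecord(
        gene_id=f[0],
        pn=int(f[1]), pn_freqs=_freqs(f[2]),
        ps=int(f[3]), ps_freqs=_freqs(f[4]),
        dn=int(f[5]), ds=int(f[6]),
        chrom=f[7], start=int(f[8]), end=int(f[9]),
        bgs_start=float(f[10]), bgs_end=float(f[11]), bgs_mean=float(f[12]),
    )


def read_records(path: str) -> List[GeneRecord]:
    # Skips blank lines; every other line must parse as a 13-column record.
    with open(path) as handle:
        return [parse_line(line) for line in handle if line.strip()]


def standard_mk_alpha(records: Iterable[GeneRecord]) -> float:
    # Classic (non-asymptotic) MK estimate pooled over genes:
    # alpha = 1 - (DS * pN) / (DN * pS). Not the paper's asymptotic method.
    pn = ps = dn = ds = 0
    for r in records:
        pn += r.pn
        ps += r.ps
        dn += r.dn
        ds += r.ds
    if dn == 0 or ps == 0:
        raise ValueError("DN and pS must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)
```

Under these assumptions, `standard_mk_alpha(read_records("data/mk_with_positions_BGS.txt"))` would give a single pooled α. Because it ignores both the frequency columns and the BGS columns, it is only a point of comparison; the asymptotic approach instead uses how α varies with the derived frequencies in columns 3 and 5.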