adding microsatellite mutation models #2013

andrewkern · 2022-02-26T06:38:21Z

i'm hoping to add some simple mutation models that would be appropriate for microsatellite data.
i think this will be quite useful to folks who are working on non-model organisms, ecological genetics,
and maybe even forensics?

The simplest, most classical model is the Stepwise Mutation Model of Kimura and Ohta (1978).
I reckon that's a good place to start, although this nice paper by Calabrese and Durrett
lays out a number of other models that would be simple enough to implement.

Here is how I would imagine the implementation to look for the stepwise microsat model.
generally these models have a lo and hi parameter which governs the bounds on the
number of repeat units of the microsatelllite

class SMM_MicrosatMutationModel(MatrixMutationModel):
    """
    Stepwise mutation model (Kimura and Ohta, 1978), 
    for microsatellite repeats.

    This is a :class:`.MatrixMutationModel` with alleles ``[lo, .. , hi]``,
    root distribution is uniform ``(lo, hi)``, and a transition matrix that reflects
    single-step mutation.

    :param int lo: lower bound of the microsatellite repeat length
    :param int hi: upper bound of the microsatellite repeat length
        See :ref:`??` for more details.
    """

    def __init__(self, lo, hi):
        alleles = list(range(lo, hi+1))
        root_distribution = [1/len(alleles)] * len(alleles)
        transition_matrix = np.zeros((len(alleles), len(alleles)))
        for i in range(len(alleles)):
            if i == 0:
                transition_matrix[i, i+1] = 1
            elif i == len(alleles)-1:
                transition_matrix[i, i-1] = 1
            else:
                transition_matrix[i, i-1] = 0.5
                transition_matrix[i, i+1] = 0.5

        super().__init__(alleles, root_distribution, transition_matrix)

what do others think?

jeromekelleher · 2022-02-28T11:12:30Z

LGTM. I'd make hi exclusive just to keep consistent with things like range, but otherwise looks very sensible.

So, to be concrete, the plan is to make alleles equal to [str(j) for j in range(lo, hi)] and then if you have allele j, you can only ever mutate to j - 1 or j + 1?

andrewkern · 2022-02-28T14:17:48Z

So, to be concrete, the plan is to make alleles equal to [str(j) for j in range(lo, hi)] and then if you have allele j, you can only ever mutate to j - 1 or j + 1?

yep that's the SMM model in a nutshell

petrelharp · 2022-02-28T14:34:09Z

This sounds fine, although we should make sure to tell people not to set lo and hi to the actual min and max observed in their data. Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can set hi=Inf (or, something large). It would be straightforward.

jeromekelleher · 2022-02-28T14:36:16Z

Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can set hi=Inf (or, something large). It would be straightforward.

That's not quite the same thing though is it? With infinite alleles you can only every go from j to j + 1, whereas the model above you can go back to j - 1 with equal proba.

petrelharp · 2022-02-28T14:39:10Z

That's not quite the same thing though is it? With infinite alleles you can only every go from j to j + 1, whereas the model above you can go back to j - 1 with equal proba.

It's totally not the same thing - infinite alleles maintains a global counter for the next allele, and this would look at the previous state to choose a new one, but it provides a template. (so, maybe the SLiM mutation model code is better, as it actually modifies the previous allele)

jeromekelleher · 2022-02-28T14:40:08Z

Right, gotcha

andrewkern · 2022-02-28T14:40:28Z

This sounds fine, although we should make sure to tell people not to set lo and hi to the actual min and max observed in their data. Alternatively, we could modify the infinite alleles or SLiM model code (not use the mutation matrix model), so that we can set hi=Inf (or, something large). It would be straightforward.

another issue is the ancestral distribution. I think uniformly distributed is probably wrong?

jeromekelleher · 2022-02-28T14:43:47Z

The matrix method @andrewkern outlined above would work, though, right? Probably a good place to start if someone is doing this rather than diving into the C code immediately.

andrewkern · 2022-02-28T14:45:07Z

Yeah i think it should work just fine.

jeromekelleher · 2022-02-28T14:46:29Z

Unless the number of alleles is 10_000 or something it may be good enough.

petrelharp · 2022-02-28T14:53:57Z

Good enough, but I don't think anyone's doing it right now, and it'd be nice to not clutter up this list of models with out of date ones? Re: the root distribution - it'd be natural to allow a slight left/right bias and use the stationary distribution (I think there's a Slatkin paper on this?).

andrewkern · 2022-02-28T15:12:09Z

Good enough, but I don't think anyone's doing it right now, and it'd be nice to not clutter up this list of models with out of date ones?

not sure what you mean? SMM is the most popular microsat model in analyses and simulations AFAIK

petrelharp · 2022-02-28T16:48:34Z

I mean if we implement another slightly different but better SMM later. But I suppose that allowing hi=Inf is backwards-compatible, as is adding left/right jump probs. So, never mind.

andrewkern mentioned this issue Jun 15, 2022

Microsatellite mutations models #2071

Merged

petrelharp closed this as completed in #2071 Jun 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adding microsatellite mutation models #2013

adding microsatellite mutation models #2013

andrewkern commented Feb 26, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

petrelharp commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

petrelharp commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

petrelharp commented Feb 28, 2022

andrewkern commented Feb 28, 2022

petrelharp commented Feb 28, 2022

adding microsatellite mutation models #2013

adding microsatellite mutation models #2013

Comments

andrewkern commented Feb 26, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

petrelharp commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

petrelharp commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

andrewkern commented Feb 28, 2022

jeromekelleher commented Feb 28, 2022

petrelharp commented Feb 28, 2022

andrewkern commented Feb 28, 2022

petrelharp commented Feb 28, 2022