Discrepancy between expected and observed coalescence time in beta coalescent for large values of alpha #2256

Closed
Sendrowski opened this issue Feb 5, 2024 · 23 comments · Fixed by #2257 or #2267

Comments

@Sendrowski

Sendrowski commented Feb 5, 2024

Hey!

There appears to be an upward bias in the simulated coalescence times of the beta coalescent, relative to my expectation based on the time scaling given in the documentation. The bias becomes noticeable from alpha=1.8 and increases substantially for values closer to 2.

Relative error between observed and expected coalescent times plotted against alpha:
[image attachment]

Code to generate figure:

import multiprocessing as mp

import matplotlib.pyplot as plt
import msprime
import numpy as np
import scipy.special


def compute_beta_timescale(pop_size, alpha, ploidy):
    """
    Compute the generation time for the beta coalescent exactly as done in msprime when testing.
    https://github.com/tskit-dev/msprime/blob/804e0361c4f8b5f5051a9fbf411054ee8be3426a/verification.py#L3447
    """

    if ploidy > 1:
        N = pop_size / 2
        m = 2 + np.exp(
            alpha * np.log(2) + (1 - alpha) * np.log(3) - np.log(alpha - 1)
        )
    else:
        N = pop_size
        m = 1 + np.exp((1 - alpha) * np.log(2) - np.log(alpha - 1))
    ret = np.exp(
        alpha * np.log(m)
        + (alpha - 1) * np.log(N)
        - np.log(alpha)
        - scipy.special.betaln(2 - alpha, alpha)
    )
    return ret


def simulate_tree_height(alpha):
    """
    Simulate genealogies under the beta coalescent and compute the average tree height.
    Since we only have two lineages, the mean coalescence time should coincide with `compute_beta_timescale()`.
    """

    # simulate genealogies under the beta coalescent
    g = msprime.sim_ancestry(
        samples=2,
        num_replicates=100000,
        model=msprime.BetaCoalescent(alpha=alpha),
        ploidy=1
    )

    return np.mean([ts.first().total_branch_length / 2 for ts in g])


if __name__ == "__main__":
    alphas = np.linspace(1.99, 1.999, 30)

    # simulate average tree heights in parallel
    with mp.Pool() as pool:
        heights_observed = np.array(pool.map(simulate_tree_height, alphas))

    # compute theoretical tree heights
    heights_theoretical = np.array([compute_beta_timescale(1, alpha, 1) for alpha in alphas])

    # compute relative difference between observed and theoretical
    diff_rel = (heights_observed - heights_theoretical) / heights_theoretical

    # plot relative difference against alpha
    plt.plot(alphas, diff_rel)
    plt.margins(0)
    plt.xlabel('alpha')
    plt.ylabel('diff_rel')
    plt.show()

Am I perhaps missing something?

@jeromekelleher
Member

Thanks for the report @Sendrowski - the figure seems to have gone missing, would you mind posting again please?

@Sendrowski
Author

Hm that’s interesting. Did it work this time?

[image attachment]

@jeromekelleher
Member

No - maybe a problem with GitHub?

@Sendrowski
Author

Possibly. I managed to view it on a different device. Here is the link to the image, in case that helps.

@jeromekelleher
Member

Sorry, still not showing up for me (I got a 404 on that link)

@Sendrowski
Author

And here is an iCloud link. Hope it works :)

@jeromekelleher
Member

[screenshot attachment: Screenshot from 2024-02-05 15-08-50]

@jeromekelleher
Member

Any thoughts here @JereKoskela ?

@JereKoskela
Member

I ran a C++ script simulating Beta(2 - alpha, alpha) random variables for alpha very close to 2, and relative errors of 5 or 6 per cent in an empirical mean from 10k samples are common because the true mean is so small. I think that explains what is going on below alpha = 1.995 or so.
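As a rough sketch of that sampling noise (illustrative only, not the C++ script mentioned above; the alpha value and batch size are chosen to match the numbers quoted here):

import numpy as np

rng = np.random.default_rng(1)
alpha = 1.995

# The mean of a Beta(2 - alpha, alpha) random variable is (2 - alpha) / 2,
# which is tiny for alpha near 2, so empirical means from 10k draws routinely
# miss it by several per cent in either direction.
true_mean = (2 - alpha) / 2
for _ in range(5):
    draws = rng.beta(2 - alpha, alpha, size=10_000)
    print((draws.mean() - true_mean) / true_mean)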

As for the spike very close to 2, the timescale of the Schweinsberg Beta-coalescent model is unpleasant. It goes to infinity at alpha = 1 and to zero at alpha = 2, so I wouldn't be surprised if numerical issues creep in near those boundaries. At the alpha = 2 boundary in particular, you're essentially sampling 0/0. The practical thing to do is just use the Hudson model, since the probability of seeing a multiple merger in the Beta-coalescent is effectively zero in that regime.
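For completeness, switching the reproduction script above to the standard coalescent is just a change of the model argument (a sketch; msprime.StandardCoalescent() is equivalent to leaving model unset):

import msprime
import numpy as np

# Same setup as the script in the first comment, but under the Hudson model.
g = msprime.sim_ancestry(
    samples=2,
    num_replicates=100000,
    model=msprime.StandardCoalescent(),
    ploidy=1
)
# Expected pairwise coalescence time is ploidy * population_size = 1 here.
print(np.mean([ts.first().total_branch_length / 2 for ts in g]))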

@Sendrowski
Author

I agree that a relative error of this magnitude might not seem so surprising (100k replicates in my code example), but it is important to note that the simulated values are consistently larger than the theoretical ones (I am not taking absolute values). For alpha=1.9, I obtain a relative error of ~2.7% (1e6 replicates, a very stable estimate), and for this value we have a (not so extreme) time scaling of about 0.14 relative to the Kingman coalescent. This plot probably does a better job (iCloud link in case you still can't see my images):

[image attachment]

@JereKoskela
Member

You're right @Sendrowski, I missed the lack of a modulus sign, and evidently also one zero on the number of replicates. I'll do some more digging. It looks like a very odd bug: the code just calculates the timescale and multiplies it by an Exp(1) random variable, and somehow the resulting mean does not equal the timescale.
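To spell out why that is surprising, scaling Exp(1) draws by the timescale trivially reproduces the timescale as the mean (a sketch in plain numpy, reusing compute_beta_timescale from the script above rather than the msprime internals):

import numpy as np

rng = np.random.default_rng(2)
timescale = compute_beta_timescale(1, 1.99, 1)  # helper from the first comment

# Two lineages coalesce after an Exp(1) waiting time multiplied by the timescale,
# so the empirical mean should match the timescale up to Monte Carlo noise.
draws = timescale * rng.exponential(1.0, size=1_000_000)
print(draws.mean() / timescale)  # ~1.0, no systematic bias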

@Sendrowski
Author

Yes, very odd indeed... The bias also appears to drop somewhat near alpha=1.99 before shooting up again. I hope it won't be too hard to track down. I didn't debug the C code, so perhaps the time scaling is simply computed differently there. Or perhaps it is caused by the incomplete beta function, although I could not reproduce that when translating the calculation into native Python.

@JereKoskela
Member

JereKoskela commented Feb 6, 2024

Ok, this turned out to be an insufficiently sensitive polynomial approximation for evaluating (essentially) x^2 / x^2 for very small x. It was using a polynomial approximation for x < 1e-9, which I've now hiked to x < 1e-5. @Sendrowski, your Python script now yields unbiased estimators with errors on both sides of zero.
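As a generic illustration of this kind of failure (not the actual msprime code): when numerator and denominator both vanish like x^2, direct floating point evaluation loses accuracy to cancellation well before x reaches 1e-9, which is why the series branch has to take over earlier.

import numpy as np

def ratio_direct(x):
    # (1 - cos(x)) / (x**2 / 2) -> 1 as x -> 0, but the subtraction 1 - cos(x)
    # cancels catastrophically for small x and underflows to 0 around x = 1e-8.
    return (1 - np.cos(x)) / (x ** 2 / 2)

def ratio_series(x):
    # Low-order Taylor expansion of 1 - cos(x), namely x**2/2 - x**4/24,
    # stays accurate all the way down to x = 0.
    return (x ** 2 / 2 - x ** 4 / 24) / (x ** 2 / 2)

for x in [1e-4, 1e-6, 1e-8]:
    print(f"x={x:.0e}  direct={ratio_direct(x):.6f}  series={ratio_series(x):.6f}")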

@jeromekelleher, any comments or are you happy to merge the linked PR? I've rerun the tests in verification.py for the Beta-coalescent and all still look good.

@jeromekelleher
Member

Thanks for sorting this out so quickly @JereKoskela! Change LGTM.

@Sendrowski, are you happy with the changes? Shall we push out a bugfix release?

@Sendrowski
Author

Also looks good to me! Thank you for the quick fix @JereKoskela, and I look forward to meeting you in Warwick in April. As for the release, it’s totally up to you how soon you would like to push it out.

mergify bot closed this as completed in #2257 on Feb 8, 2024
@Sendrowski
Author

Hey again. The bias due to numerical imprecision for values of alpha very close to 2 persists, however. Maybe one should prohibit values greater than 1.99 or so to make sure this won't cause any problems.

[image attachment]

@jeromekelleher
Member

Sorry, still can't see the image @Sendrowski. I think maybe you're uploading a tiff instead of a png or something?

@Sendrowski
Author

Hm, it should be a png. In any case, the graph I attached looks identical to the one I originally posted.

[image attachment: msprime_bug]

@JereKoskela
Member

Yeah, the timescale will always break down close to alpha = 1 or 2. There is no way to avoid that. I suppose we could disallow values very close to the boundaries as a result. I'd like to make sure that 1.01 and 1.99 work even if they come from floating point calculations in some other script, so maybe 1.009 and 1.991?
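In the meantime, a user-side guard along these lines is easy to write (a hypothetical helper; the bounds 1.009 and 1.991 are just the values floated above, not anything msprime currently enforces):

import msprime

def safe_beta_model(alpha, lower=1.009, upper=1.991):
    """Construct a BetaCoalescent only if alpha is well away from the unstable boundaries."""
    if not (lower <= alpha <= upper):
        raise ValueError(
            f"alpha={alpha} is outside [{lower}, {upper}]; the beta coalescent "
            "timescale is numerically unstable this close to 1 or 2"
        )
    return msprime.BetaCoalescent(alpha=alpha)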

@Sendrowski
Author

Sounds good! It is perhaps worth noting that I could not observe the same kind of numerical instability for values near 1, at least for similar distances to the boundary.

@JereKoskela
Member

Thanks, that's useful to know. It's possible that we can get much closer to 1 than to 2, but eventually everything will be infinite for alpha too close to 1. I'll see if I can find a reasonable lower bound.
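For a quick feel of the lower boundary, evaluating the compute_beta_timescale helper from the reproduction script close to alpha = 1 shows the divergence (illustrative only; requires that function and its imports):

# The haploid timescale contains a -log(alpha - 1) term, so it blows up as alpha -> 1.
for alpha in [1.1, 1.01, 1.001, 1.0001]:
    print(alpha, compute_beta_timescale(1, alpha, 1))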

@jeromekelleher
Member

Shall we reopen this issue, or make a new one?

JereKoskela reopened this on Feb 12, 2024
@JereKoskela
Member

I've reopened this one. I'm hoping to get to this within a few days.
