-
Notifications
You must be signed in to change notification settings - Fork 79
Closed
Description
It's now trivial to add a tskit.Tree.generate_random() method for generating random bifurcating trees with equal probability for each topology. All we have to do is:
def generate_random(self, num_leaves, random_seed=None):
return self.generate_star(num_leaves).split_polytomy(random_seed)
And we probably don't need another unit test for this, because we already test it here, so all we need to do (IMO) is change that function to generate_random() rather than split_polytomies(), and we should be covered.
The arguments for this (taken from #1030 (comment)) are
- It allows tskit to produce random trees without relying on an external dependency. That could be useful e.g. for SLiM testing, or what have you.
- It's the sort of thing that a phylogeneticist might expect in a phylogenetic library (if we are trying to appeal to that audience)
- The distribution of coalescence tree topologies is not equiprobable - see below - the top plot is msprime topologies, the bottom is from split_polytomy.
import matplotlib.pyplot as plt
import collections
import tqdm
import msprime
import tskit
msprime_counts = collections.defaultdict(int)
split_poly_counts = collections.defaultdict(int)
for s in tqdm.trange(1, 10000):
msprime_counts[msprime.simulate(5, random_seed=s).first().rank()] += 1
for s in tqdm.trange(1, 10000):
split_poly_counts[tskit.Tree.generate_star(5).split_polytomies(random_seed=s).rank()] += 1
fig, axes = plt.subplots(2)
axes[0].bar(list(range(len(msprime_counts))), msprime_counts.values())
axes[1].bar(list(range(len(split_poly_counts))), split_poly_counts.values())I'm happy to submit this as a PR if the approach I describe is the right one.
Metadata
Metadata
Assignees
Labels
No labels
