Add Tree.generate_random() method

It's now trivial to add a `tskit.Tree.generate_random()` method for generating random bifurcating trees with equal probability for each topology. All we have to do is:

```
def generate_random(self, num_leaves, random_seed=None):
    return self.generate_star(num_leaves).split_polytomy(random_seed)
```

And we probably don't need another unit test for this, because we already test it [here](https://github.com/tskit-dev/tskit/blob/71beec57e14720346973a6a62be94d368c6ef03a/python/tests/test_combinatorics.py#L1100), so all we need to do (IMO) is change that function to `generate_random()` rather than `split_polytomies()`, and we should be covered.

The arguments for this (taken from https://github.com/tskit-dev/tskit/pull/1030#issuecomment-733721001) are

1. It allows tskit to produce random trees without relying on an external dependency. That could be useful e.g. for SLiM testing, or what have you.
2. It's the sort of thing that a phylogeneticist might expect in a phylogenetic library (if we are trying to appeal to that audience)
3. The distribution of coalescence tree topologies is not equiprobable - see below - the top plot is msprime topologies, the bottom is from split_polytomy.

```python
import matplotlib.pyplot as plt
import collections

import tqdm
import msprime
import tskit

msprime_counts = collections.defaultdict(int)
split_poly_counts = collections.defaultdict(int)
for s in tqdm.trange(1, 10000):
    msprime_counts[msprime.simulate(5, random_seed=s).first().rank()] += 1

for s in tqdm.trange(1, 10000):
    split_poly_counts[tskit.Tree.generate_star(5).split_polytomies(random_seed=s).rank()] += 1

fig, axes = plt.subplots(2)
axes[0].bar(list(range(len(msprime_counts))), msprime_counts.values())
axes[1].bar(list(range(len(split_poly_counts))), split_poly_counts.values())
```

![image](https://user-images.githubusercontent.com/4699014/100236620-ce843b80-2f25-11eb-88be-ed11357b869c.png)

I'm happy to submit this as a PR if the approach I describe is the right one.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Tree.generate_random() method #1033

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add Tree.generate_random() method #1033

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions