diff --git a/c/tskit/trees.h b/c/tskit/trees.h
index 21495edbf7..499fd4e8c0 100644
--- a/c/tskit/trees.h
+++ b/c/tskit/trees.h
@@ -933,17 +933,20 @@ path from `p` to `c`. For instance, if `p` is the parent of `n` and `n` is
 the parent of `c`, then the span of the edges from `p` to `n` and `n` to
 `c` are extended, and the span of the edge from `p` to `c` is reduced.
 However, any edges whose child node is a sample are not
-modified. The `node` of certain mutations may also be remapped; to do this
+modified. See Fritze et al. (2025):
+https://doi.org/10.1093/genetics/iyaf198 for more details.
+
+The method works by iterating over the genome to look for edges that can
+be extended in this way; the maximum number of such iterations is
+controlled by ``max_iter``.
+
+The `node` of certain mutations may also be remapped; to do this
 unambiguously we need to know mutation times. If mutations times are unknown,
 use `tsk_table_collection_compute_mutation_times` first.
 
 The method will not affect any tables except the edge table, or the node
 column in the mutation table.
 
-The method works by iterating over the genome to look for edges that can
-be extended in this way; the maximum number of such iterations is
-controlled by ``max_iter``.
-
 @rst
 
 **Options**: None currently defined.
diff --git a/python/tskit/trees.py b/python/tskit/trees.py
index 1e31048075..2ff90747e4 100644
--- a/python/tskit/trees.py
+++ b/python/tskit/trees.py
@@ -7395,7 +7395,9 @@ def extend_haplotypes(self, max_iter=10):
         `n` to `c` are extended, and the span of the edge from `p` to `c` is
         reduced. Thus, the ancestral haplotype represented by `n` is extended
         to a longer span of the genome. However, any edges whose child node is
-        a sample are not modified.
+        a sample are not modified. See
+        `Fritze et al. (2025) `_
+        for more details.
 
         Since some edges may be removed entirely, this process usually reduces
         the number of edges in the tree sequence.
@@ -7418,15 +7420,15 @@ def extend_haplotypes(self, max_iter=10):
         known mutation times. See :meth:`.impute_unknown_mutations_time` if
         mutation times are not known.
 
-        The method will not affect the marginal trees (so, if the original tree
-        sequence was simplified, then following up with `simplify` will recover
-        the original tree sequence, possibly with edges in a different order).
-        It will also not affect the genotype matrix, or any of the tables other
-        than the edge table or the node column in the mutation table.
+        .. note::
+            The method will not affect the marginal trees (so, if the original tree
+            sequence was simplified, then following up with `simplify` will recover
+            the original tree sequence, possibly with edges in a different order).
+            It will also not affect the genotype matrix, or any of the tables other
+            than the edge table or the node column in the mutation table.
 
-        :param int max_iters: The maximum number of iterations over the tree
+        :param int max_iter: The maximum number of iterations over the tree
             sequence. Defaults to 10.
-
         :return: A new tree sequence with unary nodes extended.
         :rtype: tskit.TreeSequence
         """
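To make the extension rule in these docstrings concrete, here is a minimal pure-Python sketch of the "for instance" case only: a single path from `p` through `n` to `c`, whose two edges share one interval, next to an adjacent edge from `p` directly to `c`. It is not tskit's implementation. The `Edge` tuple and the `extend_path` function are hypothetical names introduced for illustration. The sketch ignores mutation remapping and the repeated passes controlled by `max_iter`; with tskit itself one would call `extend_haplotypes(max_iter=10)` on a tree sequence instead.

```python
from typing import List, NamedTuple, Set


class Edge(NamedTuple):
    """Hypothetical stand-in for one row of an edge table."""
    left: float
    right: float
    parent: int
    child: int


def extend_path(
    edges: List[Edge], p: int, n: int, c: int, samples: Set[int]
) -> List[Edge]:
    """Toy version of the extension rule for a single path p -> n -> c.

    Assumes each of the edges p->n, n->c and p->c occurs exactly once, that
    p->n and n->c share one interval, and that p->c covers an interval
    directly adjacent to it. Mutations and max_iter are not modelled.
    """
    # Edges whose child node is a sample are never modified; the edges
    # touched here have child n (p->n) or child c (n->c and p->c).
    if n in samples or c in samples:
        return list(edges)
    by_pair = {(e.parent, e.child): e for e in edges}
    if not all(k in by_pair for k in ((p, n), (n, c), (p, c))):
        return list(edges)
    pn, nc, pc = by_pair[(p, n)], by_pair[(n, c)], by_pair[(p, c)]
    # n must sit between p and c on one shared interval.
    if (pn.left, pn.right) != (nc.left, nc.right):
        return list(edges)
    if pc.left == pn.right:  # p->c lies immediately to the right
        left, right = pn.left, pc.right
    elif pc.right == pn.left:  # p->c lies immediately to the left
        left, right = pc.left, pn.right
    else:
        return list(edges)
    # n must not already appear in any other edge inside the p->c span.
    for e in edges:
        if e in (pn, nc, pc):
            continue
        if n in (e.parent, e.child) and e.left < pc.right and e.right > pc.left:
            return list(edges)
    # Extend p->n and n->c over the old p->c span and drop p->c entirely.
    result = [e for e in edges if e not in (pn, nc, pc)]
    result += [Edge(left, right, p, n), Edge(left, right, n, c)]
    return sorted(result, key=lambda e: (e.left, e.parent, e.child))
```

For example, with edges `Edge(0, 10, 1, 0)`, `Edge(0, 5, 2, 1)`, `Edge(0, 5, 3, 2)` and `Edge(5, 10, 3, 1)` and sample set `{0}`, `extend_path(edges, p=3, n=2, c=1, samples={0})` returns three edges that each span `[0, 10)`. On `[5, 10)` node 2 becomes a unary node between 3 and 1, so the marginal tree topology is unchanged while the edge count drops from four to three.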