Description
One helpful enhancement to tskit would be the ability to create a "brick table" from a tree sequence. A brick table is an augmented edge table in which an edge is bifurcated wherever it has different descendants across adjacent trees. These bifurcated edges can be used to trace tracts of the genome that are inherited as units. This is useful in applications where one is interested in patterns of descent where position matters, for instance when looking at which bits of an ancestral haplotype are inherited by its descendants. The term "brick" refers to how the result can be visualised as stacked bricks.
For example, consider the following tree sequence:

Note that the edge from node 6 to node 7 exists in both trees, but has different descendants. The brick version of this tree sequence would bifurcate this edge, resulting in two edges with 6 as the child and 7 as the parent, the first from 0-0.62 and the second from 0.62-1.
I've implemented this in a straightforward manner using ts.trees() and ts.edge_diffs(): we identify nodes that have different parents across adjacent trees, then climb both adjacent trees from each such node, bifurcating any edge along this path that straddles both trees.
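As a rough illustration of the splitting rule, here is a brute-force sketch in plain Python. It does not use tskit or the ts.edge_diffs() approach described above. Instead, for every interval between breakpoints it recomputes the set of samples below each edge's child, and starts a new piece whenever that set changes. The function name split_edges_by_descendants, the (left, right, parent, child) tuple layout, the toy edges, and the choice to read "descendants" as "sample nodes below the child" are all assumptions made for this example. The toy data only echoes the 6-to-7 edge discussed above; it is not the tree sequence from the figure.

```python
def split_edges_by_descendants(edges, samples):
    """Split each (left, right, parent, child) edge wherever the set of
    samples below `child` differs between adjacent trees (brute force)."""
    breakpoints = sorted({x for left, right, _, _ in edges for x in (left, right)})
    # pieces[i] collects [left, right, descendant_set] runs for input edge i.
    pieces = [[] for _ in edges]

    for left, right in zip(breakpoints, breakpoints[1:]):
        # Edges that cover this whole interval make up the local tree.
        active = [i for i, e in enumerate(edges) if e[0] <= left and right <= e[1]]
        children = {}
        for i in active:
            children.setdefault(edges[i][2], []).append(edges[i][3])

        def samples_below(node):
            found = {node} if node in samples else set()
            for child in children.get(node, []):
                found |= samples_below(child)
            return found

        for i in active:
            desc = frozenset(samples_below(edges[i][3]))
            runs = pieces[i]
            if runs and runs[-1][1] == left and runs[-1][2] == desc:
                runs[-1][1] = right  # same descendants: extend the current piece
            else:
                runs.append([left, right, desc])  # descendants changed: new piece

    return [
        (lo, hi, edges[i][2], edges[i][3])
        for i, runs in enumerate(pieces)
        for lo, hi, _ in runs
    ]


# Toy data (hypothetical): samples 2 and 3 swap places at position 0.62,
# so the edge 6 -> 7 spans both trees but sits above different samples.
# Node 5 is deliberately unused so the IDs echo the example in the text.
SAMPLES = {0, 1, 2, 3}
EDGES = [
    (0.0, 1.0, 4, 0),
    (0.0, 1.0, 4, 1),
    (0.0, 1.0, 6, 4),
    (0.0, 0.62, 6, 2),
    (0.62, 1.0, 6, 3),
    (0.0, 1.0, 7, 6),
    (0.0, 0.62, 7, 3),
    (0.62, 1.0, 7, 2),
]

bricks = split_edges_by_descendants(EDGES, SAMPLES)
for edge in bricks:
    if edge[2:] == (7, 6):
        print(edge)
# (0.0, 0.62, 7, 6)
# (0.62, 1.0, 7, 6)
```

In this toy input the edge from 4 to 6 also spans both trees, but it stays whole because samples 0 and 1 sit below node 4 throughout. Rebuilding every tree like this is much slower than walking edge differences, so it is only useful as a reference for checking a faster implementation on small inputs.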
I'm not sure what the best name for this method is... one idea is to simply call it split_edges. It would also make sense to add an option map_edges, along the lines of map_nodes in simplify. If this parameter is specified, the method would return a tuple containing the resulting tree sequence and a numpy array mapping edge IDs in the current tree sequence to their corresponding edge IDs in the returned tree sequence.
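To make the shape of such a return value concrete, here is a small sketch in plain Python. Since one input edge can become several output edges, the sketch encodes the mapping as an offsets list: the pieces of input edge i are the output edges offsets[i] up to, but not including, offsets[i + 1]. That encoding, the function name split_with_edge_map, the cuts argument, and the assumption that pieces of one edge stay contiguous in the output are all hypothetical choices for illustration, not part of tskit or of the proposal. A real implementation would presumably return a numpy array rather than a list.

```python
def split_with_edge_map(edges, cuts):
    """Split edges at given positions and report where each input edge went.

    edges: list of (left, right, parent, child) tuples.
    cuts:  dict mapping an input edge ID to a sorted list of interior
           positions at which that edge should be split.
    Returns (new_edges, offsets), where the pieces of input edge i are
    new_edges[offsets[i]:offsets[i + 1]].
    """
    new_edges = []
    offsets = [0]
    for edge_id, (left, right, parent, child) in enumerate(edges):
        bounds = [left, *cuts.get(edge_id, []), right]
        for lo, hi in zip(bounds, bounds[1:]):
            new_edges.append((lo, hi, parent, child))
        offsets.append(len(new_edges))
    return new_edges, offsets


# Hypothetical input: only edge 1 (child 6, parent 7) is split, at 0.62.
edges = [(0.0, 1.0, 6, 4), (0.0, 1.0, 7, 6)]
new_edges, offsets = split_with_edge_map(edges, {1: [0.62]})

print(offsets)                            # [0, 1, 3]
print(new_edges[offsets[1]:offsets[2]])   # [(0.0, 0.62, 7, 6), (0.62, 1.0, 7, 6)]
```

An alternative with the same information is an array with one entry per output edge, holding the ID of the input edge it came from. That form does not rely on the pieces being contiguous.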
If this sounds interesting to the community, I can create a PR with my implementation.