Characterise propagation up the tree performance with seeking #2792
Labels: enhancement, Performance
#2786 added the new `tsk_tree_position_t` class, which takes care of tracking the edges that need to go in and out of a tree in order to transform it into a target tree. This works really well for the basic `next` and `prev` operations, and there is an implementation of `seek_forward`, which gives a straightforward way of transforming a tree into any other tree in the sequence, touching the minimum number of edges. It's worth looking at some of the key bits of code here:

Here we fill in the `in_range` and `out_range` ranges of indexes into the left and right edge sortings which must be examined. We can be sure that we don't need to look at any edges outside of these ranges. Then, on the client side, we do something like this:

I think this works quite well, and should correspond to something that works well in practice.
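
As a rough illustration of the range-then-filter idea described above, here is a minimal Python sketch. It is not the tskit implementation: the function name `seek_forward_edges`, the tuple layout of the edges, and the use of `bisect` to find the ranges (rather than tracked indexes into the sort orders) are all assumptions made for the example. It only shows that examining the two index ranges and filtering on the client side yields exactly the edges that differ between the two trees.

```python
from bisect import bisect_right


def seek_forward_edges(edges, old_left, new_left):
    """Toy model: edges are (left, right, parent, child) tuples.

    old_left is the left coordinate of the current tree, new_left the left
    coordinate of the target tree (new_left > old_left).
    Returns (removed, inserted) as lists of edge ids.
    """
    ids = range(len(edges))
    in_order = sorted(ids, key=lambda e: edges[e][0])   # sorted by left
    out_order = sorted(ids, key=lambda e: edges[e][1])  # sorted by right
    lefts = [edges[e][0] for e in in_order]
    rights = [edges[e][1] for e in out_order]

    # Only edges ending in (old_left, new_left] can leave the tree, and only
    # edges starting in (old_left, new_left] can enter it.
    out_range = (bisect_right(rights, old_left), bisect_right(rights, new_left))
    in_range = (bisect_right(lefts, old_left), bisect_right(lefts, new_left))

    # Client-side filtering: skip edges that only exist in skipped-over trees.
    removed = [e for e in out_order[out_range[0]:out_range[1]]
               if edges[e][0] <= old_left]   # was in the current tree
    inserted = [e for e in in_order[in_range[0]:in_range[1]]
                if edges[e][1] > new_left]   # is in the target tree
    return removed, inserted


# Hypothetical edge table with breakpoints at 0, 3, 5, 8 and 10.
edges = [
    (0, 10, 4, 0),
    (0, 3, 4, 1),
    (3, 5, 5, 1),   # lives only in a tree we skip over
    (5, 10, 4, 1),
    (0, 8, 5, 2),
    (8, 10, 4, 2),
]
removed, inserted = seek_forward_edges(edges, old_left=0, new_left=8)
```

Note that the edges come out in sort-order of their coordinates, not in time order of their parent nodes.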
HOWEVER there is a problem: the edges that are removed and inserted are not necessarily in time-sorted order. This is a basic assumption that we lean on for incremental algorithms because it guarantees postorder-like behaviour, where we do the minimum amount of work in order to propagate information up the tree. In pathological cases, I think this could lead to O(tree height^2) performance on incremental algorithms that propagate information up the tree (like sample counting).
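
To make the pathological case concrete, here is a small Python sketch, assuming a toy chain-shaped tree in which node ids increase with node time. The function `insert_edges_counting_steps` is hypothetical and only mimics the usual "walk up from the parent" update for sample counts; it counts how many upward steps are taken for a given insertion order.

```python
def insert_edges_counting_steps(edges, num_nodes, samples):
    """Insert (parent, child) edges one by one, propagating sample counts
    to all ancestors on each insertion. Returns (num_samples, steps)."""
    parent = [-1] * num_nodes
    num_samples = [0] * num_nodes
    for s in samples:
        num_samples[s] = 1
    steps = 0
    for p, c in edges:
        parent[c] = p
        u = p
        while u != -1:  # walk up from the new parent to the current root
            num_samples[u] += num_samples[c]
            steps += 1
            u = parent[u]
    return num_samples, steps


n = 6
# A chain 0 <- 1 <- ... <- 5, listed youngest parent first (time-sorted).
chain = [(i + 1, i) for i in range(n - 1)]

counts_sorted, steps_sorted = insert_edges_counting_steps(chain, n, [0])
counts_rev, steps_rev = insert_edges_counting_steps(chain[::-1], n, [0])

print(steps_sorted, steps_rev)  # 5 15
```

Both orders give the same counts, but the time-sorted order does one step per edge, while the reversed order does n(n-1)/2 steps on a chain of n nodes, which is the quadratic-in-height behaviour described above.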
Unfortunately, this also applies to the current implementation of seek_from_null. Since we see great performance for this in benchmarks, I guess this means that either we've turned off sample counting for these benchmarks (which I doubt), or we have the possibility for some extra performance. See #2661 for more details.
I think the simplest thing to do initially would be to implement `seek_from_null` using the `tree_pos` based approach outlined above, and to defer propagating sample counts until after all the edges have been inserted. We can then do it using a standard postorder algorithm (which is as good as you're going to do anyway, if you insert the edges in the right time-order way).

How we approach the problem for the general case of seeking around the tree is more subtle. I guess we could either
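
Returning to the deferred propagation proposed for `seek_from_null`: a minimal Python sketch of the idea, assuming a toy representation where edges are `(parent, child)` pairs. The function name `insert_then_count` and the data layout are hypothetical and not taken from tskit. Edges are inserted in arbitrary order with no propagation, and sample counts are then filled in with a single iterative postorder pass.

```python
def insert_then_count(edges, num_nodes, samples):
    """Insert all edges without propagating, then compute sample counts
    with one postorder traversal (each edge is visited once)."""
    parent = [-1] * num_nodes
    children = [[] for _ in range(num_nodes)]
    for p, c in edges:  # any insertion order is fine here
        parent[c] = p
        children[p].append(c)

    num_samples = [0] * num_nodes
    for s in samples:
        num_samples[s] = 1

    roots = [u for u in range(num_nodes) if parent[u] == -1]
    for root in roots:
        stack = [(root, False)]
        while stack:
            u, expanded = stack.pop()
            if expanded:
                # All descendants of u are done, so its count is final.
                if parent[u] != -1:
                    num_samples[parent[u]] += num_samples[u]
            else:
                stack.append((u, True))
                stack.extend((v, False) for v in children[u])
    return num_samples


# Samples 0, 1, 2; node 3 is the parent of 0 and 1; node 4 is the root.
# Edges are deliberately listed oldest parent first.
tree_edges = [(4, 3), (4, 2), (3, 0), (3, 1)]
counts = insert_then_count(tree_edges, 5, [0, 1, 2])
print(counts)  # [1, 1, 1, 2, 3]
```

The cost is linear in the number of nodes and edges regardless of the order in which the edges arrived.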