Skip to content

Conversation

@jeromekelleher
Copy link
Member

Adds a "direction" argument to edge_diffs to allow us to sequentially generate trees in reverse.

Moves the implementation to Python resulting in a roughly 2X perf regression. I doubt this will be significant in practise, because people will usually be doing something with the diffs, which will probably be more expensive. Nonetheless, it is a regression so we should be careful.

If we decide we don't mind about the perf loss for this operation, we can delete some of the Python-C interface boilderplate.

I also modernised the tests for this a bit, while I was in there.

cc @astheeggeggs

@codecov
Copy link

codecov bot commented Feb 4, 2022

Codecov Report

Merging #2120 (0d47adb) into main (bf1603d) will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2120      +/-   ##
==========================================
+ Coverage   93.22%   93.29%   +0.06%     
==========================================
  Files          28       28              
  Lines       26718    26764      +46     
  Branches     1204     1227      +23     
==========================================
+ Hits        24908    24969      +61     
+ Misses       1776     1762      -14     
+ Partials       34       33       -1     
Flag Coverage Δ
c-tests 92.24% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 71.20% <60.86%> (-0.29%) ⬇️
python-tests 98.86% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/_tskitmodule.c 90.66% <ø> (+0.04%) ⬆️
python/tskit/metadata.py 99.01% <100.00%> (+0.25%) ⬆️
python/tskit/tables.py 98.92% <100.00%> (+0.03%) ⬆️
python/tskit/trees.py 98.33% <100.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf1603d...0d47adb. Read the comment docs.

@hyanwong
Copy link
Member

hyanwong commented Feb 4, 2022

Moves the implementation to Python resulting in a roughly 2X perf regression. I doubt this will be significant in practise, because people will usually be doing something with the diffs, which will probably be more expensive. Nonetheless, it is a regression so we should be careful.

I'm fairly sure that we use the edge diffs a lot in tsdate, don't we @awohns ? Perhaps it's worth testing if this causes any perf problems there (I suspect not though, as we are doing some maths on each diff)

@awohns
Copy link
Member

awohns commented Feb 4, 2022

That's right @hyanwong, we use the edge diffs a lot. We iterate over them in both the prior and the inside/outside algorithm. Might be good to check if this has a significant effect on tsdate runtime?

@jeromekelleher
Copy link
Member Author

That's a good test I think - if this doesn't have a significant impact on a time of a run of tsdate, then I think getting rid of the C linkage is fine, and a small perf regression is something we can live with. If it makes tsdate 2x slower though, then I think we'll have to keep the current C-based implementation for the forwards direction anyway.

@jeromekelleher
Copy link
Member Author

@benjeffery - should be push this in for 0.5.0?

@awohns - what's the simplest way of testing out the perf implications for tsdate?

@jeromekelleher jeromekelleher added this to the Python 0.5.0 milestone May 20, 2022
@benjeffery
Copy link
Member

If we are doing this (which seems the right thing to do I think) then yes it should be in 0.5.0.

@petrelharp
Copy link
Contributor

Oh, this would be nice! We were just wanting this!

@awohns
Copy link
Member

awohns commented May 26, 2022

Results of some perf testing on this:
ts.edge_diffs() is used in two places in tsdate. The more expensive usage is constructing the prior. Here are the runtimes for that operation on an inferred tree sequence from 1000 genomes chromosome 22. Current is the current implementation of edge_diffs() and Direction is the implementation in this PR:

Current: 6 mins 44 seconds. Direction: 7mins 41 seconds.

@jeromekelleher
Copy link
Member Author

Right, that's too much of a difference. Will need to think this through again - thanks @awohns!

@jeromekelleher
Copy link
Member Author

Hmm, so looks like most of the time for this code is actually spent constructing edge objects and the difference is really about the different code paths we take for that now. Some stats:

  • C code path, yield None (i.e., don't create any edges): 0.98 seconds
  • Python code path, yield None: 1.91 seconds
  • C code path, yield lists of tuples: 1.19 seconds
  • Python code path, yield lists of tuples: 4.05 seconds
  • C code path, yield list of locally constructed Edge: 7.16 seconds (This is the current implementation in main)
  • Python code yield list of Edge using self.edge(): 13.66 (implementation Wilder tried out)
  • Python code yielding list of locally constructed Edge: 10.55 seconds

So, I think there's room for quite a bit of improvement here and we did have a pretty big perf regression.

I wonder if edge metadata is really worth it here, and if we should do something like this:

# fully locked down dataclass
class DiffEdge:
     left: float
     right: float
     parent: int
     child: int
     id: int
    
     @property
     def metadata(self):
          raise ValueError("Metadata access not supported on edge diffs. Please use ``ts.edge(e.id).metadata`` instead")

I think if we can get back to the performance of yielding lists of tuples, we can switch the the Python code path and get rid of some C complexity, as this will be a perf improvement over what's currently in the library.

@benjeffery, any thoughts?

@benjeffery
Copy link
Member

benjeffery commented Jun 7, 2022

I've pushed an additional commit here which removes a function call in the Edge creation code path.

Here are the timings on my machine for 100k trees of 1.6M edges:

main 3s
This PR before commit (ts.edge): 7.12s
Edge(*tuple): 4.8s
Edge(*tuple) with own_init: 3.73s

I tried a few more tricks, but they added complexity for only a percent or so, not worth it.

So we're losing about 25% vs main with this python-only code. Also things should be even better with 3.11. I think that is acceptable and we should wrap this up by removing the C code and documenting the function if you agree?

@jeromekelleher
Copy link
Member Author

SGTM. Would the DiffEdge trick I suggested above make any difference or is it basically the same as the current code?

@benjeffery
Copy link
Member

Using a simple DiffEdge as you described above gets you to 3.55 vs 3.73. It's not nothing, but I don't think that is a big enough win to justify adding a parallel class.

@jeromekelleher
Copy link
Member Author

OK then, I guess we should put this to the end-users. @awohns, would you mind trying this version out on your benchmark and letting us know if you're happy with the perf?

@jeromekelleher
Copy link
Member Author

OK then, I guess we should put this to the end-users. @awohns, would you mind trying this version out on your benchmark and letting us know if you're happy with the perf?

Bumping this one @awohns - as a key user of edge_diffs we'd like to know if the perf drop here is acceptable for your applications. Would you mind checking your benchmark again? (It's also probably relevant for your "bricking" code). We'd like to release 0.5.0 in the next few days.

@awohns
Copy link
Member

awohns commented Jun 15, 2022

Sorry, didn't see this until now @jeromekelleher.

So here's the timing for creating the tsdate prior of NYGC 1000 genomes chr22 (same file as before; 453,350 trees):
Current: 6 mins 29 seconds (389 seconds)
Direction: 7 mins 14 seconds (434 seconds)
So it's ~12% slower

And here's the perf for running [naive_split_edges()](https://github.com/awohns/ldgm/blob/a152cb6db42658ab24d973757979af83a1c9c6d5/ldgm/bricks.py#L42) on the first 30 MB of the same tree sequence (148,437 trees):
Current: 8 mins 57 seconds (537 seconds)
Direction: 8 mins 2 seconds (482 seconds)
Which I guess means that edge_diffs() is not really a contributing factor to the overall runtime of this function.

@jeromekelleher
Copy link
Member Author

Thanks @awohns. I think you should have the casting vote here on whether we keep the existing C code or not then - 10% is a significant perf drop, but not a major one and probably wouldn't be noticed. I honestly don't mind either way as we have the code in both cases (we can keep the current C-based approach as the default), I just want someone else to make the decision!

@benjeffery
Copy link
Member

My 2 cents here is that by Python 3.11 we'll probably get that 10% back. If @awohns doesn't want to make the decision I'm happy to!

@awohns
Copy link
Member

awohns commented Jun 16, 2022

Thanks for seeking my feedback @jeromekelleher and for input @benjeffery!
I think that 10% is acceptable from my perspective, especially given that edge_diffs is not a dominant factor in the runtime of either tsdate or ldgm. So I'm happy for this to be merged!

@jeromekelleher
Copy link
Member Author

OK, great! @benjeffery - do you want to do the rest here or will I pick it up?

@benjeffery
Copy link
Member

OK, great! @benjeffery - do you want to do the rest here or will I pick it up?

I can take this from here!

@benjeffery benjeffery marked this pull request as ready for review June 20, 2022 13:33
@jeromekelleher
Copy link
Member Author

whoops, took off the AUTOMERGE label there @benjeffery as I'd left some embarassing intermediate commits.Will squash those now.

@jeromekelleher
Copy link
Member Author

This also still leaves in the C infrastructure as dead code. Will pull out now.

@jeromekelleher
Copy link
Member Author

Documentation isn't finished either. Marking as draft.

@jeromekelleher jeromekelleher marked this pull request as draft June 20, 2022 13:56
@benjeffery
Copy link
Member

whoops, took off the AUTOMERGE label there @benjeffery as I'd left some embarassing intermediate commits.Will squash those now.

Gah, sorry. I didn't think properly about the C and docs.

jeromekelleher and others added 2 commits June 20, 2022 16:14
Moves the implementation to Python resulting in a roughly 2X perf
regression.
@jeromekelleher jeromekelleher marked this pull request as ready for review June 20, 2022 15:18
@jeromekelleher
Copy link
Member Author

OK, should be ready to go now.

@mergify mergify bot merged commit 7a1a1a0 into tskit-dev:main Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants