32bit genotypes #2108
Might be worth doing a perf check to see what promoting up to 32-bit does before going through the painful stuff? There should be a quick s/i8/i32 change you could do globally that would be easy.
In tracing all the types, we also have to change to 32-bit in map_mutations, so I'm doing the lot and then testing both that and genotypes for perf.
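The quick experiment suggested above can be made concrete with a small micro-benchmark. This is a minimal, hypothetical Python sketch, not part of tskit, and `time_genotype_pass` is an illustrative name. It assumes numpy and mimics decoding one site at a time into a reusable buffer, so the only thing that changes between runs is the buffer dtype (int8 versus int32). A toy like this can only hint at the effect. The real check is timing tskit's own variant iteration before and after the change.

```python
import time

import numpy as np


def time_genotype_pass(dtype, num_samples=10_000, num_sites=2_000, seed=1):
    """Time one pass over per-site genotype buffers of the given dtype.

    Hypothetical micro-benchmark: it mimics decoding each site into a
    reusable buffer and consuming it, without calling tskit.
    Returns (elapsed_seconds, checksum).
    """
    rng = np.random.default_rng(seed)
    # Pre-generate biallelic genotypes (allele indexes 0/1) so random number
    # generation is excluded from the timed loop.
    source = rng.integers(0, 2, size=(num_sites, num_samples), dtype=np.int8)
    buffer = np.empty(num_samples, dtype=dtype)  # reused for every site
    checksum = 0
    start = time.perf_counter()
    for row in source:
        buffer[:] = row  # "decode" the site into the buffer, widening if needed
        checksum += int(buffer.sum())  # consume the genotypes
    elapsed = time.perf_counter() - start
    return elapsed, checksum


if __name__ == "__main__":
    for dt in (np.int8, np.int32):
        secs, _ = time_genotype_pass(dt)
        print(f"{np.dtype(dt).name}: {secs:.3f}s")
```

Both passes compute the same checksum. Any timing difference therefore comes from the buffer width rather than from the amount of work done.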
Force-pushed from acb15c8 to 5b501e6.
@petrelharp @bhaller @molpopgen This PR converts all internal genotype handling from 8-bit to 32-bit (some parts had an optional 32-bit path, which has been removed). Initial testing shows this has no effect on performance, at least when iterating through variants to get the genotypes from Python. I still need to do some C tests.
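A short sketch shows what the wider type buys. It assumes, as in tskit, that a genotype is an allele index starting at 0 and that the negative value -1 is reserved for missing data. Under that assumption, the number of distinct alleles a site can carry is bounded by the largest positive value of the integer type. `max_alleles` is a hypothetical helper, and numpy is assumed.

```python
import numpy as np

MISSING_DATA = -1  # assumption: a negative index marks missing data, as in tskit


def max_alleles(dtype):
    """Number of distinct alleles whose indexes 0..n-1 fit in a signed dtype."""
    return int(np.iinfo(dtype).max) + 1


for dt in (np.int8, np.int16, np.int32):
    print(np.dtype(dt).name, max_alleles(dt))
# int8 128
# int16 32768
# int32 2147483648
```

With 8-bit genotypes, any site with more than 128 alleles cannot be represented at all. With 32-bit genotypes, that limit is effectively out of reach.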
I'm fine with this in principle. I'm still using a lot of 8-bit code on my end, but those code paths are separate.
If I understand what's being proposed, this should be fine. We only use vargen_t for crosschecking the tskit data structures against SLiM (i.e., in debug mode) and when loading a .trees file from disk, I believe. In all places where we currently call tsk_vargen_init() we pass TSK_16_BIT_GENOTYPES; we don't use 8-bit. Is 16-bit still going to be supported, or is it also going away in favor of 32-bit? Is the TSK_16_BIT_GENOTYPES flag going to be deprecated, or honored, or just ignored? Anyhow, I could dig into the diffs here to answer my own questions, but this really ought to be no problem for us, if I'm not totally misunderstanding what is at issue.
Thanks for the questions, @bhaller. Under the current proposal the 16-bit option will be removed and the flag will be ignored, which is why we want to get this sorted for C API 1.0.
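Downstream code that currently asks for 16-bit genotypes may already hold narrow arrays. One way to adapt is to widen those arrays to int32 at the boundary. The sketch below assumes numpy, and `widen_genotypes` is a hypothetical helper, not a tskit function. Every int8 and int16 value, including the -1 missing-data marker, is representable in int32, so the cast is lossless.

```python
import numpy as np


def widen_genotypes(genotypes):
    """Return an int32 copy of a narrower integer genotype array.

    Raises TypeError if the input cannot be cast to int32 without loss
    (e.g. int64 or float arrays).
    """
    g = np.asarray(genotypes)
    if not np.can_cast(g.dtype, np.int32, casting="safe"):
        raise TypeError(f"cannot safely widen genotypes of dtype {g.dtype} to int32")
    return g.astype(np.int32)


old = np.array([0, 1, -1, 300], dtype=np.int16)  # -1 = missing data
new = widen_genotypes(old)
print(new.dtype, new.tolist())  # int32 [0, 1, -1, 300]
```

Rejecting types that cannot be cast safely, rather than silently truncating them, keeps a wrong input dtype from corrupting allele indexes.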
Force-pushed from 5b501e6 to ecb48d2.
Codecov Report
@@ Coverage Diff @@
## main #2108 +/- ##
==========================================
- Coverage 93.37% 93.33% -0.05%
==========================================
Files 27 27
Lines 25591 25538 -53
Branches 1163 1163
==========================================
- Hits 23895 23835 -60
- Misses 1666 1673 +7
Partials 30 30
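The rows of the coverage table are related by simple arithmetic. Each tracked line is counted once as a hit, a miss or a partial. The coverage percentage is hits divided by lines, because Codecov counts partials as not covered. The following Python sketch checks the figures above.

```python
# Figures copied from the Codecov report (main vs. this PR).
main = {"lines": 25591, "hits": 23895, "misses": 1666, "partials": 30}
pr = {"lines": 25538, "hits": 23835, "misses": 1673, "partials": 30}


def coverage_percent(report):
    """Coverage as hits / lines, in percent, rounded to two decimals."""
    # Every tracked line is exactly one of: hit, miss or partial.
    assert report["lines"] == report["hits"] + report["misses"] + report["partials"]
    return round(100 * report["hits"] / report["lines"], 2)


print(coverage_percent(main), coverage_percent(pr))  # 93.37 93.33

# The per-row deltas also add up: -60 hits + 7 misses + 0 partials = -53 lines.
deltas = {k: pr[k] - main[k] for k in main}
assert deltas["lines"] == deltas["hits"] + deltas["misses"] + deltas["partials"]
```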
Python: This branch: 129s
C: This branch: 312s
What I don't understand is why the Python is so much faster. It must be a mistake, right? Looking into it.
OK, I think I have sorted my issues out.
No noticeable impact on RAM, as you would expect when iterating. It just needs a changelog entry.
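Iteration should not change RAM use much, and a rough calculation shows why. The sketch assumes that the variant iterator decodes each site into one reusable buffer with one entry per sample. Under that assumption, widening to 32-bit costs only 3 extra bytes per sample in that single buffer. Materialising a full sites-by-samples matrix, by contrast, would grow fourfold. The sample and site counts below are hypothetical, and numpy is assumed.

```python
import numpy as np


def buffer_bytes(num_samples, dtype):
    """Bytes for one site's genotype buffer (one entry per sample)."""
    return num_samples * np.dtype(dtype).itemsize


def matrix_bytes(num_sites, num_samples, dtype):
    """Bytes for a fully materialised sites-by-samples genotype matrix."""
    return num_sites * buffer_bytes(num_samples, dtype)


n, m = 100_000, 1_000_000  # hypothetical sample and site counts
print(buffer_bytes(n, np.int8), buffer_bytes(n, np.int32))  # 100000 400000
print(matrix_bytes(m, n, np.int8), matrix_bytes(m, n, np.int32))  # 100000000000 400000000000
```

A few hundred kilobytes per buffer is negligible next to the tree sequence itself, which fits the observation that iterating showed no RAM change.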
OK, good to know. Full steam ahead; it will be a trivial fix when we decide to merge the next tskit version, I think. Thanks.
@jeromekelleher What sort of info did you want from Here's the summary:
The flame graph from both is almost 100%
Looks great! The first ~10 lines of the
Cool, here's that for main: and this branch: Seems to be hardly any difference!
OK, so there's a very slight increase in the time spent in I'm sold then, let's make the change! |
Force-pushed from e60a7f4 to 623326e.
@jeromekelleher Need an approving review here for the merge. |
Force-pushed from 623326e to ad1e017.


Fixes #463
WIP, getting some odd, unrelated test failures. Trying on CI.