Skip to content

Conversation

@benjeffery
Copy link
Member

Fixes #651

Needs python test work now.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not sure the verification logic is correct.

@benjeffery benjeffery marked this pull request as ready for review July 22, 2020 11:10
@benjeffery
Copy link
Member Author

@jeromekelleher RFR! Happy to hear suggestions for further tests or docs I missed.

@benjeffery
Copy link
Member Author

Actually, hold fire - seems windows wants to put unknown first and glibc wants to put it last.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor comments.

@benjeffery
Copy link
Member Author

So for the current cmp_mutation in this PR if you have three mutations with times 4, UNK, 5 (and hence ids 0,1,2) then you get:
5<4, 4<U, U<5 which is inconsistent. With glibc this results in 4,5,U and windows does U,4,5. We could fix this by saying unknown times are either + or - infinity. BUT that might mean we broke any ordering that was present due to the "parents first" rule. I'm not sure how to fix this as check for a parent chain in cmp_mutation sounds expensive.

@jeromekelleher
Copy link
Member

Is this because of stable and unstable sorting algs in the libc implementations @benjeffery?

@jeromekelleher
Copy link
Member

Ugh, this is nasty. The only simple way out of this I can see is to disallow mixed known and unknow times at a site. That way we should never have to compare unknown times with anything. It's the edgiest of edge-cases, in any case (who is actually going to do this?).

@petrelharp, any thoughts here?

It's great this has come up now - testing works!

@jeromekelleher
Copy link
Member

@petrelharp, we could use your input here. Do you object to disallowing mixed known and unknown times at a site? Otherwise, we get into undefined behaviour because or sort keys are ill defined, as Ben has shown here.

@petrelharp
Copy link
Contributor

I agree: disallow mixed missing/nonmissing times. I don't see a good way around it and don't know if a reason we'd need to have mixed times. Ps Sorry for the delay, I am on semi vacation.

@benjeffery
Copy link
Member Author

No worries, thanks for having a look. We can allow mixed known/unknown as long as sort isn't called. Or is it might be cleaner to have it be a table-level requirement.

@jeromekelleher
Copy link
Member

We can allow mixed known/unknown as long as sort isn't called. Or is it might be cleaner to have it be a table-level requirement.

Let's enforce it as a table-level requirement. It's just hassle for us otherwise.

@codecov
Copy link

codecov bot commented Jul 27, 2020

Codecov Report

Merging #726 into master will decrease coverage by 0.03%.
The diff coverage is 77.41%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #726      +/-   ##
==========================================
- Coverage   87.69%   87.66%   -0.04%     
==========================================
  Files          24       24              
  Lines       19323    19346      +23     
  Branches     3618     3624       +6     
==========================================
+ Hits        16946    16960      +14     
- Misses       1292     1297       +5     
- Partials     1085     1089       +4     
Flag Coverage Δ
#c_tests 85.27% <77.41%> (-0.06%) ⬇️
#python_c_tests 90.12% <ø> (+<0.01%) ⬆️
#python_tests 99.01% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
c/tskit/core.h 100.00% <ø> (ø)
python/tskit/tables.py 99.68% <ø> (ø)
c/tskit/tables.c 81.17% <74.07%> (-0.11%) ⬇️
c/tskit/core.c 94.08% <100.00%> (+0.04%) ⬆️
python/_tskitmodule.c 82.99% <0.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b78da2...94b01d8. Read the comment docs.

@benjeffery benjeffery force-pushed the sort-mutation-time branch from ddc30d7 to 94b01d8 Compare July 27, 2020 13:45
@benjeffery
Copy link
Member Author

Seems to be some issues with coverage reporting as C lines I know are covered from python are not showing as such.

@benjeffery benjeffery merged commit 4e6809a into tskit-dev:master Jul 27, 2020
@benjeffery benjeffery deleted the sort-mutation-time branch July 27, 2020 14:04
@mufernando
Copy link
Member

nice work on this @benjeffery, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add requirement that mutations are sorted by time

4 participants