@petrelharp
Contributor

This will close #1713. Since we already have all the TSK_CHECK_X_ORDERING flags we needed (except X=MIGRATIONS), I just went through to make sure that the things checked under CHECK_ORDERING were all ordering-related. They were, except for mutations. So I've moved a bunch of consistency checks out of TSK_CHECK_MUTATION_ORDERING; that is, checks that weren't previously done by default now are. This turned up a few nonsensical tables in the tests, but we should make sure it's actually what we want to do. These checks are:

  • mutation time younger than node
  • mixed known/unknown times at a site
  • site of parent mutation agrees with site
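For concreteness, here is a minimal sketch of what these three checks might look like as a single pass over the mutation columns. It is not the code in c/tskit/tables.c: the function name, the error codes, and the flat-array layout are all hypothetical, unknown times are simplified to "any NaN", and mutations are assumed to be grouped by site with in-bounds node and parent indexes.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical error codes for this sketch (not tskit's real values). */
enum {
    CHECK_OK = 0,
    ERR_TIME_YOUNGER_THAN_NODE = -1,
    ERR_MIXED_KNOWN_UNKNOWN = -2,
    ERR_PARENT_DIFFERENT_SITE = -3
};

/* Simplifying assumption: any NaN time counts as "unknown". */
static int
is_unknown(double t)
{
    return isnan(t);
}

/* Single pass over the mutation columns. Assumes mutations are grouped by
 * site and that node[] and parent[] hold in-bounds indexes (or -1 for
 * "no parent"). Times are measured as time ago, so younger means smaller. */
int
check_mutations(size_t num_mutations, const int *site, const int *node,
    const int *parent, const double *time, const double *node_time)
{
    size_t j;
    size_t num_known = 0;
    size_t num_unknown = 0;

    for (j = 0; j < num_mutations; j++) {
        /* Reset counters for the next site */
        if (j > 0 && site[j - 1] != site[j]) {
            num_known = 0;
            num_unknown = 0;
        }
        /* Site of the parent mutation must agree with this mutation's site */
        if (parent[j] >= 0 && site[parent[j]] != site[j]) {
            return ERR_PARENT_DIFFERENT_SITE;
        }
        if (is_unknown(time[j])) {
            num_unknown++;
        } else {
            num_known++;
            /* A known mutation time must not be younger than its node */
            if (time[j] < node_time[node[j]]) {
                return ERR_TIME_YOUNGER_THAN_NODE;
            }
        }
        /* Known and unknown times must not both be present at a site */
        if (num_known > 0 && num_unknown > 0) {
            return ERR_MIXED_KNOWN_UNKNOWN;
        }
    }
    return CHECK_OK;
}
```

None of these conditions depends on the order of rows relative to other tables, which is the reason for treating them as consistency checks rather than ordering checks.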

Other notes:

  • I don't think we need a TSK_CHECK_ORDERING flag, since you get that from TSK_CHECK_TREES, and I don't think anyone's going to want to check ordering without checking the other optional properties.
  • Hm, however, TSK_CHECK_TREES also implies running tsk_table_collection_check_tree_integrity, which is more expensive (it builds trees; everything else is a single pass through the tables). So, maybe we do want a generic CHECK_ORDERING?
  • There was no previous check for migration ordering; I've implemented it (although we don't have a sort for it yet...)
  • I've also made the printing out of tables a tad nicer (it was hard for me to find the table I wanted)
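To illustrate the "single pass through the tables" point for the new migration check, a rough sketch follows. It assumes that "ordered" means nondecreasing time down the rows; the function name and the bare time-column argument are hypothetical and stand in for whatever the real table struct provides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the migration time column is in nondecreasing order.
 * Assumption for this sketch: ordering is defined on time alone. */
bool
migrations_time_sorted(size_t num_rows, const double *time)
{
    size_t j;

    /* One linear pass comparing each row with its predecessor */
    for (j = 1; j < num_rows; j++) {
        if (time[j - 1] > time[j]) {
            return false;
        }
    }
    return true;
}
```

A check of this shape costs one comparison per row, in contrast to the tree-building work implied by TSK_CHECK_TREES.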

@codecov

codecov bot commented Sep 17, 2021

Codecov Report

Merging #1722 (f20fae8) into main (1a26153) will increase coverage by 1.54%.
The diff coverage is 100.00%.

❗ Current head f20fae8 differs from pull request most recent head 71b09dd. Consider uploading reports for the commit 71b09dd to get more accurate results

@@            Coverage Diff             @@
##             main    #1722      +/-   ##
==========================================
+ Coverage   91.82%   93.36%   +1.54%     
==========================================
  Files          10       27      +17     
  Lines       12801    24250   +11449     
  Branches        0     1093    +1093     
==========================================
+ Hits        11754    22642   +10888     
- Misses       1047     1573     +526     
- Partials        0       35      +35     
Flag Coverage Δ
c-tests 92.10% <100.00%> (+<0.01%) ⬆️
lwt-tests 89.25% <ø> (ø)
python-c-tests 94.51% <ø> (?)
python-tests 98.75% <ø> (?)


Impacted Files Coverage Δ
c/tskit/core.h 100.00% <ø> (ø)
c/tskit/core.c 97.78% <100.00%> (+0.01%) ⬆️
c/tskit/tables.c 90.19% <100.00%> (+0.01%) ⬆️
c/tskit/trees.c 94.86% <0.00%> (-0.01%) ⬇️
c/tskit/stats.c 85.15% <0.00%> (ø)
python/_tskitmodule.c 91.48% <0.00%> (ø)
python/tskit/__main__.py 0.00% <0.00%> (ø)
python/tskit/text_formats.py 100.00% <0.00%> (ø)
python/tskit/__init__.py 100.00% <0.00%> (ø)
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@petrelharp
Contributor Author

I've tested this increased checking with SLiM (the treerec tests and pyslim), and nothing bad happened - do you see anything we wouldn't want to do each time we simplify, @bhaller?

@bhaller

bhaller commented Sep 18, 2021

I've tested this increased checking with SLiM (the treerec tests and pyslim), and nothing bad happened - do you see anything we wouldn't want to do each time we simplify, @bhaller?

I'm quite out of my depth on this, I would say. :-> But how's this: if you make a PR for SLiM that pulls these changes in, with the appropriate changes to the places in SLiM where we do integrity checks, then I'll test it on my side with my big test suite, with crosschecks enabled, etc.? And I'll also make sure that it would catch Elissa's bug even with coalescence checking disabled, in DEBUG mode at least. Sound good?

Member

@jeromekelleher left a comment

LGTM - minor nitpicks

c/tskit/tables.c Outdated
goto out;
}
/* Check if time is nonfinite */
/* Check if time is nonfinite and less recent than node time */
Member

I had to read this a few times - should be "or less recent", right?

c/tskit/tables.c Outdated

/* Check known/unknown times are not both present on a site */
if (unknown_time) {
num_unknown_times += 1;
Member

num_unknown_times++; is a bit more idiomatic C

c/tskit/tables.c Outdated
}
}

/* reset checks if needed */
Member

It wasn't immediately obvious what the "if needed" meant here. Maybe "Reset counters for the next site". Also, you can make it a bit more concise with

if (j > 0 && mutations.site[j - 1] != mutations.site[j]) {
    ...
}

This is guaranteed to be safe, since short-circuit evaluation of && is part of the C standard.
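As a small standalone illustration of that guarantee (a sketch only; the helper name and the plain int site array are hypothetical and not taken from tables.c), the function below counts the rows at which the site changes. When j is 0 the left operand of && is false, so the right operand, which would read site[-1], is never evaluated.

```c
#include <stddef.h>

/* Counts the positions where site[j] differs from site[j - 1]. */
size_t
count_site_boundaries(size_t num_mutations, const int *site)
{
    size_t j;
    size_t boundaries = 0;

    for (j = 0; j < num_mutations; j++) {
        /* j > 0 is evaluated first; if it is false, site[j - 1] is not read */
        if (j > 0 && site[j - 1] != site[j]) {
            boundaries++;
        }
    }
    return boundaries;
}
```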

c/tskit/tables.c Outdated
}

/* Check time value ordering */
/* Check time ordering, we do this after the time checks above, so
Member

Doesn't quite parse for me. Maybe "Check time ordering. We do..."

@bhaller

bhaller commented Sep 19, 2021

Passes all checks on my side. Thumbs up.

@petrelharp
Contributor Author

Thanks - fixed up those things (and the docs). After I fix whatever GitHub's version of clang-format tells me to do (since it differs from my local one) and push again, this should be ready to go. I noticed that the checks for mixing known and unknown mutation times within sites don't actually work if the tables aren't ordered, but I also decided we don't care.

Member

@benjeffery left a comment

LGTM!
Just needs a changelog @petrelharp

@petrelharp
Contributor Author

Changelog added!

@benjeffery
Member

Woop! Merging.

@mergify mergify bot merged commit c830df1 into tskit-dev:main Sep 21, 2021
Development

Successfully merging this pull request may close these issues.

add TSK_NO_CHECK_ORDERING to flags for table_collection_check_consistency

4 participants