@benjeffery
Member

@benjeffery benjeffery commented Nov 16, 2020

Description

Adds options to tsk_table_collection_clear. Also changes the default behaviour by not clearing the provenance table. In the original issue I thought clear should wipe everything - but I've come round to the idea that the most common clearing operation is where you want to retain the provenance and metadata schemas. We should encourage retaining these by default in a table-modifying operation.
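
To illustrate the default described above, here is a minimal toy model in Python. It is a sketch only, not the tskit implementation: the class `ToyTableCollection`, its fields, and the keyword names `clear_provenance` and `clear_metadata_schemas` are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTableCollection:
    """Hypothetical stand-in for a table collection; not the tskit class."""

    rows: dict = field(default_factory=dict)  # table name -> list of rows
    metadata_schemas: dict = field(default_factory=dict)  # table name -> schema
    provenance: list = field(default_factory=list)  # provenance records

    def clear(self, clear_provenance=False, clear_metadata_schemas=False):
        # Data rows are always removed.
        for name in self.rows:
            self.rows[name] = []
        # Provenance and schemas are retained unless explicitly requested.
        if clear_provenance:
            self.provenance = []
        if clear_metadata_schemas:
            self.metadata_schemas = {}


tables = ToyTableCollection(
    rows={"nodes": [(0, 0.0), (1, 1.0)], "edges": [(0.0, 1.0, 1, 0)]},
    metadata_schemas={"nodes": {"codec": "json"}},
    provenance=["simulated with some tool"],
)
tables.clear()  # default: rows go, provenance and schemas stay
```

With this model, a caller who modifies a table collection in place no longer has to save the provenance before clearing and copy it back afterwards.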

Fixes #929

PR Checklist:

  • Tests that fully cover new/changed functionality.
  • Documentation including tutorial content if appropriate.
  • Changelogs, if there are API changes.

@benjeffery benjeffery marked this pull request as draft November 16, 2020 14:29
@benjeffery benjeffery force-pushed the clear-options branch 5 times, most recently from 4aa6642 to 49a5ac5 Compare November 16, 2020 14:44
@AdminBot-tskit
Collaborator

📖 Docs for this PR can be previewed here

@codecov

codecov bot commented Nov 16, 2020

Codecov Report

Merging #1001 (84b478e) into main (618996b) will increase coverage by 0.01%.
The diff coverage is 83.09%.

@@            Coverage Diff             @@
##             main    #1001      +/-   ##
==========================================
+ Coverage   93.65%   93.66%   +0.01%     
==========================================
  Files          26       26              
  Lines       20699    20758      +59     
  Branches      838      838              
==========================================
+ Hits        19385    19444      +59     
  Misses       1277     1277              
  Partials       37       37              
| Flag | Coverage Δ | |
|---|---|---|
| c-tests | 92.49% <77.77%> (+0.04%) | ⬆️ |
| lwt-tests | 93.57% <ø> (ø) | |
| python-c-tests | 94.82% <92.30%> (-0.01%) | ⬇️ |
| python-tests | 98.57% <100.00%> (+<0.01%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| c/tskit/tables.c | 90.75% <77.77%> (+0.09%) | ⬆️ |
| python/_tskitmodule.c | 91.29% <91.66%> (+<0.01%) | ⬆️ |
| python/tskit/tables.py | 99.59% <100.00%> (+<0.01%) | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@benjeffery benjeffery marked this pull request as ready for review November 16, 2020 14:53
@petrelharp
Contributor

LGTM! Although - we don't seem to have discussed this in #929 - I would have thought the default would be to clear provenance, since what's the point of keeping provenance in an empty table? Provenance says where the information came from, but if there's no information...

@benjeffery
Member Author

My thinking here was based on the existing usages of clear - they both copied the provenance back in after clearing. You can see in this PR I have removed that copy.

Basically you usually clear when you're modifying an existing table, which should retain provenance and metadata.

@jeromekelleher
Member

> I would have thought the default would be to clear provenance, since what's the point of keeping provenance in an empty table? Provenance says where the information came from, but if there's no information...

This was my initial thought also.

@jeromekelleher
Member

> My thinking here was based on the existing usages of clear - they both copied the provenance back in after clearing. You can see in this PR I have removed that copy.

I see, that's a good point. What if we changed the flags to KEEP_PROVENANCE and KEEP_METADATA? It seems reasonable to me that the default behaviour would be to nuke everything, and then you can optionally keep some stuff in special cases where you want them?
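
To make the two flag directions concrete, here is a small Python sketch comparing them. Everything in it is hypothetical: the enum classes, the numeric flag values, and the helper functions are invented for illustration and are not taken from the tskit C or Python API. Only the names `KEEP_PROVENANCE` and `KEEP_METADATA` come from the suggestion above.

```python
from enum import IntFlag


class ClearFlag(IntFlag):
    """Hypothetical opt-in flags: the default (0) keeps provenance and schemas."""

    CLEAR_PROVENANCE = 1
    CLEAR_METADATA_SCHEMAS = 2


class KeepFlag(IntFlag):
    """Hypothetical opt-out flags: the default (0) wipes everything."""

    KEEP_PROVENANCE = 1
    KEEP_METADATA = 2


def retained_with_clear_flags(options=0):
    """Return what survives a clear under the CLEAR_* convention."""
    options = ClearFlag(options)
    kept = set()
    if not options & ClearFlag.CLEAR_PROVENANCE:
        kept.add("provenance")
    if not options & ClearFlag.CLEAR_METADATA_SCHEMAS:
        kept.add("metadata_schemas")
    return kept


def retained_with_keep_flags(options=0):
    """Return what survives a clear under the KEEP_* convention."""
    options = KeepFlag(options)
    kept = set()
    if options & KeepFlag.KEEP_PROVENANCE:
        kept.add("provenance")
    if options & KeepFlag.KEEP_METADATA:
        kept.add("metadata_schemas")
    return kept
```

Both conventions can express the same four outcomes; they differ only in what passing no options means. Under `CLEAR_*` a bare call retains provenance and schemas, while under `KEEP_*` a bare call removes them.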

@petrelharp
Contributor

Ok, that's a good point. I guess in practice if we want a totally brand-new table collection then we tend to just re-allocate a new one. So, I agree, what you've done here actually makes more sense. OK, never mind - I vote to keep it the way you've got it!

@jeromekelleher
Member

jeromekelleher commented Nov 16, 2020

How does this look at the Python level @benjeffery - did we not provide a TableCollection.clear() method? It might be worth doing this now, as the "direction" the flags need to be in usually becomes clearer after we think through the semantics in Python.

Peter makes a good point too - if we want a totally crispy clean table collection, we can just make a new one, so I'm leaning towards agreeing with what you have.

@benjeffery
Member Author

benjeffery commented Nov 16, 2020

> How does this look at the Python level @benjeffery - did we not provide a TableCollection.clear() method? It might be worth doing this now, as the "direction" the flags need to be in usually becomes clearer after we think through the semantics in Python.

There is no such method in the Python API - I assume the test suite would benefit from it. I'll have a look for potential usage and add it to this PR - setting back to draft till that is in.

> Peter makes a good point too - if we want a totally crispy clean table collection, we can just make a new one, so I'm leaning towards agreeing with what you have.

I also agree with this - what's the point (especially in Python) of a complete clear?

@benjeffery benjeffery marked this pull request as draft November 16, 2020 22:49
@benjeffery benjeffery force-pushed the clear-options branch 2 times, most recently from 60e42a1 to d7e293f Compare November 18, 2020 01:19
@benjeffery benjeffery marked this pull request as ready for review November 18, 2020 01:20
@benjeffery
Member Author

Only found one spot in the Python tests to use this - but I still think it is worth having. @jeromekelleher I think this is merge-ready.

Member

@jeromekelleher jeromekelleher left a comment

LGTM!

@mergify mergify bot merged commit 09bff9c into tskit-dev:main Nov 18, 2020
@benjeffery benjeffery deleted the clear-options branch March 19, 2021 17:09
Development

Successfully merging this pull request may close these issues.

Should tsk_table_collection_clear remove metadata?

4 participants