Skip to content

Conversation

@jeromekelleher
Copy link
Member

In favour of the tsk_table_collection_equals_with_options introduced in #897.

@jeromekelleher jeromekelleher marked this pull request as draft October 20, 2020 14:33
@codecov
Copy link

codecov bot commented Oct 20, 2020

Codecov Report

Merging #917 into main will decrease coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #917      +/-   ##
==========================================
- Coverage   93.43%   93.39%   -0.04%     
==========================================
  Files          25       25              
  Lines       20134    20209      +75     
  Branches      808      808              
==========================================
+ Hits        18812    18874      +62     
- Misses       1285     1299      +14     
+ Partials       37       36       -1     
Impacted Files Coverage Δ
c/tskit/tables.c 90.63% <100.00%> (+0.04%) ⬆️
_tskitmodule.c 90.20% <0.00%> (-0.21%) ⬇️
tskit/tables.py 99.70% <0.00%> (+0.14%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33409ef...5cbf581. Read the comment docs.

@AdminBot-tskit
Copy link
Collaborator

📖 Docs for this PR can be previewed here

@jeromekelleher
Copy link
Member Author

It would be good to get some feedback here at this point @benjeffery @petrelharp @mufernando. Basically, I've added an "options" argument to all the tables and made metadata comparisons optional now. I've also added an option for ignoring provenance timestamps, which will be handy when comparing simulation outputs.

Questions:

  1. Do we want the TSK_CMP_NO_TS_METADATA option now? Seems like just saying TSK_CMP_NO_METADATA and not ignoring all the metadata is probably what we want, most of the time?
  2. Should we also have an option to ignore the metadata schemas? If so, how does this interact with the TSK_CMP_NO_METADATA flag? Probably the simplest is the have TSK_CMP_NO_METADATA just ignore metadata and TSK_CMP_NO_METADATA_SCHEMA just ignore the schema?
  3. What do we do at the Python level? Perhaps it would be better to have
def equals(self, other, metadata=True, provenance=True, provenance_timestamp=True):

rather than the current ignore_metadata=False or (worse) no_metadata=False?

@petrelharp
Copy link
Contributor

1. Do we want the `TSK_CMP_NO_TS_METADATA` option now? Seems like just saying `TSK_CMP_NO_METADATA` and not ignoring all the metadata is probably what we want, most of the time?

Well, in the use case of union'ing SLiM tree sequences, we do want to just ignore top-level metadata, so... yes?

2. Should we also have an option to ignore the metadata schemas? If so, how does this interact with the `TSK_CMP_NO_METADATA` flag? Probably the simplest is the have TSK_CMP_NO_METADATA just ignore metadata and TSK_CMP_NO_METADATA_SCHEMA just ignore the schema?

Well, ignoring metadata should ignore schemas, but I don't think we should ignore schemas if we compare metadata. It would maybe be useful to be able to ask to compare equivalency of schemas rather than equiality at the bytes level, but that's another can of worms (to do it in C anyhow). I'd say skip it unless we actually need it?

3. What do we do at the Python level? Perhaps it would be better to have
def equals(self, other, metadata=True, provenance=True, provenance_timestamp=True):

rather than the current ignore_metadata=False or (worse) no_metadata=False?

Sounds good, plus maybe top_level_metadata=metadata?

@mufernando
Copy link
Member

This all looks good to me. I agree that we need TSK_CMP_NO_TS_METADATA for top-level only metadata/schema.

@jeromekelleher jeromekelleher marked this pull request as ready for review October 24, 2020 16:51
@jeromekelleher
Copy link
Member Author

I think this is ready for final review here. I ended up going back to the original "IGNORE_X" names, as the semantics got confusing when we changed back and forth between positive and negative in Python and C. "Ignore" is a bit longer, but I think it's quite clear what the flags go then.

Speaking of semantics - are we all happy that ignore_metadata implies ignore_ts_metadata, and the latter has no effect when ignore_metadata=True?

Copy link
Member

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the semantics we have in this PR now. Just one minor doc suggestion and a broken windows test that is easily (if somewhat unsatisfyingly) fixed.

@mufernando
Copy link
Member

LGTM!

@jeromekelleher
Copy link
Member Author

OK, great, let's get this one in then.

@mergify mergify bot merged commit fc5e58e into tskit-dev:main Oct 27, 2020
@jeromekelleher jeromekelleher deleted the options-for-equals branch October 27, 2020 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants