-
Notifications
You must be signed in to change notification settings - Fork 78
Put metadata and schemas in dict representations #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
38a6b70 to
3dcdaf5
Compare
Codecov Report
@@ Coverage Diff @@
## master #687 +/- ##
==========================================
- Coverage 87.71% 87.42% -0.29%
==========================================
Files 23 23
Lines 17810 17963 +153
Branches 3526 3575 +49
==========================================
+ Hits 15622 15705 +83
- Misses 1073 1091 +18
- Partials 1115 1167 +52
Continue to review full report at Codecov.
|
|
I'm missing some coverage, don't merge for now. |
3dcdaf5 to
5a161d9
Compare
f813721 to
6978388
Compare
|
@jeromekelleher This is now fully-backward compatible with older |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks @benjeffery. I think there's a memory leak in the dict_encoding. Can you run stress_lowlevel and verify please? Otherwise, I think we need to get a little more test coverage on the behaviour when metadata_schema etc is missing from the dict encoding (not mapped to None, but not in the dictionary at all).
python/tskit/tables.py
Outdated
| """ | ||
| self.ll_table.append_columns( | ||
| dict(metadata=metadata, metadata_offset=metadata_offset) | ||
| dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should be able to just leave metadata_schema out of this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't as this is calling the low-level method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be OK now, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep gone in 682b97d
|
@jeromekelleher Thanks, think I have addressed them all. Sorry there were quite a few! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments above @benjeffery. I think it's probably worth manually pasting in a few small examples of dict encoded output from msprime and tsinfer, just to make sure that we can actually parse these successfully.
As I was going through this, I realised that the dict encoding should really be versioned. Maybe we should add a key "encoding_version" = (major,minor) to the dictionary, which (if not present) defaults to 1.0. That way we can think a bit more clearly about how this encoding changes over time.
(Obviously need to be careful that this doesn't break old versions of tskit, though. Should be OK, shouldn't it?)
python/tskit/tables.py
Outdated
| """ | ||
| self.ll_table.append_columns( | ||
| dict(metadata=metadata, metadata_offset=metadata_offset) | ||
| dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be OK now, right?
|
I've added a version in 37f2bd6. I've used 1.1 as 1.0 is the verison that had no version. In your comment you mention setting a default to 1.0 - I think this would go in code that is reading back the version, which we're not doing yet as we're not using it. |
|
I've added a hardcoded msprime example - but I can't see where in tsinfer it is making dicts. A search for "asdict" fails in the tsinfer source. |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, few small changes but ready for final review then I think?
|
@jeromekelleher Ok, let me know if good for a squash. |
|
I'll give one more look over once #687 is merged @benjeffery, but I'd be surprised if anything else shows up at this point. |
|
#687 is this PR? |
🤦 |
jeromekelleher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, squash'n'merge!
a8d8365 to
1218d25
Compare
1218d25 to
d771988
Compare
Fixes #686
This is backward compatible with existing dict representations as schemas (and top-level metadata and schema) are not required on
fromdictand if empty are not included inasdict.Note that this also adds a
metadata_schemaargument toset_columns.