Metadata docs #741
Ok, I've tried to address all the comments and have done a read through. Ready for a check over @jeromekelleher and @petrelharp! |
Codecov Report
@@ Coverage Diff @@
## master #741 +/- ##
=======================================
Coverage 87.72% 87.72%
=======================================
Files 24 24
Lines 19394 19400 +6
Branches 3640 3640
=======================================
+ Hits 17013 17019 +6
Misses 1290 1290
Partials 1091 1091
|
jeromekelleher
left a comment
Looks good, some minor comments.
|
@jeromekelleher Fixed up. I think we wait for @petrelharp or @molpopgen to have a look before merging. |
| which created the tree-sequence file, as the exact metadata | ||
| stored will vary depending on the use case. Subsequent processes can add or modify the schemas | ||
| if they wish to add or modify to the types (or encoding) of the metadata. Most users of tree-sequence | ||
| files will not need to modify the schemas. |
Perhaps add "The purpose of the schema is to say what information is stored in each metadata record, and how it is stored." - maybe after the first sentence?
Added in 837e0e8
docs/metadata.rst
Outdated
| This codec places extra restrictions on the schema: | ||
|
|
||
| #. Each property must have a ``binaryFormat`` | ||
| This sets the binary encoding for used for the property. |
"for used for"
Fixed in 0b31382
petrelharp
left a comment
Looks good!
Something that I didn't see addressed is: what if the metadata don't match the schema? The tree sequence will be loaded (since metadata are decoded lazily), but you'll get errors when trying to decode it; and to figure out what's going on (and maybe retrieve the metadata) you might need to set the schema to the null schema and look at the raw bytes.
Oh, and also: should it be said somewhere that the metadata schema is stored as a string in the underlying tables? And that if created in python, this can be viewed by str(schema)?
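To make the failure mode concrete, here is a minimal sketch using only Python's standard `struct` module rather than tskit itself. The `decode` helper and the chosen formats are hypothetical stand-ins for a schema's codec. It shows that bytes written under one format either fail when decoding is attempted or decode to a different value under another format, while the raw bytes stay available for inspection.

```python
import struct

# Raw bytes as they might sit in a table: a 4-byte little-endian int.
raw = struct.pack("<i", 1234)


def decode(fmt, data):
    """Hypothetical stand-in for a schema-driven decoder."""
    return struct.unpack(fmt, data)[0]


# Matching format: the value round-trips.
matching = decode("<i", raw)

# Wrong size: decoding fails, but only at the point it is attempted.
try:
    decode("<q", raw)
    size_mismatch_error = None
except struct.error as exc:
    size_mismatch_error = exc

# Same size, wrong type: no error is raised, the value is just different.
silently_wrong = decode("<f", raw)
```

The second case is the harder one to spot, since nothing is raised and only a look at the raw bytes reveals what was actually stored.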
| To determine the binary encoding of each property in the metadata the ``binaryFormat`` key is used. | ||
| This describes the encoding for each property using ``struct`` | ||
| `format characters <https://docs.python.org/3/library/struct.html#format-characters>`_. | ||
| For example an unsigned 8-byte integer can be specified with:: |
We should probably acknowledge some of the below as being modified from the struct docs?
Fixed in 3d63b49
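For reference, the format characters in question can be tried directly with Python's `struct` module. This is only a sketch of what ``Q`` (unsigned 8-byte integer) means; the ``<`` byte-order prefix is an assumption made here for a platform-independent size, not something stated in the diff above.

```python
import struct

# "Q" is the struct format character for an unsigned 8-byte integer.
# The "<" prefix (little-endian, standard sizes) is an assumption here.
largest = 2**64 - 1
encoded = struct.pack("<Q", largest)
size = struct.calcsize("<Q")
decoded = struct.unpack("<Q", encoded)[0]
```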
docs/metadata.rst
Outdated
| The codec stores the length of the array before the array data. The format used for the | ||
| length of the array can be chosen with ``arrayLengthFormat`` which must be one | ||
| of ``B``, ``H``, ``I``, ``L`` or ``Q`` which have the same meaning as in the numeric | ||
| types above. ``Q`` is the default. As an example:: |
see #769
Fixed in fecbe45 with added error message on overflow.
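A rough sketch of the length-prefixed layout described in this hunk, written with the standard `struct` module. The helpers `encode_array` and `decode_array` are hypothetical and are not the codec's actual implementation; the little-endian prefix and the wording of the error are assumptions. The sketch shows the length being written first in the chosen format, and an overflow being reported when the array is too long for that format.

```python
import struct


def encode_array(values, element_format, length_format="Q"):
    """Hypothetical helper: length prefix followed by the elements."""
    try:
        prefix = struct.pack("<" + length_format, len(values))
    except struct.error as exc:
        # The length does not fit in the chosen length format.
        raise ValueError(
            f"Array of length {len(values)} does not fit in "
            f"arrayLengthFormat {length_format!r}"
        ) from exc
    body = struct.pack(f"<{len(values)}{element_format}", *values)
    return prefix + body


def decode_array(data, element_format, length_format="Q"):
    """Hypothetical inverse of encode_array."""
    prefix_size = struct.calcsize("<" + length_format)
    (length,) = struct.unpack_from("<" + length_format, data, 0)
    return list(struct.unpack_from(f"<{length}{element_format}", data, prefix_size))
```

With ``B`` as the length format, three 4-byte ints take 1 + 12 bytes, and an array of 256 elements is rejected because 256 does not fit in one unsigned byte.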
docs/metadata.rst
Outdated
| "codec": "struct", | ||
| "type": "object", | ||
| "properties": { | ||
| "accession_number": {"type": "number", "binaryFormat": "i"}, |
shouldn't this be integer?
Fixed in 8b80596
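The reason this matters can be seen with the standard `struct` module alone: the ``i`` format only accepts integers, whereas a JSON Schema ``number`` also admits non-integer values. A small sketch, with the ``<`` prefix as an assumption:

```python
import struct

# An integer packs fine with the "i" format.
packed = struct.pack("<i", 1234)

# A non-integer value is rejected by the same format.
try:
    struct.pack("<i", 1.5)
    float_error = None
except struct.error as exc:
    float_error = exc
```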
docs/metadata.rst
Outdated
| This schema states that the metadata for each row of the table | ||
| is an object consisting of two properties. Property ``accession_number`` is a number | ||
| (stored as a 4-byte int) which must be specified (it is included in the ``required`` |
they're both required, since it's struct, and there isn't a required list any more (should there be, to emphasize?)
Fixed in 4dfd481
docs/tutorial.rst
Outdated
|
|
||
| 'Bob1234' | ||
| To modify the metadata of rows in tables use the :ref:`sec_tutorial_metadata_bulk`. | ||
| to |
missing end of the sentence?
Fixed in 004263a
docs/tutorial.rst
Outdated
| accessed with ``ts.metadata`` and ``ts.metadata_schema``. | ||
|
|
||
| If there is no schema for a table or top-level metadata, then no decoding is performed | ||
| and ``bytes`` will be returned. |
Is "no schema" the same as MetadataSchema(None)?
Fixed in 2f5f18e
docs/tutorial.rst
Outdated
| tables.individuals.metadata_schema = schema | ||
| Now that the table has a schema calls to | ||
| This will overwrite any existing schema. Now that the table has a schema calls to |
Perhaps it should be mentioned here that assigning the schema does not validate all pre-existing metadata entries.
Note added in a84fefb
Yes you'll likely get some hairy message from the decoder, or worse silently corrupted or missing data. There's not much that can be done apart from encouraging use of |
|
Thanks for the great comments, @petrelharp. I think I've addressed them all. |
petrelharp
left a comment
Looks good to me! Thanks!!
|
Looks like this is ready to go after a squash @benjeffery? |
Force-pushed from 0b31382 to 5240745
|
I've squashed to two commits: one docs, one code change. Thanks for everyone's input to the metadata API and docs! |
Closes #603, #769