Skip to content

Conversation

@benjeffery
Copy link
Member

Fixes #2064

@benjeffery benjeffery force-pushed the null_json_is_empty_dict branch from 08eccbd to 0baa912 Compare January 21, 2022 13:58
@benjeffery benjeffery marked this pull request as ready for review January 21, 2022 15:10
@benjeffery benjeffery force-pushed the null_json_is_empty_dict branch from 0baa912 to f7e33e4 Compare January 21, 2022 15:23
@codecov
Copy link

codecov bot commented Jan 21, 2022

Codecov Report

Merging #2104 (275fa9b) into main (137b645) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2104   +/-   ##
=======================================
  Coverage   93.37%   93.37%           
=======================================
  Files          27       27           
  Lines       25587    25589    +2     
  Branches     1159     1160    +1     
=======================================
+ Hits        23891    23893    +2     
  Misses       1666     1666           
  Partials       30       30           
Flag Coverage Δ
c-tests 92.34% <ø> (ø)
lwt-tests 89.05% <ø> (ø)
python-c-tests 72.11% <33.33%> (-0.02%) ⬇️
python-tests 98.89% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/tskit/metadata.py 99.01% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 137b645...275fa9b. Read the comment docs.

Copy link
Member

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think this is a win.

We should get more votes on this though, as it's a potentially breaking change. @petrelharp @molpopgen would you mind taking a quick look please?

@benjeffery benjeffery force-pushed the null_json_is_empty_dict branch from f7e33e4 to 7c25823 Compare January 24, 2022 12:47
@benjeffery
Copy link
Member Author

@jeromekelleher Test added

@molpopgen
Copy link
Member

LGTM, I think this is a win.

We should get more votes on this though, as it's a potentially breaking change. @petrelharp @molpopgen would you mind taking a quick look please?

I'm afraid that I haven't been following this one. But in fwdpy11 I either use the codec struct for metadata and the one table where I use JSON has no empty metadata allowed (I think). In tskit-rust, empty metadata is already handled by returning a None type. My only question would be regarding top-level metadata, but returning an empty object seems fine there (for my purposes). I do not allow top-level metadata to be empty upon export from fwdpy11.

@jeromekelleher
Copy link
Member

Thanks @molpopgen, once question:

In tskit-rust, empty metadata is already handled by returning a None type.

Does this conflict with our proposed behaviour of returning the empty object? I guess it must be inconsistent with the idea of defining defaults on top of empty metadata strings like we do here? For example, if your schema has an attribute "a" with default = "abc", then running schema.decode(b"") or schema.decode(b"{}")will both return {"a": "abc"}.

@molpopgen
Copy link
Member

molpopgen commented Jan 27, 2022

Thanks @molpopgen, once question:

In tskit-rust, empty metadata is already handled by returning a None type.

Does this conflict with our proposed behaviour of returning the empty object? I guess it must be inconsistent with the idea of defining defaults on top of empty metadata strings like we do here? For example, if your schema has an attribute "a" with default = "abc", then running schema.decode(b"") or schema.decode(b"{}")will both return {"a": "abc"}.

So this all about Python if I understand correctly? The rust interface sees en empty string on the C side and returns None.

If someone wants to use a rust JSON schema from Python, then they have to do the work to set that up correctly. I have not had time to test any of that. (I think there's an open issue to do so.)

@molpopgen
Copy link
Member

One question did come to mind about these changes. For the struct codec, I am assuming that one receives None when the metadata are empty. That is not changing? If that does change, then it is a breaking change for me.

On the rust side, I see no conflict. Default parameters don't always make sense for compiled languages. We could accommodate such a thing by requiring that metadata compatible types implement the Default trait, but I see little reason to do so. To me, a None is much more clear than to have to check if an object instance is equal to the default value.

@benjeffery
Copy link
Member Author

. For the struct codec, I am assuming that one receives None when the metadata are empty. That is not changing? If that does change, then it is a breaking change for me.

This is correct, not changing and also needed for compatibility with SLIm.

@jeromekelleher
Copy link
Member

Having a quick look through the Rust metadata with @benjeffery, it seems like the Rust metadata mechanisms are concerned with serialising/deserialising data structures within rust rather than being concerted with decoding arbitrary metadata.

What would happen if we tried to read, say, the population metadata from an msprime tree sequence in Rust @molpopgen? Would you just get back a bunch of bytes, or would it try to give something like a Python dictionary of the contents?

@molpopgen
Copy link
Member

Having a quick look through the Rust metadata with @benjeffery, it seems like the Rust metadata mechanisms are concerned with serialising/deserialising data structures within rust rather than being concerted with decoding arbitrary metadata.

What would happen if we tried to read, say, the population metadata from an msprime tree sequence in Rust @molpopgen? Would you just get back a bunch of bytes, or would it try to give something like a Python dictionary of the contents?

You would need to do the work of defining a struct on the rust side to represent the metadata and then use the procedural macros to register that type as being serialized using JSON. Then that should all work, but I've not tested anything. The number of lines of code should be something like the number of metadata fields plus two, where the extra two are the struct name and the metadata registration macro call.

You are not able to call the functions to retrieve metadata unless you provide the name of such a struct.

@molpopgen
Copy link
Member

In principle one could add a function to get the metadata as bytes. But such an interface immediately implies using unsafe code blocks and all that that entails. Raw bytes from C are not compatible with rust strings, and the types used to handle them all required code blocks marked as unsafe. these concerns are why I have not added such interface.

@jeromekelleher
Copy link
Member

OK, sounds good. So, it sounds like these differences between rust and Python and really differences, so there's no point in worrying about it.

I say we merge!

@molpopgen
Copy link
Member

Yes, these are real language differences that force API differences. Strong static typing and trying to prevent insecure code makes rust different.

@molpopgen
Copy link
Member

Just looked at the code and was reminded of something. Accessing a table row object will include Option<Vec<u8>>, which will be None if there are no metadata or it will contain the raw bytes. It is the table_x::metadata function that requires the macro-registered structs that I referred to earlier. This interface does not require unsafe unless certain things are done with the bytes.

@benjeffery
Copy link
Member Author

Merging!

@benjeffery benjeffery force-pushed the null_json_is_empty_dict branch from 7c25823 to 275fa9b Compare February 1, 2022 11:42
@mergify mergify bot merged commit 1e8958b into tskit-dev:main Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

empty metadata should be allowed by the JSON codec

3 participants