Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optional fields don't use default values when unset #82

Closed
lukebakken opened this issue Aug 31, 2016 · 13 comments
Closed

Optional fields don't use default values when unset #82

lukebakken opened this issue Aug 31, 2016 · 13 comments

Comments

@lukebakken
Copy link
Contributor

lukebakken commented Aug 31, 2016

Message definition from here:

message DtFetchReq {
    required bytes bucket = 1;
    required bytes key    = 2;
    required bytes type  = 3;
    optional uint32 r             =  4;
    optional uint32 pr            =  5;
    optional bool   basic_quorum  =  6;
    optional bool   notfound_ok   =  7;
    optional uint32 timeout       =  8;
    optional bool   sloppy_quorum =  9;
    optional uint32 n_val         = 10;
    optional bool include_context = 11 [default=true];
}

Input data:

Req = #dtfetchreq{bucket = "bucket",
                  key = <<"key">>,
                  type = <<"type">>}.

Expected record with values after decoding:

#dtfetchreq{bucket = <<"bucket">>,
            key = <<"key">>,
            type = <<"type">>,
            r = 0,
            pr = 0,
            basic_quorum = false,
            notfound_ok = false,
            timeout = 0,
            sloppy_quorum = false,
            n_val = 0,
            include_context = true}

Actual decoded data:

#dtfetchreq{bucket = <<"bucket">>,
            key = <<"key">>,
            type = <<"type">>,
            r = undefined,
            pr = undefined,
            basic_quorum = undefined,
            notfound_ok = undefined,
            timeout = undefined,
            sloppy_quorum = undefined,
            n_val = undefined,
            include_context = undefined}

It seems like, according to the spec an optional field that is not supplied a default value or a value during encoding should be using type-specific defaults (0 for numeric fields, for instance). Instead, it seems like undefined is being used.

This may be related to #71

Thoughts? I'm going to start digging into the gpb code now to see what I can find. Thanks!

@lukebakken
Copy link
Contributor Author

lukebakken commented Aug 31, 2016

I just found issue #21 as well. I'm going to see exactly what epb does in this case.

@lukebakken
Copy link
Contributor Author

lukebakken commented Aug 31, 2016

epb also uses undefined for optional fields that do not have a default= option and are unset. But, if the field has a default= option, that is used if the data is not present during decoding.

Here is a test I added that demonstrates this:

https://github.com/basho/riak_pb/blob/develop/test/encoding_test.erl#L87-L102

Running that test with erlang_protobuffs succeeds, while include_context is undefined when using gpb.

I will see if I can add the epb behavior to the epb_compatibility option.

@tomas-abrahamsson
Copy link
Owner

Default values is a bit of a weak spot with gpb. The handling is buggy and inconsistent. It currently works like this:

if you compile with records (ie not with the maps option) then the default values gets written as record default values, like -record(m,{f=true}). This of course breaks with optional

But if one compiles with the maps option, there are no default values at all.

When generating code for decoding, gpb has some strategies, for performance. One being to pass around a record, and update it each time a field has been decoded. When this strategy is chosen, the initial record is created as #m{}, ie any default gets inserted, and if a field is optional and not included in the binary-to-be-decoded, it just means that field is never updated, and the default value will make it as a decoded value. This is wrong and buggy most of the time, and should be fixed. In part, this, too, might be what this issue is about.

But there's another strategy for how the decoding code looks like, that passes around each field as a variable instead of the record, that handling initiates any optional field to undefined regardless. This is one point where it is inconsistent.

Which of these two strategies to use depends on some judgement done in the analysis phase.

Regardless of strategy, when decoding is generated for maps, default values are never included, making it inconsistent in one more aspect.

One goal, generally for gpb, should of course be that is consistent. What this might mean for epb_compatibility could depend a little on whether it would be clean or messy to support the exact behaviour of epb in this regard.

I think that it is wrong to insert default values when the record is defined. (definitely wrong in case of optional fields anyway) And I think that the #m{} used in some cases in the decoding code, is wrong: it should initialize any optional field to undefined.

Another goal should be that there is some room for sensible extensibility in the direction of what is requested in the #21.

I think the #71 is unrelated since this is still on proto2 syntax, not proto3.

At the moment, there are some other things, it'll take a week at the earliest before I could take a closer look.

@lukebakken
Copy link
Contributor Author

After spending time reading the proto2 and proto3 specs, I can see why both optional and default were dropped in proto3. What a mess.

For epb_compatibility, I will have to ensure that gpb uses a field's default value when decoding if no value is sent on the wire. Currently, undefined is used.

Other changes to be in-line with the proto2 spec aren't as important to me at this point.

I think that it is wrong to insert default values when the record is defined. (definitely wrong in case of optional fields anyway)

But, that is what the spec requires. If no value is sent on the wire, either the field's default value is used (if defined), or the type-specific default should be used. This required languages to implement the has_foo workaround to indicate the presence of data like you noted. So, for strict proto2 implementation, this is what I think gpb should do. I suspect it would be such a big change in behavior it would have to be an option ... and may not be worth it as I don't see it to be a frequently-requested feature.

@tomas-abrahamsson
Copy link
Owner

Ah, yes, it was a long time for me too, but perhaps it boils down to designing something that:

  1. honours defaults and type-defaults
  2. feels natural to Erlang
  3. provides has_foo functionality

The gpb more or less attempts to do 2 and 3 by using records or maps for messages, and by setting omitted optional fields to undefined (or omitting them entirely, in the case of maps), but gpb fails to do 1 in a good way.

Is it really possible to do all of 1, 2 and 3? 1 and 3 can be achieved for instance by adding a bitmap to the record, the bitmap describing which optional fields are actually present. But it would require using api functions for setting fields so that the bitmap gets updated. One could not merely set the record field using the normal record field setting syntax. so it would fail on 2 (natural mapping to Erlang).

I get the impression that epb focuses on 1 and 2, but fails on 3. (the focus on 1 for defaults, but not type-defaults, if i understand it correctly?)

If we could come up with something that feels natural to Erlang, I'd be willing to give it a go, but until now, I haven't been able to come up with anything good. Ideas welcome of course!

For epb compatibility, I suppose fields of type message cannot have default values? If that is false, then substantial changes would be required. On the other hand, if that is true, ie that only scalar fields can have defaults, then I think it would be sufficient to add something in the gpb_compile:msg_decoder_initial_params so that it assigns default (or type-default) field values, given that the right option is set.

Being somewhat wiser from earlier experiences, I think the best for option designing would be to (a) control this by some new option for instance defaults_for_omitted_optionals and/or type_defaults_for_omitted_optionals, and (b) let the epb_compatibility expand to this (or those) new option(s) too. It already expands to some other options, see gpb_compile:norm_opt_epb_compat_opt In other words, not add more meaning to the epb_compatibility option except via expanded-to options. This gives people who wants only one or the other thing possibility to mix and match. Of course, as always, if there are better option names, it is most welcome.

@tomas-abrahamsson
Copy link
Owner

I think I need to make myself more clear: If we can find some way that feels natural to Erlang, to do both 1 and 3, then I'd welcome it, and be willing to spend time. Until then, I'd rather prioritize 3 (ie being able to see omitted fields) but I could accept an option, or perhaps two options, for shifting priority to defaults or type defaults, I can see this being useful. However, if message type fields can have defaults, then substantial re-writes would be necessary, such that I think options for shifting priority would be much more work that it is worth.

@lukebakken
Copy link
Contributor Author

Thanks for the detailed thoughts. The spec isn't clear, but I'm pretty sure that for message fields the default value is language-specific, so for languages that support null or null pointers, that is the default. gpb should continue to use undefined.All indications are that default values just apply to scalar fields.

@tomas-abrahamsson
Copy link
Owner

All indications are that default values just apply to scalar fields.

Yes, I agree to this after having read some protobuf source code and also tried it experimentally:

$ protoc  --cpp_out `pwd`  t.proto
t.proto:4:34: Messages can't have default values.

So is the below a correct recap of our discussion so far?

  • gpb should normally just ignore default values. There are some issues I'll fix here
  • but when in epb_compatibility mode, we should change it:
    • when decoding a binary, for optional fields that are not present in that binary:
      • if the field has a specified default, it should get that value
      • if no default is specified, it should get undefined, not a type default
        (is this correct how epb works? I think this so after having read the epb-generated code)
    • the record definitions in the hrl files should have default values
    • we should add some option for controlling how defaults (and perhaps also type-defaults) are handled, and epb_compatibility should expand also to that option

@lukebakken
Copy link
Contributor Author

@tomas-abrahamsson - that is my understanding as well. In epb, the type-specific default is not used, just undefined, in the case where no default is specified.

Your other bullet points are spot on too.

I spent some time trying to get a handle on everything that happens in msg_decoder_initial_params, and may be able to return to it next week. At the very least, I'll be able to review any changes you make 😄

Have a great weekend.

@tomas-abrahamsson
Copy link
Owner

Thanks! The same (retroactively :) )

I just pushed a branch (with a very long name) if you want to try it out: optionally-decode-omitted-optionals-to-defaults

@lukebakken
Copy link
Contributor Author

Thank you @tomas-abrahamsson I'll be trying it out today.

@tomas-abrahamsson
Copy link
Owner

Just pushed 3.26.0, containing the options defaults_for_omitted_optionals and type_defaults_for_omitted_optionals. The epb_compatibility option now also expands to defaults_for_omitted_optionals (but not type_defaults_for_omitted_optionals). Closing.

@tomas-abrahamsson
Copy link
Owner

(I'll remove the for-test-and-review temporary branches optionally-decode-omitted-optionals-to-defaults and new-optionally-decode-omitted-optionals-to-defaults)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants