Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correctly handle defaults for proto2 and proto3 #146

Merged
merged 24 commits into from
Jun 14, 2022

Conversation

tomasz-szypenbejl-td
Copy link

This is another attempt to add the default values feature, based on @Yurzel's work. It is still not perfect, but my company is willing to spend (reasonably) more time on this, provided we receive some feedback/guidance on how to make it better (@td-krzysiek, thanks for letting me work on it).

This pull request contains:

  • The original work of @Yurzel from Default values #134 rebased on the current main branch,
  • A few changes in src/descriptor.ts and src/plugin.ts necessary for npx bazel run //src:diff_compiler_update to work. Those changes also made most of the tests pass. The problem was that src/descriptor.ts code was accessing object fields like 'oneof_index' and 'default_value' using the new getters (that are providing default values now), and thus testing against null (and similar operations) did not work as expected. The bad thing is I fixed the problem by using the ugly pb.Message.getField(fieldDescriptor, someNumber) calls - perhaps it is possible to figure out a better approach.
  • A few additional changes, so that getters do NOT provide defaults for proto3 optional fields and oneof fields (unless the oneof fields have defaults given explicitly in a proto2 input file). The rationale behind this decision is that a user of the library would probably like to know whether a proto3 optional is present or not, and would probably also like to know which of the oneof fields is present in the message. My company does not really use oneof and proto3 optional fields, so we are willing to make further changes to that behaviour if necessary.
  • Changes in the serialization code so that fields having default values are skipped during serialization. The good thing is the generated serialization code no longer uses the ugly getField() calls, and it produces smaller output. The bad thing is serialization no longer works according to the spec for proto2 messages (as I understand, in proto2 it is OK to skip undefined/null fields, not the fields having default values). I think the problem could be fixed by resorting to the ugly getField() calls in serialization code generated for proto2 messages.

It appears that the main source of the still unsolved problems is that those ugly getField() calls have to be made to obtain actual value of a field without the default. Perhaps we should consider making a change in the classes generated for protobuf messages, so that they allow user to get field value both with and without default, e.g.:

// with defaults:
const fieldValue1 = message.field_value1;
const fieldValue2 = message.field_value2;
// without defaults:
const optFieldValue1 = message.no_defaults.field_value1;
const optFieldValue2 = message.no_defaults.field_value2;

Then we could easily made the code in src/descriptor.ts pretty again, and easily restore the correct serialization behavior for proto2 messages - just a thought.

Yurzel and others added 22 commits June 9, 2022 15:46
…x bazel run //src:diff_compiler_update' repeatedly
…run it multiple times and it produces the same output to src/compiler/{descriptor,plugin}.ts every time
…wo lines when rebasing Yurzel's work -- this is now fixed
…optional in a proto3 file, then do not use the default value for that field; also do not use default values for one_of fields, unless the defaults are explicitly provided in the .proto file
…ty fields of type bytes should also be skipped during serialization
@Yurzel
Copy link

Yurzel commented Jun 9, 2022

Good job 👍 Just a few things.

  • A few changes in src/descriptor.ts and src/plugin.ts necessary for npx bazel run //src:diff_compiler_update to work. Those changes also made most of the tests pass. The problem was that src/descriptor.ts code was accessing object fields like 'oneof_index' and 'default_value' using the new getters (that are providing default values now), and thus testing against null (and similar operations) did not work as expected. The bad thing is I fixed the problem by using the ugly pb.Message.getField(fieldDescriptor, someNumber) calls - perhaps it is possible to figure out a better approach.

Thanks. This was bugging me alot. Didn't realize this was an issue. 😓

  • A few additional changes, so that getters do NOT provide defaults for proto3 optional fields and oneof fields (unless the oneof fields have defaults given explicitly in a proto2 input file). The rationale behind this decision is that a user of the library would probably like to know whether a proto3 optional is present or not, and would probably also like to know which of the oneof fields is present in the message. My company does not really use oneof and proto3 optional fields, so we are willing to make further changes to that behaviour if necessary.

Actually all of my work has been inspired by protoc javascript output. From the observation you can see that almost all the fields use getFieldWithDefault and there are extra has/clear functions, so you get to know whenever the field has been set or not. I implemented it in #136. The branch suffers the same issue like #134. (#136 is actually based on #134)

I even tackled the serialization of default values there.

presence fields info:
https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md

@tomasz-szypenbejl-td
Copy link
Author

tomasz-szypenbejl-td commented Jun 10, 2022

I wish I had taken a better look at all the pull requests. I clearly should have started with #136 instead of #134. @Yurzel, thanks for telling me about the other pull request.
I guess I could now revert my latest commits (the ones that changed the handling of proto3 optionals and oneoff fields, and also the ones that changed the serialization). Then if I rebase the commits from #136 that I do not yet have in my branch on top of it, everything should be fine - just a few more changes will be needed to take advantage of the extra has fields in src/descriptor.ts and src/field.ts (to get rid of the ugly getField() calls). Then perhaps the code will need some small adjustments to work fine with the new json_names option.
@thesayyn, would the code be good enough to merge if I make the changes described above?

@thesayyn
Copy link
Owner

@tomasz-szypenbejl-td let's land this and you follow up with the next one? that'd make it easier for me to review

@thesayyn
Copy link
Owner

I don't want to be a speed bump to you since you are putting a lot of effort into this.

@tomasz-szypenbejl-td
Copy link
Author

@thesayyn If you prefer to review this pull request first, and then have another pull request containing the changes from #136 - that seems perfectly OK to me. But if during the review you decide that there are some changes that are simply not acceptable and could be easily fixed by including the additional commits from #136 (those ugly getField() calls in src/descriptor.ts and src/field.ts are potentially one example of such changes), please don't hesitate to change your mind.

src/compiler/plugin.ts Outdated Show resolved Hide resolved
src/descriptor.ts Show resolved Hide resolved
src/field.ts Show resolved Hide resolved
test/conformance/map/proto/map_checked.ts Outdated Show resolved Hide resolved
@thesayyn
Copy link
Owner

@tomasz-szypenbejl-td only two concerns withstands now. Let's address them and land this

@tomasz-szypenbejl-td
Copy link
Author

@tomasz-szypenbejl-td only two concerns withstands now. Let's address them and land this

Agreed. It may take me a few days to make the necessary changes - please be patient :)

* make toObject() work just the same as before introducing defaults,
* revert test changes that were made due to changed toObject() behaviour,
* no more defaults for required proto2 fields without [default=],
* tests for the defaults feature modified accordingly
@tomasz-szypenbejl-td
Copy link
Author

I think I got it right this time (but I do not mind making a few more corrections should that be necessary).

The generated toObject() method will now work exactly the same as before (as if the defaults did not exist). People who want toObject() to return defaults can borrow a helper method toObjectWithDefaults from default.spec.ts.

Please note that eventually I did not touch the getField() call in src/field.ts because oneof_index really is marked as optional in the proto file and thus has an (implicit) default value of 0.

There is a test case that verifies that in proto2 normal property access (without the getField() call) returns null for required properties that do not have a [default=]. The name of the test case is: "should not return defaults for required fields without [default=] (v2)" in default.spec.ts.

Copy link
Owner

@thesayyn thesayyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@thesayyn thesayyn changed the title Default values - 2nd attempt fix: correctly handle defaults for proto2 and proto3 Jun 14, 2022
@thesayyn
Copy link
Owner

The generated toObject() method will now work exactly the same as before (as if the defaults did not exist). People who want toObject() to return defaults can borrow a helper method toObjectWithDefaults from default.spec.ts.

BTW, the current behavior toObject is to return defaults since it reads the fields from their getters. if getters return default values so will toObject.

@thesayyn thesayyn merged commit f5adbf2 into thesayyn:main Jun 14, 2022
@tomasz-szypenbejl-td
Copy link
Author

BTW, the current behavior toObject is to return defaults since it reads the fields from their getters. if getters return default values so will toObject.

Oh, I guess that may be a misunderstanding on my part. I thought you wanted the observable behaviour of the .toObject() method to stay exactly the same as before (i.e. before introducing defaults), and I modified the code generating the .toObject() methods so that the objects returned by .toObject() are really exactly the same as before (as if the defaults support has not been added, in other words: the returned objects have just the fields that were set, without defaults).

Nothing to worry about - I will just change the code generating .toObject() methods again in the second pull request (where presence/absence fields will be introduced) so that it returns default values. I just need a few days to prepare the 2nd pull request - please be patient :)

Thank you for approving and merging this pull request.

@alfonmga
Copy link

alfonmga commented Jun 28, 2022

Amazing work – @thesayyn would you be able to do a release including this PR as soon as you have some mental bandwidth for it? This issue hit me in the face today. Cheers mate.

@thesayyn
Copy link
Owner

@alfonmga We'll cut a new release as soon as we land #147

@Santalov
Copy link

The generated toObject() method will now work exactly the same as before (as if the defaults did not exist). People who want toObject() to return defaults can borrow a helper method toObjectWithDefaults from default.spec.ts.

BTW, the current behavior toObject is to return defaults since it reads the fields from their getters. if getters return default values so will toObject.

I have discovered, that toObject doesn't return default for the Map, but the corresponding getter does. Is this a bug or this is by design?

@tomasz-szypenbejl-td
Copy link
Author

I have discovered, that toObject doesn't return default for the Map, but the corresponding getter does. Is this a bug or this is by design?

I think this must be a bug I introduced. The final code in this pull request was not really returning defaults in .toObject() output. Later, when I finally changed the .toObject() behavior in #147 I probably just completely forgot about maps. The original pull request #136 certainly included tests that expected map fields to have default values in the objects returned by .toObject().
So yes, I believe it is a bug.

@Santalov
Copy link

I have discovered, that toObject doesn't return default for the Map, but the corresponding getter does. Is this a bug or this is by design?

I think this must be a bug I introduced. The final code in this pull request was not really returning defaults in .toObject() output. Later, when I finally changed the .toObject() behavior in #147 I probably just completely forgot about maps. The original pull request #136 certainly included tests that expected map fields to have default values in the objects returned by .toObject(). So yes, I believe it is a bug.

Thank you for the explanation, I'd like to fix it while working on #154 .

@Santalov
Copy link

Santalov commented Jul 25, 2022

@tomasz-szypenbejl-td
Oh, can you please clarify, why isRequiredWithoutExplicitDefault function is introduced. As far as i am concerned, it is used for determing, whether the field requires a value to be set. So when the field is required without explicit default, we can omit it in the fromObject method and the Message.getField function is used in the getter method.
But according to the proto2 language docs, required fields are always present in the serialized message and are non-nullable arguments of the contructor and fromObject method. There is no reference to the default values, they only matter when used with optional fields.

Can I remove this function and only refer to isOptional as follows?

  • isOptional is true => the field is nullable in the constructor and fromObject method.
  • isOptional is false => the field is non-nullable in the constructor and fromObject method.
  • isOptional && isMessage => the field is nullable in the toObject return type, other fields are non-nullable (they always have value).

This change does not affect the generation of has_ fields

@tomasz-szypenbejl-td
Copy link
Author

You probably mean the isRequiredWithoutExplicitDefault function.
As far as I remember, I introduced that function so that in proto2 messages the getters of the required fields return undefined (instead of some default value) when a field is not present and does not have an explicitly specified default value in the proto2 file. At that point (when that function was introduced) it seemed important to have getters behave exactly in that way. Now that we have presence fields, I am not sure if it is still so important.
You can go ahead and try to remove that function (and adjust the rest of the code and tests accordingly), but please keep in mind that:

  • Currently the code protoc-gen-ts generates allows you to create an object representing a message without passing any arguments to the constructor (even when the message has required fields). I would say that such a message is initially invalid and becomes valid only after you assign some values to the required fields.
  • protoc-gen-ts does not throw an exception when deserializing a message that does not have some required fields (neither does Google's javascript implementation). There probably should some way to determine if an object (representing a deserialized message) really has the required values (OR perhaps an exception should be thrown when deserializing a message missing required fields).
  • The mindset that default values only matter for optional fields is not exactly correct. Specifying default values is allowed for required proto2 fields as explained in https://stackoverflow.com/questions/42554286/why-protobuf-both-required-optional-field-accepts-default-value-i-expect-only (I personally think this is insane, but what can we do).

All I am trying to say is that the change you would like to make is quite big, requires a lot of work and might be a breaking change for some users of the library. Whether such a change will be accepted & merged if of course up to @thesayyn.

@Santalov
Copy link

Santalov commented Jul 25, 2022

You probably mean the isRequiredWithoutExplicitOptional function.

Yeah, sorry, I mean isRequiredWithoutExplicitDefault. Corrected the message.

  • Currently the code protoc-gen-ts generates allows you to create an object representing a message without passing any arguments to the constructor (even when the message has required fields). I would say that such a message is initially invalid and becomes valid only after you assign some values to the required fields.

At first I wanted to write that there is a counterexample in the tests, but noticed that you can pass no object into the constructor. This invalidates the required behaviour, so it is the same as optional, isn't it? I am ok to keep the current behaviour in place, maybe future pull requests will change it.

BTW, in the message mentioned above signatures of the constructor and fromObject methods are different. I'd like to bring them into line.

  • protoc-gen-ts does not throw an exception when deserializing a message that does not have some required fields (neither does Google's javascript implementation). There probably should some way to determine if an object (representing a deserialized message) really has the required values (OR perhaps an exception should be thrown when deserializing a message missing required fields).

Oh, thank you for clarifying this. Now there is a reason for isRequiredWithoutExplicitDefault.

Thanks a lot for the reference!

All I am trying to say is that the change you would like to make is quite big, requires a lot of work and might be a breaking change for some users of the library. Whether such a change will be accepted & merged if of course up to @thesayyn.

Yeah, i think my old proposal can introduce more bugs. It needs to be corrected, so:

  • data parameter of the constructor remains nullable.
  • data parameter of the fromObject method becomes nullable.
  • isOptional is true => the field is nullable in the constructor and fromObject method, when data is passed.
  • isOptional is false => the field is non-nullable in the constructor and fromObject method, when data is passed.
  • isOptional && isMessage || isRequiredWithoutExplicitDefault => the field is nullable in the toObject return type, other fields are non-nullable (they always have value).
  • And of course fromObject, constructor and toObject must be in conformance with getters and setters. You should be unable to set null value to a required (!isOptional) field, cos it doesn't make sense. Now you also cannot do it via constructor explicitly.

This way protoc-gen-ts will still allow the creation of invalid messages and ready to handle invalid messages with typescript's strictNullCheck enabled. The return type of toObject method will represent actual values returned. Getters and setters types will also represent actual values.

Constructor already uses only isOptional predicate, code here, but other places are out of sync.

@Santalov
Copy link

Again, @tomasz-szypenbejl-td,
Thank you so much for the clarification!

@tomasz-szypenbejl-td
Copy link
Author

@Santalov I think it all makes sense. I only have doubts about:

  • data parameter of the fromObject method becomes nullable.

I think that if a method is called fromObject nobody should be surprised that they are required to provide some object as argument. However, I can also see that making the argument nullable would make that method more similar to the constructor, so I don't have strong feelings here.

I recently noticed that currently repeated fields are handled rather inconsistently too:

const m1 = new DefaultMessageV2WithoutDefault();
// compiles fine

const m2 = new DefaultMessageV2WithoutDefault({});
// does not compile - required fields missing

const m3 = new DefaultMessageV2WithoutDefault({ array_int32: [] });
// does not compile - required fields still missing

const m4 = DefaultMessageV2WithoutDefault.fromObject({});
// does not compile - required field missing

const m5 = DefaultMessageV2WithoutDefault.fromObject({ array_int32: [] });
// compiles fine - apparently it only cares about array_int32 and ignores the required fields here

const m6 = DefaultMessageV2WithoutDefault.fromObject({ array_int32: [], array_message: [] });
// compiles fine - but it does not really care about array_message - as seen in the example above

I think this inconsistency (in handling of arrays of messages vs arrays of other types in fromObject input) is probably a result of handling message fields specially regardless of whether the field is repeated - it should be easy to fix.

Again, @tomasz-szypenbejl-td, Thank you so much for the clarification!

No problem at all. Thank you for your work on making this library better.

@Santalov
Copy link

Santalov commented Jul 26, 2022

I think that if a method is called fromObject nobody should be surprised that they are required to provide some object as argument. However, I can also see that making the argument nullable would make that method more similar to the constructor, so I don't have strong feelings here.

Yeah, that makes sense. I have already implemented that feature, but the decision is after @thesayyn

I recently noticed that currently repeated fields are handled rather inconsistently too:

I'll fix this inconsistency in the next pr.
Do you think repeated and map fields should be non-nullable in constructor and fromObject method? Both proto2 and proto3 don't track presence for such fields. https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis

Personally, I'd like to make them nullable.

@tomasz-szypenbejl-td
Copy link
Author

Do you think repeated and map fields should be non-nullable in constructor and fromObject method? Both proto2 and proto3 don't track presence for such fields. https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis

Personally, I'd like to make them nullable.

I would prefer them to be nullable too. Somebody could argue that making them non-nullable and forcing user to always provide at least an empty array (or map) to fromObject (or constructor) would be a more consistent approach. Perhaps that is true, but as a user I would definitely prefer to be able to completely skip those fields when creating a new message if I just want them to be empty.

Of course, even if user skips the array/map fields during message creation, the toObject method called on that message should still return an object having empty arrays/maps in those field. And so should field getters too (at least in my opinion).
Such a behavior of message classes generated by protoc-gen-ts could then be simply described as "Array and map fields are empty by default, so if you don't provide non-empty values for them during message creation, they will be empty."
That sounds intuitive enough to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants