Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Meta: protobuf optional and required fields #379

Closed
matejcik opened this issue Aug 2, 2019 · 10 comments · Fixed by #1266
Closed

Meta: protobuf optional and required fields #379

matejcik opened this issue Aug 2, 2019 · 10 comments · Fixed by #1266
Assignees
Labels
code Code improvements protobuf Structure of messages exchanged between Trezor and the host
Milestone

Comments

@matejcik
Copy link
Contributor

matejcik commented Aug 2, 2019

i'm getting confused, so i'm going to start a meta-issue to document the state of our knowledge

Related Issues

#14, #16, #352

Current Situation

Our code uses protobuf v2, where the following is true:

  • Each non-repeated field can either be present or missing from the wire format
  • Producer can choose whether to send a field or not
  • Consumer can know whether the message contained the field or not
  • We make use of this in many places; e.g., ApplySettings.use_passphrase can be true, false, or unset. If unset, the setting is not touched. Otherwise it is set to the appropriate value.
  • At the same time, a lot of code is spent on validation, just search for "is None"

Problem Statement

I'm not sure ;)

One concrete problem is that mypy static typing is not powerful enough to test our error checking. When all message fields are optional, the type of that field is set to Optional[some type], and every function that uses it would need to contain something like assert field is not None, because the check does not carry across function call boundaries. Actually doing this would crazily complicate our code.

A related concrete problem is the already crazy amount of code we spend on input validation -- and we have no good way to ensure that the coverage is exhaustive.

Possible Solutions

Prefill default values for primitive types

When parsing the message, if a field is missing, substitute a base value for that type: "" for strings, 0 for integer-likes, etc. This solves a huge part of validation: type annotations for these fields are no longer Optional and we can statically check their use.

Nested messages are still None, so still optional, but that is desirable. And there's much less of them, so the remaining amount of explicit validation is manageable.

This is essentially what proto3 suggests, see also #16.

The drawback: we can no longer check whether the field was sent.
In places where this is required, we would have to add explicit flags to the protobuf. For instance, in ApplySettings, we would need a new field for every option, to tell if we should set that option, or keep the current setting. This would be backwards-incompatible in most places, and maybe even require rework of some parts of the logic.

IMHO this is a no-go.

Only prefill explicit default values

We could annotate fields with [default=x] with x the desired default value. We can then modify the parser so that if a field is missing on wire, the default is used.

At the moment, some places have a default annotation, but it is only advisory, see also #352.

This has the advantage that we can pick and choose which fields must be optional and which must not. The code that checks if a field is present can continue to work unchanged, AND we get strong checking for the other places.

Of course, this would involve a full review of all protobuf messages and their field usage.

Make use of required fields

We could start using required field type, and reject messages that don't contain the required fields. See also #14.

This seems complementary to the explicit default values option. The usage semantic of required is essentially "you must always provide a value", versus default's "if you don't like the default value of <x>, you can provide a different one."

Note however that required uint32 field and optional uint32 field [default=99] are effectively identical in terms of "defensiveness". A possible attacker can always choose to either send the value explicitly (to avoid required check) or not send it (to trigger default).

The drawback is that this introduces possible backwards-compatibility problems: once a consumer has a required field, we can never again not send it.
But functinally speaking, that is true for all fields. Many of our consumers rely on data in specific fields. We can leave them out of the wire format, but the data will be missing on the other side anyway. This might even be an argument in favor: marking a field required will make such consumer fail early if we ever make a change to that field.

Conclusion

Given the above problem statement, proto2 is more appropriate for our usecase, and we should decide if we want to start using defaults and required fields.

It is possible to enable both features in pb2py now and change the definitions gradually.

Did I miss some related problems? Esp. in terms of what proto3 brings us?

@matejcik matejcik added protobuf Structure of messages exchanged between Trezor and the host meta labels Aug 2, 2019
@tsusanka
Copy link
Contributor

tsusanka commented Aug 5, 2019

Great write-up! As related to protobuf v3 upgrade I think we should check:

  • if v2 will be supported in the near future (it seems so)
  • if v3 has some performance advatages (it seems it does not)

@matejcik
Copy link
Contributor Author

matejcik commented Aug 5, 2019

Re (1): tooling for proto2 is not going away anytime soon, and the only upstream tool we need is protoc -- which we can freeze at an old version if necessary.
Also we can simulate both required and default with custom field options, making the definition valid in proto3.
All encoding and decoding is done by custom code in the firmware. The only problem could arise in Connect/wallet.

Re (2): the wire format for proto3 is pretty much identical.
There is some space saving because proto3 does not send fields that have the default value, and there are "packed arrays" that are more efficient than the normal ones. I expect the saving to be negligible for us.
I expect speed to be identical too.

@prusnak
Copy link
Member

prusnak commented Aug 5, 2019

How does the following plan sound?

  1. let's stick to proto2
  2. revisit all the fields and use required when the field is really required
  3. revisit all the fields, remove the default values and make sure we handle the missing value correctly in the code (in another word, the application code, not the parser, adds the default value) + also consider whether the default value fields shouldn't become required fields

@tsusanka
Copy link
Contributor

tsusanka commented Aug 5, 2019

revisit all the fields, remove the default values and make sure we handle the missing value correctly in the code (in another word, the application code, not the parser, adds the default value) + also consider whether the default value fields shouldn't become required fields

What is the benefit of having the default values in application code instead of the parser? It seems to me that the default values should be as close to the messages defintion as possible.

@prusnak
Copy link
Member

prusnak commented Aug 5, 2019

It seems to me that the default values should be as close to the messages defintion as possible.

I agree with this more or less.

What is the benefit of having the default values in application code instead of the parser?

I am afraid of shitty Protobuf implementations which do not properly encode default values. Having default values on the decoding side would be nicer IMHO. But no strong opinion.

@tsusanka
Copy link
Contributor

tsusanka commented Aug 6, 2019

I am afraid of shitty Protobuf implementations which do not properly encode default values. Having default values on the decoding side would be nicer IMHO. But no strong opinion.

That's a good point.

@matejcik
Copy link
Contributor Author

matejcik commented Aug 6, 2019

I am afraid of shitty Protobuf implementations which do not properly encode default values. Having default values on the decoding side would be nicer IMHO. But no strong opinion.

But we will have the defaults on the decoding side. We will use the default option to generate code that sets the default if it's not sent over the wire -- which is the behavior we want. Shitty host-side implementations can be free to fix their own code.

@tsusanka
Copy link
Contributor

Does this need QA?

@matejcik
Copy link
Contributor Author

QA is part of #1266

@tsusanka
Copy link
Contributor

Test requested via Asana.

@tsusanka tsusanka modified the milestones: 2020-10, 2020-11 Oct 1, 2020
@tsusanka tsusanka modified the milestones: 2020-11, 2020-12 Nov 6, 2020
@tsusanka tsusanka modified the milestones: 2020-12, 2021-01 Dec 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Code improvements protobuf Structure of messages exchanged between Trezor and the host
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants