-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(stdlib): Add proto_encode and parse_proto to encode and parse protocol buffers #739
Conversation
parse_proto accepts a bytes value, a proto descriptor file path and a message type and returns the VRL value as parsed from the proto. `parse_proto!(base64_decode("data"), "person.desc", "proto.Person")` `=> {"id": 42, "name": "John Doe"}` For `encode_proto`, it does the reverse: `encode_proto!({"id": 42, "name": "John Doe"}, "person.desc", "proto.Person")` `=> bytes data` Note that the path to descriptor file and message type are required to be literals. We do this so that during compilation we load the message descriptor and reuse it for encoding or parsing. Most of the code was copied over from the protobuf codec support from vector. However, there's one notable difference. In `parse_proto`, we don't set fields that are not set. This ensures that encode_proto and parse_proto remain symmetric (decoding the result of parsing will return the original result).
src/stdlib/parse_proto.rs
Outdated
use std::path::Path; | ||
use std::path::PathBuf; | ||
|
||
fn proto_to_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas on reducing code duplication here? Could we pull the shared bit into VRL? We want changes to proto codecs to be in sync with these new functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first instinct is to provide the following three functions:
fn get_message_descriptor(descriptor_set_path: &Path, message_type: &str,
fn encode_proto(descriptor: &MessageDescriptor, value: Value)
fn parse_proto(descriptor: &MessageDescriptor, value: Value)
as public functions in src/protobuf/ and expose that through a public module as vrl::protobuf
. Then we can just use them in the vector crate.
Let me know if that makes sense and I can do the changes. Open to other suggestions since I am not super familiar with the code yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This plan sounds reasonable to me. This will require some effort:
- add the protobuf utils to VRL
- wait for a VRL release
- bump VRL version in Vector
- finally replace duplicate Vector code with
vrl::protobuf
utils
We can start with the first step and if you want, we can maintain a dev branch on Vector for the last step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a good opportunity to cleanup the tests/data
a bit.
I am not sure if all of those separate files are required. Worth taking a look if some of them can be merged. Not sure if we could avoid duplication across repos, worth a quick consideration but not a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @flavioc for refactoring, this is starting to look great!
As a next step, just to be sure that the exposed utils are good to go, can you follow up with a draft PR in Vector where:
- Change the VRL version to point to this commit
vrl = { git = "https://github.com/vectordotdev/vrl", rev = "9b25e9954c794f54a6e0e1c8706d79718b72983c" ...
- Delete duplicate code and use the utils you exposed in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! So, this is how the vector side will look like: vectordotdev/vector@8538929
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this commit that part of a PR? Interested to see the CI checks passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I created a draft PR: vectordotdev/vector#20074
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 if those checks I think we can go ahead with this PR
5fb0fd2
to
9b25e99
Compare
There are some failing checks. They all look minor. This needs a changelog entry and some clippy fixes. |
Fixed those (I think) and also the licenses file. |
There are some test failures. You can repo locally with |
Thanks @flavioc! Let's follow up on the Vector PR after a new VRL version is released. |
I forgot to mention, currently the documentation lives in the Vector repo. Every time we add new VRL function we also followup with a doc PR e.g. vectordotdev/vector#20048. Can you please create one when have some time? 🙏 |
For sure, I will follow up with that. Thanks for the review! |
* docs(vrl): Add documentation for parse_proto and encode_proto (vectordotdev/vrl#739) * Add spellcheck exception Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Add ignore pattern for base64 Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Update pattern Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Fix pattern Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * another try Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * cue formatting Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * update tests Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Ignore desc files in syntax check Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> --------- Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
…dotdev#20139) * docs(vrl): Add documentation for parse_proto and encode_proto (vectordotdev/vrl#739) * Add spellcheck exception Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Add ignore pattern for base64 Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Update pattern Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Fix pattern Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * another try Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * cue formatting Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * update tests Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> * Ignore desc files in syntax check Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> --------- Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com> Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
parse_proto accepts a bytes value, a proto descriptor file path and a message type and returns the VRL value as parsed from the proto.
parse_proto!(base64_decode("data"), "person.desc", "proto.Person")
=> {"id": 42, "name": "John Doe"}
For
encode_proto
, it does the reverse:encode_proto!({"id": 42, "name": "John Doe"}, "person.desc", "proto.Person")
=> bytes data
Note that the path to descriptor file and message type are required to be literals. We do this so that during compilation we load the message descriptor and reuse it for encoding or parsing.
Most of the code was copied over from the proto codec support from vector. However, there's one notable difference. In
parse_proto
, we don't set fields that are not set. This ensures that encode_proto and parse_proto remain symmetric (decoding the result of parsing will return the original result).