Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib): Add proto_encode and parse_proto to encode and parse protocol buffers #739

Merged
merged 6 commits into from
Mar 13, 2024

Conversation

flavioc
Copy link
Contributor

@flavioc flavioc commented Mar 11, 2024

parse_proto accepts a bytes value, a proto descriptor file path and a message type and returns the VRL value as parsed from the proto.

parse_proto!(base64_decode("data"), "person.desc", "proto.Person") => {"id": 42, "name": "John Doe"}

For encode_proto, it does the reverse:

encode_proto!({"id": 42, "name": "John Doe"}, "person.desc", "proto.Person") => bytes data

Note that the path to descriptor file and message type are required to be literals. We do this so that during compilation we load the message descriptor and reuse it for encoding or parsing.

Most of the code was copied over from the proto codec support from vector. However, there's one notable difference. In parse_proto, we don't set fields that are not set. This ensures that encode_proto and parse_proto remain symmetric (decoding the result of parsing will return the original result).

parse_proto accepts a bytes value, a proto descriptor file path and a message type
and returns the VRL value as parsed from the proto.

`parse_proto!(base64_decode("data"), "person.desc", "proto.Person")`
`=> {"id": 42, "name": "John Doe"}`

For `encode_proto`, it does the reverse:

`encode_proto!({"id": 42, "name": "John Doe"}, "person.desc", "proto.Person")`
`=> bytes data`

Note that the path to descriptor file and message type are required to be literals. We do this so
that during compilation we load the message descriptor and reuse it for encoding or parsing.

Most of the code was copied over from the protobuf codec support from vector. However, there's one
notable difference. In `parse_proto`, we don't set fields that are not set. This ensures that
encode_proto and parse_proto remain symmetric (decoding the result of parsing will return the
original result).
@jszwedko jszwedko requested a review from pront March 11, 2024 14:06
use std::path::Path;
use std::path::PathBuf;

fn proto_to_value(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas on reducing code duplication here? Could we pull the shared bit into VRL? We want changes to proto codecs to be in sync with these new functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first instinct is to provide the following three functions:

  • fn get_message_descriptor(descriptor_set_path: &Path, message_type: &str,
  • fn encode_proto(descriptor: &MessageDescriptor, value: Value)
  • fn parse_proto(descriptor: &MessageDescriptor, value: Value)

as public functions in src/protobuf/ and expose that through a public module as vrl::protobuf. Then we can just use them in the vector crate.

Let me know if that makes sense and I can do the changes. Open to other suggestions since I am not super familiar with the code yet.

Copy link
Contributor

@pront pront Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plan sounds reasonable to me. This will require some effort:

  1. add the protobuf utils to VRL
  2. wait for a VRL release
  3. bump VRL version in Vector
  4. finally replace duplicate Vector code with vrl::protobuf utils

We can start with the first step and if you want, we can maintain a dev branch on Vector for the last step.

Copy link
Contributor

@pront pront Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a good opportunity to cleanup the tests/data a bit.

I am not sure if all of those separate files are required. Worth taking a look if some of them can be merged. Not sure if we could avoid duplication across repos, worth a quick consideration but not a blocker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @flavioc for refactoring, this is starting to look great!

As a next step, just to be sure that the exposed utils are good to go, can you follow up with a draft PR in Vector where:

  • Change the VRL version to point to this commit vrl = { git = "https://github.com/vectordotdev/vrl", rev = "9b25e9954c794f54a6e0e1c8706d79718b72983c" ...
  • Delete duplicate code and use the utils you exposed in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! So, this is how the vector side will look like: vectordotdev/vector@8538929

Copy link
Contributor

@pront pront Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commit that part of a PR? Interested to see the CI checks passing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I created a draft PR: vectordotdev/vector#20074

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 if those checks I think we can go ahead with this PR

@flavioc flavioc requested a review from pront March 11, 2024 21:05
@pront
Copy link
Contributor

pront commented Mar 12, 2024

There are some failing checks. They all look minor. This needs a changelog entry and some clippy fixes.

@flavioc flavioc changed the title Add proto_encode and parse_proto to encode and parse protocol buffers feat(stdlib): Add proto_encode and parse_proto to encode and parse protocol buffers Mar 12, 2024
@flavioc
Copy link
Contributor Author

flavioc commented Mar 12, 2024

There are some failing checks. They all look minor. This needs a changelog entry and some clippy fixes.

Fixed those (I think) and also the licenses file.

@pront
Copy link
Contributor

pront commented Mar 12, 2024

There are some test failures. You can repo locally with ./scripts/checks.sh vrl_tests tests.

@pront pront enabled auto-merge March 13, 2024 15:04
@pront
Copy link
Contributor

pront commented Mar 13, 2024

Thanks @flavioc! Let's follow up on the Vector PR after a new VRL version is released.

@pront pront added this pull request to the merge queue Mar 13, 2024
@pront
Copy link
Contributor

pront commented Mar 13, 2024

I forgot to mention, currently the documentation lives in the Vector repo. Every time we add new VRL function we also followup with a doc PR e.g. vectordotdev/vector#20048. Can you please create one when have some time? 🙏

Merged via the queue into vectordotdev:main with commit e5988a5 Mar 13, 2024
11 checks passed
@flavioc
Copy link
Contributor Author

flavioc commented Mar 13, 2024

I forgot to mention, currently the documentation lives in the Vector repo. Every time we add new VRL function we also followup with a doc PR e.g. vectordotdev/vector#20048. Can you please create one when have some time? 🙏

For sure, I will follow up with that. Thanks for the review!

github-merge-queue bot pushed a commit to vectordotdev/vector that referenced this pull request Mar 25, 2024
* docs(vrl): Add documentation for parse_proto and encode_proto (vectordotdev/vrl#739)

* Add spellcheck exception

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Add ignore pattern for base64

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Update pattern

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Fix pattern

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* another try

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* cue formatting

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* update tests

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Ignore desc files in syntax check

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
AndrooTheChen pushed a commit to discord/vector that referenced this pull request Sep 23, 2024
…dotdev#20139)

* docs(vrl): Add documentation for parse_proto and encode_proto (vectordotdev/vrl#739)

* Add spellcheck exception

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Add ignore pattern for base64

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Update pattern

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Fix pattern

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* another try

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* cue formatting

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* update tests

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Ignore desc files in syntax check

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants