Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(vrl): add encode_key_value function #7751

Merged
merged 16 commits into from Jun 22, 2021

Conversation

prognant
Copy link
Contributor

@prognant prognant commented Jun 4, 2021

It reuses encode_logfmt and allows custom field & k/v delimiters.

encode_logfmt in now an alias with preset delimiters.

It can be use to construct datadog log tags list:

tags = parse_key_value!(.ddtags, field_delimiter: ",", key_value_delimiter: ":")
merge(tags,{"region": "ap-3", "via:vector" })
.ddtags = encode_key_value!(tags, key_value_delimiter: ":", field_delimiter: ",")

@prognant prognant marked this pull request as ready for review June 8, 2021 11:42
@prognant prognant requested review from a team and pablosichert and removed request for a team June 8, 2021 11:42
Copy link
Contributor

@spencergilbert spencergilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment on the documentation side of things, otherwise looks good to me

docs/reference/remap/functions/add_datadog_tags.cue Outdated Show resolved Hide resolved
@spencergilbert
Copy link
Contributor

I wonder if that failed test is flakey 🤔

@prognant
Copy link
Contributor Author

prognant commented Jun 9, 2021

I wonder if that failed test is flakey 🤔

I think so as well, I believe that's not the first time it fails with no apparent reason or no obvious link to the code changed by the PR

@JeanMertz JeanMertz self-requested a review June 10, 2021 14:40
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few initial comments.

There are two "higher-level" thoughts that I have:

  1. Do we want to specifically tie this to Datadog, or is this pattern (tags using foo:bar,baz:qux) generic enough that we can give it a more generic name?

  2. If we do want to tie it to Datadog specifically, should we name it datadog_add_tags and datadog_remove_tags, similar to how we name all provider-specific parsing functions parse_<provider>_... (such as parse_aws_...)?

docs/reference/remap/functions/add_datadog_tags.cue Outdated Show resolved Hide resolved
docs/reference/remap/functions/remove_datadog_tags.cue Outdated Show resolved Hide resolved
docs/reference/remap/functions/remove_datadog_tags.cue Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/add_datadog_tags.rs Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/add_datadog_tags.rs Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/add_datadog_tags.rs Outdated Show resolved Hide resolved
@prognant
Copy link
Contributor Author

prognant commented Jun 11, 2021

I left a few initial comments.

There are two "higher-level" thoughts that I have:

  1. Do we want to specifically tie this to Datadog, or is this pattern (tags using foo:bar,baz:qux) generic enough that we can give it a more generic name?
  2. If we do want to tie it to Datadog specifically, should we name it datadog_add_tags and datadog_remove_tags, similar to how we name all provider-specific parsing functions parse_<provider>_... (such as parse_aws_...)?

I think you made a point about being generic, we could assume that a tag list is generic enough (add key/value delimiter and and tags delimiter as an optional field) and instead have function named taglist_remove(...) and taglist_add(...) WDYT ?

@spencergilbert
Copy link
Contributor

I think you made a point about being generic, we could assume that a tag list is generic enough (add key/value delimiter and and tags delimiter as an optional field) and instead have function named taglist_remove(...) and taglist_add(...) WDYT ?

Just noting that it seems similar to append with more structure

@JeanMertz
Copy link
Contributor

JeanMertz commented Jun 11, 2021

Just noting that it seems similar to append with more structure

There's also parse_key_value which specifically allows specifying key_value_delimiter and field_delimiter, which seems even more closely related to these functions.

It appears this is identical to using add_datadog_tags and remove_datadog_tags:

$ tags = "foo:bar,baz:qux"
"foo:bar,baz:qux"

$ parsed_tags = parse_key_value!(tags, field_delimiter: ",", key_value_delimiter: ":")
{ "baz": "qux", "foo": "bar" }

$ del(parsed_tags.baz)
"qux"

$ parsed_tags.quux = "quuz"
"quuz"

$ parsed_tags
{ "foo": "bar", "quux": "quuz" }

Now, the thing that seems to be missing in VRL right now is "how to go from a k/v object to a k/v string?" We have convert functions for this. It seems to me what we really want in this case is to_key_value (or, to_key_value_string, to be more precise), which would behave something like this:

$ .ddtags = to_key_value(parsed_tags, field_delimiter: ",", key_value_delimiter: ":")
"foo:bar,quux:quuz"

Having said all this, I'm not against providing functions that make the above list of operations more convenient to perform, I'm just not entirely convinced the {add,remove}_datadog_tags functions meet that threshold (and if they do, I'm similarly not convinced they need to be Datadog-specific).

I'm happy to defer to others such as @jszwedko and @binarylogic to make that call.

@prognant
Copy link
Contributor Author

prognant commented Jun 14, 2021

I was unaware of the parse_key_value function, that's being say I'm now against my own PR...

Now, the thing that seems to be missing in VRL right now is "how to go from a k/v object to a k/v string?" We have convert functions for this. It seems to me what we really want in this case is to_key_value (or, to_key_value_string, to be more precise), which would behave something like this:

$ .ddtags = to_key_value(parsed_tags, field_delimiter: ",", key_value_delimiter: ":")
"foo:bar,quux:quuz"

Bouncing on that encode_logfmt would then be a specific case of to_key_value, with field_delimiter: " " and key_value_delimiter: "=", what I can think of right now would be to add a generic encode_key_value (to follow the existing function naming convention) that would do the same as your to_key_value suggestion, use the existing encode_logfmt code with slight changes and rewrite encode_logfmt as a simple wrapper. cc @JeanMertz

edit: I reworked the PR accordingly

@prognant prognant force-pushed the prognant/vrl-add-ddog-tag branch 2 times, most recently from 9264e6c to bfbf21f Compare June 15, 2021 13:46
@prognant prognant changed the title enhancement(remap): Datadog tag manipulation helper enhancement(remap): add encode_key_value function Jun 15, 2021
@jszwedko
Copy link
Member

One difference I'm noticing with parse_key_value is that it doesn't handle keys with no values. It seems like we support non-key:value tags in datadog? For example:

$ parse_key_value!("foo:bar,baz,host:localhost", field_delimiter: ",", key_value_delimiter: ":")
{ "baz,host": "localhost", "foo": "bar" }

We could modify parse_key_value to return something like:

{ "host": "localhost", "foo": "bar", "baz": null }

If I remember correctly, we also allow duplicate tags in Datadog which would throw another wrinkle into this. For example, what would foo:bar,foo:baz parse as?

I agree that these functions probably don't necessitate something Datadog specific but it also seems like parse_key_value/encode_key_value don't necessarily cover all of the cases.

That being said, I think this PR to add encode_key_value is fine to go in as it is a value-add regardless.

@prognant
Copy link
Contributor Author

prognant commented Jun 16, 2021

We could modify parse_key_value to return something like:

{ "host": "localhost", "foo": "bar", "baz": null }

It seems a suggestion to handle standalone key already exists out there https://pkg.go.dev/github.com/kr/logfmt?utm_source=godoc

Standalone key would end up being deserialised along with a true boolean for the matching value (instead of the null you suggested). I think it makes sense to use a boolean, the parse_key_value function would have no compatibility issue, the encode_key_value could have a extra settings to enable/disable standalone key serialisation for boolean.

As supporting standalone key (either as true or null for the value) seems doable without breaking any existing behaviour and also cover corner cases for datadog tags manipulation.

Edit: the latest commit support standalone key and should not break anything (maybe the verify parser also needs to ignore quoted string) cc @jszwedko

@prognant prognant force-pushed the prognant/vrl-add-ddog-tag branch 2 times, most recently from b57cf6b to ec7e590 Compare June 17, 2021 10:53
@prognant
Copy link
Contributor Author

prognant commented Jun 17, 2021

I split the PR in 2 different PR :

@prognant prognant changed the title enhancement(remap): add encode_key_value function enhancement(remap): add encode_key_value function Jun 17, 2021
@prognant
Copy link
Contributor Author

If I remember correctly, we also allow duplicate tags in Datadog which would throw another wrinkle into this. For example, what would foo:bar,foo:baz parse as?

I just checked, it works without any error, however the former is dropped silently and the latter is returned, a way to support same key tags would be to extend array processing function, a new function not really tied to datadog tag manipulation per se is available in #7908 could help doing some tag filtering just using join / split as simple parser/encoder.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @prognant ! This looks good to me for adding the encode_key_value. I agree we could add an option to output keys with true values as just the key as a follow-up. This might be required for the datadog tags work?

Could we add this new function to the benches?

Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! Almost there.

Comment on lines -252 to -253
want: Ok("agent.id=1234 agent.name=vector event=log log.file.path=encode_logfmt.rs network.ip.0=127 network.ip.1=0 network.ip.2=0 network.ip.3=1 network.proto=tcp"),
tdef: TypeDef::new().bytes().fallible(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to keep these (logfmt specific) tests, or were they moved to the new function?

Copy link
Contributor Author

@prognant prognant Jun 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the same as parse_logfmt/parse_key_value, i.e. no tests in the encode_logfmt (Existing ones were relocated in encode_key_value). I find it consistent, and as default separators match the logfmt format it is not illogical.

lib/vrl/stdlib/src/encode_key_value.rs Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/encode_key_value.rs Outdated Show resolved Hide resolved
}

fn type_def(&self, _state: &state::Compiler) -> TypeDef {
TypeDef::new().bytes().fallible()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we can make this function infallible if the fields_ordering argument is omitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so as well, I'll give it a try.

@prognant
Copy link
Contributor Author

Nice, thanks @prognant ! This looks good to me for adding the encode_key_value. I agree we could add an option to output keys with true values as just the key as a follow-up. This might be required for the datadog tags work?

Could we add this new function to the benches?

Indeed, I was planning to work on that after getting that one merged, I think it makes sense to have encode_key_value and parse_key_value as symmetric as possible, and it would prevent datadog tags consisting in standalone key to be modified when edited using those function.

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@prognant prognant changed the title enhancement(remap): add encode_key_value function enhancement(vrl): add encode_key_value function Jun 21, 2021
@prognant prognant enabled auto-merge (squash) June 21, 2021 12:07
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor remarks, but otherwise this LGTM

docs/reference/remap/functions/encode_logfmt.cue Outdated Show resolved Hide resolved
lib/vrl/stdlib/src/encode_key_value.rs Outdated Show resolved Hide resolved
prognant and others added 16 commits June 22, 2021 09:43
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Co-authored-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Signed-off-by: prognant <pierre.rognant@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants