feat(stdlib): add sieve string function #724
Adds a `sieve` string function that removes unwanted characters from a string using a list of allowed characters (or a regex of allowed patterns). Fixes: vectordotdev#704
```rust
    required: false,
},
Parameter {
    keyword: "replace_repeated",
```
❔ Can you explain a bit what this `replace_repeated` argument does? I think we could come up with a better name.
Yeah, it is a bit confusing.
Based on the original issue (#704), it provides a way to set a separate replacement string for when multiple consecutive characters are removed from the original string.
You can also see how it behaves in the tests and examples. The `all_options` example probably illustrates it best:
```rust
all_options {
    args: func_args![
        value: value!("test123%456.فوائد.net."),
        permitted_characters: regex::Regex::new("[a-z.0-9]").unwrap(),
        replace_single: "X",
        replace_repeated: "<REMOVED>"
    ],
    want: Ok(value!("test123X456.<REMOVED>.net.")),
    tdef: TypeDef::bytes().infallible(),
}
```
In this example, the single disallowed `%` is replaced with `X`, while the run of Arabic-script characters is replaced as a whole with `<REMOVED>`.
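To make the `replace_single` / `replace_repeated` behavior concrete, here is a minimal std-only sketch. It assumes the allowed set can be expressed as a predicate (standing in for the `[a-z.0-9]` regex) and that any run of two or more rejected characters counts as "repeated". `sieve_with` and `flush` are hypothetical names, not the VRL implementation.

```rust
/// Sketch of the sieve semantics (hypothetical helper, not the VRL code):
/// a run of exactly one rejected char becomes `replace_single`,
/// a run of two or more rejected chars becomes `replace_repeated`.
fn sieve_with<F: Fn(char) -> bool>(
    value: &str,
    is_allowed: F,
    replace_single: &str,
    replace_repeated: &str,
) -> String {
    let mut result = String::with_capacity(value.len());
    // Number of consecutive rejected characters seen so far.
    let mut missed_length = 0usize;
    for c in value.chars() {
        if is_allowed(c) {
            // Emit the replacement for the pending rejected run, if any.
            flush(&mut result, missed_length, replace_single, replace_repeated);
            missed_length = 0;
            result.push(c);
        } else {
            missed_length += 1;
        }
    }
    // A rejected run may also end the string.
    flush(&mut result, missed_length, replace_single, replace_repeated);
    result
}

fn flush(out: &mut String, missed: usize, single: &str, repeated: &str) {
    match missed {
        0 => {}
        1 => out.push_str(single),
        _ => out.push_str(repeated),
    }
}

fn main() {
    // Predicate mirroring the character class `[a-z.0-9]` from `all_options`.
    let allowed = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '.';
    let out = sieve_with("test123%456.فوائد.net.", allowed, "X", "<REMOVED>");
    println!("{out}");
}
```

Running this reproduces the `want` value of the `all_options` test, `test123X456.<REMOVED>.net.`. Counting the run length in chars rather than bytes matters here, since each Arabic character is multiple bytes in UTF-8.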
`src/stdlib/sieve.rs` (outdated)
```rust
let mut result = String::with_capacity(value.len());
let mut missed_length = 0;
for char in value.chars() {
    if characters.contains(&char) {
```
Is there any crate that uses a more efficient algorithm for string matching?
I will check it out and let you know.
Btw, this is an optional step; I was just curious. If there's nothing mature available, we can just leave a comment noting that this could be optimized.
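One possible optimization, assuming `characters` in the excerpt above is a slice or `Vec<char>` (so `contains` is a linear scan per input character): precompute a constant-time membership structure once. The sketch below uses a 128-entry table for ASCII with a `HashSet` fallback for other characters; `AllowedChars` is a hypothetical type, not part of VRL.

```rust
use std::collections::HashSet;

/// Hypothetical allowed-character set with O(1) lookups:
/// a flat table for ASCII, a HashSet for everything else.
struct AllowedChars {
    ascii: [bool; 128],
    other: HashSet<char>,
}

impl AllowedChars {
    fn new(chars: &[char]) -> Self {
        let mut ascii = [false; 128];
        let mut other = HashSet::new();
        for &c in chars {
            if c.is_ascii() {
                // ASCII code points are 0..=127, so this index is in bounds.
                ascii[c as usize] = true;
            } else {
                other.insert(c);
            }
        }
        AllowedChars { ascii, other }
    }

    fn contains(&self, c: char) -> bool {
        if c.is_ascii() {
            self.ascii[c as usize]
        } else {
            self.other.contains(&c)
        }
    }
}

fn main() {
    let allowed = AllowedChars::new(&['a', 'b', '.', 'ü']);
    // Keep only allowed characters (no replacement, for brevity).
    let kept: String = "ab?c.ü!".chars().filter(|&c| allowed.contains(c)).collect();
    println!("{kept}");
}
```

This prints `ab.ü`. The table avoids hashing on the common ASCII path; whether it beats a plain `HashSet<char>` in practice would need benchmarking.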
Sorry, I forgot to respond. I was thinking it might make sense to remove the option to pass a `string` and accept only a `regex`. With a regex, at least, no manual checks are performed; I just iterate through the returned matches.
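The match-iteration approach can be sketched without manual per-character checks: copy each allowed match and replace every gap between consecutive matches. To stay std-only, this sketch takes the match byte ranges as input (with the `regex` crate they would come from each match's `start()` and `end()`); it assumes the ranges are sorted, non-overlapping, and on char boundaries. `sieve_from_matches` and `push_gap` are hypothetical names.

```rust
use std::ops::Range;

/// Copy each allowed match and replace each gap between matches:
/// a one-char gap becomes `single`, a longer gap becomes `repeated`.
/// Assumes `matches` is sorted, non-overlapping, and on char boundaries.
fn sieve_from_matches(value: &str, matches: &[Range<usize>], single: &str, repeated: &str) -> String {
    let mut out = String::with_capacity(value.len());
    let mut last_end = 0;
    for m in matches {
        push_gap(&mut out, &value[last_end..m.start], single, repeated);
        out.push_str(&value[m.clone()]);
        last_end = m.end;
    }
    // Rejected characters after the final match.
    push_gap(&mut out, &value[last_end..], single, repeated);
    out
}

fn push_gap(out: &mut String, gap: &str, single: &str, repeated: &str) {
    // Only need to know whether the gap has 0, 1, or 2+ chars.
    let mut chars = gap.chars();
    match (chars.next(), chars.next()) {
        (None, _) => {}
        (Some(_), None) => out.push_str(single),
        _ => out.push_str(repeated),
    }
}

fn main() {
    let value = "ab%%c!d";
    // Stand-in for regex matches of `[a-z]`: one range per allowed char.
    let matches: Vec<Range<usize>> = value
        .char_indices()
        .filter(|(_, c)| c.is_ascii_lowercase())
        .map(|(i, c)| i..i + c.len_utf8())
        .collect();
    println!("{}", sieve_from_matches(value, &matches, "X", "<R>"));
}
```

This prints `ab<R>cXd`. A pattern that matches longer allowed runs (e.g. `[a-z]+`) yields fewer ranges but the same output, because only the gaps are rewritten.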
Sounds reasonable to me
Thank you for this contribution!
* docs(vrl): add documentation for `sieve` function

  Related: vectordotdev/vrl#724

* Fix typo in sieve docs

  Co-authored-by: jhgilbert <j.h.gilbert@gmail.com>

* Update function docs after removing string pattern option
* cue fmt

  Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

* Fix example

  Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>

---------

Signed-off-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>
Co-authored-by: jhgilbert <j.h.gilbert@gmail.com>
Co-authored-by: Jesse Szwedko <jesse.szwedko@datadoghq.com>