
regex replace using a closure #628

Closed
tmccombs opened this issue Jan 7, 2024 · 2 comments · Fixed by #636

Comments

@tmccombs
Contributor

tmccombs commented Jan 7, 2024

I filed #633, but now realize that this might have been a better place to report it.

For convenience I will copy over my text from there:

Use Cases

I want to redact certain personally identifiable information, such as an email address, from logs, but in such a way that if I already know the information (email address), I can search the logs for matching log entries.

Attempted Solutions

I think it would be possible to do this with a Lua transform, but its documentation says to create an issue if the remap transform doesn't meet my needs.

Lua also doesn't have a built-in SHA-256 function, so I would either need to use a Lua-native SHA-256 implementation, which I suspect would be slow, or pull in a shared library that adds such a function to Lua (is that possible with Vector?).
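The searchable-redaction requirement amounts to replacing each sensitive value with a deterministic token: hashing an email address you already know produces the same token again, so you can search the logs for it. A minimal std-only Rust sketch of that idea follows. The names `redaction_token` and `log_mentions` and the `redacted:` prefix are hypothetical, and `DefaultHasher` is only a stand-in for SHA-256 so the example needs no external crates; it is not cryptographically secure, and its algorithm may change between Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: maps a sensitive value to a stable, opaque token.
/// DefaultHasher (SipHash with fixed keys) is a std-only stand-in for SHA-256;
/// it is NOT cryptographically secure, so real redaction needs a proper hash.
fn redaction_token(secret: &str) -> String {
    let mut hasher = DefaultHasher::new();
    secret.hash(&mut hasher);
    // 64-bit hash rendered as 16 lowercase hex digits.
    format!("redacted:{:016x}", hasher.finish())
}

/// Hypothetical helper: does `log_line` contain the token for a value we
/// already know? The known value is hashed again and searched for.
fn log_mentions(log_line: &str, known_secret: &str) -> bool {
    log_line.contains(&redaction_token(known_secret))
}
```

Because the token depends only on the input, the original address never appears in the logs, yet anyone who already knows it can recompute the token and find matching entries.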

Proposal

I can think of a few ways this could be done:

  • Add an option to the redact function that replaces the matched string with a hash of itself (possibly with a configurable hash function) instead of with a fixed string.
  • Add a new function that replaces each match of a regex with its hash.
  • Make a function like replace that, instead of taking a replacement string, uses a closure to compute the replacement value.
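The closure-based option can be sketched as a generic function that calls a closure for every match and splices in whatever it returns. This is a hypothetical, std-only illustration: `replace_with` here is not VRL's API, and literal substring matching stands in for a regex.

```rust
/// Hypothetical sketch of "replace, but with a closure": every non-overlapping
/// occurrence of `pattern` in `input` is replaced by whatever `f` returns for
/// that occurrence. Literal matching stands in for a regex (std-only).
fn replace_with<F>(input: &str, pattern: &str, mut f: F) -> String
where
    F: FnMut(&str) -> String,
{
    // An empty pattern would match between every character; treat it as a no-op.
    if pattern.is_empty() {
        return input.to_string();
    }
    let mut out = String::with_capacity(input.len());
    let mut last = 0;
    for (start, matched) in input.match_indices(pattern) {
        out.push_str(&input[last..start]); // text between matches is kept as-is
        out.push_str(&f(matched)); // the match itself is computed by the closure
        last = start + matched.len();
    }
    out.push_str(&input[last..]);
    out
}
```

A fixed replacement string is just the special case `|_| replacement.to_string()`, and "replace the match with its hash" is just another closure, which is why this form covers the other two options.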

I think that the last bullet point is the most general and could be useful in other cases as well.

P.S. Having a built-in redaction pattern for email addresses would be awesome.

@tmccombs
Contributor Author

tmccombs commented Jan 7, 2024

I don't currently have time to put together a full PR, but here is a rough draft of a partial implementation of a replace function that uses a closure:

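// Assumes the VRL stdlib prelude (Value, Context, Resolved, closure, Kind, ...).
use crate::compiler::prelude::*;
use regex::Captures;

fn replace_with<T>(value: Value, pattern: Value, ctx: &mut Context, runner: closure::Runner<T>) -> Resolved {
    let value = value.try_bytes_utf8_lossy()?;
    // TODO: support count?
    match pattern {
        Value::Regex(regex) => {
            // Index of the current match, passed to the closure.
            let mut i: usize = 0;
            // The `replace_all` callback cannot return a `Result`, so remember
            // the first error here and report it after the replacement.
            let mut failure: Option<ExpressionError> = None;

            let replaced = regex.replace_all(&value, |captures: &Captures| {
                let captures_value = captures_to_value(captures);
                let result = runner
                    .run_index_value(ctx, i, captures_value)
                    // Own the string: the lossy `Cow` borrows from the closure's
                    // return value, which is dropped at the end of this call.
                    .and_then(|s| Ok(s.try_bytes_utf8_lossy()?.into_owned()));
                i += 1;
                match result {
                    Ok(v) => v,
                    Err(e) => {
                        failure.get_or_insert(e);
                        String::new()
                    }
                }
            });
            if let Some(err) = failure {
                Err(err)
            } else {
                // `replaced` is a `Cow<str>` borrowing `value`; make it owned.
                Ok(replaced.into_owned().into())
            }
        }
        // TODO: should we also support Value::Bytes?
        other => Err(ValueError::Expected {
            got: other.kind(),
            expected: Kind::regex(),
        }
        .into()),
    }
}

fn captures_to_value(captures: &Captures) -> Value {
    if captures.len() == 1 {
        // This cannot panic: group 0 (the whole match) is always present.
        captures[0].into()
    } else {
        // Return an array of the capture groups; groups that did not
        // participate in the match become null.
        captures
            .iter()
            .map(|m| m.map_or(Value::Null, |m| m.as_str().into()))
            .collect::<Vec<Value>>()
            .into()
    }
}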
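Because the callback passed to `replace_all` cannot return a `Result`, the draft stores the first error in `failure` and reports it only after every match has been processed. The same pattern, reduced to a hypothetical std-only function (`try_replace_with` is an illustrative name, and literal matching replaces the regex), looks roughly like this:

```rust
/// Hypothetical sketch of the draft's error handling: the replacement callback
/// cannot return a `Result` to its caller, so the first error is stashed in an
/// `Option` and reported only after every match has been visited.
fn try_replace_with<F, E>(input: &str, pattern: &str, mut f: F) -> Result<String, E>
where
    F: FnMut(usize, &str) -> Result<String, E>,
{
    let mut failure: Option<E> = None;
    let mut out = String::with_capacity(input.len());
    let mut last = 0;
    // `i` counts matches, like the index the draft passes to the closure runner.
    for (i, (start, matched)) in input.match_indices(pattern).enumerate() {
        out.push_str(&input[last..start]);
        match f(i, matched) {
            Ok(replacement) => out.push_str(&replacement),
            // Keep only the first error; later matches are still processed,
            // mirroring a callback that has no way to stop the iteration.
            Err(e) => {
                if failure.is_none() {
                    failure = Some(e);
                }
            }
        }
        last = start + matched.len();
    }
    out.push_str(&input[last..]);
    match failure {
        Some(err) => Err(err),
        None => Ok(out),
    }
}
```

Reporting the first error rather than the last keeps the message tied to the earliest failing match, which is usually the easiest one to diagnose.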

@pront
Collaborator

pront commented Jan 8, 2024

This is a neat idea. If you are interested in driving this to completion, your draft is heading in the right direction; it just needs some boilerplate code, unit tests, and a VRL test. Otherwise, we will add this to the backlog and prioritize it accordingly.

tmccombs added a commit to tmccombs/vrl that referenced this issue Jan 9, 2024
This is similar to `replace`, but takes a closure to compute the replacement from the match and
capture groups, instead of taking a replacement string.

Fixes: vectordotdev#628
tmccombs added a commit to tmccombs/vrl that referenced this issue Jan 17, 2024
This is similar to `replace`, but takes a closure to compute the replacement from the match and
capture groups, instead of taking a replacement string.

Fixes: vectordotdev#628
github-merge-queue bot pushed a commit that referenced this issue Jan 24, 2024
* feat(stdlib): Add replace_with function

This is similar to `replace`, but takes a closure to compute the replacement from the match and
capture groups, instead of taking a replacement string.

Fixes: #628

* Pull request feedback

* enhancement(replace_with): Pass object instead of array to closure

This allows us to expose the named capture groups with names.

* Add named capture groups directly to capture object
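The last two commits switch the closure argument from an array to an object so that named capture groups can be looked up by name. A hypothetical std-only sketch of building such an object from a single match follows; `captures_to_object` is an illustrative name, and the key layout (every group under its index, named groups also under their name) is an assumption for illustration, not the schema that was merged.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch: turn one regex match into an object keyed by group.
/// `names[i]` is the name of group `i` (`None` if unnamed) and `groups[i]` is
/// the text it captured (`None` if the group did not take part in the match).
/// Every group is stored under its index; named groups are also stored under
/// their name. The layout is illustrative, not VRL's actual schema.
fn captures_to_object(
    names: &[Option<&str>],
    groups: &[Option<&str>],
) -> BTreeMap<String, Option<String>> {
    let mut object = BTreeMap::new();
    for (i, group) in groups.iter().copied().enumerate() {
        let text = group.map(str::to_string);
        // Named groups get an extra entry under their name.
        if let Some(name) = names.get(i).copied().flatten() {
            object.insert(name.to_string(), text.clone());
        }
        object.insert(i.to_string(), text);
    }
    object
}
```

For a pattern like `(?P<user>\w+)@(\w+)` matched against `alice@example`, the closure could then read the user part as `user` instead of remembering that it is group 1.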

2 participants