redact a pattern, but replace with a hash instead of a string #633

Closed
tmccombs opened this issue Jan 6, 2024 · 7 comments · Fixed by #640
Labels
type: feature A value-adding code addition that introduce new functionality. vrl: stdlib Changes to the standard library

Comments

@tmccombs
Contributor

tmccombs commented Jan 6, 2024

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

I want to redact certain personally identifiable information, such as an email address, from logs, but in such a way that if I already know the information (email address), I can search the logs for matching log entries.

Attempted Solutions

I think it would be possible to do this with a Lua transform, but the documentation for that transform says to create an issue if the remap transform doesn't meet my needs.

Lua also doesn't have a built-in sha256 function, so I would need to either use a pure-Lua sha256 implementation, which I suspect would be slow, or pull in a shared library that adds such a function to Lua (is that possible with Vector?).

Proposal

I can think of a few ways this could be done:

  • Add an option to the redact function that replaces the matched string with a hash of itself (possibly with a configurable hash function) instead of with a fixed string.
  • Add a new function that replaces each regex match with its hash.
  • Add a function like replace that, instead of taking a string for the replacement, takes a closure that computes the replacement value.

I think that the last bullet point is the most general and could be useful in other cases as well.
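As a rough illustration of the closure-based option, here is a minimal sketch in Python rather than VRL. The names replace_with, sha256_hex, and EMAIL are hypothetical and exist only for this example, and the email pattern is deliberately simplistic. It also shows the use case above: someone who already knows the address can hash it and search the redacted logs for the digest.

```python
import hashlib
import re


def replace_with(value, pattern, fn):
    """Hypothetical helper: replace each regex match with fn(matched_text)."""
    return re.sub(pattern, lambda m: fn(m.group(0)), value)


def sha256_hex(text):
    """Return the hex-encoded SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Simplistic email pattern, for illustration only (not a full validator).
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

line = "login failed for alice@example.com from 10.0.0.1"
redacted = replace_with(line, EMAIL, sha256_hex)

# Searching: hash the known address and look for the digest in the log line.
needle = sha256_hex("alice@example.com")
found = needle in redacted
```

Because the replacement is an arbitrary callable, the same helper could mask, truncate, or tokenize matches; hashing is just one choice of closure.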

P.S. Having a built-in redaction pattern for email addresses would be awesome.

References

No response

Version

No response

@tmccombs tmccombs added the type: feature A value-adding code addition that introduce new functionality. label Jan 6, 2024
@jszwedko jszwedko transferred this issue from vectordotdev/vector Jan 9, 2024
@jszwedko
Member

jszwedko commented Jan 9, 2024

Thanks for opening this, @tmccombs! This is something we considered when first implementing the redact function but sadly never circled back to add. It definitely makes sense (and we'd support a contribution here if motivated).

I also moved this issue to the VRL repo since the change would happen there.

@jszwedko jszwedko added the vrl: stdlib Changes to the standard library label Jan 9, 2024
@tmccombs
Contributor Author

tmccombs commented Jan 9, 2024

Oh, haha, I already created a separate issue. I'll close this as a duplicate. Unless you think it would be worth having a function dedicated to hashing a pattern, in addition to or instead of a function to replace with a closure.

@tmccombs tmccombs closed this as completed Jan 9, 2024
@jszwedko
Member

jszwedko commented Jan 9, 2024

Apologies @tmccombs, the issue you linked is actually the same as this one 😄 I just moved it to this repo. I'll reopen it.

@jszwedko jszwedko reopened this Jan 9, 2024
@tmccombs
Contributor Author

tmccombs commented Jan 9, 2024

Oops, I got the link wrong. I meant #628.

@jszwedko
Member

jszwedko commented Jan 9, 2024

Ah, I see. I think we can have both. This one can represent adding the ability to easily replace with a hash. Even if we add support for replacing with the result of a closure, I think having hashing be a first-class feature would be useful for discoverability and for the improved performance of a native implementation.

@tmccombs
Contributor Author

Sounds good to me. I'd be happy to make a PR for that as well, now that I am a little familiar with the codebase. Do you think it would be better to have a separate function (redact_hash maybe?) or to add an optional argument to the existing redact function?

@jszwedko
Member

That'd be great! Luckily the function implementations are relatively self-contained. Here is where redact is implemented: https://github.com/vectordotdev/vrl/blob/main/src/stdlib/redact.rs

I think adding additional configuration options to the redact function makes sense. For example, it could be called like:

redact("my id is 123456", filters: [r'\d+'], redactor: "sha256")

to replace 123456 with 8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92. There was some discussion about this here and the original issue.
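To make the intended semantics concrete, here is a minimal Python sketch of a redact-style function with a named redactor. It is an illustration under assumptions, not the VRL implementation: the REDACTORS table, the "full" name, and the "[REDACTED]" placeholder text are hypothetical stand-ins, and only the "sha256" name and the example call come from the comment above.

```python
import hashlib
import re

# Hypothetical redactor table. "full" stands in for the fixed-placeholder
# behavior; the exact placeholder text used by VRL may differ.
REDACTORS = {
    "full": lambda text: "[REDACTED]",
    "sha256": lambda text: hashlib.sha256(text.encode("utf-8")).hexdigest(),
}


def redact(value, filters, redactor="full"):
    """Replace every match of any filter pattern using the named redactor."""
    if not filters:
        return value
    fn = REDACTORS[redactor]
    # Combine the filters into one alternation so the text is scanned once
    # and already-redacted output is never matched again by a later filter.
    combined = re.compile("|".join(f"(?:{p})" for p in filters))
    return combined.sub(lambda m: fn(m.group(0)), value)


result = redact("my id is 123456", filters=[r"\d+"], redactor="sha256")
print(result)
```

The single-pass alternation is a design choice of this sketch: applying filters one after another could let a digit pattern re-match digits inside a digest produced by an earlier filter.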

tmccombs added a commit to tmccombs/vrl that referenced this issue Jan 12, 2024
This option allows you to specify either a custom string, or a hash function to use as the redactor,
in addition to the current behavior of the fixed string "[Redacted]".

Fixes vectordotdev#633
tmccombs added a commit to tmccombs/vrl that referenced this issue Jan 31, 2024
This option allows you to specify either a custom string, or a hash function to use as the redactor,
in addition to the current behavior of the fixed string "[Redacted]".

Fixes vectordotdev#633
tmccombs added a commit to tmccombs/vrl that referenced this issue Feb 6, 2024
This option allows you to specify either a custom string, or a hash function to use as the redactor,
in addition to the current behavior of the fixed string "[Redacted]".

Fixes vectordotdev#633
github-merge-queue bot pushed a commit that referenced this issue Feb 6, 2024
* enhancement(redact): Add redactor option

This option allows you to specify either a custom string, or a hash function to use as the redactor,
in addition to the current behavior of the fixed string "[Redacted]".

Fixes #633

* enhancement(redact): Add encoding argument

To specify how to encode hash values.

* test(redact): Add unit tests for redactors
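
The encoding argument mentioned in the commit message controls how the raw digest bytes are rendered as text. A minimal Python sketch of that idea follows; the function name encode_digest and the encoding names "hex" and "base64" are assumptions made for illustration, not taken from the merged change.

```python
import base64
import hashlib


def encode_digest(text, encoding="hex"):
    """Hash text with SHA-256 and render the digest in the chosen encoding."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    if encoding == "hex":
        return digest.hex()
    if encoding == "base64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError(f"unsupported encoding: {encoding}")


hex_form = encode_digest("123456", "hex")
b64_form = encode_digest("123456", "base64")
```

Both strings represent the same 32 bytes; base64 is shorter (44 characters versus 64), which can matter when many values are redacted per log line.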
@pront pront closed this as completed in #640 Feb 6, 2024