Improve false negative ratio by detecting keys with hyphens #49

domanchi · 2018-07-03T16:48:31Z

Certain API keys use hyphens.

e.g. blahblah-aaaa-bbbb-cccc-ddddddd

Keys like this are not currently caught by the suite of HighEntropyStringPlugins.

KevinHock · 2018-07-03T17:36:37Z

There are also some AKIAblahblah keys whose entropy is not high enough to be caught, but I suppose that should be a different issue :)

mohit-surana · 2018-10-24T01:10:29Z

I wanted to work on this issue. On doing a little bit of digging, I came up with two proposed solutions and would really appreciate any comments regarding the same:

1. Can we expect the client to include a hyphen within the charset? If yes, then I believe we just need to use re.escape(charset) instead of charset on this line.
2. If clients should continue to use the existing charset, we need to enrich either just the regex, or both the regex and the charset (internally). Unless we append the hyphen to the charset in the constructor, the entropy calculation will not take hyphens into account. So should hyphens be included in the entropy calculation?
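To illustrate why the first option calls for re.escape, here is a minimal sketch. The helper `build_pattern` and the quoted-string regex are assumptions made for illustration, not the project's actual code. Inside a character class an unescaped hyphen can be read as a range operator, so a charset containing one may fail to compile or match unintended characters.

```python
import re


def build_pattern(charset):
    # Hypothetical helper: match a quoted string made up only of charset characters.
    # re.escape neutralises '-', ']', '^' and '\' inside the character class.
    return re.compile(r'["\']([' + re.escape(charset) + r']+)["\']')


CHARSET = "abcdef-0123456789"  # hex digits plus a hyphen in the middle

# Without escaping, "f-0" is parsed as a (reversed) range and compilation fails.
try:
    re.compile("[" + CHARSET + "]+")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

match = build_pattern(CHARSET).search('key = "dead-beef-1234"')
print(unescaped_ok, match.group(1))
```

Note that this only changes what the regex captures; whether the hyphen also counts towards the entropy score is the separate question raised in the second option.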

Finally, I believe tests should go here, right?
Thanks!

domanchi · 2018-10-24T01:20:24Z

Those are good questions @mohit-surana! The short answers are:

1. No (we should be able to support both, holistically)
2. Yes?

Based on the entropy algorithm, it seems that the more characters in the charset, the higher the entropy can be.

Following this logic, it would suggest that a more liberal charset may require a different entropy configuration level, seeing that the same level may produce more false positives.
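As a rough illustration of the point about charset size, the sketch below computes Shannon entropy over a given charset. The function `shannon_entropy` is a simplified assumption written for this example and may differ from the plugin's real calculation. The maximum possible score is log2 of the charset size, so every added character raises the ceiling.

```python
import math


def shannon_entropy(data, charset):
    # Sum -p * log2(p) over the charset characters that occur in data.
    # Characters outside the charset contribute nothing to the score.
    # Assumes charset contains no repeated characters.
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


HEX = "0123456789abcdef"
sample = "ab-cd-ef"

without_hyphen = shannon_entropy(sample, HEX)
with_hyphen = shannon_entropy(sample, HEX + "-")

# The theoretical ceiling grows with the charset: log2(16) vs log2(17).
print(without_hyphen, with_hyphen, math.log2(len(HEX)), math.log2(len(HEX) + 1))
```

For the same string, the score with the hyphen included is never lower than without it, which is why a fixed threshold could admit more false positives once the charset grows.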

However, if this is true, then any additions to the charset would require a completely separate plugin (e.g. adding hyphens and percentage signs -%), and the maintenance of these potential plugins could get very messy.

Any thoughts on this?

mohit-surana · 2018-10-24T01:59:17Z

Theoretically, yes. It would increase the false positive rate while reducing the false negative rate, so ultimately it is a trade-off between false positives and false negatives. Do we have any statistics on the current system's false positive rate?

How can we design tests with enough coverage to measure the new FP/FN rates?

As for new plugins, it seems to me that changing the entropy calculation itself is a big NO, as it may affect current clients. And if we make one plugin for each small difference, we end up with a combinatorial number of plugins. Would it be better to let clients pass an additional argument indicating whether they would like to include additional preset or client-specified symbols?

Bottom line: If FP increases a lot, we need to have the client make a conscious decision to move into a new version that supports hyphens.
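One way the additional argument could look is sketched below. The class `EntropyScanner` and its `extra_chars` parameter are hypothetical names invented for this example, not the project's API. The default is an empty string, so existing clients would see no change unless they opt in.

```python
import math
import re


class EntropyScanner:
    """Hypothetical stand-in for an entropy-based plugin (not the project's class)."""

    def __init__(self, charset, limit, extra_chars=""):
        # extra_chars is the assumed opt-in argument; leaving it empty keeps
        # the charset, and therefore the results, unchanged.
        self.charset = "".join(dict.fromkeys(charset + extra_chars))
        self.limit = limit
        self.regex = re.compile(r'["\']([' + re.escape(self.charset) + r']+)["\']')

    def entropy(self, data):
        # Shannon entropy over the (possibly extended) charset.
        total = 0.0
        for ch in self.charset:
            p = data.count(ch) / len(data)
            if p > 0:
                total -= p * math.log2(p)
        return total

    def find(self, line):
        # Return quoted strings whose entropy exceeds the configured limit.
        return [s for s in self.regex.findall(line) if self.entropy(s) > self.limit]


HEX = "0123456789abcdef"
line = 'token = "3f9a-71c2-e8b4-d605"'

default_scanner = EntropyScanner(HEX, limit=3.0)
opted_in = EntropyScanner(HEX, limit=3.0, extra_chars="-")

print(default_scanner.find(line))
print(opted_in.find(line))
```

With this shape, the hyphenated token is only reported by the scanner that opted in, and the limit can be tuned separately for clients that extend the charset.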

domanchi · 2018-10-24T20:37:08Z

I'm in favor of the additional argument, but I don't know what that might look like in the user interface. It would certainly increase the scope of this issue (and perhaps make it no longer a "good first issue")! If you still want to take it on, we'd more than welcome the contribution!

Otherwise, the AKIA-prefixed issue that @KevinHock mentioned may be a good start. Though it doesn't strictly find the AWS secret, it gives a good indication that there might be a secret there, on the same principle as "where there's smoke, there's fire".
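A prefix-based check of this kind could be sketched as follows. The pattern assumes the commonly documented shape of an AWS access key ID (the literal `AKIA` followed by 16 uppercase letters or digits); the function name `find_access_key_ids` is made up for this example. It flags the key ID only, not the secret key itself.

```python
import re

# Assumed format: "AKIA" followed by 16 uppercase alphanumeric characters.
AKIA_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def find_access_key_ids(text):
    # Return every substring that looks like an AWS access key ID.
    return AKIA_PATTERN.findall(text)


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
hits = find_access_key_ids("aws_access_key_id = AKIAIOSFODNN7EXAMPLE")
print(hits)
```

Because it matches on structure rather than entropy, a check like this would catch low-entropy key IDs that an entropy threshold misses.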

As for testing FP/FN rates, we are building a large internal collection of various different secrets that we use to experiment with our new plugins. We can certainly run your plugin on our corpus, and help tweak its default sensitivity.

mohit-surana · 2018-10-30T15:47:54Z

Hey @domanchi. I am interested in implementing the additional argument version of the solution. I am a bit caught up with stuff at the moment and I'll get back to it as soon as time permits!

The internal corpus sounds like a really good idea, and in general will help attract more users as well. As for the AKIA prefix, I will need to think further to understand how we can incorporate patterns along with the entropy calculation. Let me get back to you!

lorenzodb1 · 2024-05-09T17:03:45Z

We're going to close this issue as it hasn't received any update in a very long time. Feel free to re-open it if you think it's still relevant.

domanchi added the good first issue label Jul 3, 2018

KevinHock added the help wanted label Sep 13, 2018

KevinHock self-assigned this Jun 24, 2019

KevinHock added the accuracy and false negatives labels and removed the good first issue and help wanted labels Jun 24, 2019

lorenzodb1 added the pending label and removed the accuracy label Jun 13, 2022

lorenzodb1 closed this as completed May 9, 2024
