Using regex to scan for specific credentials #209

ericwb · 2019-04-26T21:36:37Z

Is your feature request related to a problem? Please describe.
See https://github.com/Yelp/detect-secrets for reference, but it would be useful for Precaution to also warn on secrets, Confidential code headers, etc. So level of detection is done through GitHub token scanning, but not all.

Note that also detect-secrets operates on all files (json, txt, etc), and not just languages files. There are definitely cases where API keys are stuffed into shell scripts, json files, yaml, etc.

Describe the solution you'd like
A new, slightly different type of scanner. Not really a linter. And it would scan all types of file extensions, not just language code, so the download of files would need to include everything.

Describe alternatives you've considered
Token scanning provided by GitHub does some things by not all. I don't think it would find a Confidential header whereas detect-secrets could be configured to do so. But it would be good to research and do a comparison analysis.

There are also probably many other CLI tools to do similar scans. This was one recommended from a blog article.

Additional context

ericwb · 2019-04-26T21:38:07Z

This integration could also add some differentiation as compared to similar competing Apps

ericwb · 2019-05-09T04:47:36Z

Note; detect-secrets overlaps with language specific linters somewhat. It's KeywordDetector plugin will detect variables named password or secrets, same as Bandit.

And many of the other plugins it has could be integrated into language specific linters, since they would be better at detecting using an AST rather than regex.

However, non-language files (*.py, *.go, *.js, *.ts) would be best detected by detect-secrets. And there are examples of people putting secret keys in json files, yaml files, etc.

ericwb · 2019-05-09T04:48:33Z

So should detect-secrets only be used on files not scanned by other linters?

joshuagl · 2019-05-09T12:49:13Z

I like the idea of detect-secrets only scanning files not scanned by other linters, but I don't think it's as cut and dry as not scanning files scanned by other linters:

linters may not support secret detection (i.e. I don't think tslint does)
a linters secret detection may not be as complete as detect-secret's , for example I don't think many language security linters do high entropy string detection?

There's a few approaches we could take here:

try to ensure (through contributions, where necessary) that all linters we use have comprehensive secret detection:
i. we could ramp up to this by having detect-secrets exclude not run against a list of file types for which we have confidence the language linter handles secret detection. I think this list would start out at zero and increase as we work with the upstream linters?
have some logic to prevent reporting the same/similar issues for the same line of code. This might mean:
a. having a model to map similar functionality across linters/scanners which will be run against the same code
b. assigning priorities to the linters
c. filtering out duplicate reports from lower priority scanners based on the functionality map and priorities

MVrachev · 2019-05-16T07:45:03Z

There are a few particularities in detect-secrets:

it doesn't provide "severity" and "confidence" level.
We should discuss what approach we should use
it doesn't have good documentation for every plugin it uses.
The only documentation I could find is this: https://github.com/Yelp/detect-secrets#currently-supported-plugins
it doesn't provide detail information for every issue like the
other linters. That means that because the "message" attribute is
mandatory in the annotation object
https://developer.github.com/v3/checks/runs/#annotations-object
we should figure out what detail message we would provide for the
different issues it finds.

We should discuss what to do with those.

MVrachev · 2019-05-16T07:53:08Z

I think that we can try and see how an option 2

have some logic to prevent reporting the same/similar issues for the same line of code.

implemented with the logic from subpoint c

c. filtering out duplicate reports from lower priority scanners based on the functionality map and priorities

MVrachev · 2019-05-16T08:08:40Z

Here is the list of the pull requests/features we should add to fully integrate detect-secrets into Precaution:

Spawn functionality + unit tests for it
Report functionality + unit tests for it
Add detect-secrets in the README.md and in the doc pages:
- initial_setup
- false-positives
- setting up for manual deployment?

MVrachev · 2019-05-21T11:26:57Z

Currently, detect-secrets doesn't give you the option to use it with multiple parameters like that:
detect-secrets scan test.js test.yaml Yelp/detect-secrets#180

That's why to achieve scanning multiple parameters we have to call it like that:
detect-secrets scan .

The problem is that currently, we are using two folders for our temporary files which will be processed
cache/go/src/<repo_id>/<pr_id>/ for all Go files and cache/<repo_id>/<pr_id>/ for all other files.
If you want to read why we use two folders read the discussions here:

What happens because of that is that when we start detect-secrets from the cache/go/src/<repo_id>/<pr_id>/ directory it will scan all go files with ease but it will miss all other files in the cache/<repo_id>/<pr_id>/ folder.

If we start detect-secrets from cache/<repo_id>/<pr_id>/ folder we would scan all Go files below and if there are not deleted files from an old scan we would report for their errors too.

Because of all those problems, I believe that first, we should create a Docker image which will allow us to use Go modules in Gosec (as described here: #227) and then we would continue with the detect-secret spawn functionality.

MVrachev · 2019-05-27T12:12:20Z

I made a research for other tools besides detect-secrets and here are my findings:

Tools using the Shannon entropy for minimizing the false-positives:

There are a lot of tools focused on finding secrets which are working on Git repositories.
Those tools won't work for us because we are scanning files under the cache/ folder.
But cache/ is in our gitignore file so those files will be missed.

nishakm · 2019-05-28T12:33:02Z

How about scanning one at a time?

MVrachev · 2019-05-28T12:48:51Z

How about scanning one at a time?

detect-secrets doesn't support multiple parameters right now: Yelp/detect-secrets#180 which means that we should give one parameter - a folder to detect-secrets which contains all files to be scanned

As I mentioned in this comment #209 (comment) we have a problem with that which will be resolved if we have Dockerfile.

MVrachev · 2019-06-13T07:43:09Z

We reevaluated the benefits of using detect-secrets HighEntropyString plugin.
We found out that this plugin won't work for us because we used the default entropy configuration and then detect-secrets didn't catch an obvious hardcoded credentials.
@joshuagl decided to configure it a little bit and he got a lot of false positives.

It's meaningless to use detect-secrets without the HighEntropyString and that's why we decided that we would write our own implementation using regular expressions to scan for specific tokens.
Because that would be a nontrivial feature to implement we decided to move this feature to the Near future milestone.

MVrachev · 2019-06-13T08:04:33Z

It's important when implementing the regex scan how we would store the files for scan in the memory because Heroku provides us with limited RAM https://devcenter.heroku.com/articles/limits#dyno-memory.
One of the ways to process the files will be by fixing the maximum amount of memory we want our files to use and then before opening a file:

check the size of the file
if the currentUsedMemory + the size of the file > maximumRAMusage then we have to use 4096 bytes (or another size) of blocks to process that file
if we can load the file in memory and use less or equal memory of the defined maximumRAMusage memory then load the whole file in the memory

PS: Closed the issue by mistake

ericwb added the enhancement New feature or request label Apr 26, 2019

joshuagl modified the milestone: Initial release Apr 29, 2019

MVrachev self-assigned this May 15, 2019

MVrachev mentioned this issue May 15, 2019

Add report functionality for detect-secrets + tests #235

Closed

MVrachev changed the title ~~Integrate detect-secrets~~ Using regex to scan for specific credentials Jun 13, 2019

MVrachev closed this as completed Jun 13, 2019

MVrachev reopened this Jun 13, 2019

MVrachev modified the milestones: Initial release, Near Future Jun 13, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using regex to scan for specific credentials #209

Using regex to scan for specific credentials #209

ericwb commented Apr 26, 2019

ericwb commented Apr 26, 2019

ericwb commented May 9, 2019

ericwb commented May 9, 2019

joshuagl commented May 9, 2019

MVrachev commented May 16, 2019 •

edited

Loading

MVrachev commented May 16, 2019

MVrachev commented May 16, 2019 •

edited by joshuagl

Loading

MVrachev commented May 21, 2019 •

edited

Loading

MVrachev commented May 27, 2019

nishakm commented May 28, 2019

MVrachev commented May 28, 2019 •

edited

Loading

MVrachev commented Jun 13, 2019 •

edited

Loading

MVrachev commented Jun 13, 2019

Using regex to scan for specific credentials #209

Using regex to scan for specific credentials #209

Comments

ericwb commented Apr 26, 2019

ericwb commented Apr 26, 2019

ericwb commented May 9, 2019

ericwb commented May 9, 2019

joshuagl commented May 9, 2019

MVrachev commented May 16, 2019 • edited Loading

MVrachev commented May 16, 2019

MVrachev commented May 16, 2019 • edited by joshuagl Loading

MVrachev commented May 21, 2019 • edited Loading

MVrachev commented May 27, 2019

nishakm commented May 28, 2019

MVrachev commented May 28, 2019 • edited Loading

MVrachev commented Jun 13, 2019 • edited Loading

MVrachev commented Jun 13, 2019

MVrachev commented May 16, 2019 •

edited

Loading

MVrachev commented May 16, 2019 •

edited by joshuagl

Loading

MVrachev commented May 21, 2019 •

edited

Loading

MVrachev commented May 28, 2019 •

edited

Loading

MVrachev commented Jun 13, 2019 •

edited

Loading