Detect Hugging Face access tokens #1204
r := config.Rule{
	RuleID: "huggingface-access-token",
	Description: "Hugging Face Access token",
	Regex: regexp.MustCompile(`(?:^|[\\'"` + "`" + ` >=:])(hf_[a-zA-Z]{34})(?:$|[\\'"` + "`" + ` <])`),
@rgmz slightly concerned here since it's using regexp directly instead of one of the predefined generated functions. Any reason this can't be something like:
Regex: generateSemiGenericRegex([]string{"hf_", "hugging"}, "hf_" + alphaNumeric("34")),
Both generateSemiGenericRegex and generateUniqueTokenRegex make the pattern case-insensitive, which increased the number of false positives I encountered during testing. I also found several instances of the token that weren't detected by the existing predefined regexes (e.g., " hf_token = \"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\"\n").
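To make the case-sensitivity point concrete, here is a minimal sketch that runs the rule's pattern against the sample line quoted above. The `hfTokenInsensitive` variant and the `findToken` helper are hypothetical names introduced only for illustration. The variant simply prepends `(?i)` to the same pattern and is not a reproduction of what generateSemiGenericRegex or generateUniqueTokenRegex actually emit.

```go
package main

import (
	"fmt"
	"regexp"
)

// hfToken is the pattern from the rule in this PR: a case-sensitive "hf_"
// prefix, 34 ASCII letters, and hand-written boundary characters.
var hfToken = regexp.MustCompile(`(?:^|[\\'"` + "`" + ` >=:])(hf_[a-zA-Z]{34})(?:$|[\\'"` + "`" + ` <])`)

// hfTokenInsensitive is a hypothetical variant (an assumption, not gitleaks
// code) that only adds the (?i) flag to show the effect of ignoring case.
var hfTokenInsensitive = regexp.MustCompile(`(?i)` + hfToken.String())

// findToken returns the first captured token in s, or "" if nothing matches.
func findToken(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// Sample from the review comment: quotes are backslash-escaped, as they
	// would be inside a JSON-encoded notebook cell.
	sample := ` hf_token = \"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\"\n`
	fmt.Printf("case-sensitive on sample: %q\n", findToken(hfToken, sample))

	// An upper-case identifier that is not a token but has the same shape.
	upper := `key = "HF_ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH"`
	fmt.Printf("case-sensitive on upper:   %q\n", findToken(hfToken, upper))
	fmt.Printf("case-insensitive on upper: %q\n", findToken(hfTokenInsensitive, upper))
}
```

The case-sensitive pattern captures the token from the escaped sample and ignores the upper-case look-alike, while the `(?i)` variant reports the look-alike as a finding.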
That being said, I recognize that it's silly to hand-roll all the regex boundaries on a per-rule basis. I was actually in the midst of writing an issue describing how we could enhance the validation process by generating realistic scenarios in which a secret could be found. Currently the quality of test data for true and false positives varies by rule, and it would greatly benefit from a set of consistent test data rather than people adding new symbols in an ad hoc, untestable way.
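A rough sketch of what such scenario-based test data might look like, assuming a fixed list of templates that embed a secret in common contexts. The `scenarios` list, the `{secret}` placeholder, and the `missed` helper are all hypothetical and are not taken from gitleaks or from the follow-up work mentioned in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hfToken is the pattern from the rule in this PR.
var hfToken = regexp.MustCompile(`(?:^|[\\'"` + "`" + ` >=:])(hf_[a-zA-Z]{34})(?:$|[\\'"` + "`" + ` <])`)

// scenarios is a hypothetical, shared set of contexts a secret might appear
// in. "{secret}" is replaced with the value under test.
var scenarios = []string{
	`token = "{secret}"`,
	`token: '{secret}'`,
	`export HF_TOKEN={secret}`,
	`{"hf_token": "{secret}"}`,
	`<token>{secret}</token>`,
	`https://example.com/?token={secret}&x=1`,
}

// missed returns every generated line in which re fails to capture secret
// as its first submatch.
func missed(re *regexp.Regexp, secret string) []string {
	var out []string
	for _, tmpl := range scenarios {
		line := strings.ReplaceAll(tmpl, "{secret}", secret)
		m := re.FindStringSubmatch(line)
		if m == nil || m[1] != secret {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	secret := "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"
	for _, line := range missed(hfToken, secret) {
		fmt.Println("not detected:", line)
	}
}
```

With these templates, the hand-written boundaries miss the URL query-string case because `&` is not among the allowed trailing characters. That is the kind of gap a consistent scenario set would surface for every rule.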
@rgmz would you mind resolving the conflicts? Happy to merge this after they have been resolved.
Done. The regex issue you mention needs to be revisited in the future; I've started some of the work in #1222.
Description:
This PR adds detection for Hugging Face access tokens.