Detect Hugging Face access tokens #1204

Merged
zricethezav merged 1 commit into gitleaks:master from feat/huggingface-token on Aug 24, 2023

Conversation

rgmz
Contributor

@rgmz rgmz commented Jun 20, 2023

Description:

This PR adds a detection for Hugging Face access tokens.

Checklist:

  • Does your PR pass tests?
  • Have you written new tests for your changes?
  • Have you linted your code locally prior to submission?

@rgmz rgmz force-pushed the feat/huggingface-token branch 2 times, most recently from 90d1fbd to d62b765 Compare June 20, 2023 14:20
@rgmz rgmz marked this pull request as ready for review June 20, 2023 14:20
@rgmz rgmz force-pushed the feat/huggingface-token branch 2 times, most recently from 8adf1c4 to 24a6df3 Compare June 20, 2023 16:37
@rgmz rgmz mentioned this pull request Jun 23, 2023
3 tasks
r := config.Rule{
RuleID: "huggingface-access-token",
Description: "Hugging Face Access token",
Regex: regexp.MustCompile(`(?:^|[\\'"` + "`" + ` >=:])(hf_[a-zA-Z]{34})(?:$|[\\'"` + "`" + ` <])`),
Collaborator

@rgmz slightly concerned here since it's using regexp directly instead of one of the predefined generated functions. Any reason this can't be something like:

Regex: generateSemiGenericRegex([]string{"hf_", "hugging"}, "hf_" + alphaNumeric("34")),

Contributor Author

@rgmz rgmz Jun 23, 2023

Both generateSemiGenericRegex and generateUniqueTokenRegex make the pattern case-insensitive, which increased the number of false positives I encountered during testing. I also found several instances of the token that weren't detected by the existing predefined regexes (e.g., " hf_token = \"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\"\n",).
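To make the trade-off concrete, here is a minimal sketch that runs the pattern from this PR against the escaped example above and against an all-uppercase look-alike. The `(?i)` variant is only a stand-in for "case-insensitive matching"; it is not the actual output of generateSemiGenericRegex or generateUniqueTokenRegex, and the names `hfPattern`, `findToken`, `caseSensitive`, and `caseInsensitive` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// hfPattern is the pattern proposed in this PR: a case-sensitive "hf_"
// prefix plus 34 letters, with hand-written boundary characters.
const hfPattern = `(?:^|[\\'"` + "`" + ` >=:])(hf_[a-zA-Z]{34})(?:$|[\\'"` + "`" + ` <])`

var (
	caseSensitive = regexp.MustCompile(hfPattern)
	// Assumption: prefixing (?i) approximates what a case-insensitive
	// helper would do to the pattern; it is for illustration only.
	caseInsensitive = regexp.MustCompile("(?i)" + hfPattern)
)

// findToken returns the first captured token in s, if any.
func findToken(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// The escaped example quoted in the comment above.
	escaped := ` hf_token = \"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\"\n",`
	// An all-uppercase identifier that is not a Hugging Face token.
	upper := "x = HF_ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGH"

	for _, in := range []string{escaped, upper} {
		_, strict := findToken(caseSensitive, in)
		_, loose := findToken(caseInsensitive, in)
		fmt.Printf("case-sensitive=%v case-insensitive=%v input=%q\n", strict, loose, in)
	}
}
```

The case-sensitive pattern accepts the escaped token because `"` precedes it and `\` follows it, and both characters are in the boundary classes. Only the `(?i)` variant accepts the uppercase look-alike.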

That being said, I recognize that it's silly to hand-roll all the regexp boundaries on a per-rule basis. I was actually in the midst of writing an issue describing how we could enhance the validation process by generating realistic scenarios that a secret could be found in. Currently the quality of test data for true & false positives varies by rule, and it would greatly benefit from a set of consistent test data rather than people adding new symbols in an ad hoc, untestable way.

@zricethezav
Collaborator

@rgmz would you mind resolving the conflicts? Happy to merge this after they have been resolved

@rgmz
Contributor Author

rgmz commented Aug 24, 2023

Done. The regex issue you mention needs to be revisited in the future; I've started some of the work in #1222.

@zricethezav zricethezav merged commit 9fb36b2 into gitleaks:master Aug 24, 2023
1 check passed
@rgmz rgmz deleted the feat/huggingface-token branch August 24, 2023 15:39
quotengrote pushed a commit to quotengrote/miniflux-filter that referenced this pull request Jan 26, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [zricethezav/gitleaks](https://github.com/gitleaks/gitleaks) | patch | `v8.18.0` -> `v8.18.1` |

---

### Release Notes

<details>
<summary>gitleaks/gitleaks (zricethezav/gitleaks)</summary>

### [`v8.18.1`](https://github.com/gitleaks/gitleaks/releases/tag/v8.18.1)

[Compare Source](gitleaks/gitleaks@v8.18.0...v8.18.1)

#### Changelog

-   [`dab7d02`](gitleaks/gitleaks@dab7d02) dont crash on 100gb files pls ([#1292](gitleaks/gitleaks#1292))
-   [`e63b657`](gitleaks/gitleaks@e63b657) remove secretgroup from default config ([#1288](gitleaks/gitleaks#1288))
-   [`20fcf50`](gitleaks/gitleaks@20fcf50) feat: Hashicorp Terraform fields for password ([#1237](gitleaks/gitleaks#1237))
-   [`b496677`](gitleaks/gitleaks@b496677) perf: avoid allocations with `(*regexp.Regexp).MatchString` ([#1283](gitleaks/gitleaks#1283))
-   [`a3ab4e8`](gitleaks/gitleaks@a3ab4e8) refactor: more explicit rules ([#1280](gitleaks/gitleaks#1280))
-   [`bd9a25a`](gitleaks/gitleaks@bd9a25a) bugfix: reduce false positives for stripe tokens by using word boundaries in regex ([#1278](gitleaks/gitleaks#1278))
-   [`6d0d8b5`](gitleaks/gitleaks@6d0d8b5) add Infracost API rule ([#1273](gitleaks/gitleaks#1273))
-   [`2959fc0`](gitleaks/gitleaks@2959fc0) refactor: simplify test asserts ([#1271](gitleaks/gitleaks#1271))
-   [`d37b38f`](gitleaks/gitleaks@d37b38f) Update Makefile
-   [`14b1ca9`](gitleaks/gitleaks@14b1ca9) refactor: change detect tests to t.Fatal instead of log.Fatal ([#1270](gitleaks/gitleaks#1270))
-   [`d9f86d6`](gitleaks/gitleaks@d9f86d6) feat(rules): Add detection for Scalingo API Token ([#1262](gitleaks/gitleaks#1262))
-   [`ed34259`](gitleaks/gitleaks@ed34259) feat(jwt): detect base64-encoded tokens ([#1256](gitleaks/gitleaks#1256))
-   [`0d5e46f`](gitleaks/gitleaks@0d5e46f) feat: add --ignore-gitleaks-allow cmd flag ([#1260](gitleaks/gitleaks#1260))
-   [`a82ac29`](gitleaks/gitleaks@a82ac29) switch out libs ([#1259](gitleaks/gitleaks#1259))
-   [`0b84afa`](gitleaks/gitleaks@0b84afa) fix: no-color option should also affect zerolog output ([#1242](gitleaks/gitleaks#1242))
-   [`8976539`](gitleaks/gitleaks@8976539) Fixed lineEnd indexing if the match is the whole line ([#1223](gitleaks/gitleaks#1223))
-   [`30c6117`](gitleaks/gitleaks@30c6117) feat: Add optional redaction value, default 100 ([#1229](gitleaks/gitleaks#1229))
-   [`e9135cf`](gitleaks/gitleaks@e9135cf) fix(jwt): longer segment lengths ([#1214](gitleaks/gitleaks#1214))
-   [`f65f915`](gitleaks/gitleaks@f65f915) Added yarn.lock file to default allowlist paths ([#1258](gitleaks/gitleaks#1258))
-   [`abfd0f3`](gitleaks/gitleaks@abfd0f3) Update README.md
-   [`18283bb`](gitleaks/gitleaks@18283bb) feat(rules): make case insensitivity optional ([#1215](gitleaks/gitleaks#1215))
-   [`9fb36b2`](gitleaks/gitleaks@9fb36b2) feat(rules): detect Hugging Face access tokens ([#1204](gitleaks/gitleaks#1204))
-   [`db4bc0f`](gitleaks/gitleaks@db4bc0f) Resolve [#1170](gitleaks/gitleaks#1170) - Enable selection of a single rule ([#1183](gitleaks/gitleaks#1183))
-   [`3cbcda2`](gitleaks/gitleaks@3cbcda2) Update authress.go to include alternate form account dash (-) ([#1224](gitleaks/gitleaks#1224))
-   [`46c6272`](gitleaks/gitleaks@46c6272) refactor: remove unnecessary removing temp files in tests ([#1255](gitleaks/gitleaks#1255))
-   [`963a697`](gitleaks/gitleaks@963a697) refactor: use os.ReadFile instead of os.Open + io.ReadAll ([#1254](gitleaks/gitleaks#1254))
-   [`163ec21`](gitleaks/gitleaks@163ec21) fix(sumologic): improve patterns ([#1218](gitleaks/gitleaks#1218))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.mgrote.net/container-images/miniflux-filter/pulls/13
Co-authored-by: Renovate Bot <renovate@mgrote.net>
Co-committed-by: Renovate Bot <renovate@mgrote.net>