Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support docker image history scanning #2882

Merged
merged 3 commits into from
May 28, 2024

Conversation

jamestelfer
Copy link
Contributor

@jamestelfer jamestelfer commented May 25, 2024

Description

Note

This PR has been created to help illustrate a possible fix for #2881. I'd be grateful for any feedback, and if this is useful, I'll follow the guidance to ready it for merge.

Enables the docker scanner to send image history records to the scanner. These records are generally the commands that were run to build each image layer. The entries look similar to:

    {
      "created": "2024-05-24T05:26:37.56186972Z",
      "created_by": "RUN |8 PNPM_VERSION=v8.15.5 GITHUB_REGISTRY_TOKEN=ghp_ohmygodnessaleakedvalue /bin/sh -c unwise-command '${GITHUB_REGISTRY_TOKEN}' ",
      "comment": "buildkit.dockerfile.v0"
    },

In this example, the secret is leaked from a build argument, but it could equally be from a command line literal.

Fixes #2881

Questions

  1. I'm currently using image://config-file/history/index as the filename for each history entry (see diff). Is this sufficient for identification? It doesn't look particularly flash, but I'm not sure what would be better.
  2. The Layer in the Docker metadata is entered as the empty string. If this isn't OK, I'd welcome suggestions on fixing it.
  3. I added a new metric docker_history_entries_scanned for this, assuming this is the right thing to do.

Checklist

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@CLAassistant
Copy link

CLAassistant commented May 25, 2024

CLA assistant check
All committers have signed the CLA.

@jamestelfer jamestelfer marked this pull request as ready for review May 25, 2024 12:46
@jamestelfer jamestelfer requested a review from a team as a code owner May 25, 2024 12:46
Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the addition @jamestelfer this looks great to me.

pkg/sources/docker/docker.go Show resolved Hide resolved
pkg/sources/docker/docker_test.go Show resolved Hide resolved
pkg/sources/docker/docker.go Outdated Show resolved Hide resolved
@ahrav
Copy link
Collaborator

ahrav commented May 27, 2024

Description

Note

This PR has been created to help illustrate a possible fix for #2881. I'd be grateful for any feedback, and if this is useful, I'll follow the guidance to ready it for merge.

Enables the docker scanner to send image history records to the scanner. These records are generally the commands that were run to build each image layer. The entries look similar to:

    {
      "created": "2024-05-24T05:26:37.56186972Z",
      "created_by": "RUN |8 PNPM_VERSION=v8.15.5 GITHUB_REGISTRY_TOKEN=ghp_ohmygodnessaleakedvalue /bin/sh -c unwise-command '${GITHUB_REGISTRY_TOKEN}' ",
      "comment": "buildkit.dockerfile.v0"
    },

In this example, the secret is leaked from a build argument, but it could equally be from a command line literal.

Fixes #2881

Questions

  1. I'm currently using image://config-file/history/index as the filename for each history entry (see diff). Is this sufficient for identification? It doesn't look particularly flash, but I'm not sure what would be better.
  2. The Layer in the Docker metadata is entered as the empty string. If this isn't OK, I'd welcome suggestions on fixing it.
  3. I added a new metric docker_history_entries_scanned for this, assuming this is the right thing to do.

Checklist

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

I am not too well versed in our Docker source, but from what I can tell this looks pretty good to me. That being said, @zricethezav @joeleonjr would you mind giving this a look as well. Thanks.

@rgmz
Copy link
Contributor

rgmz commented May 27, 2024

The Layer in the Docker metadata is entered as the empty string. If this isn't OK, I'd welcome suggestions on fixing it.

It's possible to associate history with the layer ID by index (skipping empty layers). (On that note, the current TruffleHog implementation is incorrect/not ideal and uses the digest instead of the layer ID.)

For instance, if you look at the postgres:latest you can see:

  1. #(nop) ADD file:5a is layer 254fddc33a7883e66303d20feb68795952cb91cb0af7574e129f6bfd8efad544
  2. RUN /bin/sh -c set -eux; \tgroupadd -r postgres --gid=999; is layer d3b78d99bcc3eee03f029078417d7d1076170980d4073719fdb6d8c187e003df.

History

{
"history": [
        {
            "created": "2024-05-09T18:58:11Z",
            "created_by": "/bin/sh -c #(nop) ADD file:5aaace706aa00ff97d243daa2c29f5de88f124e1b97c570634f16eef90783286 in / "
        },
        {
            "created": "2024-05-09T18:58:11Z",
            "created_by": "/bin/sh -c #(nop)  CMD [\"bash\"]",
            "empty_layer": true
        },
        {
            "created": "2024-05-09T18:58:11Z",
            "created_by": "RUN /bin/sh -c set -eux; \tgroupadd -r postgres --gid=999; \tuseradd -r -g postgres --uid=999 --home-dir=/var/lib/postgresql --shell=/bin/bash postgres; \tmkdir -p /var/lib/postgresql; \tchown -R postgres:postgres /var/lib/postgresql # buildkit",
            "comment": "buildkit.dockerfile.v0"
        },
        ...
}

Layers

{
        "Layers": [
            "254fddc33a7883e66303d20feb68795952cb91cb0af7574e129f6bfd8efad544/layer.tar",
            "d3b78d99bcc3eee03f029078417d7d1076170980d4073719fdb6d8c187e003df/layer.tar",
            "cfb5c12ea5e8675b8fbc75bf74b539422cd99ae65e6f1b3603638a2e6152e384/layer.tar",
            "7bd588371cf03e6283e8729a4a8e28c3e02864d6865eddcdd1c9a7c8718996be/layer.tar",
            ...
       ]
}

This can be validated by either docker pull postgres && docker save postgres > layers.tar or using a GUI like Dive.
image

@jamestelfer
Copy link
Contributor Author

The association with layers like this is a great point.

I can annotate an entry with its corresponding layer without issue, as long as it's ok that some records will not have an associated layer.

Not all entries will have a layer, but any may have a secret.

Given this, some records will have an empty string for the layer, and I presume that the proposed naming scheme is OK for a start?

Style suggestion from review feedback.
Where possible, add the associated layer to the history entry record. This may help tracing any issues discovered.

This also changes the entry reference format to `image-metadata:history:%d:created-by` which _may_ be more self-explanatory.
@jamestelfer jamestelfer requested a review from ahrav May 28, 2024 12:59
@jamestelfer
Copy link
Contributor Author

Thanks for all your feedback @ahrav and @rgmz! I think I've addressed it, looking forward to a re-review.

Note that I changed the entry reference format to image-metadata:history:%d:created-by which may be more self-explanatory than image://config-file/history/%d.

Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks again for adding this change.


// processHistoryEntry processes a history entry from the image configuration metadata.
func (s *Source) processHistoryEntry(ctx context.Context, historyInfo historyEntryInfo, chunksChan chan *sources.Chunk) error {
// Make up an identifier for this entry that is moderately sensible. There is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this comment.

pkg/sources/docker/docker.go Show resolved Hide resolved
@ahrav ahrav merged commit 0024b6c into trufflesecurity:main May 28, 2024
11 of 12 checks passed
@jamestelfer jamestelfer deleted the image-history-scanning branch May 29, 2024 04:52
haraldh added a commit to matter-labs/vault-auth-tee that referenced this pull request Jun 10, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
|
[trufflesecurity/trufflehog](https://togithub.com/trufflesecurity/trufflehog)
| action | minor | `v3.76.3` -> `v3.78.0` |

---

### Release Notes

<details>
<summary>trufflesecurity/trufflehog
(trufflesecurity/trufflehog)</summary>

###
[`v3.78.0`](https://togithub.com/trufflesecurity/trufflehog/releases/tag/v3.78.0)

[Compare
Source](https://togithub.com/trufflesecurity/trufflehog/compare/v3.77.0...v3.78.0)

#### What's Changed

- Add postman to tui by [@&#8203;hxnyk](https://togithub.com/hxnyk) in
[trufflesecurity/trufflehog#2895
- chore(deps): update alpine docker tag to v3.20 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2874
- fix(deps): update golang.org/x/exp digest to
[`fd00a4e`](https://togithub.com/trufflesecurity/trufflehog/commit/fd00a4e)
by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2899
- Update metadata for DataDog for API + APPKey by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2879
- Consistent docker image of MSSQL for integration testing. by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2898
- fix(deps): update module github.com/charmbracelet/bubbletea to v0.26.4
by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2885
- Remove 'www' from `DefaultFalsePositives` by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2896
- Redis integration test by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2901
- Fix Github `enumerateWithToken` test failure by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2880
- fix(deps): update module github.com/aws/aws-sdk-go to v1.53.14 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2900
- fix(deps): update module github.com/hashicorp/go-retryablehttp to
v0.7.7 by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2904
- integration testing for mongodb. by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2907
- fix(deps): update module
github.com/azure/go-autorest/autorest/azure/auth to v0.5.13 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2902
- chore: fix some comments by
[@&#8203;jinjiadu](https://togithub.com/jinjiadu) in
[trufflesecurity/trufflehog#2903
- \[chore] Always log git repositories being scanned by
[@&#8203;mcastorina](https://togithub.com/mcastorina) in
[trufflesecurity/trufflehog#2909
- Add Jenkins scanning by
[@&#8203;dustin-decker](https://togithub.com/dustin-decker) in
[trufflesecurity/trufflehog#2892

#### New Contributors

- [@&#8203;jinjiadu](https://togithub.com/jinjiadu) made their first
contribution in
[trufflesecurity/trufflehog#2903

**Full Changelog**:
trufflesecurity/trufflehog@v3.77.0...v3.78.0

###
[`v3.77.0`](https://togithub.com/trufflesecurity/trufflehog/releases/tag/v3.77.0)

[Compare
Source](https://togithub.com/trufflesecurity/trufflehog/compare/v3.76.3...v3.77.0)

#### What's Changed

- Remove "finished verificationOverlap chunks" log line by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2860
- fix(deps): update module github.com/wasilibs/go-re2 to v1.5.3 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2861
- fix(deps): update module google.golang.org/api to v0.181.0 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2857
- fix(deps): update module github.com/aws/aws-sdk-go to v1.53.5 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2859
- Update azure storage extra data by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2808
- Update regex for Organization in Azure Devops detector by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2866
- fix(deps): update module github.com/aws/aws-sdk-go to v1.53.6 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2867
- \[chore] - Use http.NewRequestWithContext by
[@&#8203;ahrav](https://togithub.com/ahrav) in
[trufflesecurity/trufflehog#2870
- adding Groq detector by [@&#8203;0x1](https://togithub.com/0x1) in
[trufflesecurity/trufflehog#2873
- Log reasons for GitLab repo exclusion by
[@&#8203;rosecodym](https://togithub.com/rosecodym) in
[trufflesecurity/trufflehog#2875
- \[github] Scan user repositories by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2814
- Elastic adapter by [@&#8203;camgunz](https://togithub.com/camgunz) in
[trufflesecurity/trufflehog#2727
- Improve handling of Gist URLs by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2653
- Fix some GitHub source test issues by
[@&#8203;rgmz](https://togithub.com/rgmz) in
[trufflesecurity/trufflehog#2774
- fix(deps): update module github.com/aws/aws-sdk-go to v1.53.10 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2871
- fix(deps): update module github.com/go-logr/logr to v1.4.2 by
[@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2869
- fix(deps): update module cloud.google.com/go/secretmanager to v1.13.1
by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2884
- fix(deps): update golang.org/x/exp digest to
[`4c93da0`](https://togithub.com/trufflesecurity/trufflehog/commit/4c93da0)
by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2883
- fix(deps): update module github.com/elastic/go-elasticsearch/v8 to
v8.13.1 by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2886
- fix(deps): update module github.com/gabriel-vasile/mimetype to v1.4.4
by [@&#8203;renovate](https://togithub.com/renovate) in
[trufflesecurity/trufflehog#2890
- Added extra data for LaunchDarkly by
[@&#8203;abmussani](https://togithub.com/abmussani) in
[trufflesecurity/trufflehog#2836
- feat: support docker image history scanning by
[@&#8203;jamestelfer](https://togithub.com/jamestelfer) in
[trufflesecurity/trufflehog#2882

#### New Contributors

- [@&#8203;camgunz](https://togithub.com/camgunz) made their first
contribution in
[trufflesecurity/trufflehog#2727
- [@&#8203;jamestelfer](https://togithub.com/jamestelfer) made their
first contribution in
[trufflesecurity/trufflehog#2882

**Full Changelog**:
trufflesecurity/trufflehog@v3.76.3...v3.77.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/matter-labs/vault-auth-tee).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zNzcuOCIsInVwZGF0ZWRJblZlciI6IjM3LjM4OC4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support image history in Docker image scans
4 participants