Enhancement on Pipeline level provenance #850

chuangw6 · 2023-07-06T19:40:24Z

Feature request

1. For a PipelineRun, Chains should be able to dive into individual TaskRuns' type hinting results, without the pipeline author having to surface task results as pipeline results.
(Current Chains: only recognizes pipeline-level type hinting results for a PipelineRun.)
2. Within a PipelineRun, Chains should generate provenance as soon as it sees that a task has generated an artifact, even if tasks that run after artifact generation fail and, as a result, fail the whole PipelineRun.
(Current Chains: will not generate provenance until the individual TaskRuns and the PipelineRun succeed.)
3. Per-artifact provenance instead of per-run provenance.
(Current Chains: generates multiple provenances with the exact same content, each associated with a different artifact generated from that run.)
Examples:
- If a pipeline consists of tasks like clone -> build-push -> slack-notify, the provenance associated with the artifact should only contain the info of the tasks that contribute to the build process of the artifact, which would be clone and build-push in this case. Per the 2nd feature request above, when slack-notify fails, the provenance should still be available for that artifact.
- If a pipeline consists of 2 sets of tasks producing 2 different artifacts, clone1 -> build-push-go (producing a go artifact) and clone2 -> build-push-oci (producing an oci artifact), the provenance associated with the "go artifact" should only contain information about tasks clone1 and build-push-go, whereas the provenance associated with the "oci artifact" should only contain information about tasks clone2 and build-push-oci.

Use case

Feature 1: better user experience, and fewer of the errors that can happen when an author propagates task results to pipeline results.
Feature 2: the provenance associated with an artifact generated in a pipeline is still available even if some post-artifact-generation tasks (e.g. a slack notify or deploy task), or tasks unrelated to building that artifact (e.g. tasks building artifacts for other languages), fail. In particular, making provenance available before a deploy task would help a deployment gate mechanism that verifies provenance and decides whether to proceed with the deployment.
Feature 3: "focused" provenance that only contains information about the tasks related to generating the artifact is the real "per artifact" provenance.


chuangw6 · 2023-07-06T19:41:33Z

Looking for feedback and thoughts! Thanks everyone.

chuangw6 · 2023-07-06T19:41:47Z

cc @chitrangpatel @lcarva @wlynch

lcarva · 2023-07-06T21:10:17Z

Thanks for filing this feature request! I do disagree with it for various reasons that I have outlined below.

First, I do agree with the current stance that everything in the pipeline contributes to the making of the software artifact, even tasks that have occurred after a binary was built. Consider a pipeline like this: clone -> scan-source -> build-push -> scan-image. I definitely care that the two scan tasks succeed. I consider the software artifact to be viable if the whole pipeline succeeded. Unfortunately, I don't think Tekton Pipelines supports optional Tasks. But you can get around this by extending the task definition (e.g. its script) so that it does not fail if, for example, it cannot send a notification. You can even emit a TaskRun result to indicate whether or not the non-critical task was successful. This information will then be available in the PipelineRun provenance, which is handy.
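As a rough illustration of the workaround described above, here is a minimal Python sketch of a task script wrapper that never fails the task but records whether the non-critical step worked. The function names and the result name `notified` are hypothetical; in a real Tekton Task the file path would be whatever result path Tekton provides to the step, which this sketch simply takes as an argument.

```python
from typing import Callable


def run_non_critical(step: Callable[[], None], result_path: str) -> bool:
    """Run a non-critical step, record the outcome, and never raise.

    `result_path` stands in for the file where the task writes its result
    (hypothetical result name: `notified`).
    """
    try:
        step()
        ok = True
    except Exception as exc:  # deliberately broad: the step must not fail the task
        print(f"non-critical step failed: {exc}")
        ok = False
    # Record the outcome so it can be surfaced as a TaskRun result.
    with open(result_path, "w", encoding="utf-8") as handle:
        handle.write("true" if ok else "false")
    return ok


def failing_notify() -> None:
    """Hypothetical notifier that simulates an unreachable chat service."""
    raise ConnectionError("notification endpoint unreachable")
```

With this shape, the script exits successfully either way, and a consumer of the provenance can still see from the recorded result whether the notification went out.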

Second, the main reason we added PipelineRun attestations is to provide the full picture of how the artifact was created. Removing parts of the PipelineRun is dangerous. What if a TaskRun deemed unimportant ran in parallel and modified the contents of the local git clone? I wouldn't want users to be responsible for determining which tasks are important - in fact, I wouldn't trust them to do so. I would also be concerned if we tried to do this automatically in Chains. Tekton Pipelines is quite flexible and there's a high chance Chains would get it wrong, if not initially then eventually.

Third, I don't think it's within the scope of Chains to slice and dice a PipelineRun. If the user truly wants their artifacts to be treated separately, they should use separate constructs. I don't know the status of Pipelines in Pipelines but that might be worth looking into. I'd prefer to add whatever construct is needed in Tekton Pipelines instead of implementing something that is Chains specific.

wlynch · 2023-07-07T15:07:47Z

Thanks for starting this thread @chuangw6! I agree PipelineRun provenance is far from perfect.

That said, general +1 to @lcarva!

> For a PipelineRun, Chains is able to dive into individual taskruns' type hinting results without pipeline author having to surface task results to pipeline results.
> (Current Chains: only recognize pipeline-level type hinting results for a PipelineRun.)

Long term I'd like for tektoncd/pipeline#6326 to be the answer here. I agree it's annoying and brittle today to need to force users to have to propagate these values back up. I don't think results are the right medium for this, they just happen to be an easy way we could hook into without upstream Pipeline changes.

> Within a pipelinerun, Chains should generate a provenance as soon as it sees an artifact is generated by a task even though some tasks post artifact generation fail, which as a result fails the whole pipelinerun.
> (Current Chains: will not generate a provenance until individual taskruns and the pipelinerun succeed.)

A few things here:

- PipelineRuns waiting for Tasks to succeed before publishing - This is somewhat intentional. There is a race condition when writing OCI layers, since the .att manifest that contains the list of attestation layers needs to be updated all together. We wait on TaskRuns to succeed before publishing the PipelineRun attestations to reduce the chance of collision, since these are likely to write to the same artifact.
  Open to improvements here though! We've been waiting for opencontainers/distribution-spec#250 (Proposal: Document ETag and Precondition Semantics), but a blocking queue across reconcile threads would also help here.
- PipelineRuns not publishing provenance on failure - Not sure if this was intentional. This sounds like a valid use case though 🤔 We would need to make sure we're capturing the status so we can distinguish whether everything succeeded. We may still need to wait for the Pipeline to complete before we try to publish the attestation.
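To make the race condition concrete, here is a minimal Python sketch of serializing the read-modify-write of an attestation list per artifact. It uses a per-artifact lock as a simpler stand-in for the blocking queue mentioned above, and an in-memory dictionary as a stand-in for the registry's .att manifest. `AttestationStore` and everything in it are hypothetical names for illustration, not Chains code.

```python
import threading
from typing import Dict, List


class AttestationStore:
    """In-memory stand-in for the per-artifact list of attestation layers."""

    def __init__(self) -> None:
        self._layers: Dict[str, List[str]] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects creation of per-artifact locks

    def _lock_for(self, artifact: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(artifact, threading.Lock())

    def append_layer(self, artifact: str, layer: str) -> None:
        # The whole read-modify-write runs under the artifact's lock, so two
        # writers cannot both start from the same stale list.
        with self._lock_for(artifact):
            current = list(self._layers.get(artifact, []))  # read
            current.append(layer)                           # modify
            self._layers[artifact] = current                # write back

    def layers(self, artifact: str) -> List[str]:
        with self._lock_for(artifact):
            return list(self._layers.get(artifact, []))


def publish_concurrently(store: AttestationStore, artifact: str,
                         writers: int, per_writer: int) -> None:
    """Simulate several reconcile threads attesting the same artifact."""
    def work(writer_id: int) -> None:
        for i in range(per_writer):
            store.append_layer(artifact, f"writer{writer_id}-layer{i}")

    threads = [threading.Thread(target=work, args=(n,)) for n in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

A lock only helps writers inside one process; coordinating across processes or against the registry itself is what the precondition semantics in the distribution-spec proposal would address.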

> Per-artifact Provenance instead of Per-run Provenance
> (Current Chains: generate multiple provenances with the exact same content, and those provenances are associated with different artifacts being generated from that Run)

I do think the idea of trying to better model the actual used inputs/outputs is interesting, but probably needs some more exploration / experimentation before we agree to accept. I agree with @lcarva that this is probably a fairly tricky problem due to the flexibility of Pipelines (but don't let that deter you from experimenting!)

This sounds like a graph reachability problem based on provenance inputs/outputs. While I don't think we could/should strip out config information about the Task/Pipeline specs, using something like this to inform what provenance we publish to what subjects seems interesting. 🤔
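The reachability idea can be sketched in a few lines of Python: starting from the task that produced an artifact, walk the dependency edges backwards and collect every task it transitively depends on. The `deps` mapping and the `PIPELINE` example are hypothetical and mirror the two-artifact example from the issue description; the sketch assumes every dependency, including any created by shared workspaces, is already present as an edge.

```python
from collections import deque
from typing import Dict, Set


def contributing_tasks(deps: Dict[str, Set[str]], producer: str) -> Set[str]:
    """Return `producer` plus every task it transitively depends on.

    `deps` maps each task name to the set of tasks it directly depends on.
    """
    seen = {producer}
    queue = deque([producer])
    while queue:
        task = queue.popleft()
        for upstream in deps.get(task, set()):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen


# Hypothetical pipeline: two independent build chains plus a final notification.
PIPELINE: Dict[str, Set[str]] = {
    "clone1": set(),
    "build-push-go": {"clone1"},
    "clone2": set(),
    "build-push-oci": {"clone2"},
    "slack-notify": {"build-push-go", "build-push-oci"},
}
```

For this graph, `contributing_tasks(PIPELINE, "build-push-go")` yields only `clone1` and `build-push-go`; `slack-notify` and the OCI chain are not reachable and would be left out.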

Thoughts off the top of my head:

- Shared resources like workspaces would create dependency relationships that you'd need to model.
- What happens if multiple branches in the Pipeline claim they produce the same artifact?
- How does this impact spec verification to ensure the spec hasn't been tampered with? (Hopefully it shouldn't.)
- The case @lcarva mentioned of "I want to make sure all of these checks ran in my Pipeline" probably doesn't mesh well with this, and is something we want to make sure we can keep. I agree Pipelines-in-Pipelines might be a better fit here (also Step CRD, which would allow you to do a similar thing but w/ Tasks).

chuangw6 · 2023-07-13T19:54:03Z

Thanks all for sharing the thoughts here!!

Agree that there is a lot of ambiguity in features 2 & 3. It's worth noting that feature 3 is sort of derived from feature 2, to address the question: how much information should we include when we generate provenance as soon as an artifact is built? (So for feature 3, we might need to rethink how much information we should/need to include if we want to pursue feature 2.)

But for feature 1, re @wlynch's point:

> Long term I'd like for tektoncd/pipeline#6326 to be the answer here. I agree it's annoying and brittle today to need to force users to have to propagate these values back up. I don't think results are the right medium for this, they just happen to be an easy way we could hook into without upstream Pipeline changes.

IMHO, it would still be beneficial, and a plus for the Chains v1 release, to have Chains dive into task-level results to find type hinting instead of expecting authors to manually surface results to the pipeline level. Two reasons:

1. It's a step in the direction we want to pursue to replace the type hinting approach. Regardless of which approach we ultimately adopt, the task is the level/block that generates artifacts. Therefore, I believe we would end up doing things at the task level instead of expecting authors to do things at the pipeline level. For example, tektoncd/pipeline#6326 (Feature: Task provenance output) proposes that tasks output structured provenance data. So feature 1 is not going in the opposite direction.
2. Better user experience. Even with the current type hinting approach, it is the task (e.g., the git-clone and kaniko catalog tasks) that expects type hinting parameters and produces type hinting results. The names of Params/Results on the pipeline level should not matter as long as authors pipe the right data into the tasks. Therefore, enabling Chains to dive into the task level to find type hinting results would be a better user experience and a less error-prone approach. This way, pipeline authors don't need to worry about the rules when defining their pipelines as long as they pull in the right tasks.

wlynch · 2023-07-13T21:11:05Z

Got it, so the idea is we would just take a union of all the Task provenance and promote that up as the Pipeline provenance? 🤔

chuangw6 · 2023-07-13T21:38:03Z

I am not sure whether I understand the term "union of Task provenance" the same way you do :D Please let me know if the following explanation is aligned with your understanding. Happy to meet to sync up!

The idea is to take the union of all Tasks' results and use them for the subjects and materials/resolvedDependencies in the pipeline level provenance, rather than checking results on the pipeline level.

Current: The pipeline has to define type hinting results on the pipeline level and populate their values from TaskRun results in order to let Chains understand what artifacts were generated in the pipeline.
Proposed: The pipeline author doesn't need to define those type hinting results on the pipeline level. Instead, Chains looks into each individual TaskRun's results and understands what was used/generated.
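The proposed behavior can be sketched roughly as follows in Python. This is an illustration only, not the Chains implementation: it assumes type hinting results come in pairs whose names end in `IMAGE_URL` and `IMAGE_DIGEST` with a shared optional prefix, and the function names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed type hinting suffixes; a result pair shares an optional prefix.
URL_SUFFIX = "IMAGE_URL"
DIGEST_SUFFIX = "IMAGE_DIGEST"

Subject = Tuple[str, str]  # (image URL, digest)


def subjects_from_results(results: Dict[str, str]) -> List[Subject]:
    """Pair every *IMAGE_URL result with its matching *IMAGE_DIGEST result."""
    subjects = []
    for name, url in results.items():
        if not name.endswith(URL_SUFFIX):
            continue
        prefix = name[: -len(URL_SUFFIX)]
        digest = results.get(prefix + DIGEST_SUFFIX)
        if digest:  # skip URLs that have no digest to attest
            subjects.append((url, digest))
    return subjects


def pipeline_subjects(task_runs: Dict[str, Dict[str, str]]) -> List[Subject]:
    """Set union of the subjects found in every TaskRun, without duplicates."""
    seen = set()
    merged = []
    for task_name in sorted(task_runs):  # sorted for deterministic output
        for subject in subjects_from_results(task_runs[task_name]):
            if subject not in seen:
                seen.add(subject)
                merged.append(subject)
    return merged
```

The pipeline author defines nothing extra here; the subjects come straight from whatever the child TaskRuns reported, and an artifact reported by two tasks appears once.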

lcarva · 2023-07-14T13:03:00Z

> The idea is to take the union of all Tasks' results and use them for the subjects and materials/resolvedDependencies in the pipeline level provenance, rather than checking results on the pipeline level.

When we implemented PipelineRun attestations, this was the initial idea. But we decided against it to keep things simpler on the implementation side. Also, the argument was that Tasks are likely to be reused way more often than Pipelines. So adding a specially named result to a Pipeline could make pretty much any Task compatible with Chains.

+1 to revisit this. Not having to "promote" the result from the Task to the Pipeline is a nice feature. I think it also makes it easier to reduce some blind spots in the SLSA Provenance.

wlynch · 2023-07-14T14:48:03Z

@chuangw6 I think we're on the same page. By union I mean set union, so the Pipeline provenance would include the aggregation of all Task provenance.

As long as there isn't ambiguity around what would be included that seems reasonable to me. 👍

chuangw6 · 2023-07-14T14:51:25Z

/assign

Related to feature 1 in tektoncd#850. Previously, Chains only looked at pipeline results to understand what artifacts were generated in a pipeline. That meant pipeline authors needed to name pipeline results in the type hinting way and populate their values from individual TaskRun results. Now, Chains is able to dive into individual TaskRun results to understand what artifacts were generated throughout a pipeline. This way, pipeline authors no longer need to worry about the rules when writing a pipeline as long as they pull in the right tasks that produce type hinting results. Signed-off-by: Chuang Wang <chuangw@google.com>

Step 1/2 of tektoncd#850. Previously, Chains only looked at pipeline results to understand what artifacts were generated in a pipeline. That meant pipeline authors needed to name pipeline results in the type hinting way and populate their values from individual TaskRun results. Now, Chains is able to dive into individual TaskRun results to understand what artifacts were generated throughout a pipeline. This way, pipeline authors no longer need to worry about the rules when writing a pipeline as long as they pull in the right tasks that produce type hinting results. That said, the old behaviour (observing pipeline-level results) is preserved by introducing a configmap field `artifacts.pipelinerun.observe-mode`, which allows configuring how Chains observes the outputs. Signed-off-by: Chuang Wang <chuangw@google.com>

Step 1/2 of tektoncd#850. Previously, Chains only looked at pipeline results to understand what artifacts were generated in a pipeline. That meant pipeline authors needed to propagate child TaskRun results to the pipeline level and name the pipeline results in the type hinting way, even though the pulled tasks already produce type hinting results. Now, we introduce a new configmap field `artifacts.pipelinerun.enable-deep-inspection` to allow Chains to inspect both pipeline results and child task results to understand what artifacts were generated throughout a pipeline. This way, pipeline authors no longer need to worry about the rules when writing a pipeline as long as they pull in the right tasks that produce type hinting results. Meanwhile, this also gives users the ability to propagate task results to the pipeline level if the tasks they reference do not produce type hinting results. Signed-off-by: Chuang Wang <chuangw@google.com>

Step 1/2 of tektoncd#850. Previously, Chains only looked at pipeline results to understand what artifacts were generated in a pipeline. That meant pipeline authors needed to propagate child TaskRun results to the pipeline level and name the pipeline results in the type hinting way, even though the pulled tasks already produce type hinting results. Now, we introduce a new configmap field `artifacts.pipelinerun.enable-deep-inspection` to allow Chains to inspect both pipeline results and child task results to understand what artifacts were generated throughout a pipeline. This way, pipeline authors no longer need to worry about the rules when writing a pipeline as long as they pull in the right tasks that produce type hinting results. That said, users still have the ability to propagate task results to the pipeline level if the tasks they reference do not produce type hinting results. Signed-off-by: Chuang Wang <chuangw@google.com>


Step 2/2 of tektoncd#850 Add deep inspection for materials, which will be applied to both slsa v0.2 and slsa v1.0 provenance. Signed-off-by: Chuang Wang <chuangw@google.com>
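The behavior these commit messages describe can be sketched as follows. The configmap key `artifacts.pipelinerun.enable-deep-inspection` is taken from the messages above; everything else is an assumption made for illustration, including the `"true"`/`"false"` value format, the default of off, the `CHAINS-GIT_URL` / `CHAINS-GIT_COMMIT` result names used as the materials hint, and all function names.

```python
from typing import Dict, List, Tuple

DEEP_INSPECTION_KEY = "artifacts.pipelinerun.enable-deep-inspection"

# Assumed type hinting result names for a git material.
GIT_URL_RESULT = "CHAINS-GIT_URL"
GIT_COMMIT_RESULT = "CHAINS-GIT_COMMIT"


def deep_inspection_enabled(config: Dict[str, str]) -> bool:
    """Read the flag from configmap-style string data (assumed default: off)."""
    return config.get(DEEP_INSPECTION_KEY, "false").strip().lower() == "true"


def result_sets_to_inspect(config: Dict[str, str],
                           pipeline_results: Dict[str, str],
                           task_results: Dict[str, Dict[str, str]],
                           ) -> List[Dict[str, str]]:
    """Always inspect pipeline results; add child task results when enabled."""
    sets = [pipeline_results]
    if deep_inspection_enabled(config):
        sets.extend(task_results[name] for name in sorted(task_results))
    return sets


def git_materials(result_sets: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Collect (url, commit) pairs from every inspected result set, deduplicated."""
    materials: List[Tuple[str, str]] = []
    for results in result_sets:
        url = results.get(GIT_URL_RESULT)
        commit = results.get(GIT_COMMIT_RESULT)
        if url and commit and (url, commit) not in materials:
            materials.append((url, commit))
    return materials
```

With the flag off, only results propagated to the pipeline level are seen; with it on, a git-clone style task that reports its own URL and commit contributes a material without any pipeline-level plumbing.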


chuangw6 · 2023-09-20T17:44:25Z

/close

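Closing this ticket, as feature 1 (type hinting improvement) described in the original description is done; features 2 & 3 may or may not be considered in the future, but not at the moment.

Specifically, feature 1 was completed by the 2 main PRs

- Smarter Chains: check taskrun level results for Subjects #866
- Deep inspection for materials #903

and some other related/derivative PRs

- Pass slsa-related config via explicit struct instead of separate args #885
- Add slsa compare options #887
- Clean up Subjects compare #891
- Fix SPDX format function #904
- Add user-facing doc "SLSA Provenance" #913
- Handle duplicates in subjects and materials consistently #926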

tekton-robot · 2023-09-20T17:44:27Z

@chuangw6: Closing this issue.

In response to this:

> /close
>
> Close this ticket as the feature 1 (type hinting improvement) described in the original description was done, and feature 2&3 may or may not be considered in the future, but not at the moment.
>
> Specifically, feature 1 was completed by the 2 main PRs
>
> - Smarter Chains: check taskrun level results for Subjects #866
> - Deep inspection for materials #903
>
> and some other related/derivative PRs
>
> - Pass slsa-related config via explicit struct instead of separate args #885
> - Add slsa compare options #887
> - Clean up Subjects compare #891
> - Fix SPDX format function #904
> - Add user-facing doc "SLSA Provenance" #913
> - Handle duplicates in subjects and materials consistently #926


chuangw6 added the kind/feature label Jul 6, 2023

tekton-robot assigned chuangw6 Jul 14, 2023

chuangw6 mentioned this issue Jul 18, 2023

Smarter Chains: check taskrun level results for Subjects #866

Merged

4 tasks

chuangw6 mentioned this issue Aug 21, 2023

Deep inspection for materials #903

Merged

5 tasks

tekton-robot pushed a commit that referenced this issue Aug 24, 2023

Deep inspection for materials (#903)

de28e92

Step 2/2 of #850 Add deep inspection for materials, which will be applied to both slsa v0.2 and slsa v1.0 provenance. Signed-off-by: Chuang Wang <chuangw@google.com>

tekton-robot closed this as completed Sep 20, 2023
