Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement on Pipeline level provenance #850

Closed
chuangw6 opened this issue Jul 6, 2023 · 12 comments
Closed

Enhancement on Pipeline level provenance #850

chuangw6 opened this issue Jul 6, 2023 · 12 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@chuangw6
Copy link
Member

chuangw6 commented Jul 6, 2023

Feature request

  1. For a PipelineRun, Chains is able to dive into individual taskruns' type hinting results without pipeline author having to surface task results to pipeline results.
    (Current Chains: only recognize pipeline-level type hinting results for a PipelineRun.)

  2. Within a pipelinerun, Chains should generate a provenance as soon as it sees an artifact is generated by a task even though some tasks post artifact generation fail, which as a result fails the whole pipelinerun.
    (Current Chains: will not generate a provenance until individual taskruns and the pipelinerun succeed.)

  3. Per-artifact Provenance instead of Per-run Provenance
    (Current Chains: generate multiple provenances with the exact same content, and those provenances are associated with different artifacts being generated from that Run)
    Examples:

    • If a pipeline consists of tasks like clone -> build-push -> slack-notify, the provenance associated with the artifact should only contain the info of the tasks that contribute to the build process of the artifact, which would be clone and build-push in this case. In the 2nd feature request mentioned above, when slack-notify fails, the provenance should still be available for that artifact.
    • If a pipeline consists of 2 sets of tasks for producing 2 different artifacts, clone1 -> build-push-go (producing go artifact) and clone 2 -> build-push-oci (producing oci artifact), the provenance associated with "go artifact" should only contain information of tasks clone1 and build-push-go, whereas the provenance associated with "oci artifact" should only contain information of tasks clone2 and build-push-oci

Use case

  • Feature 1: better user experience and avoid errors that could happen when author propagates task results to pipeline results
  • Feature 2: the provenance associated with the artifact being generated in a pipeline is still available even though some post-artifact generation tasks i.e. slack notify or deploy task, or the tasks that have nothing related to building that artifact i.e. tasks for building other language artifacts failed. In particular, if we make provenance available before a deploy task, it would be helpful for some deployment gate mechanism which can do provenance verification and decide whether to proceed deployment.
  • Feature 3: "focused" provenance that only contain information of tasks that are related to the artifact generation is the real "per artifact" provenance.
@chuangw6 chuangw6 added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 6, 2023
@chuangw6
Copy link
Member Author

chuangw6 commented Jul 6, 2023

Looking for feedback and thoughts! Thanks everyone.

@chuangw6
Copy link
Member Author

chuangw6 commented Jul 6, 2023

cc @chitrangpatel @lcarva @wlynch

@lcarva
Copy link
Contributor

lcarva commented Jul 6, 2023

Thanks for filing this feature request! I do disagree with it for various reasons that I have outlined below.

First, I do agree with the current stance that everything in the pipeline contributes to the making of the software artifact, even tasks that have occurred after a binary was built. Consider a pipeline like this: clone -> scan-source -> build-push -> scan-image. I definitely care that the two scan tasks succeed. I consider the software artifact to be viable if the whole pipeline succeeded. Unfortunately, I don't think Tekton Pipelines supports optional Tasks. But you can get around this by extending the task definition (e.g. script) to not fail if cannot send a notification, for example. You can even emit a TaskRun result to indicate whether or not the non-critical task was successful. And this information will be available in the PipelineRun provenance which is handy.

Second, the main reason we added PipelineRun attestations is to provide the full picture of how the artifact was created. Removing parts of the PipelineRun is dangerous. What if a TaskRun deemed unimportant ran in parallel and modified the contents of the local git clone? I wouldn't want users to be responsible for determining which tasks are important - in fact, I wouldn't trust them to do so. I would also be concerned if we tried to do this automatically in Chains. Tekton Pipelines is quite flexible and there's a high chance Chains would get it wrong, if not initially then eventually.

Third, I don't think it's within the scope of Chains to slice and dice a PipelineRun. If the user truly wants their artifacts to be treated separately, they should use separate constructs. I don't know the status of Pipelines in Pipelines but that might be worth looking into. I'd prefer to add whatever construct is needed in Tekton Pipelines instead of implementing something that is Chains specific.

@wlynch
Copy link
Member

wlynch commented Jul 7, 2023

Thanks for starting this thread @chuangw6! I agree PipelineRun provenance is far from perfect.

That said, general +1 to @lcarva!

For a PipelineRun, Chains is able to dive into individual taskruns' type hinting results without pipeline author having to surface task results to pipeline results.
(Current Chains: only recognize pipeline-level type hinting results for a PipelineRun.)

Long term I'd like for tektoncd/pipeline#6326 to be the answer here. I agree it's annoying and brittle today to need to force users to have to propagate these values back up. I don't think results are the right medium for this, they just happen to be an easy way we could hook into without upstream Pipeline changes.


Within a pipelinerun, Chains should generate a provenance as soon as it sees an artifact is generated by a task even though some tasks post artifact generation fail, which as a result fails the whole pipelinerun.
(Current Chains: will not generate a provenance until individual taskruns and the pipelinerun succeed.)

Few things here:

  1. PipelineRuns waiting for Tasks to succeed before publishing - This is somewhat intentional. There is a race condition when writing OCI layers, since the .att manifest that contains the list of attestation layers needs to be updated altogether. We wait on TaskRuns to succeed before publishing the PipelineRun attestations to reduce the chance of collision, since these are likely to write to the same artifact.

    Open to improvements here though! We've been waiting for Proposal: Document ETag and Precondition Semantics opencontainers/distribution-spec#250, but a blocking queue across reconcile threads would also help here.

  2. PipelineRuns not publishing provenance on failure - Not sure if this was intentional. This sounds like a valid use case though 🤔 We would need to make sure we're capturing the status though so we can distinguish whether everything succeeded though. We may still need to wait for the Pipeline to complete before we try and publish the attestation though.


Per-artifact Provenance instead of Per-run Provenance
(Current Chains: generate multiple provenances with the exact same content, and those provenances are associated with different artifacts being generated from that Run)

I do think the idea of trying to better model the actual used inputs/outputs is interesting, but probably needs some more exploration / experimentation before we agree to accept. I agree with @lcarva that this is probably a fairly tricky problem due to the flexibility of Pipelines (but don't let that deter you from experimenting!)

This sounds like a graph reachability problem based on provenance inputs/outputs. While I don't think we could/should strip out config information about the Task/Pipeline specs, using something like this to inform what provenance we publish to what subjects seems interesting. 🤔

Thoughts off the top of my head:

  • Shared resources like workspaces would create dependency relationships that you'd need to model.
  • What happens if multiple branches in the Pipeline claim they produce the same artifact?
  • How does this impact spec verification to ensure the spec hasn't been tampered with (hopefully it shouldn't)
  • The case @lcarva mentioned of "I want to make sure all of these checks ran in my Pipeline" probably doesn't mesh well with this, and is something we want to make sure we can keep. I agree Pipelines-in-Pipelines might be a better fit here (also Step CRD, which would allow you to do a similar thing but w/ Tasks).

@chuangw6
Copy link
Member Author

chuangw6 commented Jul 13, 2023

Thanks all for sharing the thoughts here!!

Agree that there are so much ambiguities in feature 2 & 3. It's worth noting that feature 3 is sort of derived from feature 2 to address the question: what amount of information we should include when we generate provenance as soon as an artifact was built. (So for feature 3, we might need to rethink what amount of information we should/need to include if we want to pursue 2).

But for feature 1, RE @wlynch point

Long term I'd like for tektoncd/pipeline#6326 to be the answer here. I agree it's annoying and brittle today to need to force users to have to propagate these values back up. I don't think results are the right medium for this, they just happen to be an easy way we could hook into without upstream Pipeline changes.

IMHO, it would be still beneficial and would be a plus to Chains v1 release to have Chains to dive into task level results to find type hinting instead of expecting authors to manually surface results to pipeline level. 2 reasons:

  • It's a step in the direction we want to pursue to replace type hinting approach. Regardless of which approach we ultimately adopt, the task is the level/block that generates artifacts. Therefore, I believe we would end up doing things at the task level instead of expecting authors to do things at the pipeline level. For example, Feature: Task provenance output pipeline#6326 proposes tasks to output structured provenance data. So feature 1 is not going the opposite.
  • Better user experience. Even with the current type hinting approach, it is the task (i.e., git-clone, and kaniko catalog tasks) that expects type hinting parameters and produces type hinting results. The names of Params/Results on the pipeline level should not matter as long as authors pipe the right data into the tasks. Therefore, enabling Chains to dive into task level to find type hinting results would be a better user experience and a less error-prone approach. This way, pipeline authors don't need to worry about the rules when defining their pipelines as long as they pull in right tasks.

@wlynch
Copy link
Member

wlynch commented Jul 13, 2023

Got it, so the idea is we would just take a union of all the Task provenance and promote that up as the Pipeline provenance? 🤔

@chuangw6
Copy link
Member Author

chuangw6 commented Jul 13, 2023

I am not sure if I perceive the term union of Task Provenance same way as you use :D Please let me know if the following explanation is aligned with your understanding. Happy to meet to sync up!

The idea is to take the union of all Tasks' results and use them for the subjects and materials/resolvedDependencies in the pipeline level provenance, rather than checking results on the pipeline level.

  • Current: Pipeline has to define some type hinting results on the pipeline level and populate the values with taskrun results in order to let chains understand what artifacts were generated in the pipeline.
  • Proposed: Pipeline author doesn't need to define those type hinting results on the pipeline level. Instead, Chains looks into each individual taskruns' results and understand what was used/generated.

@lcarva
Copy link
Contributor

lcarva commented Jul 14, 2023

The idea is to take the union of all Tasks' results and use them for the subjects and materials/resolvedDependencies in the pipeline level provenance, rather than checking results on the pipeline level.

When we implemented PipelineRun attestations, this was the initial idea. But we decided against it to keep things simpler on the implementation side. Also, the argument was that Tasks are likely to be reused way more often than Pipelines. So adding a specially named result to a Pipeline could make pretty much any Task compatible with Chains.

+1 to revisit this. Not having to "promote" the result from the Task to the Pipeline is a nice feature. I think it also makes it easier to reduce some blind spots in the SLSA Provenance.

@wlynch
Copy link
Member

wlynch commented Jul 14, 2023

@chuangw6 I think we're on the same page. By union I mean set union, so the Pipeline provenance would include the aggregation of all Task provenance.

As long as there isn't ambiguity around what would be included that seems reasonable to me. 👍

@chuangw6
Copy link
Member Author

/assign

chuangw6 added a commit to chuangw6/chains that referenced this issue Jul 18, 2023
Related feature 1 in tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Jul 31, 2023
Related feature 1 in tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 1, 2023
Related feature 1 in tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 1, 2023
Related feature 1 in tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 2, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

That said, the old behaviour - observing pipeline level results is reserved
by introducing a configmap field `artifacts.pipelinerun.observe-mode`
which allows configuring how chains observes the outputs.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 2, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

That said, the old behaviour - observing pipeline level results is reserved
by introducing a configmap field `artifacts.pipelinerun.observe-mode`
which allows configuring how chains observes the outputs.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 2, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

That said, the old behaviour - observing pipeline level results is reserved
by introducing a configmap field `artifacts.pipelinerun.observe-mode`
which allows configuring how chains observes the outputs.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 3, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to name pipeline results in the type hinting way and propagate its value
with individual TaskRun results.

Now, Chains is able to dive into individual TaskRun results to understand
what artifacts were generated throughout a pipeline. This way, pipeline
authors no longer need to worry about the rules when writting a pipeline
as long as they pull in right tasks that produce type hinting results.

That said, the old behaviour - observing pipeline level results is reserved
by introducing a configmap field `artifacts.pipelinerun.observe-mode`
which allows configuring how chains observes the outputs.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 9, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
Meantime, this also gives users ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 9, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
That said, users still have ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 10, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
That said, users still have ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 10, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
That said, users still have ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 10, 2023
Step 1/2 of tektoncd#850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
That said, users still have ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
tekton-robot pushed a commit that referenced this issue Aug 10, 2023
Step 1/2 of #850

Prior, Chains only looks for pipeline results to understand what
artifacts were generated in a pipeline. That means pipeline authors need
to propagate child TaskRun results to pipeline level and name the pipeline
results in type hinting way even though the pulled tasks already produce
type hinting results.

Now, we introduced a new configmap field `artifacts.pipelinerun.enable-deep-inspection`
to allow Chains to inspect both pipeline results and child task results
to understand what artifacts were generated throughout a pipeline.

This way, pipeline authors no longer need to worry about the rules when
writting a pipeline as long as they pull in right tasks that produce type hinting results.
That said, users still have ability to propagate task results to pipeline
level if the tasks they referenced do not produce type hinting results.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 21, 2023
Step 2/2 of tektoncd#850

Add deep inspection for materials, which will be applied to both slsa v0.2
and slsa v1.0 provenance.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 21, 2023
Step 2/2 of tektoncd#850

Add deep inspection for materials, which will be applied to both slsa v0.2
and slsa v1.0 provenance.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Aug 24, 2023
Step 2/2 of tektoncd#850

Add deep inspection for materials, which will be applied to both slsa v0.2
and slsa v1.0 provenance.

Signed-off-by: Chuang Wang <chuangw@google.com>
tekton-robot pushed a commit that referenced this issue Aug 24, 2023
Step 2/2 of #850

Add deep inspection for materials, which will be applied to both slsa v0.2
and slsa v1.0 provenance.

Signed-off-by: Chuang Wang <chuangw@google.com>
@chuangw6
Copy link
Member Author

/close

Close this ticket as the feature 1 (type hinting improvement) described in the original description was done, and feature 2&3 may or may not be considered in the future, but not at the moment.

Specifically, feature 1 was completed by the 2 main PRs

and some other related/derivative PRs

@tekton-robot
Copy link

@chuangw6: Closing this issue.

In response to this:

/close

Close this ticket as the feature 1 (type hinting improvement) described in the original description was done, and feature 2&3 may or may not be considered in the future, but not at the moment.

Specifically, feature 1 was completed by the 2 main PRs

and some other related/derivative PRs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants