Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Make PipelineResourceResult generic #1392

Closed
dlorenc opened this issue Oct 8, 2019 · 7 comments
Closed

Proposal: Make PipelineResourceResult generic #1392

dlorenc opened this issue Oct 8, 2019 · 7 comments

Comments

@dlorenc
Copy link
Contributor

dlorenc commented Oct 8, 2019

Today the TaskRunStatus object contains a field called PipelineResourceResult which is used to record the digest of a built container image output.

I propose we make this object more generic, to handle recording information for all types of resources.

Today's spec:

type PipelineResourceResult struct {
	Name        string `json:"name"`
	Digest      string `json:"digest"`
}

Desired Information

We would like to include all the information required to audit a pipelinerun or taskrun later. This varies per resource:

  • Git Resource
    • Commit SHA used, in case of a tag
  • Image Resource
    • Image digest, in case of a tag
  • Cluster Resource
    • Not sure?
  • GCS Resource
  • PullRequest Resource
    • Timestamps?
  • CloudEvent Resource
    • Not sure?

In a future with extensible resources, this spec should be flexible enough to support a wide array of information. In general, more is better for auditing.

Note: the data contained in the PipelineResource CRD instance itself is insufficient for full auditing in many cases. Consider a git repository specified by tag or branch. At the time of TaskRun execution, this is resolved to an exact commit digest. The commit digest may be needed later for reference, but would not be contained in the PipelineResource CRD instance.

Proposed spec:

We propose this spec:

type PipelineResourceResult struct {
	ResourceRef PipelineResourceRef
	Key        string `json:"key"`
	Value      string `json:"value"`
}

Similar to how image digests are parsed out of container logs, we will need to define a mechanism for resources to write this information back as well.

@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 8, 2019

In order to do this backwards-compatibly, I think we would basically add both versions for a release:

type PipelineResourceResult struct {
	ResourceRef PipelineResourceRef
	Key        string `json:"key"`
	Value      string `json:"value"`
         Name        string `json:"name"`
	Digest      string `json:"digest"`
}

Then drop name/digest later.

@bobcatfish
Copy link
Collaborator

Another idea could maybe be to dump the entire Resource as-is into the result? For inputs it would be the Resource as provided, for outputs it would be (in an immutable world) the Resource that will be created.

Pro: Resources dont need to specify in any way what to include
Con: Less readable, more stuff in status that ppl might not care about

@bobcatfish bobcatfish added this to the Pipelines 0.9 🐱 milestone Oct 8, 2019
@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 8, 2019

Another idea could maybe be to dump the entire Resource as-is into the result? For inputs it would be the Resource as provided, for outputs it would be (in an immutable world) the Resource that will be created.

Added a note on why Resources may not contain enough information in the initial proposal.

@dlorenc
Copy link
Contributor Author

dlorenc commented Oct 8, 2019

@bobcatfish had another proposal from chat:

maybe a resource could put that info in its own status?
ive thought for a while now that resources should get a status
and the taskrun status would contain the entire resource, spec + status

I think in this case, the extra data around the exact contents that have been resolved would be written to the Resource CRD instance's status section (which is empty today), then the TaskRun would contain this full entity.

I think this could work, but some possible problems:

  • contention/sharing of input resources
    • If a single input resource (say a git repo at branch master) is reused for a long period of time as an input to many pipelineruns, each "access" of it would be logged to the status. These would build up forever until a size limit gets hit.
    • If multiple pipelineruns use the same input resource in parallel, there could be contention "appending" to this status object.

I think (in my mental model anyway), PipelineResource objects are stateless - and they have no status today which makes sense. They are simply pointers to actual, physical systems. They take on shape when they are "resolved" or "dereferenced" by a TaskRun. I could see storing this information in a place other than a TaskRun, but it is important that the information be stored at TaskRun time.

@bobcatfish
Copy link
Collaborator

If a single input resource (say a git repo at branch master) is reused for a long period of time as an input to many pipelineruns, each "access" of it would be logged to the status. These would build up forever until a size limit gets hit.

I think this highlights that reusing a PipelineResource is kind of weird - since #215 I've been in favor of defaulting to embedding, in which case a PipelineResource instance could be ephemeral, and the lifecycle managed by the pipeline controller. At the moment no backing instance is created at all, but it could be!

If multiple pipelineruns use the same input resource in parallel, there could be contention "appending" to this status object.

One idea here is to have PipelineResources go through an (idempotent) "setup" phase where they could update their own status before being used (much like the volume resource in its current form will need to create the backing PVC before executing if not provided).

I think (in my mental model anyway), PipelineResource objects are stateless - and they have no status today which makes sense. They are simply pointers to actual, physical systems. They take on shape when they are "resolved" or "dereferenced" by a TaskRun.

I can get behind this too! A different way of looking at PipelineResources I suppose. One upside to making them a bit more "stateful" would be being add features like caching - e.g. pre-fetching a git resource before execution ever starts. I'm just hand waving tho.

@bobcatfish
Copy link
Collaborator

Talked about this some more with @dlorenc - if we want to support reusing a PipelineResource, then there is no practical way to limit the number of Runs that might try to update its status, so appending data like this to the status is not a great design.

I also got very fixated on the idea that if you are using a PipelineResource such as type git and you specify a commitish such as "master" that actually resolves to something else at runtime, the behavior for users is probably not going to be what they expect - every Task in a Pipeline could end up using a different actual commit. I was trying to solve that at the same time (by putting the resolved commit into the status of the PipelineResource) but that's actually a separate issue we can tackle on its own (#1401).

So anyway long story short I am 👍 to @dlorenc 's proposal to add this to the TaskRun status.

@bobcatfish
Copy link
Collaborator

I think this is completed via #1406 but plz reopen if not!

vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this issue Feb 6, 2020
The `Name` and `Digest` field have been deprecated in a previous
release. This removes the fields from the struct. See
tektoncd#1392 for the inital issue.

It also reduce code duplication by aliasing some types between
v1alpha1 and v1alpha2.

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this issue Feb 6, 2020
The `Name` and `Digest` field have been deprecated in a previous
release. This removes the fields from the struct. See
tektoncd#1392 for the inital issue.

It also reduce code duplication by aliasing some types between
v1alpha1 and v1alpha2.

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
vdemeester added a commit to vdemeester/tektoncd-pipeline that referenced this issue Feb 7, 2020
The `Name` and `Digest` field have been deprecated in a previous
release. This removes the fields from the struct. See
tektoncd#1392 for the inital issue.

It also reduce code duplication by aliasing some types between
v1alpha1 and v1alpha2.

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
tekton-robot pushed a commit that referenced this issue Feb 7, 2020
The `Name` and `Digest` field have been deprecated in a previous
release. This removes the fields from the struct. See
#1392 for the inital issue.

It also reduce code duplication by aliasing some types between
v1alpha1 and v1alpha2.

Signed-off-by: Vincent Demeester <vdemeest@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants