Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resources, PVCs and Chaining #1076

Closed
dlorenc opened this issue Jul 15, 2019 · 10 comments
Closed

Resources, PVCs and Chaining #1076

dlorenc opened this issue Jul 15, 2019 · 10 comments
Assignees
Labels
Epic Issues that should be considered as Epics (aka multiple sub-tasks, …)

Comments

@dlorenc
Copy link
Contributor

dlorenc commented Jul 15, 2019

In the debugging of #937 and the subsequent rollback of our failed fix attempt in #1068, @bobcatfish and I did a bit of a deep dive into the current state of the Resource code to try to understand the current behavior and decide on the right fix.

We have some ideas on what we'd like to do to fix the issue, but I want to open this up early to track our initial findings and open up the conversation on next steps.

Current State

Tekton Pipeline provides several mechanisms for passing artifacts between Tasks within the same pipeline. This doesn't have a formal name in the Tekton architecture, but the logic is mostly encapsulated in the ArtifactStorageInterface.

Two implementations exist for this interface today:

  • ArtifactPVC
    • This creates a PVC inside the cluster and stores artifacts here.
  • ArtifactBucket
    • This uses a configured storage bucket (today just GCS) and stores artifacts here.

The artifacts that can be stored in this system are produced by the OutputResource system. Two types of resources can be used today: Storage and Git.

Additionally, this system is only used when a Task is bound to a Pipeline.

Before a PipelineRun is started, the artifact storage system is initialized, whether or not it is actually needed.

When each task runs that contains an output resource of one of the supported types, a PVC claim is injected into the pod along with Steps to copy/make the resources available inside the Task containers.

Problems

This existing system has a few problems:

  • The initial bug. PVCs are created and bound whether or not they are actually needed. For example, a pipeline with no output resources that can be stored does not need this PVC, but must wait for its creation.
  • The Git resource does not conceptually match the intended use of output resources. It is currently used to transmit state changes on a directory between Tasks, but the more general Storage resource should be used here instead.
  • Resources in general should represent reconciliation between a system in the outside world and a filesystem injected into a Tekton Task.
  • Input Resources perform a one-way sync, mapping an outside system to a filesystem
  • Output Resources perform a one-way reconciliation between a filesystem and an external system. Changes made to it need to be mapped back to the external system.

Most of our resources work this way today:

  • The PullRequest resource maps metadata about a PR to disk, and then back to Github
  • The Cluster resource maps auth and endpoint data from a k8s cluster to a canonical file used by the k8s ecosystem to address that cluster
  • The StorageResource provides a way to upload and download files from a GCS bucket or PVC

There are however, a few exceptions:

  • There is no canonical and efficient way to represent a container image on disk, so we files that are logically pointers instead to refer to images in a registry.
  • The Git Resource works correctly for inputs, but acts as an arbitrary directory for outputs. Arbitrary file changes are preserved and made available to downstream tasks, but nothing is ever synced back to a git remote.
  • TaskRuns should not need to know how they are being called conceptually. But today, they are responsible for knowing if they are being run inside a Pipeline and operating differently.

Christie and I talked through some ideas on how to fix this, and I'll write these up next.

@dlorenc
Copy link
Contributor Author

dlorenc commented Jul 17, 2019

OK, time for some suggestions!

Here's what we think we need to start fixing:

  • Fix the mental model of the git resource
    • It needs to support uploading back to git (features TBD)
  • Add a directory or fileset resource to encourage use of the existing replace the existing use-case of the git output
    • Add a PVC type to the storage resource
  • We need to fix the logic around what happens when things are both an input and an ouput.
    • Proposal: Get rid of the /workspace/ system
    • Replace it with /workspace/output/resource and /workspace/input/resource
  • Since outputs touch the outside world and can have side effects (by definition), we need a way to control the order in which they run

Question:

  • Does this let us get rid of the existing PVC/GCS copying that is not a resource?
    • We think it does.

This will probably be breaking, needs to be done across a few releases

@abayer
Copy link
Contributor

abayer commented Jul 17, 2019

@dlorenc #1087 seems like it may be germane. I opted to go with a new volume resource rather than adding it as an implementation of the storage resource, but we could definitely go down that road. The one thing I'd note in re: getting rid of chain-copying with this is that we'd need to have an option to not delete the PVC at the end of the run, at least to fit the JX use case for the volume resource as proposed.

@dlorenc
Copy link
Contributor Author

dlorenc commented Jul 22, 2019

Thanks for the explanation @abayer! So to be clear, the Volume resource would persist, but the Storage one would get deleted transparently?

@abayer
Copy link
Contributor

abayer commented Jul 22, 2019

In the PR, the new volume resource persists, the artifact-storage PVC does not, yes.

dlorenc added a commit to dlorenc/build-pipeline that referenced this issue Jul 22, 2019
The cluster type currently has no defined behavior as output resources, and
the git type behaves incorrectly. The new Volume type should be used instead
for the use-cases supported by the old Git resource.

This is part of the large cleanup in tektoncd#1076, and should not be submitted until
after tektoncd#1087.
@afrittoli
Copy link
Member

Christie and I talked through some ideas on how to fix this, and I'll write these up next.

Thanks for the nice write-up, it's really helpful!

In general it makes sense to me; I wonder if we need to restrict the output of and input resource / input of a an output resource to a filesystem.
Resources, (along with input params), define the interface of a Task, and I think that's their most defining feature. That should make it possible to do things like swap a Task for another one that matches the same interface.

@dlorenc
Copy link
Contributor Author

dlorenc commented Jul 25, 2019

In general it makes sense to me; I wonder if we need to restrict the output of and input resource / input of a an output resource to a filesystem.

I'm not sure I understand, can you explain a bit more? Basically disallow things to push to outside services directly from inside a Task?

@afrittoli
Copy link
Member

Heh, sorry, I was trying to say something simple in a very convoluted way :P

I was commenting on this point you made:

Resources in general should represent reconciliation between a system in the outside world and a filesystem injected into a Tekton Task.

Do we need to restrict us to a "filesystem injected into a Tekton Task" or can we say that resource in general should represent reconciliation between a system in the outside world and a resource made available to a Tekton Task?

Resource could be an environment variable or a combination of stream in/stream out endpoints?

@bobcatfish
Copy link
Collaborator

TODO for next release: break this down into component issues, esp. since this will probably span releases.

@bobcatfish bobcatfish added the Epic Issues that should be considered as Epics (aka multiple sub-tasks, …) label Sep 5, 2019
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Sep 14, 2019
In tektoncd#1109 we will be removing support for git as an output. In the
current implementation, git as an output is just a volume that holds the
data from the git repo, and copies it between Tasks (when git as an
output is linked to git as an input). As discussed in tektoncd#1076, the model
we want for PipelineResources is for them to take the outside world and
represent it on disk when used as an input, and when used as an output,
to update the outside world. In order to do this, what we actually want for a
git output is for it to create a commit the repo it is referencing.
However up until this point folks have been using git resources in the
way that we want Volume Resources to behave tektoncd#1062, so we want to
transition folks to Volume Resources and away from using git outputs.

Fixes tektoncd#1283
@bobcatfish
Copy link
Collaborator

Latest design from @sbwsg to address confusion and bugs by re-designing PipelineResources: https://docs.google.com/document/d/1euQ_gDTe_dQcVeX4oypODGIQCAkUaMYQH5h7SaeFs44/edit#

@bobcatfish
Copy link
Collaborator

Closing this in favor of #1673

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic Issues that should be considered as Epics (aka multiple sub-tasks, …)
Projects
None yet
Development

No branches or pull requests

4 participants