Using CronJobs to automatically clean up completed Runs #479

Closed
ghost opened this issue Feb 26, 2020 · 16 comments · Fixed by #626
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments


ghost commented Feb 26, 2020

Expected Behavior

Create a new tool that uses CronJob objects to schedule the cleanup of completed TaskRuns and PipelineRuns.

This could be used in our own dogfooding and would also give the community help managing their own completed runs.

In a prior PR we explored the idea of using a TTL on runs and leveraging the Kubernetes TTL Controller to help clean them up. During that review process a user suggested CronJobs as an alternative to baking this TTL support directly into the Tekton Pipelines controller.
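To make the shape of such a tool concrete, here is a minimal sketch of what the CronJob could look like (the name, schedule, retention count, and image are illustrative assumptions, not a committed design):

apiVersion: batch/v1beta1   # the CronJob API version current at the time; batch/v1 on newer clusters
kind: CronJob
metadata:
  name: tekton-run-cleanup   # hypothetical name
spec:
  schedule: "0 * * * *"      # hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: tekton-run-cleanup   # needs RBAC to list/delete runs
          restartPolicy: Never
          containers:
            - name: kubectl
              image: docker.io/alpine/k8s:1.20.7
              command:
                - /bin/bash
                - -c
                # keep the 20 newest PipelineRuns, delete the rest (oldest first);
                # assumes GNU head/xargs, as shipped in this image
                - >
                  kubectl get pipelinerun --sort-by=.metadata.creationTimestamp -o name
                  | head -n -20 | xargs -r kubectl delete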

Actual Behavior

We don't currently have any way to automatically clean up completed TaskRuns and PipelineRuns but we definitely hear feedback that some kind of tooling or guidance would be very useful.

@assertion

We have the same requirement to delete completed resources automatically. (Status may need to be considered; for example, a failed PipelineRun should be kept longer than a succeeded one.)

(Currently we delete these resources from our own API server that sits above Tekton: we monitor PipelineRun-related events and, once a run finishes, delete succeeded PipelineRuns after a short time and failed PipelineRuns after a longer time.)
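For reference, such a status-aware policy could be approximated without a separate API server, using kubectl and jq (a sketch only: it assumes jq is available, and the one-day and seven-day cutoffs are illustrative):

# Sketch: delete succeeded PipelineRuns finished more than 1 day ago,
# and failed ones finished more than 7 days ago. Cutoffs are illustrative.
NOW=$(date +%s)
kubectl get pipelinerun -o json \
  | jq -r --argjson now "$NOW" '
      .items[] | . as $pr
      | ($pr.status.conditions // [])[]
      | select(.type == "Succeeded" and .status != "Unknown")
      | {name: $pr.metadata.name,
         age: ($now - (.lastTransitionTime | fromdateiso8601)),
         ok: (.status == "True")}
      | select(.age > (if .ok then 86400 else 604800 end))
      | .name' \
  | xargs -r kubectl delete pipelinerun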

@jlpettersson
Member

I'll have a look into this.

/assign


k commented May 28, 2020

@jlpettersson any update on this? Right now I have a pretty janky kubectl command to delete old pipeline runs lol

@jlpettersson
Member

> @jlpettersson any update on this? Right now I have a pretty janky kubectl command to delete old pipeline runs lol

It should not be much more work. Give me a few days.

@mattmoor
Member

> In a prior PR we explored the idea of using a TTL on runs and leveraging the Kubernetes TTL Controller to help clean them up. During that review process a user suggested CronJobs as an alternative to baking this TTL support directly into the Tekton Pipelines controller.

🤔 Seems like following the idioms established for K8s Jobs would be somewhat prudent. If we were to establish a "ttl" duck type (e.g. for how this is embedded into specs) combined with the use of a Succeeded condition, then you could write a shareable meta controller that handles this for all types.

cc @n3wscott (this would benefit from the ideas in our last Kubecon talk)
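To sketch what that duck type might look like (purely hypothetical field names, not an agreed API): a shared meta controller would only need to understand a common fragment of spec and status, whatever the concrete Run type:

apiVersion: tekton.dev/v1beta1
kind: PipelineRun                   # or TaskRun, or any other type embedding the same fragment
metadata:
  name: example-run
spec:
  ttlSecondsAfterFinished: 3600     # hypothetical "ttl" duck-type field, mirroring K8s Jobs
status:
  conditions:
    - type: Succeeded               # the condition the meta controller would watch
      status: "True"
      lastTransitionTime: "2020-06-25T12:00:00Z"

The controller would then delete any object once lastTransitionTime plus ttlSecondsAfterFinished has passed, without knowing anything else about the type.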


ghost commented Jun 25, 2020

@afrittoli has implemented this kind of pruning behaviour in our dogfooding cluster now 🎉 🎉 . That work could help inform this issue. Here's the PR where his changes were added: tektoncd/plumbing#442

@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot

@tekton-robot: Closing this issue.

In response to this:

> Rotten issues close after 30d of inactivity.
> Reopen the issue with /reopen.
> Mark the issue as fresh with /remove-lifecycle rotten.
>
> /close
>
> Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 15, 2020

ghost commented Aug 17, 2020

/reopen
/remove-lifecycle stale
/remove-lifecycle rotten
/freeze

@tekton-robot tekton-robot reopened this Aug 17, 2020
@tekton-robot

@sbwsg: Reopened this issue.

In response to this:

> /reopen
> /remove-lifecycle stale
> /remove-lifecycle rotten
> /freeze

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 17, 2020

ghost commented Aug 17, 2020

I'm keeping this issue open as it's a feature area that is still seeing semi-regular community requests.

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 17, 2020
@psschwei
Contributor

@sbwsg you're thinking that this would just be a sample cronjob (and related resources) that users could apply to their own clusters, correct (similar to what @afrittoli did on the dogfooding cluster)? Perhaps under the examples/ directory.

Assuming that's the case...
/assign


ghost commented Aug 31, 2020

Great! Documenting this as part of Pipelines would also be really useful.

@jlpettersson
Member

This CronJob was shared by Tekton twitter account: https://gist.github.com/ctron/4764c0c4c4ea0b22353f2a23941928ad


raelga commented Oct 7, 2021

> This CronJob was shared by Tekton twitter account: https://gist.github.com/ctron/4764c0c4c4ea0b22353f2a23941928ad

An evolution of that CronJob, which keeps up to NUM_TO_KEEP runs of each Pipeline:

...
- name: kubectl
  image: docker.io/alpine/k8s:1.20.7
  env:
    - name: NUM_TO_KEEP
      value: "3"
  command:
    - /bin/bash
    - -c
    - |
      # For each distinct Pipeline (sort -u rather than bare uniq, since
      # uniq only collapses adjacent duplicates), list its PipelineRuns
      # oldest first and delete all but the newest NUM_TO_KEEP.
      while read -r PIPELINE; do
        while read -r PIPELINE_TO_REMOVE; do
          test -n "${PIPELINE_TO_REMOVE}" || continue;
          kubectl delete "${PIPELINE_TO_REMOVE}" \
              && echo "$(date -Is) PipelineRun ${PIPELINE_TO_REMOVE} deleted." \
              || echo "$(date -Is) Unable to delete PipelineRun ${PIPELINE_TO_REMOVE}.";
        done < <(kubectl get pipelinerun -l tekton.dev/pipeline=${PIPELINE} --sort-by=.metadata.creationTimestamp -o name | head -n -${NUM_TO_KEEP});
      done < <(kubectl get pipelinerun -o go-template='{{range .items}}{{index .metadata.labels "tekton.dev/pipeline"}}{{"\n"}}{{end}}' | sort -u);

Full example with rbac at
https://gist.github.com/raelga/e75e6de4fd04be60f267128e985bde6d
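For reference, the RBAC such a CronJob needs is roughly the following sketch (the names here are illustrative; see the linked gist for the complete, tested version):

# Sketch of the minimal RBAC for the cleanup CronJob; names are illustrative.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipelinerun-cleaner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pipelinerun-cleaner
rules:
  - apiGroups: ["tekton.dev"]
    resources: ["pipelineruns"]
    verbs: ["get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pipelinerun-cleaner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pipelinerun-cleaner
subjects:
  - kind: ServiceAccount
    name: pipelinerun-cleaner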
