Start measuring Tekton Pipelines performance #540

bobcatfish · 2019-02-21T01:59:37Z

Expected Behavior

We should be measuring performance for Pipelines. This task includes both adding the actual measurement mechanism and designing what exactly we want to measure.

Some ideas for measurement:

Null Task / null Pipeline (i.e. it doesn't actually do anything)
Null Tasks that have linked inputs and outputs
Stress testing (make recommendations about cluster size)
?
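As a rough illustration of the "null Task" idea above, the sketch below derives per-phase overhead from the timestamps on a finished TaskRun. It assumes the run has been fetched as a dictionary carrying `metadata.creationTimestamp`, `status.startTime` and `status.completionTime`, and that `startTime` is recorded when the controller begins handling the run. The helper names and the sample values are hypothetical, not real measurements.

```python
from datetime import datetime, timezone

# Kubernetes-style UTC timestamps, e.g. "2019-02-21T01:00:00Z".
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_time(value: str) -> datetime:
    """Parse a UTC timestamp string into an aware datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def null_run_overhead(taskrun: dict) -> dict:
    """Return how long a do-nothing TaskRun spent in each phase, in seconds."""
    created = parse_time(taskrun["metadata"]["creationTimestamp"])
    started = parse_time(taskrun["status"]["startTime"])
    completed = parse_time(taskrun["status"]["completionTime"])
    return {
        # Time between object creation and the run being started.
        "pickup_seconds": (started - created).total_seconds(),
        # Time between start and completion; for a null Task this is all overhead.
        "run_seconds": (completed - started).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }


# Hypothetical sample data, made up for illustration.
sample = {
    "metadata": {"creationTimestamp": "2019-02-21T01:00:00Z"},
    "status": {
        "startTime": "2019-02-21T01:00:02Z",
        "completionTime": "2019-02-21T01:00:09Z",
    },
}

print(null_run_overhead(sample))
```

Because a null Task does no work of its own, everything this reports can be attributed to the platform rather than to user steps, which is what makes it a useful baseline.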

Requirements

We should have a set of "happy SLOs" defined for Task and Pipeline execution
We should be regularly measuring these SLOs
Maintainers should be made aware when we are in violation of these SLOs
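One possible shape for the requirements above, as a minimal sketch: collect run durations, compute a percentile, and flag a violation when it exceeds a threshold. The 95th percentile, the 10 second threshold and the sample durations are all placeholder assumptions; the issue does not define any actual SLO values.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_slo(durations, pct, threshold_seconds):
    """Compare the observed percentile against a threshold and report the outcome."""
    observed = percentile(durations, pct)
    return {
        "observed": observed,
        "threshold": threshold_seconds,
        "violated": observed > threshold_seconds,
    }


# Hypothetical durations in seconds; not real measurements.
samples = [4.0, 5.0, 5.5, 6.0, 30.0]
result = check_slo(samples, 95, 10.0)

if result["violated"]:
    # In a real setup this is where maintainers would be notified.
    print(f"SLO violated: p95={result['observed']}s > {result['threshold']}s")
```

Running a check like this on a schedule would cover "regularly measuring", and the `violated` flag is the hook for making maintainers aware.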

Actual Behavior

We do not measure or track this.

Additional Info

Other knative projects use this https://testgrid-dot-knative-tests.appspot.com/knative-build#latency
Note that the metrics collector being used by these projects is global and our end to end tests run in parallel, so the metrics will not work as is

pradeepitm12 · 2019-03-29T11:57:55Z

Hello @bobcatfish
I'd like your thoughts on these two options:
1- A service outside of Tekton that watches Tekton objects and exposes metrics about them to Prometheus.
2- Introduce an endpoint in Tekton Pipelines itself to expose all the metrics to Prometheus.

bobcatfish · 2019-04-19T17:17:00Z

My gut feeling is that I'd lean more toward exposing the metrics from Pipelines itself:

2- Introduce an endpoint in Tekton Pipelines itself to expose all the metrics to Prometheus.

Question: I'm not super familiar with Prometheus, how vital would it be to making metrics usable? Could we simply emit the metrics, and allow the user to provide their own metrics gathering mechanism (which could be Prometheus but could be something else), or would it make more sense for us to include Prometheus out of the box? (I'm very sensitive to adding new dependencies, esp. since I'm under the impression that managing Prometheus is a job in itself, but maybe I'm wrong!)

Another option, which I think is a variation on your first suggestion @pradeepitm12 :
3 - (For now) only measure the performance in tests we write specifically for this purpose (i.e. we don't expose anything new for users of Tekton Pipelines, but we start doing our own measurements)
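To make the "simply emit the metrics" option a bit more concrete: Prometheus scrapes a plain-text format, so emitting it does not by itself require bundling a Prometheus server. The sketch below renders that text format using only the standard library. The metric name `example_taskruns_total`, its labels and its values are hypothetical, and this is not how Tekton implements its metrics.

```python
def render_metric(name, help_text, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. Label values are assumed
    to contain no quotes, backslashes or newlines; real code must escape those.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical counter with made-up values.
text = render_metric(
    "example_taskruns_total",
    "Number of finished TaskRuns (hypothetical metric).",
    "counter",
    [({"status": "success"}, 12), ({"status": "failed"}, 3)],
)
print(text, end="")
```

Any collector that understands this format could read the output, which keeps the choice of metrics backend with the user.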

rawlingsj · 2019-05-17T17:03:11Z

+1 we're looking at the same thing and just started looking at prometheus too, hopefully we can help each other out here.

📈

bobcatfish · 2019-05-18T00:22:41Z

+1 we're looking at the same thing and just started looking at prometheus too, hopefully we can help each other out here.

Maybe the first thing to do would be to identify the metrics we're interested in? I'm not super familiar with prometheus but I would think before we want to monitor the metrics, we'd want to figure out what needs monitoring (maybe there's a Jenkins/Jenkins X precedent we can draw on :D?)

ghost · 2019-09-19T17:24:48Z

We had our first meeting regarding observability, specifically metrics, today and work is now underway. There are a couple of other issues that overlap in theme with this one. I am linking them together here for us to review later and figure out which to keep and which to close.

Related issues:
#164
#540
#855

Metrics Design Doc

Notes from the initial metrics meeting

Often, as a developer or administrator (ops), I want some insight into pipeline behavior: the time taken to execute a PipelineRun/TaskRun, its success or failure ratio, pod latencies, etc. At present Tekton Pipelines has very limited ways to surface such information, and it is hard to get those details by looking at resource YAMLs. This patch exposes the above-mentioned pipeline metrics on the '/metrics' endpoint using the knative `pkg/metrics` package. Users can collect these metrics using Prometheus, Stackdriver or another supported metrics system. To some extent it solves - tektoncd#540 - tektoncd#164


tekton-robot · 2020-08-12T18:16:27Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.

/lifecycle stale

Send feedback to tektoncd/plumbing.

bobcatfish · 2020-08-12T20:18:31Z

We haven't worked on this lately but it is an item in our roadmap and I think we should keep it open.

/lifecycle frozen

bobcatfish · 2020-11-11T17:29:14Z

I want to start gathering some requirements around this and get it moving :D

bobcatfish · 2020-11-13T21:28:19Z

#3521 has some use cases that we might be able to use

This PR starts a TEP to begin to measure Tekton Pipelines performance and address tektoncd/pipeline#540. This first iteration just tries to describe the problem rather than suggest a solution. It DOES recommend measuring SLOs and SLIs as a goal, which is kind of part of the solution, so if we think it's useful we could step back even further, but I think this is a reasonable path forward. Curious what other folks think!

mengjieli0726 · 2021-03-08T02:38:15Z

@bobcatfish, is there a Tekton performance white paper? So far, how many pipeline runs or runs can we support on a mid-sized cluster (for example: 1 master + 1 compute node)?
Node spec: 8 cores + 64 GB memory + 250 GB disk.

bobcatfish added the design ("This task is about creating and discussing a design") and meaty-juicy-coding-work ("This task is mostly about implementation!!! And docs and tests of course but that's a given") labels Feb 21, 2019

bobcatfish added this to the Pipelines 0.2 🎉 🎉 🎉 milestone Feb 21, 2019

bobcatfish added the okr ("This is for some internal Google project tracking") label Feb 21, 2019

bobcatfish removed this from the Pipelines 0.2 🎉 🎉 🎉 milestone Apr 25, 2019

This was referenced Sep 19, 2019

Tracking pipeline and task execution time as well as controller time #164

Closed

Observability aspects of CD pipelines #855

Closed

hrishin mentioned this issue Oct 7, 2019

Adds pipeline metrics 🔭 #1387

Merged

3 tasks

bobcatfish added this to Needs triage in Tekton Pipelines Feb 26, 2020

vdemeester moved this from Needs triage to Roadmap in Tekton Pipelines Mar 16, 2020

jlpettersson mentioned this issue Jun 15, 2020

Instrument Tekton resources for tracing #2814

Closed

tekton-robot added the lifecycle/stale ("Denotes an issue or PR has remained open with no activity and has become stale.") label Aug 12, 2020

tekton-robot added the lifecycle/frozen ("Indicates that an issue or PR should not be auto-closed due to staleness.") label and removed the lifecycle/stale ("Denotes an issue or PR has remained open with no activity and has become stale.") label Aug 12, 2020

bobcatfish added the area/roadmap ("Issues that are part of the project (or organization) roadmap (usually an epic)") label Aug 24, 2020

chmouel pushed a commit to chmouel/tektoncd-pipeline that referenced this issue Oct 7, 2020

Merge pull request tektoncd#540 from chmouel/update-apply-patch

8bd6b79

Use patch -p1 instead of git am to apply patch

bobcatfish self-assigned this Nov 11, 2020

bobcatfish mentioned this issue Nov 12, 2020

Setup monitoring components for infra clusters tektoncd/plumbing#235

Open

bobcatfish mentioned this issue Nov 20, 2020

TEP-0036: Start measuring Pipelines performance tektoncd/community#277

Merged

bobcatfish removed their assignment Jan 4, 2021

jerop added the priority/important-longterm ("Important over the long term, but may not be staffed and/or may need multiple releases to complete.") label Apr 20, 2021

afrittoli added the kind/feature ("Categorizes issue or PR as related to a new feature.") label Jun 17, 2021

This was referenced Apr 12, 2023

TEP-0036: Start Measuring Tekton Pipelines Performance tektoncd/community#602

Closed

Optimize Tekton Pipeline #5452

Open

afrittoli mentioned this issue Jul 25, 2023

Regression testing on a scheduled duration #6969

Open
