Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chains should expose some more detailed metrics #1028

Open
jhutar opened this issue Jan 9, 2024 · 11 comments
Open

Chains should expose some more detailed metrics #1028

jhutar opened this issue Jan 9, 2024 · 11 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@jhutar
Copy link

jhutar commented Jan 9, 2024

Feature request

Hello and thank you for Chains development! It would be great if Chains would expose some more detailed metrics besides Knative controller default metrics. Maybe something like "how many artefacts are there still to be processed".

Use case

I'm trying to measure performance of Chains. Mine approach is to generate as many sign-able artefacts as possible and measure difference between when pod that created artefacts finished and when "chains.tekton.dev/signed": "true" annotation appeared.

This is hard to do, because TaskRun yaml does not seem to contain any indication of when the annotation was added, so I have to iterate frequently to get that number.

If there is a metric that could be used to observe the count of remaining images to sign, that would be great.

@jhutar jhutar added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 9, 2024
@lcarva
Copy link
Contributor

lcarva commented Jan 9, 2024

@jhutar, the immediate alternative that I can think of is through some clever parsing of the controller logs.

Adding the suggested metric seems reasonable to me. I just have no idea how to do so.

@wlynch, @chitrangpatel, are you guys aware of any existing mechanism that could help @jhutar?

@chitrangpatel
Copy link
Member

Could we add another annotation "chains.tekton.dev/signedAt": "YYYY-mm-dd HH:MM:SS"?

Another possible way:

The Taskrun status also has the following fields:

status:
  completionTime: "2024-01-08T19:16:46Z"
  conditions:
  - lastTransitionTime: "2024-01-08T19:16:46Z"
    message: All Steps have completed executing
    reason: Succeeded
    status: "True"
    type: Succeeded
  startTime: "2024-01-08T19:16:37Z"

I'm not sure if Chains could update the completionTime (or add field to the status) to when it signs the artifacts of that Task?

@wlynch
Copy link
Member

wlynch commented Jan 9, 2024

For tracking when an object was processed, +1 for using a Condition though we should create a new condition for chains, not reuse an existing pipelines condition. This would also give us a place to return debug information to address #984. If you're looking to generate periodic reports with scripts, this would be the minimum needed.

If real-time metrics are needed, custom metrics can be added with otel. See https://github.com/tektoncd/pipeline/blob/main/pkg/taskrunmetrics/metrics.go for an example for how this is done in Pipelines.

@lcarva
Copy link
Contributor

lcarva commented Jan 9, 2024

My only concern with using a condition is having to grant access to modify TaskRunStatus resources to the ServiceAccount used to run the Chains controller.

@wlynch
Copy link
Member

wlynch commented Jan 9, 2024

We already have update permissions in order to add the current annotation. (and even though we have both the normal object + status permissions, idk what happens if you try to update the status via the CRD object - one may be a subset of the other anyway 🤔 )

resources: ["tasks", "clustertasks", "taskruns", "pipelines", "pipelineruns", "pipelineresources", "conditions", "runs"]
verbs: ["get", "list", "create", "update", "delete", "patch", "watch"]
- apiGroups: ["tekton.dev"]
resources: ["taskruns/finalizers", "pipelineruns/finalizers", "runs/finalizers"]
verbs: ["get", "list", "create", "update", "delete", "patch", "watch"]
- apiGroups: ["tekton.dev"]
resources: ["tasks/status", "clustertasks/status", "taskruns/status", "pipelines/status", "pipelineruns/status", "pipelineresources/status", "runs/status"]
verbs: ["get", "list", "create", "update", "delete", "patch", "watch"]

We should probably look into reducing these - we don't need some of these update/delete permissions (but out of scope of this issue).

@khrm
Copy link
Contributor

khrm commented Jan 17, 2024

I think for telemetry as well as for performance testing, having otel metrics is sufficient. We can have a counter here.

@khrm
Copy link
Contributor

khrm commented Jan 17, 2024

I discussed this with @sudhishmk on how to do that. He will pick this up.

/assign @sudhishmk

@tekton-robot
Copy link

@khrm: GitHub didn't allow me to assign the following users: sudhishmk.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

I discussed this with @sudhishmk on how to do that. He will pick this up.

/assign @sudhishmk

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sudhishmk
Copy link
Contributor

/assign

@sudhishmk
Copy link
Contributor

Related PR

@khrm
Copy link
Contributor

khrm commented Feb 7, 2024

I we can close this. @jhutar I think #1034 resolves what you wanted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

7 participants