
Don't return HTTP 202 if a global interceptor fails #1465

Closed
Brutus5000 opened this issue Oct 18, 2022 · 13 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


Brutus5000 commented Oct 18, 2022

Expected Behavior

If a global interceptor fails to process (e.g. interceptor stopped trigger processing: rpc error: code = FailedPrecondition desc = event type is not allowed), the event listener HTTP response should return status code 400 Bad Request. This indicates to the user that something was wrong with their payload and (even more importantly) that the pipeline did not run at all.

Also, some artifact, or at least a log event at warn/error level, should be created to indicate that something went wrong.

Actual Behavior

  • The event listener returns status code 202 Accepted even if an interceptor fails.
  • The listener pod just logs the error at info level.
  • No pipeline run, artifact, or anything else is created.

Steps to Reproduce the Problem

  1. Deploy & create an event listener with an interceptor similar to the following:
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: listener-gitlab-webhook
spec:
  serviceAccountName: tekton-robot
  triggers:
    - name: gitlab-push-events-trigger
      interceptors:
        - name: "verify-gitlab-payload"
          ref:
            name: "gitlab"
            kind: ClusterInterceptor
          params:
            - name: secretRef
              value:
                secretName: "gitlab-secret"
                secretKey: "secretToken"
            - name: eventTypes
              value:
                - "Push Hook"
  2. Send an empty JSON object payload to the event listener, e.g. curl -v -H 'Content-Type: application/json' -d '{}' http://localhost:8080
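
For reference, a minimal sketch for observing the reported behavior. It assumes the EventListener above is reachable on localhost:8080 (e.g. via port-forward) and that the generated deployment follows the usual el-<EventListener name> naming; adjust names and namespace to your setup.

# Send the empty payload and print only the HTTP status code
# (202 is returned even though the interceptor rejects the event)
curl -s -o /dev/null -w '%{http_code}\n' -H 'Content-Type: application/json' -d '{}' http://localhost:8080

# The rejection only shows up in the listener pod log, at info level
kubectl logs deployment/el-listener-gitlab-webhook | grep -i "interceptor stopped trigger processing"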

Additional Info

  • Kubernetes version:

    Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"archive", BuildDate:"1980-01-01T00:00:00Z", GoVersion:"go1.17.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4+k3s1", GitCommit:"43b1cb48200d8f6af85c16ed944d68fcc96b6506", GitTreeState:"clean", BuildDate:"1970-01-01T01:01:01Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
  • Tekton Pipeline version:

    Output of tkn version or kubectl get pods -n tekton-pipelines -l app=tekton-pipelines-controller -o=jsonpath='{.items[0].metadata.labels.version}'

Client version: 0.23.1
Pipeline version: v0.40.2
Triggers version: v0.21.0
Dashboard version: v0.29.2
@Brutus5000 Brutus5000 added the kind/bug Categorizes issue or PR as related to a bug. label Oct 18, 2022
@khrm
Contributor

khrm commented Oct 18, 2022

We did this for performance reasons. Would it be OK if we just emit the log at error level?

@khrm
Contributor

khrm commented Oct 18, 2022

Also, we have k8s events as well as cloudevents.
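
For anyone trying to locate those, a hedged sketch (the namespace and the field selector are assumptions, adjust to your install):

# Kubernetes events emitted in the EventListener's namespace
kubectl get events -n <namespace> --sort-by=.lastTimestamp

# Optionally narrow it down to events attached to the EventListener object
kubectl get events -n <namespace> --field-selector involvedObject.kind=EventListener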

@Brutus5000
Author

Brutus5000 commented Oct 18, 2022

Ultimately the decision is up to you. I can only comment from both the user (app developer) and the admin (cluster operator) perspective, as I work in both worlds.

As an admin, knowing it shows up in k8s events and is logged at error level would be sufficient.
As a user this might not be sufficient, and I'd like to elaborate why:

Your answer "performance reasons" actually surprised me.

According to your mission statement, Tekton is positioned as a CI/CD solution. If someone (mis)uses it for other cases the reasoning might differ, but CI/CD it is. With CI/CD we are talking about long-running processes, usually taking multiple minutes if not longer. I'm not sure which scenarios you have in mind, but even on large projects I wouldn't expect multiple requests per second, rather multiple requests per minute, if that.

So as a user I don't really care whether my pipeline takes 0.1 seconds, 1 second, or 2 seconds to start.
But I do care that, when I set up a webhook from my VCS and did something wrong, it gives me a red marker telling me it doesn't work.

I don't know what kind of performance issues you faced when implementing it. To me it sounds like a classic case of premature optimization. But you are the experts; I can only give an opinion from an external perspective.

@dibyom
Member

dibyom commented Oct 28, 2022

the event listener HTTP response should return status code 400 Bad Request. This indicates to the user that something was wrong with their payload and (even more importantly) that the pipeline did not run at all.

From the example you provided, it does not necessarily mean that something was wrong with the payload - the payload itself may be fine, but the trigger author wanted to filter out that particular event type. One complication here is that there isn't a 1:1 mapping between an incoming event and a trigger - a single incoming event can fire multiple triggers.
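
To illustrate the fan-out point with a hypothetical sketch (trigger names are made up, bindings and templates elided): one EventListener can carry several triggers, each with its own interceptor chain, and a single webhook request is evaluated against all of them, so a single response code cannot express per-trigger success or failure.

spec:
  triggers:
    - name: gitlab-push-events-trigger   # hypothetical: only accepts "Push Hook" events
      # interceptors / bindings / template omitted
    - name: gitlab-tag-events-trigger    # hypothetical: only accepts "Tag Push Hook" events
      # interceptors / bindings / template omitted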

#931 and #1183 (comment) have some context on the issues we faced that led to the current design.

(One of the few times it might make sense to actually return a 4xx error is if the incoming payload is not valid JSON.)

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2023
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 25, 2023
@Brutus5000
Author

/remove-lifecycle rotten

I'd like to raise this issue once more, with a different scenario. Right now I'm working on an Azure Red Hat OpenShift cluster with Tekton installed (via the Red Hat OpenShift Pipelines operator).
On this particular cluster I'm not a cluster admin, just an admin for some namespaces. And again, due to incorrect YAMLs, the trigger did not work. But in that setup it is impossible to read any error message that is not part of the payload response.

I don't have a good suggestion for how to make this visible, but seeing failed EventListener requests somewhere (a new CRD?) would be really helpful in such a scenario.

@tekton-robot tekton-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 24, 2023
@khrm
Contributor

khrm commented Mar 25, 2023

What about the events that get generated? Are they sufficient for debugging?

@Brutus5000
Author

Brutus5000 commented Mar 28, 2023

I have not seen any events, but I suspect the OpenShift cluster itself is buggy, as I have trouble opening terminals and other things too.

@tekton-robot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 26, 2023
@tekton-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 26, 2023
@tekton-robot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
