
Forcing eventlistener sink to resolve eventlistener #977

Merged (2 commits) on Mar 5, 2021

Conversation

@jmcshane (Contributor) commented Mar 2, 2021

Changes

This change forces the eventlistenersink process to be able to resolve
the EventListener before starting the HTTP server.

This means that the sink HTTP server will not start listening (and the readiness
probe will not pass) until the EventListenerLister is able to resolve
the EventListener from the API server.

This is especially useful during startup, but it can also help if the pod is
started without permission to read the EventListener object. In that situation,
with this change, the EventListener pod will restart with a logged error message
about the lack of access to that specific API resource.
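
As a rough illustration of the ordering (names here are hypothetical, not the exact code in this PR), the sink resolves the EventListener before binding the HTTP listener, so the readiness probe cannot pass until resolution succeeds:

package main

import (
    "fmt"
    "log"
    "net/http"
)

// resolveEventListener stands in for the lister/API-server lookup added in
// this PR; the real implementation retries with a backoff (see the backoff
// discussion below).
func resolveEventListener() error { return nil } // stub for illustration

func main() {
    // Block before opening the listening socket: if the EventListener cannot
    // be resolved, the process exits and the pod restarts with a logged error.
    if err := resolveEventListener(); err != nil {
        log.Fatalf("unable to resolve EventListener: %v", err)
    }
    fmt.Println("EventListener resolved; starting sink HTTP server")
    log.Fatal(http.ListenAndServe(":8080", http.NotFoundHandler()))
}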

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Release Notes

Force eventlistener pod to resolve EventListener resource before startup

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Mar 2, 2021
@tekton-robot tekton-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 2, 2021
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/sink/initialization.go 53.3% 32.0% -21.3

@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 2, 2021
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/sink/initialization.go 53.3% 58.3% 5.0

Duration: 50 * time.Millisecond,
Factor: 2.0,
Jitter: 0.3,
Steps: 10,
Member:

If I'm understanding this correctly, it will try a max of 10 times, right? (Not try for a max of 5 seconds?)

Contributor Author:

So, it's a little of both. Steps is the max number of attempts; if it ever hits the cap it sets the wait time to that amount and then bails out.

In this case, the first wait time is 50 milliseconds with a scale factor of 2 and a jitter of 0.3. The jitter means the scale factor will be somewhere between 1.7 and 2.3. This means that after 10 steps it waits somewhere around 10 seconds max before it fails.
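
(For reference, the snippet under discussion configures wait.Backoff from k8s.io/apimachinery; a minimal sketch of how it is typically driven, with an illustrative resolve callback rather than the exact Triggers code:)

package sink

import (
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

func waitForEventListener(resolve func() error) error {
    backoff := wait.Backoff{
        Duration: 50 * time.Millisecond, // first wait
        Factor:   2.0,                   // each subsequent wait doubles
        Jitter:   0.3,                   // each wait is randomized by up to +30%
        Steps:    10,                    // at most 10 attempts before giving up
    }
    // ExponentialBackoff calls the condition up to Steps times, sleeping the
    // growing duration between attempts; it returns an error if the condition
    // never reports done.
    return wait.ExponentialBackoff(backoff, func() (bool, error) {
        if err := resolve(); err != nil {
            return false, nil // not resolvable yet; retry after the next wait
        }
        return true, nil
    })
}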

Member:

makes sense...do you think 10 seconds is quick enough? I was thinking 30s might be a better default?

Contributor Author:

Let me increase the factor and the number of steps. Again, this is basically a question of:

how long will it take for the k8s API server to commit the EventListener?

I think closer to 15-20 seconds is appropriate, especially considering that this is a function that blocks the startup of the HTTP endpoint. Let me modify the factor so it ends up around this value.

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/sink/initialization.go 53.3% 54.5% 1.2

@dibyom (Member) left a comment

Just two minor comments. Otherwise good to go!

pkg/sink/initialization.go (outdated; resolved)
r.WaitForEventListenerOrDie()
}
cmd := exec.Command(os.Args[0], "-test.run=TestWaitForEventlistener_Fatal") //nolint:gosec
cmd.Env = append(os.Environ(), "EL_FATAL_CRASH=1")
Member:

this is clever...though we might be able to get away with passing in a test logger and capturing its output and then comparing (like we do here:

logger := zaptest.NewLogger(t, zaptest.WrapOptions(zap.WrapCore(func(zapcore.Core) zapcore.Core { return core }))).Sugar()
sink.Logger = logger
ts := httptest.NewServer(http.HandlerFunc(sink.HandleEvent))
defer ts.Close()
resp, err := http.Post(ts.URL, "application/json", bytes.NewReader(tc.eventBody))
if err != nil {
    t.Fatalf("error making request to eventListener: %s", err)
}
if resp.StatusCode != tc.wantStatusCode {
    t.Fatalf("Status code mismatch: got %d, want %d", resp.StatusCode, http.StatusInternalServerError)
}
if tc.wantErrLogMsg != "" {
    matches := logs.FilterMessage(tc.wantErrLogMsg)
    if matches == nil || matches.Len() == 0 {
        t.Fatalf("did not find log entry: %s.\n Logs are: %v", tc.wantErrLogMsg, logs.All())
    }
}

What do you think?

Another option would be to have the function return an error and then have main.go call the log.Fatal.
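
(The core and logs values referenced in the snippet above come from zap's test observer; a minimal sketch of that setup, assumed here rather than copied from the Triggers tests:)

import (
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
    "go.uber.org/zap/zaptest"
    "go.uber.org/zap/zaptest/observer"
)

// Inside the test (t *testing.T): core records every log entry, and logs lets
// the test query them afterwards (e.g. logs.FilterMessage / logs.All above).
core, logs := observer.New(zapcore.DebugLevel)
logger := zaptest.NewLogger(t, zaptest.WrapOptions(zap.WrapCore(func(zapcore.Core) zapcore.Core { return core }))).Sugar()
sink.Logger = logger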

Contributor Author:

yes, let me take a look at this strategy and try to implement it here

Contributor Author:

So, it doesn't look like this is possible, because the behavior of Fatal can't be overridden even though there is a no-op action in the library; basically, it's not a supported action on Fatal.

I updated this function to return an error and panic instead.

@dibyom dibyom added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 4, 2021
@jmcshane jmcshane mentioned this pull request Mar 5, 2021
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-triggers-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/sink/initialization.go 53.3% 65.2% 11.9

@dibyom (Member) commented Mar 5, 2021

/approve

Just one minor thing -> panic to logger.Fatal in main
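
(A rough sketch of the shape being asked for here, with hypothetical names rather than the exact PR code: the wait helper returns an error so tests can assert on it, and main turns that error into a fatal log instead of a panic:)

// In main.go (e.g. cmd/eventlistenersink): the caller decides how to die.
if err := sink.WaitForEventListener(); err != nil {
    logger.Fatalf("unable to resolve EventListener: %v", err)
}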

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dibyom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 5, 2021
@dibyom (Member) commented Mar 5, 2021

/test pull-tekton-triggers-integration-tests

@dibyom (Member) commented Mar 5, 2021

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 5, 2021
@tekton-robot tekton-robot merged commit 42be9fa into tektoncd:master Mar 5, 2021
dibyom added a commit to dibyom/triggers that referenced this pull request Mar 19, 2021
Previously, we had a race where the EL would start serving traffic
before the lister caches were synced. This leads to the intermittent
resolution issues described in tektoncd#896. tektoncd#977 was an attempt to fix this for
the EventListener resource. This commit fixes it for all resource types
by first registering all the listers, and then syncing the cache before
serving traffic.

Tested by modifying the e2e test to run the intermittently failing test
10 times. Without this fix it fails; with it, it does not (see tektoncd#1012).

Fixes tektoncd#896

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>
tekton-robot pushed a commit that referenced this pull request Mar 25, 2021
Previously, we had a race where the EL would start serving traffic
before the lister caches were synced. This leads to the intermittent
resolution issues described in #896. #977 was an attempt to fix this for
the EventListener resource. This commit fixes it for all resource types
by first registering all the listers, and then syncing the cache before
serving traffic. We will wait for the cache to sync for 1 minute before timing
out.

Tested by modifying the e2e test to run the intermittently failing test 10
times. Without this fix it fails; with it, it does not (see #1012).

Fixes #896

Signed-off-by: Dibyo Mukherjee <dibyo@google.com>
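
(A minimal sketch of that register-listers-then-sync pattern with client-go, assumed here for illustration; the Pods informer stands in for the Triggers resources' generated informers:)

package sink

import (
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

func startWithSyncedCaches(client kubernetes.Interface, serve func()) {
    factory := informers.NewSharedInformerFactory(client, 30*time.Second)

    // Register every informer/lister the sink will read from *before* starting.
    podsSynced := factory.Core().V1().Pods().Informer().HasSynced

    stopCh := make(chan struct{})
    factory.Start(stopCh)

    // Block until all registered caches have synced (the follow-up commit bounds
    // this wait with a timeout), and only then start serving traffic.
    if !cache.WaitForCacheSync(stopCh, podsSynced) {
        panic("timed out waiting for informer caches to sync")
    }
    serve()
}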