Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retriable error bug? #79

Closed
Neal opened this issue Sep 3, 2021 · 2 comments
Closed

retriable error bug? #79

Neal opened this issue Sep 3, 2021 · 2 comments

Comments

@Neal
Copy link
Contributor

Neal commented Sep 3, 2021

(apologies for the bad issue title, I don't have enough context yet so this is the best I could name the issue but feel free to update it)

so I noticed after bumping from v0.10.2 to v0.10.3, my producer was not working. I was using sync producer so it was just stuck until context got canceled. I figured to use the bench example to try reproducing and it was not producing anything as well:

$ ./bench -brokers localhost:61810 -topic test
0.00 MiB/s; 0.00k records/s
0.00 MiB/s; 0.00k records/s
0.00 MiB/s; 0.00k records/s

after some testing, I think there's something with the retriable code. this change makes it work:

diff --git a/pkg/kgo/client.go b/pkg/kgo/client.go
index 486708c..9db4d30 100644
--- a/pkg/kgo/client.go
+++ b/pkg/kgo/client.go
@@ -619,6 +619,7 @@ start:
 				// just retry now. If the error is *not* retriable but
 				// is a broker-specific network error, and the next
 				// broker is different than the current, we also retry.
+				fmt.Printf("retriable: err=%v retryErr=%v\n", err, retryErr)
 				if r.cl.shouldRetry(tries, err) || r.cl.shouldRetry(tries, retryErr) {
 					if r.cl.waitTries(ctx, tries) {
 						next, nextErr = r.br()
diff --git a/pkg/kgo/errors.go b/pkg/kgo/errors.go
index e4995bf..af17799 100644
--- a/pkg/kgo/errors.go
+++ b/pkg/kgo/errors.go
@@ -11,7 +11,7 @@ import (

 func isRetriableBrokerErr(err error) bool {
 	if err == nil { // sanity
-		return true
+		// return true
 	}
 	// https://github.com/golang/go/issues/45729
 	//
diff --git a/pkg/kgo/go115.go b/pkg/kgo/go115.go
index 634f131..473672c 100644
--- a/pkg/kgo/go115.go
+++ b/pkg/kgo/go115.go
@@ -6,5 +6,8 @@ package kgo
 import "strings"

 func isNetClosedErr(err error) bool {
+	if err == nil { // sanity
+		return false
+	}
 	return strings.Contains(err.Error(), "use of closed network connection")
 }
$ ./bench -brokers localhost:61810 -topic test
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
33.99 MiB/s; 356.41k records/s
12.99 MiB/s; 136.23k records/s
12.99 MiB/s; 136.23k records/s
38.98 MiB/s; 408.69k records/s
38.98 MiB/s; 408.69k records/s

I left the debug log and reverted the rest and this is what I get:

$ ./bench -brokers localhost:61810 -topic test
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
0.00 MiB/s; 0.00k records/s
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
0.00 MiB/s; 0.00k records/s
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>
0.00 MiB/s; 0.00k records/s
retriable: err=broker is too old; the broker has already indicated it will not know how to handle the request retryErr=<nil>

for what it's worth, I am using the latest redpanda:

$ rpk version
v21.8.1 (rev 3522ce7e9e4b9fbe62b9474f80138c4deeb8359b)
@twmb
Copy link
Owner

twmb commented Sep 3, 2021

Good find, this is from the idempotent request being retried.

In client.go,

 622                                 if r.cl.shouldRetry(tries, err) || r.cl.shouldRetry(tries, retryErr) {

we hit this case when one or the other error is non-nil, and shouldRetry previously returned false unless it was truly retriable. Now it returns true due to what you found.

I chose true because no error should mean retriable, but I'm going to change it to false because we specifically only want to evaluate if a non-nil error is retriable.

@twmb twmb closed this as completed in 1469495 Sep 3, 2021
@twmb
Copy link
Owner

twmb commented Sep 3, 2021

I've tagged the fix in v0.11.1. v1 should be released either tomorrow or Monday (unless you find a very tricky bug 😄).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants