Skip to content

Getting $PoolSize connection errors in a row permanently breaks a redis client #3062

Closed
@philandstuff

Description

@philandstuff

Expected Behavior

A flurry of transient connection errors should not result in a client getting into a permanently broken state. At some point, it should try a new connection.

Current Behavior

If you have a redis.Client with a PoolSize of, say, 5, and you get 5 failed connections in a row without an intervening successful connection, every subsequent call to that redis client will fail. It will return a stored error without trying to make a new connection.

Possible Solution

See "Possible Implementation" below.

Steps to Reproduce

Reproduction courtesy of my colleague @nickstenning: https://gist.github.com/nickstenning/e9479b62b3609927e27dc95e05385250

In this repro, we create a client with a PoolSize of 2. It then experiences two failed connections in a row, and from that point forward never attempts to make a new connection again, only returns the last stored connection error.

Context (Environment)

This affected us because we had PoolSize=2 (which makes sense for our use case where we have mostly single-threaded access to the redis.Client), we saw 2 connection errors in a row, and then our app got stuck in a tight spinloop where it was repeatedly calling redis and getting an error back. Our expectation was that calling the redis client would perform some sort of network operation, so it was surprising that it did not.

We were running go-redis v9.5.3 (which has disappeared, like v9.5.4 - see #3057).

Detailed Description

The relevant code is here: https://github.com/redis/go-redis/blob/v9.5.2/internal/pool/pool.go#L209-L220

In particular, once p.dialErrorsNum reaches p.cfg.PoolSize, dialConn() will always return an error without attempting to create a connection.

Possible Implementation

Remove the code that's counting and storing errors. It's not clear to me what the intention of this code is, but I don't think getting the client into a permanently broken state like this is a good idea.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions