Description
A connection handling a TxPipeline is occasionally closed unexpectedly (for example, because the remote server shut down). The pool tries to discard bad idle connections, but this validation cannot be fully reliable, since a connection can be closed after it has been returned from the pool and while it is in use. In my experience, the resulting EOF error surfaces not while writing the commands to the connection but only when attempting to read the result.
Expected Behavior
When using a ClusterClient.TxPipeline, pipelines should be retried when an EOF (or another retryable error) is returned while reading the pipeline responses.
Current Behavior
ClusterClient.TxPipeline will retry if an EOF is observed while writing the commands to the connection:
Lines 1497 to 1505 in f3fe611
However, retryable errors are not handled consistently when reading the response from the connection:
Lines 1508 to 1525 in f3fe611
Lines 1537 to 1547 in f3fe611
In particular, there may be an EOF in these circumstances:
- Instead of the OK response to MULTI: Line 1537 in f3fe611
- Instead of the QUEUED response to the pipelined commands: Line 1542 in f3fe611
- Instead of the response to EXEC: Line 1550 in f3fe611
- While reading the actual pipelined command responses: Line 1525 in f3fe611
However, only the last case handles retryable errors and updates failedCmds to trigger the retry machinery:
Lines 1354 to 1360 in f3fe611
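To make the four read stages listed above concrete, here is a minimal sketch that replays a truncated reply stream and reports the stage at which the EOF appears. It does not use go-redis at all: `eofStage` is a hypothetical helper, the stream is a hand-written RESP transcript, and each command result is assumed to be a single-line reply.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// eofStage reads the reply lines a server would send for MULTI, n queued
// commands, and EXEC, and names the stage at which the stream ended.
// It returns "" when every expected reply line was present.
// Simplification: each command result is assumed to be a single-line reply.
func eofStage(stream string, n int) string {
	rd := bufio.NewReader(strings.NewReader(stream))

	// Expected order: +OK for MULTI, n x +QUEUED, the EXEC array header,
	// then one result per command.
	stages := make([]string, 0, 2*n+2)
	stages = append(stages, "MULTI")
	for i := 0; i < n; i++ {
		stages = append(stages, "QUEUED")
	}
	stages = append(stages, "EXEC")
	for i := 0; i < n; i++ {
		stages = append(stages, "RESULT")
	}

	for _, stage := range stages {
		// ReadString returns io.EOF when the stream ends before '\n'.
		if _, err := rd.ReadString('\n'); err == io.EOF {
			return stage
		}
	}
	return ""
}

func main() {
	full := "+OK\r\n+QUEUED\r\n+QUEUED\r\n*2\r\n+OK\r\n:1\r\n"
	// Cut the stream at each reply boundary to simulate the peer closing.
	for _, cut := range []int{0, 5, 23, 27, len(full)} {
		fmt.Printf("%d bytes received: EOF at %q\n", cut, eofStage(full[:cut], 2))
	}
}
```

In this sketch only the "RESULT" stage corresponds to the case that currently feeds the retry machinery; the other three are the ones this issue is about.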
Possible Solution
Update txPipelineReadQueued and the error handling for it in processTxPipelineNodeConn to account for retryable errors and update the failedCmds parameter appropriately.
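A rough sketch of the idea, not a patch against go-redis: the types `cmd` and `failedCmds`, and the functions `isRetryable`, `readQueued`, `processReplies` and `retryCount`, are all hypothetical stand-ins written for illustration. The assumption is that a retryable error seen while reading the MULTI, QUEUED, or EXEC header replies should mark every command in the transaction for retry, while a server error reply should not.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// cmd and failedCmds are hypothetical stand-ins, not go-redis types.
type cmd struct {
	name string
	err  error
}

type failedCmds struct{ cmds []*cmd }

func (f *failedCmds) add(cmds ...*cmd) { f.cmds = append(f.cmds, cmds...) }

// isRetryable is an assumed classification: only EOF variants count here.
func isRetryable(err error) bool {
	return errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF)
}

// readQueued consumes the +OK reply to MULTI, one +QUEUED reply per
// command, and the EXEC array header. It returns the first error seen.
func readQueued(rd *bufio.Reader, n int) error {
	for i := 0; i < n+2; i++ {
		line, err := rd.ReadString('\n')
		if err != nil {
			return err
		}
		if strings.HasPrefix(line, "-") {
			// A server error reply is not a connection failure.
			return errors.New(strings.TrimSpace(line[1:]))
		}
	}
	return nil
}

// processReplies records the error on every command and, if the error is
// retryable, adds the whole transaction to failed so it can be re-sent.
func processReplies(rd *bufio.Reader, cmds []*cmd, failed *failedCmds) error {
	err := readQueued(rd, len(cmds))
	if err == nil {
		return nil
	}
	for _, c := range cmds {
		c.err = err
	}
	if isRetryable(err) {
		failed.add(cmds...)
	}
	return err
}

// retryCount reports how many of n commands would be scheduled for retry
// given the reply stream received before the connection ended.
func retryCount(stream string, n int) int {
	cmds := make([]*cmd, n)
	for i := range cmds {
		cmds[i] = &cmd{name: fmt.Sprintf("cmd%d", i)}
	}
	var failed failedCmds
	_ = processReplies(bufio.NewReader(strings.NewReader(stream)), cmds, &failed)
	return len(failed.cmds)
}

func main() {
	fmt.Println(retryCount("+OK\r\n+QUEUED\r\n", 2))                  // stream ends early
	fmt.Println(retryCount("+OK\r\n-ERR unknown command\r\n", 2))     // server error reply
	fmt.Println(retryCount("+OK\r\n+QUEUED\r\n+QUEUED\r\n*2\r\n", 2)) // all headers read
}
```

The transaction is retried as a unit because a partially queued MULTI block cannot be resumed on a new connection. Whether a given error is safe to retry after EXEC may already have been sent is a separate question that this sketch does not address.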
Steps to Reproduce
Let me know if there's appetite for addressing this and I can work on a repro.
Context (Environment)
Redis Server: 7.1
go-redis client: 9.5.1