ClusterClient does not retry TxPipeline on EOF when reading response.

A connection handling a `TxPipeline` is occasionally closed unexpectedly, for example because the remote server shut down. The pool tries to discard bad idle connections, but that check cannot catch every case: the connection can be closed after the pool hands it out, while it is in use. In my experience the resulting EOF does not surface while writing the commands to the connection, only when attempting to read the result.



## Expected Behavior

When using a `ClusterClient.TxPipeline`, pipelines should be retried when an EOF (or another retryable error) is returned while reading the pipeline responses.



## Current Behavior

`ClusterClient.TxPipeline` will retry if an EOF is observed while writing the commands to the connection: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1497-L1505

However, no such retry happens when an EOF is observed while reading the response from the connection in https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1508-L1525 and https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1537-L1547.

In particular, there may be an EOF in these circumstances:

- Instead of the OK response to `MULTI`: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1537
- Instead of the QUEUED response to the pipelined commands: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1542
- Instead of the response to `EXEC`: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1550
- While reading the actual pipelined command responses: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1525

Of these, only the last case handles retryable errors and updates `failedCmds` to trigger the retry machinery: https://github.com/redis/go-redis/blob/f3fe61148b2b8fe0a669dc23620690407f5f92af/osscluster.go#L1354-L1360



## Possible Solution

Update `txPipelineReadQueued` and the error handling for it in `processTxPipelineNodeConn` to account for retryable errors and update the `failedCmds` parameter appropriately.

## Steps to Reproduce

Let me know if there's appetite for addressing this and I can work on a repro.





## Context (Environment)

Redis Server: 7.1
go-redis client: 9.5.1

## Detailed Description



## Possible Implementation

	if err := cn.WithWriter(c.context(ctx), c.opt.WriteTimeout, func(wr *proto.Writer) error {
		return writeCmds(wr, cmds)
	}); err != nil {
		if shouldRetry(err, true) {
			_ = c.mapCmdsByNode(ctx, failedCmds, cmds)
		}
		setCmdsErr(cmds, err)
		return err
	}

	statusCmd := cmds[0].(*StatusCmd)
	// Trim multi and exec.
	trimmedCmds := cmds[1 : len(cmds)-1]

	if err := c.txPipelineReadQueued(
		ctx, rd, statusCmd, trimmedCmds, failedCmds,
	); err != nil {
		setCmdsErr(cmds, err)

		moved, ask, addr := isMovedError(err)
		if moved || ask {
			return c.cmdsMoved(ctx, trimmedCmds, moved, ask, addr, failedCmds)
		}

		return err
	}

	return pipelineReadCmds(rd, trimmedCmds)

	if err := statusCmd.readReply(rd); err != nil {
		return err
	}

	for _, cmd := range cmds {
		err := statusCmd.readReply(rd)
		if err == nil || c.checkMovedErr(ctx, cmd, err, failedCmds) || isRedisError(err) {
			continue
		}
		return err
	}

	if !isRedisError(err) {
		if shouldRetry(err, true) {
			_ = c.mapCmdsByNode(ctx, failedCmds, cmds)
		}
		setCmdsErr(cmds[i+1:], err)
		return err
	}
