New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close and retry a RemoteSigner on err #2876

Closed
wants to merge 2 commits into
base: develop
from

Conversation

Projects
None yet
3 participants
@zmanian
Copy link
Contributor

zmanian commented Nov 18, 2018

This allows recovery if the RemoteSigner service is restarted etc without restarting the process.

Zaki Manian added some commits Nov 18, 2018

@zmanian zmanian requested review from ebuchman , melekes and xla as code owners Nov 18, 2018

@melekes
Copy link
Contributor

melekes left a comment

🥑

@@ -25,4 +25,6 @@ program](https://hackerone.com/tendermint).

### IMPROVEMENTS:

Retry RemoteSigner connections on Error

This comment has been minimized.

@melekes

melekes Nov 19, 2018

Contributor
Suggested change Beta
Retry RemoteSigner connections on Error
- [privval] retry RemoteSigner connections on Error
@@ -123,6 +123,9 @@ func (sc *TCPVal) OnStart() error {
"Ping",
"err", err,
)
sc.conn.Close()
sc.RemoteSignerClient = NewRemoteSignerClient(sc.conn)

This comment has been minimized.

@melekes

melekes Nov 19, 2018

Contributor

If we're closing the sc.conn, why do we create a new remote signer client with it? Shouldn't we call OnStop(), then OnStart() to wait for conn and recreate all necessary structures and goroutines?

Shouldn't we assert the err's type? We may want to shut everything down in case of ErrUnexpectedResponse, no?

This comment has been minimized.

@Liamsi

Liamsi Nov 27, 2018

Member

If we're closing the sc.conn, why do we create a new remote signer client with it? Shouldn't we call OnStop(), then OnStart() to wait for conn and recreate all necessary structures and goroutines?

I think that makes sense! Do we need to limit the recursion depth here though (as we are already in OnStart)?

Shouldn't we assert the err's type? We may want to shut everything down in case of ErrUnexpectedResponse, no?

Probably yes. ErrUnexpectedResponse means the other side responded but with another message than expected (not PingResponse).

This comment has been minimized.

@Liamsi

Liamsi Nov 27, 2018

Member

I'll continue here: #2923

This comment has been minimized.

@Liamsi

Liamsi Nov 27, 2018

Member

BTW calling OnStart won't be enough as this might time out (again) if the remote signer isn't up yet. We might need sth like retrying with exponential backoff.
Also, the whole ConsensusState shuts down if the remote signer drops (restarting the remote signer won't help with this): #2926

@Liamsi

This comment has been minimized.

Copy link
Member

Liamsi commented Nov 27, 2018

closing in favour of: #2923

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment