Consensus Failure when remote signer connection drops #4275
Comments
Have been experiencing the same |
@franono can you attach here the whole log message you are displayed please? |
If it's the same as the one I experience above
|
I believe this is fixed in the current master (PR: #4534). That said, we can still create a backport fix, in which |
Refs #4707 |
Backporting the fix would be great @melekes |
By "I restart", do you mean a process supervisor (https://en.wikipedia.org/wiki/Process_supervision; systemd, runit, etc.) restarts it, or do you restart gaiad by hand? |
I built Gaiad v2.0.8 using Tendermint branch release/v0.32.11 on a testnet validator which is using a KMS signer. Disrupted the KMS signer by:
Throughout the tests, gaiad did not crash. It had a few signer pong timeouts; however, it continued signing once the KMS was available again. All looks good, thanks! |
@marbar3778 let's get on with respective releases for both cosmos-sdk and gaia! |
* privval: retry GetPubKey, SignVote/Proposal indefinitely Fixes #4275
Fixed in v0.32.11. Thanks everyone ❤️ Related issue on master: #4707 |
Tendermint version
Environment:
gaiad in alpine based docker container
tendermint/kms on Ubuntu 16.04
What happened:
Duplicate of closed issue: 2926. I am still experiencing this issue.
What you expected to happen:
If my KMS service temporarily disconnects (which seems to happen at least once a week), my gaiad instance experiences a consensus failure. Afterwards, it does not resolve/reconnect unless I restart my gaiad instance.
Have you tried the latest version: yes/no
Yes. I am using the latest versions of gaia (2.0.3) and tmkms (0.7.0).
How to reproduce it (as minimally and precisely as possible):
Start gaiad, start tmkms, shut off tmkms for a few seconds.
Logs (paste a small part showing an error (< 10 lines) or link a pastebin, gist, etc. containing more of the log file):
Config (you can paste only the changes you've made):
n/a
node command runtime flags:
n/a
/dump_consensus_state output for consensus bugs:
appears normal
Anything else we need to know:
I've read in previous related issues that the consensus failure is related to the timeout_propose flag. I have left that at the default value of 3s. Is this supposed to be tweaked when using a remote signer?