invalid consensus message: missing blockId #130

Closed
tarcieri opened this Issue Dec 1, 2018 · 7 comments

Collaborator

tarcieri commented Dec 1, 2018

Encountered this error using tmkms live on gaia-9002 (first time I've been able to use it on a testnet):

22:48:06 [DEBUG] tmkms::session: started handling request ...
22:48:06 [ERROR] [gaia-9002@tcp://10.10.8.10:5007] invalid consensus message: missing blockId
22:48:11 [INFO] KMS node ID: 1B234585630E2CF85D7972D91FE8D849F911022C
22:48:11 [DEBUG] tmkms::session: gaia-9002: Connecting to 10.10.8.10:5007...

FWIW, I've been planning to move tmkms to structured logging (via slog), which I think would help in debugging these sorts of issues.

Collaborator

tarcieri commented Dec 1, 2018

Here's the corresponding error on the gaiad/tendermint side:

Dec  1 23:20:16 cosmos01 gaiad: I[1126-12-01|23:20:16.831] Updates to validators                        module=state updates=5E8673673E37450F01B64138FBF4B172BDB52D68:40015556314313,3363E8F97B02ECC00289E72173D827543047ACDA:1040547,E39B5778E0D4297A46F45CF6940B0A163A430FF8:69145,5FBB7871FCE8E33E2899C140FDEF7F95C92AF3FB:13132,4C3CCEB2273E0FE4E5746E1317BC41A00A636F48:12850
Dec  1 23:20:16 cosmos01 gaiad: I[1126-12-01|23:20:16.836] Committed state                              module=state height=30057 txs=6 appHash=FADA718C1B97B9C8F2D43F72063EE6D17FAE42AAF2AEE46FD91AE3545856B70D
Dec  1 23:20:17 cosmos01 gaiad: E[1126-12-01|23:20:17.319] Error signing vote                           module=consensus height=30058 round=0 vote="Vote{60:E307483A08C3 30058/00/2(Precommit) 000000000000 000000000000 @ 2018-12-01T23:20:17.319362831Z}" err=EOF
Dec  1 23:20:17 cosmos01 gaiad: I[1126-12-01|23:20:17.710] Executed block                               module=state height=30058 validTxs=0 invalidTxs=0
Dec  1 23:20:17 cosmos01 gaiad: I[1126-12-01|23:20:17.717] Committed state                              module=state height=30058 txs=0 appHash=3768ECCAF55DD8B778AE498B1B1892BE3BDCC4478BD7A76C851C9164D1D728C8
Dec  1 23:20:17 cosmos01 gaiad: E[1126-12-01|23:20:17.951] CONSENSUS FAILURE!!!                         module=consensus err=EOF stack="goroutine 11547 [running]:\nruntime/debug.Stack(0xc00212d2d8, 0xe63f00, 0xc000062040)\n\t/usr/lib/golang/src/runtime/debug/stack.go:24 +0xa7\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine.func2(0xc00007e700, 0x1117ab8)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:576 +0x57\npanic(0xe63f00, 0xc000062040)\n\t/usr/lib/golang/src/runtime/panic.go:513 +0x1b9\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/privval.(*RemoteSignerClient).GetAddress(0xc0000f0100, 0x0, 0x0, 0x0)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/privval/remote_signer.go:39 +0x91\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterPropose(0xc00007e700, 0x756b, 0x0)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:832 +0x5bd\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterNewRound(0xc00007e700, 0x756b, 0x0)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:782 +0x76e\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc00007e700, 0xc003f14140, 0xc000822060, 0x28, 0x1c227e0, 0xc00212dad8, 0x44021d)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1619 +0xb96\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc00007e700, 0xc003f14140, 0xc000822060, 0x28, 0xc003bdde60, 0x109, 
0x109)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1469 +0x59\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc00007e700, 0x11ff1a0, 0xc00424a690, 0xc000822060, 0x28)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:650 +0x696\ngithub.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc00007e700, 0x0)\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:607 +0x670\ncreated by github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart\n\t/builder/home/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:300 +0x132\n"

(Edit: This particular trace doesn't appear that helpful. I will post a more detailed trace privately)

ebuchman commented Dec 2, 2018

It looks like the connection may have broken? I don't think Tendermint supports reconnecting to the KMS properly yet.

Collaborator

tarcieri commented Dec 2, 2018

@ebuchman yeah, the KMS is closing the connection after tmkms errors out with:

[ERROR] [gaia-9002@tcp://10.10.8.10:5007] invalid consensus message: missing blockId

This is also reproducible now whenever I start both gaiad and tmkms in their current states, so let me know if there's anything I can test.

Collaborator

zmanian commented Dec 2, 2018

What I think happened is that Tendermint tried to get a signature for a nil vote and the message validation rules on the KMS side rejected it.

Do we sign a message without a blockid when we vote nil?

ebuchman commented Dec 2, 2018

Yep, we sign BlockID{nil, PartSetHeader{}}, which encodes as [], so it's empty in the vote.
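
To illustrate why a nil vote can reach the signer with no block ID bytes at all, here is a minimal Rust sketch of a protobuf-style encoder that omits zero-valued fields. The types, field numbers, and `encode`/`put_bytes` helpers are assumptions made up for illustration; this is not the actual Amino codec, and it only handles lengths and values below 128.

```rust
/// Hypothetical, simplified stand-ins for Tendermint's types.
#[derive(Default)]
struct PartSetHeader {
    total: u64,
    hash: Vec<u8>,
}

#[derive(Default)]
struct BlockId {
    hash: Vec<u8>,
    parts_header: PartSetHeader,
}

/// Append a length-delimited field (wire type 2), skipping empty payloads.
/// Assumes payloads shorter than 128 bytes so the length fits in one byte.
fn put_bytes(out: &mut Vec<u8>, field: u8, payload: &[u8]) {
    if payload.is_empty() {
        return; // zero-valued fields are omitted entirely
    }
    assert!(payload.len() < 128);
    out.push((field << 3) | 2);
    out.push(payload.len() as u8);
    out.extend_from_slice(payload);
}

impl PartSetHeader {
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        if self.total != 0 {
            // Varint field (wire type 0); assumes total < 128.
            assert!(self.total < 128);
            out.push(1 << 3);
            out.push(self.total as u8);
        }
        put_bytes(&mut out, 2, &self.hash);
        out
    }
}

impl BlockId {
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        put_bytes(&mut out, 1, &self.hash);
        // An all-zero PartSetHeader encodes to nothing, so it is skipped too.
        put_bytes(&mut out, 2, &self.parts_header.encode());
        out
    }
}
```

Under these assumptions, `BlockId::default().encode()` yields an empty byte vector, so a decoder on the KMS side would see the block ID field as absent rather than as a present-but-empty value.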

Collaborator

zmanian commented Dec 2, 2018

My interpretation is that
https://github.com/tendermint/kms/blob/master/tendermint-rs/src/amino_types/vote.rs#L177
should not be returning an error.

Member

Liamsi commented Dec 2, 2018

Oh OK! Because of https://github.com/tendermint/tendermint/blob/c4d93fd27b5b2b9785f47a71edf5eba569411a72/types/vote.go#L127 I (wrongly) thought the BlockID would never be empty (nil). I'll remove that check from the validation method. Thank you all!
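
For readers following along, the shape of such a fix can be sketched roughly as below. This is a hedged illustration, not the actual tendermint-rs code: the `Vote`, `BlockId`, and `ValidationError` types and the `validate_basic` method are hypothetical, and the rule that a present block ID must carry a non-empty hash is an assumption.

```rust
/// Hypothetical error type; not the actual tmkms definitions.
#[derive(Debug, PartialEq)]
enum ValidationError {
    NegativeHeight,
    NegativeRound,
    EmptyBlockIdHash,
}

/// Hypothetical, simplified block ID.
struct BlockId {
    hash: Vec<u8>,
}

/// Hypothetical, simplified vote.
struct Vote {
    height: i64,
    round: i64,
    /// `None` models a nil vote, where the block ID encodes to nothing.
    block_id: Option<BlockId>,
}

impl Vote {
    fn validate_basic(&self) -> Result<(), ValidationError> {
        if self.height < 0 {
            return Err(ValidationError::NegativeHeight);
        }
        if self.round < 0 {
            return Err(ValidationError::NegativeRound);
        }
        // A missing block ID is a legitimate nil vote, so it is not an error.
        // Only inspect the hash when a block ID is actually present
        // (the non-empty requirement is an assumption of this sketch).
        if let Some(block_id) = &self.block_id {
            if block_id.hash.is_empty() {
                return Err(ValidationError::EmptyBlockIdHash);
            }
        }
        Ok(())
    }
}
```

With this shape, a precommit for nil such as `Vote { height: 30058, round: 0, block_id: None }` passes validation and can be signed, instead of causing the signer to drop the connection.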

Liamsi added a commit that referenced this issue Dec 2, 2018

more idiomatic code checking for block_id
 - also remove obsolete TODO: empty PartsSetHeader is OK
 (#130 (comment))

Liamsi closed this in #131 Dec 2, 2018