Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected log output - appears to double sign #314

Closed
mdyring opened this issue Jul 26, 2019 · 14 comments

Comments

@mdyring
Copy link
Contributor

commented Jul 26, 2019

We're testing an active/active setup of two validators connecting to a single tmkms v0.6.0-rc1 process on the gaia-13004 testnet.

AFAICS in the source, "signed at height" is only logged after a signature is created:

self.log_signing_request(&request, started_at);

So it appears tmkms is double signing for the same chain id?

tmkms[9826]: 12:49:39 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height: 55157 (0 ms)
tmkms[9826]: 12:49:39 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height: 55157 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height: 55158 (0 ms)
tmkms[9826]: 12:49:45 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height: 55158 (0 ms)

Our validator is not getting slashed so far.. Maybe because we have yet to propose a block and have been lucky so far (prevoting/commiting on same data on both vals).

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

I should probably have it print out the entire consensus state here rather than just the height. There's also the round and step to consider, which is my guess of what's happening here.

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

PR to add height/round/step logging here: https://github.com/tendermint/kms/pull/316/files

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

@mdyring that's merged to master now... are you capable of building from that to test this out again?

@mdyring

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2019

Thanks, I've tested with master and it seems something is fishy:

tmkms[26754]: 20:25:02 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:02 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59807/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:09 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59808/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.156.81.246:26659] signed "PreCommit" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:15 [info] [gaia-13004@35.158.126.240:26659] signed "PreCommit" at height/round/step: 59809/0/6 (0 ms)
tmkms[26754]: 20:25:20 [info] [gaia-13004@35.156.81.246:26659] signed "PreVote" at height/round/step: 59810/0/6 (0 ms)
tmkms[26754]: 20:25:20 [info] [gaia-13004@35.158.126.240:26659] signed "PreVote" at height/round/step: 59810/0/6 (0 ms)

I am testing this with softsign, but don't believe it makes a difference. Also happy to share config if you'd like.

@mdyring

This comment has been minimized.

Copy link
Contributor Author

commented Jul 26, 2019

Interestingly, on our current production setup, with a single connection from kms -> val, we see two log files for each block - so also not a line for each step as you'd expect:

tmkms[25012]: 20:31:24 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192058 (142 ms)
tmkms[25012]: 20:31:25 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192058 (142 ms)
tmkms[25012]: 20:31:31 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192059 (142 ms)
tmkms[25012]: 20:31:32 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192059 (142 ms)
tmkms[25012]: 20:31:38 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreVote" at height: 1192060 (142 ms)
tmkms[25012]: 20:31:38 [info] [cosmoshub-2@x.x.x.x:26659] signed "PreCommit" at height: 1192060 (142 ms)
@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

I'm now guessing both validators are submitting an identical block and requesting it be signed, which is allowed under the current logic (and results in an identical signature, as Ed25519 is deterministic):

https://github.com/tendermint/kms/blob/master/src/chain/state.rs#L97

I can add the block ID being signed to the log output to confirm this.

I think this should be allowed, since it tolerates faults where the KMS signs a block for a validator which goes down. If another validator asks to sign the same block as the original, and the signature algorithm is deterministic, it's really just recomputing the original signature.

Interestingly, on our current production setup, with a single connection from kms -> val, we see two log files for each block

Those are the steps of Tendermint consensus: PreVote -> PreCommit -> Commit (where the first two each require a signature)

@tarcieri tarcieri added the security label Jul 26, 2019

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

PR to add block ID logging and a [dup] flag when the KMS resigns the same block here: #317

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

@mdyring want to try it out again with the logging changes from #317 and see what the output is? They now look something like this and will add a [dup] flag when the KMS recomputes the same signature as the last block it signed:

22:47:37 [info] [test_chain_id@127.0.0.1:60313] signed PreVote:736F6D65 h/r/s:12345/2/6 (0 ms)
@mdyring

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2019

Happy to report it is as guessed (identical blocks) :-)

Minor detail: it shows [dup] for same PreVote/PreCommit. Since block id is now logged, I think we can safely remove the [dup] logging as it doesn't add that much value.

tmkms[18934]: 08:52:28 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:48620B69 h/r/s:67611/0/6 (0 ms)
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:28 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:48620B69 h/r/s:67611/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.156.81.246:26659] signed PreVote:51A5AB0D h/r/s:67612/0/6 (0 ms)
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.158.126.240:26659] signed PreVote:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]
tmkms[18934]: 08:52:34 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:51A5AB0D h/r/s:67612/0/6 (0 ms) [dup]

Thanks for the work on this and it appears we can safely do active/active now with multiple vals and a single KMS. I will increase number of vals for further testing, but feel free to close this issue.

@mdyring

This comment has been minimized.

Copy link
Contributor Author

commented Jul 27, 2019

Btw would be great to add block id logging when there is a problem, such as these:

tmkms[19431]: 11:30:11 [warn] [gaia-13004:35.158.126.240:26659] attempt to double sign at height/round/step: 69251/0/3
tmkms[19431]: 11:30:11 [error] [gaia-13004@tcp://18.196.63.108:26659] attempted double sign: step regression: round regression at height:69251 round:0 last step:6 new step:3
@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 27, 2019

Yeah, the logging isn't correct, and making it correct would require changes to the state file, so I think I'll follow your recommendation and remove it for now, but add more block ID logging.

@tarcieri tarcieri closed this in 5033acb Jul 27, 2019

@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 27, 2019

@mdyring #319 implemented your suggestions if you want to give it one more go before a final KMS release

@mdyring

This comment has been minimized.

Copy link
Contributor Author

commented Jul 28, 2019

Looks great, thanks!

Some minor details for consistency:

  1. added tcp:// prefix in the "attempted double sign" lines with step regression:
  2. warn is emitted for trying to double sign different blocks, error given for step regression
  3. "attempt to double sign" != "attempted double sign" (can make monitoring easier if consistent)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.156.81.246:26659] signed Proposal:DE23ECA605 at h/r/s 82953/0/3 (0 ms)
tmkms[3137]: 09:29:25 [warn] [gaia-13004:18.196.63.108:26659] attempt to double sign at h/r/s: 82953/0/3 (DE23ECA605 != AEFFF4498D)
tmkms[3137]: 09:29:25 [warn] [gaia-13004:35.158.126.240:26659] attempt to double sign at h/r/s: 82953/0/3 (DE23ECA605 != 1B867E0652)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.156.81.246:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [info] [gaia-13004@18.196.63.108:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [info] [gaia-13004@35.158.126.240:26659] signed PreCommit:(none) at h/r/s 82953/0/6 (0 ms)
tmkms[3137]: 09:29:25 [error] [gaia-13004@tcp://54.93.169.244:26659] attempted double sign: step regression: round regression at height:82953 round:0 last step:6 new step:3
tmkms[3137]: 09:29:25 [error] [gaia-13004@tcp://18.196.17.223:26659] attempted double sign: step regression: round regression at height:82953 round:0 last step:6 new step:3
@tarcieri

This comment has been minimized.

Copy link
Collaborator

commented Jul 28, 2019

@mdyring those should all be taken care of in #322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.