[BUG] Standalone network stops keygen after session 73 #396

Closed
dutterbutter opened this issue Sep 27, 2022 · 6 comments
@dutterbutter
Contributor

dutterbutter commented Sep 27, 2022

Describe the bug

The current standalone network, which contains three nodes, seems to have stopped generating a dkg public key after session 73 or block 43,800. The three nodes are still finalizing blocks and progressing the chain, but keygen has stalled at session 73. If I am not mistaken, we are currently running t=1 (signing threshold) and n=2 (keygen threshold), and it seems standalone-node-1 has been jailed for some misbehaviour, so there are only 2 authority nodes participating.

The logs indicate a keygen misbehaviour by KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD, which I believe to be standalone-node-1. There are still no details about why this node was issued a misbehaviour report.

To Reproduce
Steps to reproduce the observed behaviour:

  1. Go to polkadot UI > Developer > ChainState > DKG > dkgPublicKey()

You will notice:

dkg.dkgPublicKey: (u64,Bytes)
[
  73
  0x03f005859977b45e09c55c6a4ac09af580e5f208c9fbd70fcc89652b555f2e4604
]

Log output

From standalone-node-1

Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.946 DEBUG tokio-runtime-worker dkg_gadget::worker: Going to handle Finality notification
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.946 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  Processing block notification for block 89544
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.946 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  Latest header is now: 89544
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.947 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  QUEUED KEYGEN IN PROGRESS: false
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.947 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  QUEUED DKG ID: 74
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.947 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  QUEUED VALIDATOR SET ID: 74
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.947 DEBUG tokio-runtime-worker dkg_gadget::worker: 🕸️  QUEUED DKG STATUS: None
Sep 27 20:02:42 eggnet-arana-1 dkg-standalone-node[2643155]: 2022-09-27 20:02:42.947  INFO tokio-runtime-worker dkg_gadget::worker: 🕸️  NOT IN THE SET OF BEST AUTHORITIES: round 73

From standalone-node-2

Sep 27 20:01:37 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:37 Received error: KeygenTimeout { bad_actors: [2] }
Sep 27 20:01:37 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:37 🕸️  DKG Keygen misbehaviour by KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD
Sep 27 20:01:37 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:37 🕸️  IN THE SET OF BEST AUTHORITIES: round 73
Sep 27 20:01:41 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:41 💤 Idle (3 peers), best: #89535 (0xe2f4…a64b), finalized #89533 (0x4a0a…3e0e), ⬇ 3.1kiB/s ⬆ 2.9kiB/s
Sep 27 20:01:42 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:42 ✨ Imported #89536 (0x447d…6009)
Sep 27 20:01:42 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:42 Failed to get next signed proposal: Unable to get next proposal batch
Sep 27 20:01:43 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:43 Received error: KeygenTimeout { bad_actors: [2] }
Sep 27 20:01:43 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:43 🕸️  DKG Keygen misbehaviour by KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD
Sep 27 20:01:43 eggnet-arana-2 dkg-standalone-node[125856]: 2022-09-27 20:01:43 🕸️  IN THE SET OF BEST AUTHORITIES: round 73
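
To see how often a party is being reported, the node-2 log lines above can be tallied with a short script. This is only a sketch: the two regular expressions are assumptions based on the message formats shown in this issue, not on any documented log format, and `tally` is a hypothetical helper name.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpts in this issue (assumed, not a documented format).
TIMEOUT_RE = re.compile(r"KeygenTimeout \{ bad_actors: \[([^\]]*)\] \}")
MISBEHAVIOUR_RE = re.compile(r"DKG Keygen misbehaviour by (\w+)")


def tally(lines):
    """Count reported party indices and reported authority IDs in log lines."""
    bad_actor_indices = Counter()
    reported_ids = Counter()
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            for index in timeout.group(1).split(","):
                if index.strip():
                    bad_actor_indices[int(index)] += 1
        misbehaviour = MISBEHAVIOUR_RE.search(line)
        if misbehaviour:
            reported_ids[misbehaviour.group(1)] += 1
    return bad_actor_indices, reported_ids


# Message parts of four lines copied from the standalone-node-2 excerpt.
SAMPLE = [
    "2022-09-27 20:01:37 Received error: KeygenTimeout { bad_actors: [2] }",
    "2022-09-27 20:01:37 DKG Keygen misbehaviour by KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD",
    "2022-09-27 20:01:43 Received error: KeygenTimeout { bad_actors: [2] }",
    "2022-09-27 20:01:43 DKG Keygen misbehaviour by KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD",
]

indices, ids = tally(SAMPLE)
print(dict(indices))
print(dict(ids))
```

On a live node the same function could be fed the service's journal output line by line.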

Expected behaviour

It should continue to progress through keygen every session, and in these network params that should be every 600 blocks.
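
The block numbers in this report follow from the 600-block session length. A minimal sketch of that arithmetic, assuming sessions are counted from block 0 so that session k starts at block k × 600 (the function names are hypothetical, not part of the node's API):

```python
SESSION_LENGTH = 600  # blocks per session, as stated in this report


def session_start_block(session: int, session_length: int = SESSION_LENGTH) -> int:
    """First block of a session, assuming session 0 starts at block 0."""
    return session * session_length


def session_of_block(block: int, session_length: int = SESSION_LENGTH) -> int:
    """Session index that a given block falls into."""
    return block // session_length


def next_session_boundary(block: int, session_length: int = SESSION_LENGTH) -> int:
    """First block of the session that follows the one containing `block`."""
    return (block // session_length + 1) * session_length


print(session_start_block(73))   # 43800, the block quoted for session 73
print(session_of_block(89_544))  # 149, the session containing the block in the node-1 log
```

Under that assumption, the chain at block 89,544 is 76 sessions past the last session for which a key was generated.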

Environment

Machine specs:

  • vCPU/s: 2 vCPUs
  • RAM: 8192 MB
  • Storage: 50 GB NVMe + 128GB of Block Storage
  • Bandwidth: 6.94 GB of 5000 GB

Note: The standalone network uses the integration tests parameters which can be found here.

Other information and links

@dutterbutter dutterbutter added the bug 🪲 Something isn't working label Sep 27, 2022
@dutterbutter dutterbutter changed the title from "[BUG] Standalone network stops generating key at session 73" to "[BUG] Standalone network stops keygen at session 73" Sep 27, 2022
@dutterbutter
Contributor Author

dutterbutter commented Sep 27, 2022

I submitted a sudo forceUnjailAuthority call at block 90,140, so at block 90,600 we should see a new dkg public key submitted. However, the logs still indicate that KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD is a bad actor.

(Followup)

The dkg key was not submitted on block 90,600 and standalone-node-1 is still not included in the authority set.

@drewstone
Contributor

drewstone commented Sep 27, 2022 via email

@dutterbutter
Contributor Author

@drewstone the keygen has not started since session 73. In my estimation, between sessions 73 and 74, a misbehaviour report was issued against KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD. Since it's considered a bad actor it cannot participate in keygen, and given we only have 3 running authority nodes, keygen stalled.

Keygen is working as intended given that 1 of 3 authority nodes has been jailed, but the issue is understanding why KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD was issued a misbehaviour report, and to my understanding shouldn't the keygen threshold be updated dynamically to reflect the current network condition?

@shekohex
Contributor

<...>, and to my understanding shouldn't the keygen threshold be updated dynamically to reflect the current network condition?

Yes, it should update and immediately start a new keygen process with the two new best authorities. You can see that keygenThreshold = 2 on chain now. After unjailing KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD, and once it is back in the validator set, you need to increase the keygen threshold; then, after exactly 2 sessions, we should see KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD in the keygen process again.

Nonetheless, as you mentioned, even after kicking the misbehaving party from the DKG, it is still not able to progress. The main reason is that another peer (one of the current two) is also misbehaving. It could be that the whole network got into a bad state and all of the nodes were misbehaving, but the unlucky KW63ywCaFXGM5GjrVdWKsMPCqmQGRfPCyT89ZtbryNRuqqQpD got jailed first. Since there are just two DKG authorities now, we can't jail either of them: you need t+1 votes of misbehaviour to jail someone, so neither of them can prove the other is the source of the misbehaviour, unless one node reports itself.
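
The t+1 rule described above can be illustrated with a small sketch. It only models the vote counting as stated in this comment (t+1 misbehaviour votes are needed, and a node normally does not report itself); the function names are hypothetical and this is not the pallet's actual logic.

```python
def reports_needed(signing_threshold: int) -> int:
    """Misbehaviour votes required to jail an authority: t + 1."""
    return signing_threshold + 1


def can_jail(active_authorities: int, signing_threshold: int,
             self_report: bool = False) -> bool:
    """Whether enough votes can exist to jail one of the active authorities.

    Assumes every other active authority reports the offender, and that the
    offender adds its own vote only when `self_report` is True.
    """
    possible_votes = active_authorities - 1 + (1 if self_report else 0)
    return possible_votes >= reports_needed(signing_threshold)


# With t=1: three authorities can jail one of their own (two accusers),
# but two authorities cannot unless the offender reports itself.
print(can_jail(active_authorities=3, signing_threshold=1))
print(can_jail(active_authorities=2, signing_threshold=1))
print(can_jail(active_authorities=2, signing_threshold=1, self_report=True))
```

This matches the situation in the thread: with three nodes and t=1, two votes were enough to jail one node, and with two nodes left only one accusing vote is available.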

@dutterbutter
Contributor Author

@shekohex oddly enough, if you call dkg.jailedKeygenAuthorities the response is empty. So I am not sure if the rpc call is not working or we don't have a node jailed.

@shekohex
Contributor

@shekohex oddly enough, if you call dkg.jailedKeygenAuthorities the response is empty. So I am not sure if the rpc call is not working or we don't have a node jailed.

That's correct: we do not have any jailed authorities, because you unjailed it. That does not mean it will enter the DKG again; it won't until you increase the keygen threshold again.

@dutterbutter dutterbutter changed the title from "[BUG] Standalone network stops keygen at session 73" to "[BUG] Standalone network stops keygen after session 73" Sep 29, 2022