Block miner computations until everyone agrees to resume#3174
lukasz-zimnoch merged 4 commits into main
From now on, Miner has an internal latch counting how many times the `Stop()` function has been called. Miner expects the `Resume()` function to be called the same number of times. The reasoning behind it is that multiple goroutines can call `Stop()` and we do not want the computations to unblock until the last goroutine is done. For example:

1. Relay entry signing starts, computations are stopped.
2. ECDSA DKG starts, computations are already stopped.
3. Relay entry signing completes. Relay entry goroutine asks the computations to resume. Computations should not resume until ECDSA DKG goroutine asks to resume them.

This approach requires really good hygiene with calls to miner - never forget to resume if you are stopping.
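The latch described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the `latchMiner` type, its fields, and the `IsRunning` helper are hypothetical names, and only the stop/resume bookkeeping is modelled, not the actual computations.

```go
package main

import (
	"fmt"
	"sync"
)

// latchMiner is a hypothetical stand-in for the real Miner; it models
// only the counting latch, not the computations themselves.
type latchMiner struct {
	mutex     sync.Mutex
	stopCount int // number of Stop() calls not yet matched by Resume()
}

// Stop records one more request to keep the computations paused.
func (m *latchMiner) Stop() {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	m.stopCount++
}

// Resume releases one stop request. Computations are unblocked only
// when the last outstanding request has been released.
func (m *latchMiner) Resume() {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	if m.stopCount > 0 {
		m.stopCount--
	}
}

// IsRunning reports whether computations are currently allowed to run.
func (m *latchMiner) IsRunning() bool {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	return m.stopCount == 0
}

func main() {
	m := &latchMiner{}

	m.Stop()                   // 1. relay entry signing starts
	m.Stop()                   // 2. ECDSA DKG starts
	m.Resume()                 // 3. relay entry signing completes
	fmt.Println(m.IsRunning()) // false: ECDSA DKG has not resumed yet

	m.Resume()                 // ECDSA DKG completes
	fmt.Println(m.IsRunning()) // true
}
```

The sketch ignores a surplus `Resume()` instead of letting the counter go negative. That is one possible choice; the real implementation may handle it differently.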
Actually I think we have the following alternative to the presented approach:
This way we don't need to bother about controlling the miner lifecycle by client goroutines. The lifecycle controller will be single-threaded so there will be no concurrency problems as well. Also, applications that don't need a reference to the miner (beacon) will not be forced to manage it. We can also reuse that mechanism to expose metrics if needed. WDYT?
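The concrete steps of the proposed alternative are not shown in this thread, so the following is only a guess at what a single-threaded lifecycle controller could look like. Everything in it is an assumption made for illustration: the `controller` type, the `protocolStarted` and `protocolFinished` events, and the `notify` method. The idea is that client goroutines report events over a channel and a single loop owns the miner state, so no lock is needed.

```go
package main

import "fmt"

// eventKind and the two events below are hypothetical; they stand for
// "a protocol this node participates in has started or finished".
type eventKind int

const (
	protocolStarted eventKind = iota
	protocolFinished
)

type event struct {
	kind  eventKind
	reply chan bool // receives whether the miner runs after the event
}

// controller is a hypothetical lifecycle controller. Only its run loop
// touches the state, so the state needs no mutex.
type controller struct {
	events chan event
}

func newController() *controller {
	c := &controller{events: make(chan event)}
	go c.run()
	return c
}

// run processes events one at a time on a single goroutine.
func (c *controller) run() {
	active := 0 // protocols currently in progress
	for ev := range c.events {
		switch ev.kind {
		case protocolStarted:
			active++
		case protocolFinished:
			if active > 0 {
				active--
			}
		}
		// The miner may run only when no protocol is in progress.
		ev.reply <- active == 0
	}
}

// notify reports an event and returns the resulting miner state.
func (c *controller) notify(kind eventKind) bool {
	reply := make(chan bool, 1)
	c.events <- event{kind: kind, reply: reply}
	return <-reply
}

func main() {
	c := newController()
	fmt.Println(c.notify(protocolStarted))  // false
	fmt.Println(c.notify(protocolStarted))  // false
	fmt.Println(c.notify(protocolFinished)) // false
	fmt.Println(c.notify(protocolFinished)) // true
	close(c.events)
}
```

Even in this shape the controller still has to count in-progress protocols, so client code remains responsible for reporting every finish event.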
Clearly express the requirements for goroutines calling Stop() and Resume().
I think this option is definitely worth considering and experimenting with. Regarding goroutines assumptions, I think we'll have the same as now because it is not enough to say "DKG is happening" - we need to know if the current node participates in that DKG and when the last goroutine finished. But! This may be simpler in terms of concurrency because we should be able to eliminate the latch. Also, we'll avoid passing miner reference around which I already tried locally and didn't like. I would do the following: let's review/merge #3177 and #3180 to have the current approach finished in terms of miner and pool (without yet stopping/resuming miner from goroutines) and then I'll start experimenting with the approach you suggested.
Removed the deferred call to `miner.Stop()` given this call is already done in the body of the test and `miner.Stop()` cannot be called twice without resuming the same number of times.
Refs #3161
Depends on #3172
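From now on, Miner has an internal latch counting how many times the `Stop()` function has been called. Miner expects the `Resume()` function to be called the same number of times. The reasoning behind it is that multiple goroutines can call `Stop()` and we do not want the computations to unblock until the last goroutine is done.

For example:

1. Relay entry signing starts, computations are stopped.
2. ECDSA DKG starts, computations are already stopped.
3. Relay entry signing completes. Relay entry goroutine asks the computations to resume.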
With the change in this PR we want to resume the computations only when all goroutines that requested to stop them have signalled that they completed their work and computations can be resumed.
The signalled element is crucial. We really need all of them to tell the miner that the computations can be resumed. We do not know when the operations that are supposed to stop the computations are invoked and how long they are going to take. I was playing with an idea of a timeout but that turned out to be complicated and error-prone.
Since we need to trust goroutines to call `Resume`, it means we assume goroutines are working as expected. I am not totally in love with this assumption but I could not figure out anything else if we do not want the generator to just keep running in the background.

If we assume goroutines are working as expected and always call `Resume` after `Stop`, it also means we assume they are not going to call `Resume` twice because it breaks the assumption of goroutines being correctly implemented.

In the current setup, the lock is acquired only to actually stop or resume goroutines and after the lock is acquired we check the state. `Stop` and `Resume` are not atomic but I think everything should work fine under the assumption goroutines are calling:

- `Stop()` before `Resume()`
- `Resume()` if they called `Stop()`.

I am open to all suggestions on how to improve this mechanism. I could come up only with this solution if we want to stop the miner. I was also thinking about abandoning this path completely and reworking the pre-params generator code in tss-lib to always work in a single thread, even if generating pre-params will take ages but I don't think it's a better option. 4 years ago, when we were working on the safe prime generator and were optimizing the code pretty heavily, we decided to go with concurrency (benchmarks). What appeals to me about the current approach is that in the real world, we'll not have thousands of goroutines on a single client. Just a few executing DKG or relay entry generation and the call to the miner is quite simple: `miner.Stop(); defer miner.Resume()`.