[Tor] Add timeout to GetStatus request #8878
69e0243 to 3de06f2
Running tests for a couple of hours.
Actual test results. There are almost no failures in the requests, so what we have done so far seems to work. Partial conclusions: the GetResult Largest value is below 40 seconds, but according to the FatalFails count the 40-second timeout never activated, maybe because the request frequency was halved. This PR is one possible way to do it. @kiminuo what do you think?
utACK, LGTM
Question: I wonder how often we need to send that GetStatus request. Could you explain it a bit? You propose 5 -> 10. Would 10 -> 15 be too much? I'm not suggesting we increase it further, just asking what the pros and cons of increasing that period are.
WalletWasabi/WabiSabi/Client/RoundStateAwaiters/RoundStateUpdater.cs
Final results
The most important consideration is the reaction time. In a CoinJoin (CJ) the client waits for phases. This is not synchronized, nor can it be: the CoinJoin phases are advanced by the coordinator, and the client can only poll from time to time. The polling interval sets the maximum possible delay. For example, say I am waiting for the Output registration phase, whose total timeout is 3 minutes. If I start waiting just after making a request, I have to wait 10 seconds to learn the actual phase.
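To make the trade-off concrete, here is a small Python sketch of the arithmetic (illustration only, not code from this PR). The function name and the optional `request_time_s` parameter are assumptions; the 3-minute phase timeout and the 5/10/15 second intervals come from the discussion above.

```python
def worst_case_delay(poll_interval_s: float, request_time_s: float = 0.0) -> float:
    """Upper bound on how late the client can learn about a phase change.

    A change that happens right after a poll is only seen by the next poll,
    so the delay is at most one interval plus the duration of that request.
    (Hypothetical helper, not part of the project.)
    """
    return poll_interval_s + request_time_s


PHASE_TIMEOUT_S = 180.0  # Output registration phase: 3 minutes

for interval in (5.0, 10.0, 15.0):
    delay = worst_case_delay(interval)
    share = delay / PHASE_TIMEOUT_S
    print(f"poll every {interval:.0f}s: up to {delay:.0f}s late ({share:.1%} of the phase)")
```

The same bound shows why a slow request matters more than the interval itself: the request duration is added on top of the polling period.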
…ter.cs Co-authored-by: Kimi <58662979+kiminuo@users.noreply.github.com>
…sabi into cancelgetstatus
LGTM
Just out of curiosity I have checked the backend request time statistics.
The largest was
Do these times include waiting to acquire the arena lock?
Look at the following log entries.
In a couple of cases, GetStatus requests are hanging there for minutes. For example:
10778 times. Median: 0.3s Average: 0.7s Largest 159.8s SuccessRatio: 1 FatalFails: 0
SuccessRatio = 1 (100%) means there were no retries at all: every request that was started finished successfully. In other words, a request sometimes hangs for 160 seconds before it returns.
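As a rough illustration of how to read such an entry, the following Python sketch parses the log line quoted above and compares the slowest request with the median. The regular expression and the `parse_stats` helper are assumptions made for this example; they are not part of the project's logging code.

```python
import re

# The log entry quoted above, copied verbatim.
LINE = "10778 times. Median: 0.3s Average: 0.7s Largest 159.8s SuccessRatio: 1 FatalFails: 0"

# Hypothetical pattern written for this one line format.
PATTERN = re.compile(
    r"(?P<count>\d+) times\. "
    r"Median: (?P<median>[\d.]+)s "
    r"Average: (?P<average>[\d.]+)s "
    r"Largest:? (?P<largest>[\d.]+)s "
    r"SuccessRatio: (?P<ratio>[\d.]+) "
    r"FatalFails: (?P<fatal>\d+)"
)


def parse_stats(line: str) -> dict:
    """Turn one statistics line into a dict of numbers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized stats line: {line!r}")
    return {
        "count": int(match["count"]),
        "median": float(match["median"]),
        "average": float(match["average"]),
        "largest": float(match["largest"]),
        "success_ratio": float(match["ratio"]),
        "fatal_fails": int(match["fatal"]),
    }


stats = parse_stats(LINE)
# A perfect success ratio says nothing about the tail: compare largest to median.
print(f"largest is {stats['largest'] / stats['median']:.0f}x the median")
```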
Why is it a big problem?
CoinJoinClient depends on this because it waits for a specific phase to happen before it can continue. If GetStatus gets stuck for almost 3 minutes, there is a good chance that our requests miss the phase.
Solution
After 30 seconds we cancel the request and try again. I also halved the number of GetStatus requests: the client now asks every 10 seconds, which reduces the load on Tor (on the client side as well).
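The idea can be sketched in a few lines of Python with `asyncio` (the actual change is in the project's C# code; this only illustrates the pattern). `fetch` is a hypothetical coroutine function standing in for the GetStatus request, and `max_attempts` is an assumption; the 30-second timeout and 10-second period are the values stated above.

```python
import asyncio

REQUEST_TIMEOUT_S = 30.0  # cancel a hanging request after 30 seconds
POLL_INTERVAL_S = 10.0    # ask for the status every 10 seconds


async def get_status_with_timeout(fetch, timeout_s=REQUEST_TIMEOUT_S, max_attempts=3):
    """Await `fetch()`, cancelling and retrying if it exceeds `timeout_s`.

    `fetch` and `max_attempts` are assumptions made for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # wait_for cancels the pending request when the timeout expires.
            return await asyncio.wait_for(fetch(), timeout=timeout_s)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt


async def poll_until(fetch, wanted_phase, interval_s=POLL_INTERVAL_S,
                     timeout_s=REQUEST_TIMEOUT_S):
    """Poll until the reported phase equals `wanted_phase`."""
    while True:
        phase = await get_status_with_timeout(fetch, timeout_s)
        if phase == wanted_phase:
            return phase
        await asyncio.sleep(interval_s)
```

With a per-request timeout, a single stuck request costs at most the timeout plus one retry, rather than most of a 3-minute phase.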