Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tor] Add timeout to GetStatus request #8878

Merged
merged 8 commits into from Aug 8, 2022

Conversation

molnard
Copy link
Collaborator

@molnard molnard commented Aug 5, 2022

Look at the following log entries.

2022-08-05 09:23:29.675 [21] INFO	RequestTimeStatista2.Display (69)	Responded to 'XXX-API-OK-GetStatus'	 10778 times. Median: 0.3s Average: 0.7s Largest 159.8s SuccessRatio: 1 FatalFails: 0

2022-08-05 09:20:56.982 [41] INFO	RequestTimeStatista2.Display (69)	Responded to 'XXX-API-OK-GetStatus'	 2984 times. Median: 0.3s Average: 0.5s Largest 22.2s SuccessRatio: 1 FatalFails: 0

2022-08-03 18:49:21.478 [36] DEBUG	WabiSabiHttpApiClient.SendWithRetriesAsync (88)	Received a response for GetStatus in 316s after 2 failed attempts: System.AggregateException: One or more errors occurred. (Failed to get/read an HTTP response from Tor.)
 ---> System.Net.Http.HttpRequestException: Failed to get/read an HTTP response from Tor.
 ---> WalletWasabi.Tor.Socks5.Exceptions.TorConnectionReadException: Could not read HTTP response.
 ---> WalletWasabi.Tor.Socks5.Exceptions.TorConnectionReadException: HTTP start-line is incomplete. Tor circuit probably died.
   at WalletWasabi.Tor.Http.Helpers.HttpMessageHelper.ReadStartLineAsync(Stream stream, CancellationToken ctsToken) in WalletWasabi\Tor\Http\Helpers\HttpMessageHelper.cs:line 43
   at WalletWasabi.Tor.Http.Extensions.HttpResponseMessageExtensions.CreateNewAsync(Stream responseStream, HttpMethod requestMethod, CancellationToken cancellationToken) in WalletWasabi\Tor\Http\Extensions\HttpResponseMessageExtensions.cs:line 39
   at WalletWasabi.Tor.Socks5.Pool.TorHttpPool.SendCoreAsync(TorTcpConnection connection, HttpRequestMessage request, CancellationToken token) in WalletWasabi\Tor\Socks5\Pool\TorHttpPool.cs:line 339

In a couple of cases, GetStatus requests are hanging there for minutes. For example:
10778 times. Median: 0.3s Average: 0.7s Largest 159.8s SuccessRatio: 1 FatalFails: 0
SuccessfulRatio = 100% means there were no retries at all - all the requests started were ended successfully. This sometimes means the request hangs there for 160 seconds until it returns.

Why is it a big problem?

CoinJoinClient depends on this because it is waiting for a specific phase to happen before it can continue. If GetStatus get stuck for almost 3 minutes, there is a great chance we miss the phase with the requests.

Solution

After 30 seconds we cancel the request and try again. Also, I halved the number of GetStatus requests. The client will ask every 10 seconds - this will reduce the load on Tor (on the client side as well).

@molnard
Copy link
Collaborator Author

molnard commented Aug 5, 2022

Running tests for a couple of hours.

@molnard
Copy link
Collaborator Author

molnard commented Aug 6, 2022

Actual test results. There are almost no failures in the requests, so what we did so far seems to be good.

Partial conclusions. GetResult Largest is below 40 seconds, but according to the number of FatalFails the 40 secs timeout never activated. Maybe because of the halving in the request frequency.
It is also clear other requests can fly away in time (293 seconds), which is not good at all. So adding timeouts will definitely require.

This PR is one possible way to do it. @kiminuo what do you think?

2022-08-06 10:10:54.439 [58] INFO       RequestTimeStatista2.Display (61)       Response times for the last 1038 minutes:
2022-08-06 10:10:54.441 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-GetStatus'      5719 times. Median: 0.7s Average: 0.9s Largest 34.6s SuccessRatio: 1 FatalFails: 0
2022-08-06 10:10:54.442 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ConfirmConnection'      1949 times. Median: 0.9s Average: 1.6s Largest 143.8s SuccessRatio: 1 FatalFails: 3
2022-08-06 10:10:54.443 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReissueCredential'      592 times. Median: 3.4s Average: 5.3s Largest 293.6s SuccessRatio: 1 FatalFails: 0
2022-08-06 10:10:54.443 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterInput'  468 times. Median: 2.1s Average: 3.5s Largest 54.7s SuccessRatio: 1 FatalFails: 4
2022-08-06 10:10:54.444 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterOutput'         379 times. Median: 2.5s Average: 3.7s Largest 46.4s SuccessRatio: 1 FatalFails: 0
2022-08-06 10:10:54.445 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReadyToSign'    347 times. Median: 2.6s Average: 4.7s Largest 96.0s SuccessRatio: 1 FatalFails: 0
2022-08-06 10:10:54.445 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-SignTransaction'        340 times. Median: 2.3s Average: 4.2s Largest 102.3s SuccessRatio: 1 FatalFails: 0
2022-08-06 10:10:54.446 [58] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RemoveInput'    71 times. Median: 0.0s Average: 0.0s Largest 0.1s SuccessRatio: -1 FatalFails: 71

@molnard
Copy link
Collaborator Author

molnard commented Aug 7, 2022

2022-08-07 08:41:48.447 [35] INFO       RequestTimeStatista2.Display (61)       Response times for the last 2389 minutes:
2022-08-07 08:41:48.451 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-GetStatus'      13094 times. Median: 0.7s Average: 0.9s Largest 40.0s SuccessRatio: 1 FatalFails: 10
2022-08-07 08:41:48.453 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ConfirmConnection'      5446 times. Median: 0.9s Average: 1.9s Largest 194.4s SuccessRatio: 1 FatalFails: 6
2022-08-07 08:41:48.454 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReissueCredential'      1565 times. Median: 3.4s Average: 5.6s Largest 293.6s SuccessRatio: 1 FatalFails: 0
2022-08-07 08:41:48.454 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterInput'  1274 times. Median: 2.1s Average: 3.8s Largest 63.0s SuccessRatio: 1 FatalFails: 13
2022-08-07 08:41:48.455 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterOutput'         993 times. Median: 2.5s Average: 3.8s Largest 66.8s SuccessRatio: 1 FatalFails: 0
2022-08-07 08:41:48.456 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReadyToSign'    986 times. Median: 2.6s Average: 4.1s Largest 96.0s SuccessRatio: 1 FatalFails: 0
2022-08-07 08:41:48.457 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-SignTransaction'        982 times. Median: 2.3s Average: 4.2s Largest 102.3s SuccessRatio: 1 FatalFails: 0
2022-08-07 08:41:48.457 [35] INFO       RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RemoveInput'    114 times. Median: 0.0s Average: 0.0s Largest 0.1s SuccessRatio: -1 FatalFails: 114

Copy link
Collaborator

@kiminuo kiminuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK, LGTM

Question: I wonder how often we need to send that GetStatus request. Could you explain it a bit? Like you propose 5 -> 10. Would 10 -> 15 be too much? I'm not suggesting to increase it further, just asking what are the pros & cons of increasing that period.

@molnard
Copy link
Collaborator Author

molnard commented Aug 8, 2022

Final results

2022-08-08 09:14:00.816 [108] INFO      RequestTimeStatista2.Display (61)       Response times for the last 3862 minutes:
2022-08-08 09:14:00.821 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-GetStatus'      21060 times. Median: 0.7s Average: 1.0s Largest 40.0s SuccessRatio: 1 FatalFails: 17
2022-08-08 09:14:00.824 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ConfirmConnection'      9033 times. Median: 0.9s Average: 1.7s Largest 212.4s SuccessRatio: 1 FatalFails: 9
2022-08-08 09:14:00.825 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReissueCredential'      2660 times. Median: 3.4s Average: 5.3s Largest 293.6s SuccessRatio: 1 FatalFails: 0
2022-08-08 09:14:00.826 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterInput'  2128 times. Median: 2.1s Average: 3.8s Largest 89.3s SuccessRatio: 1 FatalFails: 16
2022-08-08 09:14:00.827 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-ReadyToSign'    1680 times. Median: 2.4s Average: 3.8s Largest 204.0s SuccessRatio: 1 FatalFails: 7
2022-08-08 09:14:00.828 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-SignTransaction'        1670 times. Median: 2.1s Average: 4.1s Largest 120.2s SuccessRatio: 1 FatalFails: 0
2022-08-08 09:14:00.829 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RegisterOutput'         1656 times. Median: 2.4s Average: 3.6s Largest 66.8s SuccessRatio: 1 FatalFails: 0
2022-08-08 09:14:00.830 [108] INFO      RequestTimeStatista2.Display (72)       Responded to 'XXX-API-OK-RemoveInput'    191 times. Median: 0.0s Average: 0.0s Largest 0.1s SuccessRatio: -1 FatalFails: 191

@molnard
Copy link
Collaborator Author

molnard commented Aug 8, 2022

utACK, LGTM

Question: I wonder how often we need to send that GetStatus request. Could you explain it a bit? Like you propose 5 -> 10. Would 10 -> 15 be too much? I'm not suggesting to increase it further, just asking what are the pros & cons of increasing that period.

The most important consideration is the reaction time. In the CJ we are waiting for phases. It is not synchronized nor can be, the CoinJoin phases are stepped by the coordinator. The client can poll from time to time. The time interval sets the maximum possible delay. For example, I am waiting for the Output registration phase - the phase total timeout is 3 minutes. When I start waiting it might happen that I just made a request so I have to wait 10 seconds to get the actual phase.

molnard and others added 2 commits August 8, 2022 12:32
@pull-request-size pull-request-size bot added size/M and removed size/XS labels Aug 8, 2022
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 8, 2022
@pull-request-size pull-request-size bot added size/S and removed size/L labels Aug 8, 2022
@molnard molnard marked this pull request as ready for review August 8, 2022 10:44
@molnard molnard requested a review from kiminuo August 8, 2022 10:48
Copy link
Collaborator

@kiminuo kiminuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@molnard molnard merged commit d503e83 into zkSNACKs:master Aug 8, 2022
@molnard molnard deleted the cancelgetstatus branch August 8, 2022 12:15
@molnard
Copy link
Collaborator Author

molnard commented Aug 8, 2022

Just out of curiosity I have checked the backend request time statistics.

2022-08-08 06:21:54.715 [810] INFO	RequestTimeStatista.Display (60)	Response times for the last 60 minutes:
2022-08-08 06:21:54.820 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'status'	 277916 times. Median: 0.0s Average: 0.0s Largest 0.1s
2022-08-08 06:21:54.822 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'connection-confirmation'	 6442 times. Median: 0.0s Average: 0.2s Largest 3.1s
2022-08-08 06:21:54.823 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'arena-period'	 3552 times. Median: 0.0s Average: 0.0s Largest 1.3s
2022-08-08 06:21:54.823 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'credential-issuance'	 2035 times. Median: 0.4s Average: 0.5s Largest 1.7s
2022-08-08 06:21:54.823 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'input-registration'	 1682 times. Median: 0.0s Average: 0.0s Largest 1.4s
2022-08-08 06:21:54.824 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'transaction-signature'	 1415 times. Median: 0.0s Average: 0.0s Largest 0.0s
2022-08-08 06:21:54.824 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'output-registration'	 1303 times. Median: 0.0s Average: 0.0s Largest 0.4s
2022-08-08 06:21:54.824 [810] INFO	RequestTimeStatista.Display (64)	Responded to 'ready-to-sign'	 1230 times. Median: 0.0s Average: 0.0s Largest 0.3s

The largest was
INFO RequestTimeStatista.Display (64) Responded to 'connection-confirmation' 11883 times. Median: 0.0s Average: 0.3s Largest 6.1s

@kiminuo
Copy link
Collaborator

kiminuo commented Aug 8, 2022

Do these times involve waiting for acquiring the arena lock?

@molnard
Copy link
Collaborator Author

molnard commented Aug 8, 2022

Do these times involve waiting for acquiring the arena lock?

Yes. https://github.com/molnard/WalletWasabi/blob/308f2baa6cb37031ec95d0597bb16803d47f891c/WalletWasabi.Backend/Controllers/WabiSabiController.cs#L104

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants