Crash gracefully #12395
38e448c
to
3d4b157
Compare
tACK, the crash reporter is invoked and Wasabi crashes gracefully.
(The crash reporter is broken atm; this PR didn't break it, nor is it this PR's responsibility to fix it.)
I tested with #12392 and can confirm that it works as expected |
3d4b157
to
5c7a143
Compare
Tested this, the crash reporter didn't show up.
2024-02-09 01:17:38.102 [20] INFO HybridFeeProvider.OnAllFeeEstimateArrived (132) Accurate fee rates are acquired from WasabiSynchronizer ranging from target 2 blocks at 1 sat/vByte to target 2 blocks at 1 sat/vByte.
2024-02-09 01:17:38.107 [20] ERROR WalletFilterProcessor.ExecuteAsync (189) System.Exception: test
at WalletWasabi.Wallets.WalletFilterProcessor.ExecuteAsync(CancellationToken cancellationToken) in WalletWasabi\Wallets\WalletFilterProcessor.cs:line 108
2024-02-09 01:17:38.108 [20] INFO Wallet.Dispose (302) StartAsync finished in 4 milliseconds.
2024-02-09 01:17:38.111 [20] ERROR WalletLoadWorkflow.LoadWalletAsync (105) System.Exception: test
at WalletWasabi.Wallets.WalletFilterProcessor.ExecuteAsync(CancellationToken cancellationToken) in WalletWasabi\Wallets\WalletFilterProcessor.cs:line 108
at WalletWasabi.Wallets.WalletFilterProcessor.StartAsync(CancellationToken cancellationToken) in WalletWasabi\Wallets\WalletFilterProcessor.cs:line 288
at WalletWasabi.Wallets.Wallet.StartAsync(CancellationToken cancel) in WalletWasabi\Wallets\Wallet.cs:line 306
at WalletWasabi.Wallets.WalletManager.StartWalletAsync(Wallet wallet) in WalletWasabi\Wallets\WalletManager.cs:line 243
at WalletWasabi.Wallets.WalletManager.StartWalletAsync(Wallet wallet) in WalletWasabi\Wallets\WalletManager.cs:line 251
at WalletWasabi.Fluent.Models.Wallets.WalletLoadWorkflow.<LoadWalletAsync>b__21_0() in WalletWasabi.Fluent\Models\Wallets\WalletLoadWorkflow.cs:line 97
at WalletWasabi.Fluent.Models.Wallets.WalletLoadWorkflow.LoadWalletAsync(Boolean isBackendAvailable) in WalletWasabi.Fluent\Models\Wallets\WalletLoadWorkflow.cs:line 97
2024-02-09 01:17:38.194 [22] INFO WasabiAppBuilder.TerminateApplicationAsync (139) Wasabi GUI stopped gracefully (7f49a81b-4684-4aed-8eb2-ca0c9a2de20f).
2024-02-09 01:17:38.196 [22] WARNING Global.DisposeAsync (402) Process is exiting.
.
.
.
2024-02-09 01:17:38.305 [19] INFO Global.DisposeAsync (523) AllTransactionStore is disposed.
2024-02-09 01:17:38.305 [1] INFO TerminateService.Terminate (184) Wasabi stopped gracefully (7f49a81b-4684-4aed-8eb2-ca0c9a2de20f).
2024-02-09 01:17:38.322 [1] INFO WasabiAppBuilder.BeforeStopping (105) Wasabi GUI stopped gracefully (7f49a81b-4684-4aed-8eb2-ca0c9a2de20f).
2024-02-09 01:17:38.340 [1] CRITICAL Program.Main (80) System.Exception: test
at WalletWasabi.Fluent.Desktop.Program.Main(String[] args) in WalletWasabi.Fluent.Desktop\Program.cs:line 67
From what @yahiheb sent, the PR is working as expected, but it seems that the |
Yep, still no Crash Reporter on Win11. Only a hanging WW process, which does nothing. |
Will take a look after the release.
We have talked about this PR with Clement and this approach is really not good. Just to say briefly why:
A better approach is IMO that one detects that It's IMO better to crash immediately in case some unhandled exception was thrown. Otherwise, there is a risk that the graceful shutdown makes things worse. This concept of checking services for exceptions can then be used for all services without modifying the services themselves. That would be a win as well. |
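The idea of checking services for exceptions from the outside, without touching the services themselves, might be sketched like this. It is only an illustration: `ServiceWatcher`, `WatchAsync` and the `onUnhandled` callback are hypothetical names, not part of Wasabi's code base.

```csharp
using System;
using System.Threading.Tasks;

public static class ServiceWatcher
{
    // Observes a service's task from the outside; the service itself is not modified.
    // `onUnhandled` is a hypothetical hook, e.g. "show the crash reporter and terminate".
    public static async Task WatchAsync(Task serviceTask, Action<Exception> onUnhandled)
    {
        try
        {
            await serviceTask.ConfigureAwait(false);
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop a service, not a failure.
        }
        catch (Exception ex)
        {
            // Anything else escaped the service unhandled, so report it.
            onUnhandled(ex);
        }
    }
}
```

Whether `onUnhandled` then crashes immediately or asks for a graceful shutdown is exactly the open question in this thread; the watcher works with either choice.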
Graceful shutdown means to me that I ask the components to stop. Crash immediately means to just stop the execution of the Threads / Tasks wherever they are. Is this what you mean by that? First, we need to decide on this!
We have multiple options, see this and below. Add the static and make it work - is something preventing us from that? I don't see anything that would be broken by that; moreover, it could fix what is broken now. In general, it is a red flag, but a red flag does not mean it is an issue by default, just something we need to look at very closely and consider wisely. One more perspective on this: is TerminateService a program-wide service that everyone can use? Like Logging or HostedServices. If it is like that, we might consider having a similar structure here.
I can imagine a solution with an interface that is added to all classes that can cause crashes and gather them in TerminateService. Whenever there is a signal it will start the crash. |
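A rough sketch of that interface idea, for illustration only. `ICrashSource`, `CrashCollector` and `FakeProcessor` are hypothetical names; `CrashCollector` merely stands in for the part of TerminateService that would gather the sources, and none of this is Wasabi's actual API.

```csharp
using System;
using System.Threading;

// Hypothetical: implemented by every class that can cause a crash.
public interface ICrashSource
{
    event Action<Exception>? Crashed;
}

// Hypothetical stand-in for the gathering logic inside TerminateService.
public sealed class CrashCollector
{
    private readonly Action<Exception> _startCrash;
    private int _signaled;

    public CrashCollector(Action<Exception> startCrash) => _startCrash = startCrash;

    public void Register(ICrashSource source) => source.Crashed += OnCrashed;

    private void OnCrashed(Exception ex)
    {
        // Only the first signal starts the crash; later signals are ignored.
        if (Interlocked.Exchange(ref _signaled, 1) == 0)
        {
            _startCrash(ex);
        }
    }
}

// Minimal example source, used only to demonstrate the wiring.
public sealed class FakeProcessor : ICrashSource
{
    public event Action<Exception>? Crashed;

    public void Fail(Exception ex) => Crashed?.Invoke(ex);
}
```

The trade-off compared with watching tasks from the outside is that every participating class has to implement the interface.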
Graceful shutdown is typically: A program is running fine and at some point user decides to turn it off and at that point everything is stopped and disposed. The important part is that the program is running "fine" (no service is terribly broken)
By crash I mean that the program should stop all its threads immediately and no cleanup should be done whatsoever. So I'm saying "if WalletFilterProcessor (WFP) throws an exception that is unhandled or effectively unhandled (e.g. just logging is not enough to handle an exception) then we should crash and show the crash reporter, not ask for a graceful shutdown". The reason why I think it makes good sense is: If there is an unhandled exception in the WFP (or any other background service really, WFP is a placeholder here), then there is a risk that during that graceful shutdown you make things worse (e.g. store corrupted data, or throw another exception at a different place that would obfuscate the original error, or some other service stops working because the other is dead, etc.). However, to clarify it, it's important to say when one should not crash. And that is whenever we can fully recover from an exceptional state and we actually implemented that handling (e.g. if there are connectivity issues, we can recover by simply trying again; this is covered and hence no crash is needed). Why should we care? Because if the app crashes, then we can fix it. If we don't crash and we just log the error, then the likelihood of the issue being fixed is small (the so-called "it somehow works, so why bother" principle).
|
|
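The distinction drawn above (retry what we can genuinely recover from, crash on everything else) could be sketched as follows. This is an assumption-laden illustration: `FailurePolicy`, `IsRecoverable`, `RunAsync` and the choice of "recoverable" exception types are hypothetical, and the `crash` callback is left to the caller.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class FailurePolicy
{
    // Hypothetical rule: only failures with real, implemented handling count as recoverable.
    public static bool IsRecoverable(Exception ex) =>
        ex is HttpRequestException or TimeoutException;

    public static async Task RunAsync(Func<Task> step, int maxRetries, Action<Exception> crash)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                await step().ConfigureAwait(false);
                return;
            }
            catch (Exception ex) when (IsRecoverable(ex) && attempt < maxRetries)
            {
                // Recoverable (e.g. connectivity issues): simply try again.
            }
            catch (Exception ex)
            {
                // Unhandled, or retries exhausted: do not continue, crash instead.
                crash(ex);
                return;
            }
        }
    }
}
```

One possible `crash` callback in .NET is `ex => Environment.FailFast("Unrecoverable error", ex)`, which terminates the process without running active `finally` blocks or finalizers. Showing the crash reporter first would need additional code that is not sketched here.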
Yes, the exception needs to lead to that action in general. I proposed "propagation of the exception", you did that stuff with the
Could you describe them?
So my concerns are scenarios like:
Something like this. Anyway, the question is a philosophical one, not so much a practical one. I'm saying "if a part of the application crashes, attempt to terminate it immediately, not to make things worse" (a pessimistic approach). You are saying "if a part of the application crashes, then it's still safe to terminate gracefully" (that's an optimistic approach). Note that there are still scenarios when you don't have it under your control, for example when a machine's battery dies. I don't believe that there is an approach that would always work. It feels like all approaches are compromises. edit: This is what I basically talk about https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html#guidelines-for-error-handling.
|
I assumed we use
|
Yes I agree, the cases have to be evaluated one by one. However, we could set up a rule of what is the default crash and what can be used only if there is a very strong reason - to help future decisions and minimize confusion. |
I'll try to summarize my thoughts a bit more: I would add that my Anyway, then there are two approaches to what to do when such a big bug happens:
Now regarding #12395 (comment), I'll try to address it one by one:
We would not put Environment.Exit in components. The idea was to throw
I don't think we use finalizers really. But even if we do, then you are not guaranteed that they are called
This is true. However, we try to establish here whether the risk of losing some data is worse than the risk of storing invalid data. This is not a trivial question. Your approach can be better, mine can be better, or it depends on the situation. So the question should be narrowed down to "what is more likely to be better", I guess.
So once the application terminates, file handles, network connections and other resources are released. If an instance of bitcoind is started and a crash occurs, then it might not be stopped (btw: this is what happens in integration tests all the time). Tor is either left running (by design), or there is an option to stop the process once WW terminates (AFAIK). The next step: We can pick either my approach or your approach and just implement it, and we can later re-evaluate whether the decision was good or not (in my mind, it should be a minimal change to switch "graceful" -> "crash" or vice versa). We should not really use files for storage purposes. History has IMO shown well enough that it has limitations. A database is more robust and allows us to do more with the data.
|
The rule for unmanaged exceptions is to crash. Also, the reference to TerminateService looks no good. |
Do it as we do now and did in the past years. Signal for termination. We can only crash if all the issues mentioned above, like file writes, are made crash-safe - not the other way around. But I guess that's not worth it. I suggest starting with graceful and investigating the reorg stuff as well. |
# Conflicts: # WalletWasabi.Fluent.Desktop/Program.cs
a12d871
to
b0fa8f4
Compare
Ok so well it has been a mess :D I'm pretty convinced myself this PR was not the way to go, but as I understood we chose to go with it. Maybe it will just be temporary? Master is merged, IMO this PR is ready to go, but still a few questions @molnard :
|
tACK
No, it is the final solution. This argument caused a major distraction in our team (typical minefield category). I will try to avoid getting back to this for a while by making more powerful/forceful decisions for the sake of team integrity.
It is OK.
No. I want graceful shutdowns.
It is fine. |
Thanks @molnard for part of the code and @adamPetho for the help
This PR provides a way to crash the software when we reach an unrecoverable exception and display the crash reporter. The PR also uses this new function when the WalletFilterProcessor cannot work anymore.
On Fluent.Desktop, the exception is logged twice when WalletFilterProcessor crashes, but I think it's ok because on the Daemon it will only be logged once.