Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tendermint breaks when BeginBlock (any ABCI call?) in ABCI server is not returning #1993

Closed
srmo opened this issue Jul 17, 2018 · 9 comments
Closed
Labels
C:abci Component: Application Blockchain Interface T:bug Type Bug (Confirmed)
Milestone

Comments

@srmo
Copy link
Contributor

srmo commented Jul 17, 2018

BUG REPORT

Tendermint version
0.22.4-c64a3c74 (self compiled: make dist)

ABCI app
JavaCounter - part of jABCI https://github.com/jTendermint/jabci

Environment

  • OSX 10.13.6
  • go version go1.10.2 darwin/amd64

What happened:
I've forced the BeginBlock implementation of the ABCI app to throw an exception instead of returning a ResponseBeginBlock. This was implemented such that the exception is thrown after the third block.
After that, no more Blocks are being built, broadcastTx etc still working.
No messages in tendermint log (default loglevel) could be found, hinting at a problem with beginBlock.
Ctrl+Cing tendermint also wasn't possible, i.e. it was just hanging there after sending the signal.

What you expected to happen:
Tendermint should timeout on an ABCI request, maybe?
Expected this to be fixed via
#1891
#1890

How to reproduce it (as minimally and precisely as possible):
Just throw an exception in your ABCI app on BeginBlock or DeliverTx

Logs (you can paste a small part showing an error or link a pastebin, gist, etc. containing more of the log file):
https://gist.github.com/srmo/a647b5db2a3c1a5268edc0ec62006b08
Contains the full flow, throwing exception at height 3->4
Then doing a broadcast_tx
Then doing Ctrl+C
Then doing multiple Ctrl+C

Config (you can paste only the changes you've made):
only changed loglevel to "*:debug"

@srmo
Copy link
Contributor Author

srmo commented Jul 17, 2018

One could think about guarding the ABCI app against exceptions, but then the question is what to return on ABCI calls.
It doesn't seem to make sense to return an error ABCI response on BeginBlock, right?
Maybe the ABCI app just have to shutdown in such cases?

@melekes melekes added T:bug Type Bug (Confirmed) C:abci Component: Application Blockchain Interface labels Jul 17, 2018
@melekes
Copy link
Contributor

melekes commented Jul 17, 2018

It doesn't seem to make sense to return an error ABCI response on BeginBlock, right?

Why not? Important thing is that all instances should return error. If exception is caused by non-deterministic source, you need to eliminate it (if possible).

@srmo
Copy link
Contributor Author

srmo commented Jul 17, 2018

@melekes Yeah, I guess I was referring to the non-deterministic part. Those probably can't be solved via ABCI.

@melekes
Copy link
Contributor

melekes commented Jul 18, 2018

NOTE: there is ResponseException, which is supposed to be thrown by the app. But that doesn't really solve the problem of app not behaving properly and tendermint not saying anything about it.

@xla xla added this to the launch milestone Jul 18, 2018
@ebuchman ebuchman modified the milestones: launch, post-launch Jul 23, 2018
@ebuchman
Copy link
Contributor

@srmo thanks for raising this. I think @ethanfrey found and fixed a related issue recently but perhaps it wasn't comprehensive enough.

Unfortunately, we need to de-prioritize all non-Cosmos-launch related issues right now as we try to focus on getting this thing launched, so that means issues relating to external ABCI apps will be put on our back log. Of course, we're still happy to accept PRs for fixes.

Thanks for understanding.

@ethanfrey
Copy link
Contributor

What abci driver are you using?
In process, socket, or grpc?

My fix was only on the socket client. I'd like to see the launch scripts

@srmo
Copy link
Contributor Author

srmo commented Jul 24, 2018

@ebuchman sure. It's not really breaking anything on our side and I agree with @melekes here. The basic use case is that the ABCI app has to handle such exceptions either way and such non-deterministic errors are usually handled manually per node/abci app.

So this issue is rather a reminder to consider more convenience on tendermint side (like the nuissance of needing a sigkill) or to generally think through possible implications and how to work with them.

@srmo
Copy link
Contributor Author

srmo commented Jul 24, 2018

@ethanfrey socket

@tessr tessr added this to Untriaged 🥚 in Tendermint Core Project Board via automation Nov 16, 2020
@melekes
Copy link
Contributor

melekes commented Nov 24, 2020

I believe this is fixed now (i.e. if ABCI app dies, Tendermint will exit too with the expectation that process supervisor will restart both failed processes).

@melekes melekes closed this as completed Nov 24, 2020
Tendermint Core Project Board automation moved this from Untriaged 🥚 to Done 🎉 Nov 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:abci Component: Application Blockchain Interface T:bug Type Bug (Confirmed)
Projects
No open projects
Development

No branches or pull requests

6 participants