-
Notifications
You must be signed in to change notification settings - Fork 10k
ERRCODE::error: throw std::runtime_error instead of abort() #4420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I have several comments:
So these changes are API changes. Maybe we should postpone.them to a release 6. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is very important that tesseract
runtime errors (like for example those detected by assertions) abort the program and create a usable stack trace in core dumps or debug sessions.
My use case is a mass production of OCR with thousands of tesseract
runs. Ideally all runs work without any problem. But if there are single runs with runtime issues, such runs should be able to create core dumps which can be examined later.
With this PR my use case no longer works.
Meanwhile I did some tests on macOS and Linux. By default, the std::runtime_error exception is called by the C++ runtime library which calls abort(). So applications which don't catch C++ exceptions abort like before which is desired. |
// This used to trigger a segfault or abort(); | ||
// However, at least for library use, only exceptions should be acceptable. | ||
// Even in the standalone application case, exceptions are better, | ||
// because the default handler will print the message along with the stack trace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it? It does not print a stack trace in my test on macOS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Linux, I get:
ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer
terminate called after throwing an instance of 'std::runtime_error'
what(): ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer
Abgebrochen (Speicherabzug geschrieben)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a stack trace. And please rebase your code with all recent updates from the main branch. Then you will no longer get a core dump.
// This used to trigger a segfault or abort(); | ||
// However, at least for library use, only exceptions should be acceptable. | ||
// Even in the standalone application case, exceptions are better, | ||
// because the default handler will print the message along with the stack trace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This used to trigger a segfault or abort(); | |
// However, at least for library use, only exceptions should be acceptable. | |
// Even in the standalone application case, exceptions are better, | |
// because the default handler will print the message along with the stack trace. | |
// Abort unless std::runtime_error is caught by the application. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is more precise. But I'd like to also comment what used to be the behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There used to be comments about the history of some code parts in the past, but most of them are removed in the recent code. I don't think that new comments of this kind are helpful.
Frankly, I don't know. I suppose we could differentiate C and C++ behaviour via ifdefs, and revert to the old (
If you want that for the standalone CLI, I guess that would simply be doable by installing the right kind of exception handler somewhere in tesseractmain.cpp, or even catch exceptions explicitly and call But I'm surprised by your statements regarding the new behaviour. For me the default handler on the CLI still delegates to
That is an awkward position IMO. The API itself does not even change here. The only change is in the case of errors – where prior to the change, nothing could be done anymore. The changes now make errors actionable (via std exceptions). So in a word, nothing that used to work breaks here, this only fixes what happens if things don't work. |
BTW, doing
Again, that should still be doable for the standalone CLI application. In fact, it should still be the default behaviour.
Your use case – with the standalone CLI – should still work. Other productive use cases invole Tesseract as a library, where it is absolutely vital that the application can do its own error handling (instead of getting itself killed from within the library). |
That's the normal nature of fatal errors. As their name says, they are not vital. Even without the few abort() calls in the release code of Tesseract, there remain numerous ways to cause other fatal runtime errors (segmentation faults, floating point exceptions and more). Many (maybe even most?) C and C++ libraries have interfaces which can trigger fatal runtime errors. The Tesseract library is not a special case. |
That statement does not make any sense to me.
Sure, there will always remain other levels of failure, enforced by the OS. Again, this is besides the point. We are talking about preventable failures here – errors which the application could recover from (for example by skipping the task, perhaps re-initializing Tesseract). Which is precisely what my Python use-case (error handling of ocrd-tesserocr) showed. |
I added global If you need a binary without them to make exceptions uncaught, considering your huge volume, it is probably worth it for you to use tess as library (libtess) and make custom binary wrapper without catching exceptions. For stacktraces C++ adds exceptions with stacktraces IIRC in C++23 or C++26. C code should not be considered since libtess is C++. |
I know. That's the reason why @bertsky's code now only prints an error message with |
@kba promised to provide an example. (Mine would be less generic.) So what about the above discussion? Should the new global catch in |
Related to #2990 (although it does not replace the
exit()
calls – yet).@kba will provide an easily reproducable example of such failing assertions. (My current case is more special / harder to reconstruct.)
But regardless of the concrete issues that cause these failed assertions, which of course need to be addressed themselves, this makes Tesseract usable productively as a library. (You can catch these exceptions.)
(No idea what's the matter with CI here, btw.)