ERRCODE::error: throw std::runtime_error instead of abort() #4420

bertsky · 2025-05-07T14:23:53Z

Related to #2990 (although it does not replace the exit() calls – yet).

@kba will provide an easily reproducable example of such failing assertions. (My current case is more special / harder to reconstruct.)

But regardless of the concrete issues that cause these failed assertions, which of course need to be addressed themselves, this makes Tesseract usable productively as a library. (You can catch these exceptions.)

(No idea what's the matter with CI here, btw.)

stweil · 2025-05-07T16:49:34Z

I have several comments:

What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?
This PR changes the behaviour of the tesseract executable for fatal errors. All releases until now could trigger a core dump which could be analyzed post mortem. It also returns a specific exit code (134 for abort, 139 for SIGSEGV). The new code raises an exception which is caught in main(), prints a message to stderr and returns the exit code 1.

So these changes are API changes. Maybe we should postpone.them to a release 6.

stweil · 2025-05-07T16:52:09Z

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

stweil

I think it is very important that tesseract runtime errors (like for example those detected by assertions) abort the program and create a usable stack trace in core dumps or debug sessions.

My use case is a mass production of OCR with thousands of tesseract runs. Ideally all runs work without any problem. But if there are single runs with runtime issues, such runs should be able to create core dumps which can be examined later.

With this PR my use case no longer works.

stweil · 2025-05-07T18:54:41Z

1. What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?

Meanwhile I did some tests on macOS and Linux. By default, the std::runtime_error exception is called by the C++ runtime library which calls abort(). So applications which don't catch C++ exceptions abort like before which is desired.

stweil · 2025-05-07T18:59:09Z

src/ccutil/errcode.cpp

+      // This used to trigger a segfault or abort();
+      // However, at least for library use, only exceptions should be acceptable.
+      // Even in the standalone application case, exceptions are better,
+      // because the default handler will print the message along with the stack trace.


Will it? It does not print a stack trace in my test on macOS.

On Linux, I get:

ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer terminate called after throwing an instance of 'std::runtime_error' what(): ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer Abgebrochen (Speicherabzug geschrieben)

That's not a stack trace. And please rebase your code with all recent updates from the main branch. Then you will no longer get a core dump.

stweil · 2025-05-07T19:01:31Z

src/ccutil/errcode.cpp

+      // This used to trigger a segfault or abort();
+      // However, at least for library use, only exceptions should be acceptable.
+      // Even in the standalone application case, exceptions are better,
+      // because the default handler will print the message along with the stack trace.


Suggested change

// This used to trigger a segfault or abort();

// However, at least for library use, only exceptions should be acceptable.

// Even in the standalone application case, exceptions are better,

// because the default handler will print the message along with the stack trace.

// Abort unless std::runtime_error is caught by the application.

Yes, that is more precise. But I'd like to also comment what used to be the behaviour.

There used to be comments about the history of some code parts in the past, but most of them are removed in the recent code. I don't think that new comments of this kind are helpful.

bertsky · 2025-05-07T19:13:36Z

What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?

Frankly, I don't know. I suppose we could differentiate C and C++ behaviour via ifdefs, and revert to the old (abort) implementation in plain C.

This PR changes the behaviour of the tesseract executable for fatal errors. All releases until now could trigger a core dump which could be analyzed post mortem. It also returns a specific exit code (134 for abort, 139 for SIGSEGV). The new code raises an exception which is caught in main(), prints a message to stderr and returns the exit code 1.

If you want that for the standalone CLI, I guess that would simply be doable by installing the right kind of exception handler somewhere in tesseractmain.cpp, or even catch exceptions explicitly and call abort() that way.

But I'm surprised by your statements regarding the new behaviour. For me the default handler on the CLI still delegates to abort(), which still writes a core dump (as long as ulimit -c is set appropriately).

So these changes are API changes. Maybe we should postpone.them to a release 6.

That is an awkward position IMO. The API itself does not even change here. The only change is in the case of errors – where prior to the change, nothing could be done anymore. The changes now make errors actionable (via std exceptions).

So in a word, nothing that used to work breaks here, this only fixes what happens if things don't work.

bertsky · 2025-05-07T19:18:47Z

BTW, doing abort() directly is also worse than going via exception handlers, even for the CLI, because it does not call destructors (and thus, does not shut down streams and sockets).

I think it is very important that tesseract runtime errors (like for example those detected by assertions) abort the program and create a usable stack trace in core dumps or debug sessions.

Again, that should still be doable for the standalone CLI application. In fact, it should still be the default behaviour.

My use case is a mass production of OCR with thousands of tesseract runs. Ideally all runs work without any problem. But if there are single runs with runtime issues, such runs should be able to create core dumps which can be examined later.

Your use case – with the standalone CLI – should still work.

Other productive use cases invole Tesseract as a library, where it is absolutely vital that the application can do its own error handling (instead of getting itself killed from within the library).

stweil · 2025-05-07T19:25:59Z

Other productive use cases invole Tesseract as a library, where it is absolutely vital that the application can do its own error handling (instead of getting itself killed from within the library).

That's the normal nature of fatal errors. As their name says, they are not vital. Even without the few abort() calls in the release code of Tesseract, there remain numerous ways to cause other fatal runtime errors (segmentation faults, floating point exceptions and more).

Many (maybe even most?) C and C++ libraries have interfaces which can trigger fatal runtime errors. The Tesseract library is not a special case.

bertsky · 2025-05-07T20:10:28Z

That's the normal nature of fatal errors. As their name says, they are not vital.

That statement does not make any sense to me.

Even without the few abort() calls in the release code of Tesseract, there remain numerous ways to cause other fatal runtime errors (segmentation faults, floating point exceptions and more).

Sure, there will always remain other levels of failure, enforced by the OS. Again, this is besides the point. We are talking about preventable failures here – errors which the application could recover from (for example by skipping the task, perhaps re-initializing Tesseract). Which is precisely what my Python use-case (error handling of ocrd-tesserocr) showed.

egorpugin · 2025-05-07T20:36:23Z

@stweil

My use case is a mass production of OCR with thousands of tesseract runs.

I added global catch(std::exception) and catch (...) into tesseract main() not so long ago.

If you need a binary without them to make exceptions uncaught, considering your huge volume, it is probably worth it for you to use tess as library (libtess) and make custom binary wrapper without catching exceptions.

For stacktraces C++ adds exceptions with stacktraces IIRC in C++23 or C++26.

C code should not be considered since libtess is C++.
Removing our own C layer can fix the possible issue. Or every C api wrapper must have catch all snippets.

stweil · 2025-05-08T04:55:35Z

I added global catch(std::exception) and catch (...) into tesseract main() not so long ago.

I know. That's the reason why @bertsky's code now only prints an error message with tesseract instead of the desired aborting. His PR is based on an older main branch without your try / catch block, so he could not notice this.

stweil · 2025-06-17T07:35:24Z

@kba will provide an easily reproducable example of such failing assertions. (My current case is more special / harder to reconstruct.)

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

Did I miss the examples? @bertsky, could you please send me a link?

bertsky · 2025-06-17T10:32:00Z

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

Did I miss the examples? @bertsky, could you please send me a link?

@kba promised to provide an example. (Mine would be less generic.)

So what about the above discussion? Should the new global catch in main be changed in some way as well? (For example, by making it configurable at compile-time)

ERRCODE::error: throw std::runtime_error instead of abort()

61517cf

bertsky mentioned this pull request May 7, 2025

convert more exceptions sirfz/tesserocr#365

Open

egorpugin approved these changes May 7, 2025

View reviewed changes

stweil requested changes May 7, 2025

View reviewed changes

stweil reviewed May 7, 2025

View reviewed changes

ERRCODE::error: throw std::runtime_error instead of abort() #4420

Are you sure you want to change the base?

ERRCODE::error: throw std::runtime_error instead of abort() #4420

Conversation

bertsky commented May 7, 2025

Uh oh!

stweil commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stweil commented May 7, 2025

Uh oh!

stweil left a comment

Choose a reason for hiding this comment

Uh oh!

stweil commented May 7, 2025

Uh oh!

stweil May 7, 2025

Choose a reason for hiding this comment

Uh oh!

bertsky May 7, 2025

Choose a reason for hiding this comment

Uh oh!

stweil May 7, 2025

Choose a reason for hiding this comment

Uh oh!

stweil May 7, 2025

Choose a reason for hiding this comment

Uh oh!

bertsky May 7, 2025

Choose a reason for hiding this comment

Uh oh!

stweil May 7, 2025

Choose a reason for hiding this comment

Uh oh!

bertsky commented May 7, 2025

Uh oh!

bertsky commented May 7, 2025

Uh oh!

stweil commented May 7, 2025

Uh oh!

bertsky commented May 7, 2025

Uh oh!

egorpugin commented May 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stweil commented May 8, 2025

Uh oh!

stweil commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bertsky commented Jun 17, 2025

Uh oh!

Uh oh!

stweil commented May 7, 2025 •

edited

Loading

egorpugin commented May 7, 2025 •

edited

Loading

stweil commented Jun 17, 2025 •

edited

Loading