Skip to content

ERRCODE::error: throw std::runtime_error instead of abort() #4420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bertsky
Copy link
Contributor

@bertsky bertsky commented May 7, 2025

Related to #2990 (although it does not replace the exit() calls – yet).

@kba will provide an easily reproducable example of such failing assertions. (My current case is more special / harder to reconstruct.)

But regardless of the concrete issues that cause these failed assertions, which of course need to be addressed themselves, this makes Tesseract usable productively as a library. (You can catch these exceptions.)

(No idea what's the matter with CI here, btw.)

@stweil
Copy link
Member

stweil commented May 7, 2025

I have several comments:

  1. What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?
  2. This PR changes the behaviour of the tesseract executable for fatal errors. All releases until now could trigger a core dump which could be analyzed post mortem. It also returns a specific exit code (134 for abort, 139 for SIGSEGV). The new code raises an exception which is caught in main(), prints a message to stderr and returns the exit code 1.

So these changes are API changes. Maybe we should postpone.them to a release 6.

@stweil
Copy link
Member

stweil commented May 7, 2025

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

Copy link
Member

@stweil stweil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is very important that tesseract runtime errors (like for example those detected by assertions) abort the program and create a usable stack trace in core dumps or debug sessions.

My use case is a mass production of OCR with thousands of tesseract runs. Ideally all runs work without any problem. But if there are single runs with runtime issues, such runs should be able to create core dumps which can be examined later.

With this PR my use case no longer works.

@stweil
Copy link
Member

stweil commented May 7, 2025

1. What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?

Meanwhile I did some tests on macOS and Linux. By default, the std::runtime_error exception is called by the C++ runtime library which calls abort(). So applications which don't catch C++ exceptions abort like before which is desired.

// This used to trigger a segfault or abort();
// However, at least for library use, only exceptions should be acceptable.
// Even in the standalone application case, exceptions are better,
// because the default handler will print the message along with the stack trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it? It does not print a stack trace in my test on macOS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux, I get:

ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer
terminate called after throwing an instance of 'std::runtime_error'
  what():  ELIST_ITERATOR::forward:Error:List would have returned a nullptr data pointer
Abgebrochen (Speicherabzug geschrieben)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a stack trace. And please rebase your code with all recent updates from the main branch. Then you will no longer get a core dump.

Comment on lines +73 to +76
// This used to trigger a segfault or abort();
// However, at least for library use, only exceptions should be acceptable.
// Even in the standalone application case, exceptions are better,
// because the default handler will print the message along with the stack trace.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// This used to trigger a segfault or abort();
// However, at least for library use, only exceptions should be acceptable.
// Even in the standalone application case, exceptions are better,
// because the default handler will print the message along with the stack trace.
// Abort unless std::runtime_error is caught by the application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is more precise. But I'd like to also comment what used to be the behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be comments about the history of some code parts in the past, but most of them are removed in the recent code. I don't think that new comments of this kind are helpful.

@bertsky
Copy link
Contributor Author

bertsky commented May 7, 2025

  1. What will be the effect of throwing exceptions for C applications and other applications which depend on the C API?

Frankly, I don't know. I suppose we could differentiate C and C++ behaviour via ifdefs, and revert to the old (abort) implementation in plain C.

  1. This PR changes the behaviour of the tesseract executable for fatal errors. All releases until now could trigger a core dump which could be analyzed post mortem. It also returns a specific exit code (134 for abort, 139 for SIGSEGV). The new code raises an exception which is caught in main(), prints a message to stderr and returns the exit code 1.

If you want that for the standalone CLI, I guess that would simply be doable by installing the right kind of exception handler somewhere in tesseractmain.cpp, or even catch exceptions explicitly and call abort() that way.

But I'm surprised by your statements regarding the new behaviour. For me the default handler on the CLI still delegates to abort(), which still writes a core dump (as long as ulimit -c is set appropriately).

So these changes are API changes. Maybe we should postpone.them to a release 6.

That is an awkward position IMO. The API itself does not even change here. The only change is in the case of errors – where prior to the change, nothing could be done anymore. The changes now make errors actionable (via std exceptions).

So in a word, nothing that used to work breaks here, this only fixes what happens if things don't work.

@bertsky
Copy link
Contributor Author

bertsky commented May 7, 2025

BTW, doing abort() directly is also worse than going via exception handlers, even for the CLI, because it does not call destructors (and thus, does not shut down streams and sockets).

I think it is very important that tesseract runtime errors (like for example those detected by assertions) abort the program and create a usable stack trace in core dumps or debug sessions.

Again, that should still be doable for the standalone CLI application. In fact, it should still be the default behaviour.

My use case is a mass production of OCR with thousands of tesseract runs. Ideally all runs work without any problem. But if there are single runs with runtime issues, such runs should be able to create core dumps which can be examined later.

Your use case – with the standalone CLI – should still work.

Other productive use cases invole Tesseract as a library, where it is absolutely vital that the application can do its own error handling (instead of getting itself killed from within the library).

@stweil
Copy link
Member

stweil commented May 7, 2025

Other productive use cases invole Tesseract as a library, where it is absolutely vital that the application can do its own error handling (instead of getting itself killed from within the library).

That's the normal nature of fatal errors. As their name says, they are not vital. Even without the few abort() calls in the release code of Tesseract, there remain numerous ways to cause other fatal runtime errors (segmentation faults, floating point exceptions and more).

Many (maybe even most?) C and C++ libraries have interfaces which can trigger fatal runtime errors. The Tesseract library is not a special case.

@bertsky
Copy link
Contributor Author

bertsky commented May 7, 2025

That's the normal nature of fatal errors. As their name says, they are not vital.

That statement does not make any sense to me.

Even without the few abort() calls in the release code of Tesseract, there remain numerous ways to cause other fatal runtime errors (segmentation faults, floating point exceptions and more).

Sure, there will always remain other levels of failure, enforced by the OS. Again, this is besides the point. We are talking about preventable failures here – errors which the application could recover from (for example by skipping the task, perhaps re-initializing Tesseract). Which is precisely what my Python use-case (error handling of ocrd-tesserocr) showed.

@egorpugin
Copy link
Contributor

egorpugin commented May 7, 2025

@stweil

My use case is a mass production of OCR with thousands of tesseract runs.

I added global catch(std::exception) and catch (...) into tesseract main() not so long ago.

If you need a binary without them to make exceptions uncaught, considering your huge volume, it is probably worth it for you to use tess as library (libtess) and make custom binary wrapper without catching exceptions.


For stacktraces C++ adds exceptions with stacktraces IIRC in C++23 or C++26.


C code should not be considered since libtess is C++.
Removing our own C layer can fix the possible issue. Or every C api wrapper must have catch all snippets.

@stweil
Copy link
Member

stweil commented May 8, 2025

I added global catch(std::exception) and catch (...) into tesseract main() not so long ago.

I know. That's the reason why @bertsky's code now only prints an error message with tesseract instead of the desired aborting. His PR is based on an older main branch without your try / catch block, so he could not notice this.

@stweil
Copy link
Member

stweil commented Jun 17, 2025

@kba will provide an easily reproducable example of such failing assertions. (My current case is more special / harder to reconstruct.)

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

Did I miss the examples? @bertsky, could you please send me a link?

@bertsky
Copy link
Contributor Author

bertsky commented Jun 17, 2025

@kba, @bertsky, please open issues for reproducable examples of failing assertions.

Did I miss the examples? @bertsky, could you please send me a link?

@kba promised to provide an example. (Mine would be less generic.)

So what about the above discussion? Should the new global catch in main be changed in some way as well? (For example, by making it configurable at compile-time)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants