Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running epubcheck on an epub with a large opf file (13k+ lines) results in a StackOverflowError #1358

Closed
myork opened this issue Nov 14, 2022 · 3 comments
Assignees
Labels
status: completed Work completed, can be closed type: bug The issue describes a bug
Milestone

Comments

@myork
Copy link

myork commented Nov 14, 2022

When running epubcheck on an epub with a large opf file (13k+ lines, 841k) it throws the following StackOverflowError. Unfortunately I can't share the epub since it isn't public content.

epubcheck-4.2.6$ java -jar epubcheck.jar foobar.epub 
Validating using EPUB version 3.2 rules.
java.lang.StackOverflowError
	at java.base/java.util.LinkedList.unlinkFirst(LinkedList.java:178)
	at java.base/java.util.LinkedList.poll(LinkedList.java:679)
	at com.adobe.epubcheck.opf.XRefChecker.checkReadingOrder(XRefChecker.java:538)
	at com.adobe.epubcheck.opf.XRefChecker.checkReadingOrder(XRefChecker.java:599)
	at com.adobe.epubcheck.opf.XRefChecker.checkReadingOrder(XRefChecker.java:599)

I've seen that the recommended fix is to increase xss, but is there a way to change epubcheck to not require as much stack size?

This particular opf file has 6000+ <itemref> elements in the <spine>, and if I'm reading correctly it looks like these are evaluated using a recursive algorithm.

Question: Would changing the above to use iteration instead of recursion potentially decrease the required stack size?

@bduga
Copy link

bduga commented Nov 14, 2022

We are having a similar issue, which is blocking some content ingestion. Looking at the code, the iterative version seems to be about one line shorter - I don't understand why this is using recursion at all. Java is not good at tail recursion optimizations. @rdeltour is this something we can look at in 5.0? The fix seems reasonably straightforward, do you want a PR?

[Edit]: Oh, the reporter is us :) Hi Matt!

@bduga
Copy link

bduga commented Nov 14, 2022

FWIW, I think the recursion can be replaced with for (URLReference ref : references) {...}. I think that should work with a Queue.

@rdeltour
Copy link
Member

hi @myork and @bduga. Thanks for the report!
It seems this is the issue raised in PR #1356 by @kalaspuffar.

I'll look into it for v5.0.0, but need to merge the ongoing work first, as this code has been refactored and merging it early will create conflicts.

Unfortunately I can't share the epub since it isn't public content.

Would you be able to send the EPUB privately? I can possibly sign an NDA if needed.

@rdeltour rdeltour self-assigned this Nov 17, 2022
@rdeltour rdeltour added this to the v5.0.0-beta milestone Nov 17, 2022
@rdeltour rdeltour added type: bug The issue describes a bug status: accepted Ready to be further processed status: in progress The issue is being implemented by the development team and removed status: accepted Ready to be further processed labels Nov 17, 2022
@rdeltour rdeltour added status: completed Work completed, can be closed and removed status: in progress The issue is being implemented by the development team labels Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: completed Work completed, can be closed type: bug The issue describes a bug
Projects
None yet
Development

No branches or pull requests

3 participants