Heap space error while processing multiple files #992

Closed
bitsgalore opened this issue May 29, 2017 · 3 comments
Labels: bug (A product defect that needs fixing), P1 (High priority issues to be scheduled in the upcoming release)

bitsgalore commented May 29, 2017

Dev Effort: 1D

Description

While trying to process a directory of 24 PDFs (110 MB in total; the largest is about 50 MB), veraPDF crashed with a Java heap space error. Here's the command line I used:

~/verapdf/verapdf -x --policyfile /home/johan/pdfPolicyVeraPDF/schemas/demo-policy.sch ~/pdfAcrobatEngineering/multimedia/*.pdf > multimedia.xml

Errors/warnings sent to stderr:

WARNING: The JVM appears to have run out of memory
java.lang.OutOfMemoryError: Java heap space
	at com.adobe.xmp.impl.ByteBuffer.ensureCapacity(ByteBuffer.java:322)
	at com.adobe.xmp.impl.ByteBuffer.<init>(ByteBuffer.java:88)
	at com.adobe.xmp.impl.XMPMetaParser.parseXmlFromInputStream(XMPMetaParser.java:162)
	at com.adobe.xmp.impl.XMPMetaParser.parseXml(XMPMetaParser.java:128)
	at com.adobe.xmp.impl.XMPMetaParser.parse(XMPMetaParser.java:77)
	at com.adobe.xmp.XMPMetaFactory.parse(XMPMetaFactory.java:100)
	at com.adobe.xmp.impl.VeraPDFMeta.parse(VeraPDFMeta.java:75)
	at org.verapdf.gf.model.GFModelParser.obtainFlavour(GFModelParser.java:128)
	at org.verapdf.gf.model.GFModelParser.<init>(GFModelParser.java:82)
	at org.verapdf.gf.model.GFModelParser.createModelWithFlavour(GFModelParser.java:104)
	at org.verapdf.pdfa.VeraFoundry.createParser(VeraFoundry.java:80)
	at org.verapdf.pdfa.VeraFoundry.createParser(VeraFoundry.java:86)
	at org.verapdf.processor.ProcessorImpl.process(ProcessorImpl.java:113)
	at org.verapdf.processor.BatchFileProcessor.processItem(BatchFileProcessor.java:98)
	at org.verapdf.processor.BatchFileProcessor.processList(BatchFileProcessor.java:74)
	at org.verapdf.processor.AbstractBatchProcessor.process(AbstractBatchProcessor.java:102)
	at org.verapdf.cli.VeraPdfCliProcessor.processFiles(VeraPdfCliProcessor.java:157)
	at org.verapdf.cli.VeraPdfCliProcessor.processPaths(VeraPdfCliProcessor.java:131)
	at org.verapdf.cli.VeraPdfCli.main(VeraPdfCli.java:99)

All output for the PDFs that were processed before the crash was lost. Stdout:

The JVM appears to have run out of memory
Memory Use: 529M/1975M
To increase the memory available to the JVM please assign the JAVA_OPTS environment variable.
The examples below increase the maximum heap available to the JVM to 2GB:
 - Mac or Linux users: 
   export JAVA_OPTS="-Xmx2048m"
 - Windows users: 
   SET JAVA_OPTS="-Xmx2048m"

It's not clear to me whether the error arises from output that is kept in a buffer or from something else. I'm using veraPDF 1.4.7.
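The "Memory Use: 529M/1975M" line appears to report heap currently in use against the maximum heap the JVM may grow to. That reading is an assumption, since veraPDF's own reporting code isn't shown here. A minimal sketch of how such a figure can be computed with the standard java.lang.Runtime API follows; the class name HeapReport and the method memoryUse are hypothetical.

```java
// Hypothetical sketch: prints JVM heap usage as "used/max", similar in form to the
// "Memory Use: 529M/1975M" line above. This is NOT veraPDF's actual implementation.
public class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Heap in use (allocated minus free) and the maximum heap, both in MiB. */
    static String memoryUse() {
        Runtime rt = Runtime.getRuntime();
        long used = (rt.totalMemory() - rt.freeMemory()) / MB;
        // maxMemory() reflects the -Xmx limit (or the JVM's default if none is set).
        long max = rt.maxMemory() / MB;
        return "Memory Use: " + used + "M/" + max + "M";
    }

    public static void main(String[] args) {
        System.out.println(memoryUse());
    }
}
```

Running this with `java -Xmx2048m HeapReport` should raise the second figure to roughly 2048M. The JVM may report slightly less than the -Xmx value. JAVA_OPTS itself only takes effect if a launcher script passes it on to java, as the veraPDF message implies its script does.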

bitsgalore (Author)

Update: after some additional testing, I'm now fairly sure the error is not related to the output buffer but is triggered by one large (49 MB) PDF: VolvoS40V50-Full.pdf, which used to be available from the Adobe Acrobat Engineering website (before that site was taken down):

http://acroeng.adobe.com/Test_Files/classic_multimedia//VolvoS40V50-Full.pdf

I have a local copy of the file and am happy to send it over if that helps (if so, just drop me a line at twitter@bitsgalore.org or my KB email address).

ghost transferred this issue from veraPDF/veraPDF-apps on Jan 3, 2019
ghost added the bug (A product defect that needs fixing) and P1 (High priority issues to be scheduled in the upcoming release) labels on Jan 3, 2019
ghost added this to the v1.14-m4 milestone on Jan 3, 2019
carlwilson self-assigned this on Apr 5, 2019
carlwilson (Contributor)

Hi @bitsgalore, I've tested this on the latest development install against the Adobe test corpus, including the Volvo file. It now takes approximately 2 seconds to process from a standard installation, with JAVA_OPTS unset. I'm happy for somebody else to test it if you want; I'll at least give it a go on the Mac to see. If that gives a similar result, I'm ready to close this.

carlwilson (Contributor)

Update: on the Mac, a standard install processed it in 5.86 seconds with no memory issues. Ready to close.
