Split crashing for PDF with 2,000 pages #344

tomasgreif · 2019-02-27T00:02:26Z

C:\Users\tomas.greif>C:\Users\xxx\Desktop\split\sejda-console-3.2.67\bin\sejda-console splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
Configuring Sejda 3.2.67
Document root element "sejda", must match DOCTYPE root "null".
Document is invalid: no grammar found.
Starting execution with arguments: 'splitbyevery -f C:\Users\xxx\Desktop\split\01-04-18 2.pdf -o C:\Users\xxx\Desktop\split\out -n 1 -p [CURRENTPAGE#####]'
Java version: '1.8.0_191'
Validating parameters.
Starting task (org.sejda.impl.sambox.SplitByPageNumbersTask@62379589) execution.
Opening C:\Users\xxx\Desktop\split\01-04-18 2.pdf
Found 0 inherited images and 0 inherited fonts potentially unused
Starting split by page numbers for org.sejda.model.parameter.SplitByEveryXPagesParameters@50a638b5[step=1,optimizationPolicy=AUTO,discardOutline=false,outputPrefix=[CURRENTPAGE#####],output=org.sejda.model.output.FileOrDirectoryTaskOutput@1189dd52[C:\Users\tomas.greif\Desktop\split\out],sourceList=[C:\Users\xxx\Desktop\split\01-04-18 2.pdf],compress=true,version=VERSION_1_6,existingOutputPolicy=FAIL,lenient=false,1]
Starting split at page 1 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out.sejdaTmp8635042219708410613.tmp
Task progress: 0% done
Filtering annotations
Skipped acroform merge, nothing to merge
Ending split at page 1 of the original document, generated document size is 11.68 KB
Starting split at page 2 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out.sejdaTmp4677890173653656260.tmp
Exception in thread "main" java.lang.StackOverflowError
at java.util.Spliterator.getExactSizeIfKnown(Unknown Source)
at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.ReferencePipeline.collect(Unknown Source)
at org.sejda.sambox.pdmodel.PDPageTree.getKids(PDPageTree.java:172)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:318)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)

torakiki · 2019-02-28T15:18:59Z

You can try increasing the initial heap size (Xms). Try setting the environment variable JAVA_OPTS to something like -Xms1024m (1 GB), or more if the task keeps failing, depending on how much memory your machine has.
If that doesn't work, maybe you could provide the document so we can take a look.

tomasgreif · 2019-02-28T18:56:32Z

I tried both setting JAVA_OPTS and modifying sejda-console.bat.

Change in sejda-console.bat:
%JAVACMD% %JAVA_OPTS% -Dfile.encoding=UTF8 -Xms2G -Xmx10G -classpath %CLASSPATH% -Dapp.name="sejda-console" -Dapp.repo="%REPO%" -Dapp.home="%BASEDIR%" -Dbasedir="%BASEDIR%" org.sejda.cli.Main %CMD_LINE_ARGS%
call:
C:\Users\tomas.greif>C:\Users\tomas.greif\Desktop\split\sejda-console-3.2.67\bin\sejda-console.bat splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
result:
Java version: '1.8.0_191'
Validating parameters.
Starting task (org.sejda.impl.sambox.SplitByPageNumbersTask@62379589) execution.
Opening C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf
Found 0 inherited images and 0 inherited fonts potentially unused
Starting split by page numbers for org.sejda.model.parameter.SplitByEveryXPagesParameters@50a638b5[step=1,optimizationPolicy=AUTO,discardOutline=false,outputPrefix=[CURRENTPAGE#####],output=org.sejda.model.output.FileOrDirectoryTaskOutput@1189dd52[C:\Users\tomas.greif\Desktop\split\out],sourceList=[C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf],compress=true,version=VERSION_1_6,existingOutputPolicy=FAIL,lenient=false,1]
Starting split at page 1 of the original document
Created output temporary buffer C:\Users\tomas.greif\Desktop\split\out.sejdaTmp3357657715678415443.tmp
Task progress: 0% done
Filtering annotations
Skipped acroform merge, nothing to merge
Ending split at page 1 of the original document, generated document size is 11.68 KB
Starting split at page 2 of the original document
Created output temporary buffer C:\Users\tomas.greif\Desktop\split\out.sejdaTmp8549402881702398888.tmp
Exception in thread "main" java.lang.StackOverflowError
at java.util.Spliterators$IteratorSpliterator.estimateSize(Unknown Source)
at java.util.Spliterator.getExactSizeIfKnown(Unknown Source)
at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.ReferencePipeline.collect(Unknown Source)
at org.sejda.sambox.pdmodel.PDPageTree.getKids(PDPageTree.java:172)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:318)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)

For the JAVA_OPTS I am getting:
C:\Users\tomas.greif>C:\Users\tomas.greif\Desktop\split\sejda-console-3.2.67\bin\sejda-console.bat splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
Invalid maximum heap size: -Xmx4G -Xms2G
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Unfortunately, I am unable to share the file as it contains some sensitive data. Is there another way I should set Xmx/Xms, or can I maybe run some jar file directly?

As a note, my laptop has 32 GB of RAM and the Java version is:
C:\Users\tomas.greif>java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

torakiki · 2019-03-01T15:39:49Z

Could you try with -Xss8m?
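
For context on why -Xss is worth trying here: a StackOverflowError is raised when a thread's call stack is exhausted, which the heap settings (-Xms/-Xmx) do not affect, while -Xss sets the thread stack size. The sketch below is only an illustration of that relationship and is not Sejda or SAMBox code. The class name StackSizeDemo and its methods are made up for this example. It uses the standard Thread constructor that accepts a requested stack size, which the JVM treats as a hint and may round or ignore on some platforms.

```java
public class StackSizeDemo {

    // Recurse until the thread's stack is exhausted, counting frames as we go.
    private static void recurse(int[] counter) {
        counter[0]++;
        recurse(counter);
    }

    // Runs the unbounded recursion on a thread created with the requested
    // stack size (in bytes) and returns how deep it got before overflowing.
    static int maxDepth(long stackBytes) {
        final int[] reached = {0};
        Thread probe = new Thread(null, () -> {
            try {
                recurse(reached);
            } catch (StackOverflowError expected) {
                // Stack exhausted: reached[0] now holds the depth achieved.
            }
        }, "stack-probe", stackBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing", e);
        }
        return reached[0];
    }

    public static void main(String[] args) {
        // Exact depths vary by JVM, platform and JIT state; only the
        // relative difference between the two runs is of interest.
        System.out.println("512 KB stack: depth " + maxDepth(512 * 1024));
        System.out.println("8 MB stack:   depth " + maxDepth(8L * 1024 * 1024));
    }
}
```

On a typical HotSpot JVM the second run should get considerably deeper than the first. The same reasoning applies to the repeated PDPageTree.get frames in the trace above: a larger stack lets the recursion go further, although it would not help if the recursion never terminates.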
