Split crashing for PDF with 2,000 pages #344

Open
tomasgreif opened this issue Feb 27, 2019 · 3 comments

@tomasgreif

C:\Users\tomas.greif>C:\Users\xxx\Desktop\split\sejda-console-3.2.67\bin\sejda-console splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
Configuring Sejda 3.2.67
Document root element "sejda", must match DOCTYPE root "null".
Document is invalid: no grammar found.
Starting execution with arguments: 'splitbyevery -f C:\Users\xxx\Desktop\split\01-04-18 2.pdf -o C:\Users\xxx\Desktop\split\out -n 1 -p [CURRENTPAGE#####]'
Java version: '1.8.0_191'
Validating parameters.
Starting task (org.sejda.impl.sambox.SplitByPageNumbersTask@62379589) execution.
Opening C:\Users\xxx\Desktop\split\01-04-18 2.pdf
Found 0 inherited images and 0 inherited fonts potentially unused
Starting split by page numbers for org.sejda.model.parameter.SplitByEveryXPagesParameters@50a638b5[step=1,optimizationPolicy=AUTO,discardOutline=false,outputPrefix=[CURRENTPAGE#####],output=org.sejda.model.output.FileOrDirectoryTaskOutput@1189dd52[C:\Users\tomas.greif\Desktop\split\out],sourceList=[C:\Users\xxx\Desktop\split\01-04-18 2.pdf],compress=true,version=VERSION_1_6,existingOutputPolicy=FAIL,lenient=false,1]
Starting split at page 1 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out.sejdaTmp8635042219708410613.tmp
Task progress: 0% done
Filtering annotations
Skipped acroform merge, nothing to merge
Ending split at page 1 of the original document, generated document size is 11.68 KB
Starting split at page 2 of the original document
Created output temporary buffer C:\Users\xxx\Desktop\split\out.sejdaTmp4677890173653656260.tmp
Exception in thread "main" java.lang.StackOverflowError
at java.util.Spliterator.getExactSizeIfKnown(Unknown Source)
at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.util.stream.ReferencePipeline.collect(Unknown Source)
at org.sejda.sambox.pdmodel.PDPageTree.getKids(PDPageTree.java:172)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:318)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
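The trace shows `PDPageTree.get` calling itself at line 327 over and over, so the page lookup descends the page tree recursively, using one stack frame per tree level. A minimal sketch of that pattern next to an iterative alternative follows. The `Node` model, the degenerate `buildChain` tree (each intermediate node holding one page plus one child node), and both lookup methods are hypothetical illustrations, not SAMBox code, and whether this particular file's page tree is unusually deep is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class PageTreeLookup {

    // Hypothetical page-tree node: a leaf page (pageNumber > 0) or an
    // intermediate node whose count is the number of pages below it.
    static final class Node {
        final int pageNumber;
        final List<Node> kids = new ArrayList<>(2);
        int count;

        Node(int pageNumber) {
            this.pageNumber = pageNumber;
            this.count = pageNumber > 0 ? 1 : 0;
        }

        boolean isPage() {
            return pageNumber > 0;
        }
    }

    // Degenerate tree: every intermediate node holds one page plus one child
    // node, so the tree depth grows linearly with the page count.
    static Node buildChain(int pages) {
        Node current = new Node(0);
        current.kids.add(new Node(pages));
        current.count = 1;
        for (int p = pages - 1; p >= 1; p--) {
            Node parent = new Node(0);
            parent.kids.add(new Node(p));
            parent.kids.add(current);
            parent.count = 1 + current.count;
            current = parent;
        }
        return current;
    }

    // Recursive lookup of the page at a 0-based index: one stack frame per level.
    static Node getRecursive(Node node, int index) {
        if (node.isPage()) {
            return node;
        }
        for (Node kid : node.kids) {
            if (index < kid.count) {
                return getRecursive(kid, index);
            }
            index -= kid.count;
        }
        throw new IndexOutOfBoundsException("index " + index);
    }

    // Iterative lookup: same result, constant stack usage regardless of depth.
    static Node getIterative(Node root, int index) {
        Node node = root;
        while (!node.isPage()) {
            Node next = null;
            for (Node kid : node.kids) {
                if (index < kid.count) {
                    next = kid;
                    break;
                }
                index -= kid.count;
            }
            if (next == null) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            node = next;
        }
        return node;
    }

    public static void main(String[] args) {
        Node small = buildChain(10);
        System.out.println(getRecursive(small, 1).pageNumber);   // 2
        System.out.println(getIterative(small, 9).pageNumber);   // 10
        Node deep = buildChain(100_000);
        System.out.println(getIterative(deep, 99_999).pageNumber); // 100000
    }
}
```

The recursive variant needs as many nested frames as the tree has levels, so on a sufficiently deep tree it can exhaust the thread stack; the loop-based variant keeps stack use constant.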

@torakiki
Owner

torakiki commented Feb 28, 2019

You can try increasing your Xms size. Try setting the environment variable JAVA_OPTS to something like -Xms1024m (1 GB), and go higher if the task keeps failing, depending on how much memory your machine has.
If that doesn't work, maybe you could provide the document so we can take a look.
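To confirm that options set through JAVA_OPTS (or edited into `sejda-console.bat`) actually reach the JVM, a tiny standalone check can print the arguments the JVM was started with and its effective heap limits. `JvmOptionsCheck` is a hypothetical helper, not part of Sejda; the assumption is that you compile it and run it with the same options, e.g. `java %JAVA_OPTS% JvmOptionsCheck` from the same Command Prompt.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmOptionsCheck {

    // Options the JVM was actually started with (e.g. -Xms1024m from JAVA_OPTS).
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        System.out.println("JVM arguments: " + jvmArguments());
        // totalMemory: heap currently reserved (starts roughly at -Xms).
        // maxMemory: ceiling the heap may grow to (roughly -Xmx).
        System.out.println("current heap:  " + rt.totalMemory() / mb + " MB");
        System.out.println("max heap:      " + rt.maxMemory() / mb + " MB");
    }
}
```

If `-Xms1024m` does not appear in the printed argument list, the variable was not picked up by the command that launched the JVM.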

@tomasgreif
Author

tomasgreif commented Feb 28, 2019

I tried both setting JAVA_OPTS and modifying sejda-console.bat:

  1. Change to sejda-console.bat:
    %JAVACMD% %JAVA_OPTS% -Dfile.encoding=UTF8 -Xms2G -Xmx10G -classpath %CLASSPATH% -Dapp.name="sejda-console" -Dapp.repo="%REPO%" -Dapp.home="%BASEDIR%" -Dbasedir="%BASEDIR%" org.sejda.cli.Main %CMD_LINE_ARGS%

  2. Command:
    C:\Users\tomas.greif>C:\Users\tomas.greif\Desktop\split\sejda-console-3.2.67\bin\sejda-console.bat splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]

  3. Result:
    Java version: '1.8.0_191'
    Validating parameters.
    Starting task (org.sejda.impl.sambox.SplitByPageNumbersTask@62379589) execution.
    Opening C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf
    Found 0 inherited images and 0 inherited fonts potentially unused
    Starting split by page numbers for org.sejda.model.parameter.SplitByEveryXPagesParameters@50a638b5[step=1,optimizationPolicy=AUTO,discardOutline=false,outputPrefix=[CURRENTPAGE#####],output=org.sejda.model.output.FileOrDirectoryTaskOutput@1189dd52[C:\Users\tomas.greif\Desktop\split\out],sourceList=[C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf],compress=true,version=VERSION_1_6,existingOutputPolicy=FAIL,lenient=false,1]
    Starting split at page 1 of the original document
    Created output temporary buffer C:\Users\tomas.greif\Desktop\split\out.sejdaTmp3357657715678415443.tmp
    Task progress: 0% done
    Filtering annotations
    Skipped acroform merge, nothing to merge
    Ending split at page 1 of the original document, generated document size is 11.68 KB
    Starting split at page 2 of the original document
    Created output temporary buffer C:\Users\tomas.greif\Desktop\split\out.sejdaTmp8549402881702398888.tmp
    Exception in thread "main" java.lang.StackOverflowError
    at java.util.Spliterators$IteratorSpliterator.estimateSize(Unknown Source)
    at java.util.Spliterator.getExactSizeIfKnown(Unknown Source)
    at java.util.stream.AbstractPipeline.copyInto(Unknown Source)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(Unknown Source)
    at java.util.stream.AbstractPipeline.evaluate(Unknown Source)
    at java.util.stream.ReferencePipeline.collect(Unknown Source)
    at org.sejda.sambox.pdmodel.PDPageTree.getKids(PDPageTree.java:172)
    at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:318)
    at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)
    at org.sejda.sambox.pdmodel.PDPageTree.get(PDPageTree.java:327)

When I set JAVA_OPTS instead, I get:
C:\Users\tomas.greif>C:\Users\tomas.greif\Desktop\split\sejda-console-3.2.67\bin\sejda-console.bat splitbyevery -f "C:\Users\tomas.greif\Desktop\split\01-04-18 2.pdf" -o C:\Users\tomas.greif\Desktop\split\out -n 1 -p [CURRENTPAGE#####]
Invalid maximum heap size: -Xmx4G -Xms2G
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Unfortunately, I can't share the file because it contains sensitive data. Is there another way I should set -Xmx/-Xms, or can I run a JAR file directly?

For reference, my laptop has 32 GB of RAM and the Java version is:
C:\Users\tomas.greif>java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

@torakiki
Owner

torakiki commented Mar 1, 2019

Could you try with -Xss8m?
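A StackOverflowError is bounded by the per-thread stack size, which -Xss sets, not by the heap size that -Xms/-Xmx control. As a hedged illustration (the `StackDepthProbe` class is hypothetical, not part of Sejda), the probe below recurses until the stack runs out on threads that request different stack sizes. The `Thread` constructor's `stackSize` argument is only a hint that some JVMs may ignore, so the exact depths will vary by platform.

```java
public class StackDepthProbe {

    private static int depth;

    // Recurse until the thread's stack is exhausted, counting frames.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs the probe on a new thread requesting stackSizeBytes of stack
    // (0 means "use the JVM default", which -Xss controls).
    static int probe(long stackSizeBytes) {
        final int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth;
            }
        }, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join(); // join() makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println("default stack: ~" + probe(0) + " frames");
        System.out.println("8 MB stack:    ~" + probe(8L * 1024 * 1024) + " frames");
    }
}
```

On HotSpot the 8 MB probe would typically reach a much greater depth than the default, which is why adding -Xss8m (to JAVA_OPTS, or next to -Xms2G -Xmx10G in sejda-console.bat) may give the recursive page lookup enough room to finish.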
