Exception raised when specifying area with pdf that has spare blank pages #130
jjelosua pushed a commit to jjelosua/tabula-java that referenced this issue on Dec 16, 2016
jazzido added a commit that referenced this issue on Dec 16, 2016:
ignore area if the page has no text. closes #130
jeremybmerrill added a commit that referenced this issue on Dec 30, 2016
melisabok added a commit to melisabok/tabula-java that referenced this issue on Mar 8, 2017:
The first commit's message is a combination of 17 commits:
1. Fix TextElement creation
2. fix tabs
3. Use the code from LegacyPDFStreamEngine to create the TextElements
4. Fix removeText function using the example: org.apache.pdfbox.examples.util.RemoveAllText
5. close the document
6. close removed text document
7. fix array serialization
8. add spanning cells test with CSV format
9. Remove capheight calculation; Temporally set height
10. Test writer two tables checking the json result object instead of the string; Add a test writer two tables for CSV output
11. Fix pageTransform when there is a rotation; Add more csv tests
12. fix path iterator
13. update json tests
14. Refactor table equality assertions for better reporting
15. Moved test fixture to a CSV file
16. rename spreadsheet/no-spreadsheet to lattice/stream to match web UI in CLI arguments and names of extraction algorithms
17. adjust expected output to use lattice/stream instead of spreadsheet/basic names for extraction mehthod
Subsequent commit messages:
2. ignore area restrictions on blank page. closes tabulapdf#130
3. Revert "ignore area restrictions on blank page. closes tabulapdf#130" (This reverts commit dfd5f2f.)
4. more consistent naming of avariable :)
5. fix and test for empty areas; which should have no text content
6. various additional null/empty checks to avoid exceptions when the user selects empty pages or regions
7. Update acknowledgments
8. tabula 0.9.2
9. update version
10. -t/--stream, -l/--lattice in #whichExtractionMethod
11. Comment on line above
melisabok added a commit to melisabok/tabula-java that referenced this issue on Mar 8, 2017:
The first commit's message is a combination of 12 commits. Its first message is the same combination of 17 commits listed for the previous commit, and messages 2 through 11 are likewise unchanged (from "ignore area restrictions on blank page. closes tabulapdf#130" through "Comment on line above"). It then adds:
12. update json outputs
Subsequent commit message:
2. upgrade pdfbox version
melisabok added a commit to melisabok/tabula-java that referenced this issue on Mar 8, 2017:
The message repeats the commit messages of the previous commit without the squash markers (from "fix tabs" through "update json outputs" and "upgrade pdfbox version"), then continues:
- back to the old implementation and catch the IndexOutOfBoundsException
- Remove hardcoded code
- Remove more hardcoded code
- test all the elements of the detected table
- Change the expected table top value
- Increase the threshold factor to support a greater headings
- Fix rectangle comparator.
- fix wrong expected column size, 5 instead of 6.
- add more tests
- update expected table, more spaces are expected to respect the alingment.
- when the text value has length > 1, clean the spaces.
- clean code
- remove stackstrace
- add log error
EmpowerZ pushed a commit to EmpowerZ/tabula-java that referenced this issue on Oct 23, 2020
EmpowerZ pushed a commit to EmpowerZ/tabula-java that referenced this issue on Oct 23, 2020:
ignore area if the page has no text. closes tabulapdf#130
EmpowerZ pushed a commit to EmpowerZ/tabula-java that referenced this issue on Oct 23, 2020:
This reverts commit dfd5f2f.
java -jar ./tools/tabula-0.9.1-jar-with-dependencies.jar pdf_with_blank_page.pdf --pages all --spreadsheet -u -a 0,0,864.567,1105.51
Exception in thread "main" java.util.NoSuchElementException
    at java.util.ArrayList$Itr.next(ArrayList.java:854)
    at java.util.Collections.min(Collections.java:635)
    at technology.tabula.Page.getArea(Page.java:68)
    at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:163)
    at technology.tabula.CommandLineApp.extractFileInto(CommandLineApp.java:138)
    at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:128)
    at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:104)
    at technology.tabula.CommandLineApp.main(CommandLineApp.java:74)
Example doc:
pdf_with_blank_page.pdf
I will send a one-line PR to fix it.
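For readers following the stack trace: the top frames show `Collections.min` being called from `Page.getArea`, and `Collections.min` throws `NoSuchElementException` when the collection is empty, which is presumably what happens when the selected area on a blank page contains no text. The sketch below reproduces that failure mode in isolation and shows one possible guard. It does not use tabula-java classes and is not the actual patch; `EmptyAreaSketch`, `minWidthUnguarded`, `minWidthGuarded`, and the fallback value are hypothetical names and values chosen only for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not code from tabula-java.
class EmptyAreaSketch {

    // Mirrors the failing pattern: taking the minimum of a list that
    // may be empty. Collections.min throws NoSuchElementException
    // when the list has no elements.
    static float minWidthUnguarded(List<Float> widths) {
        return Collections.min(widths);
    }

    // One possible guard: return a caller-supplied fallback when there
    // is nothing to measure (for example, a blank page).
    static float minWidthGuarded(List<Float> widths, float fallback) {
        if (widths.isEmpty()) {
            return fallback;
        }
        return Collections.min(widths);
    }

    public static void main(String[] args) {
        // Stands in for "no text elements found in the selected area".
        List<Float> blankPage = Collections.emptyList();

        try {
            minWidthUnguarded(blankPage);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: NoSuchElementException");
        }

        System.out.println("guarded: " + minWidthGuarded(blankPage, 7f));
    }
}
```

Whether the right response to an empty area is a fallback value, skipping the page, or ignoring the area restriction is a design decision for the maintainers; the sketch only shows that an emptiness check before `Collections.min` avoids the exception.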