When I run the training step on some PDFs of historical Indian census files, I get the following error:
Extracting text line images from ../data/district_reports/raw_pdfs/1981/27582_1981_MAI.pdf, page 3
Error reading image
com.sun.pdfview.PDFParseException: Unknown coding method:JBIG2Decode
and then
java.lang.NullPointerException
    at com.sun.pdfview.font.TTFFont.getOutline(TTFFont.java:170)
    at com.sun.pdfview.font.CIDFontType2.getOutline(CIDFontType2.java:270)
    at com.sun.pdfview.font.OutlineFont.getGlyph(OutlineFont.java:130)
    at com.sun.pdfview.font.PDFFont.getCachedGlyph(PDFFont.java:308)
    at com.sun.pdfview.font.PDFFontEncoding.getGlyphFromCMap(PDFFontEncoding.java:155)
    at com.sun.pdfview.font.PDFFontEncoding.getGlyphs(PDFFontEncoding.java:115)
    at com.sun.pdfview.font.PDFFont.getGlyphs(PDFFont.java:274)
    at com.sun.pdfview.PDFTextFormat.doText(PDFTextFormat.java:269)
    at com.sun.pdfview.PDFParser.iterate(PDFParser.java:752)
    at com.sun.pdfview.BaseWatchable.run(BaseWatchable.java:101)
    at java.base/java.lang.Thread.run(Thread.java:834)
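I think what's going on here is that the PDF contains images compressed with JBIG2 (the JBIG2Decode filter named in the first error), and the PDF library the program uses (com.sun.pdfview) doesn't know how to decode them.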
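In case it helps anyone triage their own files, here is a rough sketch of how one might check which PDFs are affected before running the training step. It is not part of the program: the class name Jbig2Check and its methods are hypothetical, and it only does a raw byte search for the filter name /JBIG2Decode, on the assumption that the name appears unescaped in the image's stream dictionary. It does not decode anything.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for triage only; not part of the tool in this issue.
class Jbig2Check {
    // The PDF filter name reported in the "Unknown coding method" error.
    private static final byte[] NEEDLE =
            "/JBIG2Decode".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the raw bytes contain the name /JBIG2Decode.
    // Heuristic: assumes the name is written literally (no #xx escapes).
    static boolean containsJbig2(byte[] data) {
        outer:
        for (int i = 0; i + NEEDLE.length <= data.length; i++) {
            for (int j = 0; j < NEEDLE.length; j++) {
                if (data[i + j] != NEEDLE[j]) {
                    continue outer; // mismatch, try the next start offset
                }
            }
            return true;
        }
        return false;
    }

    // Convenience overload that reads the whole file into memory first.
    static boolean containsJbig2(Path pdf) throws IOException {
        return containsJbig2(Files.readAllBytes(pdf));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: Jbig2Check <file.pdf> [more.pdf ...]");
            return;
        }
        for (String arg : args) {
            String verdict = containsJbig2(Paths.get(arg)) ? "JBIG2" : "ok";
            System.out.println(arg + "\t" + verdict);
        }
    }
}
```

Files flagged as JBIG2 could then be set aside or re-encoded with a separate tool before training. A file reported as "ok" only means the filter name was not found by this search, so it is not a guarantee that the file will load.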