NewPdfReaderLazy supports reading PDF files in lazy-load mode #409
Codecov Report
@@ Coverage Diff @@
## v3 #409 +/- ##
==========================================
+ Coverage 55.63% 59.83% +4.19%
==========================================
Files 153 153
Lines 27531 27596 +65
==========================================
+ Hits 15318 16513 +1195
- Misses 10345 10697 +352
+ Partials 1868 386 -1482
Continue to review full report at Codecov.
The code looks really good.
Provided via new initializer NewPdfReaderLazy. NewPdfReader still loads the entire PDF structure upon loading.
Reduced from 17s -> 10.6s
Colorspaces can be very complex, so deal with them as PdfObjects unless the content itself is needed.
Avoid going through annotations on page loading, handle as generic PdfObject and load when GetAnnotations is called on the page.
The e2e test in the v3 branch (not lazy):
v3-lazyloading (with lazy reader):
Significant improvements in memory use and speed. Note that the reason for the long time the test is taking is that the input and output files are validated with Ghostscript. In addition, debug.FreeOSMemory() is called prior to each test file to ensure consistent memory measurements.
… Trace mode
- Add IsLogLevel function to logger. Can be used to avoid calling resource-intensive functions except when running in trace mode.
- Remove large testdata file and generate the data for the TestBigDictParse test case dynamically instead.
- Small improvement in String() for dictionary.
The code looks great. Seems like a lot of unnecessary code has been removed.
pdf/core/primitives.go
Outdated
return MakeNull()
}
if obj == nil {
common.Log.Debug("ERROR resolving reference: nil object - returning a null object", err)
Extra err argument.
common.Log.Debug("ERROR resolving reference: nil object - returning a null object")
NewPdfReader still implements the traditional approach of loading the entire structure at opening time, whereas NewPdfReaderLazy only accesses objects on an as-needed basis. The non-lazy reader may be more efficient, whereas the lazy reader should reduce memory usage when only parts of the PDF are used.
Changed ColorSpace Resources to a core.PdfObject and added a getter which loads the PdfObject into a colorspace resource model (*PdfPageResourcesColorspaces). The reason is that colorspaces can be quite heavy, and in large files this slows down loading significantly.
Similarly, changed page annotations Annots to a core.PdfObject and added a getter GetAnnotations which loads the PdfObject into a slice of PdfAnnotations.
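The getter pattern described for colorspaces and annotations can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `rawObject`, `annotation`, and `page` are hypothetical stand-ins for core.PdfObject, PdfAnnotation, and the page model, and caching the parsed result is an assumption made for the example.

```go
package main

import "fmt"

// rawObject is a hypothetical stand-in for an unparsed core.PdfObject.
type rawObject struct {
	entries []string
}

// annotation is a hypothetical stand-in for a parsed annotation model.
type annotation struct {
	Subtype string
}

// page keeps its annotations in raw form until a caller asks for them.
type page struct {
	Annots     rawObject    // stored as-is when the page is loaded
	parsed     []annotation // filled on first GetAnnotations call
	loaded     bool
	parseCalls int // counts how often the costly conversion ran
}

// GetAnnotations converts the raw object to models on first use.
// Caching the result is an assumption of this sketch.
func (p *page) GetAnnotations() []annotation {
	if p.loaded {
		return p.parsed
	}
	p.parseCalls++
	for _, e := range p.Annots.entries {
		p.parsed = append(p.parsed, annotation{Subtype: e})
	}
	p.loaded = true
	return p.parsed
}

func main() {
	p := &page{Annots: rawObject{entries: []string{"Link", "Text"}}}
	fmt.Println("conversions after load:", p.parseCalls)
	annots := p.GetAnnotations()
	_ = p.GetAnnotations()
	fmt.Println("annotations:", len(annots), "conversions:", p.parseCalls)
}
```

Pages whose annotations are never requested pay no conversion cost, which is the effect the description attributes to the change.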
Added a function to the Logger interface to check the log level, used to avoid calling resource-intensive functions when outputting Trace logs.
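A rough sketch of how such a level check can guard an expensive call is shown below. Only the name IsLogLevel comes from this PR; the `LogLevel` constants, the `memLogger` type, `expensiveDump`, and the "level is at least this verbose" semantics are assumptions made for illustration.

```go
package main

import "fmt"

// LogLevel and its constants are hypothetical for this sketch.
type LogLevel int

const (
	LogLevelError LogLevel = iota
	LogLevelDebug
	LogLevelTrace
)

// Logger mirrors the idea of a logger with a level check.
type Logger interface {
	Trace(format string, args ...interface{})
	IsLogLevel(level LogLevel) bool
}

// memLogger is a hypothetical logger that stores lines in memory.
type memLogger struct {
	level LogLevel
	lines []string
}

// IsLogLevel here means "this level is enabled" (an assumption).
func (l *memLogger) IsLogLevel(level LogLevel) bool { return l.level >= level }

func (l *memLogger) Trace(format string, args ...interface{}) {
	if l.IsLogLevel(LogLevelTrace) {
		l.lines = append(l.lines, fmt.Sprintf(format, args...))
	}
}

var dumpCalls int // counts invocations of the costly function

// expensiveDump stands in for something like a large dictionary String().
func expensiveDump(entries []string) string {
	dumpCalls++
	return fmt.Sprint(entries)
}

// logDict only pays for the dump when trace output is enabled.
func logDict(log Logger, entries []string) {
	if log.IsLogLevel(LogLevelTrace) {
		log.Trace("dict: %s", expensiveDump(entries))
	}
}

func main() {
	logDict(&memLogger{level: LogLevelDebug}, []string{"A", "B"})
	fmt.Println("dump calls at debug level:", dumpCalls)
	logDict(&memLogger{level: LogLevelTrace}, []string{"A", "B"})
	fmt.Println("dump calls at trace level:", dumpCalls)
}
```

Without the guard, the argument expression would be evaluated on every call even when the Trace line is discarded.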
Addresses #128.