sync: include missing PR806 follow-up commit in integration#24

Closed
vitormattos wants to merge 71 commits into integration/pdfjs-fixes from sync/pr806-tail-into-integration

@vitormattos
Owner

This sync PR applies the second commit that is present in upstream PR smalot#806 but absent from integration/pdfjs-fixes.

Applied commit:
- fix: recover repeated page refs in cyclic page trees

Why:
- integration currently contains the first PR806 commit (cycle guard) but not this follow-up recovery change.
- bringing this commit in avoids a full branch rebuild and resolves the detected inconsistency.

After merge, upstream PR smalot#809 will automatically reflect this update because it tracks integration/pdfjs-fixes.
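For readers unfamiliar with the two PR806 commits, here is a rough sketch of how a cycle guard and tolerance for repeated page references can coexist. It is written in Python for brevity and is not the parser's code: `collect_pages`, the `objects` table, and the choice to keep repeated leaf references are all assumptions made for illustration. The actual commit may recover repeated references differently.

```python
# Hypothetical in-memory object table: object id -> dictionary.
# A real parser resolves indirect references; this is only an illustration.
def collect_pages(objects, root_id):
    """Return leaf page ids in traversal order, tolerating cyclic /Kids links."""
    pages = []
    on_path = set()  # /Pages nodes currently being expanded (the cycle guard)

    def walk(obj_id):
        node = objects.get(obj_id)
        if node is None:
            return  # dangling reference: skip it
        if node["Type"] == "Page":
            pages.append(obj_id)  # assumption: a repeated leaf is kept, not dropped
            return
        if obj_id in on_path:
            return  # this /Pages node is its own ancestor: stop recursing
        on_path.add(obj_id)
        for kid in node.get("Kids", []):
            walk(kid)
        on_path.discard(obj_id)

    walk(root_id)
    return pages


objects = {
    1: {"Type": "Pages", "Kids": [2, 3]},
    2: {"Type": "Page"},
    3: {"Type": "Pages", "Kids": [4, 1, 2]},  # 1 points back to the root
    4: {"Type": "Page"},
}
print(collect_pages(objects, 1))  # [2, 4, 2]
```

The guard only tracks nodes on the current path, so the back-reference to object 1 is ignored while the second reference to page 2 is still visited.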

vitormattos and others added 30 commits April 23, 2026 22:55
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
chore: merge getPages dedup fix into aggregate fork branch
Some PDFs include bytes before the %PDF- header while still using
absolute xref offsets from the beginning of the file.

The parser trimmed data before %PDF-, which shifted offsets and caused
xref lookup failures. This manifested as an Invalid object reference
error in the veraPDF corpus header case.

Changes:
- Keep original byte layout in RawDataParser::parseData
- Add stricter trailer key matching for /Size /Root /Encrypt /Info /Prev
- Add defensive handling in xref stream resolution when startxref is near,
  but not exactly at, the xref stream object
- Add regression fixture and integration test

Regression fixture:
- samples/bugs/PullRequestInvalidObjectReference.pdf

Test:
- DocumentIssueFocusTest::testParseFileWithCompressedObjRefInXrefStream

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
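To illustrate why trimming the bytes before `%PDF-` breaks lookups, here is a minimal sketch in Python (not the parser's PHP code). The helpers `find_startxref` and `build_sample` and the toy file layout are hypothetical. The only point is that `startxref` counts from byte 0 of the file as written.

```python
import re


def find_startxref(data: bytes) -> int:
    """Return the integer that follows the last 'startxref' keyword."""
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        raise ValueError("no startxref found")
    return int(matches[-1])


def build_sample(prefix: bytes) -> bytes:
    """Build a toy file whose startxref is absolute from byte 0, prefix included."""
    body = prefix + b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
    xref_offset = len(body)
    tail = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\nstartxref\n"
    return body + tail + str(xref_offset).encode() + b"\n%%EOF\n"


data = build_sample(b"junk bytes\n")
offset = find_startxref(data)

# Original byte layout kept: the offset lands on the xref keyword.
print(data[offset:offset + 4])  # b'xref'

# Data trimmed at %PDF-: every byte moved, so the same offset now misses.
trimmed = data[data.index(b"%PDF-"):]
print(trimmed[offset:offset + 4] == b"xref")  # False
```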
Some PDFs set startxref to the whitespace immediately before the
xref keyword instead of the first letter of xref.

The parser required an exact match and incorrectly switched to xref
stream decoding, which then failed with Invalid object reference.

Changes:
- Skip PDF whitespace before checking startxref position
- Use adjusted offset when decoding classic xref
- Apply same whitespace tolerance for Unix line-ending detection
- Tighten trailer key regexes to match /Size /Root /Encrypt /Info /Prev
- Add regression fixture and integration test

Regression fixture:
- samples/bugs/PullRequestXrefWhitespaceStart.pdf

Test:
- DocumentIssueFocusTest::testParseFileWhenStartxrefPointsToLeadingWhitespace

Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
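The whitespace tolerance described above can be sketched as follows, in Python rather than the parser's PHP. `resolve_xref_offset` and `is_classic_xref` are hypothetical names, and the sample bytes are made up. The whitespace set is the six characters the PDF specification treats as white space.

```python
# NUL, TAB, LF, FF, CR, SPACE: the PDF white-space characters.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def resolve_xref_offset(data: bytes, startxref: int) -> int:
    """Advance past PDF whitespace so the offset lands on the first non-space byte."""
    pos = startxref
    while pos < len(data) and data[pos] in PDF_WHITESPACE:
        pos += 1
    return pos


def is_classic_xref(data: bytes, startxref: int) -> bool:
    """True when a classic xref table, not an xref stream, starts at the offset."""
    pos = resolve_xref_offset(data, startxref)
    return data.startswith(b"xref", pos)


data = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n\r\nxref\n0 1\n"
misaligned = data.index(b"\r\nxref")  # points at the CR before the keyword
print(is_classic_xref(data, misaligned))  # True
```

A strict comparison at `misaligned` would see `\r` rather than `x` and fall through to xref stream decoding, which is the failure mode the commit message describes.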
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
vitormattos and others added 26 commits April 24, 2026 13:20
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
(cherry picked from commit 3e2a9c6)
…fixes

reverse: integrate PR 805 into fork/libresign-parser-fixes
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
(cherry picked from commit 1ae0081)
reverse: integrate PR 806 into fork/libresign-all-fixes
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
(cherry picked from commit 3513040)
(cherry picked from commit df71b86)
reverse: integrate PR 807 into fork/libresign-all-fixes (clean)
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
(cherry picked from commit 3513040)
- Rename PullRequest807-pdf.js.pdf -> PullRequest807-pdfjs-xref-missing-keyword.pdf
- Rename PullRequest808-pdf.js.pdf -> PullRequest807-pdfjs-xref-startxref-misaligned.pdf
- Add *.pdf binary to .gitattributes to prevent CRLF conversion on Windows
  (fixes Undefined index: xref and Object list not found on Windows CI)
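A small Python sketch of why the `*.pdf binary` attribute matters. It assumes the fixture was treated as text and checked out with LF rewritten to CRLF. `crlf_checkout` and the sample bytes are hypothetical stand-ins, not Git's or the parser's code.

```python
def crlf_checkout(data: bytes) -> bytes:
    """Mimic a text-mode checkout that rewrites every line ending as CRLF."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


original = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\nxref\n0 1\n"
xref_offset = original.index(b"xref")  # offset as recorded when the file was written

converted = crlf_checkout(original)
print(original[xref_offset:xref_offset + 4])   # b'xref'
print(converted[xref_offset:xref_offset + 4])  # b'bj\r\n'
```

Each inserted CR pushes later bytes one position further, so offsets stored inside the file no longer match its contents. Marking `*.pdf` as `binary` tells Git not to perform that conversion.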
…ery-rebase

Fork/fix/issue9252 xref recovery rebase
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
…ll-fixes

fix: recover malformed or missing startxref stanzas
# Conflicts:
#	src/Smalot/PdfParser/RawData/RawDataParser.php
#	tests/PHPUnit/Integration/DocumentIssueFocusTest.php
sync: merge PR810 fix into integration branch
sync: bring PR811 fix into integration/pdfjs-fixes
Signed-off-by: Vitor Mattos <1079143+vitormattos@users.noreply.github.com>
@vitormattos
Owner Author

Closing this sync PR because integration/pdfjs-fixes was fully rebuilt from upstream master by exact replay of PRs smalot#795, smalot#796, smalot#797, smalot#798, smalot#799, smalot#800, smalot#801, smalot#804, smalot#805, smalot#806, smalot#807, smalot#808, smalot#810, smalot#811 and force-updated directly.

vitormattos force-pushed the integration/pdfjs-fixes branch from ff2abc5 to e786ad9 on April 25, 2026 19:00
vitormattos deleted the sync/pr806-tail-into-integration branch on April 27, 2026 17:10