Improves PDF tagging detection #27

Merged: srkirkland merged 2 commits into main from srk/tweaks on Jan 30, 2026

@srkirkland (Member) commented Jan 30, 2026

Improves PDF tagging detection by treating PDFs with trivially tagged structures as broken. This forces re-tagging to ensure proper accessibility.

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection of inadequately tagged PDFs to trigger automatic re-tagging when needed
    • Ensured document titles are properly displayed in PDF viewer title bars
  • Tests

    • Added integration tests to verify correct handling of trivially tagged PDFs and title display settings

@coderabbitai (bot, Contributor) commented Jan 30, 2026

Walkthrough

This PR adds heuristics to detect effectively empty tag trees in PDFs and force re-tagging when detected, updates viewer preferences to display document titles in PDF viewers, and adds corresponding integration tests to verify these behaviors.
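
The "effectively empty tag tree" heuristic can be sketched roughly as follows. This is not the PR's C# code: it is a minimal Python illustration that models PDF objects as plain dicts and lists, with a hypothetical `Ref` class standing in for indirect references. It assumes the standard PDF structure-tree layout, where a structure element's `K` entry holds child elements, bare integers (marked-content IDs), or `MCR`/`OBJR` dictionaries. The function names mirror the helpers named in the walkthrough but are otherwise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference such as `12 0 R`."""
    num: int


def dereference(value, objects):
    """Follow indirect references until a direct object (or None) is reached."""
    seen = set()
    while isinstance(value, Ref):
        if value.num in seen:  # reference cycle: treat as missing
            return None
        seen.add(value.num)
        value = objects.get(value.num)
    return value


def tag_tree_has_content_items(struct_tree_root, objects):
    """Return True if any structure element points at actual page content."""
    root = dereference(struct_tree_root, objects)
    if not isinstance(root, dict):
        return False
    stack = [root.get("K")]
    visited = set()
    while stack:
        node = dereference(stack.pop(), objects)
        if node is None or isinstance(node, bool):
            continue
        if isinstance(node, int):
            return True  # a bare integer kid is a marked-content ID (MCID)
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            if id(node) in visited:  # guard against cyclic structure trees
                continue
            visited.add(id(node))
            if node.get("Type") in ("MCR", "OBJR"):
                return True  # marked-content or object reference
            stack.append(node.get("K"))
    return False
```

Under this model, a tree whose only element is a `Document` node with no kids returns `False`, which is the case the PR would classify as `TaggedBroken` so that the file is re-tagged.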

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tag Tree Detection<br>`server.core/Ingest/PdfProcessor.cs` | Introduces `TagTreeHasContentItems` and `Dereference` helpers to detect effectively empty tag trees. Modifies `ReadSourcePdfInfo` to return `TaggedBroken` when tag tree root content is absent, forcing re-tagging of such PDFs. |
| Title Display Preference<br>`server.core/Remediate/PdfRemediationProcessor.cs` | Adds `EnsurePdfDisplaysTitleInTitleBar` helper to update the PDF catalog's `DisplayDocTitle` viewer preference. Integrates it into `EnsurePdfHasTitleAsync` at three points to keep the title bar consistent with the document title. |
| Tag Tree Detection Tests<br>`tests/server.tests/Integration/Ingest/PdfProcessorIntegrationTests.cs` | Adds the `ProcessAsync_WithAdobeEnabledAndTriviallyTaggedPdf_Autotags` integration test and the `CreateTriviallyTaggedPdf` helper to verify that trivially tagged PDFs trigger autotagging when Adobe services are enabled. |
| Title Display Verification<br>`tests/server.tests/Integration/Remediate/PdfRemediationProcessorTitleTests.cs` | Introduces the `GetDisplayDocTitle` test helper to inspect the PDF catalog's viewer preferences for the `DisplayDocTitle` flag and verify it is enabled in remediated outputs across multiple tests. |
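
The title display change amounts to setting one boolean in the catalog's `ViewerPreferences` dictionary. The sketch below is again a Python illustration on plain dicts, not the repository's `EnsurePdfDisplaysTitleInTitleBar` implementation; the function name `ensure_display_doc_title` and the "return whether anything changed" convention are assumptions made for the example.

```python
def ensure_display_doc_title(catalog):
    """Set /ViewerPreferences /DisplayDocTitle to true on a catalog dict.

    Returns True if the catalog was modified, False if it was already set.
    """
    prefs = catalog.get("ViewerPreferences")
    if not isinstance(prefs, dict):
        # No usable preferences dictionary yet: create one.
        prefs = {}
        catalog["ViewerPreferences"] = prefs
    if prefs.get("DisplayDocTitle") is True:
        return False  # already configured, nothing to write
    prefs["DisplayDocTitle"] = True
    return True


catalog = {"Type": "Catalog", "ViewerPreferences": {"FitWindow": True}}
changed = ensure_display_doc_title(catalog)
```

Existing preferences such as `FitWindow` are left untouched. Returning a changed flag lets a caller skip rewriting the file when the preference is already present.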

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Tags so empty, we'd hop right past,
Now we find them and fix them fast!
Titles bright in the viewer's sight,
PDFs tagged and displayed just right! 📄✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Improves PDF tagging detection' accurately summarizes the main change: detecting effectively empty tag trees and treating them as broken to force re-tagging. This is the primary objective addressed across the core logic changes. |

@srkirkland merged commit 4397b1f into main on Jan 30, 2026
3 checks passed
@srkirkland deleted the srk/tweaks branch on January 30, 2026 07:12
@coderabbitai (bot) mentioned this pull request on Jan 30, 2026