Catch parsing mistakes #26

Open
wants to merge 3 commits into
base: master

Commits on Mar 20, 2024

  1. Catch parsing mistakes

    Scanned pages sometimes produce '[NO_BLOCKS] PDF parsing resulted in empty content', and GROBID parsing errors produce '[GENERAL] An exception occurred while running Grobid.'
    
    Catching these errors requires some additional logic.
    manuelrech authored Mar 20, 2024
    9d34e2b
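
A minimal sketch of the kind of check the commit message above describes, assuming the parser output arrives as a plain string per document. The two marker strings are quoted from the commit message; the helper names `is_parsing_error` and `split_parsed` are hypothetical and not taken from this PR.

```python
from typing import Dict, List, Tuple

# Error messages quoted in the commit message; both indicate a failed parse.
ERROR_MARKERS = (
    "[NO_BLOCKS] PDF parsing resulted in empty content",
    "[GENERAL] An exception occurred while running Grobid.",
)


def is_parsing_error(parser_output: str) -> bool:
    """Return True if the output contains one of the known error messages."""
    return any(marker in parser_output for marker in ERROR_MARKERS)


def split_parsed(outputs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Separate usable outputs from documents whose parse failed.

    `outputs` maps a (hypothetical) document name to the raw parser output.
    """
    ok: Dict[str, str] = {}
    failed: List[str] = []
    for name, text in outputs.items():
        if is_parsing_error(text):
            failed.append(name)
        else:
            ok[name] = text
    return ok, failed
```

Collecting the failed names separately, rather than raising on the first error, lets a batch run finish and report which PDFs need attention, for example scanned pages that may need OCR first.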

Commits on Mar 21, 2024

  1. Remove xml - html warning

    Removed the XML warning by setting features = 'xml', along with some small adjustments.
    manuelrech authored Mar 21, 2024
    5a67ba8
  2. Update checks on wrongly parsed articles

    With the new XML parser, we need a different checking system.
    manuelrech authored Mar 21, 2024
    0d8252d
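
The commits above do not show what the new check looks like once the output is parsed as XML. One possible shape, sketched here under assumptions: GROBID returns TEI XML on success, so a failed parse shows up either as text that is not XML at all (the error strings from the first commit) or as a document whose `<body>` is empty. The sketch uses the standard-library `xml.etree.ElementTree` as a stand-in for whatever parser the PR uses, and `has_body_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def has_body_text(tei_xml: str) -> bool:
    """Return True if the string is well-formed TEI with a non-empty <body>."""
    try:
        root = ET.fromstring(tei_xml)
    except ET.ParseError:
        # Plain error messages such as '[NO_BLOCKS] ...' are not XML.
        return False
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return False
    # Any non-whitespace text anywhere under <body> counts as content.
    return any(chunk.strip() for chunk in body.itertext())
```

Checking the parsed structure rather than matching error strings also catches outputs that are valid XML but carry no article text.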