Improved document model, parsing of borderline cases & HTML annotation support

@AlbertWeichselbraun AlbertWeichselbraun released this 12 Jul 08:48
· 180 commits to master since this release
5e5fcc3

Changes

HTML parsing:

  • new: improved model for handling text blocks and lines
  • chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
  • chg: improved whitespace handling
  • add: cover more borderline cases with unit tests

Inscriptis core:

  • new: annotation support
  • new: processing of annotation rules and annotation output
  • new: type hints
  • add: extended and improved documentation

Inscript command line client:

  • new: added --annotation-rules option for annotation support
  • new: added --post-processor option to export and visualize annotations (HTML, XML and surface form export)
  • chg: apply --encoding to Web URLs as well
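
As a rough illustration of the annotation options listed above, the sketch below writes an annotation rules file and assembles a matching command line without running it. It assumes the rules file is a JSON object that maps HTML tags to lists of annotation labels. The executable name `inscript.py`, the URL, the label names and the post-processor value `html` are assumptions made for this example and should be checked against the Inscriptis documentation.

```python
import json
import tempfile
from pathlib import Path

# Assumed rule format: HTML tag -> list of annotation labels.
# The label names are arbitrary and chosen only for this example.
annotation_rules = {
    "h1": ["heading"],
    "h2": ["heading"],
    "b": ["emphasis"],
    "table": ["table"],
}


def write_rules(path: Path, rules: dict) -> Path:
    """Serialize the annotation rules to a JSON file and return its path."""
    path.write_text(json.dumps(rules, indent=2), encoding="utf-8")
    return path


rules_file = write_rules(
    Path(tempfile.gettempdir()) / "annotation-rules.json", annotation_rules
)

# Hypothetical invocation: executable name, URL and post-processor value
# are assumptions. The command is only assembled here, not executed.
command = [
    "inscript.py",
    "https://example.org",
    "--annotation-rules", str(rules_file),
    "--post-processor", "html",
]
print(" ".join(command))
```

Keeping the rules in a separate file means the same label mapping can be reused across several invocations.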

Misc:

  • chg: migrated to semantic versioning as described on https://semver.org/
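
To show what the semantic versioning scheme implies for ordering releases, here is a minimal sketch of the precedence rules from semver.org. A pre-release ranks below its final release, and build metadata is ignored. The helper `semver_key` and the version strings are illustrative assumptions, not Inscriptis code or actual Inscriptis tags.

```python
def semver_key(version: str) -> tuple:
    """Return a sort key that follows semantic versioning precedence."""
    # Build metadata (everything after '+') does not affect precedence.
    version = version.split("+", 1)[0]
    core, _, pre_release = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not pre_release:
        # A normal release outranks any pre-release of the same version.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare numerically and rank below alphanumeric
    # ones, which compare lexically in ASCII order.
    identifiers = tuple(
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in pre_release.split(".")
    )
    return (major, minor, patch, 0, identifiers)


# Illustrative version strings only.
versions = ["2.0.0", "2.0.0-rc.2", "1.0.0", "2.0.0-rc.1"]
ordered = sorted(versions, key=semver_key)
print(ordered)  # ['1.0.0', '2.0.0-rc.1', '2.0.0-rc.2', '2.0.0']
```

The key relies on Python's tuple comparison, so a shorter pre-release identifier list sorts before a longer one that shares the same prefix, as the specification requires.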

Note

In terms of functionality, this release corresponds to Inscriptis 2.0rc2.