Improved document model, parsing of borderline cases & HTML annotation support
AlbertWeichselbraun released this on 12 Jul 08:48
Changes
HTML parsing:
- new: improved model for handling text blocks and lines
- chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
- chg: improved whitespace handling
- add: cover more borderline cases with unit tests
Inscriptis core:
- new: annotation support
- new: processing of annotation rules and annotation output
- new: type hints
- add: extended and improved documentation
Inscript command line client:
- new: added `--annotation-rules` option for annotation support.
- new: added `--post-processor` option to export and visualize annotations (HTML, XML and surface form export)
- chg: apply `--encoding` to Web URLs as well
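
To illustrate what an input for the `--annotation-rules` option might look like, here is a minimal sketch that writes a rules file. It assumes the rules are a JSON object that maps HTML tag names to lists of annotation labels; the file name, the labels and the helper `write_rules` are hypothetical and not part of Inscriptis.

```python
import json
from pathlib import Path

# Hypothetical rules: each HTML tag maps to the list of labels
# that should be attached to text rendered from that tag.
ANNOTATION_RULES = {
    "h1": ["heading"],
    "h2": ["heading"],
    "b": ["emphasis"],
}


def write_rules(path, rules):
    """Check that every rule maps to a list of string labels, then
    write the rules to `path` as JSON and return the path."""
    for selector, labels in rules.items():
        is_label_list = isinstance(labels, list) and all(
            isinstance(label, str) for label in labels
        )
        if not is_label_list:
            raise ValueError(f"labels for {selector!r} must be a list of strings")
    target = Path(path)
    target.write_text(json.dumps(rules, indent=2), encoding="utf-8")
    return target
```

The resulting file could then be passed to the command line client through `--annotation-rules`, with `--post-processor` selecting how the annotations are exported.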
Misc:
- chg: migrated to the semantic versioning scheme described at https://semver.org/.
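
For readers unfamiliar with semantic versioning, the following sketch shows how `MAJOR.MINOR.PATCH` version strings can be compared. It is only an illustration of the scheme, not code from Inscriptis; the helper names are hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
import re

# Matches only the core MAJOR.MINOR.PATCH form (assumption: no
# pre-release or build metadata suffixes are supported here).
SEMVER_CORE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(version):
    """Return (major, minor, patch) as integers or raise ValueError."""
    match = SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_breaking_upgrade(old, new):
    """Under semantic versioning, a higher major version signals
    backwards-incompatible changes."""
    return parse_version(new)[0] > parse_version(old)[0]
```

Comparing the parsed tuples orders versions numerically, so `1.10.0` sorts after `1.9.3`, which a plain string comparison would get wrong.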
Note
In terms of functionality, this release corresponds to Inscriptis 2.0rc2.