Python script for performing various operations on ALTO XML files

UB-Mannheim/alto-tools

 
 


alto-tools

Python3 script for performing various operations on ALTO files.

Usage

  • extract UTF-8 text content from ALTO file

    python3 alto_tools.py alto.xml -t

  • extract page OCR confidence score from ALTO file

    python3 alto_tools.py alto.xml -c

  • extract bounding boxes of illustrations from ALTO file

    python3 alto_tools.py alto.xml -l
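
For readers unfamiliar with ALTO, the three operations above can be sketched with the standard library alone. This is an illustrative sketch, not the code in alto_tools.py: the function names and the embedded sample document are hypothetical, and it assumes the usual ALTO layout in which `String` elements carry `CONTENT` and `WC` (word confidence) attributes and `Illustration` elements carry `HPOS`, `VPOS`, `WIDTH` and `HEIGHT`. The confidence value here is a plain mean of the `WC` values, which may differ from how the script computes its score.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def elements(root, name):
    """Yield every element under `root` whose local tag name equals `name`."""
    return (el for el in root.iter() if local(el.tag) == name)


def extract_text(root):
    """Join the CONTENT of each String, one output line per TextLine."""
    lines = []
    for line in elements(root, "TextLine"):
        words = [s.get("CONTENT", "") for s in elements(line, "String")]
        lines.append(" ".join(words))
    return "\n".join(lines)


def mean_confidence(root):
    """Average the WC attributes of all Strings; None if no String has one."""
    scores = [float(s.get("WC")) for s in elements(root, "String") if s.get("WC")]
    return sum(scores) / len(scores) if scores else None


def illustration_boxes(root):
    """Return (HPOS, VPOS, WIDTH, HEIGHT) as strings for each Illustration."""
    return [
        tuple(el.get(key) for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        for el in elements(root, "Illustration")
    ]


# Hypothetical minimal ALTO document, used only to exercise the functions.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace>
    <TextBlock>
      <TextLine>
        <String CONTENT="Hello" WC="0.9"/><SP/><String CONTENT="ALTO" WC="0.7"/>
      </TextLine>
    </TextBlock>
    <Illustration HPOS="10" VPOS="20" WIDTH="300" HEIGHT="200"/>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(extract_text(root))
    print(mean_confidence(root))
    print(illustration_boxes(root))
```

Matching on the local tag name rather than a fixed namespace URI is a deliberate choice here, since the namespace differs between ALTO schema versions.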

Planned

  • write output to file(s); currently all output is sent to stdout

    python3 alto_tools.py alto.xml [OPTION] -o
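
Because all output currently goes to stdout, it can already be captured in a file from the calling side. The wrapper below is a minimal sketch of that workaround; `run_to_file` is a hypothetical helper, not part of the script, and the example call in the comment assumes that alto_tools.py and alto.xml sit in the current directory.

```python
import subprocess
import sys


def run_to_file(command, out_path):
    """Run `command` and write everything it prints on stdout to `out_path`."""
    # Binary mode: the child process writes raw bytes straight to the file.
    with open(out_path, "wb") as out:
        result = subprocess.run(command, stdout=out, check=True)
    return result.returncode


# Example call (assumes alto_tools.py and alto.xml exist in the working directory):
# run_to_file([sys.executable, "alto_tools.py", "alto.xml", "-t"], "alto.txt")
```

With `check=True`, a non-zero exit status raises `subprocess.CalledProcessError` instead of silently leaving a partial file behind.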
