This repository is based on poppler 0.22.5 and includes only the pdftotext.cc file and a sample of output HTML generated by running it on a PDF.
It adds a new option, -bbox-layout, which is very similar to -bbox but, rather than producing only word coordinates, also produces tags for flows, blocks, lines, and words. The blocks, lines, and words all include coordinates. This output is useful for producing ALTO-like XML for ingesting PDFs into our Historic Oregon Newspapers system.
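As a rough illustration of how the nested output might be consumed, here is a minimal Python sketch that walks the flow/block/line/word hierarchy and collects each line's coordinates together with its words. The embedded sample document, the attribute names (xMin, yMin, xMax, yMax), and the helper names are assumptions made for this example and are not taken from pdftotext.cc; check them against the sample HTML in this repository before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of -bbox-layout output. The tag nesting follows the
# description above; the attribute names are assumed, not confirmed.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<doc>
  <page width="612.0" height="792.0">
    <flow>
      <block xMin="10.0" yMin="20.0" xMax="200.0" yMax="60.0">
        <line xMin="10.0" yMin="20.0" xMax="200.0" yMax="40.0">
          <word xMin="10.0" yMin="20.0" xMax="90.0" yMax="40.0">Historic</word>
          <word xMin="100.0" yMin="20.0" xMax="200.0" yMax="40.0">Oregon</word>
        </line>
      </block>
    </flow>
  </page>
</doc>
</body>
</html>"""


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def bbox(elem):
    """Return (xMin, yMin, xMax, yMax) as floats for one element."""
    return tuple(float(elem.get(k)) for k in ("xMin", "yMin", "xMax", "yMax"))


def extract_lines(xml_text):
    """Collect one record per <line>: its box and its words in order."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "line":
            continue
        words = [
            {"text": (w.text or "").strip(), "bbox": bbox(w)}
            for w in elem
            if local_name(w.tag) == "word"
        ]
        lines.append({"bbox": bbox(elem), "words": words})
    return lines


if __name__ == "__main__":
    for line in extract_lines(SAMPLE):
        print(line["bbox"], " ".join(w["text"] for w in line["words"]))
```

Records like these could then be mapped onto ALTO-style elements (for example, one text line per record); that mapping depends on the target schema and is left out here.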
The license is GPL v2, as specified in the version of the poppler code this is based on, and can be viewed in the source file, pdftotext.cc.