Infuse

This project aims to create a PDF-processing Rust library, à la Grobid, that can be used to read scientific PDFs as if they were ordinary web pages. The library will then be integrated into a web app by compiling it to Wasm.

The implementation is still embryonic, but there is an interesting presentation (37-minute talk, 18 minutes of questions) with associated slides!

Status

Reading PDFs works, including in the browser.

Current work focuses on piecing together the various objects encoded in the PDF in order to reconstruct the content tree, including the full body text, while also classifying those pieces into the types we're interested in (footnote, caption, metadata, body, ...).
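As a rough illustration of the classification step described above, here is a minimal sketch. It is not Infuse's implementation: the `TextBlock` type, the `classify` heuristics (position on the page, font size relative to the median font size, a "Figure"/"Table" prefix) and every threshold are hypothetical. They only show the idea of tagging extracted pieces as footnote, caption, metadata or body.

```rust
/// Categories from the README (footnote, caption, metadata, body).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockKind {
    Metadata,
    Body,
    Caption,
    Footnote,
}

/// Hypothetical text block as a PDF parser might emit it:
/// the text plus a little layout information.
#[derive(Debug)]
struct TextBlock {
    text: String,
    font_size: f32,
    /// Vertical position as a fraction of page height (0.0 = top, 1.0 = bottom).
    y: f32,
    /// Zero-based page index.
    page: u32,
}

impl TextBlock {
    fn new(text: &str, font_size: f32, y: f32, page: u32) -> Self {
        TextBlock { text: text.to_string(), font_size, y, page }
    }
}

/// Estimate the body font size as the (upper) median of all block font sizes,
/// assuming body text makes up most of the document. Returns None if empty.
fn body_font_size(blocks: &[TextBlock]) -> Option<f32> {
    if blocks.is_empty() {
        return None;
    }
    let mut sizes: Vec<f32> = blocks.iter().map(|b| b.font_size).collect();
    // total_cmp gives a total order on f32, so NaN cannot cause a panic.
    sizes.sort_by(|a, b| a.total_cmp(b));
    Some(sizes[sizes.len() / 2])
}

/// Hypothetical heuristic classifier; the thresholds are illustrative only.
fn classify(block: &TextBlock, body_size: f32) -> BlockKind {
    let text = block.text.trim_start();
    if block.page == 0 && block.y < 0.25 && block.font_size >= body_size {
        // Large text near the top of the first page: title, authors, etc.
        BlockKind::Metadata
    } else if text.starts_with("Figure") || text.starts_with("Fig.") || text.starts_with("Table") {
        BlockKind::Caption
    } else if block.y > 0.85 && block.font_size < body_size {
        // Small text near the bottom of a page.
        BlockKind::Footnote
    } else {
        BlockKind::Body
    }
}

/// Classify every block relative to the estimated body font size.
fn classify_blocks(blocks: &[TextBlock]) -> Vec<BlockKind> {
    match body_font_size(blocks) {
        Some(body_size) => blocks.iter().map(|b| classify(b, body_size)).collect(),
        None => Vec::new(),
    }
}
```

A real classifier would likely need more signals (font family, column layout, reading order), and reconstructing the content tree itself is not shown here.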

Read more in the issues!