@xavctn Hi!
I'm currently working on implementing table extraction as part of a custom RAG flow and found img2table to be a great tool for handling tabular data. However, since I'm aiming to build a reliable solution that works for 99.9% of my documents, I wanted to clarify the library's behavior as I noticed some inconsistencies.
Specifically, the attached document contains a small table in the header of each page. img2table correctly detects the table on page 1, but fails to detect the same (or very similar) tables on pages 2 and 3. I’m wondering what might be causing this, as the table layout is nearly identical across all pages.
Here’s the basic code I’m using:
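from img2table.document import PDF

pdf = PDF(pdf_document_path)  # pdf_document_path: path to the attached PDF
pdf_tables = pdf.extract_tables()  # called with default settings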
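To make the inconsistency easier to reproduce, here is a minimal sketch of how the per-page result can be checked. It assumes that `extract_tables()` on a `PDF` returns a mapping from zero-based page index to the list of tables found on that page. The helper name `pages_without_tables` and the stand-in dictionary are hypothetical and only illustrate the behavior described above. They are not actual library output.

```python
from typing import Mapping, Sequence


def pages_without_tables(
    tables_by_page: Mapping[int, Sequence[object]], page_count: int
) -> list[int]:
    """Return the zero-based indices of pages for which no table was detected."""
    # A page counts as "missed" if it is absent from the mapping or maps to an empty list.
    return [page for page in range(page_count) if not tables_by_page.get(page)]


# Hypothetical stand-in for the result of pdf.extract_tables() on a 3-page document:
# a table is found on the first page only, as described in this report.
example_result = {0: ["header table"], 1: [], 2: []}

print(pages_without_tables(example_result, page_count=3))  # [1, 2]
```

With the real document, `example_result` would be replaced by `pdf_tables`, and `page_count` by the number of pages in the PDF.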
Any suggestions or insights would be greatly appreciated!