This repository has been archived by the owner on Jan 20, 2021. It is now read-only.

Spreadsheets with no bounding frame #69

Closed
jazzido opened this issue Feb 14, 2014 · 4 comments

@jazzido
Contributor

jazzido commented Feb 14, 2014

(as commented in jazzido/tabula#128)

There are spreadsheets with no outer frame.

Proposed fix: build a frame that encloses all the characters in the target area, and use it to clip all the rulings inside it. This, in fact, can be done whether the spreadsheet has a bounding frame or not.
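A minimal Python sketch of the proposed fix. It assumes axis-aligned rulings and page coordinates with the origin at the top left (so `top < bottom`). The `Rect` and `Ruling` types and all helper names are hypothetical, not Tabula's actual API. The frame is the bounding box of all character boxes, its four edges are added as synthetic rulings, and every existing ruling is clipped to it.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Rect:  # hypothetical character bounding box
    left: float
    top: float
    right: float
    bottom: float

@dataclass(frozen=True)
class Ruling:  # hypothetical axis-aligned line segment
    x1: float
    y1: float
    x2: float
    y2: float

    def is_horizontal(self) -> bool:
        return self.y1 == self.y2

def bounding_frame(chars: List[Rect]) -> Rect:
    """Smallest rectangle enclosing every character box in the target area."""
    return Rect(min(c.left for c in chars), min(c.top for c in chars),
                max(c.right for c in chars), max(c.bottom for c in chars))

def frame_rulings(frame: Rect) -> List[Ruling]:
    """The four edges of the frame, expressed as synthetic rulings."""
    l, t, r, b = frame.left, frame.top, frame.right, frame.bottom
    return [Ruling(l, t, r, t), Ruling(l, b, r, b),   # top, bottom
            Ruling(l, t, l, b), Ruling(r, t, r, b)]   # left, right

def clip(ruling: Ruling, frame: Rect) -> Optional[Ruling]:
    """Clip an axis-aligned ruling to the frame; None if nothing remains inside."""
    if ruling.is_horizontal():
        y = ruling.y1
        if not frame.top <= y <= frame.bottom:
            return None
        x1 = max(min(ruling.x1, ruling.x2), frame.left)
        x2 = min(max(ruling.x1, ruling.x2), frame.right)
        return Ruling(x1, y, x2, y) if x1 < x2 else None
    x = ruling.x1  # assumed vertical when not horizontal
    if not frame.left <= x <= frame.right:
        return None
    y1 = max(min(ruling.y1, ruling.y2), frame.top)
    y2 = min(max(ruling.y1, ruling.y2), frame.bottom)
    return Ruling(x, y1, x, y2) if y1 < y2 else None

def frame_and_clip(chars: List[Rect], rulings: List[Ruling]) -> List[Ruling]:
    """Clipped rulings plus the four frame edges built from the characters."""
    frame = bounding_frame(chars)
    clipped = [c for c in (clip(r, frame) for r in rulings) if c is not None]
    return clipped + frame_rulings(frame)

chars = [Rect(10, 10, 20, 20), Rect(50, 40, 60, 50)]
rulings = [Ruling(0, 30, 100, 30), Ruling(35, 0, 35, 100), Ruling(0, 5, 100, 5)]
result = frame_and_clip(chars, rulings)
```

Here the horizontal ruling at y=5 lies above the text and is dropped, the other two are trimmed to the character frame, and the four frame edges are appended. Because the frame is always added, the same code path works whether or not the page already has an outer border.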

jeremybmerrill self-assigned this Sep 27, 2014
@jeremybmerrill
Member

Picking this up, with this spreadsheet: http://www.conab.gov.br/OlalaCMS/uploads/arquivos/13_06_12_10_36_58_boletim_ingles_junho_2013.pdf (I'll find a relevant page and add it to the test set).

@jeremybmerrill
Member

I implemented this by building the frame to enclose all ruling lines in the target area, rather than all characters. This alternative is more likely to work across cases for the spreadsheet algorithm, which depends on lines intersecting (or very nearly intersecting).
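To illustrate why a frame built from ruling extents suits an intersection-based algorithm, here is a hypothetical Python sketch (not the code in 33abbdb). The frame edges are placed at the outermost ruling endpoints, so rulings that run the full width or height of the table end exactly on a frame edge and register as intersections within a small tolerance. The `Ruling` type, the `tol` value, and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Ruling:  # hypothetical axis-aligned line segment
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def horizontal(self) -> bool:
        return self.y1 == self.y2

def frame_from_rulings(rulings: List[Ruling]) -> List[Ruling]:
    """Four synthetic rulings enclosing every ruling endpoint."""
    xs = [x for r in rulings for x in (r.x1, r.x2)]
    ys = [y for r in rulings for y in (r.y1, r.y2)]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    return [
        Ruling(left, top, right, top),        # top edge
        Ruling(left, bottom, right, bottom),  # bottom edge
        Ruling(left, top, left, bottom),      # left edge
        Ruling(right, top, right, bottom),    # right edge
    ]

def intersects(h: Ruling, v: Ruling, tol: float = 1.0) -> bool:
    """True if horizontal h and vertical v cross, allowing a gap of up to tol."""
    return (min(h.x1, h.x2) - tol <= v.x1 <= max(h.x1, h.x2) + tol
            and min(v.y1, v.y2) - tol <= h.y1 <= max(v.y1, v.y2) + tol)

def count_intersections(rulings: List[Ruling], tol: float = 1.0) -> int:
    """Number of (horizontal, vertical) pairs that intersect."""
    hs = [r for r in rulings if r.horizontal]
    vs = [r for r in rulings if not r.horizontal]
    return sum(1 for h in hs for v in vs if intersects(h, v, tol))

# A frameless table: two inner horizontal rules and one column divider.
inner = [
    Ruling(0, 20, 100, 20),
    Ruling(0, 40, 100, 40),
    Ruling(50, 0, 50, 60),
]
with_frame = inner + frame_from_rulings(inner)
print(count_intersections(inner))       # 2
print(count_intersections(with_frame))  # 12
```

In this example the frameless layout yields only 2 intersections, while adding the ruling-derived frame yields 12 (4 horizontal rules by 3 vertical rules), enough to delimit a grid of 3 rows by 2 columns. A frame derived from character boxes could instead sit slightly inside or outside the ruling ends, leaving gaps larger than the tolerance.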

Is there a case in which building the frame using characters is superior? Unless I'm mistaken, the "original" method would ignore the frame no matter what, right?

@jeremybmerrill
Member

My solution (33abbdb) causes test_issue78_some_ruling_lines_not_detected to fail. I think the failure is due to a typo fixed in f16eb31, and that the new actual result is correct (it basically returns one additional row).

@jeremybmerrill
Member

Can you verify, @jazzido ?
