Is it possible to train PICK-pytorch to detect table and all its line-items? #111

Open
locomotiivo opened this issue Apr 20, 2022 · 1 comment

Comments

@locomotiivo

Hi wenwenyu, thank you for the amazing code.
I have been experimenting with the code and found that the training dataset can be adjusted to extract different kinds of information.

However, there's one thing I am stuck on: training the model to detect tables and their contents as well.
I want the customized model to not only detect the header data but also list all the table line items.
From what I understand of the code, it seems possible to train it to detect table contents as well, but I don't know how I should set the training data's entities/labels, especially for tables with more than one line item.

Any help or tips would be greatly appreciated, thanks :)

ziodos commented Apr 20, 2022

It would be better to first use a model that detects the table structure; then you can parse the content and arrange it. I think the PICK model is more effective when you want to extract unstructured data rather than tabular data.
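
As a rough illustration of the "parse the content and arrange it" step, here is a minimal sketch. It assumes a separate table detector has already cropped the table region and that OCR returns boxes as `(x, y, width, height, text)`. The `Box` type, the `group_into_rows` helper, and the `y_tolerance` parameter are hypothetical names for this example and are not part of PICK-pytorch.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR box format: (x, y, width, height, text), origin at top-left.
Box = Tuple[float, float, float, float, str]


def group_into_rows(boxes: List[Box], y_tolerance: float = 0.5) -> List[List[str]]:
    """Group OCR boxes into rows by vertical centre, then order each row left to right."""
    rows: List[Dict] = []
    # Visit boxes from top to bottom, using the vertical centre of each box.
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        _, y, _, h, _ = box
        center = y + h / 2
        # Same row if the centre is within a fraction of the box height
        # of the current row's running mean centre.
        if rows and abs(center - rows[-1]["center"]) <= y_tolerance * h:
            row = rows[-1]
            row["boxes"].append(box)
            row["center"] += (center - row["center"]) / len(row["boxes"])
        else:
            rows.append({"center": center, "boxes": [box]})
    # Within each row, sort by x so cells come out in column order.
    return [[b[4] for b in sorted(r["boxes"], key=lambda b: b[0])] for r in rows]


# Made-up sample: a header row and one line item, in arbitrary OCR order.
sample: List[Box] = [
    (100, 31, 30, 10, "2"),
    (10, 10, 40, 10, "Item"),
    (200, 29, 40, 10, "9.99"),
    (200, 9, 40, 10, "Price"),
    (10, 30, 40, 10, "Widget"),
    (100, 11, 30, 10, "Qty"),
]
table = group_into_rows(sample)
```

Each inner list is one table row, so every line item becomes its own record regardless of how many rows the table has. Real documents with skewed scans, wrapped cells, or merged columns would need something more robust than a fixed tolerance.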
