Task 3 Data Information #14
Hi,
If you want to create your own training file :
The create_data function is located in task3/src/my_data.py, line 139.
Could you please help me by providing an example of what "texts" would be, along with an image?
The text can either be extracted from the already existing text files (provided by the SROIE challenge, which contain both the text and its position). Since Task 3 doesn't need the position information, the function sort_text (file my_data.py, line 109) takes the text file as input and returns only the text, with \n separating lines. If you don't want to use the provided text files and would rather use your own images, you can use an OCR engine such as Amazon Textract, Tesseract, or many others to extract the text from the image (be sure to convert the extracted text to uppercase), and provide this text (basically a string containing \n) to the create_data function (file my_data.py, line 139), replacing txt_files with the text you have extracted. This also means you are not forced to save the text extracted from an image into a text file: just tweak create_data to take as input a path to a folder containing all the images you want to build a dataset from, and build an array that holds the extracted text for each image. A sketch of this is shown below.
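A minimal sketch of the OCR route described above, assuming pytesseract as the OCR engine; the image path is hypothetical, and the exact signature of create_data may differ from what this illustrates:

```python
from PIL import Image
import pytesseract

def extract_text(image_path: str) -> str:
    """OCR an image and return uppercase text, one line per '\n'."""
    raw = pytesseract.image_to_string(Image.open(image_path))
    # Task 3 expects uppercase text with '\n' separating receipt lines.
    lines = [line.strip().upper() for line in raw.splitlines() if line.strip()]
    return "\n".join(lines)

text = extract_text("receipt_001.jpg")  # hypothetical image path
# Pass `text` wherever create_data would otherwise read the provided txt file.
```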
@Karim-Baig I think we don't need images for Task 3. Task 3 uses an RNN for character-level classification. I guess that's also one reason why the task doesn't work for more complex documents.
Yes, since Task 3 does not require images but only text, you can run the text extraction as a standalone step instead of doing it while creating the training file.
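For example, a standalone extraction pass could be run once before building the training file. This is only a sketch; the folder names are placeholders and pytesseract again stands in for whatever OCR engine you pick:

```python
import os
from PIL import Image
import pytesseract

IMG_DIR = "data/images"   # hypothetical input folder of receipt images
TXT_DIR = "data/texts"    # hypothetical output folder of extracted text
os.makedirs(TXT_DIR, exist_ok=True)

for name in os.listdir(IMG_DIR):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    text = pytesseract.image_to_string(Image.open(os.path.join(IMG_DIR, name)))
    out_path = os.path.join(TXT_DIR, os.path.splitext(name)[0] + ".txt")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(text.upper())  # Task 3 expects uppercase text
```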
@NISH1001 had mentioned passing both the text and positions as embeddings to a CNN in order to better localize the key-value pairs in documents. Is there any implementation of what you mentioned?
@Karim-Baig one possibility is to use some character-level embedding. In this case (Task 3), it's already doing that with an RNN. However, I presume we could pre-train those layers in some way on a lot of text before doing the downstream (classification) task. That might help it generalize, I guess. On another project (not related to this thread), I am using fasttext-based embeddings with a Graph Neural Network for classification. The results are far better than the RNN-based architecture. There I generated the embeddings in an unsupervised manner over all the documents, and eventually used those embeddings for classification (along with some custom features for each word/token).
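A rough sketch of training such embeddings in an unsupervised manner over all document texts, in the spirit of the fasttext approach mentioned above. The corpus file, 64-dimension choice, and hyperparameters are assumptions, not code from this repository:

```python
from gensim.models import FastText

# Each document becomes one "sentence": a list of uppercase tokens.
corpus = []
with open("all_receipts.txt", encoding="utf-8") as f:  # hypothetical dump of all texts
    for line in f:
        tokens = line.strip().upper().split()
        if tokens:
            corpus.append(tokens)

# Unsupervised training over the whole corpus.
model = FastText(sentences=corpus, vector_size=64, window=5, min_count=1, epochs=10)

# Look up a 64-d vector for any token; subword info covers unseen tokens too.
vec = model.wv["TOTAL"]
```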
About CharGrid: I haven't tried that either. One variation I have tried is feeding fasttext embeddings to a UNet, directly adding a 64-dimensional vector to the input image. This bumps the input channels up to 67 (64 + 3). However, training is very inefficient because of the high dimensionality. I saw one paper doing the same with 32 + 3 input channels and tried that as well; training was still very inefficient (in both memory and time).
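As an illustration of that "embedding channels" idea, the sketch below paints each word's 64-d vector into its bounding box and stacks it onto the RGB image, giving a 67-channel input. The word-box format and shapes are assumptions for the example, not the actual implementation:

```python
import numpy as np

def build_embedding_grid(image: np.ndarray, words, word_vectors, dim: int = 64) -> np.ndarray:
    """image: (H, W, 3) float array; words: list of (text, x1, y1, x2, y2) boxes."""
    h, w, _ = image.shape
    emb_map = np.zeros((h, w, dim), dtype=np.float32)
    for text, x1, y1, x2, y2 in words:
        vec = word_vectors.get(text)      # e.g. a dict built from model.wv
        if vec is None:
            continue
        emb_map[y1:y2, x1:x2, :] = vec    # broadcast the word vector over its box
    # Concatenate along the channel axis: (H, W, 3 + 64) = 67 channels.
    return np.concatenate([image, emb_map], axis=-1)
```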
I have placed the test files into tmp/task3-test(347p) for the images. When I run my_data.py to create the test_dict.pth, it runs fine and shows no error. Traceback (most recent call last):
Any idea? @aureliensimon @patrick22414 @Karim-Baig @lyrab96
@anitchakraborty, I am facing the same issue. How did you manage to get past it? Please help!
Hello, can you please provide some information on how the dicts and keys .pth files were created? I am trying to use the model on my own data but am failing to do so (I already have the other box, img & key files).