'jpn' contains whatever appears on the www that is labelled as the
language, trained only with fonts that can render Japanese.
As with most of the other Script traineddatas, 'Japanese' contains all the
languages that use that script (in this case just the one) PLUS English.
The resulting model is trained with a mix of both training sets, with the
expectation that some of the generalization to 4500 English training fonts
will also apply to the other script that has a lot less.
I haven't thoroughly tested whether this works, so I am interested to get
feedback on it.
'jpn_vert' is trained on text rendered vertically (but the image is rotated
so the long edge is still horizontal).
'jpn' loads 'jpn_vert' as a secondary language so it can try it in case the
text is rendered vertically. This seems to work most of the time as a
reasonable solution.
3a94ddd
Why am I getting a b5c168db error when using the new files?
What do you mean by 'b5c168db error'?
I'm using the file with Subtitle Edit.
The old one worked fine.
Now when I use this one I get
"Tesseract b5c168db has stopped working",
or something like that.
The files before this last one, launched 3 days ago, were working fine.
The "best" data files are showing this problem.
Anyway, I posted the problem in Subtitle Edit's repository, too.
@theraysmith Please explain the difference between jpn.traineddata and Japanese.traineddata.
Use combine_tessdata -u to unpack both, look at their unicharsets, and convert the word-dawgs to wordlists using dawg2wordlist to see the differences.
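As a rough illustration of the comparison suggested above, here is a minimal Python sketch. It assumes both traineddata files have already been unpacked with combine_tessdata -u and that the word-dawgs have been converted to plain-text word lists with dawg2wordlist. The function names are made up for this example, and the unicharset parsing assumes the usual layout (entry count on the first line, then one entry per line starting with the character itself).

```python
from pathlib import Path


def read_wordlist(path):
    """Return the set of non-empty lines in a UTF-8 word list."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def read_unichars(path):
    """Return the set of characters listed in an unpacked unicharset.

    Assumption: line 1 holds the entry count and every later line
    starts with the character, followed by its properties.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.split()[0] for line in lines[1:] if line.strip()}


def compare(first, second):
    """Summarise what two sets of entries do and do not share."""
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "shared": len(first & second),
    }
```

For example, compare(read_unichars(a), read_unichars(b)) could be called with the two unpacked unicharset files, and compare(read_wordlist(a), read_wordlist(b)) with the two word lists written by dawg2wordlist. The exact file names depend on the output prefix given to combine_tessdata -u.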
There is also jpn_vert.traineddata
Maybe Japanese is jpn+jpn_vert.
@theraysmith When I replace eng.traineddata in my tessdata folder with the tesseract 4 version, I get a 'System.AccessViolationException' in the BaseApiInit method of InteropRuntimeImplementer.TessApiSignaturesInstance. What is the problem? I tried it with the CanParseMultipageTif test.
After copying the osd.traineddata file from best to my tessdata folder, tesseract shows the message "Failed loading language 'osd'
Tesseract couldn't load any languages!". All other language traineddata files work fine, and I pulled and compiled the latest commit of tesseract 4. So please check the new osd.traineddata file.
Quick question: is yor.traineddata for Yoruba?
jpn_vert seems to have a glitch.
I don't know whether it's the traineddata or tesseract 4.00 itself, but it fails when the longer edge has fewer characters aligned from the right side than the other sides.
This vertical text should be read from right to left.
After it reads "ったく", tesseract seems to treat it as the shorter edge, so it rotates the image the wrong way and recognition fails.
Also, psm 6 does not work for jpn_vert.traineddata. Only psm 1 does.
I want to add some Urdu words to urd.traineddata and then use them in my application. How can I do that?
You can unpack the traineddata using combine_tessdata and replace the word-list dawg file with a new one built from your word lists.
Take a look at wordlist2dawg and dawg2wordlist commands.
However, there was some issue related to RTL languages, so I am not sure whether this suggestion will work for Urdu.
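The steps above could be scripted roughly as follows. This is only a sketch under several assumptions: the component suffixes (lstm-word-dawg, lstm-unicharset) are the ones typically found in LSTM traineddata and may differ for legacy files, the intermediate word list file name is invented for the example, and the helper names are hypothetical. It does not address the RTL issue mentioned above.

```python
import subprocess


def merge_words(existing_words, new_words):
    """Return the existing words followed by new ones not already present."""
    seen = set(existing_words)
    merged = list(existing_words)
    for word in new_words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def rebuild_commands(lang, traineddata,
                     dawg_suffix="lstm-word-dawg",
                     unicharset_suffix="lstm-unicharset"):
    """Build the command lines for unpack, dawg->list, list->dawg, repack.

    Assumption: components are unpacked with the prefix "<lang>." in the
    current directory; the word list file name is made up for this sketch.
    """
    dawg = f"{lang}.{dawg_suffix}"
    unicharset = f"{lang}.{unicharset_suffix}"
    wordlist = f"{lang}.wordlist.txt"
    return [
        ["combine_tessdata", "-u", traineddata, f"{lang}."],
        ["dawg2wordlist", unicharset, dawg, wordlist],
        # Edit the word list here (for example with merge_words) before continuing.
        ["wordlist2dawg", wordlist, dawg, unicharset],
        ["combine_tessdata", "-o", traineddata, dawg],
    ]


def run_step(command):
    """Run one command and raise if the tool reports an error."""
    subprocess.run(command, check=True)
```

A caller would run the first two commands with run_step, read the generated word list, write back the result of merge_words, and then run the last two commands. Working on a copy of the traineddata file is advisable, since combine_tessdata -o overwrites components in place.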
How can I unpack the traineddata with combine_tessdata?
Thanks for helping
https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc