Added best traineddatas for 4.00 alpha
theraysmith committed Jul 31, 2017
1 parent eb769ea commit 3a94ddd

20 comments on commit 3a94ddd

@OmrSi OmrSi commented on 3a94ddd Aug 3, 2017

Why am I getting a b5c168db error when using the new files?

@amitdo amitdo commented on 3a94ddd Aug 3, 2017

What do you mean by 'b5c168db error'?

@OmrSi OmrSi commented on 3a94ddd Aug 3, 2017

I'm using the file with Subtitle Edit.
The old one worked fine.
Now when I use this one, I get
"Tesseract b5c168db has stopped working",
or something like that…

@OmrSi OmrSi commented on 3a94ddd Aug 3, 2017

[Image: untitled]

@theraysmith theraysmith (Contributor, Author) commented on 3a94ddd Aug 3, 2017 via email

@OmrSi OmrSi commented on 3a94ddd Aug 3, 2017

The files released 3 days ago, before this latest set, were working fine.
The "Best" data files are the ones showing this error.
Anyway, I posted the problem in Subtitle Edit's repository, too.

@theraysmith theraysmith (Contributor, Author) commented on 3a94ddd Aug 3, 2017 via email

@hoangtocdo90

@theraysmith Please explain the difference between jpn.traineddata and Japanese.traineddata.

@Shreeshrii (Contributor)

Use combine_tessdata -u to unpack both, look at their unicharsets, and also convert the word-dawgs to wordlists using dawg2wordlist to see the differences.
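
A minimal Python sketch of that comparison, for anyone who wants to script it. It only builds the command lines for combine_tessdata -u and dawg2wordlist and diffs the resulting word lists. The component suffixes (lstm-unicharset, lstm-word-dawg), the directory names, and the helper names are assumptions made for illustration, not something confirmed in this thread; check which files the unpack step actually writes before relying on them.

```python
import shutil
import subprocess
from typing import Iterable, List, Tuple


def unpack_cmd(traineddata: str, out_dir: str, lang: str) -> List[str]:
    # combine_tessdata -u expects an output path *prefix*, hence the trailing dot.
    return ["combine_tessdata", "-u", traineddata, f"{out_dir}/{lang}."]


def dawg2wordlist_cmd(out_dir: str, lang: str,
                      unicharset: str = "lstm-unicharset",
                      dawg: str = "lstm-word-dawg") -> List[str]:
    # Assumption: the suffixes above match what the unpack step produced.
    # Argument order is: unicharset, dawg file, output word list.
    prefix = f"{out_dir}/{lang}."
    return ["dawg2wordlist", prefix + unicharset, prefix + dawg,
            prefix + "wordlist.txt"]


def diff_wordlists(a: Iterable[str], b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (words only in a, words only in b), ignoring blank lines."""
    set_a = {w.strip() for w in a if w.strip()}
    set_b = {w.strip() for w in b if w.strip()}
    return sorted(set_a - set_b), sorted(set_b - set_a)


def run(cmd: List[str]) -> None:
    """Run one command, failing early if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands for both files instead of executing them.
    for lang in ("jpn", "Japanese"):
        print(" ".join(unpack_cmd(f"tessdata/{lang}.traineddata", "unpacked", lang)))
        print(" ".join(dawg2wordlist_cmd("unpacked", lang)))
```

Pass each printed command to run() once the paths are right, then feed the two generated wordlist.txt files (opened as text) to diff_wordlists().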

@amitdo amitdo commented on 3a94ddd Aug 10, 2017

There is also jpn_vert.traineddata

Maybe Japanese is jpn+jpn_vert.

@theraysmith theraysmith (Contributor, Author) commented on 3a94ddd Aug 10, 2017 via email

@kuznetsoffandrey

@theraysmith When I replace eng.traineddata in my tessdata folder with the Tesseract 4 version, I get a 'System.AccessViolationException' in InteropRuntimeImplementer.TessApiSignaturesInstance, in the BaseApiInit method. What is the problem? I tried it with the CanParseMultipageTif test.

@misheelen

After copying the osd.traineddata file from best to my tessdata folder, tesseract shows the message "Failed loading language 'osd'
Tesseract couldn't load any languages!". All other language traineddata files work fine, and I have pulled and compiled the latest commit of tesseract 4. So please check the new osd.traineddata file.

@Timilehin Timilehin commented on 3a94ddd Aug 22, 2017

Quick question: is yor.traineddata for Yoruba?

@whatohyou whatohyou commented on 3a94ddd Aug 25, 2017

jpn_vert seems to have a glitch.

I don't know whether it's the traineddata or tesseract 4.00 itself, but it fails when the longer edge has fewer characters aligned along the right side than the other sides have.

[Image: temp]
This vertical text should be read from right to left.
After it reads "ったく", tesseract seems to treat that as the shorter edge, so it rotates the image the wrong way, which results in failure.

Also, psm 6 mode does not work for jpn_vert.traineddata. Only psm 1 does.

@alonehoney

I want to add some Urdu words to urd.traineddata and then use them in my application. How can I do that?

@Shreeshrii (Contributor)

You can unpack the traineddata using combine_tessdata, then replace the word-list dawg file with a new one built from your word lists.

Take a look at wordlist2dawg and dawg2wordlist commands.

However, there was some issue related to RTL languages, so I am not sure whether this suggestion will work for Urdu.
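
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tested recipe for Urdu: the component suffixes (lstm-word-dawg, lstm-unicharset), the paths, and the helper names are hypothetical, and whether the optional -r reverse policy of wordlist2dawg is needed (or sufficient) for an RTL language is not confirmed here. The script only builds the command lines; it does not run them.

```python
from typing import Iterable, List, Optional


def merge_words(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Combine two word lists, dropping blanks and duplicates."""
    words = {w.strip() for w in existing} | {w.strip() for w in new}
    words.discard("")
    return sorted(words)


def replace_wordlist_cmds(lang: str, traineddata: str, work_dir: str,
                          wordlist: str,
                          dawg_suffix: str = "lstm-word-dawg",
                          unicharset_suffix: str = "lstm-unicharset",
                          reverse_policy: Optional[int] = None) -> List[List[str]]:
    """Build the three commands: unpack, rebuild the dawg, write it back."""
    prefix = f"{work_dir}/{lang}."
    dawg = prefix + dawg_suffix

    build = ["wordlist2dawg"]
    if reverse_policy is not None:
        # Assumption: an RTL language may need a reverse policy; unverified for Urdu.
        build += ["-r", str(reverse_policy)]
    # Argument order is: word list, output dawg, unicharset.
    build += [wordlist, dawg, prefix + unicharset_suffix]

    return [
        ["combine_tessdata", "-u", traineddata, prefix],  # unpack components
        build,                                            # word list -> dawg
        ["combine_tessdata", "-o", traineddata, dawg],    # overwrite the dawg component
    ]


if __name__ == "__main__":
    # Dry run with hypothetical paths: print the commands only.
    for cmd in replace_wordlist_cmds("urd", "work/urd.traineddata", "work",
                                     "work/urd.wordlist.txt"):
        print(" ".join(cmd))
```

Because combine_tessdata -o overwrites a component inside the traineddata file it is given, it is safer to point these commands at a copy of urd.traineddata. merge_words() can be used to add new words to the list that dawg2wordlist extracts before rebuilding the dawg.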

@alonehoney

How can I unpack the traineddata with combine_tessdata?

@alonehoney

Thanks for helping

@Shreeshrii (Contributor)