
Arabic 800,000 model can't go below error rate 0.5 #133

Open
ghost opened this issue Jun 30, 2017 · 4 comments

@ghost

ghost commented Jun 30, 2017

I have been training an Arabic model from scratch for days now and have passed 800,000 training iterations, but the error rate won't go below 0.5, which is very bad.
I am using synthetic training data that I generated myself, with these specifications:
Arabic, no diacritics, 300 dpi, black and white, 100% correct transcriptions, about 2,100 lines.
The CLSTM settings are hidden=100 and lrate=1e-4.

Can anybody help? @tmbdev @mittagessen

@amitdo
Contributor

amitdo commented Jun 30, 2017

Try to find out where that 0.5 comes from.

Maybe the errors are mostly dots, commas, and spaces.
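One quick way to test that hypothesis is to compute the character error rate twice: once on the raw text, and once after removing whitespace and punctuation from both the ground truth and the recognizer output. The sketch below is a minimal, hypothetical helper in plain Python (edit_distance, strip_punct_space, and error_rate are illustrative names, not part of ocropy or CLSTM), and its "edits per ground-truth character" metric is an assumption that may not match exactly how CLSTM reports its error.

```python
import unicodedata


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def strip_punct_space(text: str) -> str:
    # Drop whitespace and any Unicode punctuation (category P*), e.g. "." "," and the Arabic comma.
    return "".join(ch for ch in text
                   if not ch.isspace() and not unicodedata.category(ch).startswith("P"))


def error_rate(pairs, normalize=lambda s: s) -> float:
    """Total edits divided by total ground-truth characters over (gt, pred) pairs."""
    errors = total = 0
    for gt, pred in pairs:
        gt, pred = normalize(gt), normalize(pred)
        errors += edit_distance(gt, pred)
        total += len(gt)
    return errors / max(total, 1)
```

Given a list of (ground truth, prediction) line pairs, compare error_rate(pairs) with error_rate(pairs, strip_punct_space). If the second number is much lower, most of the 0.5 comes from dots, commas, and spaces.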

@zuphilip
Collaborator

I would suggest using ocropus-econf *.gt.txt to see the most common confusions; see https://github.com/tmbdev/ocropy/wiki/Compute-errors-and-confusions.
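For a rough idea of what a confusion listing contains, here is a minimal sketch that aligns each ground-truth line with the recognizer output and counts mismatched character pairs. It is not the ocropus-econf implementation; align and confusions are hypothetical helper names, and an empty string marks a character that is missing from, or extra in, the output.

```python
from collections import Counter


def align(gt: str, pred: str):
    """Levenshtein alignment returning (gt_char, pred_char) pairs; '' marks a gap."""
    n, m = len(gt), len(pred)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (gt[i - 1] != pred[j - 1]))
    # Walk back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (gt[i - 1] != pred[j - 1]):
            pairs.append((gt[i - 1], pred[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((gt[i - 1], ""))  # deletion: character missing from output
            i -= 1
        else:
            pairs.append(("", pred[j - 1]))  # insertion: extra character in output
            j -= 1
    return pairs[::-1]


def confusions(samples):
    """Count (expected, recognized) mismatches over (gt, pred) line pairs."""
    counts = Counter()
    for gt, pred in samples:
        for a, b in align(gt, pred):
            if a != b:
                counts[(a, b)] += 1
    return counts
```

Calling confusions(pairs).most_common(10) on all line pairs then lists the ten most frequent mistakes, which makes systematic problems (for example, whole classes of characters being dropped) easy to spot.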

@mittagessen
Contributor

The error is almost certainly caused by incorrect ordering of the training data (with that problem the error usually hovers around 0.6). The code points have to be in display order (i.e. left-to-right) instead of reading order (right-to-left). If you created the lines with kraken/ketos, run linegen with the --reorder option to fix this. It doesn't default to this option because the training interface is intended to handle the reordering for you once it is finished.
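For a line that contains only Arabic letters, spaces, and punctuation, and treating the line as a right-to-left paragraph, converting from reading order to display order reduces to reversing the code points and mirroring paired brackets. The sketch below assumes exactly that case; to_display_order is a hypothetical helper, not part of kraken or ocropy. Lines with digits, Latin text, or diacritics need the full Unicode Bidirectional Algorithm (UAX #9), so for those rely on linegen's --reorder option or another full implementation. The sketch refuses such lines instead of guessing.

```python
import unicodedata

# Bidi classes accepted: strong RTL letters plus neutrals and separators.
# Anything else (Latin "L", digits "EN"/"AN", combining marks "NSM",
# explicit embedding controls, ...) needs the full UBA, so it is rejected.
ALLOWED_BIDI = {"R", "AL", "WS", "CS", "ES", "ET", "ON"}

# Mirrored characters at an RTL level are displayed with their mirror glyph.
MIRROR_PAIRS = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{",
                "<": ">", ">": "<", "\u00ab": "\u00bb", "\u00bb": "\u00ab"}


def to_display_order(line: str) -> str:
    """Convert a purely right-to-left line from reading order to display order."""
    for ch in line:
        kind = unicodedata.bidirectional(ch)
        if kind not in ALLOWED_BIDI:
            raise ValueError(f"bidi class {kind!r} of {ch!r} needs a full UBA implementation")
        if unicodedata.mirrored(ch) and ch not in MIRROR_PAIRS:
            raise ValueError(f"no mirror glyph known for {ch!r}")
    # Every character sits at embedding level 1, so display order is a plain
    # reversal, followed by swapping each bracket for its mirror.
    return "".join(MIRROR_PAIRS.get(ch, ch) for ch in reversed(line))
```

Applied to each line of the *.gt.txt files (after stripping the trailing newline), this rewrites the transcriptions left-to-right. Failing loudly on mixed-direction lines is deliberate: silently reversing a line that contains numbers would produce wrong ground truth, which is exactly the kind of error being hunted here.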

@ghost
Author

ghost commented Jul 7, 2017

Thanks for your replies.
I am following all of your suggestions and will work on tracing the error.
Please keep the issue open.
