Arabic 800,000 model can't go below Error Rate 0.5 #133
Comments
Try to find out where that 0.5 comes from. Maybe the errors are mostly with dots, commas, and spaces.
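One way to act on the suggestion above is to tally which characters are involved in the mismatches. The following is only a minimal sketch: the function name `error_breakdown` and the input format (a list of ground-truth/prediction string pairs) are assumptions for illustration and are not part of CLSTM.

```python
from collections import Counter
from difflib import SequenceMatcher


def error_breakdown(pairs):
    """Count the characters involved in mismatches.

    `pairs` is a list of (ground_truth, predicted) strings.
    This helper is hypothetical and not part of CLSTM.
    """
    counts = Counter()
    for truth, predicted in pairs:
        matcher = SequenceMatcher(None, truth, predicted, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                continue
            # Ground-truth characters that were dropped or replaced.
            for ch in truth[i1:i2]:
                counts[("missed", ch)] += 1
            # Output characters that have no counterpart in the ground truth.
            for ch in predicted[j1:j2]:
                counts[("spurious", ch)] += 1
    return counts


if __name__ == "__main__":
    sample = [("a, b.", "a b")]
    for (kind, ch), n in error_breakdown(sample).most_common():
        print(kind, repr(ch), n)
```

If punctuation and spaces dominate the tally, the problem is likely in those classes rather than in the letters themselves.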
I would suggest to use |
The error is almost certainly caused by incorrect ordering of the training data (error will usually hover around 0.6). The code points have to be in display order (i.e. left-to-right) instead of reading order (right-to-left). If you've created them using kraken/ketos run linegen with the |
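As a rough illustration of the display-order point above, the sketch below reverses the code points of a line that contains only right-to-left text. It rests on assumptions: the helper name `to_display_order` is hypothetical, the lines carry no diacritics (as in the training data described in this issue), and there are no embedded numbers or Latin runs. Mixed-direction lines need a full Unicode BiDi implementation, so the sketch rejects them instead of guessing. It also does not handle mirrored characters such as parentheses.

```python
import unicodedata

# Strong right-to-left classes (Hebrew-style "R" and Arabic letters "AL").
RTL_CLASSES = {"R", "AL"}
# Classes this simple reversal cannot handle: left-to-right letters,
# European and Arabic numbers, and combining marks (diacritics).
UNSUPPORTED_CLASSES = {"L", "EN", "AN", "NSM"}


def to_display_order(line):
    """Return `line` in left-to-right display order.

    Hypothetical helper: only valid for lines made of RTL letters
    plus neutral characters such as spaces and punctuation.
    """
    classes = {unicodedata.bidirectional(ch) for ch in line}
    if classes & UNSUPPORTED_CLASSES:
        raise ValueError("line needs a full Unicode BiDi implementation")
    if classes & RTL_CLASSES:
        # Pure RTL line: display order is the reverse of reading order.
        return line[::-1]
    return line


if __name__ == "__main__":
    reading_order = "\u0633\u0644\u0627\u0645"  # Arabic word, reading order
    print([hex(ord(ch)) for ch in to_display_order(reading_order)])
```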
Thanks for your reply, |
I have been training an Arabic language model from scratch for days now, reaching more than 800,000 epochs, and the error rate won't go below 0.5, which is very bad.
I have used artificial training data that I created myself. Here are its specifications:
Arabic, no diacritics, 300dpi, black and white, 100% correct transcriptions, about 2100 lines.
The CLSTM settings are hidden=100 and lrate=1e-4.
Can anybody help? @tmbdev @mittagessen