Arabic 800,000 model can't go below error rate 0.5 #133

ghost · 2017-06-30T08:35:42Z

I have been training an Arabic OCR model from scratch for days now and have passed 800,000 epochs, but the error rate won't go below 0.5, which is very bad.
I used artificial training data that I created myself. Here are its specifications:
Arabic, no diacritics, 300 dpi, black and white, 100% correct transcriptions, about 2,100 lines.
The CLSTM settings are hidden=100 and lrate=1e-4.

Can anybody help? @tmbdev @mittagessen

amitdo · 2017-06-30T14:52:12Z

Try to find out where that 0.5 comes from.

Maybe most of the errors involve dots, commas, and spaces.
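One way to check this guess is to align each recognized line against its ground truth and sort the edit operations by the kind of character involved. The sketch below assumes you already have (ground truth, recognized text) string pairs; the helpers align, category and error_breakdown are hypothetical, and the error rate here is edit distance divided by the number of ground-truth characters, which may not be exactly how CLSTM reports its error.

```python
import unicodedata
from collections import Counter


def align(gt: str, ocr: str):
    """Levenshtein alignment. Returns the non-matching edit operations as
    (gt_char, ocr_char) pairs, with '' marking a deletion or insertion."""
    n, m = len(gt), len(ocr)
    # d[i][j] = edit distance between gt[:i] and ocr[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if gt[i - 1] == ocr[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Walk back from the bottom-right corner along one minimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (gt[i - 1] != ocr[j - 1]):
            if gt[i - 1] != ocr[j - 1]:
                ops.append((gt[i - 1], ocr[j - 1]))  # substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append((gt[i - 1], ''))  # character missing from the output
            i -= 1
        else:
            ops.append(('', ocr[j - 1]))  # extra character in the output
            j -= 1
    ops.reverse()
    return ops


def category(ch: str) -> str:
    """Classify one character: '' (no character), space, punctuation, other."""
    if ch == '':
        return 'none'
    if ch.isspace():
        return 'space'
    if unicodedata.category(ch).startswith('P'):  # e.g. '.', ',', Arabic comma
        return 'punct'
    return 'other'


def error_breakdown(pairs):
    """Return (error rate, Counter of errors per category) over all pairs.
    An error touching a space counts as 'space', else one touching
    punctuation counts as 'punct', everything else as 'other'."""
    counts = Counter()
    total_chars = 0
    for gt, ocr in pairs:
        total_chars += len(gt)
        for a, b in align(gt, ocr):
            cats = {category(a), category(b)}
            if 'space' in cats:
                counts['space'] += 1
            elif 'punct' in cats:
                counts['punct'] += 1
            else:
                counts['other'] += 1
    return sum(counts.values()) / max(total_chars, 1), counts
```

If space and punct dominate the counts, the problem is likely in segmentation or in the transcription conventions rather than in letter recognition.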

zuphilip · 2017-06-30T16:52:06Z

I would suggest using ocropus-econf *.gt.txt to see the most common confusions; see https://github.com/tmbdev/ocropy/wiki/Compute-errors-and-confusions.
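If ocropus-econf is not at hand, a rough equivalent of its confusion listing can be sketched in a few lines of Python. This is not econf's implementation: it assumes the ocropy layout of NAME.gt.txt (ground truth) next to NAME.txt (recognizer output), and it aligns lines with difflib.SequenceMatcher, which does not always find a minimal edit alignment, so the counts can differ from econf's. The function names are hypothetical.

```python
import difflib
import glob
import os
import sys
from collections import Counter


def confusions(gt: str, ocr: str) -> Counter:
    """Count (ground truth span, recognized span) confusions for one line.
    An empty string on either side marks a deletion or insertion."""
    counts = Counter()
    sm = difflib.SequenceMatcher(None, gt, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != 'equal':  # 'replace', 'delete' or 'insert'
            counts[(gt[i1:i2], ocr[j1:j2])] += 1
    return counts


def confusions_in_dir(directory: str) -> Counter:
    """Sum confusions over every NAME.gt.txt / NAME.txt pair in a directory."""
    total = Counter()
    for gt_path in glob.glob(os.path.join(directory, '*.gt.txt')):
        ocr_path = gt_path[:-len('.gt.txt')] + '.txt'
        if not os.path.exists(ocr_path):
            continue  # line was never recognized; skip it
        # Ignore the trailing newline and surrounding whitespace.
        with open(gt_path, encoding='utf-8') as f:
            gt = f.read().strip()
        with open(ocr_path, encoding='utf-8') as f:
            ocr = f.read().strip()
        total.update(confusions(gt, ocr))
    return total


if __name__ == '__main__' and len(sys.argv) > 1:
    # Print the 20 most frequent confusions for the given directory.
    for (g, o), n in confusions_in_dir(sys.argv[1]).most_common(20):
        print(f'{n:6d}  {g!r} -> {o!r}')
```

Run it with the directory holding the line files as its only argument; repr() is used so that spaces and empty strings stay visible in the output.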

mittagessen · 2017-06-30T18:56:59Z

The error is almost certainly caused by incorrect ordering of the training data (with this problem the error usually hovers around 0.6). The code points have to be in display order (i.e. left-to-right) instead of reading order (right-to-left). If you created the data using kraken/ketos, run linegen with the --reorder option to fix this. It doesn't default to this option because the training interface is intended to deal with that for you once it is finished.
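To illustrate what display order means for the data, here is a minimal sketch that converts one reading-order line to display order. It assumes the simple case of a line containing only Arabic or Hebrew letters, whitespace and separators such as . , ، : with no digits or brackets. In that case the Unicode Bidirectional Algorithm (UAX #9) puts every character at the same right-to-left level, so reordering reduces to reversing the string. Anything else raises an error; for real data use a full UAX #9 implementation such as get_display from the python-bidi package, or ketos linegen --reorder as suggested above. The names to_display_order and SAFE_CLASSES are hypothetical and this is not kraken's code.

```python
import unicodedata

# Bidi classes for which plain reversal matches UAX #9 on a line without
# numbers, embeddings or mirrored brackets: strong RTL letters (R, AL),
# whitespace (WS) and common separators (CS) such as . , and the Arabic comma.
SAFE_CLASSES = {'R', 'AL', 'WS', 'CS'}


def to_display_order(line: str) -> str:
    """Convert a reading-order (logical) line to display order.

    Handles only the simple case described above and raises ValueError
    otherwise, so the caller can fall back to a full bidi implementation.
    """
    classes = [unicodedata.bidirectional(ch) for ch in line]
    unsupported = sorted({ch for ch, c in zip(line, classes) if c not in SAFE_CLASSES})
    if unsupported:
        raise ValueError(f'needs a full bidi implementation: {unsupported!r}')
    if not any(c in ('R', 'AL') for c in classes):
        # No strong character: the paragraph defaults to LTR, order unchanged.
        return line
    # Every character resolves to embedding level 1, so the left-to-right
    # display order is the reading order reversed.
    return line[::-1]
```

Only convert data that is actually in reading order: applying the reversal twice restores the original order and brings the problem back.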

ghost · 2017-07-07T13:05:40Z

Thanks for your replies.
I am taking all your suggestions and will work on tracing the error.
Please keep the issue open.
