There are some things the currently trained models for ocropus-rpred do not handle well, largely because they are nearly absent from the current training data. These include all-caps text, some special symbols (including "?"), typewriter fonts, and subscripts/superscripts. This will be addressed in a future release.
In general, an LSTM+CTC configuration can recognize anything that appears in the training data, including "special" symbols (I'm doing ancient Greek and playing around with Arabic here). You have to make sure the input you want to handle is included in the training data; its absence is the reason the default ocropy model doesn't deal well with these inputs.
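One way to check this before training is to count how often each character of the target text occurs in the ground-truth transcriptions. The following is only a minimal sketch: the function names and the `min_count` threshold are hypothetical and not part of ocropy, and it assumes the ground truth has already been loaded as a list of strings.

```python
from collections import Counter


def char_counts(lines):
    """Count how often each character occurs across the ground-truth lines."""
    counts = Counter()
    for line in lines:
        counts.update(line)
    return counts


def rare_or_missing(train_lines, target_text, min_count=1):
    """Return the non-whitespace characters of target_text that occur
    fewer than min_count times in train_lines (hypothetical helper)."""
    counts = char_counts(train_lines)
    return sorted(
        {ch for ch in target_text
         if not ch.isspace() and counts[ch] < min_count}
    )


# Illustrative data only: lowercase-only ground truth, mixed-case target.
print(rare_or_missing(["hello world"], "Hello?"))
```

Characters reported here (capitals or "?" in this toy example) are the ones a model trained on that data is unlikely to recognize.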
"Tricky" stuff right now is training models performing well (<1% error) on multiple fonts and RTL scripts will need some preprocessing to reorder the label sequence.