Can clstm recognize special characters well? #77

Closed
wanghaisheng opened this issue Apr 28, 2016 · 1 comment
Comments

@wanghaisheng

There are some things the currently trained models for ocropus-rpred will not handle well, largely because they are nearly absent from the current training data. That includes all-caps text, some special symbols (including "?"), typewriter fonts, and subscripts/superscripts. This will be addressed in a future release.

@mittagessen
Contributor

In general, an LSTM+CTC configuration is able to recognize anything that appears in the training data, including "special" symbols (I'm doing ancient Greek and playing around with Arabic here). You have to make sure the input you want to handle is included in the training data, which is why the default model of ocropy doesn't deal well with these inputs.
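A quick way to see whether a model has any chance on your symbols is to compare the characters in your ground truth against the text you want to recognize. The following is only a minimal sketch: `char_counts` and `coverage_report` are hypothetical helper names, not part of clstm or ocropy, and it assumes you have the transcriptions available as plain Python strings.

```python
from collections import Counter


def char_counts(lines):
    """Count how often each character occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line)
    return counts


def coverage_report(training_lines, target_lines, min_count=1):
    """Return {char: training_count} for every character in the target
    text that occurs fewer than min_count times in the training text.

    Hypothetical helper for illustration; not a clstm/ocropy API.
    """
    train = char_counts(training_lines)
    target = char_counts(target_lines)
    # Counter returns 0 for characters it has never seen.
    return {ch: train[ch] for ch in sorted(target) if train[ch] < min_count}


if __name__ == "__main__":
    training = ["the quick brown fox", "jumps over the lazy dog"]
    target = ["WHO is there?"]
    for ch, n in coverage_report(training, target).items():
        print(repr(ch), "seen", n, "times in training data")
```

Raising `min_count` also flags characters that are present but rare, which matches the "nearly absent" case from the quoted documentation.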

The "tricky" stuff right now is training models that perform well (<1% error) on multiple fonts. RTL scripts will also need some preprocessing to reorder the label sequence.
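To illustrate the RTL reordering: the network reads the line image left to right, while the transcription is usually stored in logical (typing) order, so the labels have to be flipped before training and flipped back after recognition. This is a rough sketch under strong assumptions: the function names are made up for this example, and it only handles lines that are purely RTL. Mixed-direction lines (embedded digits or Latin words) need the full Unicode bidirectional algorithm, and combining marks are not treated specially here.

```python
import unicodedata

# Bidirectional classes for strong right-to-left characters:
# "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def is_rtl_line(text):
    """True if the line contains at least one strong RTL character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def to_visual_order(text):
    """Logical order -> left-to-right visual order, for training labels.

    Simplification: reverses the whole line if it contains RTL text.
    """
    return text[::-1] if is_rtl_line(text) else text


def to_logical_order(labels):
    """Visual order (network output) -> logical order, for final text."""
    return labels[::-1] if is_rtl_line(labels) else labels


if __name__ == "__main__":
    line = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew, stored in logical order
    visual = to_visual_order(line)
    print([hex(ord(ch)) for ch in visual])
    print(to_logical_order(visual) == line)
```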
