Can clstm recognize special characters well? #77

wanghaisheng · 2016-04-28T02:49:15Z

There are some things the currently trained models for ocropus-rpred will not handle well, largely because they are nearly absent from the current training data. That includes all-caps text, some special symbols (including "?"), typewriter fonts, and subscripts/superscripts. This will be addressed in a future release.

mittagessen · 2016-04-28T11:55:29Z

In general, an LSTM+CTC configuration can recognize anything present in the training data, including "special" symbols (I'm doing ancient Greek and playing around with Arabic here). You have to ensure that the input you want to handle is included in the training data; its absence is why the default ocropy model doesn't deal well with these inputs.
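A quick way to check this before training is to compare the characters in your ground-truth transcriptions against the characters in the text you want to recognize. The following is a minimal sketch, not part of clstm or ocropy; the function name `missing_characters`, the `min_count` threshold, and the sample strings are all made up for illustration.

```python
from collections import Counter


def missing_characters(training_lines, target_lines, min_count=1):
    """Return characters that occur in target_lines but fewer than
    min_count times in training_lines (hypothetical helper)."""
    # Count every character in the training transcriptions.
    seen = Counter()
    for line in training_lines:
        seen.update(line)

    # Collect the distinct characters the model is expected to output.
    wanted = set()
    for line in target_lines:
        wanted.update(line)

    # Whitespace is ignored; everything else must be seen often enough.
    return sorted(c for c in wanted if not c.isspace() and seen[c] < min_count)


# Illustrative data only: lowercase training text, target with caps and symbols.
training = ["The quick brown fox", "jumps over the lazy dog."]
target = ["WHO is there?", "x² + y²"]
print(missing_characters(training, target))
```

Raising `min_count` gives a rough list of characters that are present but rare, which is the situation described for "?" in the default model.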

The "tricky" parts right now are training models that perform well (<1% error) on multiple fonts, and RTL scripts, which need some preprocessing to reorder the label sequence.
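As a rough illustration of the reordering step, the sketch below converts a purely right-to-left transcription from logical (typing) order to visual (left-to-right on the page) order. It assumes the recognizer reads line images left to right and therefore needs labels in visual order; that is an assumption, not something confirmed in this thread. The function name `to_visual_order` is hypothetical, and lines that mix in Latin letters or digits need the full Unicode Bidirectional Algorithm, which this sketch deliberately refuses to handle.

```python
import unicodedata


def to_visual_order(text):
    """Reverse a purely RTL line into visual order (hypothetical helper).

    Combining marks stay attached to the base character they follow.
    """
    # Left-to-right letters and digits keep their own direction inside RTL
    # text, so a plain reversal would be wrong for them.
    for ch in text:
        if unicodedata.bidirectional(ch) in ("L", "EN", "AN"):
            raise ValueError("mixed-direction text needs a full bidi algorithm")

    # Group each base character with the combining marks that follow it.
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)

    # Reverse the clusters, not the raw code points.
    return "".join(reversed(clusters))


# Hebrew "shalom" in logical order becomes its mirror image in visual order.
print(to_visual_order("\u05e9\u05dc\u05d5\u05dd"))
```

Whether the transcriptions should be reordered or the line images mirrored instead is a design choice; either way, the label sequence and the image have to run in the same direction during training.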

wanghaisheng closed this as completed Apr 28, 2016
