LSTM: Non-dictionary words with combination of letters and numbers not recognized. #733

Shreeshrii · 2017-02-22T03:15:28Z

https://groups.google.com/d/msgid/tesseract-ocr/1a3e8773-7151-48f9-92bb-fda888293eab%40googlegroups.com?utm_medium=email&utm_source=footer

While the single "S" is recognized correctly, the text "2S" is recognized as "25".

Here is a link to the test image:

https://03054610326450256607.googlegroups.com/attach/b8b86693ac072/2s.png?part=0.4&view=1

Shreeshrii · 2017-02-22T15:58:00Z

On 22-Feb-2017 9:02 PM, "Amit D." ***@***.***> wrote: The LSTM engine is trained on text-line images and learns from context, so it does not surprise me that the OCR accuracy for a single glyph is not so good.

So, is this another case where the legacy engine is better than LSTM?

andrewisplinghoff · 2017-02-23T08:26:26Z

Yes, the legacy engine (--oem 0) gets this one right.

tesseract4 --psm 7 --oem 0 2s.png 2s-out-oem0-psm7.txt

2s-out-oem0-psm7.txt
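To repeat this comparison on other images, the two engine runs can be generated side by side. The following is a minimal sketch, not something posted in the thread: the helper names `build_command` and `comparison_commands` are hypothetical, and the binary name `tesseract` is an assumption (the comment above used a local `tesseract4` build). It only builds the argument lists; running them requires an installed Tesseract and, for `--oem 0`, a traineddata file that still contains the legacy model.

```python
import shlex

LEGACY_ONLY = 0  # --oem 0: legacy engine only
LSTM_ONLY = 1    # --oem 1: LSTM engine only


def build_command(image, output_base, oem, psm=7, binary="tesseract"):
    """Return the argv list for one Tesseract run.

    Tesseract appends ".txt" to output_base on its own, so the base
    name is passed without an extension.
    """
    return [binary, image, output_base, "--psm", str(psm), "--oem", str(oem)]


def comparison_commands(image, psm=7, binary="tesseract"):
    """Build one command per engine so the outputs can be compared."""
    stem = image.rsplit(".", 1)[0]
    return {
        name: build_command(image, f"{stem}-out-oem{oem}-psm{psm}", oem, psm, binary)
        for name, oem in (("legacy", LEGACY_ONLY), ("lstm", LSTM_ONLY))
    }


if __name__ == "__main__":
    # Print the commands; pass each argv to subprocess.run() to execute them.
    for name, argv in comparison_commands("2s.png").items():
        print(name, shlex.join(argv))
```

Comparing the two resulting text files shows whether a misread such as "25" for "2S" is specific to the LSTM engine.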

Shreeshrii · 2018-03-28T09:23:09Z

@zdenop Please label: accuracy.

Shreeshrii · 2018-03-28T09:27:38Z

Another instance was reported in the forum, in the context of recognizing license plates.

Please see https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/tesseract-ocr/qxB-aCa3r6E

Test image is

Shreeshrii · 2018-03-29T03:33:34Z

numbers-dawg has patterns of numbers with punctuation and letters. However, there is currently no way to specify patterns such as license plates, VINs, or product IDs, which are non-dictionary words made of random combinations of numbers and letters.

Here are the other two images from error reports:

@theraysmith

Is there a variable which can be set for better accuracy in such cases?

Shreeshrii · 2018-04-30T14:19:11Z

Another issue, reported in the forum

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/tesseract-ocr/6a6sKOXdZsA

`I` is read as `1` and `A` as `4`, but only when the letter follows digits:

- an image containing "12345678I" => `123456781`
- an image containing "GLOTHUVFI" => `GLOTHUVFI`
- an image containing "12345678H" => `12345678H`
- an image containing "GLOTHUVFH" => `GLOTHUVFH`
- an image containing "12345678A" => `123456784`
- an image containing "GLOTHUVFA" => `GLOTHUVFA`
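When the layout of the expected string is known in advance (for example "eight digits, then one letter"), misreads like these can be repaired after OCR. The sketch below is a post-processing workaround, not a Tesseract feature and not something proposed in this thread. The function `correct`, the template notation (`d` for digit, `a` for letter), and the confusion table are all assumptions made for illustration; the table only covers the pairs mentioned in this issue and the linked ones (S/5, I/1, A/4, O/0).

```python
# Look-alike pairs reported in this issue; extend as needed (assumption).
TO_DIGIT = {"I": "1", "S": "5", "A": "4", "O": "0"}
TO_LETTER = {digit: letter for letter, digit in TO_DIGIT.items()}


def correct(text, template):
    """Fix look-alike characters using a per-position template.

    template uses 'd' where a digit is expected and 'a' where a letter
    is expected; any other template character leaves the position alone.
    """
    if len(text) != len(template):
        raise ValueError("text and template must have the same length")
    fixed = []
    for ch, kind in zip(text, template):
        if kind == "d" and not ch.isdigit():
            # A letter where a digit is expected: swap in its look-alike digit.
            ch = TO_DIGIT.get(ch.upper(), ch)
        elif kind == "a" and ch.isdigit():
            # A digit where a letter is expected: swap in its look-alike letter.
            ch = TO_LETTER.get(ch, ch)
        fixed.append(ch)
    return "".join(fixed)


if __name__ == "__main__":
    print(correct("123456781", "dddddddda"))  # 12345678I
    print(correct("25", "da"))                # 2S
```

This cannot help with free-form identifiers where any position may hold a letter or a digit, which is the harder case the thread is asking about.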

kolakao · 2019-04-14T11:03:48Z

Unfortunately, I've fallen into the same pit. Is there a solution yet?
I think I've tried everything, and every thread on the internet about this has been left without a solution.

FrancescoSaverioZuppichini · 2019-12-15T12:27:05Z

Same problem here

ghost · 2022-02-24T14:17:44Z

Hello, do you have datasets somewhere available for testing?

SHANDLEMAN · 2022-04-22T15:54:43Z

This thread has been open for 5 years. Has anyone come up with a method for reliably getting tesseract to read a combination of letters and numbers?

Shreeshrii mentioned this issue Mar 8, 2017

RFC: Remove the legacy OCR Engine #707

Closed

Shreeshrii changed the title ~~LSTM: Poor recognition quality of characters following digits~~ LSTM: Non-dictionary words with combination of letters and numbers not recognized. Mar 28, 2018

Shreeshrii mentioned this issue Mar 28, 2018

Zero's being interpreted as O #1363

Closed

zdenop added the accuracy label Mar 29, 2018

Shreeshrii mentioned this issue May 1, 2018

Why Tesseract always recognizes all characters as capital letters ? #1541

Closed
