Implemented computation of probability matrix #279

BingqingQu · 2017-12-15T09:46:41Z

This is intended as an extension of the --probabilities option.
Instead of printing the probabilities only for the recognised characters, --probmat computes the complete probability matrix.

At each "timestep" the probability of every character in the codec is computed.
This could be used, for example, as input to a language model, which would then have access to the probabilities of the other characters as well, not only the recognised one.
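As an illustration of what a language model could do with such a matrix, here is a minimal sketch that lists the most probable alternatives per timestep. It assumes the matrix has one row per character class and one column per timestep; `top_alternatives`, the toy codec and the numbers are made up for this example and are not part of the patch or of ocrolib.

```python
def top_alternatives(probmat, codec, k=2):
    """Return the k most probable (character, probability) pairs per timestep.

    probmat: list of rows, one row per class, one value per timestep (assumed layout).
    codec:   list of characters, codec[i] is the character of class i.
    """
    n_steps = len(probmat[0])
    result = []
    for t in range(n_steps):
        # Collect the probability of every class at timestep t.
        column = [(probmat[c][t], codec[c]) for c in range(len(codec))]
        # Highest probability first; ties keep their original class order.
        column.sort(key=lambda pair: pair[0], reverse=True)
        result.append([(ch, p) for p, ch in column[:k]])
    return result


# Toy data (hypothetical): 3 classes, 4 timesteps.
codec = ["~", "a", "b"]  # "~" stands in for the "no character" class
probmat = [
    [0.8, 0.1, 0.7, 0.1],  # class 0
    [0.1, 0.7, 0.2, 0.3],  # class 1
    [0.1, 0.2, 0.1, 0.6],  # class 2
]
alts = top_alternatives(probmat, codec)
for t, candidates in enumerate(alts):
    print(t, candidates)
```

A language model could then rescore these candidates instead of being limited to the single recognised character.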

zuphilip · 2017-12-16T12:30:50Z

Is your code complete? It looks like the variables out and timestamp are not used any further...

Can you give more information about the output format? I see that the files always have 156 lines with several probabilities each, but none of these values seem to be equal to the ones that are output with --probabilities.

amitdo · 2017-12-16T16:53:16Z

https://github.com/tmbdev/ocropy/wiki/OCRopus-File-Formats#lattice-files
This format was used in ocropy 0.6.

zuphilip · 2017-12-16T17:34:17Z

@amitdo The output files look different. Here is an example:

010001.pm.txt
010001.prob.txt

amitdo · 2017-12-16T18:15:27Z

His patch just outputs the raw result of the prediction.

What you see with the current text/prob. options (without this patch) is the 'best' path that translate_back() found for you.
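To make the difference between the raw matrix and a decoded path concrete, here is a minimal sketch of a simple greedy decoder: take the most probable class at each timestep, collapse repeats, and drop the "no character" class. This is a swapped-in, simplified technique for illustration only, not the implementation of translate_back(). It assumes class 0 is the "no character" class and that rows are classes and columns are timesteps; `greedy_decode` and the toy values are hypothetical.

```python
def greedy_decode(probmat, codec, blank=0):
    """Decode a class-by-timestep matrix by picking the best class per timestep.

    Simplified greedy decoding (assumption: class `blank` means "no character").
    """
    n_steps = len(probmat[0])
    best = []
    for t in range(n_steps):
        scores = [row[t] for row in probmat]
        best.append(scores.index(max(scores)))  # most probable class at t

    chars = []
    previous = blank
    for c in best:
        # Emit a character only when a new non-blank class starts.
        if c != blank and c != previous:
            chars.append(codec[c])
        previous = c
    return "".join(chars)


# Toy data (hypothetical): 3 classes, 6 timesteps.
codec = ["", "a", "b"]
probmat = [
    [0.90, 0.1, 0.2, 0.8, 0.1, 0.2],  # class 0: no character
    [0.05, 0.8, 0.7, 0.1, 0.1, 0.1],  # class 1: "a"
    [0.05, 0.1, 0.1, 0.1, 0.8, 0.7],  # class 2: "b"
]
decoded = greedy_decode(probmat, codec)
print(decoded)
```

The decoded string keeps only one choice per character, which is why the values in the full matrix need not match the ones printed for the recognised text.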

The format in my link is more human readable.
I was not very clear in my previous comment, sorry about that.

amitdo · 2017-12-16T18:30:38Z

Related: #25

amitdo · 2017-12-16T20:51:46Z

The number of lines (156) is the size of the codec (chars) in the model you use.
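A quick way to check this on an output file is to count its rows and columns. The sketch below assumes each line holds whitespace-separated probabilities for one class; the exact file layout is not specified in this thread, so treat that as an assumption. `parse_probmat` and the sample text are hypothetical.

```python
def parse_probmat(text):
    """Parse a text matrix: one line per class, whitespace-separated values (assumed)."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip empty lines
            rows.append([float(v) for v in line.split()])
    # Every class should have a value for every timestep.
    if len({len(r) for r in rows}) > 1:
        raise ValueError("rows have different lengths")
    return rows


# Made-up sample with 3 classes and 3 timesteps.
sample = "0.8 0.1 0.7\n0.1 0.7 0.2\n0.1 0.2 0.1\n"
matrix = parse_probmat(sample)
n_classes = len(matrix)   # would be 156 for the model discussed here
n_steps = len(matrix[0])  # depends on the width of the text line image
print(n_classes, n_steps)
```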

zuphilip · 2017-12-20T00:04:54Z

Okay, then I don't think this matrix is interesting enough to warrant its own option in ocropus-rpred. One can use ocrolib as a library for such computations. More advanced lattice/alternative calculations could be interesting, as outlined in #186.

zuphilip · 2017-12-22T10:34:35Z

There are also the --save and --show options, which give visual debug information about this matrix.

Implemented computation of probability matrix

59df568

zuphilip added the 👾 invalid label Dec 22, 2017

amitdo mentioned this pull request Oct 16, 2018

Added the option for character accumulated glyph confidences. tesseract-ocr/tesseract#1851

Merged

bertsky mentioned this pull request Mar 20, 2019

RFC: Lattice Output tesseract-ocr/tesseract#2339

Open
