Metadata about detected characters: quality scores + alternatives #16

danvk opened this Issue Jan 6, 2015 · 1 comment

danvk commented Jan 6, 2015

The ocropus-rpred tool outputs text files of predicted text for each image. It would be nice if there were a way for it to output quality scores for each character, as well as alternatives.

For example, this line:
[image: 010004 bin]

is being transcribed as:
2. 14E St. Lrand Loncourse, n.w. cor.

It's possible that G is the second most-likely candidate for the first letter in Lrand and C for Loncourse. If I were to build some kind of language model as a post-processing step, it would be clear that G and C are the better choices at those positions.

Some kind of JSON output would be helpful. It might look something like:

    {
        "x": 216,
        "char": "L",
        "candidates": [
            {
                "char": "L",
                "score": 0.9
            },
            {
                "char": "G",
                "score": 0.8
            }
        ]
    }
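
To make the post-processing idea concrete, here is a minimal Python sketch of how such per-character records might be consumed. It is not part of ocropus-rpred: the record shape follows the proposal above, while the `best_word` helper, the tiny lexicon, the `x` positions after the first, and every score other than the 0.9 and 0.8 shown above are hypothetical values chosen for illustration.

```python
import json
from itertools import product

# Hypothetical records for the word transcribed as "Lrand", in the proposed shape.
# Only the first record's values come from the proposal; the rest are made up.
RECORDS_JSON = """
[
  {"x": 216, "char": "L", "candidates": [{"char": "L", "score": 0.9},
                                         {"char": "G", "score": 0.8}]},
  {"x": 230, "char": "r", "candidates": [{"char": "r", "score": 0.95}]},
  {"x": 241, "char": "a", "candidates": [{"char": "a", "score": 0.97}]},
  {"x": 253, "char": "n", "candidates": [{"char": "n", "score": 0.96}]},
  {"x": 266, "char": "d", "candidates": [{"char": "d", "score": 0.98}]}
]
"""


def best_word(records, lexicon):
    """Pick the highest-scoring candidate spelling that appears in the lexicon.

    The score of a spelling is the product of its per-character scores.
    If no spelling is in the lexicon, fall back to the top-1 characters.
    """
    options = [
        [(c["char"], c["score"]) for c in record["candidates"]]
        for record in records
    ]
    best, best_score = None, 0.0
    for combo in product(*options):  # every combination of candidates
        word = "".join(ch for ch, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s
        if word in lexicon and score > best_score:
            best, best_score = word, score
    if best is None:
        return "".join(record["char"] for record in records)
    return best


records = json.loads(RECORDS_JSON)
corrected = best_word(records, {"Grand", "Concourse"})
unchanged = best_word(records, set())
```

With this toy lexicon, `corrected` is `"Grand"` even though `L` outscores `G` at the first position, and `unchanged` stays `"Lrand"` because nothing in the empty lexicon matches. Enumerating every combination grows exponentially with the number of alternatives, so a real language model would more likely use beam search or a similar pruned search.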

danvk changed the title from "More metadata about detected characters" to "Metadata about detected characters" on Jan 6, 2015


This comment has been minimized.

rainkinz commented Jan 7, 2015


QuLogic pushed a commit to QuLogic/ocropy that referenced this issue Dec 15, 2015

Merge pull request tmbdev#16 from tianyaqu/master
requests needs 'params' argument when passing parameters in url

zuphilip changed the title from "Metadata about detected characters" to "Metadata about detected characters: quality scores + alternatives" on Dec 25, 2017