
makebox doesn't output horizontal coordinates of textangle 90 content #3590

Open
@rmast

Description

Environment

tesseract 4.1.1 and 5.0.0-beta-20210916
Linux 5.4.0-81-generic #91-Ubuntu SMP Thu Jul 15 19:09:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Using the nld language.
The nld (and eng) traineddata files come from https://github.com/tesseract-ocr/tessdata and have these sizes:
15400601 eng.traineddata
8903736 nld.traineddata

For a region of the image that contains vertically oriented text (textangle 90), makebox outputs the following (all horizontal coordinates and widths are 0):

2 1968 0 1982 0 0
0 1985 0 1998 0 0
8 2001 0 2014 0 0
4 2016 0 2030 0 0
- 2041 0 2049 0 0
2 2059 0 2073 0 0
/ 2074 0 2082 0 0
2 2083 0 2097 0 0
2 2116 0 2130 0 0
5 2133 0 2146 0 0
1 2150 0 2158 0 0
9 2165 0 2179 0 0
8 2181 0 2195 0 0
0 2197 0 2211 0 0
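Parsing the box lines makes it easy to confirm which fields are empty. The sketch below assumes Tesseract's standard box-file layout, `<symbol> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the page. The helper names (`BoxEntry`, `parse_box_line`, `zero_fields`) are hypothetical and not part of Tesseract.

```python
from typing import NamedTuple


class BoxEntry(NamedTuple):
    """One box-file line: <symbol> <left> <bottom> <right> <top> <page>."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> BoxEntry:
    # The last five fields are integers; rsplit keeps the symbol field intact.
    symbol, left, bottom, right, top, page = line.rsplit(" ", 5)
    return BoxEntry(symbol, int(left), int(bottom), int(right), int(top), int(page))


def zero_fields(entries):
    """Return the coordinate fields that are 0 in every entry."""
    fields = ("left", "bottom", "right", "top")
    return [f for f in fields if all(getattr(e, f) == 0 for e in entries)]


# Makebox output copied from this issue.
BOX_OUTPUT = """\
2 1968 0 1982 0 0
0 1985 0 1998 0 0
8 2001 0 2014 0 0
4 2016 0 2030 0 0
- 2041 0 2049 0 0
2 2059 0 2073 0 0
/ 2074 0 2082 0 0
2 2083 0 2097 0 0
2 2116 0 2130 0 0
5 2133 0 2146 0 0
1 2150 0 2158 0 0
9 2165 0 2179 0 0
8 2181 0 2195 0 0
0 2197 0 2211 0 0
"""

entries = [parse_box_line(l) for l in BOX_OUTPUT.splitlines() if l.strip()]
print(zero_fields(entries))  # ['bottom', 'top']
print(min(e.left for e in entries), max(e.right for e in entries))  # 1968 2211
```

Under that layout, the zeros sit in the bottom and top columns, while left/right run from 1968 to 2211 along the line of text.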

This is the image that was used for this data:
210913.nog.2-000na.zip

A similar issue was filed earlier as #2340, but its author (https://github.com/dev884) didn't provide any pointer to a fix, and that account has no public code.

Expected Behavior:

I would expect the horizontal coordinates to resemble those in the word-level hOCR output for the same region of the image:
<span class='ocr_line' id='line_1_1' title="bbox 111 1289 133 1532; textangle 90; x_size 28.416666; x_descenders 7.1041665; x_ascenders 7.1041665">
 <span class='ocrx_word' id='word_1_1' title='bbox 112 1470 133 1532; x_wconf 88'>2084</span>
 <span class='ocrx_word' id='word_1_2' title='bbox 124 1451 127 1459; x_wconf 88'>-</span>
 <span class='ocrx_word' id='word_1_3' title='bbox 111 1403 133 1441; x_wconf 96'>2/2</span>
 <span class='ocrx_word' id='word_1_4' title='bbox 112 1289 133 1384; x_wconf 96'>251980</span>
</span>
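To compare the two outputs, the hOCR word boxes (origin at the top-left) can be converted to the box-file convention (origin at the bottom-left) by subtracting each y value from the page height. The sketch below is illustrative Python, not Tesseract code. `PAGE_HEIGHT = 3500` is a hypothetical value, because the issue does not state the image size. With that value, the flipped vertical spans of the four words coincide with the numbers makebox wrote into its left/right fields. The real horizontal extent of the line (x 111 to 133) does not appear in the box output at all.

```python
import re

# Word-level hOCR lines copied from this issue.
HOCR = """\
<span class='ocrx_word' id='word_1_1' title='bbox 112 1470 133 1532; x_wconf 88'>2084</span>
<span class='ocrx_word' id='word_1_2' title='bbox 124 1451 127 1459; x_wconf 88'>-</span>
<span class='ocrx_word' id='word_1_3' title='bbox 111 1403 133 1441; x_wconf 96'>2/2</span>
<span class='ocrx_word' id='word_1_4' title='bbox 112 1289 133 1384; x_wconf 96'>251980</span>
"""

# Captures "bbox x0 y0 x1 y1" from the title attribute plus the word text.
WORD_RE = re.compile(r"title='bbox (\d+) (\d+) (\d+) (\d+);[^']*'>([^<]*)</span>")

# Hypothetical: the issue does not say how tall the image is.
PAGE_HEIGHT = 3500


def hocr_words(html):
    """Yield (text, x0, y0, x1, y1) in hOCR's top-left coordinate system."""
    for m in WORD_RE.finditer(html):
        x0, y0, x1, y1 = map(int, m.group(1, 2, 3, 4))
        yield m.group(5), x0, y0, x1, y1


def flip_y(y0, y1, height):
    """Convert a top-left y-range to the bottom-left convention of box files."""
    return height - y1, height - y0


for text, x0, y0, x1, y1 in hocr_words(HOCR):
    lo, hi = flip_y(y0, y1, PAGE_HEIGHT)
    print(f"{text:>6}  x {x0}-{x1}  flipped y {lo}-{hi}")
```

For example, the word `2084` flips to 1968 to 2030, matching the first `2` (left 1968) and the `4` (right 2030) in the makebox output. If that hypothetical height matches the attached image, makebox appears to put the along-line (vertical) extent into the left/right fields and drop the cross-line extent.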

Suggested Fix:

None yet. I don't know whether the previous reporter's workaround could be made watertight.
