TSV output splits each word by newline AND space #3985

Closed
1 of 2 tasks
Antsthebul opened this issue Dec 26, 2022 · 2 comments
Labels
awaiting feedback; output (issues related to output formats)

Comments

@Antsthebul

Antsthebul commented Dec 26, 2022

Basic Information

tesseract v5.3.0.20221222
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

Windows

  • Windows 11
  • Windows 10

Current Behavior

The string (plain-text) output of the file is correct:

[..]
Nutrition Facts 4
[..]

However, when TSV output is selected, each word is placed on its own line:

5       1       1       1       1       1       48      0       562     323     76.177887       Nutrition
5       1       1       1       1       2       661     64      358     188     96.668480       Facts
5       1       1       1       1       3       1062    0       60      269     55.497231       4

Expected Behavior

The TSV output should display the information similarly to the string output.

Suggested Fix

Is there a way to omit or combine the items within the word_num column? Changing the psm setting did not have any effect.
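For illustration only, the combining asked about here could be done as a post-processing step outside Tesseract. The sketch below is not a maintainer-provided fix. It assumes the TSV file begins with a header row naming the columns `level`, `page_num`, `block_num`, `par_num`, `line_num`, `word_num`, `left`, `top`, `width`, `height`, `conf`, `text`, and that word rows have `level` 5, as in the rows shown above. The function name `join_words_by_line` and the sample data are hypothetical.

```python
import csv
import io

# Assumed header of the TSV file (not shown in the excerpt above).
HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
          "word_num", "left", "top", "width", "height", "conf", "text"]


def join_words_by_line(tsv_text):
    """Group word rows (level 5) by text line and join them with spaces."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = {}
    for row in reader:
        if row["level"] != "5":
            continue  # skip page, block, paragraph, and line rows
        # Words sharing these four numbers belong to the same text line.
        key = (int(row["page_num"]), int(row["block_num"]),
               int(row["par_num"]), int(row["line_num"]))
        lines.setdefault(key, []).append(row["text"] or "")
    # Sorting the keys keeps lines in page/block/paragraph/line order.
    return [" ".join(words) for _, words in sorted(lines.items())]


# The three word rows from this report, preceded by the assumed header.
SAMPLE_ROWS = [
    HEADER,
    ["5", "1", "1", "1", "1", "1", "48", "0", "562", "323", "76.177887", "Nutrition"],
    ["5", "1", "1", "1", "1", "2", "661", "64", "358", "188", "96.668480", "Facts"],
    ["5", "1", "1", "1", "1", "3", "1062", "0", "60", "269", "55.497231", "4"],
]
SAMPLE_TSV = "\n".join("\t".join(r) for r in SAMPLE_ROWS) + "\n"

print(join_words_by_line(SAMPLE_TSV))  # ['Nutrition Facts 4']
```

Grouping on the four numbering columns rather than on `word_num` keeps words from different lines apart even when their word numbers repeat.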

Other Information

No response

@amitdo
Collaborator

amitdo commented Dec 27, 2022

Please provide the input image.

Also provide the TSV and TXT output files. You can put these files in a zip archive so that GitHub will let you upload them.

@amitdo amitdo added the output issues related output formats label Dec 27, 2022
@amitdo
Collaborator

amitdo commented Jan 13, 2023

No feedback from the OP.

@amitdo amitdo closed this as completed Jan 13, 2023

2 participants