“no best words!!” on mixed language (fra+ara) items #235

Closed
acdha opened this issue Feb 22, 2016 · 28 comments

Comments

@acdha

acdha commented Feb 22, 2016

I've noticed a couple of mixed language items which cause Tesseract v3.04.01 (Leptonica 1.72) to crash:

cadams@ganymede:~ $ tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra 
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz score is inf
[DS] Device[2] 1:HD Graphics 5000 score is 0.548963
[DS] Device[3] 0:(null) score is 1.080283
[DS] Selected Device[2]: "HD Graphics 5000" (OpenCL)
Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

Here's an example image:
11002612_2_0183

Interestingly, this appears to depend on combining the languages: using -l ara or -l fra alone avoids the crash, but specifying both, in either order, causes it.
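
When scripting over many images, it can help to tell this failure apart from an ordinary error. Below is a minimal Python sketch, assuming the `tesseract` binary is on PATH and writes its diagnostics to stderr. The helper names `classify_run` and `run_tesseract` are hypothetical, and the marker string is taken from the output above.

```python
import signal
import subprocess

NO_BEST_WORDS = "no best words!!"


def classify_run(returncode, stderr_text):
    """Label a finished tesseract run from its exit status and stderr."""
    # On POSIX, subprocess reports death by signal N as returncode == -N.
    if returncode == -signal.SIGSEGV:
        return "segfault"
    if returncode != 0:
        return "error"
    if NO_BEST_WORDS in stderr_text:
        return "no-best-words"
    return "ok"


def run_tesseract(image, output_base, languages):
    """Run tesseract on one image; `languages` is e.g. ["ara", "fra"]."""
    cmd = ["tesseract", image, output_base, "-l", "+".join(languages)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return classify_run(proc.returncode, proc.stderr)
```

Calling `run_tesseract` once with `["ara"]`, once with `["fra"]` and once with both would reproduce the comparison described above.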

@amitdo
Collaborator

amitdo commented Feb 23, 2016

Seems like yet another OpenCL bug report...

Try this:
TESSERACT_OPENCL_DEVICE=1 tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra

@acdha
Author

acdha commented Feb 23, 2016

I had the same suspicion, but the behaviour is identical with that environment variable set, and even with a Tesseract build that doesn't include OpenCL at all:

cadams@Ganymede:~ $ tesseract --version
tesseract 3.04.01
 leptonica-1.72
  libjpeg 8d : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.5

cadams@Ganymede:~ $ tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

@amitdo
Collaborator

amitdo commented Feb 23, 2016

I suggest changing the title so it will contain the word "Arabic" or "ara".

There were several reports in the past about problems when using Arabic+other lang.
Here is a commit that claims to fix them: 2f197cd

Fixed issues 899/1220/1246 (mixed eng+ara)

In general, Arabic uses a special engine called 'Cube', most other languages use another engine.
Cube code is considered obsolete. There is a plan to drop it and replace it with another engine (based on LSTM). It might happen this year.

@acdha acdha changed the title from “no best words!!” on mixed language items to “no best words!!” on mixed language (fra+ara) items, Feb 23, 2016
@acdha
Author

acdha commented Feb 23, 2016

I updated the title. It wasn't clear that these were related since Arabic works fine on its own.

The commit which you referenced is shown as being included in the version (3.04.01) I'm using.

@ghost

ghost commented Feb 29, 2016

I just ran into the exact same problem. Arabic alone is processed successfully, but when I try to get Arabic and English read at the same time, tesseract crashes. I'm using Windows version 3.05.00dev.

Another question (I'm totally new to Tesseract): when I use Arabic language recognition and read a text with Arabic letters but Latin numbers, the Latin numbers are not recognized (that's why I wanted to add English as a recognition language). In the file "ara.cube.lm" I found the line

Digits=٠١٢٣٤٥٦٧٨٩0123456789

Does this mean Latin numbers should be recognized when I use only Arabic as the recognition language?
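
For what it's worth, the entry does list both digit sets, which can be checked mechanically. Here is a small Python sketch. The string reproduces the line above using escapes for the Arabic-Indic digits, and `digit_scripts` is a hypothetical helper name. Whether Cube actually recognizes every listed character is not something this shows.

```python
import unicodedata

# U+0660..U+0669 are the Arabic-Indic digits from the ara.cube.lm line above.
line = (
    "Digits=\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"
    "0123456789"
)


def digit_scripts(lm_line):
    """Group the characters after '=' by the first word of their Unicode name."""
    _, _, chars = lm_line.partition("=")
    groups = {}
    for ch in chars:
        script = unicodedata.name(ch).split()[0]  # "ARABIC-INDIC" or "DIGIT"
        groups.setdefault(script, []).append(ch)
    return groups


groups = digit_scripts(line)
print({name: len(chars) for name, chars in groups.items()})
```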

@tfmorris
Contributor

Here's the stack trace for the crash

(lldb) bt
* thread #1: tid = 0x68d487, 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(int, PAGE_RES_IT*, C_BLOB*, STRING*, float*) [inlined] WERD_CHOICE::rating(this=0x0000000000000000) const at ratngs.h:325, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x3c)
  * frame #0: 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(int, PAGE_RES_IT*, C_BLOB*, STRING*, float*) [inlined] WERD_CHOICE::rating(this=0x0000000000000000) const at ratngs.h:325
    frame #1: 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(this=<unavailable>, pass_n=2, pr_it=0x00007fff5fbff320, blob=0x0000000105418ab0, best_str=0x00007fff5fbfef40, c2=0x00007fff5fbfef3c) + 637 at control.cpp:1263
    frame #2: 0x0000000100021e66 libtesseract.3.dylib`tesseract::Tesseract::SelectGoodDiacriticOutlines(this=0x0000000101006400, pass=2, certainty_threshold=-8, pr_it=0x00007fff5fbff320, blob=0x0000000105418ab0, outlines=0x00007fff5fbff1c0, num_outlines=6, ok_outlines=0x00007fff5fbff320) + 118 at control.cpp:1124
    frame #3: 0x0000000100021283 libtesseract.3.dylib`tesseract::Tesseract::AssignDiacriticsToOverlappingBlobs(this=0x0000000101006400, outlines=0x00007fff5fbff1c0, pass=2, real_word=<unavailable>, pr_it=0x00007fff5fbff320, word_wanted=0x00007fff5fbff1a0, overlapped_any_blob=<unavailable>, target_blobs=0x0000000111b4e280) + 1923 at control.cpp:1023
    frame #4: 0x000000010001c514 libtesseract.3.dylib`tesseract::Tesseract::ReassignDiacritics(this=0x0000000101006400, pass=2, pr_it=0x00007fff5fbff320, make_next_word_fuzzy=0x00007fff5fbff25f) + 356 at control.cpp:936
    frame #5: 0x000000010001c249 libtesseract.3.dylib`tesseract::Tesseract::RecogAllWordsPassN(this=0x0000000101006400, pass_n=2, monitor=0x0000000000000000, pr_it=0x00007fff5fbff320, words=0x00007fff5fbff2d0) + 537 at control.cpp:258
    frame #6: 0x000000010001d877 libtesseract.3.dylib`tesseract::Tesseract::recog_all_words(this=0x0000000101006400, page_res=0x0000000106e30560, monitor=0x0000000000000000, target_word_box=0x0000000000000000, word_config=0x0000000000000000, dopasses=0) + 1095 at control.cpp:386
    frame #7: 0x000000010000a0ce libtesseract.3.dylib`tesseract::TessBaseAPI::Recognize(this=0x00007fff5fbff8c8, monitor=0x0000000000000000) + 750 at baseapi.cpp:895
    frame #8: 0x000000010000a92b libtesseract.3.dylib`tesseract::TessBaseAPI::ProcessPage(this=<unavailable>, pix=<unavailable>, page_index=<unavailable>, filename=<unavailable>, retry_config=0x0000000000000000, timeout_millisec=<unavailable>, renderer=0x00000000000000

but I think the problem is actually either in the callees of classify_word_and_language (RetryWithLanguage or the individual recognizers), which shouldn't be returning no words (as the double exclamation points imply), or in ClassifyBlobAsWord, which shouldn't assume that a raw choice is always available.

The latter is easier to do and I was too lazy to dig further into the recognizer, so I generated a patch for that which I'll post.
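
The second option amounts to a defensive check before the rating is read (the trace shows `WERD_CHOICE::rating` being called on a null pointer). The real code is C++. The following is only an illustration of the pattern in Python, with hypothetical stand-in names (`Choice`, `WordResult`, `rating_or_none`), and is not the actual patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Choice:
    rating: float


@dataclass
class WordResult:
    # Stand-in for a recognizer result; None models "no words returned".
    raw_choice: Optional[Choice] = None


def rating_or_none(word: WordResult) -> Optional[float]:
    """Return the raw choice's rating, or None when the recognizer produced nothing."""
    if word.raw_choice is None:
        return None  # caller must treat this blob as unclassified
    return word.raw_choice.rating
```

The design choice here is to push the "nothing recognized" case to the caller explicitly instead of dereferencing a missing result.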

tfmorris added a commit to tfmorris/tesseract that referenced this issue Feb 29, 2016
tfmorris added a commit to tfmorris/tesseract that referenced this issue Mar 2, 2016
@amitdo amitdo added the bug label May 26, 2016
@anupamaray

Hi All,

Does Tesseract support script identification? I have bilingual pages and two different models for different scripts. I want to run a script identifier on each word and call the matching model for recognition.
Help will be appreciated.

@tfmorris
Contributor

tfmorris commented Jun 3, 2016

@anupamaray Please use the mailing list for questions (and don't hijack issues about unrelated topics). You'll get better answers if you include more details about the scripts, languages, etc.

@amitdo
Collaborator

amitdo commented Jun 3, 2016

Hi @anupamaray !

Please read this:
https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md

Try asking your question in the users mailing-list

@amitdo
Collaborator

amitdo commented Dec 27, 2016

Does this issue still exist in 4.00 (code in master)?

Probably not, since cube was removed.

@Shreeshrii
Can you test it? (ara+other lang)

@stweil
Contributor

stweil commented Dec 27, 2016

This is still an issue: it crashes with --oem 0 or --oem 2:

tesseract /tmp/11002612_2_0183.jpg /tmp/11002612_2_0183 -l ara+fra --oem 0
Found AVX
Found SSE
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:Intel(R) HD Graphics IvyBridge M GT2 score is 1.229392
[DS] Device[2] 0:(null) score is 1.146125
[DS] Selected Device[2]: "(null)" (Native)
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file classify/adaptmatch.cpp, line 537
Speicherzugriffsfehler [German locale: segmentation fault]

The crash is obviously unrelated to OpenCL, as it crashed here without using OpenCL.

@Shreeshrii
Collaborator

--oem 0 and --oem 2 both use the Tesseract (non-LSTM) engine, so the problem is in that code.

(gdb) run
Starting program: /usr/local/bin/tesseract test2.jpg test2-ara-fra --oem 0 -l ara+fra
warning: Error disabling address space randomization: Success
warning: linux_ptrace_test_ret_to_nx: PTRACE_KILL waitpid returned -1: Interrupted system call
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 537

Program received signal SIGSEGV, Segmentation fault.
ERRCODE::error (this=this@entry=0x7fddaa95aff8 <_ZL13ASSERT_FAILED>, caller=caller@entry=0x7fddaa6adcc0 "tessdata_manager.SeekToStart(TESSDATA_INTTEMP)", action=action@entry=ABORT,
    format=format@entry=0x7fddaa695c94 "in file %s, line %d") at errcode.cpp:86
86            if (!*p)
(gdb) stacktrace
Undefined command: "stacktrace".  Try "help".
(gdb) backtrace
#0  ERRCODE::error (this=this@entry=0x7fddaa95aff8 <_ZL13ASSERT_FAILED>, caller=caller@entry=0x7fddaa6adcc0 "tessdata_manager.SeekToStart(TESSDATA_INTTEMP)", action=action@entry=ABORT,
    format=format@entry=0x7fddaa695c94 "in file %s, line %d") at errcode.cpp:86
#1  0x00007fddaa5b76e0 in tesseract::Classify::InitAdaptiveClassifier (this=this@entry=0x27e1b70, load_pre_trained_templates=load_pre_trained_templates@entry=true) at adaptmatch.cpp:537
#2  0x00007fddaa5af495 in tesseract::Wordrec::program_editup (this=this@entry=0x27e1b70, textbase=textbase@entry=0x27e1b58 "test2-ara-fra", init_classifier=<optimized out>,
    init_dict=<optimized out>) at tface.cpp:51
#3  0x00007fddaa4d6949 in tesseract::Tesseract::init_tesseract_internal (this=this@entry=0x27e1b70, arg0=arg0@entry=0x0, textbase=textbase@entry=0x27e1b58 "test2-ara-fra",
    language=language@entry=0x27f7d38 "ara", oem=oem@entry=tesseract::OEM_TESSERACT_ONLY, configs=configs@entry=0x7fffca100e30, configs_size=configs_size@entry=0,
    vars_vec=vars_vec@entry=0x605280 <main::vars_vec>, vars_values=vars_values@entry=0x605260 <main::vars_values>, set_only_non_debug_params=set_only_non_debug_params@entry=false)
    at tessedit.cpp:439
#4  0x00007fddaa4d7188 in tesseract::Tesseract::init_tesseract (this=0x27e1b70, arg0=arg0@entry=0x0, textbase=0x27e1b58 "test2-ara-fra", language=language@entry=0x7fffca101064 "ara+fra",
    oem=oem@entry=tesseract::OEM_TESSERACT_ONLY, configs=configs@entry=0x7fffca100e30, configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x605280 <main::vars_vec>,
    vars_values=vars_values@entry=0x605260 <main::vars_values>, set_only_non_debug_params=false) at tessedit.cpp:345
#5  0x00007fddaa4825ac in tesseract::TessBaseAPI::Init (this=this@entry=0x7fffca100c70, datapath=0x0, language=0x7fffca101064 "ara+fra", oem=tesseract::OEM_TESSERACT_ONLY,
    configs=0x7fffca100e30, configs_size=0, vars_vec=vars_vec@entry=0x605280 <main::vars_vec>, vars_values=vars_values@entry=0x605260 <main::vars_values>,
    set_only_non_debug_params=set_only_non_debug_params@entry=false) at baseapi.cpp:306
#6  0x0000000000401fa2 in main (argc=7, argv=0x7fffca100df8) at tesseractmain.cpp:428
(gdb) quit

@Shreeshrii
Collaborator

Here is the recognition output for the original sepia image:
test1-fra-ara-lstm.txt

@stweil
Contributor

stweil commented Dec 31, 2016

It was also sufficient to specify -l ara in my test.

@Shreeshrii
Collaborator

Shreeshrii commented Dec 31, 2016 via email

@jbreiden
Contributor

jbreiden commented Dec 31, 2016 via email

@amitdo
Collaborator

amitdo commented Dec 31, 2016

I just unpacked the ara.traineddata - it does not have the tesseract model files in it.

It never had tesseract model files in it...

@Shreeshrii
Collaborator

Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

The original issue here should be tagged and tested against the 3.05 branch, since it is related to cube. The ara.config file in ara.traineddata uses oem 1 (originally for cube and now for LSTM).

The issue currently seen with 4.0alpha, ara not working with --oem 0 and --oem 2, is to be expected, since there is no Tesseract model for the Arabic language. So, instead of a segfault, a message like the following should be displayed:

"Tesseract requested but not present, LSTM engine used instead".

Later, if the non-LSTM recognizer is removed, this will no longer apply.
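
The suggested behaviour could be sketched as follows. This is a hypothetical Python illustration, not Tesseract's code: the function `pick_engine` and the labels `legacy` and `lstm` are assumptions, and the warning text is the one proposed above.

```python
def pick_engine(requested, available):
    """Return (engine, warning) for one language.

    `requested` is "legacy" or "lstm"; `available` is the set of model
    types present in the language's traineddata (hypothetical labels).
    """
    if requested in available:
        return requested, None
    if requested == "legacy" and "lstm" in available:
        # Fall back with a message instead of asserting or crashing.
        return "lstm", "Tesseract requested but not present, LSTM engine used instead"
    raise ValueError(f"no usable model for engine {requested!r}")


engine, warning = pick_engine("legacy", {"lstm"})
print(engine, "-", warning)
```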

@amitdo
Collaborator

amitdo commented Jan 1, 2017

Yes Shree, you are right.

@Shreeshrii
Collaborator

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 -l fra+ara
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
no best words!!

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 --oem 0 -l fra+ara
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ../../../../classify/adaptmatch.cpp, line 537

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 --oem 1 -l fra+ara
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.

Problem still there in 3.05.01

Arabic can be used ONLY in --oem 1 mode (cube in 3.05).

Combined language mode tries to apply the same --oem to both languages. So, if Arabic is one of the languages, --oem 1 must be used.

However, what would be the result if the second language does have an --oem 1 option?
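
One way to reason about this is to intersect the engine modes each language supports before picking a single --oem for the whole run. Here is a Python sketch under stated assumptions: the `SUPPORTED` table is illustrative only (the `ara` entry follows the statement above, while the `fra` entry is a guess that has not been verified against any traineddata file), and `common_oems` is a hypothetical helper.

```python
# Hypothetical table: --oem values each language's traineddata can serve.
SUPPORTED = {
    "ara": {1},     # per the comment above: Arabic only works with --oem 1
    "fra": {0, 1},  # assumption for illustration only
}


def common_oems(languages, supported=SUPPORTED):
    """Return the sorted --oem values usable for every language in the list."""
    modes = None
    for lang in languages:
        lang_modes = supported.get(lang, set())
        modes = set(lang_modes) if modes is None else modes & lang_modes
    return sorted(modes or [])


print(common_oems(["fra", "ara"]))
```

An empty result would mean no single --oem value suits all requested languages under the assumed table.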

@Shreeshrii
Collaborator

@Shreeshrii
Collaborator

recently filed issue -
#1579

@amitdo
Collaborator

amitdo commented Sep 26, 2018

What's the output with current master code?

  • oem 0, 3 with the traineddata repo (crash?)
  • oem 1, 3 with best/fast

For each option above, try:

  • -l fra+ara
  • -l ara+fra

Also try best/fast 'Arabic' (alone, no '+').
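
The matrix described above can be enumerated with a short script. This Python sketch only builds the command lines. The directory names `tessdata`, `tessdata_best` and `tessdata_fast`, the image name and the output base `out` are placeholders, while `--oem`, `--psm`, `-l` and `--tessdata-dir` are standard tesseract options. The exact `-l` value for the 'Arabic' script model depends on where that file is installed.

```python
from itertools import product

CASES = [
    # (engine modes, tessdata dirs, language strings)
    ((0, 3), ("tessdata",), ("fra+ara", "ara+fra")),
    ((1, 3), ("tessdata_best", "tessdata_fast"), ("fra+ara", "ara+fra", "Arabic")),
]


def build_commands(image="sepia.jpg", psm=3):
    """Yield one tesseract argv list per (oem, tessdata dir, language) combination."""
    for oems, dirs, langs in CASES:
        for oem, data_dir, lang in product(oems, dirs, langs):
            yield [
                "tesseract", image, "out",
                "--tessdata-dir", data_dir,
                "--oem", str(oem), "--psm", str(psm),
                "-l", lang,
            ]


commands = list(build_commands())
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues.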

@Shreeshrii
Collaborator

@amitdo Here are the results: console output, without any pre-processing of the image. I also have the OCRed output texts, if you want them.

ubuntu@tesseract-ocr:~/TEST$ bash ./test.sh

 *****  ./sepia.jpg OEM 0 PSM 3 LANG ara+fra TESSDATA tessdata ****
Failed loading language 'ara'
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 0 PSM 3 LANG fra+ara TESSDATA tessdata ****
Failed loading language 'ara'
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 SCRIPT Arabic TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 SCRIPT Arabic TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 SCRIPT Arabic TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 SCRIPT Arabic TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
DONE
ubuntu@tesseract-ocr:~/TEST$

@amitdo
Collaborator

amitdo commented Oct 3, 2018

Shree, thank you very much for your testing!

@zdenop zdenop closed this as completed in f6fd9b3 Oct 13, 2018
@amitdo
Collaborator

amitdo commented Oct 13, 2018

Solution for 4.0.0: When using 'ara', only use traineddata files from the best or fast repos.

3.0x versions are not supported by the Tesseract team anymore.
If you still use one, note that ara+lang2 was never supported there, so don't combine ara with another language.
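
In a wrapper script, this advice could be enforced before tesseract is ever called. The following is a Python sketch, not an official recipe: the helper `ocr_command` is hypothetical, and it assumes the data directory is named after the repo it was cloned from (`tessdata_best` or `tessdata_fast`). `--tessdata-dir` is the standard option for choosing the data directory.

```python
from pathlib import Path

# Assumption: the directory name matches the repo name it came from.
LSTM_REPOS = {"tessdata_best", "tessdata_fast"}


def ocr_command(image, output_base, languages, tessdata_dir):
    """Build a tesseract argv, refusing 'ara' with a non best/fast data directory."""
    data_dir = Path(tessdata_dir)
    if "ara" in languages and data_dir.name not in LSTM_REPOS:
        raise ValueError("use traineddata from tessdata_best or tessdata_fast with 'ara'")
    return [
        "tesseract", str(image), str(output_base),
        "--tessdata-dir", str(data_dir),
        "-l", "+".join(languages),
    ]
```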

@amitdo
Collaborator

amitdo commented Oct 14, 2018

Hi @MariamHijazi,

https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md

Make sure you are able to replicate the problem with Tesseract command line program. For external programs that use Tesseract (including wrappers and your own program, if you are developer), report the issue to the developers of that software if it's possible. You can also try to find help in the Tesseract forum.


Title of this issue: “no best words!!” on mixed language (fra+ara) items

If you are using the tesseract command line program and both of your traineddata files are from best or fast, you probably won't get this error message, so it's not the same issue.

@netwons

netwons commented Nov 21, 2020

tesseract 3.jpg out -l ara+eng
The problem is still not solved.

What should we do? Is there a solution?
