“no best words!!” on mixed language (fra+ara) items #235

acdha · 2016-02-22T22:12:58Z

I've noticed a couple of mixed language items which cause Tessearct v3.04.01 (Leptonica 1.72) to crash:

cadams@ganymede:~ $ tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra 
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz score is inf
[DS] Device[2] 1:HD Graphics 5000 score is 0.548963
[DS] Device[3] 0:(null) score is 1.080283
[DS] Selected Device[2]: "HD Graphics 5000" (OpenCL)
Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

Here's an example image:

Interestingly, this appears to depend on the order of the languages – using -l ara or -l fra alone avoids the crash but specifying both in either order will cause it to crash.

The text was updated successfully, but these errors were encountered:

amitdo · 2016-02-23T07:02:00Z

Seems like yet another OpenCL bug report...

Try this:
TESSERACT_OPENCL_DEVICE=1 tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra

acdha · 2016-02-23T14:40:48Z

I had the same question but the behaviour is identical either with that environmental variable or even using Tesseract which wasn't built with OpenCL at all:

cadams@Ganymede:~ $ tesseract --version
tesseract 3.04.01
 leptonica-1.72
  libjpeg 8d : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.5

cadams@Ganymede:~ $ tesseract 11002612_2_0183.jpg 11002612_2_0183 -l ara+fra
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

amitdo · 2016-02-23T17:20:48Z

I suggest changing the title so it will contain the word "Arabic" or "ara".

There were several reports in the past about problems when using Arabic+other lang.
Here is a commit that claims to fix them: 2f197cd

Fixed issues 899/1220/1246 (mixed eng+ara)

In general, Arabic uses a special engine called 'Cube', most other languages use another engine.
Cube code is considered obsolete. There is a plan to drop it and replace it with another engine (based on LSTM). It might happen this year.

acdha · 2016-02-23T17:59:09Z

I updated the title. It wasn't clear that these were related since Arabic works fine on its own.

The commit which you referenced is shown as being included in the version (3.04.01) I'm using.

ghost · 2016-02-29T14:33:45Z

I just ran into the exact same problem. Arabic alone is processed successfully, but when I try to get Arabic and English read at the same time, tesseract crashes. I'm using Windows version 3.05.00dev.

Another question (I'm totally new to tesseract): When I use arabic language recognition and I read a text with arabic letters, but latin numbers, the latin numbers are not recognized (that's why I wanted to add English as recognition language). In the file "ara.cube.lm" I found the line

Digits=٠١٢٣٤٥٦٧٨٩0123456789

Does this mean,latin numbers should be recognized when I only use arabic as recognition language?

tfmorris · 2016-02-29T19:16:12Z

Here's the stack trace for the crash

(lldb) bt
* thread #1: tid = 0x68d487, 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(int, PAGE_RES_IT*, C_BLOB*, STRING*, float*) [inlined] WERD_CHOICE::rating(this=0x0000000000000000) const at ratngs.h:325, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x3c)
  * frame #0: 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(int, PAGE_RES_IT*, C_BLOB*, STRING*, float*) [inlined] WERD_CHOICE::rating(this=0x0000000000000000) const at ratngs.h:325
    frame #1: 0x000000010002294d libtesseract.3.dylib`tesseract::Tesseract::ClassifyBlobAsWord(this=<unavailable>, pass_n=2, pr_it=0x00007fff5fbff320, blob=0x0000000105418ab0, best_str=0x00007fff5fbfef40, c2=0x00007fff5fbfef3c) + 637 at control.cpp:1263
    frame #2: 0x0000000100021e66 libtesseract.3.dylib`tesseract::Tesseract::SelectGoodDiacriticOutlines(this=0x0000000101006400, pass=2, certainty_threshold=-8, pr_it=0x00007fff5fbff320, blob=0x0000000105418ab0, outlines=0x00007fff5fbff1c0, num_outlines=6, ok_outlines=0x00007fff5fbff320) + 118 at control.cpp:1124
    frame #3: 0x0000000100021283 libtesseract.3.dylib`tesseract::Tesseract::AssignDiacriticsToOverlappingBlobs(this=0x0000000101006400, outlines=0x00007fff5fbff1c0, pass=2, real_word=<unavailable>, pr_it=0x00007fff5fbff320, word_wanted=0x00007fff5fbff1a0, overlapped_any_blob=<unavailable>, target_blobs=0x0000000111b4e280) + 1923 at control.cpp:1023
    frame #4: 0x000000010001c514 libtesseract.3.dylib`tesseract::Tesseract::ReassignDiacritics(this=0x0000000101006400, pass=2, pr_it=0x00007fff5fbff320, make_next_word_fuzzy=0x00007fff5fbff25f) + 356 at control.cpp:936
    frame #5: 0x000000010001c249 libtesseract.3.dylib`tesseract::Tesseract::RecogAllWordsPassN(this=0x0000000101006400, pass_n=2, monitor=0x0000000000000000, pr_it=0x00007fff5fbff320, words=0x00007fff5fbff2d0) + 537 at control.cpp:258
    frame #6: 0x000000010001d877 libtesseract.3.dylib`tesseract::Tesseract::recog_all_words(this=0x0000000101006400, page_res=0x0000000106e30560, monitor=0x0000000000000000, target_word_box=0x0000000000000000, word_config=0x0000000000000000, dopasses=0) + 1095 at control.cpp:386
    frame #7: 0x000000010000a0ce libtesseract.3.dylib`tesseract::TessBaseAPI::Recognize(this=0x00007fff5fbff8c8, monitor=0x0000000000000000) + 750 at baseapi.cpp:895
    frame #8: 0x000000010000a92b libtesseract.3.dylib`tesseract::TessBaseAPI::ProcessPage(this=<unavailable>, pix=<unavailable>, page_index=<unavailable>, filename=<unavailable>, retry_config=0x0000000000000000, timeout_millisec=<unavailable>, renderer=0x00000000000000

but I think the problem is actually in either classify_word_and_language callees (RetryWithLanguage or the individual recognizers) which shouldn't be returning no words (like the double exclamation points imply) or in ClassifyBlobAsWord which shouldn't assume that there's always a raw choice available.

The latter is easier to do and I was too lazy to dig further into the recognizer, so I generated a patch for that which I'll post.

anupamaray · 2016-06-03T16:20:29Z

Hi All,

Does Tesseract support script identification. I have bilingual pages and two different model for different scripts. I want to use a script identifier on each word and call my models accordingly for recognition.
Help will be appreciated.

tfmorris · 2016-06-03T16:28:12Z

@anupamaray Please use the mailing list for questions (and don't hijack issues about unrelated topics). You'll get better answers if you include more details about the scripts, languages, etc.

amitdo · 2016-06-03T16:38:31Z

Hi @anupamaray !

Please read this:
https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md

Try asking your question in the users mailing-list

amitdo · 2016-12-27T14:17:07Z

Is this issue still exist in 4.00 (code in master)?

Probably not, since cube was removed.

@Shreeshrii
Can you test it? (ara+other lang)

stweil · 2016-12-27T14:42:14Z

This is still an issue: it crashes with --oem 0 or --oem 2:

tesseract /tmp/11002612_2_0183.jpg /tmp/11002612_2_0183 -l ara+fra --oem 0
Found AVX
Found SSE
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:Intel(R) HD Graphics IvyBridge M GT2 score is 1.229392
[DS] Device[2] 0:(null) score is 1.146125
[DS] Selected Device[2]: "(null)" (Native)
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file classify/adaptmatch.cpp, line 537
Speicherzugriffsfehler

The crash is obviously unrelated to OpenCL, as it crashed here without using OpenCL.

Shreeshrii · 2016-12-31T14:27:49Z

--oem 0 and --oem 2 - both use the tesseract mode, so the problem is in that code.

(gdb) run
Starting program: /usr/local/bin/tesseract test2.jpg test2-ara-fra --oem 0 -l ara+fra
warning: Error disabling address space randomization: Success
warning: linux_ptrace_test_ret_to_nx: PTRACE_KILL waitpid returned -1: Interrupted system call
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 537

Program received signal SIGSEGV, Segmentation fault.
ERRCODE::error (this=this@entry=0x7fddaa95aff8 <_ZL13ASSERT_FAILED>, caller=caller@entry=0x7fddaa6adcc0 "tessdata_manager.SeekToStart(TESSDATA_INTTEMP)", action=action@entry=ABORT,
    format=format@entry=0x7fddaa695c94 "in file %s, line %d") at errcode.cpp:86
86            if (!*p)
(gdb) stacktrace
Undefined command: "stacktrace".  Try "help".
(gdb) backtrace
#0  ERRCODE::error (this=this@entry=0x7fddaa95aff8 <_ZL13ASSERT_FAILED>, caller=caller@entry=0x7fddaa6adcc0 "tessdata_manager.SeekToStart(TESSDATA_INTTEMP)", action=action@entry=ABORT,
    format=format@entry=0x7fddaa695c94 "in file %s, line %d") at errcode.cpp:86
#1  0x00007fddaa5b76e0 in tesseract::Classify::InitAdaptiveClassifier (this=this@entry=0x27e1b70, load_pre_trained_templates=load_pre_trained_templates@entry=true) at adaptmatch.cpp:537
#2  0x00007fddaa5af495 in tesseract::Wordrec::program_editup (this=this@entry=0x27e1b70, textbase=textbase@entry=0x27e1b58 "test2-ara-fra", init_classifier=<optimized out>,
    init_dict=<optimized out>) at tface.cpp:51
#3  0x00007fddaa4d6949 in tesseract::Tesseract::init_tesseract_internal (this=this@entry=0x27e1b70, arg0=arg0@entry=0x0, textbase=textbase@entry=0x27e1b58 "test2-ara-fra",
    language=language@entry=0x27f7d38 "ara", oem=oem@entry=tesseract::OEM_TESSERACT_ONLY, configs=configs@entry=0x7fffca100e30, configs_size=configs_size@entry=0,
    vars_vec=vars_vec@entry=0x605280 <main::vars_vec>, vars_values=vars_values@entry=0x605260 <main::vars_values>, set_only_non_debug_params=set_only_non_debug_params@entry=false)
    at tessedit.cpp:439
#4  0x00007fddaa4d7188 in tesseract::Tesseract::init_tesseract (this=0x27e1b70, arg0=arg0@entry=0x0, textbase=0x27e1b58 "test2-ara-fra", language=language@entry=0x7fffca101064 "ara+fra",
    oem=oem@entry=tesseract::OEM_TESSERACT_ONLY, configs=configs@entry=0x7fffca100e30, configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x605280 <main::vars_vec>,
    vars_values=vars_values@entry=0x605260 <main::vars_values>, set_only_non_debug_params=false) at tessedit.cpp:345
#5  0x00007fddaa4825ac in tesseract::TessBaseAPI::Init (this=this@entry=0x7fffca100c70, datapath=0x0, language=0x7fffca101064 "ara+fra", oem=tesseract::OEM_TESSERACT_ONLY,
    configs=0x7fffca100e30, configs_size=0, vars_vec=vars_vec@entry=0x605280 <main::vars_vec>, vars_values=vars_values@entry=0x605260 <main::vars_values>,
    set_only_non_debug_params=set_only_non_debug_params@entry=false) at baseapi.cpp:306
#6  0x0000000000401fa2 in main (argc=7, argv=0x7fffca100df8) at tesseractmain.cpp:428
(gdb) quit

Shreeshrii · 2016-12-31T14:32:25Z

here's is the recognition of original sepia image -
test1-fra-ara-lstm.txt

stweil · 2016-12-31T15:27:33Z

It was also sufficient to specify -l ara in my test.

Shreeshrii · 2016-12-31T16:00:00Z

I just unpacked the ara.traineddata - it does not have the tesseract model files in it. combine_tessdata -u ara.traineddata ara. Extracting tessdata components from ara.traineddata Wrote ara.config Wrote ara.unicharset Wrote ara.punc-dawg Wrote ara.word-dawg Wrote ara.number-dawg Wrote ara.freq-dawg Wrote ara.lstm Wrote ara.lstm-punc-dawg Wrote ara.lstm-word-dawg Wrote ara.lstm-number-dawg ShreeDevi

…

____________________________________________________________ भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Dec 31, 2016 at 8:57 PM, Stefan Weil ***@***.***> wrote: It was also sufficient to specify -l ara in my test. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#235 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AE2_o0ZzhLREr78qnJF_KgztsNsJEPDCks5rNnRsgaJpZM4HgCcw> .

jbreiden · 2016-12-31T16:33:36Z

Last time we talked Ray said he was leaning toward deletion of the non-LSTM recognizer.

amitdo · 2016-12-31T16:43:03Z

I just unpacked the ara.traineddata - it does not have the tesseract model files in it.

It never had tesseract model files in it...

Shreeshrii · 2017-01-01T12:22:47Z

Warning in pixReadMemJpeg: work-around: writing to a temp file
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!
Segmentation fault: 11

The original issue here should be tagged and tested against the 3.05 branch since it is related to cube. the ara.config file in ara.traineddata uses oem 1 (originally for cube and now for LSTM).

The current issue being seen with 4.0alpha, ara not working for --oem 0 and --oem 2 is to be expected since there is no Tesseract model for the Arabic language. So, instead of segfault, the message displayed should be something like the following ...

"Tesseract requested but not present, LSTM engine used instead".

Later, if non-LSTM recognizer is removed this will not apply.

amitdo · 2017-01-01T13:43:04Z

Yes Shree, you are right.

Shreeshrii · 2017-06-11T09:53:52Z

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 -l fra+ara
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
no best words!!

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 --oem 0 -l fra+ara
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file ../../../../classify/adaptmatch.cpp, line 537

C:\Users\vvkum\test>tesseract sepia.jpg sepia --psm 6 --oem 1 -l fra+ara
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.

Problem still there in 3.05.01

Arabic can be used ONLY in --oem 1 mode (cube in 3.05).

Combined language mode tries to apply same --oem to both languages. So, if using Arabic as one of the languages, need to use --oem 1.

However, what would be the result if the second language does have --oem 1 option.

Shreeshrii · 2018-05-19T07:35:15Z

More mixed language issues

reported in forum
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/SLll0b7MYiU/F2ad9lxMCQAJ

Shreeshrii · 2018-05-19T07:36:11Z

recently filed issue -
#1579

amitdo · 2018-09-26T18:57:55Z

What's the output with current master code?

oem 0, 3, traineddata repo (crash?)
oem 1, 3. best/fast

for each option above try:

-l fra+ara
-l ara+fra

Also try best/fast 'Arabic' (alone, no '+').

Shreeshrii · 2018-10-01T15:40:44Z

@amitdo Here are the results - console output, without any pre-processing for image. I also have the OCRed output texts, if you want.

ubuntu@tesseract-ocr:~/TEST$ bash ./test.sh

 *****  ./sepia.jpg OEM 0 PSM 3 LANG ara+fra TESSDATA tessdata ****
Failed loading language 'ara'
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 0 PSM 3 LANG fra+ara TESSDATA tessdata ****
Failed loading language 'ara'
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
no best words!!

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG ara+fra TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 LANG fra+ara TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG ara+fra TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 LANG fra+ara TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 SCRIPT Arabic TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 1 PSM 3 SCRIPT Arabic TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 SCRIPT Arabic TESSDATA tessdata_best ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box

 *****  ./sepia.jpg OEM 3 PSM 3 SCRIPT Arabic TESSDATA tessdata_fast ****
Tesseract Open Source OCR Engine v4.0.0-beta.4-164-g5dfce with Leptonica
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 327
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
DONE
ubuntu@tesseract-ocr:~/TEST$

amitdo · 2018-10-03T16:50:10Z

Shree, thank you very much for your testing!

amitdo · 2018-10-13T09:29:01Z

Solution for 4.0.0: When using 'ara', only use traineddata files from best or fast repos.

3.0x versions are not supported by the Tesseract team anymore.
If you still use it, ara+lang2 was never supported, so don't use ara with another lang.

amitdo · 2018-10-14T09:08:35Z

Hi @MariamHijazi,

https://github.com/tesseract-ocr/tesseract/blob/master/CONTRIBUTING.md

Make sure you are able to replicate the problem with Tesseract command line program. For external programs that use Tesseract (including wrappers and your own program, if you are developer), report the issue to the developers of that software if it's possible. You can also try to find help in the Tesseract forum.

Title of this issue: “no best words!!” on mixed language (fra+ara) items

If you are using tesseract command line program and both of your traineddata files are from best or fast, you probably don't get this error message, so it's not the same issue.

netwons · 2020-11-21T12:40:27Z

tesseract 3.jpg out -l ara+eng
The problem could not be solved

what do we do?
solution?

acdha changed the title ~~“no best words!!” on mixed language items~~ “no best words!!” on mixed language (fra+ara) items Feb 23, 2016

tfmorris added a commit to tfmorris/tesseract that referenced this issue Feb 29, 2016

Handle null raw_choice - fixes tesseract-ocr#235

f5b1b34

tfmorris added a commit to tfmorris/tesseract that referenced this issue Mar 2, 2016

Handle null raw_choice - fixes tesseract-ocr#235

eaefbea

amitdo added the bug label May 26, 2016

amitdo mentioned this issue Sep 19, 2016

english the api works well but if i use the arabic trained data the app crashes #428

Closed

amitdo mentioned this issue Jan 15, 2018

Tesseract segmentation fault when using Arabic and English #1275

Closed

stweil mentioned this issue Apr 20, 2018

Handle null raw_choice - fixes #235 #246

Closed

Shreeshrii mentioned this issue May 24, 2018

“low accuracy” on mixed language (fas+eng) #1599

Closed

Shreeshrii mentioned this issue May 25, 2018

Segmentation fault OCRing a washed out image #1601

Closed

zdenop mentioned this issue Sep 30, 2018

Arabic + English Language #1934

Closed

zdenop closed this as completed in f6fd9b3 Oct 13, 2018

jrobble mentioned this issue Dec 12, 2018

Finalize R3.0.0 Release Notes openmpf/openmpf#762

Closed

jrobble mentioned this issue Dec 21, 2018

Tesseract OCR component does not support ara+eng openmpf/openmpf#783

Closed

amitdo added the boxClipToRectangle label Sep 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

“no best words!!” on mixed language (fra+ara) items #235

“no best words!!” on mixed language (fra+ara) items #235

acdha commented Feb 22, 2016

amitdo commented Feb 23, 2016

acdha commented Feb 23, 2016

amitdo commented Feb 23, 2016

acdha commented Feb 23, 2016

ghost commented Feb 29, 2016

tfmorris commented Feb 29, 2016

anupamaray commented Jun 3, 2016

tfmorris commented Jun 3, 2016

amitdo commented Jun 3, 2016 •

edited

Loading

amitdo commented Dec 27, 2016

stweil commented Dec 27, 2016 •

edited

Loading

Shreeshrii commented Dec 31, 2016

Shreeshrii commented Dec 31, 2016

stweil commented Dec 31, 2016

Shreeshrii commented Dec 31, 2016 via email

jbreiden commented Dec 31, 2016 via email

amitdo commented Dec 31, 2016

Shreeshrii commented Jan 1, 2017

amitdo commented Jan 1, 2017

Shreeshrii commented Jun 11, 2017

Shreeshrii commented May 19, 2018

Shreeshrii commented May 19, 2018

amitdo commented Sep 26, 2018 •

edited

Loading

Shreeshrii commented Oct 1, 2018

amitdo commented Oct 3, 2018

amitdo commented Oct 13, 2018 •

edited

Loading

amitdo commented Oct 14, 2018 •

edited

Loading

netwons commented Nov 21, 2020 •

edited

Loading

“no best words!!” on mixed language (fra+ara) items #235

“no best words!!” on mixed language (fra+ara) items #235

Comments

acdha commented Feb 22, 2016

amitdo commented Feb 23, 2016

acdha commented Feb 23, 2016

amitdo commented Feb 23, 2016

acdha commented Feb 23, 2016

ghost commented Feb 29, 2016

tfmorris commented Feb 29, 2016

anupamaray commented Jun 3, 2016

tfmorris commented Jun 3, 2016

amitdo commented Jun 3, 2016 • edited Loading

amitdo commented Dec 27, 2016

stweil commented Dec 27, 2016 • edited Loading

Shreeshrii commented Dec 31, 2016

Shreeshrii commented Dec 31, 2016

stweil commented Dec 31, 2016

Shreeshrii commented Dec 31, 2016 via email

jbreiden commented Dec 31, 2016 via email

amitdo commented Dec 31, 2016

Shreeshrii commented Jan 1, 2017

amitdo commented Jan 1, 2017

Shreeshrii commented Jun 11, 2017

Shreeshrii commented May 19, 2018

Shreeshrii commented May 19, 2018

amitdo commented Sep 26, 2018 • edited Loading

Shreeshrii commented Oct 1, 2018

amitdo commented Oct 3, 2018

amitdo commented Oct 13, 2018 • edited Loading

amitdo commented Oct 14, 2018 • edited Loading

netwons commented Nov 21, 2020 • edited Loading

amitdo commented Jun 3, 2016 •

edited

Loading

stweil commented Dec 27, 2016 •

edited

Loading

amitdo commented Sep 26, 2018 •

edited

Loading

amitdo commented Oct 13, 2018 •

edited

Loading

amitdo commented Oct 14, 2018 •

edited

Loading

netwons commented Nov 21, 2020 •

edited

Loading