Arabic Language output is reversed #169
Hi @Christophered, how did you generate the tif images for training? Did you use the …? Did you use …? From https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract#dictionary-data-optional
You can also try to use … In general, the right place to ask questions like this is here: See also: |
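The dictionary step on the TrainingTesseract wiki page linked above builds the two optional word lists with `wordlist2dawg`. This Python sketch only assembles those command lines (it does not run them). It assumes the wiki's file names `frequent_words_list` and `words_list` and a unicharset named `<lang>.unicharset`; the helper `build_dawg_cmds` is hypothetical. The optional `-r <policy>` argument reflects the 3.x `wordlist2dawg` usage string (`[-t | -r [reverse policy]]`, where policy 1 is meant to reverse words containing RTL characters). Treat that flag as an assumption and check the usage message printed by your own build.

```python
import shlex
from typing import List, Optional


def build_dawg_cmds(lang: str, reverse_policy: Optional[int] = None) -> List[List[str]]:
    """Return wordlist2dawg command lines for the two optional 3.x dictionaries.

    frequent_words_list -> <lang>.freq-dawg and words_list -> <lang>.word-dawg,
    both built against <lang>.unicharset (file names as on the training wiki).
    If reverse_policy is given it is passed as "-r <n>"; this RTL option is an
    assumption, so confirm it against the tool's usage output first.
    """
    pairs = [("frequent_words_list", "freq-dawg"), ("words_list", "word-dawg")]
    cmds: List[List[str]] = []
    for wordlist, suffix in pairs:
        cmd = ["wordlist2dawg"]
        if reverse_policy is not None:
            cmd += ["-r", str(reverse_policy)]
        cmd += [wordlist, f"{lang}.{suffix}", f"{lang}.unicharset"]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in build_dawg_cmds("ara", reverse_policy=1):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists (rather than one shell string) makes it easy to pass them to `subprocess.run` later without quoting problems.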
Thank you for the reply. |
Try using this config file: Remove this line:
|
The problem has been solved, thanks to amitdo! The solution was: … This solved both of my problems with Arabic: reversed words and words combined together. |
@Christophered |
roozgar, I will run some tests and reply in a couple of days. |
@Christophered |
Thank you roozgar, I appreciate you |
@Christophered |
I have tested Tesseract 3.02, 3.04 and 3.05dev; all of them failed at Arabic OCR. |
@Christophered, do you have any plans to work more on this subject? |
The official trained data uses the 'Cube' engine. There is no documented way to train 'Cube' with other fonts. |
@amitdo Oops! I found this: https://code.google.com/archive/p/tesseract-ocr-extradocs/wikis/Cube.wiki It's really undocumented! |
I suppose they have a program for that task... |
@amitdo, who are they? Is there any way to find out who built each traineddata file? |
Developers from Google. |
@Christophered |
Hi @AREEBAKAMIL, I have replied by email; here is also the tutorial you requested. Here is one method to improve recognition, just for testing. Please remember that this is for Arabic, and the recognition rate is low to moderate. In jTessBoxEditor: |
@Christophered https://tsl620atnaz.wikispaces.com/file/view/arabic.gif/130745645/arabic.gif |
@Shreeshrii |
Hi,
|
@amitdo, should this work for Hebrew as well? Do I need to create the training data myself (e.g. "frequent_words_list", "words_list", etc.)? Thanks. |
Hi Uri! There is a tessdata package for Hebrew; try it before you start training Hebrew yourself. Also, read this page:
By 'this' you mean
Yes, it should be used for Hebrew too. If you have further questions please use the forum (I'm not participating there). For Hebrew OCR questions / discussions you can also try here: |
@amitdo, thank you very much! I will go through your suggestions.
Hi, can you please send me a traineddata file for Arabic for Tesseract 3.02? Thank you in advance. |
@adinetoiu I suggest that you skip using Tesseract 3.x for Arabic, instead use Tesseract 4. |
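For anyone following the advice to move to Tesseract 4, here is a minimal Python sketch of calling the command-line tool on an Arabic image. It assumes a 4.x build on PATH that accepts `--oem` (where `1` selects the LSTM-only engine) and an `ara.traineddata` in its tessdata directory; the helper names `build_tesseract4_cmd` and `run_ocr` are hypothetical.

```python
import shlex
import subprocess
from typing import List


def build_tesseract4_cmd(image_path: str, output_base: str, lang: str = "ara") -> List[str]:
    """Build a Tesseract 4 command line that requests the LSTM engine.

    "-l <lang>" picks the traineddata; "--oem 1" asks a 4.x build for
    LSTM only. Tesseract writes the recognized text to <output_base>.txt.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", "1"]


def run_ocr(image_path: str, output_base: str, lang: str = "ara") -> str:
    # Requires the tesseract executable on PATH; raises CalledProcessError on failure.
    subprocess.run(build_tesseract4_cmd(image_path, output_base, lang), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Show the command that run_ocr would execute for a sample page.
    print(shlex.join(build_tesseract4_cmd("page.png", "page")))
```

Building the argument list separately from running it keeps the engine and language choice testable without Tesseract installed.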
Thank you very much!
From: chris <notifications@github.com>
To: tesseract-ocr/tesseract <tesseract@noreply.github.com>
Cc: adinetoiu <adinetoiu@yahoo.com>; Mention <mention@noreply.github.com>
Sent: Monday, June 19, 2017 5:14 PM
Subject: Re: [tesseract-ocr/tesseract] Arabic Language output is reversed (#169)
@adinetoiu I suggest that you skip using Tesseract 3.x for Arabic, instead use Tesseract 4.
A binary is also available at http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe
|
Do you have a sample project or link that uses tesseract 4?
|
Both gImageReader and VietOCR have versions that use Tesseract 4.
ShreeDevi
On Mon, Jun 19, 2017 at 8:07 PM, adinetoiu wrote:
Do you have a sample project or link that uses tesseract 4?
|
@adinetoiu I have contacted the developer of jTessBoxEditor. He said it might take a while before an automated LSTM trainer is available; until then, you have to train manually. |
Hello, could you please send me the ara.traineddata so I can test it? I don't know how to train Tesseract 3.x on iOS to recognize Arabic well. |
Hello everyone, I have an issue with the Urdu language data. If any expert here can help, please email me. |
Hi there,
I have created my own Arabic traineddata, but when I use it, the recognized text comes out reversed (in the opposite direction). Note that Arabic and Hebrew are written and read from right to left (RTL).
People keep suggesting Cube for training Arabic, but I don't think anyone really knows how to train Cube. And yes, I have read the Tesseract extra documentation on Cube; it seems they deliberately don't want anyone to use it.
How can I make a Tesseract traineddata file that recognizes RTL languages such as Arabic correctly?
Waiting for your reply.
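One way to check what "reversed" means here: Unicode stores Arabic in logical order (the first character read is the first in the string), while a line captured in visual order lists characters as they appear on the page from left to right. This Python sketch converts such a line back to logical order by reversing it while keeping runs of Latin letters and European digits readable left to right. It is a naive, hypothetical helper (`visual_to_logical`), not the full Unicode Bidirectional Algorithm (UAX #9), and it does not fix the traineddata itself; punctuation inside numbers, mirrored brackets and multi-line layout are ignored.

```python
import unicodedata
from itertools import groupby


def _keeps_ltr_order(ch: str) -> bool:
    # Latin letters (bidi class L) and European digits (EN) are displayed
    # left to right even inside an RTL line, so their runs keep their order.
    return unicodedata.bidirectional(ch) in ("L", "EN")


def visual_to_logical(line: str) -> str:
    """Naively convert one line from visual (left-to-right) to logical order.

    Splits the line into runs of LTR-ordered characters and everything else,
    reverses the characters inside the other runs (Arabic letters, spaces),
    then reverses the order of the runs themselves.
    """
    runs = [
        "".join(group) if keep else "".join(group)[::-1]
        for keep, group in groupby(line, key=_keeps_ltr_order)
    ]
    return "".join(reversed(runs))


if __name__ == "__main__":
    visual = "123 \u0628\u0627\u062a\u0643"  # digits, space, Arabic word in visual order
    print(visual_to_logical(visual))
```

With the sample line, the result is `"\u0643\u062a\u0627\u0628 123"`: the Arabic word كتاب in reading order, followed by the digits unchanged.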