mgr_.Init(traineddata_path.c_str()):Error:Assert failed: #1075
Comments
Did you put the traineddata in the right directory? Doing that should resolve this. I had the same problem, and that is how I fixed it. |
It helped. Thanks. But then I got:
Loaded file C:/Temp/engbest/eng.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 105 to 105!!
Failed to continue from: C:/Temp/engbest/eng.lstm
stephenyong2005, did you see an error like this in your case? |
Sorry, I did not get the error you mentioned, but I got a different one. By the way, I am new to Tesseract. Do you know how to generate an lstm file? Thanks. |
Are you using the latest source from github for building tesseract? What is your version info, git log info? |
Also see #1069 |
I use v4.0.0, compiled from the source code rather than from git. The OS is Ubuntu 17.04. I have completed the demo procedure on the Tesseract 4.0 training wiki page by following the tutorial.
|
This is the Tesseract wiki homepage: https://github.com/tesseract-ocr/tesseract/wiki
On 14 August 2017 at 00:56, Shreeshrii <notifications@github.com> wrote:
Compile from the source code instead of git.
From where?
|
After I got the latest v4 sources, lstmtraining started working, but I almost always get errors like:
Encoding of string failed!
Can't encode transcription:
even if I set the path to the unicharset file directly. |
That means you have new characters in your training text, so fine-tuning may not work. |
Yes. I have 44 new characters and I want to extend the English traineddata. I have attached my unicharset file. How can I avoid these encoding issues? |
The unicharset was not attached. See https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#training-just-a-few-layers. Have you tried Latin.traineddata? It will have more characters than eng in it. |
Thanks for the unicharset. I notice it has accented English letters.
1. Test with Latin.traineddata, which is for the Latin script, not the Latin language (lat).
2. Check that all your additional characters are in best/Latin.traineddata: use combine_tessdata -u to unpack the traineddata and look at its unicharset. I find it easy to sort a copy of each unicharset and then compare them.
3. If Latin.traineddata has all the characters you want, then you can fine-tune using Latin.traineddata as the continue_from rather than English.
|
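The sort-and-compare step above could also be scripted. Here is a rough C++ sketch, not anything shipped with Tesseract: it assumes the plain-text unicharset layout in which the first line holds the entry count and every later line starts with the character followed by a space. `ReadUnichars` and `MissingFrom` are hypothetical helper names chosen for this illustration.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Collect the first whitespace-delimited token of every line after the
// count line. Assumption: that token is the character (unichar) itself.
std::set<std::string> ReadUnichars(std::istream& in) {
  std::set<std::string> chars;
  std::string line;
  std::getline(in, line);  // skip the entry-count line
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string unichar;
    if (fields >> unichar) chars.insert(unichar);
  }
  return chars;
}

// Entries present in `wanted` but absent from `base`, in sorted order.
std::vector<std::string> MissingFrom(const std::set<std::string>& base,
                                     const std::set<std::string>& wanted) {
  std::vector<std::string> missing;
  std::set_difference(wanted.begin(), wanted.end(),
                      base.begin(), base.end(),
                      std::back_inserter(missing));
  return missing;
}
```

To use it, open the unicharset unpacked from Latin.traineddata and your own unicharset with `std::ifstream`, pass each stream to `ReadUnichars`, and call `MissingFrom(latin, yours)`. An empty result would suggest that every character you need is already covered.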
Re: 'mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../../../lstm/lstmtrainer.h, line 110' This comes when the path to the traineddata files is incorrect. Check file locations and correct the command for training. |
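The file-location check can be done up front, before lstmtraining is started. This is a minimal C++ sketch and only an illustration: `MissingFiles` is a hypothetical helper, not a Tesseract function, and it only tests whether each path can be opened for reading.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Return every path in `paths` that cannot be opened for reading.
// Hypothetical pre-flight helper; it does not validate file contents.
std::vector<std::string> MissingFiles(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const std::string& path : paths) {
    std::ifstream in(path, std::ios::binary);
    if (!in.is_open()) missing.push_back(path);
  }
  return missing;
}
```

For the command in this issue, the paths to pass would be the values given to --continue_from, --traineddata, --old_traineddata and --train_listfile. The value of --model_output is an output prefix, so it is left out. Any path that comes back in the result is a likely cause of the assert.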
The command should use tprintf() with a meaningful message and exit(), not assert. |
|
What is the expected solution here? Just note that path does not exist?
diff --git a/src/ccmain/tessedit.cpp b/src/ccmain/tessedit.cpp
index 3ad8d921..6b094129 100644
--- a/src/ccmain/tessedit.cpp
+++ b/src/ccmain/tessedit.cpp
@@ -90,7 +90,6 @@ bool Tesseract::init_tesseract_lang_data(const std::string &arg0,
// Initialize TessdataManager.
std::string tessdata_path = language_data_path_prefix + kTrainedDataSuffix;
if (!mgr->is_loaded() && !mgr->Init(tessdata_path.c_str())) {
- tprintf("Error opening data file %s\n", tessdata_path.c_str());
tprintf(
"Please make sure the TESSDATA_PREFIX environment variable is set"
" to your \"tessdata\" directory.\n");
diff --git a/src/ccutil/serialis.cpp b/src/ccutil/serialis.cpp
index d9c9a8d4..4356d027 100644
--- a/src/ccutil/serialis.cpp
+++ b/src/ccutil/serialis.cpp
@@ -21,6 +21,7 @@
#include "errcode.h"
#include "helpers.h" // for ReverseN
+#include "tprintf.h" // for tprintf
#include <climits> // for INT_MAX
#include <cstdio>
@@ -44,6 +45,8 @@ bool LoadDataFromFile(const char *filename, std::vector<char> *data) {
result = static_cast<long>(fread(&(*data)[0], 1, size, fp)) == size;
}
fclose(fp);
+ } else {
+ tprintf("Error opening data file '%s'!\n", filename);
}
return result;
}
diff --git a/src/ccutil/tessdatamanager.cpp b/src/ccutil/tessdatamanager.cpp
index 279cf7ac..1582ef92 100644
--- a/src/ccutil/tessdatamanager.cpp
+++ b/src/ccutil/tessdatamanager.cpp
@@ -82,6 +82,8 @@ bool TessdataManager::LoadArchiveFile(const char *filename) {
result = is_loaded_;
}
archive_read_free(a);
+ } else {
+ tprintf("Error opening data file '%s'!\n", filename);
}
return result;
} |
…tesseract-ocr#1075) Signed-off-by: Stefan Weil <sw@weilnetz.de>
…#1075) Signed-off-by: Stefan Weil <sw@weilnetz.de>
Fixed in commit 68017db. |
I get the error 'mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../../../lstm/lstmtrainer.h, line 110' when I try to run the command below. Can somebody help me with this?
All paths are correct.
version 4.0
c:\Temp\tesseract-master\tesseract-master\api>lstmtraining --model_output C:/Temp/engbest/model --continue_from C:/Temp/engbest/eng.lstm --traineddata C:/Temp/engbest/eng.traineddata --old_traineddata C:/Temp/tesseract-master/tesseract-master/tessdata/eng.traineddata --train_listfile C:/Temp/engbest/eng.training_files.txt --max_iterations 3600