Sort and Sed commands are causing the model not to train (ERROR 1/OUT is empty) #132

Open
ghost opened this issue Jun 24, 2017 · 2 comments

ghost commented Jun 24, 2017

Peace be upon you,
I am training an Arabic model from scratch. After about 270,000 epochs in 32 hours, the ERROR is still 1 and the OUT is empty.
The training data is artificial and 100% Arabic, contains no diacritics, and is rendered at 300 dpi in Times New Roman regular, size 18. I am sure that the transcription is 100% correct.
Why is nothing being recognized?

Attached (click-on):
The transcribed html file
The extracted png/gt.txt files
The training script train.sh
The produced clstm models

The complete terminal log

My training script:

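#!/bin/bash
set -x  # print each command as it runs
set -a  # export every variable assigned below to the environment

# Shuffle the manifest, then hold out the first 100 lines for testing
# and train on the rest.
sort -R manifest.txt > /tmp/manifest2.txt
sed 1,100d /tmp/manifest2.txt > train.txt
sed 100q /tmp/manifest2.txt > test.txt

# Training parameters, passed to the trainer through the environment.
report_every=1000
save_every=1000
maxtrain=2000000
target_height=48
dewarp=center
display_every=1000
test_every=1000
hidden=100
lrate=1e-4
save_name=arabic
'/home/bmwmy/Desktop/kra/clstm/clstmocrtrain' train.txt test.txt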
@ghost ghost changed the title 270,000 epochs/ ERROR still 1/ OUT is blank 270,000 epochs/ ERROR still 1/ OUT is blank/ Arabic language Jun 24, 2017
ghost commented Jun 24, 2017

*** charsep 
got 1998 files, 100 tests
got 38 classes
.stacked: 0.0001 0.9 in 0 48 out 0 38
.stacked.parallel: 0.0001 0.9 in 0 48 out 0 200
.stacked.parallel.lstm: 0.0001 0.9 in 0 48 out 0 100
.stacked.parallel.reversed: 0.0001 0.9 in 0 48 out 0 100
.stacked.parallel.reversed.lstm: 0.0001 0.9 in 0 48 out 0 100
.stacked.softmax: 0.0001 0.9 in 0 200 out 0 38
.
.
.
ERROR 8000 1     6321 6321
8000
TRU نونمٔوي
ALN 
OUT 
ERROR 9000 1     6321 6321
.
.
.
ERROR 268000 1     6321 6321
268000
TRU تثبل لاق تثبل مك لاق هثعب مث ماع ةٔيام هللا هتامٔاف
 ةٔيام تثبل لب لاق موي ضعب ؤا اموي
ALN 
OUT 
ERROR 269000 1     6321 6321
269000
TRU رفكي نٕاف ةوبنلاو مكحلاو باتكلا مهانيتٓا نيذلا كٔيلؤا
 اموق اهب انلكو دقف ءالٔوه اهب
ALN 
OUT
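
A stalled run like the one in this log can be spotted automatically instead of by reading the output. The sketch below is a hypothetical helper, not part of clstm. It assumes the ERROR lines have the form `ERROR <step> <rate> <errors> <total>`, which is inferred from the log above rather than from any documentation, and that the log has been saved to a file.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when every ERROR line in the log
# reports the same value in its third field, i.e. the error never moved.
# Assumed line format, inferred from the log above:
#   ERROR <step> <rate> <errors> <total>
error_is_stuck() {
    awk '
        $1 == "ERROR" {
            n++
            if (n == 1) first = $3           # remember the first reported rate
            else if ($3 != first) changed = 1
        }
        END {
            if (n > 1 && !changed) exit 0    # at least two reports, all identical
            exit 1
        }
    ' "$1"
}
```

For example, `error_is_stuck train.log && echo "ERROR has not changed"` would flag the run above, where `train.log` is an assumed file name for the captured terminal output.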

ghost commented Jun 25, 2017

I found the solution, though it seems to be a weird problem.
I removed these lines from the training script, and the problem was solved:

sort -R manifest.txt > /tmp/manifest2.txt
sed 1,100d /tmp/manifest2.txt > train.txt
sed 100q /tmp/manifest2.txt > test.txt

So run these commands in a terminal first. After they finish, close that terminal, then open a new one and run the training script there.
Once these lines were removed and the training script was run again, the ERROR started changing and ALN and OUT contained data.
Somehow sed and sort were causing the model not to train. What a weird problem.
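
The workaround of splitting the manifest in a separate step can be sketched as a small standalone helper. This is only an illustration: `split_manifest` is a hypothetical function, not part of clstm, and it assumes GNU `sort` for the `-R` option. It runs the same `sort` and `sed` commands as the original script, then checks that both lists are non-empty and together cover the whole manifest. It does not explain why the original script failed.

```shell
#!/bin/bash
# Hypothetical helper (not part of clstm): split a manifest into train and
# test lists in a separate step, then verify the result before training.
# Usage: split_manifest <manifest> <test-size> <train-out> <test-out>
split_manifest() {
    local manifest=$1 ntest=$2 train_out=$3 test_out=$4
    local shuffled total ntrain ntest_actual
    shuffled=$(mktemp) || return 1

    # Same commands as in the original script, with the test-set size as a parameter.
    sort -R "$manifest" > "$shuffled"
    sed "1,${ntest}d" "$shuffled" > "$train_out"
    sed "${ntest}q" "$shuffled" > "$test_out"
    rm -f "$shuffled"

    # Sanity checks: both lists non-empty, and sizes add up to the manifest.
    total=$(grep -c '' "$manifest")
    ntrain=$(grep -c '' "$train_out")
    ntest_actual=$(grep -c '' "$test_out")
    if [ "$ntrain" -eq 0 ] || [ "$ntest_actual" -eq 0 ]; then
        echo "empty split: train=$ntrain test=$ntest_actual" >&2
        return 1
    fi
    if [ $((ntrain + ntest_actual)) -ne "$total" ]; then
        echo "split sizes do not add up: $ntrain + $ntest_actual != $total" >&2
        return 1
    fi
    echo "train=$ntrain test=$ntest_actual"
}
```

It could be called once as `split_manifest manifest.txt 100 train.txt test.txt`, after which the training script, with the three split lines removed, is run from a new terminal as described above.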

@ghost ghost changed the title 270,000 epochs/ ERROR still 1/ OUT is blank/ Arabic language Sort and Sed commands are causing the model not to train (ERROR 1/OUT is empty) Jun 25, 2017