Sort and Sed commands are causing the model not to train (ERROR 1/OUT is empty) #132

ghost · 2017-06-24T11:10:13Z

Peace be upon you,
I am training an Arabic model from scratch. After about 270,000 epochs in 32 hours, ERROR is still 1 and OUT is empty.
The training data is artificial and 100% Arabic: it contains no diacritics, is 300 dpi, and is set in Times New Roman regular, size 18. I am sure that the transcription is 100% correct.
Why is nothing being recognized?

Attached (click-on):
The transcribed html file
The extracted png/gt.txt files
The training script train.sh
The produced clstm models
The complete terminal log

My training script:

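#!/bin/bash
set -x   # echo each command as it runs
set -a   # export all variables set below so clstmocrtrain can read them from the environment
# Shuffle the manifest, then hold out the first 100 lines for testing.
sort -R manifest.txt > /tmp/manifest2.txt
sed 1,100d /tmp/manifest2.txt > train.txt   # everything after line 100
sed 100q /tmp/manifest2.txt > test.txt      # the first 100 lines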

report_every=1000
save_every=1000
maxtrain=2000000
target_height=48
dewarp=center
display_every=1000
test_every=1000
hidden=100
lrate=1e-4
save_name=arabic
'/home/bmwmy/Desktop/kra/clstm/clstmocrtrain' train.txt test.txt


ghost · 2017-06-24T11:23:22Z

*** charsep 
got 1998 files, 100 tests
got 38 classes
.stacked: 0.0001 0.9 in 0 48 out 0 38
.stacked.parallel: 0.0001 0.9 in 0 48 out 0 200
.stacked.parallel.lstm: 0.0001 0.9 in 0 48 out 0 100
.stacked.parallel.reversed: 0.0001 0.9 in 0 48 out 0 100
.stacked.parallel.reversed.lstm: 0.0001 0.9 in 0 48 out 0 100
.stacked.softmax: 0.0001 0.9 in 0 200 out 0 38
.
.
.
ERROR 8000 1     6321 6321
8000
TRU نونمٔوي
ALN 
OUT 
ERROR 9000 1     6321 6321
.
.
.
ERROR 268000 1     6321 6321
268000
TRU تثبل لاق تثبل مك لاق هثعب مث ماع ةٔيام هللا هتامٔاف
 ةٔيام تثبل لب لاق موي ضعب ؤا اموي
ALN 
OUT 
ERROR 269000 1     6321 6321
269000
TRU رفكي نٕاف ةوبنلاو مكحلاو باتكلا مهانيتٓا نيذلا كٔيلؤا
 اموق اهب انلكو دقف ءالٔوه اهب
ALN 
OUT
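
One quick way to confirm from a log like this that training is stuck is to check whether the error rate ever changes. A minimal sketch, assuming each `ERROR` line has the form `ERROR <step> <error rate> <errors> <total>` (inferred from the output above, not from the CLSTM documentation); the function name `distinct_error_rates` is hypothetical:

```shell
# Print how many distinct error-rate values appear in a training log.
# Assumes lines of the form: ERROR <step> <rate> <errors> <total>
distinct_error_rates() {
    awk '$1 == "ERROR" { print $3 }' "$1" | sort -u | wc -l | tr -d ' '
}
```

If this prints 1 after hundreds of thousands of steps, the reported error rate has never moved, which suggests the network is not learning at all.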

ghost · 2017-06-25T17:16:22Z

I found the solution; it seems to be a weird problem.
I removed these lines from the training script, and the problem was solved:

sort -R manifest.txt > /tmp/manifest2.txt
sed 1,100d /tmp/manifest2.txt > train.txt
sed 100q /tmp/manifest2.txt > test.txt

Instead, run these commands in a terminal first. After they finish, close that terminal, then open a new terminal and run the training script there.
After I removed these lines from the training script and ran it again, ERROR started changing and ALN and OUT contained data.
Somehow sed and sort were preventing the model from training. What a weird problem.
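
The workaround amounts to making the train/test split a separate step from training. A minimal sketch of that step as a reusable function, assuming GNU `sort` (for `-R`) and a manifest with more lines than the held-out count; the function name `split_manifest` and its arguments are hypothetical. It uses the same `sort` and `sed` commands as the original script and adds a line-count check:

```shell
# Split a manifest into train.txt and test.txt in the given output directory.
# usage: split_manifest <manifest> <number of test lines> <output dir>
split_manifest() {
    local manifest=$1 n_test=$2 outdir=$3
    local shuffled total n_train n_held
    shuffled=$(mktemp) || return 1

    # Shuffle, then hold out the first n_test lines for testing.
    sort -R "$manifest" > "$shuffled" || return 1
    sed "1,${n_test}d" "$shuffled" > "$outdir/train.txt"
    sed "${n_test}q" "$shuffled" > "$outdir/test.txt"
    rm -f "$shuffled"

    # Sanity check: the two lists together have as many lines as the
    # manifest, and the training list is not empty.
    total=$(grep -c '' "$manifest")
    n_train=$(grep -c '' "$outdir/train.txt")
    n_held=$(grep -c '' "$outdir/test.txt")
    if [ $((n_train + n_held)) -ne "$total" ] || [ "$n_train" -eq 0 ]; then
        echo "split_manifest: bad split ($n_train train, $n_held test, $total total)" >&2
        return 1
    fi
}
```

Run it once before training, for example `split_manifest manifest.txt 100 .`, and start the training script afterwards with the three `sort`/`sed` lines removed.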

ghost changed the title ~~270,000 epochs/ ERROR still 1/ OUT is blank~~ 270,000 epochs/ ERROR still 1/ OUT is blank/ Arabic language Jun 24, 2017

ghost changed the title ~~270,000 epochs/ ERROR still 1/ OUT is blank/ Arabic language~~ Sort and Sed commands are causing the model not to train (ERROR 1/OUT is empty) Jun 25, 2017

ghost mentioned this issue Jun 25, 2017

Sort & Sed commands are hindering the training mittagessen/kraken-vagrant#2
