English to Spanish: zero BLEU score while training along with <unk> output during inference #396

MukundKhandelwal · 2018-10-01T14:59:20Z

Hi,

I am trying to run translation from English to Spanish. However, the BLEU score stays at zero throughout training, even after 3000 steps. As a result, when I run inference, the output is nothing but `<unk>` (unknown) tokens.

Here is how I am creating the vocabulary:

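```python
from collections import Counter

from nltk.corpus import stopwords  # requires a prior nltk.download('stopwords')

stoplist = set(stopwords.words('english'))

# Read the English side of the Europarl corpus
with open('/home/ubuntu/europarldata/europarl-v7.es-en.en', encoding='utf-8') as src_file:
    text = src_file.read()

# Drop stopwords, then keep the 17188 most frequent whitespace-separated tokens
clean = [word for word in text.split() if word not in stoplist]
count = Counter(clean)
frequency = count.most_common(17188)
l1, l2 = zip(*frequency)  # l1: words, l2: their counts

# Write the vocabulary file, one word per line
with open('/home/ubuntu/mukund_nmt/spanish_data/vocab.en', 'w', encoding='utf-8') as f:
    for item in l1:
        f.write("%s\n" % item)
```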
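Every token that is missing from the vocabulary file is mapped to `<unk>`. For that reason, one way to sanity-check the vocabulary built above is to measure how much of the training text it fails to cover. The script filters out NLTK's English stopwords before counting, so lowercase words such as "the" and "of" are never written to `vocab.en`.

The following is a minimal sketch that assumes whitespace tokenization (the same `text.split()` used above). The helper names `load_vocab`, `oov_rate` and `top_missing` are hypothetical, and the English training file name `new_train.en` is an assumption based on the `--train_prefix` flag.

```python
from collections import Counter


def load_vocab(path):
    """Read a one-token-per-line vocabulary file into a set."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n") for line in f if line.strip()}


def oov_rate(text, vocab):
    """Fraction of whitespace-separated tokens in `text` missing from `vocab`.

    A word-level NMT model maps each such token to <unk>.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    missing = sum(1 for tok in tokens if tok not in vocab)
    return missing / len(tokens)


def top_missing(text, vocab, k=10):
    """The k most frequent tokens in `text` that `vocab` does not cover."""
    counts = Counter(tok for tok in text.split() if tok not in vocab)
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy example; for real data, load vocab.en with load_vocab() and read
    # the English training file (assumed here to be new_train.en).
    vocab = {"session", "parliament"}
    text = "the session of the parliament"
    print(oov_rate(text, vocab))     # 0.6
    print(top_missing(text, vocab))  # [('the', 2), ('of', 1)]
```

A high out-of-vocabulary rate, or very common words at the top of `top_missing`, means a large share of the model's inputs and targets are `<unk>`.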

Once the vocabulary is created, I run the training as follows:

```shell
python -m nmt.nmt \
    --src=en --tgt=es \
    --vocab_prefix=/home/ubuntu/mukund_nmt/spanish_data/vocab \
    --train_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_train \
    --dev_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_dev \
    --test_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_testing \
    --out_dir=/home/ubuntu/mukund_nmt/spanish_data/model1 \
    --num_train_steps=3000 \
    --steps_per_stats=100 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu
```
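In this command, the `--src=en` and `--tgt=es` suffixes are appended to every `--*_prefix` path. So `--vocab_prefix=.../vocab` refers to two files, `vocab.en` and `vocab.es`, and likewise for `new_train`, `new_dev` and `new_testing`. The vocabulary script above only shows `vocab.en` being written.

The following is a minimal sketch that lists the files the command will look for and reports any that are missing. It assumes the `<prefix>.<lang>` naming convention, which is how tensorflow/nmt resolves these flags. `expected_files` and `missing_files` are hypothetical helpers.

```python
import os


def expected_files(src, tgt, **prefixes):
    """Map each prefix flag to the source/target files derived from it.

    Assumes the '<prefix>.<lang>' naming convention.
    """
    return {
        name: [f"{prefix}.{src}", f"{prefix}.{tgt}"]
        for name, prefix in prefixes.items()
    }


def missing_files(files_by_flag):
    """Return every expected path that does not exist on disk."""
    return [
        path
        for paths in files_by_flag.values()
        for path in paths
        if not os.path.isfile(path)
    ]


if __name__ == "__main__":
    base = "/home/ubuntu/mukund_nmt/spanish_data"
    files = expected_files(
        "en", "es",
        vocab_prefix=f"{base}/vocab",
        train_prefix=f"{base}/new_train",
        dev_prefix=f"{base}/new_dev",
        test_prefix=f"{base}/new_testing",
    )
    for path in missing_files(files):
        print("missing:", path)
```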

It does give some output at the start, which looks normal to me since it's just the beginning:

However, the subsequent training outputs are filled with `<unk>` tokens, and the BLEU score remains 0 until the end of training. Consequently, the inference output is also garbage (shown below):

Can someone please help me with this? Thanks.
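Once every hypothesis is made of `<unk>`, a BLEU of exactly zero is expected. Without smoothing, BLEU is a geometric mean of the 1- to 4-gram precisions (times a brevity penalty). If the hypotheses contain nothing but `<unk>`, no n-gram can match a reference, so a single zero precision zeroes the whole score.

The following is a minimal sketch of textbook unsmoothed corpus BLEU, with one reference per sentence and whitespace tokens. It is not the toolkit's own scorer, and `ngrams` and `corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Unsmoothed corpus BLEU with one reference per hypothesis.

    hypotheses, references: lists of token lists, aligned by index.
    """
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_order + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram counts at most as often
            # as it appears in the reference (Counter '&' takes the minimum).
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    precisions = [m / t if t else 0.0 for m, t in zip(matches, totals)]
    # Without smoothing, one zero precision makes the geometric mean zero.
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    # Brevity penalty for output shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * geo_mean


if __name__ == "__main__":
    ref = "el parlamento aprueba la propuesta".split()
    print(corpus_bleu([["<unk>"] * 5], [ref]))  # 0.0
    print(corpus_bleu([ref], [ref]))            # 1.0
```

Toolkits usually report this value multiplied by 100. Some also offer smoothing, which keeps the score slightly above zero when higher-order precisions vanish.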

MukundKhandelwal changed the title ~~zero blue score while training along with <unk> output during inference~~ English to Spanish: zero blue score while training along with <unk> output during inference Oct 1, 2018
