This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

English to Spanish: zero BLEU score while training along with <unk> output during inference #396

Open
MukundKhandelwal opened this issue Oct 1, 2018 · 0 comments

MukundKhandelwal commented Oct 1, 2018

Hi,

I am trying to train an English-to-Spanish translation model. However, during training, the BLEU score remains zero even after 3000 steps. As a result, when I run inference, the output is just <unk>.

Here is how I am creating the vocabulary:

from collections import Counter
from nltk.corpus import stopwords

stoplist = stopwords.words('english')
with open('/home/ubuntu/europarldata/europarl-v7.es-en.en', encoding='utf-8') as file:  # English corpus
    text = file.read()
clean = [word for word in text.split() if word not in stoplist]
count = Counter(clean)
frequency = count.most_common(17188)
l1, l2 = zip(*frequency)
with open('/home/ubuntu/mukund_nmt/spanish_data/vocab.en', 'w') as f:
    for item in l1:
        f.write("%s\n" % item)  # write one vocabulary token per line
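For comparison, here is a minimal sketch of a vocabulary builder that keeps every token, stopwords included. The script above filters out English stopwords, so frequent words such as "the" never reach vocab.en; that may be one contributor to the <unk> output, though it is not confirmed as the cause. To my understanding, the nmt tutorial code expects `<unk>`, `<s>`, and `</s>` as the first three vocabulary entries, so the sketch puts them first. The names `build_vocab`, `write_vocab`, and `SPECIAL_TOKENS` are hypothetical and not part of nmt.

```python
from collections import Counter

# Special tokens placed at the top of the vocabulary file (assumed nmt convention).
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def build_vocab(lines, max_size):
    """Hypothetical helper: special tokens followed by the most frequent tokens."""
    counts = Counter(token for line in lines for token in line.split())
    for special in SPECIAL_TOKENS:
        counts.pop(special, None)  # avoid listing a special token twice
    most_common = [token for token, _ in counts.most_common(max_size)]
    return SPECIAL_TOKENS + most_common


def write_vocab(vocab, path):
    """Write one vocabulary token per line."""
    with open(path, "w", encoding="utf-8") as f:
        for token in vocab:
            f.write(token + "\n")


# Toy corpus standing in for the Europarl text; no stopword filtering is applied.
sample = ["the house is red", "the car is blue"]
vocab = build_vocab(sample, max_size=3)
```

The same helper would need to be run on the Spanish side to produce vocab.es, since the training command reads both files from the same prefix.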

Once the vocabulary is created, I run the training as follows:

python -m nmt.nmt \
    --src=en --tgt=es \
    --vocab_prefix=/home/ubuntu/mukund_nmt/spanish_data/vocab \
    --train_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_train \
    --dev_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_dev \
    --test_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_testing \
    --out_dir=/home/ubuntu/mukund_nmt/spanish_data/model1 \
    --num_train_steps=3000 \
    --steps_per_stats=100 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu

It does produce some output at the start, which looks normal to me since it's just the beginning:
[image: early training output]

However, subsequent training outputs are filled with <unk>, and the BLEU score remains 0 until the end of training. For this reason, the inference output is also garbage (shown below):
[image: inference output]
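One way to check whether the vocabulary explains the <unk> output is to measure how many training tokens are missing from it. The following is a rough diagnostic sketch, not part of nmt; `unk_rate` and `load_lines` are hypothetical helpers, and the toy data stands in for the real files.

```python
def unk_rate(lines, vocab):
    """Fraction of whitespace-separated tokens in `lines` that are not in `vocab`."""
    vocab = set(vocab)
    total = 0
    unknown = 0
    for line in lines:
        for token in line.split():
            total += 1
            if token not in vocab:
                unknown += 1
    return unknown / total if total else 0.0


def load_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Toy data: a vocabulary with the stopwords removed, and two training sentences.
toy_vocab = ["house", "red", "car", "blue"]
toy_lines = ["the house is red", "the car is blue"]
toy_rate = unk_rate(toy_lines, toy_vocab)  # "the" and "is" are missing in both sentences
```

With the paths from the training command, the check would presumably be run as `unk_rate(load_lines(".../new_train.en"), load_lines(".../vocab.en"))` and again for the `.es` pair. A high rate on either side would suggest that the model mostly sees and predicts <unk>.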

Can someone please help me with this? Thanks.
