This repository has been archived by the owner on Dec 11, 2023. It is now read-only.
MukundKhandelwal changed the title from "zero blue score while training along with <unk> output during inference" to "English to Spanish: zero blue score while training along with <unk> output during inference" on Oct 1, 2018.
Hi,
I am trying to train an English-to-Spanish translation model. However, the BLEU score stays at zero during training, even after 3000 steps. As a result, inference produces nothing but <unk> tokens.
Here is how I am creating the vocabulary:
from collections import Counter
from nltk.corpus import stopwords

stoplist = stopwords.words('english')

# Read the English side of the Europarl corpus
with open('/home/ubuntu/europarldata/europarl-v7.es-en.en', encoding='utf-8') as corpus:
    text = corpus.read()

# Drop English stop words, then count the remaining whitespace-separated tokens
clean = [word for word in text.split() if word not in stoplist]
count = Counter(clean)

# Keep the 17188 most frequent tokens; l1 holds the tokens, l2 their counts
frequency = count.most_common(17188)
l1, l2 = zip(*frequency)

# Write the vocab file, one token per line
with open('/home/ubuntu/mukund_nmt/spanish_data/vocab.en', 'w') as f:
    for item in l1:
        f.write("%s\n" % item)
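A quick check that might help narrow this down is to measure how many corpus tokens fall outside the vocabulary, since (as far as I understand) tokens missing from the vocab file are mapped to <unk>. The sketch below is only an illustration: `build_vocab` and `oov_rate` are hypothetical helper names, the stop-word list is a hand-picked stand-in for the NLTK list, and the sample sentence stands in for the Europarl text. It compares a vocabulary built with and without stop-word filtering.

```python
from collections import Counter


def build_vocab(tokens, size, stoplist=()):
    """Return the `size` most frequent tokens, skipping any in `stoplist`."""
    stop = set(stoplist)
    counts = Counter(t for t in tokens if t not in stop)
    return {word for word, _ in counts.most_common(size)}


def oov_rate(tokens, vocab):
    """Return the fraction of tokens that are not in `vocab`."""
    if not tokens:
        return 0.0
    missing = sum(1 for t in tokens if t not in vocab)
    return missing / len(tokens)


# Toy corpus standing in for the real training text (assumption for illustration)
sample = "the house is on the hill and the cat is in the house".split()
toy_stoplist = ["the", "is", "on", "and", "in"]

vocab_all = build_vocab(sample, size=10)
vocab_filtered = build_vocab(sample, size=10, stoplist=toy_stoplist)

print("OOV rate, full vocab:        ", oov_rate(sample, vocab_all))
print("OOV rate, stop words removed:", oov_rate(sample, vocab_filtered))
```

If the second rate is much higher on the real corpus, the stop-word filtering would be a likely reason the model sees so many unknown tokens.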
Once the vocabulary is created, I run the training as follows:
python -m nmt.nmt --src=en --tgt=es --vocab_prefix=/home/ubuntu/mukund_nmt/spanish_data/vocab --train_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_train --dev_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_dev --test_prefix=/home/ubuntu/mukund_nmt/spanish_data/new_testing --out_dir=/home/ubuntu/mukund_nmt/spanish_data/model1 --num_train_steps=3000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu
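Since `--vocab_prefix` together with `--src=en --tgt=es` points at both a `vocab.en` and a `vocab.es` file, it may be worth sanity-checking each vocab file before training. The sketch below assumes the toolkit expects `<unk>`, `<s>` and `</s>` as the first three entries of a vocab file; that convention is an assumption here and should be verified against the nmt documentation. `vocab_head_ok` is a hypothetical helper, and the temporary files only stand in for the real vocab files.

```python
import os
import tempfile

# Assumed special tokens at the top of a vocab file (verify against the nmt docs)
SPECIAL_TOKENS = ["<unk>", "<s>", "</s>"]


def vocab_head_ok(path, expected=SPECIAL_TOKENS):
    """Return True if the first lines of the file at `path` match `expected`."""
    with open(path, encoding="utf-8") as f:
        head = [f.readline().rstrip("\n") for _ in expected]
    return head == list(expected)


# Demo on two throwaway files instead of the real vocab.en / vocab.es
tmp_dir = tempfile.mkdtemp()
good = os.path.join(tmp_dir, "vocab.good")
bad = os.path.join(tmp_dir, "vocab.bad")

with open(good, "w", encoding="utf-8") as f:
    f.write("<unk>\n<s>\n</s>\nthe\nhouse\n")
with open(bad, "w", encoding="utf-8") as f:
    f.write("the\nhouse\n")

print(good, vocab_head_ok(good))
print(bad, vocab_head_ok(bad))
```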
It does give some output at the start, which looks normal to me since it's just the beginning:
However, later training outputs are filled with <unk>, and the BLEU score stays at 0 until the end of training. As a result, the inference output is also garbage (shown below):
Can someone please help me with this? Thanks.