BPE support seems missing #4
Comments
It seems the train/test/dev data loading doesn't have a BPE path either.
@lmthang Can you PTAL?
@skyw: Can you provide the error log?
Attached. The command I used is at the beginning of the log.
Hmm, why is the first token in your vocab "-e unk"? Is there a compatibility issue in the vocab generation? I believe the 3 special tokens are appended here: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh#L148
Hmm, I didn't even check it. Though they look pretty much the same, I used https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh to generate the data instead of the data generated by tensor2tensor's script, so I suppose there is some compatibility issue. I would guess the error comes from vocab_utils.py trying to add "<s>" when "<s>" was already in the vocab file; I haven't tried a manual fix though.
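For context, the prepending check lives in nmt/utils/vocab_utils.py. A simplified sketch of that logic (reconstructed here, not a verbatim copy of the repo's code):

```python
import os

UNK, SOS, EOS = "<unk>", "<s>", "</s>"

def check_vocab(vocab_file, out_dir):
    # Load the vocab, one token per line.
    with open(vocab_file, encoding="utf-8") as f:
        vocab = [line.strip() for line in f]
    # If the first three entries are not the special tokens, prepend them and
    # write the patched vocab into out_dir. A file whose first line is
    # "-e unk" fails this check, so "<s>" gets prepended even though it may
    # already occur further down, and any lookup table built from the patched
    # file then sees "<s>" twice.
    if vocab[:3] != [UNK, SOS, EOS]:
        vocab = [UNK, SOS, EOS] + vocab
        vocab_file = os.path.join(out_dir, os.path.basename(vocab_file))
        with open(vocab_file, "w", encoding="utf-8") as f:
            f.write("\n".join(vocab) + "\n")
    return len(vocab), vocab_file
```

Note that the patched file is written into out_dir, so a stale out_dir keeps serving the broken copy on later runs until it is deleted.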
@skyw Can you try removing "-e " and starting from a fresh out_dir? Also, here is the head of the vocab file:
Uh, it seems to be working after I deleted the out_dir. Sorry for the chatter, I should have done that. I also tried to reproduce the issue of "-e" getting into the vocab file. I think the problem is the space between # and ! in the shebang, https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh#L1; presumably the broken shebang lets the script be interpreted by a plain POSIX sh, whose echo doesn't understand -e, so "-e" is written to the vocab file literally.
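To make the shell difference concrete, a minimal sketch (assuming the generation script writes its special tokens with `echo -e`, and that /bin/sh on the machine is a non-bash POSIX shell such as dash):

```python
import subprocess

# bash's builtin echo understands -e, but a plain POSIX sh (dash on
# Debian/Ubuntu) treats "-e" as an ordinary argument and prints it
# literally, which is how "-e" can end up inside the vocab file.
for shell in ("bash", "sh"):
    out = subprocess.run([shell, "-c", 'echo -e "<unk>"'],
                         capture_output=True, text=True).stdout
    print(f"{shell}: {out!r}")
# bash: '<unk>\n'
# sh:   '-e <unk>\n'   (when /bin/sh is dash)
```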
I'm trying to run wmt16_en_de_gnmt.json.
It first comes back with an error about a missing vocabulary file. Looking into the code, it doesn't look for the vocab files with "bpe.32000" in the name, which are the ones created by wmt16_en_de.sh. If I force it to look at the right vocab file, the model starts to run and the graph build seems successful. However, it then stops with the error "HashTable has different value for same key. Key <s> has 1 and trying to add value 4".
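For reference, the duplicate-key error above is what TF's vocab lookup table raises when the same token occurs twice in the vocab file. A minimal reproduction sketch (assuming TF 1.x, as used by tensorflow/nmt, and a hypothetical /tmp path):

```python
import tensorflow as tf  # TF 1.x

# Hypothetical vocab file with "<s>" at line index 1 and again at index 4,
# mirroring the reported error message.
with open("/tmp/vocab.dup", "w") as f:
    f.write("\n".join(["<unk>", "<s>", "</s>", "foo", "<s>"]) + "\n")

table = tf.contrib.lookup.index_table_from_file("/tmp/vocab.dup")
with tf.Session() as sess:
    # Initialization inserts one (token, line_index) pair per line and fails
    # on the duplicate: "HashTable has different value for same key.
    # Key <s> has 1 and trying to add value 4"
    sess.run(tf.tables_initializer())
```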