This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

BPE support seems missing #4

Closed
skyw opened this issue Jul 13, 2017 · 8 comments

@skyw

skyw commented Jul 13, 2017

I'm trying to run wmt16_en_de_gnmt.json.
It first fails with a missing-vocabulary-file error. Looking into the code, it doesn't look for the vocab files with the "bpe.32000" suffix that are created by wmt16_en_de.sh. If I force it to use the right vocab files, the model starts to run and the graph build seems successful. However, it then stops with the error "HashTable has different value for same key. Key <s> has 1 and trying to add value 4".

@skyw
Author

skyw commented Jul 13, 2017

It seems the train/test/dev data loading doesn't use the BPE paths either.
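
For reference, pointing everything at the BPE outputs from wmt16_en_de.sh would look roughly like this; the paths are hypothetical and the exact flag values are assumptions based on the nmt README rather than something verified against this setup:

# Sketch of a run that points the vocab and data prefixes at the BPE files
# produced by wmt16_en_de.sh (paths and flag values are assumptions).
python -m nmt.nmt \
  --src=de --tgt=en \
  --hparams_path=nmt/standard_hparams/wmt16_en_de_gnmt.json \
  --out_dir=/tmp/deen_gnmt \
  --vocab_prefix=/tmp/wmt16/vocab.bpe.32000 \
  --train_prefix=/tmp/wmt16/train.tok.clean.bpe.32000 \
  --dev_prefix=/tmp/wmt16/newstest2013.tok.bpe.32000 \
  --test_prefix=/tmp/wmt16/newstest2015.tok.bpe.32000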

@ebrevdo
Contributor

ebrevdo commented Jul 15, 2017

@lmthang Can you PTAL?

@lmthang lmthang assigned oahziur and unassigned lmthang Jul 16, 2017
@lmthang
Contributor

lmthang commented Jul 16, 2017

@skyw: can you provide the error log?

@skyw
Author

skyw commented Jul 17, 2017

test.log.txt

Attached. The command I used is at the beginning of the log.
It looks like the BPE vocab files already contain "<s>" and "</s>", while vocab_utils.py is not aware of either of them.

@oahziur
Contributor

oahziur commented Jul 17, 2017

Hmm, why is the first token in your vocab "-e unk"? Is there a compatibility issue in how the vocab was generated?

I believe the 3 special tokens are appended here:

https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh#L148
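
For context, that part of the script presumably does something along these lines (a rough reconstruction, not a verbatim copy of wmt16_en_de.sh):

# Rough sketch of how the special tokens get prepended (reconstruction, not a
# verbatim copy of the script). If echo does not interpret -e, the literal
# text "-e <unk>" becomes the first line of the vocab file.
echo -e "<unk>\n<s>\n</s>" > vocab.bpe.32000
cat bpe.32000.vocab >> vocab.bpe.32000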

@skyw
Author

skyw commented Jul 17, 2017

Hmm, I didn't even check that. Though they look pretty much the same, I used https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh to generate the data instead of the data generated by tensor2tensor's script, so I suppose there were some compatibility issues.
In any case, I removed the "-e" and it still reports the same error.

My guess is that the error comes from vocab_utils.py trying to add "<s>" when it is already in the vocab file; I haven't tried a manual fix, though.
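
A quick way to check that guess (the vocab file name here is assumed):

# Hypothetical sanity check on the generated vocab: the special tokens should
# be the first three lines and should not appear again further down.
head -n 3 vocab.bpe.32000
grep -c -x '<s>' vocab.bpe.32000   # expect 1; a larger count means a duplicate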

@oahziur
Contributor

oahziur commented Jul 17, 2017

@skyw Can you try removing the "-e " entry and using a fresh out_dir? I think the model will reuse the previously saved vocab_file if you use the same out_dir.

Also, here is the head of the vocab file:

<unk>
<s>
</s>
,
.
the
in
of
and
die
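
Concretely, something along these lines (the out_dir path is just an example):

# Start from a clean out_dir so the model cannot pick up the vocab file saved
# by the previous run (path is hypothetical).
rm -rf /tmp/deen_gnmt && mkdir -p /tmp/deen_gnmt
# ...then rerun with --out_dir=/tmp/deen_gnmt and the corrected vocab files.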

@skyw
Author

skyw commented Jul 17, 2017

Ah, it seems to be working after I deleted the out_dir. Sorry for the chatter; I should have done that.

I also tried to reproduce the issue of "-e" being generated in the vocab file. I think the problem is the space between "#" and "!" on the first line: https://github.com/tensorflow/nmt/blob/master/nmt/scripts/wmt16_en_de.sh#L1
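
For what it's worth, here is a minimal way to see why that space matters: with "# !" the first line is an ordinary comment rather than a "#!" interpreter directive, so the script may end up being run by /bin/sh, whose echo (dash on many systems) does not understand -e. The demo.sh file below is just an illustration:

# Minimal illustration (not taken from the original environment): dash's
# builtin echo treats -e as a literal argument, so the first output line
# becomes "-e <unk>", which then lands at the top of the vocab file.
printf 'echo -e "<unk>\\n<s>\\n</s>"\n' > demo.sh
bash demo.sh   # bash interprets -e: prints <unk>, <s>, </s> on separate lines
sh demo.sh     # with dash as /bin/sh: first line is "-e <unk>"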
