*bug* Bytenet and slicenet performance #724

Open
divyam3897 opened this issue Apr 17, 2018 · 7 comments
@divyam3897

Description

Bytenet and slicenet are not giving the expected performance and results. I have tried lowering the learning rate, and also used the same value as in the papers, but that didn't help.

TensorFlow and tensor2tensor versions

tensorflow-gpu (1.6.0)
tensor2tensor: installed from source.

In case of bug report: Steps to reproduce the problem

For bytenet:

t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=byte_net --hparams_set=bytenet_base --hparams="batch_size=512,hidden_size=256,initializer=uniform_unit_scaling" --output_dir=$TRAIN_DIR

The initializer has to be specified explicitly, otherwise training gives NaNs.

For slicenet:

t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=slice_net --hparams_set=slicenet_1noam --output_dir=$TRAIN_DIR

The slicenet_1 hparams give NaNs too, probably for the same reason (initializer problem).
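To avoid typing the initializer override on every run, I believe a custom hparams set can be registered and loaded via --t2t_usr_dir. Here is a minimal sketch assuming t2t's registry API; the set name bytenet_base_uus is made up for illustration:

```python
# Minimal sketch, assuming tensor2tensor's registry API.
# The hparams-set name `bytenet_base_uus` is hypothetical.
from tensor2tensor.models import bytenet
from tensor2tensor.utils import registry


@registry.register_hparams
def bytenet_base_uus():
  """bytenet_base plus the initializer override that avoids the NaNs."""
  hparams = bytenet.bytenet_base()
  hparams.batch_size = 512
  hparams.hidden_size = 256
  hparams.initializer = "uniform_unit_scaling"
  return hparams
```

Running with --t2t_usr_dir pointing at that module and --hparams_set=bytenet_base_uus should then behave like the bytenet command above.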

Logs

@divyam3897 divyam3897 changed the title Bytenet and slicenet performance *bug* Bytenet and slicenet performance Apr 18, 2018
zqma2 commented May 16, 2018

I have the same problem with slicenet. Have you resolved it?

@divyam3897 (Author)

No, we haven't seen any improvement in the performance.

zqma2 commented May 16, 2018

Thanks. Did you try some other models, like transformer_moe?

@divyam3897 (Author)

Transformer works fine; the only issues we face are with bytenet and slicenet.

@rsepassi (Contributor)

So setting initializer=uniform_unit_scaling works fine? I don't believe we've tuned these models, so the given hparams may not work out of the box.
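If anyone wants to try tuning them, a ranged hparams set along these lines could define a small search space. This is only an illustrative sketch assuming the RangedHParams interface t2t uses for tuning jobs; the range name and the value choices are guesses, not a recommendation:

```python
# Illustrative sketch, assuming t2t's ranged-hparams interface.
# `bytenet_base_range` and the specific values are hypothetical.
from tensor2tensor.utils import registry


@registry.register_ranged_hparams
def bytenet_base_range(rhp):
  """Hypothetical search space over the settings implicated in this issue."""
  rhp.set_categorical("initializer",
                      ["uniform_unit_scaling", "orthogonal"])
  rhp.set_float("learning_rate", 0.02, 0.2, scale=rhp.LOG_SCALE)
  rhp.set_discrete("hidden_size", [256, 512])
```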

@divyam3897 (Author)

@rsepassi No, the initializer change just prevents the NaNs, but the performance of both models is still not as expected. We have tried a number of hparams, including the ones from the papers themselves, but nothing helped.

hhxxttxsh commented Nov 7, 2018

@divyam3897 Hi, just wondering whether you have managed to reproduce the accuracy reported in the paper for ByteNet? What score did you get?
