*bug* Bytenet and slicenet performance #724

Open
divyam3897 opened this issue Apr 17, 2018 · 7 comments
@divyam3897

Description

Bytenet and slicenet are not giving the expected performance and results. I have tried lowering the learning rate, and also used the same value as in the papers, but that didn't help.

TensorFlow and tensor2tensor versions

tensorflow-gpu (1.6.0)
tensor2tensor: installed from source.

In case of bug report: Steps to reproduce the problem

For bytenet:

t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=byte_net --hparams_set=bytenet_base --hparams="batch_size=512,hidden_size=256,initializer=uniform_unit_scaling" --output_dir=$TRAIN_DIR

The initializer has to be specified explicitly, otherwise training gives NaNs.

For slicenet:

t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=slice_net --hparams_set=slicenet_1noam --output_dir=$TRAIN_DIR

The slicenet_1 hparams give NaNs too, probably for the same reason (initializer problem).
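To avoid typing the initializer override on every run, I believe a custom hparams set can be registered and loaded via --t2t_usr_dir. Here is a minimal sketch assuming t2t's registry API; the set name bytenet_base_uus is made up for illustration:

```python
# Minimal sketch, assuming tensor2tensor's registry API.
# The hparams-set name `bytenet_base_uus` is hypothetical.
from tensor2tensor.models import bytenet
from tensor2tensor.utils import registry


@registry.register_hparams
def bytenet_base_uus():
  """bytenet_base plus the initializer override that avoids the NaNs."""
  hparams = bytenet.bytenet_base()
  hparams.batch_size = 512
  hparams.hidden_size = 256
  hparams.initializer = "uniform_unit_scaling"
  return hparams
```

Running with --t2t_usr_dir pointing at that module and --hparams_set=bytenet_base_uus should then behave like the bytenet command above.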

Logs

@divyam3897 divyam3897 changed the title Bytenet and slicenet performance *bug* Bytenet and slicenet performance Apr 18, 2018
zqma2 commented May 16, 2018

I have the same problem with slicenet. Have you resolved it?

@divyam3897 (Author)

No, we haven't seen any improvement in the performance.

zqma2 commented May 16, 2018

Thanks. Did you try some other models, like transformer_moe?

@divyam3897 (Author)

Transformer works fine; the only issues we face are with bytenet and slicenet.

@rsepassi (Contributor)

So setting initializer=uniform_unit_scaling works fine? I don't believe we've tuned these models, so the given hparams may not work out of the box.
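If anyone wants to try tuning them, a ranged hparams set along these lines could define a small search space. This is only an illustrative sketch assuming the RangedHParams interface t2t uses for tuning jobs; the range name and the value choices are guesses, not a recommendation:

```python
# Illustrative sketch, assuming t2t's ranged-hparams interface.
# `bytenet_base_range` and the specific values are hypothetical.
from tensor2tensor.utils import registry


@registry.register_ranged_hparams
def bytenet_base_range(rhp):
  """Hypothetical search space over the settings implicated in this issue."""
  rhp.set_categorical("initializer",
                      ["uniform_unit_scaling", "orthogonal"])
  rhp.set_float("learning_rate", 0.02, 0.2, scale=rhp.LOG_SCALE)
  rhp.set_discrete("hidden_size", [256, 512])
```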

@divyam3897 (Author)

@rsepassi No, the initializer change just prevents the NaNs, but the performance of both models is still not as expected. We have tried a number of hparams, including the ones from the papers themselves, but nothing helped.

hhxxttxsh commented Nov 7, 2018

@divyam3897 Hi, just wondering whether you have managed to reproduce the accuracy reported in the paper for ByteNet? What score did you get?
