*bug* Bytenet and slicenet performance #724
Comments
I have the same problem with slicenet. Have you resolved it?
No, we haven't seen any improvement in performance.
Thanks. Did you try other models, such as transformer_moe?
Transformer works fine; the only issues we face are with bytenet and slicenet.
So setting
@rsepassi no, the
@divyam3897 Hi, just wondering whether you have managed to reproduce the accuracy reported in the paper for ByteNet? What score did you get?
Description
Bytenet and slicenet are not reaching the expected performance. I tried lowering the learning rate and also tried the same learning rate as in the paper, but neither helped.
TensorFlow and tensor2tensor versions
tensorflow-gpu (1.6.0)
tensor2tensor: Installed from the source code.
In case of bug report: Steps to reproduce the problem
For bytenet:
t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=byte_net --hparams_set=bytenet_base --hparams="batch_size=512,hidden_size=256,initializer=uniform_unit_scaling" --output_dir=$TRAIN_DIR
The initializer has to be specified; otherwise training gives NaN.
For slicenet:
t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k --model=slice_net --hparams_set=slicenet_1noam --output_dir=$TRAIN_DIR
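For anyone trying to reproduce this, the two invocations above can be collected into one small dry-run script. This is only a sketch: it prints the commands instead of running them, the default paths for `DATA_DIR` and `TRAIN_DIR` are hypothetical placeholders, the helper name `t2t_cmd` is made up for illustration, and the per-model output subdirectory is an assumption that was not part of the original commands (both used `$TRAIN_DIR` directly).

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the t2t-trainer commands from this report.
# The default paths below are hypothetical placeholders; override them via the environment.
DATA_DIR="${DATA_DIR:-/tmp/t2t_data}"
TRAIN_DIR="${TRAIN_DIR:-/tmp/t2t_train}"

# t2t_cmd MODEL HPARAMS_SET [OVERRIDES]
# Builds one command line. OVERRIDES is an optional --hparams string.
t2t_cmd() {
  model="$1"
  hparams_set="$2"
  overrides="$3"
  cmd="t2t-trainer --data_dir=$DATA_DIR --problems=translate_ende_wmt32k"
  cmd="$cmd --model=$model --hparams_set=$hparams_set"
  if [ -n "$overrides" ]; then
    cmd="$cmd --hparams=\"$overrides\""
  fi
  # Assumption: one output subdirectory per model, so checkpoints do not mix.
  echo "$cmd --output_dir=$TRAIN_DIR/$model"
}

# bytenet needs the initializer override, as noted in the report.
t2t_cmd byte_net bytenet_base "batch_size=512,hidden_size=256,initializer=uniform_unit_scaling"
# slicenet is run with its hparams set only.
t2t_cmd slice_net slicenet_1noam ""
```

Pasting the printed lines into a shell runs the same jobs as in the report, apart from the separate output directories.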
The slicenet_1 hparams give NaN too, probably for the same reason (initializer problem).
Logs