This repository was archived by the owner on Jul 7, 2023. It is now read-only.
adafactor vs adam #1008
Description
I am interested in using Adafactor (instead of Adam) because it produces smaller checkpoints and, according to this paper, also achieves good performance relative to Adam.
But according to the logs, the BLEU score is much lower, as you can see below.
Is there any non-default setting that is specific to Adafactor?
approx_bleu for Adafactor (one evaluation every 5K steps):
5K: 0.057251852
10K: 0.16007528
15K: 0.25117296
20K: 0.29413137
25K: 0.3245068
30K: 0.3451813
35K: 0.366105
approx_bleu for Adam (one evaluation every 5K steps):
5K: 0.32362464
10K: 0.4280433
15K: 0.47145975
20K: 0.4960477
25K: 0.51056355
30K: 0.52096725
35K: 0.53078866
40K: 0.5337893
45K: 0.5363042
50K: 0.53831
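For a side-by-side read of the two runs, here is a small Python sketch (values copied verbatim from the logs above; the printed gap is purely descriptive, not a claim about the cause):

```python
# approx_bleu at matched checkpoints, copied from the training logs above.
steps = [5, 10, 15, 20, 25, 30, 35]  # in thousands
adafactor = [0.057251852, 0.16007528, 0.25117296, 0.29413137,
             0.3245068, 0.3451813, 0.366105]
adam = [0.32362464, 0.4280433, 0.47145975, 0.4960477,
        0.51056355, 0.52096725, 0.53078866]

# Print the absolute BLEU gap at each shared checkpoint.
for s, af, ad in zip(steps, adafactor, adam):
    print("{}K: Adam leads by {:.3f}".format(s, ad - af))
```

At every shared checkpoint Adam is ahead, and the gap is still above 0.16 at 35K steps, so this does not look like noise from a single evaluation.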
Environment information
OS: Ubuntu 16.04
$ pip freeze | grep tensor
tensor2tensor==1.6.3
tensorboard==1.8.0
tensorflow-gpu==1.8.0
$ python -V
Python 2.7.12
For bugs: reproduction and error logs
# Steps to reproduce:
t2t-datagen --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k
t2t-trainer --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k --model transformer --hparams_set transformer_base --output_dir ./t2t_data/model_adafactor --local_eval_frequency=500 --train_steps=1000 --worker_gpu=1 --hparams batch_size=3072,optimizer=Adafactor
t2t-trainer --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k --model transformer --hparams_set transformer_base --output_dir ./t2t_data/model_adam --local_eval_frequency=500 --train_steps=1000 --worker_gpu=1 --hparams batch_size=3072
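One knob that may be worth checking (an assumption on my part, not a confirmed fix): the Adafactor paper pairs the optimizer with an inverse-square-root learning-rate schedule, so overriding the schedule hparam might behave differently from the transformer_base default. A hypothetical variant of the command above; verify that learning_rate_schedule=rsqrt_decay is accepted by your tensor2tensor version's hparams registry before relying on it:

```shell
# Hypothetical: Adafactor with an rsqrt learning-rate schedule (unverified).
t2t-trainer \
  --data_dir t2t_data/datagen \
  --tmp_dir ./t2t_data/tmp \
  --problem translate_enfr_wmt_small8k \
  --model transformer \
  --hparams_set transformer_base \
  --output_dir ./t2t_data/model_adafactor_rsqrt \
  --local_eval_frequency=500 \
  --train_steps=1000 \
  --worker_gpu=1 \
  --hparams batch_size=3072,optimizer=Adafactor,learning_rate_schedule=rsqrt_decay
```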