This repository was archived by the owner on Jul 7, 2023. It is now read-only.

adafactor vs adam #1008

@nicolabertoldi

Description

I am interested in using Adafactor instead of Adam because it produces smaller checkpoints and, according to this paper, performs comparably to Adam.
However, according to my logs, the BLEU score with Adafactor is much lower, as shown below.
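
The checkpoint-size argument can be illustrated with a rough count of optimizer accumulators for a single weight matrix. This is a minimal sketch, not taken from tensor2tensor: it assumes Adam keeps two accumulators per weight (first and second moment) and that Adafactor runs with factored second moments and no momentum, keeping one value per row and one per column. The 512 x 2048 shape is only an illustrative choice, and the helper names are hypothetical.

```python
def adam_slots(rows, cols):
    # Assumption: Adam stores a first-moment and a second-moment
    # accumulator for every weight in the matrix.
    return 2 * rows * cols


def adafactor_factored_slots(rows, cols):
    # Assumption: factored second moments, no momentum, so only
    # per-row and per-column statistics are stored.
    return rows + cols


rows, cols = 512, 2048  # illustrative shape, not read from any checkpoint
weights = rows * cols

print("weights:            {}".format(weights))
print("Adam accumulators:  {}".format(adam_slots(rows, cols)))
print("Adafactor (factored) accumulators: {}".format(
    adafactor_factored_slots(rows, cols)))
```

Under these assumptions the optimizer state for such a matrix shrinks from twice the weight count to a few thousand values, which is where the smaller checkpoints would come from.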

Are there any non-default settings that are specific to Adafactor?

approx_bleu for Adafactor: (one evaluation every 5K steps)
5K: 0.057251852
10K: 0.16007528
15K: 0.25117296
20K: 0.29413137
25K: 0.3245068
30K: 0.3451813
35K: 0.366105

approx_bleu for Adam: (one evaluation every 5K steps)
5K: 0.32362464
10K: 0.4280433
15K: 0.47145975
20K: 0.4960477
25K: 0.51056355
30K: 0.52096725
35K: 0.53078866
40K: 0.5337893
45K: 0.5363042
50K: 0.53831
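
To make the comparison easier to read, the two logs can be lined up at the steps they share. This is a small sketch that only reuses the approx_bleu values reported above; the dictionary and variable names are made up for illustration, and steps are keyed in thousands.

```python
# approx_bleu values copied from the logs above, keyed by step in thousands.
adafactor = {
    5: 0.057251852, 10: 0.16007528, 15: 0.25117296, 20: 0.29413137,
    25: 0.3245068, 30: 0.3451813, 35: 0.366105,
}
adam = {
    5: 0.32362464, 10: 0.4280433, 15: 0.47145975, 20: 0.4960477,
    25: 0.51056355, 30: 0.52096725, 35: 0.53078866, 40: 0.5337893,
    45: 0.5363042, 50: 0.53831,
}

# Only compare at steps where both runs were evaluated.
common_steps = sorted(set(adafactor) & set(adam))
gaps = {k: adam[k] - adafactor[k] for k in common_steps}

for k in common_steps:
    print("{:>2}K  adam={:.4f}  adafactor={:.4f}  gap={:.4f}".format(
        k, adam[k], adafactor[k], gaps[k]))
```

On these numbers the gap is roughly 0.27 at 5K and 10K steps and still about 0.16 at 35K steps, so Adafactor is improving but remains well behind Adam over the logged range.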

Environment information

OS: Ubuntu 16.04

$ pip freeze | grep tensor
tensor2tensor==1.6.3
tensorboard==1.8.0
tensorflow-gpu==1.8.0


$ python -V
Python 2.7.12

For bugs: reproduction and error logs

# Steps to reproduce:
t2t-datagen --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k

t2t-trainer --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k --model transformer --hparams_set transformer_base --output_dir ./t2t_data/model_adafactor --local_eval_frequency=500 --train_steps=1000 --worker_gpu=1 --hparams batch_size=3072,optimizer=Adafactor

t2t-trainer --data_dir t2t_data/datagen --tmp_dir ./t2t_data/tmp --problem translate_enfr_wmt_small8k --model transformer --hparams_set transformer_base --output_dir ./t2t_data/model_adam --local_eval_frequency=500 --train_steps=1000 --worker_gpu=1 --hparams batch_size=3072
