How does `accumulate_gradient_steps` work? #108

VictorSanh · 2024-03-08T20:51:57Z

Hi,

I am not sure I understand the logic behind `accumulate_gradient_steps`.
I have these 3 configurations:

batch_size=1, accumulate_gradient_steps=1 -> blue
batch_size=2, accumulate_gradient_steps=1 -> red
batch_size=2, accumulate_gradient_steps=2 -> green

My initial understanding is that with gradient accumulation, `accumulate_gradient_steps` forward + backward passes are run, and only then does the optimizer take a step.

1/ I don't see where that logic is handled in `llama_train.py`. It looks like there is no counter for `accumulate_gradient_steps` in the `train_step` method, and an optimizer step is taken after each forward pass?
2/ The logging is confusing: I would have expected the red and blue lines to overlap, not the blue and green ones.

Is it possible that `step` is the counter of forward+backward operations and not the counter of (forward+backward) x grad_acc + optimizer_step?
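
To make the two counting conventions concrete, here is a minimal plain-Python sketch of gradient accumulation on a toy scalar loss. It is not the code in `llama_train.py`; the function `train`, the counters `micro_steps` and `optimizer_steps`, and the loss `(w - 3)^2` are all hypothetical names chosen for illustration.

```python
def train(num_micro_batches, accumulate_gradient_steps, lr=0.1):
    """Toy loop minimizing (w - 3)^2 while tracking both kinds of step."""
    w = 0.0
    grad_sum = 0.0
    micro_steps = 0      # number of forward+backward passes
    optimizer_steps = 0  # number of parameter updates
    for _ in range(num_micro_batches):
        grad = 2.0 * (w - 3.0)  # gradient of the toy loss at the current w
        grad_sum += grad
        micro_steps += 1
        # Update the parameter only once every accumulate_gradient_steps passes.
        if micro_steps % accumulate_gradient_steps == 0:
            w -= lr * grad_sum / accumulate_gradient_steps
            grad_sum = 0.0
            optimizer_steps += 1
    return w, micro_steps, optimizer_steps


# Same number of forward+backward passes, different number of updates.
w_acc, micro_acc, opt_acc = train(8, accumulate_gradient_steps=2)
w_ref, micro_ref, opt_ref = train(4, accumulate_gradient_steps=1)
print(micro_acc, opt_acc)  # 8 forward+backward passes, 4 optimizer steps
print(micro_ref, opt_ref)  # 4 forward+backward passes, 4 optimizer steps
```

If the logged `step` is `micro_steps`, a run with accumulation advances along the x-axis faster than its optimizer actually updates; if it is `optimizer_steps`, the curves line up by update count instead. Which of the two the training script logs is the open question here.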
