Hi devs, first, thanks for this clean & useful code base!
I'm not following this line:

`loss = loss * self.world_size  # counter average weight reduction`
Since an earlier line sets

`self.cross_entropy = nn.CrossEntropyLoss(reduction='mean')`

the reduction is `'mean'`, so I believe it is correct to simply let DDP average the gradients across GPUs, and there is no need to counteract that averaging. In fact, only when DDP averages the gradients is training with 2 GPUs of 8 examples each equivalent to 1 GPU with 16 examples, since both cases are mean-reduced over the effective batch.
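To illustrate the point, here is a minimal single-process sketch (my own example, not the repo's code) that simulates DDP's all-reduce-with-averaging using plain tensors. It shows that averaging per-rank mean-reduced gradients already matches the single-GPU gradient of the mean over the full batch, with no extra `world_size` factor needed:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss(reduction='mean')

x = torch.randn(16, 4)
y = torch.randint(0, 3, (16,))

# Single "GPU": mean loss over all 16 examples.
model.zero_grad()
criterion(model(x), y).backward()
grad_single = model.weight.grad.clone()

# Two simulated "GPUs": mean loss over each shard of 8, then average
# the gradients, mimicking what DDP's gradient all-reduce does.
grads = []
for shard in (slice(0, 8), slice(8, 16)):
    model.zero_grad()
    criterion(model(x[shard]), y[shard]).backward()
    grads.append(model.weight.grad.clone())
grad_ddp = (grads[0] + grads[1]) / 2

# True: the averaged per-rank gradients equal the full-batch gradient.
print(torch.allclose(grad_single, grad_ddp, atol=1e-6))
```

This holds because the shards are equal-sized and gradients are linear, so the mean of the per-shard mean gradients equals the gradient of the global mean.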
Thanks in advance.