Question about the moco implementation #6

mingkai-zheng · 2022-01-06T11:55:09Z

Hello, I'm a bit confused about the MoCo implementation in this paper. Since MoCo runs only one forward pass through the teacher network, I guess the lazy update is not required for MoCo, right? In that case, did you include the BN statistics of the current batch during the forward pass?

To be more specific, do you update running_mean and running_var before calculating x, so that the current batch already contributes to the statistics used for normalization?

with torch.no_grad():
    self.running_mean = self.momentum * mean + (1 - self.momentum) * self.running_mean
    self.running_var = self.momentum * var * n / (n - 1) + (1 - self.momentum) * self.running_var

x = (x - self.running_mean[None, :, None, None].detach()) / (
    torch.sqrt(self.running_var[None, :, None, None].detach() + self.eps)
)

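Or do you calculate x first, using only the statistics from previous batches, and update running_mean and running_var afterwards?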

x = (x - self.running_mean[None, :, None, None].detach()) / (
    torch.sqrt(self.running_var[None, :, None, None].detach() + self.eps)
)

with torch.no_grad():
    self.running_mean = self.momentum * mean + (1 - self.momentum) * self.running_mean
    self.running_var = self.momentum * var * n / (n - 1) + (1 - self.momentum) * self.running_var
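To make the difference between the two orderings concrete, here is a minimal single-channel sketch in plain Python. It is not the repository's code: the helper names (`batch_stats`, `update_running`, `normalize`, `forward`) are hypothetical, and it assumes that `var` in the snippets above is the biased batch variance and that `n` is the number of values per channel.

```python
def batch_stats(xs):
    """Mean and biased variance of one channel's activations."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # assumed biased, as in the snippets
    return mean, var


def update_running(running_mean, running_var, mean, var, n, momentum):
    """Momentum update with the n / (n - 1) unbiasing factor from the snippets."""
    new_mean = momentum * mean + (1 - momentum) * running_mean
    new_var = momentum * var * n / (n - 1) + (1 - momentum) * running_var
    return new_mean, new_var


def normalize(xs, running_mean, running_var, eps=1e-5):
    """Normalize with running statistics only (no batch statistics)."""
    return [(x - running_mean) / (running_var + eps) ** 0.5 for x in xs]


def forward(xs, running_mean, running_var, momentum=0.1, update_first=True):
    """Return (normalized output, updated running statistics).

    update_first=True  -> first ordering: the current batch influences the output.
    update_first=False -> second ordering: only earlier batches influence the output.
    """
    mean, var = batch_stats(xs)
    updated = update_running(running_mean, running_var, mean, var, len(xs), momentum)
    stats = updated if update_first else (running_mean, running_var)
    return normalize(xs, *stats), updated


batch = [1.0, 2.0, 3.0, 4.0]
out_a, stats_a = forward(batch, 0.0, 1.0, update_first=True)
out_b, stats_b = forward(batch, 0.0, 1.0, update_first=False)
```

In both cases the running statistics stored after the step are the same. Only the output for the current batch differs, which is the part I am asking about.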
