
Question about FID calculation #11

Closed
taehoon-yoon opened this issue Sep 20, 2023 · 2 comments

Comments

@taehoon-yoon

Hi, thanks for your amazing project!
Anyway, I was looking through your code and found that when calculating the covariance matrix in fid_score.py, you set ddof to 0:

var = np.cov(act, rowvar=False, ddof=0, dtype=np.float64)

But when I look at the pytorch-fid library, it seems they didn't set ddof to 0, which means they left it at None:
https://github.com/mseitzer/pytorch-fid/blob/0a754fb8e66021700478fd365b79c2eaa316e31b/src/pytorch_fid/fid_score.py#L230

Is there any reason you set the ddof parameter to 0? Setting ddof to 0 or not seems to make a significant difference in the FID score. For example, I trained DDPM on the CIFAR-10 dataset and tried to calculate the FID score. If I set ddof to 0, the resulting FID score was about 8, but when I used the default value for ddof, just as the pytorch-fid library does, the resulting FID score was about 11.
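To illustrate the discrepancy I mean: with NumPy, ddof=0 normalizes the covariance by N, while the default (ddof=None, which falls back to ddof=1) normalizes by N − 1, so the two estimates differ by exactly a factor of N/(N − 1). A minimal sketch with made-up activations (not real Inception features):

```python
import numpy as np

# Toy "activations": 1000 samples, 4 features (hypothetical data).
rng = np.random.default_rng(0)
act = rng.normal(size=(1000, 4))

# Biased estimator: divides by N.
cov_biased = np.cov(act, rowvar=False, ddof=0)
# Default (ddof=None -> ddof=1): divides by N - 1, unbiased.
cov_unbiased = np.cov(act, rowvar=False)

# The two differ exactly by the factor N / (N - 1).
n = act.shape[0]
print(np.allclose(cov_biased * n / (n - 1), cov_unbiased))  # True
```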

Furthermore, the DDPM library by lucidrains (https://github.com/lucidrains/denoising-diffusion-pytorch), one of the best-known PyTorch implementations of DDPM, also calculates the FID score with `ddof` left at its default value (None).

So I was curious about the reason you set ddof=0.

Thanks in advance!

@tqch
Owner

tqch commented Sep 20, 2023

Hi, thank you for asking! I want to clarify that the final output of the covariance matrix by my algorithm is still an unbiased estimator, meaning that the effective ddof is 1, just like those implementations you mentioned. Let me explain why:

In my implementation, I derive a simple online algorithm for covariance matrix calculation (an unbiased estimator), where I maintain the covariance matrix as a running statistic and update it at each mini-batch. You can find a simple version for the univariate case at the Wikipedia link. For convenience, I choose ddof=0 (which is biased) to keep track of the running statistic, as the biased covariance estimator is always well-defined (even with a batch size of 1). Note that in

def get_statistics(self):
    assert self.count > 1, "Count must be greater than 1!"
    return (
        self.running_mean.copy(),
        self.running_var.copy() * self.count / (self.count - 1)
    )

when I extract the running statistic after the process has gone through the whole evaluation dataset, I apply the unbiased correction by multiplying it by $N/(N-1)$.
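The scheme described above can be sketched end to end: keep the biased (ddof=0) mean and covariance as running statistics, merge each mini-batch with the standard parallel-variance formula, and apply the $N/(N-1)$ correction only once at the end. This is an illustrative reconstruction (class and method names are hypothetical, not the repository's actual code, apart from the get_statistics correction shown above):

```python
import numpy as np

class RunningCovariance:
    """Online (mini-batch) covariance estimator: tracks the biased
    (ddof=0) statistic, then applies the N/(N-1) correction at the end."""

    def __init__(self, dim):
        self.count = 0
        self.running_mean = np.zeros(dim, dtype=np.float64)
        self.running_var = np.zeros((dim, dim), dtype=np.float64)

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        m = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        # Biased (ddof=0) covariance of this batch; well-defined even for m == 1.
        centered = batch - batch_mean
        batch_var = centered.T @ centered / m
        delta = batch_mean - self.running_mean
        total = self.count + m
        # Merge the two biased statistics (parallel variance formula).
        self.running_var = (
            self.count * self.running_var + m * batch_var
        ) / total + (self.count * m / total**2) * np.outer(delta, delta)
        self.running_mean = self.running_mean + delta * m / total
        self.count = total

    def get_statistics(self):
        assert self.count > 1, "Count must be greater than 1!"
        # Unbiased correction: multiply the biased estimate by N / (N - 1).
        return (
            self.running_mean.copy(),
            self.running_var.copy() * self.count / (self.count - 1),
        )
```

Feeding the data in batches and comparing against np.cov(data, rowvar=False, ddof=1) on the full array should give matching results, which is the point: the effective ddof of the final output is 1.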

@taehoon-yoon
Author

Thanks for your kind explanation :) It helped me a lot.
May good luck always be with you!
