
Question about FID calculation #11

Closed
taehoon-yoon opened this issue Sep 20, 2023 · 2 comments

Comments

@taehoon-yoon

Hi, thanks for your amazing project!
Anyway, I was looking through your code and found that when calculating the covariance matrix in fid_score.py, you set ddof to 0:

var = np.cov(act, rowvar=False, ddof=0, dtype=np.float64)

But when I look at the pytorch-fid library, it seems they didn't set ddof to 0, which means they left it at None:
https://github.com/mseitzer/pytorch-fid/blob/0a754fb8e66021700478fd365b79c2eaa316e31b/src/pytorch_fid/fid_score.py#L230

Is there any reason you set the ddof parameter to 0? Setting ddof to 0 or not seems to make a significant difference in the FID score. For example, I trained DDPM on the CIFAR-10 dataset and tried to calculate the FID score. If I set ddof to 0, the resulting FID score was about 8, but when I used the default value for ddof, just as the pytorch-fid library does, the resulting FID score was about 11.
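To illustrate the discrepancy I mean: with NumPy, ddof=0 normalizes the covariance by N, while the default (ddof=None, which falls back to ddof=1) normalizes by N − 1, so the two estimates differ by exactly a factor of N/(N − 1). A minimal sketch with made-up activations (not real Inception features):

```python
import numpy as np

# Toy "activations": 1000 samples, 4 features (hypothetical data).
rng = np.random.default_rng(0)
act = rng.normal(size=(1000, 4))

# Biased estimator: divides by N.
cov_biased = np.cov(act, rowvar=False, ddof=0)
# Default (ddof=None -> ddof=1): divides by N - 1, unbiased.
cov_unbiased = np.cov(act, rowvar=False)

# The two differ exactly by the factor N / (N - 1).
n = act.shape[0]
print(np.allclose(cov_biased * n / (n - 1), cov_unbiased))  # True
```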

Furthermore, the DDPM library by lucidrains (https://github.com/lucidrains/denoising-diffusion-pytorch), one of the best-known PyTorch implementations of DDPM, also calculates the FID score with `ddof` left at its default value (None).

So I was curious about the reason you set ddof=0.

Thanks in advance!

@tqch
Owner

tqch commented Sep 20, 2023

Hi, thank you for asking! I want to clarify that the final output of the covariance matrix by my algorithm is still an unbiased estimator, meaning that the effective ddof is 1, just like those implementations you mentioned. Let me explain why:

In my implementation, I derive a simple online algorithm for covariance matrix calculation (an unbiased estimator), where I maintain the covariance matrix as a running statistic and update it at each mini-batch. You can find a simple version for the univariate case at the Wikipedia link. For convenience, I choose ddof=0 (which is biased) to keep track of the running statistic, as the biased covariance estimator is always well-defined (even with a batch size of 1). Note that in

def get_statistics(self):
    assert self.count > 1, "Count must be greater than 1!"
    return (
        self.running_mean.copy(),
        self.running_var.copy() * self.count / (self.count - 1)
    )

when I extract the running statistic after the process has gone through the whole evaluation dataset, I apply the unbiased correction by multiplying it by $N/(N-1)$.
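The scheme described above can be sketched end to end: keep the biased (ddof=0) mean and covariance as running statistics, merge each mini-batch with the standard parallel-variance formula, and apply the $N/(N-1)$ correction only once at the end. This is an illustrative reconstruction (class and method names are hypothetical, not the repository's actual code, apart from the get_statistics correction shown above):

```python
import numpy as np

class RunningCovariance:
    """Online (mini-batch) covariance estimator: tracks the biased
    (ddof=0) statistic, then applies the N/(N-1) correction at the end."""

    def __init__(self, dim):
        self.count = 0
        self.running_mean = np.zeros(dim, dtype=np.float64)
        self.running_var = np.zeros((dim, dim), dtype=np.float64)

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        m = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        # Biased (ddof=0) covariance of this batch; well-defined even for m == 1.
        centered = batch - batch_mean
        batch_var = centered.T @ centered / m
        delta = batch_mean - self.running_mean
        total = self.count + m
        # Merge the two biased statistics (parallel variance formula).
        self.running_var = (
            self.count * self.running_var + m * batch_var
        ) / total + (self.count * m / total**2) * np.outer(delta, delta)
        self.running_mean = self.running_mean + delta * m / total
        self.count = total

    def get_statistics(self):
        assert self.count > 1, "Count must be greater than 1!"
        # Unbiased correction: multiply the biased estimate by N / (N - 1).
        return (
            self.running_mean.copy(),
            self.running_var.copy() * self.count / (self.count - 1),
        )
```

Feeding the data in batches and comparing against np.cov(data, rowvar=False, ddof=1) on the full array should give matching results, which is the point: the effective ddof of the final output is 1.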

@taehoon-yoon
Author

Thanks for your kind explanation :) It helped me a lot.
May good luck always be with you!
