
Different results on the same arrays #2

Open
PavelOstyakov opened this issue Jun 10, 2020 · 7 comments

Comments

@PavelOstyakov

Hi, thanks for your good work.

I would like to try MSID for GAN evaluation, but I found that the metric is extremely unstable. I just cloned your repo and performed a simple experiment:

>>> import numpy as np
>>> from msid import msid_score
>>> x0 = np.random.randn(1000, 10)
>>> x1 = np.random.randn(1000, 10)
>>> for _ in range(20):
...     print('MSID(x0, x1)', msid_score(x0, x1))
...

MSID(x0, x1) 11.612343854772956
MSID(x0, x1) 7.671366682093675
MSID(x0, x1) 1.8117880712326395
MSID(x0, x1) 6.205967034975149
MSID(x0, x1) 1.9430385102291492
MSID(x0, x1) 2.467981390832042
MSID(x0, x1) 4.359253678580822
MSID(x0, x1) 5.705092418121339
MSID(x0, x1) 7.084854325912502
MSID(x0, x1) 8.925101261419211
MSID(x0, x1) 2.6563495105769963
MSID(x0, x1) 6.67076587871034
MSID(x0, x1) 0.9609276170219742
MSID(x0, x1) 4.134198699891847
MSID(x0, x1) 2.061919358106404
MSID(x0, x1) 5.4849779235186045
MSID(x0, x1) 2.5738576367295107
MSID(x0, x1) 3.597934029471011
MSID(x0, x1) 1.0966421686877845
MSID(x0, x1) 13.116242604321098

Next, I provide the results of MSID(x0, x0):

>>> for _ in range(20):
...     print('MSID(x0, x0)', msid_score(x0, x0))
...
MSID(x0, x0) 1.8842238243396114
MSID(x0, x0) 6.653959832884025
MSID(x0, x0) 2.896296044612713
MSID(x0, x0) 1.7874406866486243
MSID(x0, x0) 2.212118637843133
MSID(x0, x0) 5.352864291155722
MSID(x0, x0) 4.492301054567285
MSID(x0, x0) 1.3662656634830224
MSID(x0, x0) 2.2663591630199416
MSID(x0, x0) 4.5750399290303045
MSID(x0, x0) 4.094359241800621
MSID(x0, x0) 2.4488511702991795
MSID(x0, x0) 5.929584568192836
MSID(x0, x0) 7.591811322838174
MSID(x0, x0) 7.372357733571717
MSID(x0, x0) 5.6968201645123075
MSID(x0, x0) 1.4797792557903116
MSID(x0, x0) 1.1783656760547234
MSID(x0, x0) 7.6904604926511295
MSID(x0, x0) 5.483936755815125

I expect that a metric for GANs evaluation should:

  1. Give the same (or at least a similar) score when computed twice on the same data.
  2. Satisfy MSID(x0, x0) < MSID(x0, x1) for any x0 and x1 with x0 != x1.

Could you please check your implementation? Thank you.

@xgfs
Owner

xgfs commented Jun 11, 2020

The shape of these two random arrays is (approximately) the same. The ~3-5 scores you see are due to the randomness of the approximation.

@SarfarazHabib

@xgfs Thanks for your work; I have the same question.

I am also seeing a similar problem. I was trying this metric for evaluating the difference between two layers of CNNs, and I found that even if I give similar/same arrays to the function "msid_score()", I get different results when it is run multiple times. Also, for the same arrays, I expected the score to be lower than for different arrays, but this is not the case.

Could you please elaborate on this further?
Also, when you run it on similar arrays multiple times, the score keeps changing.

@xgfs
Owner

xgfs commented Jul 15, 2020

The results are based on a random approximation; the default value of niters=100 gives reasonable results for many datasets (as used in the paper). If you want more stable scores, you can either increase niters to some larger appropriate value, or run the function multiple times and report the average/std of the results. These two approaches are mathematically equivalent; the first one should be a bit faster.

Hope that helps.
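As a minimal sketch of the second suggestion (run-averaging), the helper below wraps any stochastic score function and reports its mean and std over several runs. Note that `averaged_score` and `noisy_score` are hypothetical names introduced here for illustration; in practice you would pass something like `lambda a, b: msid_score(a, b, niters=100)` as `score_fn`.

```python
import numpy as np

def averaged_score(score_fn, x, y, n_runs=10):
    """Run a stochastic score function n_runs times; return (mean, std).

    score_fn is any callable returning a scalar score; it stands in for
    msid_score (e.g. pass lambda a, b: msid_score(a, b, niters=100)).
    """
    scores = [score_fn(x, y) for _ in range(n_runs)]
    return float(np.mean(scores)), float(np.std(scores))

# Toy stochastic score used only to demonstrate the helper (NOT real MSID):
# a fixed "true" value of 4.0 plus Gaussian noise.
rng = np.random.default_rng(0)

def noisy_score(x, y):
    return 4.0 + rng.normal(scale=2.0)

x = np.zeros((10, 3))
mean, std = averaged_score(noisy_score, x, x, n_runs=200)
```

Reporting the std alongside the mean also makes it easy to tell whether two MSID scores are actually distinguishable given the noise of the approximation.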

@SarfarazHabib

Hi, I tried both of your proposed solutions. The results came closer to being stable, but they still fluctuate between different values even for the same arrays.

@xgfs
Owner

xgfs commented Jul 21, 2020

Do you mean that the metric has a multi-modal distribution? If so, this is not expected, and we haven't observed this effect.
If you can publish the arrays in question, I can try to investigate.

@SarfarazHabib

model_resnet_dct_layer1_3_conv2.zip

This is the pickled file of the array I am using. It's a feature vector from a layer, so I transpose it from (1, 128, 56, 56) to (3136, 128) and then compute the score. I am also pasting the code below for your reference.

act = np.transpose(acts1, (0, 2, 3, 1))
num_datapoints, h, w, channels = act.shape
f_acts = act.reshape((num_datapoints * h * w, channels))

for i in range(20):
    print(msid_score(f_acts, f_acts, niters=1000))
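The reshape above can be sanity-checked in isolation. The snippet below uses random values in place of the real activations (the actual features come from the pickled file) and confirms that a (1, 128, 56, 56) NCHW tensor flattens to (3136, 128), i.e. one row per spatial location and one column per channel.

```python
import numpy as np

# Simulated CNN activations with shape (N, C, H, W) = (1, 128, 56, 56);
# random values stand in for the real features from the pickle.
acts1 = np.random.randn(1, 128, 56, 56)

# Move channels last: (N, C, H, W) -> (N, H, W, C)
act = np.transpose(acts1, (0, 2, 3, 1))
num_datapoints, h, w, channels = act.shape

# Flatten batch and spatial dims: (N*H*W, C) = (1*56*56, 128) = (3136, 128)
f_acts = act.reshape((num_datapoints * h * w, channels))
```

Transposing before the reshape matters: reshaping the NCHW array directly would interleave channel and spatial values into the wrong rows.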

@xunhuang1995

I also find the score too unstable to be used for GAN evaluation. Table 1 in the paper indeed shows that IMD has a larger variance than FID/KID, but the variance I observed is significantly larger even than the variance reported in Table 1.
