Different results on the same arrays #2
The shape of these two random arrays is (approximately) the same. The ~3-5 score you see is due to the randomness of the approximation.
@xgfs Thanks for your work. I am having the same problem. I was trying this metric to evaluate the difference between two layers of CNNs, and I found that even if I give similar or identical arrays to `msid_score()`, I get different results across runs. Also, for identical arrays I was expecting the score to be lower than for different arrays, but this is not the case. Could you please elaborate on this further?
The results are based on a random approximation; the default value of niters=100 gives reasonable results for many datasets (as used in the paper). If you want more stable estimates, you can either increase niters to a larger value, or run the function multiple times and report the average/std of the results. The two approaches are mathematically equivalent; the first should be a bit faster. Hope that helps.
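A minimal sketch of the two stabilization approaches, assuming the repo's `msid_score` entry point; the array shapes and the value niters=500 are illustrative assumptions, not from the thread:

```python
import numpy as np
from msid import msid_score  # entry point from the xgfs/msid repo

x = np.random.randn(1000, 32)
y = np.random.randn(1000, 32)

# Approach 1: a single, more stable estimate with more approximation
# iterations (the default is niters=100; 500 is an arbitrary choice here).
score = msid_score(x, y, niters=500)

# Approach 2: repeat at the default niters and report mean/std across runs.
scores = [msid_score(x, y) for _ in range(5)]
print(score, np.mean(scores), np.std(scores))
```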
Hi, I tried both of your proposed solutions; the results came closer to being stable, but they still fluctuate to different values even for the same arrays.
Do you mean that the metric has a multi-modal distribution? If so, this is not expected, and we haven't observed this effect.
model_resnet_dct_layer1_3_conv2.zip

This is the pickled file of the array I am using. It's a feature vector from a layer, so I just transpose it from (1, 128, 56, 56) to (3136, 128) and then calculate the score. I am also pasting the code below for your reference.

```python
act1 = np.transpose(acts1, (0, 2, 3, 1))  # (1, 128, 56, 56) -> (1, 56, 56, 128)
act1 = act1.reshape(-1, act1.shape[-1])   # -> (3136, 128)
for i in range(20):
    print(msid_score(act1, act1))         # score of the array against itself
```
I also find the score too unstable to be used for GAN evaluation. Table 1 in the paper indeed shows that IMD has a larger variance than FID/KID, but the variance I observed is significantly larger even than the variance reported in Table 1.
Hi, thanks for your good work.
I would like to try MSID for GAN evaluation, but I found the metric extremely unstable. I just cloned your repo and performed a simple experiment:
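The exact script is not included in the thread; the following is a sketch of the kind of self-comparison experiment described, with the sampling of x0 and all shapes as assumptions:

```python
import numpy as np
from msid import msid_score

# Hypothetical stand-in for the evaluated features: one fixed random matrix.
x0 = np.random.randn(2048, 64)

# MSID(x0, x0) compares the array with itself, so ideally it should be
# near zero and stable across runs; the fluctuation is the reported issue.
for _ in range(10):
    print(msid_score(x0, x0))
```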
Next, I provide the results of MSID(x0, x0)
I expect that a metric for GAN evaluation should:
Could you please check your implementation? Thank you.