
how is robustness calculated? #5

Closed · psteinb opened this issue Mar 3, 2022 · 4 comments

psteinb commented Mar 3, 2022

Hi,

thank you for this wonderful work on vision transformers and on how to understand them. I have a few simple questions, for which I apologize in advance.
I tried to reproduce Figure 12 independently of your code base, but I struggle a bit to understand the code. Is it correct that you define robustness as robustness = mean(accuracy(y_val_true, y_val_pred))?
Related to this, do I understand correctly that you compute this accuracy on batches of the validation dataset? These batches are of size 256, right?
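
For concreteness, my independent attempt boils down to roughly the following sketch (the model and data loader are placeholders from my own setup, not from your code base):

```python
import torch

# Placeholder sketch of my reading of the metric: accuracy per validation batch, then the mean.
# `model` and `val_loader` are stand-ins from my own environment, not from this repository.
def batchwise_robustness(model, val_loader, device="cuda"):
    model.eval()
    batch_accuracies = []
    with torch.no_grad():
        for images, labels in val_loader:  # batches of size 256?
            predictions = model(images.to(device)).argmax(dim=1)
            batch_accuracies.append((predictions.cpu() == labels).float().mean().item())
    return sum(batch_accuracies) / len(batch_accuracies)  # mean over per-batch accuracies
```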

Thanks.

xxxnell (Owner) commented Mar 4, 2022

Hi, thank you for your support!

CIFAR-{10, 100}-C and ImageNet-C each consist of 75 datasets (data corrupted by 15 different corruption types at 5 intensity levels). The robustness reported in this paper is the average of the accuracies on these 75 corrupted datasets.
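
In other words, the definition boils down to the following sketch (the evaluation callable is only a placeholder for illustration, not an actual function in this repository):

```python
# Minimal sketch of the definition: robustness is the plain mean of the accuracies
# measured on every (corruption type, intensity) combination, i.e. 15 x 5 = 75 datasets.
# `eval_acc` is a placeholder callable: (corruption_type, intensity) -> accuracy.
def robustness(eval_acc, corruption_types, intensities=range(1, 6)):
    accuracies = [eval_acc(ctype, level)
                  for ctype in corruption_types
                  for level in intensities]
    return sum(accuracies) / len(accuracies)
```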

In particular, I recommend that you measure the robustness as follows:

  1. Run all cells in robustness.ipynb to get the predictive performance of a pretrained model on the 75 datasets. CIFAR-{10, 100}-C will be downloaded automatically. You will then get a performance sheet like the sample robustness sheet.
  2. Average all accuracies over the 75 datasets. In the robustness sheet, the columns stand for "Intensity", "Type", "NLL", "Cutoff1", "Cutoff2", "Acc", "Acc-90", "Unc", "Unc-90", "IoU", "IoU-90", "Freq", "Freq-90", "Top-5", "Brier", "ECE", "ECSE", respectively. We only use the accuracy column ("Acc"); see the short pandas sketch below.

To avoid confusion: strictly speaking, we do not use the following corruption types for evaluation: "speckle_noise", "gaussian_blur", "spatter", and "saturate". Another metric, mCE, is also commonly used to measure robustness, but it is not used in this paper.
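
As a rough sketch, assuming you export the robustness sheet as a CSV with the column names above (the file name below is just an example), the final number can be computed with pandas:

```python
import pandas as pd

# Hypothetical file name; use the sheet produced by robustness.ipynb.
sheet = pd.read_csv("robustness_sheet.csv")

# Drop the extra corruption types that are not used for evaluation.
excluded = {"speckle_noise", "gaussian_blur", "spatter", "saturate"}
sheet = sheet[~sheet["Type"].isin(excluded)]

# Robustness is the plain mean of the accuracy column.
print(sheet["Acc"].mean())
```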

The batch size is 256 by default, but I believe the robustness is independent of the batch size.

xxxnell (Owner) commented Mar 12, 2022

Closing this issue based on the comment above. Please feel free to reopen this issue if the problem still exists.

xxxnell closed this as completed Mar 12, 2022

psteinb (Author) commented Mar 14, 2022

Sure thing, please close the issue.
I think it would be great to have access to the intermediate results to (re-)produce the robustness numbers.
From the robustness notebook, I got the impression that I would have to retrain all of the cited models (since I cannot run models.load(name, ...) in my environment), and, to be honest, I didn't want to invest the CO2 for that.
But maybe the .pth checkpoints are available for download and I misread the docs. Please accept my apologies if that is the case.

xxxnell (Owner) commented Mar 15, 2022

Thank you for your constructive feedback. I agree that releasing intermediate results would be helpful, because evaluating pretrained models on the 75 datasets can be resource-intensive. I will release robustness sheets as intermediate results for some models and make the pretrained models easily accessible.
