Adversarial training is not working #9

Open
ksouvik52 opened this issue Jul 24, 2022 · 12 comments

@ksouvik52

Hi, as you suggested in the earlier thread, we tried the exact same settings and arguments you provided to adversarially train DeiT-tiny. However, it is still not working: only about 18% accuracy after 52 epochs.

{"train_lr": 9.999999999999953e-07, "train_loss": 6.91152723201459, "test_0_loss": 6.872483930142354, "test_0_acc1": 0.26, "test_0_acc5": 1.12, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 0, "n_parameters": 5717416}
{"train_lr": 9.999999999999953e-07, "train_loss": 6.900039605433039, "test_0_loss": 6.850005263940539, "test_0_acc1": 0.362, "test_0_acc5": 1.578, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 1, "n_parameters": 5717416}
{"train_lr": 0.00040089999999999305, "train_loss": 6.754413659242894, "test_0_loss": 6.145018349987379, "test_0_acc1": 3.726, "test_0_acc5": 10.93, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 2, "n_parameters": 5717416}
{"train_lr": 0.0008008000000000196, "train_loss": 6.625597798519379, "test_0_loss": 5.638628653967449, "test_0_acc1": 7.224, "test_0_acc5": 18.596, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 3, "n_parameters": 5717416}
{"train_lr": 0.001200700000000036, "train_loss": 6.560625480233337, "test_0_loss": 5.315861636678607, "test_0_acc1": 10.064, "test_0_acc5": 23.916, "test_5_loss": 6.9177000566849856, "test_5_acc1": 0.1805, "test_5_acc5": 0.6765, "epoch": 4, "n_parameters": 5717416}
{"train_lr": 0.0016005999999999657, "train_loss": 6.54946454628099, "test_0_loss": 5.134192888048774, "test_0_acc1": 11.984, "test_0_acc5": 27.634, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 5, "n_parameters": 5717416}
{"train_lr": 0.0020004999999999914, "train_loss": 6.542691466810225, "test_0_loss": 5.129832990186304, "test_0_acc1": 12.648, "test_0_acc5": 28.482, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 6, "n_parameters": 5717416}
{"train_lr": 0.002400399999999976, "train_loss": 6.536189370113406, "test_0_loss": 5.018598655821495, "test_0_acc1": 13.942, "test_0_acc5": 31.118, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 7, "n_parameters": 5717416}
{"train_lr": 0.002800300000000059, "train_loss": 6.503189501430777, "test_0_loss": 4.9134670785048495, "test_0_acc1": 15.256, "test_0_acc5": 33.352, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 8, "n_parameters": 5717416}
{"train_lr": 0.0032002000000000983, "train_loss": 6.5094980549373975, "test_0_loss": 4.851811613246408, "test_0_acc1": 15.578, "test_0_acc5": 34.102, "test_5_loss": 5.716198673022533, "test_5_acc1": 6.383, "test_5_acc5": 16.22425, "epoch": 9, "n_parameters": 5717416}
{"train_lr": 0.0036000999999999585, "train_loss": 6.511975479259384, "test_0_loss": 5.143745462633598, "test_0_acc1": 13.898, "test_0_acc5": 30.834, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 10, "n_parameters": 5717416}
{"train_lr": 0.0039023577500088853, "train_loss": 6.418486285981515, "test_0_loss": 5.218988336208, "test_0_acc1": 12.054, "test_0_acc5": 27.72, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 11, "n_parameters": 5717416}
{"train_lr": 0.003882057134063648, "train_loss": 6.32594983450038, "test_0_loss": 5.0386335921455325, "test_0_acc1": 14.276, "test_0_acc5": 31.454, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 12, "n_parameters": 5717416}
{"train_lr": 0.0038599040893470705, "train_loss": 6.353343641300567, "test_0_loss": 4.799374374913163, "test_0_acc1": 16.064, "test_0_acc5": 34.754, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 13, "n_parameters": 5717416}
{"train_lr": 0.0038359204782395374, "train_loss": 6.200804713199274, "test_0_loss": 5.33436276099656, "test_0_acc1": 11.77, "test_0_acc5": 26.914, "test_5_loss": 5.7486809707954, "test_5_acc1": 7.313, "test_5_acc5": 17.87725, "epoch": 14, "n_parameters": 5717416}
{"train_lr": 0.0038101299696696937, "train_loss": 6.372054211956134, "test_0_loss": 5.132588769103652, "test_0_acc1": 14.968, "test_0_acc5": 32.62, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 15, "n_parameters": 5717416}
{"train_lr": 0.0037825580157558377, "train_loss": 6.417677939938699, "test_0_loss": 4.666486162294277, "test_0_acc1": 15.988, "test_0_acc5": 34.454, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 16, "n_parameters": 5717416}
{"train_lr": 0.0037532318266873923, "train_loss": 6.26336412826221, "test_0_loss": 5.38978447795143, "test_0_acc1": 12.402, "test_0_acc5": 28.042, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 17, "n_parameters": 5717416}
{"train_lr": 0.003722180343872929, "train_loss": 6.142464170686537, "test_0_loss": 4.77877281021782, "test_0_acc1": 15.232, "test_0_acc5": 33.184, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 18, "n_parameters": 5717416}
{"train_lr": 0.0036894342113766073, "train_loss": 6.004803928301679, "test_0_loss": 5.506222593730944, "test_0_acc1": 10.49, "test_0_acc5": 24.774, "test_5_loss": 5.900749441910766, "test_5_acc1": 6.776, "test_5_acc5": 16.361, "epoch": 19, "n_parameters": 5717416}
{"train_lr": 0.0036550257456777735, "train_loss": 6.135914517725877, "test_0_loss": 5.406659782199775, "test_0_acc1": 11.604, "test_0_acc5": 27.232, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 20, "n_parameters": 5717416}
{"train_lr": 0.0036189889037779527, "train_loss": 6.093383449254086, "test_0_loss": 5.010170250158621, "test_0_acc1": 13.934, "test_0_acc5": 30.648, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 21, "n_parameters": 5717416}
{"train_lr": 0.0035813592496895356, "train_loss": 6.059879529152176, "test_0_loss": 5.371730933796497, "test_0_acc1": 10.762, "test_0_acc5": 25.292, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 22, "n_parameters": 5717416}
{"train_lr": 0.00354217391933767, "train_loss": 6.073066240544323, "test_0_loss": 5.048453500617107, "test_0_acc1": 13.242, "test_0_acc5": 29.126, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 23, "n_parameters": 5717416}
{"train_lr": 0.0035014715839127445, "train_loss": 6.08733743352951, "test_0_loss": 5.331494154414533, "test_0_acc1": 10.974, "test_0_acc5": 25.072, "test_5_loss": 6.950556825081355, "test_5_acc1": 2.013, "test_5_acc5": 5.54425, "epoch": 24, "n_parameters": 5717416}
{"train_lr": 0.003459292411705762, "train_loss": 6.1378554502646505, "test_0_loss": 5.264302222605172, "test_0_acc1": 10.784, "test_0_acc5": 25.304, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 25, "n_parameters": 5717416}
{"train_lr": 0.0034156780284671424, "train_loss": 6.1356256470786965, "test_0_loss": 5.568371077798074, "test_0_acc1": 8.102, "test_0_acc5": 19.622, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 26, "n_parameters": 5717416}
{"train_lr": 0.003370671476327724, "train_loss": 6.071214107396029, "test_0_loss": 5.589009375886435, "test_0_acc1": 9.484, "test_0_acc5": 22.578, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 27, "n_parameters": 5717416}
{"train_lr": 0.00332431717132075, "train_loss": 5.972238908568732, "test_0_loss": 5.707102177010388, "test_0_acc1": 7.978, "test_0_acc5": 19.746, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 28, "n_parameters": 5717416}
{"train_lr": 0.0032766608595485736, "train_loss": 6.1025936421539955, "test_0_loss": 5.42314580428013, "test_0_acc1": 11.556, "test_0_acc5": 26.76, "test_5_loss": 7.786235228686171, "test_5_acc1": 0.8985, "test_5_acc5": 2.83875, "epoch": 29, "n_parameters": 5717416}
{"train_lr": 0.0032277495720376523, "train_loss": 6.084920492580087, "test_0_loss": 5.923481827123914, "test_0_acc1": 6.536, "test_0_acc5": 16.768, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 30, "n_parameters": 5717416}
{"train_lr": 0.003177631578323426, "train_loss": 6.115835655793298, "test_0_loss": 5.460594815469596, "test_0_acc1": 9.702, "test_0_acc5": 23.214, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 31, "n_parameters": 5717416}
{"train_lr": 0.003126356338814952, "train_loss": 6.0647800623846475, "test_0_loss": 5.592425890023786, "test_0_acc1": 10.268, "test_0_acc5": 23.954, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 32, "n_parameters": 5717416}
{"train_lr": 0.0030739744559831173, "train_loss": 6.054409027099609, "test_0_loss": 5.690968008889499, "test_0_acc1": 9.494, "test_0_acc5": 22.702, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 33, "n_parameters": 5717416}
{"train_lr": 0.003020537624422026, "train_loss": 6.063710031606597, "test_0_loss": 4.698054779361473, "test_0_acc1": 15.238, "test_0_acc5": 32.862, "test_5_loss": 7.121632807848168, "test_5_acc1": 1.17125, "test_5_acc5": 3.44575, "epoch": 34, "n_parameters": 5717416}
{"train_lr": 0.0029660985798329645, "train_loss": 5.910670252202703, "test_0_loss": 5.064415029280474, "test_0_acc1": 13.038, "test_0_acc5": 29.468, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 35, "n_parameters": 5717416}
{"train_lr": 0.0029107110469803756, "train_loss": 5.995033393136794, "test_0_loss": 4.890588184388418, "test_0_acc1": 14.9, "test_0_acc5": 32.556, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 36, "n_parameters": 5717416}
{"train_lr": 0.002854429686672256, "train_loss": 5.9739233070044975, "test_0_loss": 5.277839526410142, "test_0_acc1": 12.19, "test_0_acc5": 28.048, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 37, "n_parameters": 5717416}
{"train_lr": 0.002797310041816381, "train_loss": 5.918417158744318, "test_0_loss": 5.211479980214925, "test_0_acc1": 13.658, "test_0_acc5": 30.232, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 38, "n_parameters": 5717416}
{"train_lr": 0.002739408482605983, "train_loss": 5.925514445745116, "test_0_loss": 5.419682004859031, "test_0_acc1": 10.596, "test_0_acc5": 24.802, "test_5_loss": 7.283367584701997, "test_5_acc1": 1.14525, "test_5_acc5": 3.321, "epoch": 39, "n_parameters": 5717416}
{"train_lr": 0.002680782150889308, "train_loss": 5.883729684505341, "test_0_loss": 5.13738645015431, "test_0_acc1": 13.256, "test_0_acc5": 29.476, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 40, "n_parameters": 5717416}
{"train_lr": 0.0026214889037779947, "train_loss": 5.8768603530623835, "test_0_loss": 5.099718636911188, "test_0_acc1": 14.214, "test_0_acc5": 32.066, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 41, "n_parameters": 5717416}
{"train_lr": 0.002561587256548306, "train_loss": 5.914836130887389, "test_0_loss": 5.2416254764783865, "test_0_acc1": 11.636, "test_0_acc5": 26.856, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 42, "n_parameters": 5717416}
{"train_lr": 0.0025011363248938225, "train_loss": 5.900150206830386, "test_0_loss": 5.101568935318627, "test_0_acc1": 13.276, "test_0_acc5": 30.104, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 43, "n_parameters": 5717416}
{"train_lr": 0.00244019576658606, "train_loss": 5.86289567250809, "test_0_loss": 4.907099856829994, "test_0_acc1": 14.74, "test_0_acc5": 32.012, "test_5_loss": 7.820061174479342, "test_5_acc1": 1.0445, "test_5_acc5": 3.129, "epoch": 44, "n_parameters": 5717416}
{"train_lr": 0.0023788257225985108, "train_loss": 5.835502566836721, "test_0_loss": 5.569610283303093, "test_0_acc1": 11.684, "test_0_acc5": 26.968, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 45, "n_parameters": 5717416}
{"train_lr": 0.002317086757755297, "train_loss": 5.902843760119544, "test_0_loss": 5.3058019126750535, "test_0_acc1": 13.534, "test_0_acc5": 29.732, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 46, "n_parameters": 5717416}
{"train_lr": 0.0022550398009607707, "train_loss": 5.798361852205248, "test_0_loss": 5.197937856175087, "test_0_acc1": 13.114, "test_0_acc5": 29.348, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 47, "n_parameters": 5717416}
{"train_lr": 0.0021927460850704548, "train_loss": 5.73072993350353, "test_0_loss": 4.734242562521595, "test_0_acc1": 17.252, "test_0_acc5": 35.896, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 48, "n_parameters": 5717416}
{"train_lr": 0.0021302670864610006, "train_loss": 5.69394142336125, "test_0_loss": 5.058369449217657, "test_0_acc1": 14.906, "test_0_acc5": 32.252, "test_5_loss": 7.334822424390113, "test_5_acc1": 1.49225, "test_5_acc5": 4.25875, "epoch": 49, "n_parameters": 5717416}
{"train_lr": 0.002067664464360847, "train_loss": 5.709019121363294, "test_0_loss": 4.3592056012351925, "test_0_acc1": 19.068, "test_0_acc5": 39.056, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 50, "n_parameters": 5717416}
{"train_lr": 0.002005000000000026, "train_loss": 5.700551097960972, "test_0_loss": 4.753117966484123, "test_0_acc1": 15.846, "test_0_acc5": 33.87, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 51, "n_parameters": 5717416}
{"train_lr": 0.0019423355356391193, "train_loss": 5.7724813230031975, "test_0_loss": 4.528603377741876, "test_0_acc1": 18.206, "test_0_acc5": 37.726, "test_5_loss": 10.509043875979218, "test_5_acc1": 0.72375, "test_5_acc5": 2.323, "epoch": 52, "n_parameters": 5717416}

@ksouvik52
Author

This is the command we are using:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=12349 --use_env main_adv_deit.py --model deit_tiny_patch16_224_adv --batch-size=128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_tiny_patch16_224
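(For context, --attack-iter 1 with --attack-epsilon 4 and --attack-step-size 4 is single-step PGD. A minimal sketch of that inner maximization step, assuming inputs in [0, 1] and epsilon/step given in /255 units; this is only an illustration, not this repo's implementation:)

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, step=4/255, iters=1):
    # L-inf PGD; with iters=1 this reduces to a single FGSM-style step.
    x_adv = x.detach().clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = x.detach() + (x_adv - x.detach()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv

Adversarial training then takes the usual optimizer step on F.cross_entropy(model(pgd_attack(model, x, y)), y).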

@ytongbai
Owner

ytongbai commented Jul 24, 2022

Just to be sure, you are using DeiT-tiny? Did DeiT-small work on your side after yesterday? (I want to diagnose the problem step by step.)

@ksouvik52
Author

Okay, I will try with deit_small... I have never tried it yet.

@ytongbai
Owner

ytongbai commented Jul 24, 2022

Oh, but in your response yesterday (#8 (comment)), you mentioned you were using deit_small_patch16_224_adv:

'python -m torch.distributed.launch --nproc_per_node=4 --master_port=5672 --use_env main_adv_deit.py --model deit_small_patch16_224_adv --batch-size 128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_small_patch16_224'

@ksouvik52
Author

ksouvik52 commented Jul 24, 2022

Yes, I have never tried an 8-GPU node for DeiT-small. Trying now. So the goal is to narrow down whether it's model-specific?

@ytongbai
Owner

Oh, I see.

@ksouvik52
Author

ksouvik52 commented Jul 24, 2022

One reason to choose deit_tiny was to see results quickly, as deit_small takes a little longer to train. Unfortunately, it is not working yet.

@ytongbai
Owner

ytongbai commented Jul 24, 2022

Just want to debug the problem, since we didn't use the DeiT-tiny setting in the paper. So the first step is to make sure the original recipe works on your side, to rule out other problems.

But yes, I do have some hypotheses about DeiT-tiny. As we mentioned in the paper, the augmentations are very strong, and adversarial training itself adds strong regularization, so even for the DeiT-small 100-epoch model the overall regularization is very strong. If you shrink the model size down, I would assume the model becomes over-regularized and cannot give good results; using less augmentation would probably help (see the sketch below). If the model size goes up, like adversarial training on the base model, that wouldn't be a problem.

(In our earlier experiments, DeiT-tiny did not need strong augmentation to perform well under adversarial training; see the attached image.)
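For example (the augmentation flag names below are the upstream facebookresearch/deit names and are an assumption about this fork; the values are illustrative, not tested settings), the tiny run could be weakened along these lines:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main_adv_deit.py --model deit_tiny_patch16_224_adv --batch-size 128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --warmup-epochs 10 --drop 0 --drop-path 0 --sing singln --reprob 0 --no-repeated-aug --mixup 0 --cutmix 0 --color-jitter 0 --aa rand-m2-mstd0.5 --smoothing 0 --output_dir save/deit_adv/deit_tiny_patch16_224_lessaug

Here --mixup, --color-jitter, --aa, and --smoothing are the upstream names, and rand-m2-mstd0.5 is a low-magnitude RandAugment policy in timm.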

Hope it helps :)

@ksouvik52
Author

Can you let me know what arguments you used to tone down the augmentation on deit_tiny to make it work?

@ytongbai
Owner

ytongbai commented Jul 24, 2022

Sure. We didn't fully explore this setting since it is out of scope (our focus was a fair comparison between the two architectures), but we do have an initial recipe for you to explore.

With the tiny model: SGD, cosine LR scheduler, lr 0.1, no regularization, and no augmentation gives clean accuracy around 54.9 and PGD-5 accuracy around 31.7. (Adam gives similar results but is less stable, so I would recommend the above setting.)
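A minimal PyTorch sketch of that recipe ("no regularization" read as zero weight decay; the 100-epoch horizon and momentum value are assumptions, not settings from the paper):

import torch
from timm import create_model  # timm ships the DeiT-tiny definition

model = create_model("deit_tiny_patch16_224", num_classes=1000).cuda()

# SGD at lr 0.1; "no regularization" taken as weight_decay=0.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0)

# Cosine decay of the learning rate over the full run (100 epochs assumed).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)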

@ksouvik52
Author

ksouvik52 commented Jul 24, 2022

Running on deit_small now; will let you know the results. It's around 2x slower than deit_tiny, so the results will be a little late.

@ytongbai
Owner

Sure, no worries. I'll be around to help.
