A curated list of awesome adversarial attack and defense papers, inspired by awesome-adv-ml.
- Intriguing properties of neural networks. Szegedy et al., 2013. (L-BFGS)
- Explaining and Harnessing Adversarial Examples. Goodfellow et al., 2014. (FGSM; sketched after the list)
- DeepFool: a simple and accurate method to fool deep neural networks. Moosavi-Dezfooli et al., 2015. (DeepFool)
- The Limitations of Deep Learning in Adversarial Settings. Papernot et al., 2015. (JSMA)
- Towards Evaluating the Robustness of Neural Networks. Carlini and Wagner, 2016. (C&W)
- Adversarial examples in the physical world. Kurakin et al., 2016. (BIM)
- Towards Deep Learning Models Resistant to Adversarial Attacks. Madry et al., 2017. (PGD; sketched after the list)
- Boosting Adversarial Attacks with Momentum. Dong et al., 2017. (MIM)
- EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples. Chen et al., 2017. (EAD)
- Generating Adversarial Examples with Adversarial Networks. Xiao et al., 2018. (AdvGAN)
- Practical Black-Box Attacks against Machine Learning. Papernot et al., 2016.
- Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples. Papernot et al., 2016.
- Delving into Transferable Adversarial Examples and Black-box Attacks. Liu et al., 2016.
- ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models. Chen et al., 2017. (ZOO)
- Practical Black-box Attacks on Deep Neural Networks using Efficient Query Mechanisms. Bhagoji et al., 2018. (PCA, random grouping)
- Black-box Adversarial Attacks with Limited Queries and Information. Ilyas et al., 2018. (NES; gradient estimator sketched after the list)
- Prior convictions: Black-box adversarial attacks with bandits and priors. Ilyas et al., 2018. (Bandits-TD)
- Adversarial Risk and the Dangers of Evaluating Against Weak Attacks. Uesato et al., 2018. (SPSA)
- AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks. Tu et al., 2018.
- GenAttack: Practical Black-box Attacks with Gradient-Free Optimization. Alzantot et al., 2018. (GenAttack)
- Simple Black-box Adversarial Attacks. Guo et al., 2019. (SimBA)
- There are No Bit Parts for Sign Bits in Black-Box Attacks. Al-Dujaili et al., 2019. (SignHunter)
- Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization. Moon et al., 2019.
- Improving Black-box Adversarial Attacks with a Transfer-based Prior. Cheng et al., 2019. (P-RGF)
- NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks. Li et al., 2019.
- BayesOpt Adversarial Attack. Ru et al., 2020.
- Black-box Adversarial Attacks with Bayesian Optimization. Shukla et al., 2020.
- Query-efficient Meta Attack to Deep Neural Networks. Du et al., 2020.
- Projection & Probability-Driven Black-Box Attack. Li et al., 2020. (PPBA)
- Square Attack: a query-efficient black-box adversarial attack via random search. Andriushchenko et al., 2020. (Square Attack)
- Switching Gradient Directions for Query-Efficient Black-Box Adversarial Attacks. Chen et al., 2020.
- Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. Brendel et al., 2017. (Boundary Attack)
- Black-box Adversarial Attacks with Limited Queries and Information. Ilyas et al., 2018. (NES, label-only setting)
- Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach. Cheng et al., 2018. (Optimization)
- Efficient Decision-based Black-box Adversarial Attacks on Face Recognition. Dong et al., 2019. (Evolutionary Attack)
- Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks. Brunner et al., 2019. (Biased Boundary Attack)
- HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. Chen et al., 2019. (Boundary Attack++)
- QEBA: Query-Efficient Boundary-Based Blackbox Attack. Li et al., 2020.
- Robust Physical-World Attacks on Deep Learning Models. Eykholt et al., 2017.
- Synthesizing Robust Adversarial Examples. Athalye et al., 2017. (EOT, 3D adv-turtle)
- ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector. Chen et al., 2018.
- Physical Adversarial Examples for Object Detectors. Eykholt et al., 2018.
- SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing. Qiu et al., 2019.
- Adversarial Objects Against LiDAR-Based Autonomous Driving Systems. Cao et al., 2019.
- Universal adversarial perturbations. Moosavi-Dezfooli et al., 2016. (UAP)
- Ensemble Adversarial Training: Attacks and Defenses. Tramèr et al., 2017.
- Synthesizing Robust Adversarial Examples. Athalye et al., 2017. (EOT)
- CAAD 2018: Iterative Ensemble Adversarial Attack. Liu et al., 2018. (ens-PGD, CAAD 2018 5th)
- Beyond Adversarial Training: Min-Max Optimization in Adversarial Attack and Defense. Wang et al., 2019. (better ensemble attacks, universal perturbation, and EOT)
- A study of the effect of JPG compression on adversarial images. Dziugaite et al., 2016.
- Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. Xu et al., 2017.
- Keeping the Bad Guys Out: Protecting and Vaccinating Deep Learning with JPEG Compression. Das et al., 2017.
- Countering Adversarial Images using Input Transformations. Guo et al., 2017.
- Defending against Adversarial Images using Basis Functions Transformations. Shaham et al., 2018.
- Explaining and Harnessing Adversarial Examples. Goodfellow et al., 2014. (Adversarial Training, AT)
- Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Papernot et al., 2015.
- Towards Deep Learning Models Resistant to Adversarial Attacks. Madry et al., 2017. (AT; training loop sketched after the list)
- Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. Miyato et al., 2017. (VAT)
- Extending Defensive Distillation. Papernot et al., 2017.
- Ensemble Adversarial Training: Attacks and Defenses. Tramèr et al., 2017. (ImageNet)
- Mitigating Adversarial Effects Through Randomization. Xie et al., 2017.
- Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients. Ross et al., 2017.
- Towards Robust Neural Networks via Random Self-ensemble. Liu et al., 2017.
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. Athalye et al., 2018. (ICML 2018 best paper)
- Adversarial Logit Pairing. Kannan et al., 2018. (ALP, ImageNet)
- Curriculum Adversarial Training. Cai et al., 2018. (CAT)
- Improved robustness to adversarial examples using Lipschitz regularization of the loss. Finlay et al., 2018.
- Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network. Liu et al., 2018.
- Feature Denoising for Improving Adversarial Robustness. Xie et al., 2018. (CAAD 2018 1st, ImageNet)
- Theoretically Principled Trade-off between Robustness and Accuracy. Zhang et al., 2019. (TRADES)
- Defensive Quantization: When Efficiency Meets Robustness. Lin et al., 2019.
- Beyond Adversarial Training: Min-Max Optimization in Adversarial Attack and Defense. Wang et al., 2019. (Generalized Adversarial Training, GAT)
- Unlabeled Data Improves Adversarial Robustness. Carmon et al., 2019.
- Are Labels Required for Improving Adversarial Robustness? Uesato et al., 2019.
- Adversarially Robust Generalization Just Requires More Unlabeled Data. Zhai et al., 2019.
- MagNet: a Two-Pronged Defense against Adversarial Examples. Meng et al., 2017.
- Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser. Liao et al., 2017. (HGD)
- Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. Ma et al., 2018. (LID)
- Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. Samangouei et al., 2018.
- ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples. Jia et al., 2018.
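
A few of the methods above are simple enough to sketch in a handful of lines. Below is a minimal PyTorch sketch of FGSM (Goodfellow et al., 2014) and PGD (Madry et al., 2017); the `model`, the `[0, 1]` input range, and the hyperparameter values are illustrative assumptions, not code from the original papers.

```python
# Minimal sketch of FGSM and PGD under an L-infinity budget.
# Assumes `model` is a differentiable classifier returning logits and inputs live in [0, 1].
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step attack: x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Iterated sign-gradient steps with projection back onto the eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv
```

BIM (Kurakin et al., 2016) is essentially the same loop without the random start, and MIM (Dong et al., 2017) adds a momentum term to the accumulated gradient before taking its sign.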
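For the score-based black-box entries (e.g. NES, Ilyas et al., 2018), the key ingredient is a query-only gradient estimate that stands in for the true gradient. A minimal sketch follows, assuming a scalar `loss_fn(x)` built purely from returned scores; the sample count and smoothing sigma are illustrative, not values from the paper.

```python
# Minimal sketch of NES-style gradient estimation for score-based black-box attacks.
# `loss_fn(x)` is assumed to return a scalar loss computed from query access only.
import torch

def nes_gradient(loss_fn, x, n_samples=50, sigma=0.001):
    """Estimate grad_x loss_fn(x) with antithetic Gaussian finite differences."""
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        u = torch.randn_like(x)
        grad += u * loss_fn(x + sigma * u)  # forward sample
        grad -= u * loss_fn(x - sigma * u)  # antithetic sample
    return grad / (2 * sigma * n_samples)
```

The estimate then replaces the true gradient in a PGD-style update; most of the score-based attacks listed above differ mainly in how this estimate is constructed (random search, transfer priors, combinatorial or Bayesian optimization, learned distributions).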
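On the defense side, PGD adversarial training (Madry et al., 2017) is the baseline that many later entries build on: an inner maximization generates adversarial examples on the fly, and the outer minimization trains on them. A minimal sketch reusing the `pgd` function above; the data loader, optimizer, and epsilon are assumed for illustration.

```python
# Minimal sketch of one epoch of PGD adversarial training.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, eps=8 / 255):
    model.train()
    for x, y in loader:
        x_adv = pgd(model, x, y, eps=eps)            # inner maximization: craft adversarial examples
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)      # outer minimization: train on them
        loss.backward()
        optimizer.step()
```

Most of the later training-based defenses in the list (ALP, TRADES, the unlabeled-data papers) keep this loop and change the loss or the data it sees rather than the loop itself.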