This project is for the paper: Enhanced Regularizers for Attributional Robustness. Some codes are from Robust Attribution Regularization, CIFAR10 Challenge, tflearn oxflower17 and Interpretation of Neural Network is Fragile.
This project is to solve an emerging problem in trustworthy machine learning: train models that produce robust interpretations for their predictions. See the example below:
It is tested under Ubuntu Linux 16.04.1 and Python 3.6 environment, and requries some packages to be installed:
- MNIST: included in Tensorflow.
- Fashion-MNIST: can be loaded via Tensorflow.
- GTSRB: we provide scripts to download it.
- Flower: we provide scripts to download it.
- Before doing experiments, first edit config.json file to specify experiment settings. We provide some template config files like IG-NORM-config.json. You may need to run
mkdir models
before training models. - For GTSRB and Flower, you can run prepare_data.sh to get dataset.
- train_nat.py: the script used to train NATURAL models.
- train_adv.py: the script used to train Madry's models.
- train_our.py: the script used to train our models(IG-NORM or IG-SUM-NORM).
- eval_pgd_attack.py: the script used to evaluate NA and AA of the model.
- eval_attribution_attack.py: the script used to evaluate IN and CO of the model.
- test_ig_attack.ipynb: ipython notebook used to present demo figures.
Model configuration:
model_dir
: contains the path to the directory of the currently trained/evaluated model.
Data configuration:
data_path
: contains the path to the directory of dataset.
Training configuration:
tf_random_seed
: the seed for the RNG used to initialize the network weights.numpy_random_seed
: the seed for the RNG used to pass over the dataset in random order.max_num_training_steps
: the number of training steps.num_output_steps
: the number of training steps between printing progress in standard output.num_summary_steps
: the number of training steps between storing tensorboard summaries.num_checkpoint_steps
: the number of training steps between storing model checkpoints.training_batch_size
: the size of the training batch.step_size_schedule
: learning rate schedule array.weight_decay
: weight decay rate.momentum
: momentum rate.m
: m in the gradient step.continue_train
: whether continue previous training. Should be True or False.lambda
: lambda in IG-NORM or beta in IG-SUM-NORM.approx_factor
: (m / approx_factor) = (m in the attack step).training_objective
: 'ar' for IG-NORM and 'adv_ar' for IG-SUM-NORM.
Evaluation configuration:
num_eval_examples
: the number of examples to evaluate the model on.eval_batch_size
: the size of the evaluation batches.
Adversarial examples configuration:
epsilon
: the maximum allowed perturbation per pixel.num_steps
ork
: the number of PGD iterations used by the adversary.step_size
ora
: the size of the PGD adversary steps.random_start
: specifies whether the adversary will start iterating from the natural example or a random perturbation of it.loss_func
: the loss function used to run pgd on.xent
corresponds to the standard cross-entropy loss,cw
corresponds to the loss function of Carlini and Wagner,ar_approx
corresponds to the regularization term of our IG-NORM objective,adv_ar_approx
corresponds to our IG-SUM-NORM objective.
Integrated gradient configuration:
num_IG_steps
: the number of segments for summation appproximation of IG.
Attribution robustness configuration:
attribution_attack_method
: can berandom
,topK
,mass_center
andtarget
.attribution_attack_measure
: can bekendall
,intersection
,spearman
andmass_center
.saliency_type
: can beig
orsimple_gradient
.k_top
: the k used for topK attack.eval_k_top
: the k used for evaluation metric -- TopK intersection.attribution_attack_step_size
: step size of attribution attack.attribution_attack_steps
: the number of iterations used by the attack.attribution_attack_times
: the number of iterations to repeat the attack.
To-do:
- Codebase for other datasets are coming soon.
Please cite our work if you use the codebase:
Please refer to the LICENSE.