Official repository for the paper Faithful Vision-Language Interpretation via Concept Bottleneck Models (ICLR 2024).
Authors: Songning Lai, Lijie Hu, Junxiao Wang, Laure Berti-Equille, Di Wang
An overview of our pipeline for creating FVLC. The concept set from GPT-3 is encoded by CLIP to obtain E_T; the image is processed by the backbone and the CLIP image encoder. The activation matrix M and the mappings g, W_c, and W_F are learned. The losses L₁–L₄ are used to obtain the faithful mapping W̃_c and g̃(x). Noise is introduced into the concept set, text encoder, and input to validate faithfulness.
Places365 example: input image, concept output without perturbation, and concept output after concept-word and input perturbations. Concept positions and ranks change (e.g. the concept “surgery”); the prediction also changes under slight perturbations.
The demand for transparency in high-stakes domains such as healthcare and finance has driven interest in interpretable machine learning (IML) models, notably Concept Bottleneck Models (CBMs), valued for combining strong performance with human-understandable insight into deep neural networks. However, CBMs' reliance on manually annotated concept data poses challenges. Label-free CBMs have emerged to address this, but they remain unstable under perturbations, which undermines their faithfulness as explanatory tools.
We introduce a formal definition of a Faithful Vision-Language Concept (FVLC) and a methodology for constructing an FVLC that satisfies four critical properties. Our experiments on four benchmark datasets (CIFAR-10, CIFAR-100, CUB, Places365) demonstrate that FVLC outperforms baselines in stability under input and concept-set perturbations (WP1, WP2, IP), with minimal accuracy degradation compared to the vanilla Label-free CBM.
- Four properties of a faithful concept: (i) significant top-k overlap for interpretability; (ii) stability of the concept vector under noise and concept-set perturbations; (iii) prediction distribution close to vanilla CBM; (iv) stable output distribution under perturbations.
- Objective: Min-max formulation (Eq. 7 in the paper) with losses L₁–L₄ (overlap, concept stability, prediction closeness, output stability). We use PGD to find worst-case perturbations δ (input) and ρ (concept words), then update the projection layer.
- Metrics: TCPC (Top Concept Prediction Change) and TOPC (Top Overlap Prediction Change); lower is more stable.
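The inner maximization above can be sketched as a standard PGD loop on the input (the ε, step size, number of steps, and the model/loss interface are illustrative assumptions, not the repo's exact API in attack.py):

```python
import torch

def pgd_input_perturbation(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=7):
    """Find a worst-case input perturbation delta via projected gradient descent."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient ascent step, then project back into the L-inf ball of radius eps.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```

In the paper's min-max objective, a loop of this shape supplies the worst-case δ (and analogously ρ on the concept-word embeddings) before each update of the projection layer.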
Accuracy (%) under different perturbation settings:

| Method | CIFAR10 | CIFAR100 | CUB | Places365 |
|---|---|---|---|---|
| Standard (no interpretability) | 88.80 | 70.10 | 76.70 | 48.56 |
| P-CBM (CLIP) | 84.50 | 56.00 | N/A | N/A |
| Label-free CBM | 86.32 | 65.42 | 74.23 | 43.63 |
| WP1(10%) – base | 86.25 | 65.09 | 73.97 | 43.67 |
| WP1(10%) – FVLC | 86.39 | 64.90 | 73.92 | 43.62 |
| WP2 – base | 86.41 | 65.16 | 73.96 | 43.54 |
| WP2 – FVLC | 86.22 | 65.34 | 74.44 | 44.55 |
| IP – base | 86.62 | 65.36 | 74.39 | 43.64 |
| IP – FVLC | 86.88 | 65.29 | 74.01 | 43.71 |
FVLC keeps accuracy on par or better while greatly improving stability (see Table 2).
Stability, reported as TCPC / TOPC (lower is better; Table 2):

| Method | CIFAR10 | CIFAR100 | CUB | Places365 |
|---|---|---|---|---|
| WP1(10%) – base | 1.99E-01 / 8.36E-02 | 1.94E-01 / 1.31E-01 | 2.32E-01 / 3.41E-01 | 2.26E-01 / 1.14E-01 |
| WP1(10%) – FVLC | 1.19E-03 / 7.40E-03 | 3.67E-03 / 4.55E-03 | 1.19E-02 / 1.53E-03 | 1.39E-03 / 1.25E-03 |
| WP2 – base | 1.53E-01 / 4.99E-02 | 1.36E-01 / 6.67E-02 | 1.43E-01 / 1.73E-01 | 1.40E-01 / 6.37E-02 |
| WP2 – FVLC | 1.10E-02 / 8.72E-03 | 3.35E-03 / 4.55E-03 | 1.05E-02 / 1.53E-03 | 1.55E-03 / 1.29E-03 |
| IP – base | 1.68E-01 / 6.28E-02 | 1.38E-01 / 8.81E-02 | 1.71E-01 / 2.23E-01 | 1.73E-01 / 8.09E-02 |
| IP – FVLC | 8.02E-03 / 8.29E-03 | 3.24E-03 / 4.56E-03 | 1.04E-02 / 1.59E-03 | 1.50E-03 / 1.25E-03 |
Ablation shows that L₂, L₃, and L₄ all contribute; using all three (✓✓✓) gives the best TCPC/TOPC.
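To illustrate the kind of quantities TCPC/TOPC capture (the exact definitions are in metric.py and the paper; this sketch only shows a top-k concept overlap and a prediction-change rate under assumed tensor shapes):

```python
import torch

def topk_overlap(c_clean, c_pert, k=5):
    """Fraction of top-k concepts shared before/after perturbation (1.0 = identical)."""
    top_clean = c_clean.topk(k, dim=-1).indices
    top_pert = c_pert.topk(k, dim=-1).indices
    overlaps = []
    for a, b in zip(top_clean, top_pert):
        overlaps.append(len(set(a.tolist()) & set(b.tolist())) / k)
    return sum(overlaps) / len(overlaps)

def prediction_change_rate(logits_clean, logits_pert):
    """Fraction of samples whose predicted class flips under perturbation."""
    return (logits_clean.argmax(-1) != logits_pert.argmax(-1)).float().mean().item()
```

A faithful model should keep the top-k concept sets largely overlapping and the change rate near zero under WP1/WP2/IP, which is what the small FVLC values in the table reflect.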
Concept weights and final-layer weights on one sample per dataset: input x, concept without perturbation (c), with perturbation (c+δ), and optimized with perturbation (c̃+δ).
# Clone this repository
git clone https://github.com/xll0328/FVLC.git
cd FVLC
# Create environment and install dependencies
pip install -r requirements.txt

- Python: 3.8+
- PyTorch: with CUDA if available.
- CLIP: used for text/image encoders; concept sets are under data/concept_sets/.
bash download_cub.sh # CUB-200-2011
bash download_models.sh # pretrained backbone / CLIP
bash download_rn18_places.sh # ResNet-18 for Places365

- Train Label-free CBM (base):
python train_cbm.py --dataset cifar10
(See train_cbm.py for --concept_set, --backbone, --clip_name, etc.)
- Train FVLC (with faithfulness losses):
python train_fcbm_all.py --dataset cifar10
or projection-only:
python train_fcbm_projonly.py --dataset cifar10
(Uses PGD for δ and ρ; see attack.py and metric.py for TCPC/TOPC and overlap losses.)
- Evaluation:
python eval.py --dataset cifar10 --model_path <path>
(Adjust --model_path and dataset flags as needed.)
- Attack/perturbation evaluation:
python attack.py (see script args for perturbation types: WP1, WP2, IP).
Example commands are also listed in training_commands.txt.
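A word-level perturbation in the spirit of WP1, replacing a fraction of the concept words with random words from a pool, can be sketched as follows (the function name, replacement pool, and rate are illustrative assumptions, not the repo's exact implementation):

```python
import random

def perturb_concept_set(concepts, pool, rate=0.10, seed=0):
    """Return a copy of the concept list with `rate` of its entries swapped
    for random words from `pool` (WP1-style word perturbation)."""
    rng = random.Random(seed)
    perturbed = list(concepts)
    n_swap = max(1, int(rate * len(perturbed)))
    for i in rng.sample(range(len(perturbed)), n_swap):
        perturbed[i] = rng.choice(pool)
    return perturbed
```

Re-encoding the perturbed concept list with CLIP and comparing concept outputs against the clean run is what the WP1(10%) rows in the tables above measure.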
- train_cbm.py: train base Label-free CBM
- train_fcbm_all.py: train full FVLC (all losses)
- train_fcbm_projonly.py: train FVLC projection layer only
- attack.py: PGD attacker for input/concept perturbations
- metric.py: top-k overlap, TCPC/TOPC, robust losses
- cbm.py: CBM model definition
- data_utils.py, utils.py: data and backbone/CLIP helpers
- data/concept_sets/: concept sets per dataset
- clip/: CLIP model and tokenizer
If you use this code or the paper, please cite:
@inproceedings{lai2023faithful,
title={Faithful Vision-Language Interpretation via Concept Bottleneck Models},
author={Lai, Songning and Hu, Lijie and Wang, Junxiao and Berti-Equille, Laure and Wang, Di},
booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
year={2024}
}

- Paper: OpenReview
- Project page: https://xll0328.github.io/fvlc/


