# ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models (ICLR 2024)

We propose ECoFLaP, a two-stage, coarse-to-fine weight pruning approach for Large Vision-Language Models (LVLMs). In the coarse stage, we determine the sparsity ratios of the different layers or blocks from a global importance score, which is computed efficiently with a zeroth-order approximation of the global model gradients. In the fine stage, the multimodal model then performs local layer-wise unstructured weight pruning at the given ratios.
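To make the coarse stage concrete, here is a minimal sketch, assuming a PyTorch model and a generic `loss_fn(model, batch)`; all names and the allocation rule are illustrative, not the repository's actual API. It estimates each parameter tensor's sensitivity with forward passes only (zeroth order) and converts the scores into per-layer sparsity ratios whose mean matches a target.

```python
import torch

def zeroth_order_importance(model, loss_fn, batch, eps=1e-3, n_samples=8):
    # Coarse stage (sketch): perturb each parameter tensor along random
    # directions and measure the loss change; no backward pass is needed.
    importance = {}
    for name, p in model.named_parameters():
        est = 0.0
        for _ in range(n_samples):
            z = torch.randn_like(p)
            with torch.no_grad():
                p.add_(eps * z)
                loss_plus = loss_fn(model, batch).item()
                p.sub_(2 * eps * z)
                loss_minus = loss_fn(model, batch).item()
                p.add_(eps * z)  # restore the original weights
            # |central finite difference| ~ |directional derivative of the loss|
            est += abs(loss_plus - loss_minus) / (2 * eps)
        importance[name] = est / n_samples
    return importance

def allocate_sparsities(importance, target=0.5):
    # One simple allocation rule (clamping omitted for brevity): layers with
    # higher importance receive lower sparsity, while the mean stays at target.
    n, total = len(importance), sum(importance.values())
    return {name: target * n * (1 - s / total) / (n - 1)
            for name, s in importance.items()}
```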

We validate our proposed method across various multimodal and unimodal models and datasets, demonstrating significant performance improvements over prevalent pruning techniques in the high-sparsity regime.
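The fine stage can plug in an existing layer-wise pruner at the allocated ratios. For instance, Wanda (used as both a baseline and a local pruner below) scores each weight by its magnitude times the activation norm of its input feature; a minimal sketch with illustrative names and shapes:

```python
import torch

def prune_with_wanda(weight, calib_inputs, sparsity):
    # Wanda score (sketch): |W_ij| * ||X_j||_2, where X_j collects the j-th
    # input feature over a calibration batch of shape (tokens, in_features).
    feat_norm = calib_inputs.norm(p=2, dim=0)   # (in_features,)
    scores = weight.abs() * feat_norm           # (out_features, in_features)

    k = int(weight.shape[1] * sparsity)         # weights to drop per output row
    if k == 0:
        return weight
    # Wanda compares weights within each output row: zero out the k
    # lowest-scored inputs of every output neuron.
    _, idx = torch.topk(scores, k, dim=1, largest=False)
    mask = torch.ones_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    return weight * mask
```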

## Changelog

- [Feb 2024] Checkpoints are added

## Checkpoints

### BLIP-2

Sparsities are all 0.5.

| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
| --- | --- | --- |
| Ckpt | Ckpt | Ckpt |

### FlanT5 XL

| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
| --- | --- | --- |
| Ckpt | Ckpt | Ckpt |

### ViT-B/16

| Wanda | ECoFLaP first-order | ECoFLaP zeroth-order |
| --- | --- | --- |
| Ckpt | Ckpt | Ckpt |

### CLIP

Sparsities are all 0.4.

| Wanda | SparseGPT | ECoFLaP w/ Wanda | ECoFLaP w/ SparseGPT |
| --- | --- | --- | --- |
| Ckpt | Ckpt | Ckpt | Ckpt |

### BLIP

Sparsities are all 0.5.

| Dataset | Wanda | ECoFLaP | ECoFLaP w/ fine-tuning |
| --- | --- | --- | --- |
| VQA | Ckpt | Ckpt | Ckpt |
| NLVR2 | Ckpt | Ckpt | Ckpt |
| Flickr | Ckpt | Ckpt | Ckpt |
| COCO Caption | Ckpt | Ckpt | Ckpt |
Some additional results for BLIP models:

| Method | VQA (test-dev) | Flickr30k (TR@1/IR@1) | NLVR2 (val/test) | COCO Cap. (CIDEr/SPICE) |
| --- | --- | --- | --- | --- |
| Full model | 77.4 | 96.8/86.9 | 82.3/83.6 | 133.3/23.8 |
| Wanda (w/o fine-tuning) | 71.9 | 85.3/72.3 | 78.3/78.1 | 97.1/18.4 |
| ECoFLaP (w/o fine-tuning) | 73.6 | 90.2/79.5 | 79.1/79.2 | 111.0/20.3 |
| UPop (w/ fine-tuning) | 76.3 | 94.0/82.0 | 80.3/81.1 | 128.9/23.3 |
| ECoFLaP (w/ fine-tuning) | 76.7 | 96.8/85.6 | 81.8/82.5 | 132.3/23.8 |

## BLIP-2, FlanT5, and ViT experiment scripts

The main code for this part is in `LAVIS/`. Run all of the following commands from within `LAVIS/` (`cd LAVIS/`).

### Installation

```bash
pip install -e .
```

### Dataset

Follow the scripts in `lavis/datasets/download_scripts/` to download the datasets.

### BLIP-2 Scripts

```bash
## BLIP-2 experiments

# ECoFLaP - zeroth order
python scripts/blip2/ecoflap_zeroth.py 0 12341

# ECoFLaP - first order
python scripts/blip2/ecoflap_first.py 0 12341

# Wanda
python scripts/blip2/wanda.py 0 12341

# SparseGPT
python scripts/blip2/sparsegpt.py 0 12341
```

### ViT Scripts

```bash
# ECoFLaP - zeroth order
python scripts/eva_clip/ecoflap.py 0 12341

# Wanda
python scripts/eva_clip/wanda.py 0 12341
```

### FlanT5 Scripts

```bash
### Generate the pruned checkpoint

# ECoFLaP - zeroth order
python scripts/t5/ecoflap.py 0 12341

### Do the five-shot evaluation

# go to the mmlu_eval folder
cd ../mmlu_eval

# Make sure to assign pruned_checkpoint to the checkpoint generated in the previous step
bash test.sh
```
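If you want to inspect the pruned model outside of `test.sh`, a loading snippet might look like the following; the checkpoint path and file format are assumptions about what `scripts/t5/ecoflap.py` saves, not a documented interface of this repository.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Hypothetical: instantiate the matching architecture, then overwrite its
# weights with the pruned state dict produced in the previous step.
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl")
state_dict = torch.load("path/to/pruned_checkpoint.pth", map_location="cpu")
model.load_state_dict(state_dict)

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
```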

## CLIP experiments

The main code for this part is in `CoOp/`. Run all of the following commands from within `CoOp/` (`cd CoOp/`).

### Installation

```bash
pip install -r requirements.txt
```

### Dataset

Follow the instructions in `DATASETS.md` to download the datasets.

### Scripts

```bash
# Wanda and ECoFLaP (w/ Wanda)
bash scripts/coop/ecoflap_wanda.sh

# SparseGPT and ECoFLaP (w/ SparseGPT)
bash scripts/coop/ecoflap_sparsegpt.sh
```

## BLIP experiments to compare with UPop

The main code for this part is in `UPop/`. Run all of the following commands from within `UPop/` (`cd UPop/`).

### Installation

```bash
pip install -r requirements.txt
```

### Dataset

Follow the instructions in `README.md` to download the datasets.

### Scripts

```bash
### task = coco | flickr | nlvr2 | vqa

# Wanda
bash ecoflap_scripts/${task}/wanda.sh

# ECoFLaP
bash ecoflap_scripts/${task}/ecoflap.sh

# Fine-tune the pruned checkpoint obtained by ECoFLaP
bash ecoflap_scripts/${task}/ecoflap_finetuning.sh
```

## LLaMA experiments

The main code for this part is in `LLaMA/`. Run all of the following commands from within `LLaMA/` (`cd LLaMA/`).

### Installation

Follow the instructions in `Install.md`.

### Scripts

I removed `--cache_dir`, so the program reads models from the cache stored in `$HF_HOME` (if set) or from the default Hugging Face cache directory; a short sketch of this lookup follows the command below.

```bash
# ECoFLaP
bash scripts/ecoflap_zero.sh 0
```
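For reference, the lookup roughly follows this precedence (a simplified sketch of Hugging Face's default behavior, not code from this repository):

```python
import os

# HF_HOME, when set, overrides the default cache root ~/.cache/huggingface.
cache_root = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
print(cache_root)  # downloaded models are stored under this directory
```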

## Bibtex

```bibtex
@inproceedings{Sung2024ECoFLaP,
    author    = {Yi-Lin Sung and Jaehong Yoon and Mohit Bansal},
    title     = {ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models},
    booktitle = {International Conference on Learning Representations (ICLR)},
    year      = {2024},
}
```
