GitHub - uvavision/TV-GZSL: On the Transferability of Visual Features in Generalized Zero-Shot Learning Toolkit

A Toolkit for large scale analysis of Visual features and
Generalized Zero-Shot Learning (GZSL) methods

Data Setup

Download all files located in this folder.
- You can also download only the features from the backbones you want to use in your experiments.
- If you want to use the traditional RN101 and fine-tuned RN101 features for each dataset, you can download this folder only. Just make sure it is inside the data directory you just created.

Features and Methods

Datasets	Backbone Types	GZSL Families
CUB	CNN	Embedding-based
SUN	ViT	Generative-based
AWA2	MLP-Mixer	Disentanglement-based

FAQ

The code is based on the original authors' implementations, including their seed and hyperparameter selections.
Use this codebase to reproduce the results we report.
Run the command below to reproduce the CADA-VAE results on CUB using the RN101 features:

CUDA_VISIBLE_DEVICES=0 python main.py --method CADA --dataset CUB --feature_backbone resnet101

If you want to use the fine-tuned features, add the --finetuned_features flag:

CUDA_VISIBLE_DEVICES=0 python main.py --method CADA --dataset CUB --feature_backbone resnet101 --finetuned_features

If you want to use a different method and different features, change the method name and the --feature_backbone flag:
- Method name: --method SDGZSL
- Use CLIP ViT-B/32 features: --feature_backbone vit_b32_clip
- Run your code on a different GPU: CUDA_VISIBLE_DEVICES=1

CUDA_VISIBLE_DEVICES=1 python main.py --method SDGZSL --dataset CUB --feature_backbone vit_b32_clip
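When several method and backbone combinations need to be run in a row, the commands above can be generated from a short script. The following is a minimal sketch, not part of the toolkit: `build_command` and `run_sweep` are hypothetical helper names, and only the flags shown in the examples above are used. It defaults to a dry run that prints the commands instead of launching them.

```python
import itertools
import os
import subprocess


def build_command(method, dataset, backbone, finetuned=False):
    """Return the argument list for one main.py run, mirroring the commands above."""
    cmd = ["python", "main.py",
           "--method", method,
           "--dataset", dataset,
           "--feature_backbone", backbone]
    if finetuned:
        cmd.append("--finetuned_features")
    return cmd


def run_sweep(methods, backbones, dataset="CUB", gpu="0", dry_run=True):
    """Build one command per (method, backbone) pair and optionally run each one."""
    # Same effect as prefixing the command with CUDA_VISIBLE_DEVICES=<gpu>.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = [build_command(m, dataset, b)
                for m, b in itertools.product(methods, backbones)]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Must be launched from the directory that contains main.py.
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    run_sweep(["CADA", "SDGZSL"], ["resnet101", "vit_b32_clip"])
```

Passing `dry_run=False` launches the runs one after another on the selected GPU and stops at the first failing run because of `check=True`.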

Available Parameters

Everything you need to run is in main.py. The Wrapper class contains all the main functions to create the model, prepare the dataset, and train your model. The arguments you pass are handled by the Wrapper.
Please pay special attention to the --feature_backbone parameter so that you load the pre-computed features you are looking for!

usage: main.py [-h] [--dataset DATASET]
               [--feature_backbone {resnet101,resnet152,resnet50,resnet50_moco,googlenet,vgg16,alexnet,shufflenet,vit,vit_large,adv_inception_v3,inception_v3,resnet50_clip,resnet101_clip,resnet50x4_clip,resnet50x16_clip,resnet50x64_clip,vit_b32_clip,vit_b16_clip,vit_l14_clip,virtex,virtex2,mlp_mixer,mlp_mixer_l16,vit_base_21k,vit_large_21k,vit_huge,deit_base,dino_vitb16,dino_resnet50,biggan_138k_128size,biggan_100k_224size,vq_vae_fromScratch,soho,combinedv1,combinedv2,vit_l14_clip_finetune_v2,vit_l14_clip_finetune_classAndAtt,vit_l14_clip_finetune_class200Epochs,vit_l14_clip_finetune_trainsetAndgenerated_100Epochs,vit_l14_clip_finetune_trainsetAndgenerated_200Epochs,vit_l14_clip_finetuned_classAndAtt_200Epochs,vit_l14_clip_finetuned_setAndgenerated_classAndAtt_100Epochs,vit_l14_clip_finetuned_setAndgenerated_classAndAtt_200Epochs,clip_l14_finetune_classes_200epochs,clip_l14_finetun_atts_200epochs,clip_l14_finetun_atts_200epochs,clip_l14_finetune_classes_200epochs_frozenAllExc1Layer,clip_l14_finetun_atts_200epochs_frozenAllExc1Layer,clip_l14_finetune_classAndAtt_200epochs_frozenAllExc1Layer,clip_l14_finetune_classes_200epochs_frozenTextE,clip_l14_finetun_atts_200epochs_frozenTextE,clip_l14_finetune_classAndAtt_200epochs_frozenTextE,clip_l14_finetun_atts_fromMAT_200epochs,clip_l14_finetun_classAndatts_fromMAT_200epochs,clip_l14_finetun_class_fromMAT_200epochs,vit_large_finetune_classes_200epochs}]
               [--methods {DEVISE,ESZSL,ALE,CADA,tfVAEGAN,CE,SDGZSL,FREE,UPPER_BOUND}]
               [--finetuned_features] [--data_path DATA_PATH]
               [--workers WORKERS] [--dropout DO] [--optimizer OPTIMIZER]
               [--epochs N] [--start_epoch N] [-b N] [--lr LR]
               [--initial_lr LR] [--lr_rampup EPOCHS]
               [--lr_rampdown_epochs EPOCHS] [--momentum M] [--nesterov]
               [--weight-decay W] [--doParallel] [--print_freq N]
               [--root_dir ROOT_DIR] [--add_name ADD_NAME] [--exp_dir EXP_DIR]
               [--load_from_epoch LOAD_FROM_EPOCH] [--seed SEED]
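The usage text lists the option as --methods, while the example commands in this README pass --method. The usage format suggests a standard argparse parser, and argparse accepts any unambiguous prefix of a long option by default, so both spellings would select the same option. The sketch below illustrates this under that assumption. It is a reduced stand-in, not the actual contents of utils/general_config.py: only a few options are declared, the backbone choices are a small subset, and all default values are placeholders.

```python
import argparse


def build_parser():
    """Reduced, hypothetical parser covering a few of the options listed above."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--dataset", default="CUB")  # placeholder default
    parser.add_argument(
        "--feature_backbone",
        default="resnet101",  # placeholder default
        choices=["resnet101", "vit_b32_clip", "vit_l14_clip", "mlp_mixer"],  # subset only
    )
    parser.add_argument(
        "--methods",
        default="CADA",  # placeholder default
        choices=["DEVISE", "ESZSL", "ALE", "CADA", "tfVAEGAN",
                 "CE", "SDGZSL", "FREE", "UPPER_BOUND"],
    )
    parser.add_argument("--finetuned_features", action="store_true")
    return parser


if __name__ == "__main__":
    # "--method" is an unambiguous prefix of "--methods", so argparse accepts it.
    args = build_parser().parse_args(
        ["--method", "SDGZSL", "--dataset", "CUB", "--feature_backbone", "vit_b32_clip"]
    )
    print(args.methods, args.feature_backbone, args.finetuned_features)
```

If a parser is created with `allow_abbrev=False`, the prefix form is rejected and the full --methods spelling is required.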

Original Method Repositories: - please cite all of them accordingly!

Finetuning

Download the dataset images and annotations:

Unzip them in a data folder inside the finetune folder:

~$ cd anonymized_code/finetune/
~$ mkdir data
~$ tar -xvf [filename]

Finetune:
- Dataloaders: sample code
- Unimodal Backbones: sample code
- CLIP: sample code

How to: Adding New Methods

You can add a new method under the methods folder. You then only need to modify the utils/general_config.py and wrapper.py files to reference your new method:

- Add your method name to the all_methods array in utils/general_config.py, which holds the choices of the methods argument.
- In wrapper.py, handle the new option when initializing the Wrapper class.
- To support all available features in your custom method, import the data loader: from utils.cada_dataloader import DATA_LOADER
- To reuse the final classifier for Generative-based and Disentanglement-based methods, you can use the LINEAR_LOGSOFTMAX class inside wrapper.py.
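The first two steps above can be pictured with a toy example. This is a minimal sketch under stated assumptions, not the toolkit's code: the `all_methods` list, the `Wrapper` class, and the `parse_args` helper here are simplified stand-ins for what lives in utils/general_config.py and wrapper.py, and `MY_METHOD` and `MyMethod` are hypothetical names for the method being added.

```python
import argparse

# Step 1 (assumed layout): the list that backs the choices of the methods
# argument, standing in for all_methods in utils/general_config.py.
all_methods = ["DEVISE", "ESZSL", "ALE", "CADA", "tfVAEGAN",
               "CE", "SDGZSL", "FREE", "UPPER_BOUND"]
all_methods.append("MY_METHOD")  # hypothetical new method name


class MyMethod:
    """Hypothetical stand-in for a class added under the methods folder."""

    def __init__(self, args):
        self.dataset = args.dataset

    def train(self):
        return f"MY_METHOD would train on {self.dataset} here"


class Wrapper:
    """Toy wrapper that selects the model from the parsed method name."""

    def __init__(self, args):
        # Step 2: add a branch for the new option.
        if args.methods == "MY_METHOD":
            self.model = MyMethod(args)
        else:
            raise NotImplementedError(f"{args.methods} is not wired into this sketch")


def parse_args(argv):
    """Parse only the two options this sketch needs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--methods", choices=all_methods, default="CADA")
    parser.add_argument("--dataset", default="CUB")
    return parser.parse_args(argv)


if __name__ == "__main__":
    wrapper = Wrapper(parse_args(["--methods", "MY_METHOD", "--dataset", "CUB"]))
    print(wrapper.model.train())
```

Keeping the method names in a single list means the command-line choices and the dispatch in the wrapper stay in sync when a method is added.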

Updates

✅ All 54 visual features for all datasets are available here! ⭐
✅ Initial codebase is now available! ⏫
🔲 Please expect regular updates and commits to this repo.

On the Transferability of Visual Features in Generalized Zero-Shot Learning (GZSL) :: TV-GZSL

About

Our work provides a comprehensive benchmark for Generalized Zero-Shot Learning (GZSL). We extensively benchmark the utility of different GZSL methods, which we characterize as embedding-based, generative-based, and based on semantic disentanglement. We particularly investigate how these previous methods for GZSL fare against CLIP, a more recent large-scale pretrained model that claims zero-shot performance by means of being trained with internet-scale multimodal data. Our findings indicate that through prompt engineering over an off-the-shelf CLIP model, it is possible to surpass all previous methods on standard benchmarks for GZSL: CUB (birds), SUN (scenes), and AWA2 (animals). While it is possible that CLIP has actually seen many of the unseen categories in these benchmarks, we also show that GZSL methods in combination with the feature backbones obtained through CLIP contrastive pretraining (e.g. ViT-L/14) still provide advantages in standard GZSL benchmarks over off-the-shelf CLIP with prompt engineering. In summary, some GZSL methods designed to transfer information from seen categories to unseen categories still provide valuable gains when paired with a comparable feature backbone such as the one in CLIP. Surprisingly, we find that generative-based GZSL methods provide more advantages compared to more recent methods based on semantic disentanglement. We release a well-documented codebase which both replicates our findings and provides a modular framework for analyzing representation learning issues in GZSL.

