Source: *Emerging Properties in Self-Supervised Vision Transformers* (Caron et al., 2021).
This repository implements DINO (self-distillation with no labels) using PyTorch Lightning.
It is part of my broader goal of implementing DINOv2 and building foundation-level vision models without labels.
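At its core, DINO trains a student network to match the output distribution of an EMA teacher over different crops of the same image, with the teacher output centered and sharpened to avoid collapse. The snippet below is a minimal sketch of that loss in PyTorch, not the exact module used in this repository; its default values mirror `k_dim`, `student_temp`, the teacher temperature, and `center_momentum` from the pre-training config further down.

```python
import torch
import torch.nn.functional as F

class DINOLoss(torch.nn.Module):
    """Minimal sketch of the DINO self-distillation loss (centering + sharpening)."""

    def __init__(self, out_dim=65536, student_temp=0.1, teacher_temp=0.04, center_momentum=0.9):
        super().__init__()
        self.student_temp = student_temp
        self.teacher_temp = teacher_temp
        self.center_momentum = center_momentum
        # Running center of teacher outputs, used to avoid collapse.
        self.register_buffer("center", torch.zeros(1, out_dim))

    def forward(self, student_out, teacher_out):
        # student_out: logits for all (global + local) crops, list of [B, out_dim] tensors,
        #              where the first entries correspond to the global crops.
        # teacher_out: logits for the global crops only, list of [B, out_dim] tensors.
        student_logp = [F.log_softmax(s / self.student_temp, dim=-1) for s in student_out]
        teacher_p = [F.softmax((t - self.center) / self.teacher_temp, dim=-1).detach()
                     for t in teacher_out]

        total_loss, n_terms = 0.0, 0
        for ti, t in enumerate(teacher_p):
            for si, s in enumerate(student_logp):
                if si == ti:  # skip pairs where student and teacher see the same view
                    continue
                total_loss += torch.sum(-t * s, dim=-1).mean()
                n_terms += 1
        self.update_center(torch.cat(teacher_out))
        return total_loss / n_terms

    @torch.no_grad()
    def update_center(self, teacher_out):
        # EMA update of the center with momentum `center_momentum`.
        batch_center = teacher_out.mean(dim=0, keepdim=True)
        self.center = self.center * self.center_momentum + batch_center * (1 - self.center_momentum)
```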
- Self-supervised Pre-training: Supports pre-training on the ImageNet-1k dataset, available on Hugging Face.
- Linear Probing: For ImageNet-1k, CIFAR-10, and CIFAR-100.
- Attention Visualization: Visualize multi-head self-attention maps on input images.
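For context, one common way to obtain such maps from a ViT is to capture the tokens entering the last block's attention layer and recompute the attention weights of the `[CLS]` token. The sketch below assumes a timm-style ViT-S/16 (a `qkv` projection and `num_heads` exposed on the attention module); the repository's own visualization code may be organized differently.

```python
import torch
import timm

model = timm.create_model("vit_small_patch16_224", pretrained=False).eval()
attn_module = model.blocks[-1].attn  # attention layer of the last transformer block

@torch.no_grad()
def cls_attention_maps(img):
    """img: [1, 3, 224, 224] -> per-head [CLS] attention maps of shape [heads, 14, 14]."""
    captured = {}

    def hook(_module, inputs, _output):
        captured["tokens"] = inputs[0]  # tokens fed to the qkv projection: [1, N, D]

    handle = attn_module.qkv.register_forward_hook(hook)
    model(img)
    handle.remove()

    x = captured["tokens"]
    B, N, D = x.shape
    h = attn_module.num_heads
    head_dim = D // h
    qkv = attn_module.qkv(x).reshape(B, N, 3, h, head_dim).permute(2, 0, 3, 1, 4)
    q, k = qkv[0], qkv[1]
    attn = ((q @ k.transpose(-2, -1)) * head_dim ** -0.5).softmax(dim=-1)  # [1, heads, N, N]
    cls_attn = attn[0, :, 0, 1:]        # [CLS] token's attention over the 196 patch tokens
    return cls_attn.reshape(h, 14, 14)  # one 14x14 map per head for ViT-S/16 at 224x224
```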
| Dataset | Loss | Accuracy |
|---|---|---|
| CIFAR-10 | 0.2640 | 90.09% |
| CIFAR-100 | 0.8897 | 74.34% |
| ImageNet-1k | 0.8897 | 71.25% |
```bash
pip install -r requirements.txt
```
To download ImageNet-1k before pre-training, create a `.env` file using `.env.example` as a template and enter your Hugging Face token:

```
HF_TOKEN=YOUR_HF_TOKEN
```
Once completed, enter the `src` directory and run:

```bash
python get_imagenet.py
```
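For reference, a download script along these lines can be built with `python-dotenv` and the Hugging Face `datasets` library. This is only a sketch and the actual `get_imagenet.py` may differ; `imagenet-1k` is a gated dataset, so the token is required and its license must be accepted on the Hub first.

```python
import os

from datasets import load_dataset
from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN from the .env file

# Download the gated ImageNet-1k dataset from the Hugging Face Hub.
# Older versions of `datasets` take `use_auth_token=` instead of `token=`.
dataset = load_dataset("imagenet-1k", token=os.environ["HF_TOKEN"])
print(dataset)
```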
Configure pre-training through `pre-train.yaml`, found under the `src/configs` directory. The configuration used in my experiments is shown below:
```yaml
# network
backbone: vit-s-16
mlp_layers: 3
hidden_dim: 2048
bottleneck_dim: 256
k_dim: 65536

# ema teacher momentum
base_teacher_momentum: 0.996
final_teacher_momentum: 1.000

# weight decay
base_weight_decay: 0.04
final_weight_decay: 0.4

# learning rate
warmup_epochs_lr: 10
warmup_start_lr: 0.0
final_lr: 1.0e-6

# temperatures
student_temp: 0.1
warmup_teacher_epochs: 0
warmup_teacher_temp: 0.04
final_teacher_temp: 0.04

# cropping
global_scale_min: 0.4
global_scale_max: 1.0
local_scale_min: 0.05
local_scale_max: 0.4
num_local_crops: 10

# others
batch_size: 1024
center_momentum: 0.9
seed: 42
epochs: 100
experiment_num: 0
```
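Several of these values are endpoints of schedules rather than constants: teacher momentum ramps from `base_teacher_momentum` to `final_teacher_momentum`, weight decay from `base_weight_decay` to `final_weight_decay`, and the learning rate warms up linearly for `warmup_epochs_lr` epochs before decaying to `final_lr`. A generic per-step scheduler along the lines of the reference DINO implementation could look like the sketch below; the step count per epoch is only an estimate for ImageNet-1k at batch size 1024.

```python
import numpy as np

def cosine_schedule(base_value, final_value, epochs, steps_per_epoch,
                    warmup_epochs=0, warmup_start_value=0.0):
    """Per-step schedule: optional linear warmup followed by a cosine ramp to final_value."""
    warmup_steps = warmup_epochs * steps_per_epoch
    warmup = np.linspace(warmup_start_value, base_value, warmup_steps)

    steps = np.arange(epochs * steps_per_epoch - warmup_steps)
    cosine = final_value + 0.5 * (base_value - final_value) * (1 + np.cos(np.pi * steps / len(steps)))
    return np.concatenate((warmup, cosine))

steps_per_epoch = 1_281_167 // 1024  # ~1251 steps per epoch on ImageNet-1k at batch size 1024
momentum_schedule = cosine_schedule(0.996, 1.000, epochs=100, steps_per_epoch=steps_per_epoch)
wd_schedule = cosine_schedule(0.04, 0.4, epochs=100, steps_per_epoch=steps_per_epoch)
```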
Configure the finetuning script through `finetune.yaml`, which is also found under the `src/configs` directory. The configuration used in my experiments is shown below:
```yaml
backbone: vit-s-16
seed: 42
epochs: 100
lr: 1.0e-4
eta_min: 1.0e-6
batch_size: 8
weight_decay: 1.0e-5
experiment_num: 0
dataset: cifar-10
```
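For orientation, these values are typically wired into a frozen-backbone linear probe with a cosine-annealed learning rate. The sketch below is one plausible arrangement, not the exact code in `finetune.py`: the use of AdamW is an assumption (the config does not name the optimizer), and the actual script may structure this inside a LightningModule.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_linear_probe(backbone, embed_dim=384, num_classes=10):
    """Freeze the pre-trained ViT-S/16 encoder and train only a linear head on top."""
    for p in backbone.parameters():
        p.requires_grad = False

    head = torch.nn.Linear(embed_dim, num_classes)  # ViT-S/16 features are 384-d

    # lr, weight_decay, eta_min, and epochs mirror finetune.yaml above.
    optimizer = AdamW(head.parameters(), lr=1.0e-4, weight_decay=1.0e-5)
    scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1.0e-6)
    return head, optimizer, scheduler
```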
To pre-train and finetune the encoders, run the following from within the `src` directory:
```bash
# self-supervised pre-training
python pre_train.py

# finetuning
python finetune.py
```
- Implement DINO for self-supervised learning.
- Embedding visualization.
- Linear probe evaluation for CIFAR datasets.
- Linear probe evaluation for ImageNet dataset.
- Sync BatchNorm for Multi-GPU ResNet-50 pre-training.
- KNN evaluation on CIFAR datasets.
- KNN evaluation on ImageNet dataset.
- Full fine-tuning evaluation.