# DM-KD

This is the official implementation of "Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation?"

## Abstract

In this paper, we extensively study whether and how synthetic images produced from state-of-the-art diffusion models can be used for knowledge distillation without access to real images, and obtain three key conclusions: (1) synthetic data from diffusion models can easily lead to state-of-the-art performance among existing synthesis-based distillation methods, (2) low-fidelity synthetic images are better teaching materials, and (3) relatively weak classifiers are better teachers.

## Framework

We propose to use the diffusion model to synthesize images given the target label space when the real dataset is not available. The student model is optimized to minimize the prediction discrepancy between itself and the teacher model on the synthetic dataset.
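The distillation objective is plain prediction matching between student and teacher on the synthetic images. A minimal sketch of such a loss is shown below; the temperature and reduction choices are illustrative, not taken from this repository.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student predictions.
    The temperature value is illustrative; the paper's setting may differ."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```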

## Main Results

On CIFAR-100:

| Method | Syn Method | T:RN-34<br>S:RN-18 | T:VGG-11<br>S:RN-18 | T:WRN-40-2<br>S:WRN-16-1 | T:WRN-40-2<br>S:WRN-40-1 | T:WRN-40-2<br>S:WRN-16-2 |
|---|---|---|---|---|---|---|
| Teacher | - | 78.05 | 71.32 | 75.83 | 75.83 | 75.83 |
| Student | - | 77.10 | 77.10 | 65.31 | 72.19 | 73.56 |
| KD | - | 77.87 | 75.07 | 64.06 | 68.58 | 70.79 |
| DeepInv | Inversion | 61.32 | 54.13 | 53.77 | 61.33 | 61.34 |
| DAFL | Inversion | 74.47 | 54.16 | 20.88 | 42.83 | 43.70 |
| DFQ | Inversion | 77.01 | 66.21 | 51.27 | 54.43 | 64.79 |
| FastDFKD | Inversion | 74.34 | 67.44 | 54.02 | 63.91 | 65.12 |
| One-Image | Augmentation | 74.56 | 68.51 | 34.62 | 52.39 | 54.71 |
| DM-KD(Ours) | Diffusion | 76.58 | 70.83 | 56.29 | 65.01 | 66.89 |

On ImageNet-1K:

| Method | Syn Method | Data Amount | Epoch | T:RN-50/18<br>S:RN-50 | T:RN-50/18<br>S:RN-18 |
|---|---|---|---|---|---|
| Place365+KD | - | 1.8M | 200 | 55.74 | 45.53 |
| BigGAN | GAN | 215K | 90 | 64.00 | - |
| DeepInv | Inversion | 140K | 90 | 68.00 | - |
| FastDFKD | Inversion | 140K | 200 | 68.61 | 53.45 |
| One-Image | Augmentation | 1.28M | 200 | 66.20 | - |
| DM-KD(Ours) | Diffusion | 140K | 200 | 66.74 | 60.10 |
| DM-KD(Ours) | Diffusion | 200K | 200 | 68.63 | 61.61 |
| DM-KD(Ours) | Diffusion | 1.28M | 200 | 72.43 | 68.25 |

On other datasets:

| Datasets | Categories (#Classes) | Teacher<br>T:RN-18 | One-Image<br>S:RN-50 | Ours<br>S:RN-50 |
|---|---|---|---|---|
| ImageNet-100 | Objects (100) | 89.6 | 84.4 | 85.9 |
| Flowers-102 | Flower types (102) | 87.9 | 81.5 | 85.4 |

## Running

  1. Synthesize data using diffusion models.

In this paper, we use DiT as our default image generator. There are two ways to create synthetic data for distillation.

  • You can synthesize your own data following the official DiT implementation. The DiT model is XL/2 and the image resolution is 256×256; a hedged sampling sketch is given after this list.

After the synthesis is complete, put the synthesized data into the syn_dataset folder.

  • We provide offline synthesized datasets for experiments.

| Synthetic Dataset | Data Amount | Size | Download |
|---|---|---|---|
| Low Fidelity (s=2, T=100) | 200K | ~20GB | [Baidu Yun] |
| High Fidelity (s=4, T=250) | 200K | ~20GB | [Baidu Yun] |

Download the synthesized dataset, put it into the syn_dataset folder, and unzip it there.
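For reference, the snippet below sketches class-conditional sampling with the Hugging Face diffusers port of DiT-XL/2 at 256×256. The repository follows the official DiT code, so this is only an illustrative alternative; here `guidance_scale` plays the role of s and `num_inference_steps` the role of T from the table above, and the flat file naming under syn_dataset is an assumption.

```python
import os
import torch
from diffusers import DiTPipeline

# Class-conditional DiT-XL/2 (256x256) via the diffusers port; illustrative only.
pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

os.makedirs("syn_dataset", exist_ok=True)
class_ids = list(range(8))              # a few ImageNet-1K class indices
out = pipe(class_labels=class_ids,
           guidance_scale=2.0,          # low-fidelity setting (s=2)
           num_inference_steps=100)     # T=100
for class_id, img in zip(class_ids, out.images):
    img.save(f"syn_dataset/{class_id:04d}_0.png")  # file naming is an assumption
```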

  2. Download the real ImageNet validation set and put it into the true_dataset folder, then unzip it there. You can download the ImageNet-1K validation data from our link ([Baidu Yun]) or from the official website. A quick evaluation sketch follows below.
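The real validation images are only needed to measure accuracy (of the teacher or of the distilled student). A minimal top-1 evaluation sketch, assuming the archive is unpacked into one subfolder per class under true_dataset:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Top-1 accuracy on the real ImageNet validation set; the class-subfolder
# layout under true_dataset/ is an assumption about how the data is unpacked.
device = "cuda" if torch.cuda.is_available() else "cpu"
transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
loader = DataLoader(ImageFolder("true_dataset", transform=transform),
                    batch_size=256, num_workers=8)

# Any classifier can be plugged in here; a torchvision ResNet-18 is used as an example.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).to(device).eval()
correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"top-1 accuracy: {correct / total:.4f}")
```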

  3. Download the pretrained teacher models and put them into the save/models folder.

| Teacher models | Download |
|---|---|
| ImageNet | [Baidu Yun] |

Note that the pretrained teacher models are also available in torchvision; you can download them from the official torchvision model zoo instead.
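For example, a ResNet-18 teacher can be pulled from torchvision and frozen as follows. The weights enum is the current torchvision API (older releases use pretrained=True), and the checkpoint filename is only an assumption, not the repository's naming.

```python
import os
import torch
import torchvision.models as models

# ImageNet-pretrained ResNet-18 teacher from torchvision, frozen for distillation.
teacher = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Optionally store it where the scripts look for teacher checkpoints
# (the filename below is an assumption).
os.makedirs("save/models", exist_ok=True)
torch.save(teacher.state_dict(), "save/models/resnet18_imagenet.pth")
```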

  4. Training on the 200K synthesized ImageNet-1K dataset. Run the script:

`sh scripts/imagenet_distill.sh`

The training outputs will be saved in the save/student_model folder.
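For orientation, a stripped-down version of what such a distillation run does is sketched below. It is not the repository's script; the hyperparameters, the class-subfolder layout of syn_dataset, and the single-pass loop are all illustrative assumptions, and the loss is the prediction-matching objective from the Framework section.

```python
import os
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

# Illustrative distillation loop on the synthetic images; not the repository's script.
device = "cuda" if torch.cuda.is_available() else "cpu"
transform = T.Compose([T.RandomResizedCrop(224), T.RandomHorizontalFlip(), T.ToTensor(),
                       T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
loader = DataLoader(ImageFolder("syn_dataset", transform=transform),
                    batch_size=256, shuffle=True, num_workers=8)

teacher = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).to(device).eval()
student = models.resnet18(weights=None).to(device)
optimizer = torch.optim.SGD(student.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

temperature = 4.0
for images, _ in loader:  # folder labels are unused: the student only matches the teacher
    images = images.to(device)
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(images)
    loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                    F.softmax(t_logits / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

os.makedirs("save/student_model", exist_ok=True)
torch.save(student.state_dict(), "save/student_model/resnet18_student.pth")
```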