Zero-Shot Knowledge Distillation in Deep Networks

Paper link :

Presentation slides link :

Poster link :


Dependencies:

  • Python 2.7
  • tensorflow-gpu 1.10.0
  • tensorboard 1.10.0
  • cudatoolkit 9.0
  • cudnn 7.3.1
  • tqdm 4.32.2
  • keras-gpu 2.2.4
  • numpy 1.15.4

How to use this code:

The CIFAR-10 dataset is available at:

Copy the CIFAR-10 folder from the above link into the model_training/dataset/ folder.

Go to the model_training/ folder.

  • Step 1 : Train the Teacher network on CIFAR-10
 CUDA_VISIBLE_DEVICES=0 python --network teacher --dataset cifar10 --suffix original_data --epoch 1000 --batch_size 512 

The pretrained teacher model weights are also available in the checkpoints/teacher/ folder.

  • Step 2 : Extract final layer weights from the Pretrained Teacher Network

Make sure the checkpoint and meta graph paths are correct in the script.
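A quick way to check the paths before running the extraction is sketched below. It relies only on the file layout that a TensorFlow 1.x Saver writes (a .meta file for the meta graph, an .index file, and one or more .data-* shards). The function name and the example prefix are hypothetical and are not part of this repository.

```python
import glob
import os


def missing_checkpoint_files(ckpt_prefix):
    """Return the checkpoint pieces that cannot be found for a given prefix.

    A TensorFlow 1.x Saver writes <prefix>.meta (the meta graph),
    <prefix>.index and one or more <prefix>.data-* shards.
    """
    missing = []
    for suffix in (".meta", ".index"):
        if not os.path.isfile(ckpt_prefix + suffix):
            missing.append(ckpt_prefix + suffix)
    # The variable values live in one or more sharded .data-* files.
    if not glob.glob(ckpt_prefix + ".data-*"):
        missing.append(ckpt_prefix + ".data-*")
    return missing


# Hypothetical prefix; replace it with the one set in the script.
# missing_checkpoint_files("checkpoints/teacher/<checkpoint_prefix>")
```

An empty list means all three pieces were found for that prefix.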

  • Step 3 : Compute and save the Class Similarity for scales of 1.0 and 0.1

Go to the folder di_generation/


Two files, named "visualMat_alexnet_cifar10_scale_1.pickle" and "visualMat_alexnet_cifar10_scale_0.1.pickle", will be saved in the same directory.
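For readers who want the idea behind this step, here is a minimal NumPy sketch, not the repository script. It assumes the class similarity is the cosine similarity between the teacher's final-layer weight vectors, min-max normalised per row, and that the scale (1.0 or 0.1) multiplies it to give Dirichlet concentration parameters. The function names, the per-row normalisation and the small eps offset are assumptions made for illustration.

```python
import numpy as np


def class_similarity(final_weights):
    """Cosine similarity between class weight vectors, scaled to [0, 1] per row.

    final_weights: array of shape (feature_dim, num_classes), one column per class.
    """
    w = np.asarray(final_weights, dtype=np.float64)
    unit = w / np.linalg.norm(w, axis=0, keepdims=True)
    sim = unit.T.dot(unit)  # (num_classes, num_classes) cosine similarities
    lo = sim.min(axis=1, keepdims=True)
    hi = sim.max(axis=1, keepdims=True)
    return (sim - lo) / (hi - lo)  # min-max normalisation of each row


def dirichlet_concentration(similarity, scale, eps=1e-6):
    """Scale the similarity rows; eps keeps every concentration strictly positive."""
    return scale * similarity + eps


# Example with random stand-in weights (the real ones come from Step 2).
rng = np.random.RandomState(0)
sim = class_similarity(rng.normal(size=(64, 10)))
alpha_scale_1 = dirichlet_concentration(sim, 1.0)
alpha_scale_01 = dirichlet_concentration(sim, 0.1)
```

Row k of the result can then serve as the concentration vector when sampling softmax targets for class k.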

  • Step 4 : Generate the Data Impressions (DIs)

40000 DIs will be saved in the folder alex_di/cifar_10/dirichlet/40000_di/
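The mechanics of this step can be illustrated with a toy example. The sketch below is not the repository code: it replaces the AlexNet teacher with a hypothetical linear model softmax(W x) so that it runs with NumPy alone. It draws softmax targets from a Dirichlet distribution and then runs gradient descent on a random input until the toy teacher's output approaches the sampled target. All names, the step count and the learning rate are assumptions.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def sample_targets(alpha_row, n, rng):
    """Draw n softmax vectors from a Dirichlet with the given concentration."""
    return rng.dirichlet(alpha_row, size=n)


def impression_for_target(W, target, rng, steps=2000, lr=0.5):
    """Optimise a random input so that softmax(W x) approaches `target`.

    W is a toy linear teacher of shape (num_classes, input_dim). The learning
    rate is tuned for a W with small entries and may need changing otherwise.
    """
    x = rng.uniform(0.0, 1.0, size=W.shape[1])
    for _ in range(steps):
        p = softmax(W.dot(x))
        # Gradient of cross-entropy(target, p) with respect to x.
        x -= lr * W.T.dot(p - target)
    return x


rng = np.random.RandomState(0)
W = rng.normal(size=(10, 32)) / np.sqrt(32.0)       # stand-in teacher
targets = sample_targets(np.full(10, 5.0), 4, rng)  # stand-in concentration
impression = impression_for_target(W, targets[0], rng)
```

With a real teacher, the gradient step would be computed by backpropagation through the frozen network instead of the closed-form expression used here.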

Sample generated DIs are also available at :

  • Step 5 : Train the Student network with the generated DIs
 CUDA_VISIBLE_DEVICES=0 python --network student --dataset data_impressions --data_augmentation
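As background for this step, the following NumPy sketch shows a standard knowledge-distillation objective: the cross-entropy between temperature-softened teacher and student outputs, averaged over a batch of DIs. It illustrates the general technique rather than the training script itself. The function names and the default temperature are assumptions.

```python
import numpy as np


def log_softmax_T(logits, temperature):
    """Numerically stable log-softmax of logits / temperature, row by row."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=20.0):
    """Mean cross-entropy between softened teacher and student distributions."""
    p_teacher = np.exp(log_softmax_T(teacher_logits, temperature))
    log_p_student = log_softmax_T(student_logits, temperature)
    return -(p_teacher * log_p_student).sum(axis=1).mean()


# Example with random stand-in logits for a batch of 4 DIs and 10 classes.
rng = np.random.RandomState(0)
teacher_logits = rng.normal(size=(4, 10))
student_logits = rng.normal(size=(4, 10))
loss = distillation_loss(student_logits, teacher_logits)
```

The loss is smallest when the student's softened outputs match the teacher's, which is what the student is trained towards on the DIs.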


If you use this code, please cite our work:

title={Zero-Shot Knowledge Distillation in Deep Networks},
author={Nayak, G. K., Mopuri, K. R., Shaj, V., Babu, R. V., and Chakraborty, A.},
booktitle={International Conference on Machine Learning},
