KD with pretrained students. #429
Asked by liguopeng0923 in Q&A
Hi @yoshitomo-matsubara, I would like to know whether there are any distillation methods for fine-tuning pretrained students, rather than training them from scratch.
Answered by yoshitomo-matsubara (Dec 1, 2023):
For NLP tasks (e.g., GLUE), the following example and paper fine-tune pretrained BERT-Base as a student, using fine-tuned BERT-Large as a teacher; see the Colab examples linked at https://github.com/yoshitomo-matsubara/torchdistill#glue
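To illustrate that setup, here is a minimal sketch in plain PyTorch with Hugging Face Transformers (not torchdistill's actual config-driven API) of a distillation step where the student starts from a pretrained checkpoint instead of random weights. The model names, label count, and hyperparameters are placeholders; in practice the teacher would be a checkpoint already fine-tuned on the target task.

```python
# Minimal KD fine-tuning sketch (assumptions: Hugging Face Transformers,
# Hinton-style soft-target loss; NOT torchdistill's config-driven API).
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

# Teacher: placeholder for a checkpoint already fine-tuned on the task.
teacher = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2).eval()
# Student: loaded from a pretrained checkpoint, not randomly initialized.
student = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
temperature, alpha = 4.0, 0.5  # illustrative KD hyperparameters

def kd_step(batch):
    """One fine-tuning step: hard-label CE + soft-label KL to the teacher."""
    with torch.no_grad():
        t_logits = teacher(input_ids=batch["input_ids"],
                           attention_mask=batch["attention_mask"]).logits
    out = student(**batch)  # batch includes "labels", so out.loss is the CE term
    kl = F.kl_div(F.log_softmax(out.logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    loss = alpha * out.loss + (1.0 - alpha) * kl
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The only change from training a student from scratch is that `from_pretrained` replaces random initialization; the distillation objective itself is unchanged.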
For instance, the following papers present methods to train modified, pretrained image classification and object detection models as students (only the first layers are modified and randomly initialized), learning from the original pretrained teacher models; a sketch of this setup follows the references.
https://ieeexplore.ieee.org/abstract/document/9265295/
https://arxiv.org/abs/2007.15818
https://openaccess.thecvf.com/content/WACV2022/html/Matsubara_Supervised_Compression_for_Resource-Constrained_Edge_Computing_Systems_WACV_2022_paper.html
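Below is a minimal sketch of that idea under stated assumptions: torchvision's ResNet-50 as both teacher and student, a hypothetical slimmer stem replacing the student's first layers, and a simple MSE feature-matching loss. The linked papers' exact architectures and training recipes differ.

```python
# Minimal head-distillation sketch (assumptions: torchvision ResNet-50,
# MSE feature matching; the papers' exact recipes differ).
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

teacher = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1).eval()
student = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Hypothetical replacement for the student's stem + layer1: slimmer and
# randomly initialized, but shaped to match the teacher's layer1 output
# (N, 256, H/4, W/4) so all later pretrained layers can be kept as-is.
new_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 256, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(256),
)

def teacher_features(x):
    """Frozen teacher's features after its conv stem + layer1."""
    with torch.no_grad():
        h = teacher.maxpool(teacher.relu(teacher.bn1(teacher.conv1(x))))
        return teacher.layer1(h)

# Train ONLY the new, randomly initialized first layers to mimic the
# teacher's features; the student's remaining pretrained layers are untouched.
optimizer = torch.optim.Adam(new_stem.parameters(), lr=1e-3)
mse = nn.MSELoss()

def distill_step(images):
    loss = mse(new_stem(images), teacher_features(images))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def student_forward(x):
    """Inference path: new stem, then the student's original later layers."""
    h = student.layer2(new_stem(x))
    h = student.layer4(student.layer3(h))
    h = torch.flatten(student.avgpool(h), 1)
    return student.fc(h)
```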