Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer, by Zineng Tang, Jaemin Cho, Hao Tan, and Mohit Bansal.
# Create a Python environment (optional)
conda create -n vidlankd python=3.7
conda activate vidlankd
# Install Python dependencies
pip install -r requirements.txt
To speed up training, we use mixed-precision training with NVIDIA Apex.
git clone
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
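A quick way to confirm the Apex build before launching the training scripts is sketched below. It assumes the training code relies on Apex's `amp` module (an assumption; this README only says Apex is used). It uses only the standard library and checks that the module is importable, not that the `--cuda_ext` extensions compiled.

```python
import importlib.util


def apex_amp_available() -> bool:
    """Return True if `apex.amp` can be located on the current Python path.

    Assumption: the training code uses `apex.amp`. This only checks that the
    module is importable; it does not verify the compiled CUDA extensions.
    """
    try:
        # Locating a submodule imports its parent package, so a missing
        # `apex` package raises ModuleNotFoundError (a subclass of ImportError).
        return importlib.util.find_spec("apex.amp") is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("apex.amp available:", apex_amp_available())
```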
Pre-trained models (Google Drive link): 1. teacher model (BERT-12L-768H); 2. student model (BERT-12L-768H, KD-NST). Small models are also available.
Create a directory and put the models under 'snap/vlm' (or a custom directory name).
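The directory setup can be scripted. Below is a minimal sketch; `prepare_snap_dir` is a hypothetical helper, and the only convention it relies on is the 'snap/vlm' root named above.

```python
from pathlib import Path


def prepare_snap_dir(model_name: str, root: str = "snap/vlm") -> Path:
    """Create (if missing) and return `<root>/<model_name>`.

    Hypothetical helper: `root` defaults to the README's 'snap/vlm'; pass a
    different root if you chose a custom directory name.
    """
    target = Path(root) / model_name
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target
```

For example, `prepare_snap_dir("howto100m_bert_base_vokenhinge")` creates `snap/vlm/howto100m_bert_base_vokenhinge`, a folder name taken from the teacher snap name used in the training commands.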
We provide scripts to download the "wiki103" and "wiki" datasets.
Wiki103: a selected subset of English Wikipedia.
bash data/wiki103/get_data_cased.bash
Wiki: English Wikipedia. The scripts are adapted from XLM.
bash data/wiki/get_data_cased.bash en
HowTo100M, where you can download the official captions and video features.
We follow HowTo100M and use its 2D+3D feature extractor:
- We extracted our 2D video features with ResNet-152 from torchvision.
- We extracted our 3D video features with 3D-ResNeXt.
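The two extractors produce feature sequences at different temporal rates, so they must be aligned before they can be combined per time step. Below is a minimal pure-Python sketch of one way to do this: pair each 2D feature with the 3D feature nearest in time and concatenate the two. The default rates (1 feature/s for 2D, 1.5 features/s for 3D) are our assumption about the HowTo100M extractor's defaults. This README does not say whether VidLanKD fuses features this way.

```python
from typing import List, Sequence

Vector = Sequence[float]


def align_and_concat(
    feats_2d: Sequence[Vector],
    feats_3d: Sequence[Vector],
    rate_2d: float = 1.0,  # assumed 2D features per second
    rate_3d: float = 1.5,  # assumed 3D features per second
) -> List[List[float]]:
    """Concatenate each 2D step with the temporally nearest 3D step.

    Returns len(feats_2d) rows, each of length dim_2d + dim_3d.
    """
    if not feats_3d:
        raise ValueError("3D feature sequence is empty")
    fused = []
    for i, f2 in enumerate(feats_2d):
        t = i / rate_2d                # timestamp (seconds) of the i-th 2D feature
        j = round(t * rate_3d)         # index of the 3D feature closest to t
        j = min(j, len(feats_3d) - 1)  # clamp if the 3D stream is shorter
        fused.append(list(f2) + list(feats_3d[j]))
    return fused
```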
GLUE dataset
Download the dataset:
python --data_dir data/glue --tasks all
Teacher model pre-training
# bash scripts/small_vlm_howto100m.bash $GPUS $TEACHER_SNAP_PATH
bash scripts/small_vlm_howto100m.bash 0,1,2,3 howto100m_bert_small_vokenhinge
# bash scripts/base_vlm_howto100m.bash $GPUS $TEACHER_SNAP_PATH
bash scripts/base_vlm_howto100m.bash 0,1,2,3 howto100m_bert_base_vokenhinge
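The teacher snap names end in `vokenhinge`, which suggests a hinge (max-margin) objective for matching text with video. The sketch below is a generic bidirectional max-margin ranking loss over a batch of paired embeddings, written in pure Python for readability. The dot-product similarity, the margin of 0.5, and summing over all in-batch negatives are assumptions, not the repository's implementation.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def bidirectional_hinge_loss(
    text: Sequence[Vector], video: Sequence[Vector], margin: float = 0.5
) -> float:
    """Max-margin ranking loss for a batch of aligned (text, video) pairs.

    text[i] and video[i] form a positive pair; every other item in the batch
    acts as a negative. The summed loss is averaged over the batch size.
    """
    n = len(text)
    if n == 0 or n != len(video):
        raise ValueError("text and video must be non-empty and equally long")
    sim = [[dot(t, v) for v in video] for t in text]  # sim[i][j] = <text_i, video_j>
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # text i should score its own video above video j by `margin`
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # video i should score its own text above text j by `margin`
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / n
```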
Knowledge transfer to student model
# bash scripts/small_vlm_wiki103.bash $GPUS $TEACHER_SNAP_PATH $STUDENT_SNAP_PATH
bash scripts/small_vlm_wiki103.bash 0,1,2,3 howto100m_bert_small_vokenhinge/checkpoint-epoch0019 wiki103_bert_small_vokenmmd
# bash scripts/base_vlm_wiki.bash $GPUS $TEACHER_SNAP_PATH $STUDENT_SNAP_PATH
bash scripts/base_vlm_wiki.bash 0,1,2,3 howto100m_bert_base_vokenhinge/checkpoint-epoch0019 wiki_bert_base_vokenmmd
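The student snap names end in `vokenmmd`, and the released student is labelled KD-NST, which we read as neuron selectivity transfer. NST matches teacher and student representations by minimizing the maximum mean discrepancy (MMD). Below is a minimal pure-Python sketch of the squared MMD on L2-normalised vectors with a degree-2 polynomial kernel (one common choice for NST; treat it as an assumption here). How VidLanKD turns hidden states into the two sample sets is also an assumption, left to the caller.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def poly_kernel(a: Vector, b: Vector, degree: int = 2, c: float = 0.0) -> float:
    """Polynomial kernel k(a, b) = (<a, b> + c) ** degree."""
    return (sum(x * y for x, y in zip(a, b)) + c) ** degree


def mmd2(teacher: Sequence[Vector], student: Sequence[Vector], degree: int = 2) -> float:
    """Biased estimate of squared MMD between two sets of vectors.

    Each vector is L2-normalised first, as in NST-style distillation.
    Assumption: the caller has already turned hidden states into sample sets.
    """
    t = [l2_normalize(v) for v in teacher]
    s = [l2_normalize(v) for v in student]

    def mean_k(xs: List[List[float]], ys: List[List[float]]) -> float:
        total = sum(poly_kernel(x, y, degree) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return mean_k(t, t) + mean_k(s, s) - 2.0 * mean_k(t, s)
```

Because each vector is normalised, the loss ignores overall activation scale and compares only the pattern of activations, which is the idea behind NST.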
Baseline BERT model
bash scripts/base_wiki.bash 0,1,2,3 wiki_bert_base
Fine-tuning on GLUE tasks
# bash scripts/run_glue_at_epoch.bash $GPUS $NumTrainEpochs $SNAP_PATH
bash scripts/run_glue_at_epoch.bash 0,1,2,3 3 snap/vlm/wiki103_bert_small_vokenmmd/checkpoint-epoch0019
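The GLUE script takes a checkpoint directory such as `checkpoint-epoch0019`. Below is a hypothetical helper that picks the highest-numbered `checkpoint-epochNNNN` directory under a snap folder. It assumes that naming pattern (taken from the example paths above) holds for every saved checkpoint.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as "checkpoint-epoch0019" (assumed pattern).
_CKPT_RE = re.compile(r"^checkpoint-epoch(\d+)$")


def latest_checkpoint(snap_dir: str) -> Optional[Path]:
    """Return the checkpoint-epochNNNN sub-directory with the highest epoch.

    Returns None if no matching directory exists; raises FileNotFoundError
    if `snap_dir` itself does not exist.
    """
    best: Optional[Path] = None
    best_epoch = -1
    for child in Path(snap_dir).iterdir():
        m = _CKPT_RE.match(child.name)
        if child.is_dir() and m and int(m.group(1)) > best_epoch:
            best, best_epoch = child, int(m.group(1))
    return best
```

For example, `latest_checkpoint("snap/vlm/wiki103_bert_small_vokenmmd")` would return the path to pass as `$SNAP_PATH`.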
Parts of the code are based on vokenization, Hugging Face Transformers, and Facebook's FAISS.