DECEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization

PyTorch implementation of the NAACL 2021 paper: DECEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization by *Zineng Tang, *Jie Lei, Mohit Bansal

Setup

# Create and activate a python environment (optional)
conda create -n decembert python=3.7
conda activate decembert

# Install python dependencies
pip install -r requirements.txt

Mixed precision is recommended to speed up training. It requires NVIDIA Apex, which can be installed as follows:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
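After the install, it may be worth confirming that Apex is importable before turning mixed precision on. The following is a minimal sketch using only the Python standard library; the helper name `has_module` is hypothetical and not part of this repository, and the fallback message only illustrates one possible policy.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "apex" is the top-level package name installed by NVIDIA Apex.
    if has_module("apex"):
        print("apex found: mixed precision can be enabled")
    else:
        print("apex not found: install it first or train in full precision")
```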

Running

Run the pre-training command:

bash scripts/pretrain.sh 0,1,2,3
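The argument `0,1,2,3` looks like a comma-separated list of GPU IDs, although the README does not say so and the contents of `scripts/pretrain.sh` are not shown here. Under that assumption, the sketch below shows one common way such a list could be turned into a `CUDA_VISIBLE_DEVICES` setting and a process count. The function names are hypothetical and do not come from this repository.

```python
import os
from typing import List


def parse_gpu_ids(arg: str) -> List[int]:
    """Parse a string such as "0,1,2,3" into a list of distinct GPU indices."""
    ids = [int(tok) for tok in arg.split(",") if tok.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {arg!r}")
    return ids


def restrict_visible_gpus(arg: str) -> int:
    """Expose only the listed GPUs to CUDA and return how many were requested.

    This must run before any CUDA context is created to take effect.
    """
    ids = parse_gpu_ids(arg)
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in ids)
    return len(ids)


if __name__ == "__main__":
    n_gpus = restrict_visible_gpus("0,1,2,3")
    print(f"using {n_gpus} GPUs: {os.environ['CUDA_VISIBLE_DEVICES']}")
```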

Video Features Extraction Code

The feature extraction scripts are provided in the feature_extractor folder.

We extract our 2D-level video features with ResNet-152. GitHub link: torchvision

We extract our 3D-level video features with 3D-ResNeXt. GitHub link: 3D-ResNeXt

Dense Captions Extraction Code

Following the implementation of dense captioning aided pre-training, we pre-extract dense captions with the following code.

Original GitHub link: Dense Captioning with Joint Inference and Visual Context (reproduced in PyTorch)

Important: adjust the frame-rate sampling in the code to suit the type of video being processed.
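To illustrate what adjusting the frame-rate sampling can mean in practice, here is a minimal sketch that picks which frames to keep when a video recorded at one frame rate is sampled at a lower target rate. It is not the dense captioning code's own sampling logic; the function name and parameters are hypothetical, and the choice never to upsample is an assumption.

```python
from typing import List


def sample_frame_indices(num_frames: int, native_fps: float, target_fps: float) -> List[int]:
    """Return indices of frames to keep when sampling a video at `target_fps`.

    If `target_fps` exceeds `native_fps`, every frame is kept (no upsampling).
    """
    if num_frames < 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("num_frames must be >= 0 and frame rates must be > 0")
    # Number of native frames between two consecutive sampled frames.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    i = 0
    while True:
        idx = int(i * step)
        if idx >= num_frames:
            break
        indices.append(idx)
        i += 1
    return indices


if __name__ == "__main__":
    # A 10-frame clip recorded at 30 fps, sampled at 10 fps.
    print(sample_frame_indices(10, 30.0, 10.0))
```

A slowly changing video might be sampled at a lower `target_fps` than a fast-paced clip, which only requires changing one argument.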

Dataset Links

Pre-training Dataset

HowTo100M

Downstream Dataset

MSRVTT

MSRVTT-QA

YouCook2

(TODO: add downstream tasks)

Reference

@inproceedings{tang2021decembert,
  title={DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization},
  author={Tang, Zineng and Lei, Jie and Bansal, Mohit},
  booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages={2415--2426},
  year={2021}
}

Acknowledgement

Part of the code is built on Hugging Face Transformers, Facebook Faiss, and TVCaption.
