CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations

Installation

  • Python >= 3.6
  • PyTorch version >= 1.7.0
  • For pre-training, please prepare a GPU with at least 16 GB of memory (e.g., V100, RTX 3080 Ti)
  • To develop locally, please follow the instructions below (a quick environment check is sketched after this list):
    git clone https://github.com/Ydkwim/CTAL.git
    cd CTAL
    pip install -r requirements.txt
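
If you want to verify your setup before installing, the following minimal sketch (not part of the repository) checks the Python, PyTorch, and GPU memory requirements listed above:

    # Minimal environment check (not shipped with CTAL); verifies the
    # requirements above: Python >= 3.6, PyTorch >= 1.7.0, a >= 16 GB GPU.
    import sys
    import torch

    print(f"Python  : {sys.version.split()[0]}")
    print(f"PyTorch : {torch.__version__}")
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU     : {props.name}, {props.total_memory / 1024**3:.1f} GB")
    else:
        print("No CUDA device found; pre-training needs a GPU with >= 16 GB memory.")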

Preprocess

  • Semantic Feature: please refer to the jupyter notebook: notebook/preprocess_text.ipynb

  • Acoustic Feature: please refer to the jupyter notebook: notebook/preprocess_audio.ipynb (an illustrative sketch follows this list)
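
The notebooks above are the authoritative reference for the exact features CTAL expects. Purely as an illustration of the kind of output such preprocessing produces, a log-mel acoustic feature could be extracted as follows; the file paths, sampling rate, and number of mel bins here are assumptions, not the project's settings:

    # Illustration only: the actual acoustic features (type, frame settings,
    # dimensionality) used by CTAL are defined in notebook/preprocess_audio.ipynb.
    import librosa
    import numpy as np

    wav, sr = librosa.load("example.wav", sr=16000)             # hypothetical input file
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_mels=80)
    log_mel = np.log(mel + 1e-6).T                              # shape: (frames, 80)
    np.save("example_acoustic.npy", log_mel)                    # hypothetical output path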


Upstream Pre-training

After you have prepared both the acoustic and semantic features, you can start pre-training the model by executing the following shell command:

    python run_m2pretrain.py --run transformer \
    --config path/to/your/config.yaml --name model_name

The pre-trained model will be saved to the path result/transformer/model_name. For the convenience of all users, we make our pre-trained upstream model available:
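
Whether you download the released checkpoint or pre-train your own under result/transformer/model_name, it can be inspected with standard PyTorch tooling before fine-tuning. A minimal sketch, assuming the checkpoint is an ordinary torch.save artifact (the file name below is hypothetical):

    # Hedged sketch: assumes the checkpoint is a regular torch.save file;
    # the file name is hypothetical, use whatever lands in your save folder.
    import torch

    ckpt = torch.load("result/transformer/model_name/checkpoint.ckpt",
                      map_location="cpu")
    # Inspect the top-level keys (e.g. model weights, optimizer state, settings)
    # before wiring the checkpoint into m2p_finetune.py.
    print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))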


Downstream Finetune

It is very convenient to use our pre-trained upstream model for different types of audio-and-language downstream tasks, including Sentiment Analysis, Emotion Recognition, Speaker Verification, etc. We provide a sample fine-tuning script m2p_finetune.py for everyone. To start the fine-tuning process, you can run the following commands (or loop over all tasks as sketched after this list):

  • Sentiment Regression:
    python m2p_finetune.py --config your/config/path \
    --task_name sentiment --epochs 10 --save_path your/save/path
  • Emotion Classification:
    python m2p_finetune.py --config your/config/path \
    --task_name emotion --epochs 10 --save_path your/save/path
  • Speaker Verification:
    python m2p_finetune.py --config your/config/path \
    --task_name verification --epochs 10 --save_path your/save/path
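
If you want to run all three sample tasks back to back with the same config, a small driver script (not included in the repository; the paths are placeholders, as in the commands above) could look like this:

    # Convenience sketch: sequentially launch the three sample fine-tuning
    # tasks shown above. CONFIG and SAVE_ROOT are placeholders.
    import subprocess

    CONFIG = "your/config/path"
    SAVE_ROOT = "your/save/path"

    for task in ("sentiment", "emotion", "verification"):
        subprocess.run(
            ["python", "m2p_finetune.py",
             "--config", CONFIG,
             "--task_name", task,
             "--epochs", "10",
             "--save_path", f"{SAVE_ROOT}/{task}"],
            check=True,
        )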

Contact

If you have any problems with the project, please feel free to report them as issues.
