wangyang199609/MuSE

 
 

MuSE

A PyTorch implementation of MuSE: Multi-modal target speaker extraction with visual cues.

Project Structure

/data/voxceleb2-800: Scripts to preprocess the VoxCeleb2 dataset.

/pretrain_networks: The visual front-end network

/src: The training scripts

Pre-trained Weights

Download the pre-trained weights for the Visual Frontend and place them in the ./pretrain_networks folder. The following command saves them as visual_frontend.pt in the current directory, so run it from inside that folder:

wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1k0Zk90ASft89-xAEUbu5CmZWih_u_lRN' -O visual_frontend.pt
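
Command-line downloads from Google Drive can return an HTML page instead of the requested file, so it may be worth checking the download before training. The snippet below is a minimal sketch of such a check using only the Python standard library. The path pretrain_networks/visual_frontend.pt is assumed from the folder and file name above, and the helper names looks_like_html and check_weights are hypothetical, not part of this repository.

```python
from pathlib import Path

# Assumed location, based on the folder and file name used in this README.
WEIGHTS_PATH = Path("pretrain_networks") / "visual_frontend.pt"


def looks_like_html(path, probe_bytes=512):
    """Return True if the start of the file looks like an HTML page."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_weights(path=WEIGHTS_PATH):
    """Return a short status string: 'missing', 'empty', 'html', or 'ok'."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    if looks_like_html(path):
        return "html"
    # 'ok' only means the file exists and is not an HTML page;
    # it does not verify that the contents are valid PyTorch weights.
    return "ok"


if __name__ == "__main__":
    print(f"{WEIGHTS_PATH}: {check_weights()}")
```

A result of html suggests the download returned a web page rather than the weights, in which case downloading the file through a browser is one possible fallback.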

References

  1. The pre-trained weights of the Visual Frontend were obtained from the GitHub repository for Afouras T. and Chung J., Deep Audio-Visual Speech Recognition.

  2. The model is adapted from the Conv-TasNet GitHub repository.
