ZFTurbo/MVSEP-CDX23-Cinematic-Sound-Demixing


MVSEP-CDX23-Cinematic-Sound-Demixing

A model for the Sound Demixing Challenge 2023, Cinematic Sound Demixing Track (CDX'23). The model separates an audio mixture into 3 stems: "dialog (speech)", "effect (sfx)" and "music". It was trained on the DnR dataset and is based on Demucs4. The test set in the CDX23 contest was very different from the DnR training data, so the released models are the ones with the best metrics on the DnR test set.

Usage

    python inference.py --input_audio mixture1.wav mixture2.wav --output_folder ./results/

This command processes the audio files "mixture1.wav" and "mixture2.wav" and stores the results in the ./results/ folder in WAV format.

  • Note: for slightly better quality, add the --high_quality parameter. Processing will be ~3 times slower.
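To run the same command over a whole folder, the call can be wrapped in a small Python script. This is only a sketch: the helper names `build_command` and `separate_folder` are hypothetical, and it assumes that `inference.py` sits in the current working directory and that `--high_quality` is a plain switch that takes no value.

```python
import subprocess
import sys
from pathlib import Path


def build_command(inputs, output_folder, high_quality=False):
    """Assemble the argument list for inference.py (hypothetical helper)."""
    cmd = [sys.executable, "inference.py", "--input_audio"]
    cmd += [str(p) for p in inputs]
    cmd += ["--output_folder", str(output_folder)]
    if high_quality:
        # Assumption: --high_quality is a flag without a value.
        cmd.append("--high_quality")
    return cmd


def separate_folder(input_dir, output_folder, high_quality=False):
    """Run inference.py once on every .wav file found in input_dir."""
    wavs = sorted(Path(input_dir).glob("*.wav"))
    if not wavs:
        raise FileNotFoundError(f"no .wav files found in {input_dir}")
    # check=True raises CalledProcessError if inference.py exits with an error.
    subprocess.run(build_command(wavs, output_folder, high_quality), check=True)
```

For example, `separate_folder("./mixtures/", "./results/", high_quality=True)` would pass every WAV file in the (hypothetical) `./mixtures/` folder to a single `inference.py` call.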

Quality metrics

Quality was measured on the DnR test set.

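| Algorithm                            | SDR dialog | SDR effect | SDR music | SDR mean |
|--------------------------------------|------------|------------|-----------|----------|
| Demucs HT 4 (single model)           | 14.18      | 7.92       | 6.75      | 9.62     |
| Demucs HT 4 (3 checkpoints ensemble) | 14.68      | 8.48       | 7.30      | 10.16    |
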
  • Note 1: SDR - signal to distortion ratio. Larger is better.
  • Note 2: The music stem in the DnR dataset can contain vocals.
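For reference, SDR for a single stem can be sketched as the ratio of reference energy to error energy, expressed in decibels. The function below is an illustration of that common definition, not the evaluation code used for the table above; the exact formula and any averaging over channels or tracks used in the contest are not stated here, so treat those details as assumptions.

```python
import math


def sdr(reference, estimate, eps=1e-12):
    """Signal-to-distortion ratio in dB for two equal-length sample sequences.

    Assumed definition: 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    eps guards against division by zero and log of zero.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
```

An estimate closer to the reference has less error energy and therefore a larger SDR, which is why larger values are better.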

Citation

@misc{solovyev2023benchmarks,
      title={Benchmarks and leaderboards for sound demixing tasks}, 
      author={Roman Solovyev and Alexander Stempkovskiy and Tatiana Habruseva},
      year={2023},
      eprint={2305.07489},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
