SONIQUE: Efficient Video Background Music Generation

A multi-model tool designed to help video editors generate background music for transition scenes in videos and TV series. It can also be used by music composers to generate music conditioned on instruments, genres, tempo rate, and even specific melodies. Check out the demo page for more details.

Performance: On an NVIDIA RTX 4090, the entire process completes in under a minute and requires less than 14 GB of GPU memory. On an NVIDIA RTX 3070 Laptop GPU with 8 GB of memory, the process takes around 360 seconds.

Install

  1. Clone this repo.
  2. Create a conda environment:
conda env create -f environment.yml
  3. Activate the environment, navigate to the project root, and run:
pip install .
  4. After installation, you may run the demo with the Gradio UI:
python run_gradio.py --model-config best_model.json --ckpt-path ./ckpts/stable_ep=220.ckpt
  5. To run the demo without the interface:
python inference.py --model-config best_model.json --ckpt-path ./ckpts/stable_ep=220.ckpt

Additional inference flags:

  • --use-video:
    • Use input video as condition
    • Default: False
  • --input-video:
    • Path to input video
    • Default: None
  • --use-init:
    • Use melody condition
    • Default: False
  • --init-audio:
    • Melody condition path
    • Default: None
  • --llms:
    • Name of the large language model used to convert the video description into tags
    • Default: Mistral 7B
  • --low-resource:
    • If set to True, the models in the video-to-tags stage run in 4-bit precision. Only set it to False if you have enough GPU memory.
    • Default: True
  • --instruments:
    • Input instrument condition
    • Default: None
  • --genres:
    • Input genre condition
    • Default: None
  • --tempo-rate:
    • Input tempo rate condition
    • Default: None
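
For example, a video-conditioned run combining several of these flags might look like the following sketch. The video path is a placeholder, and the exact value format expected by the boolean flags (e.g. whether they take True/False or act as switches) may differ from what is shown here:

python inference.py --model-config best_model.json --ckpt-path ./ckpts/stable_ep=220.ckpt --use-video True --input-video ./examples/my_clip.mp4 --llms "Mistral 7B" --low-resource True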

Model Checkpoint

The pretrained model can be downloaded here. Please download it, unzip it, and place it in the root of this project:

sonique/
├── ckpts/
│   ├── .../
├── sonique/
├── run_gradio.py
...

Data Collection & Preprocessing

See here for details.

Video-to-music generation

SONIQUE is a multi-model tool that leverages stable_audio_tools, Video_LLaMA, and popular LLMs from Hugging Face.

SONIQUE uses Video_LLaMA to extract a description of the input video. This description is then passed to an LLM, which converts it into tags describing the background music. The currently supported LLMs are:

  • Mistral 7B (default)
  • Qwen 14B
  • LLaMA3 8B (you will need to request access from Meta)
  • LLaMA2 13B (you will need to request access from Meta)
  • Gemma 7B (you will need to request access from Google)
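
If you choose one of the gated models (LLaMA2, LLaMA3, or Gemma), request access on Hugging Face and log in before running inference. The command below is only an illustrative sketch: the model name string passed to --llms is assumed to match the names listed above and may differ from the exact value the flag expects:

huggingface-cli login
python inference.py --model-config best_model.json --ckpt-path ./ckpts/stable_ep=220.ckpt --use-video True --input-video ./examples/my_clip.mp4 --llms "LLaMA3 8B"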

Text-to-music generation

Instead of using a video, you may also manually enter instruments, genres, and a tempo rate to generate music. You may upload a melody as a condition (inpainting) in the "Use melody condition" section, and you may also tune the generation and sampler parameters.
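
For example, a purely tag-conditioned run with an optional melody condition might look like the following sketch. The tag values and the melody path are illustrative, and the exact format each flag expects (e.g. for the tempo rate) may vary:

python inference.py --model-config best_model.json --ckpt-path ./ckpts/stable_ep=220.ckpt --instruments "piano, strings" --genres "cinematic, ambient" --tempo-rate "90" --use-init True --init-audio ./examples/melody.wav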

Citation

Please consider citing the project if it helps your research:

@misc{zhang2024sonique,
  title={SONIQUE: Efficient Video Background Music Generation},
  author={Zhang, Liqian},
  year={2024},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/zxxwxyyy/sonique}},
}
