The LLVC repo provides a sleek, minimal implementation of the RVC (Retrieval-based Voice Conversion) v2 model that runs about 10x faster than realtime on a Colab Tesla T4, but roughly 10x slower than realtime on a Colab CPU. This fork aims to add ONNX support to squeeze more performance out of it.
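As a rough sketch of where the ONNX direction leads (everything here is illustrative, not this repo's API: `DummyNet` is a hypothetical stand-in for the RVC v2 synthesizer so the snippet runs standalone, and the file names and 16 kHz example are assumptions), the export-and-run path might look like:

```python
# Standalone sketch of a torch -> ONNX -> onnxruntime pipeline. The real
# model would be loaded from a .pth checkpoint via the repo's code instead
# of DummyNet, which exists only so this snippet is runnable on its own.
import numpy as np
import torch
import onnxruntime as ort

class DummyNet(torch.nn.Module):
    def forward(self, wav):
        return torch.tanh(wav)  # placeholder op; the real net converts the voice

model = DummyNet().eval()
example = torch.randn(1, 16000)  # 1 s of 16 kHz audio

torch.onnx.export(
    model, (example,), "rvc_v2.onnx",
    input_names=["wav"], output_names=["out"],
    dynamic_axes={"wav": {1: "n_samples"}, "out": {1: "n_samples"}},
    opset_version=17,
)

# Run the exported graph on CPU with onnxruntime.
sess = ort.InferenceSession("rvc_v2.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"wav": example.numpy()})[0]
print(out.shape)
```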
Tested on Ubuntu 22.04 (WSL 2):

```bash
git clone https://github.com/tripathiarpan20/LLVC
cd LLVC
virtualenv llvcenv
source llvcenv/bin/activate
pip install -r requirements.txt  # takes a while to complete
python download_models.py
```
Work around an ffmpeg/ffmpeg-python packaging conflict (see https://github.com/kkroening/ffmpeg-python/issues/174):

```bash
sudo apt-get update
sudo apt-get install ffmpeg
pip uninstall ffmpeg
pip uninstall ffmpeg-python
pip install ffmpeg-python
```
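After the reinstall, a quick check (my addition, not part of the repo) confirms that `import ffmpeg` resolves to ffmpeg-python rather than the unrelated `ffmpeg` package:

```python
# If the wrong package shadows ffmpeg-python, these attributes are missing
# (the symptom described in ffmpeg-python issue #174).
import ffmpeg

assert hasattr(ffmpeg, "input") and hasattr(ffmpeg, "probe"), \
    "wrong 'ffmpeg' package installed; reinstall ffmpeg-python"
print("ffmpeg-python OK")
```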
Then:

- Download any `.pth` model file tagged with 'RVC v2' from https://www.weights.gg/
- Place the RVC v2 file downloaded in the last step, `model.pth`, in the current folder (`LLVC`)
- Place a sample input audio `.wav` file in the current folder (`LLVC`)

```bash
python minimal_rvc/_infer_file.py --input_file libri_sample.wav --out_dir matpat_out --model_path model.pth
```
The above command processes `libri_sample.wav` with the RVC v2 model (`model.pth`) and writes an `out.wav` file to the `matpat_out` folder.
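To sanity-check the conversion (a hedged addition; assumes `soundfile` is available, which the repo's audio dependencies typically pull in, and the paths used above):

```python
# Print duration and sample rate of the converted output.
import soundfile as sf

audio, sr = sf.read("matpat_out/out.wav")
print(f"{len(audio) / sr:.2f} s at {sr} Hz")
```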
This repository contains the code necessary to train Koe AI's LLVC models and to reproduce the LLVC paper.
- LLVC paper: https://koe.ai/papers/llvc.pdf
- LLVC samples: https://koeai.github.io/llvc-demo/
- Windows executable: https://koe.ai/recast/download/
- Koe AI homepage: https://koe.ai/
- Create a Python environment with e.g. conda: `conda create -n llvc python=3.11`
- Activate the new environment: `conda activate llvc`
- Install torch and torchaudio from https://pytorch.org/get-started/locally/
- Install requirements with `pip install -r requirements.txt`
- Download models with `python download_models.py`

`eval.py` has requirements that conflict with `requirements.txt`, so before running that file, create a separate Python 3.9 virtual environment and run `pip install -r eval_requirements.txt`.
You should now be able to run `python infer.py` and convert all of the files in `test_wavs` with the pretrained LLVC checkpoint, with the resulting files saved to `converted_out`.
`python infer.py -p my_checkpoint.pth -c my_config.json -f input_file -o my_out_dir` will convert a single audio file or folder of audio files using the given LLVC checkpoint and save the output to the folder `my_out_dir`. The `-s` argument simulates a streaming environment for conversion. The `-n` argument allows the user to specify the size of input audio chunks in streaming mode, trading increased latency for better RTF.
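To picture the `-n` tradeoff, here is an illustrative chunking loop (not `infer.py`'s actual code; `convert` is a hypothetical stand-in for the model call):

```python
import numpy as np

def stream_convert(audio: np.ndarray, chunk_size: int, convert) -> np.ndarray:
    """Feed audio to the converter in fixed-size chunks, as -s/-n simulate.

    Larger chunk_size means fewer model calls (better RTF), but each output
    sample waits longer before its chunk is submitted (higher latency).
    """
    out = []
    for start in range(0, len(audio), chunk_size):
        out.append(convert(audio[start:start + chunk_size]))
    return np.concatenate(out)

# Usage with a no-op converter: 1 s of 16 kHz audio in 320-sample (20 ms) chunks.
dummy = stream_convert(np.zeros(16000, dtype=np.float32), 320, lambda x: x)
```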
`compare_infer.py` allows you to reproduce our streaming no-f0 RVC and QuickVC conversions on input audio of your choice. By default, `window_ms` and `extra_convert_size` are set to the values used for no-f0 RVC conversion. See the linked paper for the QuickVC conversion parameters.
- Create a folder `experiments/my_run` containing a `config.json` (see `experiments/llvc/config.json` for an example)
- Edit the `config.json` to reflect the location of your dataset and desired architectural modifications
- Run `python train.py -d experiments/my_run`
- The run will be logged to Tensorboard in the directory `experiments/my_run/logs`
Datasets are comprised of a folder containing three subfolders: `dev`, `train` and `val`. Each of these folders contains audio files of the form `PREFIX_original.wav`, which are audio clips recorded by a variety of input speakers, and `PREFIX_converted.wav`, which are the original audio clips converted to a single target speaker. `val` contains clips from the same speakers as `train`; `dev` contains clips from different speakers than `train`.
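A quick layout check along these lines (my own sketch; `my_dataset` is a placeholder for whatever folder your config points at) can catch unpaired clips before training:

```python
from pathlib import Path

dataset_root = Path("my_dataset")  # the folder referenced by config.json
for split in ("dev", "train", "val"):
    folder = dataset_root / split
    # Pair up PREFIX_original.wav / PREFIX_converted.wav by their prefixes.
    originals = {p.name.removesuffix("_original.wav")
                 for p in folder.glob("*_original.wav")}
    converted = {p.name.removesuffix("_converted.wav")
                 for p in folder.glob("*_converted.wav")}
    assert originals == converted, f"unpaired clips in {split}: {originals ^ converted}"
    print(f"{split}: {len(originals)} paired clips")
```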
To recreate the dataset that we use in our paper:
- Download dev-clean.tar.gz and train-clean-360.tar.gz from https://www.openslr.org/12 and unzip to `llvc/LibriSpeech`
- Convert the unzipped folders:

```bash
python -m minimal_rvc._infer_folder \
    --train_set_path "LibriSpeech/train-clean-360" \
    --dev_set_path "LibriSpeech/dev-clean" \
    --out_path "f_8312_ls360" \
    --flatten \
    --model_path "llvc_models/models/rvc/f_8312_32k-325.pth" \
    --model_name "f_8312" \
    --target_sr 16000 \
    --f0_method "rmvpe" \
    --val_percent 0.02 \
    --random_seed 42 \
    --f0_up_key 12
```
- Download test-clean.tar.gz from https://www.openslr.org/12
- Use `infer.py` to convert the test-clean folder using the checkpoint that you want to evaluate
- Activate the eval environment and run `eval.py` on your converted audio and directory of ground-truth audio files
Many of the modules written in `minimal_rvc/` are based on the following repositories:
- https://github.com/ddPn08/rvc-webui
- https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- https://github.com/teftef6220/Voice_Separation_and_Selection
If you find our work relevant to your research, please cite:
```bibtex
@misc{sadov2023lowlatency,
      title={Low-latency Real-time Voice Conversion on CPU},
      author={Konstantine Sadov and Matthew Hutter and Asara Near},
      year={2023},
      eprint={2311.00873},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
```