## Before training

This program saves the last 3 generations of models to Google Drive. Each generation is larger than 1 GB, so you need at least 3 GB of free space in Google Drive. If you do not have enough free space, consider creating another Google account.
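
The storage requirement above can be checked before training starts. The following is a minimal sketch, not part of so-vits-svc-fork: the helper name `has_free_space` and the 3 GiB default are assumptions for illustration. It inspects the filesystem that contains the given path, which may not reflect the quota of a Google Drive account.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_free_space(path: str = ".", required_gib: float = 3.0) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gib` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gib * GIB


# Check the current directory; pass the checkpoint directory instead if it lives elsewhere.
free_gib = shutil.disk_usage(".").free / GIB
print(f"Free space: {free_gib:.1f} GiB, enough for 3 generations: {has_free_space('.')}")
```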

Training requires more than 10 GB of VRAM (a T4 should be enough). Inference requires much less VRAM.

## Installation

### Check GPU

In [None]:
!nvidia-smi
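
To compare the GPU against the ">10 GB VRAM" note without reading the `nvidia-smi` table by eye, the total memory can be queried in machine-readable form. This is a hedged sketch: the helper names `parse_vram_mib` and `query_vram_mib` are made up for illustration, and the 10 GiB threshold is simply the figure quoted above.

```python
import shutil
import subprocess


def parse_vram_mib(csv_output: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def query_vram_mib() -> list[int]:
    """Return total memory in MiB per visible GPU, or [] if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)


REQUIRED_MIB = 10 * 1024  # threshold taken from the ">10 GB" note, not from the library
for index, total in enumerate(query_vram_mib()):
    status = "OK for training" if total > REQUIRED_MIB else "may be too small for training"
    print(f"GPU {index}: {total} MiB ({status})")
```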

### Install dependencies

In [None]:
!python -m pip install -U pip wheel
%pip install -U ipython lxml

# Branch (for development)
BRANCH = "none"
if BRANCH == "none":
    %pip install -U so-vits-svc-fork
else:
    %pip install -U git+https://github.com/34j/so-vits-svc-fork.git@{BRANCH}

## Training
On Paperspace, you do not need to rerun those commands, except for the last one.

### Clean directories

In [None]:
#!rm -r "dataset_raw"
#!rm -r "dataset/44k"

### Download dataset (Tsukuyomi-chan JVS)
If you do not have your own dataset, you can download this one.
Make sure you agree to the license before using it:
https://tyc.rei-yumesaki.net/material/corpus/#toc6

In [None]:
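# !wget https://tyc.rei-yumesaki.net/files/sozai-tyc-corpus1.zip
# !unzip sozai-tyc-corpus1.zip
# Create the target folder first; the archive is assumed to extract into the current directory.
# !mkdir -p "dataset_raw"
# !mv "./つくよみちゃんコーパス Vol.1 声優統計コーパス（JVSコーパス準拠）/おまけ：WAV（+12dB増幅＆高音域削減）/WAV（+12dB増幅＆高音域削減）" "dataset_raw/tsukuyomi"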
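
The `mv` command above places the WAV files in `dataset_raw/tsukuyomi`, that is, one subfolder per speaker. As a quick sanity check before preprocessing, the following sketch counts the `.wav` files in each speaker folder. The helper name `count_wavs_per_speaker` is hypothetical and not part of so-vits-svc-fork.

```python
from pathlib import Path


def count_wavs_per_speaker(dataset_root: str = "dataset_raw") -> dict[str, int]:
    """Map each speaker subfolder to the number of .wav files found beneath it."""
    root = Path(dataset_root)
    if not root.is_dir():
        return {}
    return {
        speaker_dir.name: sum(1 for _ in speaker_dir.rglob("*.wav"))
        for speaker_dir in sorted(root.iterdir())
        if speaker_dir.is_dir()
    }


# Prints nothing if dataset_raw does not exist yet.
for speaker, n_files in count_wavs_per_speaker().items():
    print(f"{speaker}: {n_files} wav files")
```

A speaker folder reported with 0 files usually means the archive was moved to a different path than expected.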

## Preprocessing

In [None]:
!svc pre-resample

In [None]:
!svc pre-config

In [None]:
F0_METHOD = "dio" #@param ["crepe", "crepe-tiny", "parselmouth", "dio", "harvest"]
!svc pre-hubert -fm {F0_METHOD} -n 4

## Training


To open TensorBoard:

1. Open a terminal and run the following command:
    ```
    tensorboard --logdir=logs/44k --bind_all
    ```
    Note that `--bind_all` is required to access TensorBoard from outside the VM.

2. Open the following URL in your browser:
    ```
    https://tensorboard-NOTEBOOKID.clg07azjl.paperspacegradient.com
    ```
    where `NOTEBOOKID` is the ID shown in the terminal prompt (`root@{NOTEBOOKID}:/notebooks#`).

See the [documentation](https://docs.paperspace.com/gradient/notebooks/tensorboard/) for more information.
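
The URL from step 2 can also be assembled in a notebook cell. This is only a sketch under two assumptions: the hostname of the VM equals `NOTEBOOKID` (as the prompt `root@{NOTEBOOKID}` suggests), and the `clg07azjl` segment of the URL applies to your notebook as well. The helper name `tensorboard_url` is made up for illustration.

```python
import socket

# URL pattern copied from step 2; the cluster segment may differ for other notebooks.
TENSORBOARD_URL_TEMPLATE = "https://tensorboard-{notebook_id}.clg07azjl.paperspacegradient.com"


def tensorboard_url(notebook_id: str) -> str:
    """Fill the notebook ID into the TensorBoard URL pattern."""
    return TENSORBOARD_URL_TEMPLATE.format(notebook_id=notebook_id)


# Assumption: the hostname is the notebook ID shown in the terminal prompt.
print(tensorboard_url(socket.gethostname()))
```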

In [None]:
!svc train

## Training cluster model

In [None]:
!svc train-cluster --output-path drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt

## Inference

Download a random sample of the author's voice to use as the source audio.

In [None]:
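import random

from IPython.display import Audio, display

# Pick one of the 100 sample recordings at random.
NAME = str(random.randint(1, 100))
# -N skips the download if the local copy is already up to date.
!wget -N "https://github.com/34j/34j/raw/main/jvs-parallel100/{NAME}.wav"
display(Audio(f"{NAME}.wav"))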

Use trained model

In [None]:
!svc infer {NAME}.wav -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json
display(Audio(f"{NAME}.out.wav", autoplay=True))
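
The command above passes a directory to `-m` rather than a single checkpoint file. To see which generator checkpoint in that directory is the most recent, something like the following sketch can help. It assumes checkpoints follow the `G_<step>.pth` naming used by the pretrained models in this notebook (for example `G_166400.pth`). The helper name `latest_generator_checkpoint` is hypothetical, and this is not a description of how `svc infer` itself selects a file.

```python
import re
from pathlib import Path
from typing import Optional


def latest_generator_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the G_*.pth file with the highest trailing step number, or None if there is none."""
    directory = Path(model_dir)
    if not directory.is_dir():
        return None
    best_step, best_path = -1, None
    for path in directory.glob("G_*.pth"):
        match = re.search(r"(\d+)$", path.stem)  # trailing digits, e.g. G_166400 -> 166400
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path


# Prints None if the directory does not exist or holds no matching checkpoint.
print(latest_generator_checkpoint("drive/MyDrive/so-vits-svc-fork/logs/44k"))
```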

Use trained model (with cluster)

In [None]:
!svc infer {NAME}.wav -s speaker -r 0.1 -m drive/MyDrive/so-vits-svc-fork/logs/44k/ -c drive/MyDrive/so-vits-svc-fork/logs/44k/config.json -k drive/MyDrive/so-vits-svc-fork/logs/44k/kmeans.pt
display(Audio(f"{NAME}.out.wav", autoplay=True))

### Pretrained models

https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/tree/main

In [None]:
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/G_riri_220.pth"
!wget -N "https://huggingface.co/TachibanaKimika/so-vits-svc-4.0-models/resolve/main/riri/config.json"

In [None]:
!svc infer {NAME}.wav -c config.json -m G_riri_220.pth

In [None]:
display(Audio(f"{NAME}.out.wav", autoplay=True))

https://huggingface.co/therealvul/so-vits-svc-4.0/tree/main

In [None]:
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/G_166400.pth"
!wget -N "https://huggingface.co/therealvul/so-vits-svc-4.0/resolve/main/Pinkie%20(speaking%20sep)/config.json"

In [None]:
# The doubled braces keep IPython from treating {neutral} as a variable to expand.
!svc infer {NAME}.wav --speaker "Pinkie {{neutral}}" -c config.json -m G_166400.pth
display(Audio(f"{NAME}.out.wav", autoplay=True))