<a href="https://colab.research.google.com/github/toddlack/OpenVoice/blob/dev/demo_part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Voice Style Control Demo

In [None]:
# @title Clone repo for local use
# Clone the repo and cd into the new directory
content_root = "/content" # @param {"type":"string","placeholder":"content root"}
git_repo='https://github.com/toddlack/OpenVoice.git'  # @param {"type":"string","placeholder":"git repo url"}
git_branch='main'  # @param {"type":"string","placeholder":"git branch"}
app_root=git_repo.split('/')[-1].replace('.git', '')
assets_dir=f'{content_root}/{app_root}/assets'
%cd {content_root}
!git clone --single-branch -b {git_branch} {git_repo}
%cd {app_root}


In [None]:
!cd /content/OpenVoice && pip install -e .


In [5]:
import os
import zipfile
import requests

def download_and_extract(url, download_dir="/content", extract_dir="/content/OpenVoice"):
    """Downloads and extracts a zip file from a URL to the specified directories.

    Args:
        url (str): The URL of the zip file.
        download_dir (str, optional): The directory to download the zip file to. Defaults to "/content".
        extract_dir (str, optional): The directory to extract the zip file to. Defaults to "/content/OpenVoice".
    """
    os.makedirs(download_dir, exist_ok=True)  # Ensure download directory exists
    os.makedirs(extract_dir, exist_ok=True)  # Ensure extract directory exists

    # Extract filename from URL
    filename = url.split("/")[-1]
    zip_file_path = os.path.join(download_dir, filename)

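    # Download the zip file in chunks; fail early on an HTTP error
    # instead of writing an error page to disk as if it were a zip
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(zip_file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)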

    # Extract the zip file
    with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
        zip_ref.extractall(extract_dir)

    print(f"Downloaded {filename} to: {download_dir}")
    print(f"Extracted {filename} to: {extract_dir}")


# URLs of the zip files
urls = [
    "https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_1226.zip",
    "https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip",
]

# Download and extract each zip file
for url in urls:
    download_and_extract(url)

Downloaded checkpoints_1226.zip to: /content
Extracted checkpoints_1226.zip to: /content/OpenVoice
Downloaded checkpoints_v2_0417.zip to: /content
Extracted checkpoints_v2_0417.zip to: /content/OpenVoice
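
Before launching the app, it can help to confirm that the archives unpacked where the later cells expect them. The following is a minimal sketch and not part of OpenVoice: `missing_checkpoint_files` is a hypothetical helper, and `EXPECTED_FILES` is only the set of relative paths that the cells in this notebook load, not an official manifest of either archive.

```python
from pathlib import Path

# Relative paths that later cells of this notebook load.
# Assumption: this list is drawn from those cells, not from an official manifest.
EXPECTED_FILES = [
    "checkpoints/base_speakers/EN/config.json",
    "checkpoints/base_speakers/EN/checkpoint.pth",
    "checkpoints/base_speakers/EN/en_default_se.pth",
    "checkpoints/base_speakers/EN/en_style_se.pth",
    "checkpoints/base_speakers/ZH/config.json",
    "checkpoints/base_speakers/ZH/checkpoint.pth",
    "checkpoints/base_speakers/ZH/zh_default_se.pth",
    "checkpoints/converter/config.json",
    "checkpoints/converter/checkpoint.pth",
]


def missing_checkpoint_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_checkpoint_files("/content/OpenVoice")` after the download cell should return an empty list if extraction put everything in place; any path it returns points at a file a later cell would fail to load.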


In [6]:
!pwd
!python openvoice/openvoice_app.py --share

/content/OpenVoice
RuntimeError: module was compiled against NumPy C-API version 0x10 (NumPy 1.23) but the running NumPy has C-API version 0xf. Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem.
  WeightNorm.apply(module, name, dim)
  checkpoint_dict = torch.load(ckpt_path, map_location=torch.device(self.device))
Loaded checkpoint 'checkpoints/base_speakers/EN/checkpoint.pth'
missing/unexpected keys: [] []
Loaded checkpoint 'checkpoints/base_speakers/ZH/checkpoint.pth'
missing/unexpected keys: [] []
Downloading (…)81_std1.81.model.pkl: 100% 10.0M/10.0M [00:00<00:00, 112MB/s]
  checkpoint = torch.load(resume_path, map_location=torch.device('cpu'))
Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []
  en_source_default_se = torch.load(f'{en_ckpt_base}/en_default_se.pth').to(device)
  en_s

In [None]:
# prompt: run python module openvoice_app.py

# `-m` expects a dotted module name, not a file path
!python -m openvoice.openvoice_app --share


In [None]:
# Pin NumPy to 1.23, the version the RuntimeError above says the compiled module was built against
!pip install numpy==1.23 --upgrade
# !pip install .
!cd /content/OpenVoice && pip install -e .
!pip install --force-reinstall OpenVoice
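
The RuntimeError in the earlier output says a compiled module was built against NumPy 1.23 while an older NumPy was running. A small check like the one below can confirm whether the installed NumPy now meets that minimum. This is a sketch using only the standard library; `version_tuple` and `numpy_is_at_least` are hypothetical helper names, and the comparison deliberately ignores pre-release suffixes.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a release string such as '1.23.5' into a tuple of ints, (1, 23, 5)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # stop at a non-numeric segment such as 'dev0'
            break
        parts.append(int(match.group()))
    return tuple(parts)


def numpy_is_at_least(minimum="1.23"):
    """Return True or False for the installed NumPy, or None if NumPy is not installed."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)
```

Note that this reads the installed distribution's metadata. A kernel that has already imported NumPy keeps the old module until it is restarted, whereas a fresh `!python` subprocess picks up the newly installed version.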

In [None]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import BaseSpeakerTTS, ToneColorConverter

### Initialization

In [None]:
ckpt_base = 'checkpoints/base_speakers/EN'
ckpt_converter = 'checkpoints/converter'
device="cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs'

base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)

### Obtain Tone Color Embedding

The `source_se` is the tone color embedding of the base speaker.
It is an average over multiple sentences generated by the base speaker. We provide the precomputed result here, but
readers are free to extract `source_se` themselves.

In [None]:
source_se = torch.load(f'{ckpt_base}/en_default_se.pth').to(device)

The `reference_speaker` path below points to a short audio clip of the reference speaker whose voice we want to clone. We provide an example here. If you use your own reference speakers, please **make sure each speaker has a unique filename.** The `se_extractor` saves the `target_se` under the filename of the audio and **will not automatically overwrite an existing one.**

In [None]:
reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)
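
Because the extracted embedding is stored under the audio's filename and is not overwritten, two different clips with the same name would silently share one embedding. The sketch below flags such clashes before extraction. It assumes the cache key is the filename without its extension, which may differ between OpenVoice versions; `find_name_collisions` is a hypothetical helper and not part of `se_extractor`.

```python
from collections import defaultdict
from pathlib import Path


def find_name_collisions(paths):
    """Group reference clips that share a filename stem (the name without its extension)."""
    by_stem = defaultdict(list)
    for path in paths:
        by_stem[Path(path).stem].append(str(path))
    # Keep only the stems used by more than one clip
    return {stem: group for stem, group in by_stem.items() if len(group) > 1}
```

An empty result means every clip has its own stem; otherwise, rename the listed files before passing them to `se_extractor.get_se`.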

### Inference

In [None]:
save_path = f'{output_dir}/output_en_default.wav'

# Run the base speaker tts
text = "This audio is generated by OpenVoice."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
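
The remaining cells repeat these same two steps with different arguments, so they can be folded into one function. The wrapper below is a sketch rather than an OpenVoice API: `synthesize_and_convert` is a hypothetical name, and it relies only on the `tts(...)` and `convert(...)` calls exactly as they are used in the cell above.

```python
import os


def synthesize_and_convert(tts, converter, text, source_se, target_se, save_path,
                           speaker="default", language="English", speed=1.0,
                           message="@MyShell"):
    """Run base-speaker TTS into a temporary wav, then convert its tone color to `target_se`."""
    out_dir = os.path.dirname(save_path) or "."
    os.makedirs(out_dir, exist_ok=True)
    src_path = os.path.join(out_dir, "tmp.wav")  # intermediate base-speaker audio

    # Step 1: base speaker TTS (controls style, language, and speed)
    tts.tts(text, src_path, speaker=speaker, language=language, speed=speed)

    # Step 2: tone color conversion from the base speaker to the reference speaker
    converter.convert(
        audio_src_path=src_path,
        src_se=source_se,
        tgt_se=target_se,
        output_path=save_path,
        message=message,
    )
    return save_path
```

With the objects created earlier, the cell above would become `synthesize_and_convert(base_speaker_tts, tone_color_converter, text, source_se, target_se, save_path)`.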

**Try different styles and speeds.** The style is controlled by the `speaker` parameter of the `base_speaker_tts.tts` method. Available choices: friendly, cheerful, excited, sad, angry, terrified, shouting, whispering. Note that the source tone color embedding needs to be updated as well: the styled voice uses `en_style_se.pth` instead of `en_default_se.pth`. The speed is controlled by the `speed` parameter. Let's try whispering with speed 0.9.

In [None]:
source_se = torch.load(f'{ckpt_base}/en_style_se.pth').to(device)
save_path = f'{output_dir}/output_whispering.wav'

# Run the base speaker tts
text = "This audio is generated by OpenVoice."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='whispering', language='English', speed=0.9)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
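
Picking the wrong source embedding for a style is an easy mistake, so the choice can be made explicit. The sketch below assumes that every non-default style shares `en_style_se.pth`; the notebook only demonstrates this for `whispering`. `source_se_file` is a hypothetical helper that returns a path and loads nothing.

```python
# Style names listed in the text above
STYLES = ("friendly", "cheerful", "excited", "sad", "angry",
          "terrified", "shouting", "whispering")


def source_se_file(style, ckpt_base="checkpoints/base_speakers/EN"):
    """Return the source embedding path to load for an English `speaker` style."""
    if style == "default":
        return f"{ckpt_base}/en_default_se.pth"
    if style in STYLES:
        # Assumption: all styled voices use the same style embedding file
        return f"{ckpt_base}/en_style_se.pth"
    raise ValueError(f"unknown style {style!r}; expected 'default' or one of {STYLES}")
```

The returned path can then be passed to `torch.load(...)` in place of the hard-coded filename.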

**Try different languages.** OpenVoice can achieve multilingual voice cloning by simply replacing the base speaker. We provide an example with a Chinese base speaker here, and we encourage readers to try `demo_part2.ipynb` for a detailed demo.

In [None]:

ckpt_base = 'checkpoints/base_speakers/ZH'
base_speaker_tts = BaseSpeakerTTS(f'{ckpt_base}/config.json', device=device)
base_speaker_tts.load_ckpt(f'{ckpt_base}/checkpoint.pth')

source_se = torch.load(f'{ckpt_base}/zh_default_se.pth').to(device)
save_path = f'{output_dir}/output_chinese.wav'

# Run the base speaker tts
text = "今天天气真好，我们一起出去吃饭吧。"
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='Chinese', speed=1.0)

# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
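
Switching languages changes three things together: the checkpoint directory, the default source embedding, and the `language` argument. One way to keep them in sync is a small lookup table. This is only a sketch; `BaseSpeaker` and `BASE_SPEAKERS` are hypothetical names, and the paths are the ones used in the cells above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseSpeaker:
    language: str    # value passed as `language=` to base_speaker_tts.tts
    ckpt_base: str   # directory holding config.json and checkpoint.pth
    default_se: str  # filename of the default source tone color embedding

    @property
    def config_path(self):
        return f"{self.ckpt_base}/config.json"

    @property
    def checkpoint_path(self):
        return f"{self.ckpt_base}/checkpoint.pth"

    @property
    def se_path(self):
        return f"{self.ckpt_base}/{self.default_se}"


BASE_SPEAKERS = {
    "English": BaseSpeaker("English", "checkpoints/base_speakers/EN", "en_default_se.pth"),
    "Chinese": BaseSpeaker("Chinese", "checkpoints/base_speakers/ZH", "zh_default_se.pth"),
}
```

For example, `BASE_SPEAKERS["Chinese"].se_path` yields the path loaded into `source_se` in the cell above.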

**Tech for good.** For those who deploy OpenVoice for public use: we offer the option to add a watermark to deter potential misuse. Please see the `ToneColorConverter` class. **MyShell reserves the ability to detect whether an audio clip was generated by OpenVoice**, whether or not the watermark is added.