<a href="https://colab.research.google.com/github/toddlack/OpenVoice/blob/dev/cross_lingual_voice_clone.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Cross-Lingual Voice Clone Demo

In [1]:
# prompt: mount google drive

from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# @title Clone repo for local use
# Clone the repo and cd into the new directory
content_root = "/content" # @param {"type":"string","placeholder":"content root"}
git_repo='https://github.com/toddlack/OpenVoice.git'  # @param {"type":"string","placeholder":"git repo url"}
git_branch='dev'  # @param {"type":"string","placeholder":"git branch"}
app_root=git_repo.split('/')[-1].replace('.git', '')
assets_dir=f'{content_root}/{app_root}/assets'
%cd {content_root}
!git clone --single-branch -b {git_branch} {git_repo}
%cd {app_root}


/content
fatal: destination path 'OpenVoice' already exists and is not an empty directory.
/content/OpenVoice


In [3]:
!pip install python-dotenv
!cd /content/OpenVoice && pip install -e .
# !pip install --upgrade "websockets>=13.0,<15.0"
# !pip install --upgrade google-genai
# !pip install -q gradio==3.50.2 langid faster-whisper whisper-timestamped unidecode eng-to-ipa pypinyin cn2an


Obtaining file:///content/OpenVoice
  Preparing metadata (setup.py) ... done
Installing collected packages: MyShell-OpenVoice
  Attempting uninstall: MyShell-OpenVoice
    Found existing installation: MyShell-OpenVoice 0.0.0
    Uninstalling MyShell-OpenVoice-0.0.0:
      Successfully uninstalled MyShell-OpenVoice-0.0.0
  Running setup.py develop for MyShell-OpenVoice
Successfully installed MyShell-OpenVoice-0.0.0


In [4]:
import os
import zipfile
import requests

def download_and_extract(url, download_dir="/content", extract_dir="/content/OpenVoice"):
    """Downloads and extracts a zip file from a URL to the specified directories.

    Args:
        url (str): The URL of the zip file.
        download_dir (str, optional): The directory to download the zip file to. Defaults to "/content".
        extract_dir (str, optional): The directory to extract the zip file to. Defaults to "/content/OpenVoice".
    """
    os.makedirs(download_dir, exist_ok=True)  # Ensure download directory exists
    os.makedirs(extract_dir, exist_ok=True)  # Ensure extract directory exists

    # Extract filename from URL
    filename = url.split("/")[-1]
    zip_file_path = os.path.join(download_dir, filename)

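    # Download the zip file, streaming it to disk in chunks
    response = requests.get(url, stream=True)
    response.raise_for_status()  # Fail early on HTTP errors instead of saving an error page
    with open(zip_file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            if chunk:
                f.write(chunk)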

    # Extract the zip file
    with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
        zip_ref.extractall(extract_dir)

    print(f"Downloaded {filename} to: {download_dir}")
    print(f"Extracted {filename} to: {extract_dir}")


# URLs of the zip files
urls = [
    "https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_1226.zip",
    "https://myshell-public-repo-host.s3.amazonaws.com/openvoice/checkpoints_v2_0417.zip",
]

# Download and extract each zip file
for url in urls:
    download_and_extract(url)

Downloaded checkpoints_1226.zip to: /content
Extracted checkpoints_1226.zip to: /content/OpenVoice
Downloaded checkpoints_v2_0417.zip to: /content
Extracted checkpoints_v2_0417.zip to: /content/OpenVoice


In [5]:
import os
import torch
from openvoice import se_extractor
from openvoice.api import ToneColorConverter

RuntimeError: module was compiled against NumPy C-API version 0x10 (NumPy 1.23) but the running NumPy has C-API version 0xf. Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem.

Importing the dtw module. When using in academic works please cite:
  T. Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package.
  J. Stat. Soft., doi:10.18637/jss.v031.i07.



### Initialization

In [6]:
ckpt_converter = 'checkpoints/converter'
device="cuda:0" if torch.cuda.is_available() else "cpu"
output_dir = 'outputs'

tone_color_converter = ToneColorConverter(f'{ckpt_converter}/config.json', device=device)
tone_color_converter.load_ckpt(f'{ckpt_converter}/checkpoint.pth')

os.makedirs(output_dir, exist_ok=True)

  WeightNorm.apply(module, name, dim)


Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []


  checkpoint = torch.load(resume_path, map_location=torch.device('cpu'))
  checkpoint_dict = torch.load(ckpt_path, map_location=torch.device(self.device))


This demo uses OpenAI TTS as the base speaker to produce multilingual speech audio. You can swap in a different base speaker to suit your needs. Create a file named `.env` and add your OpenAI key to it as `OPENAI_API_KEY=xxx`. A Chinese base speaker model is also provided (see `demo_part1.ipynb`).
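
When the key is missing, the error otherwise only appears once the first request is sent, so it can help to fail early. The following is a minimal sketch, not part of OpenVoice or the OpenAI SDK: `require_env` is a hypothetical helper name, and it only reads `os.environ`, so it assumes that `load_dotenv()` or the shell has already populated the variable.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of OpenVoice or the OpenAI SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add a line such as {name}=xxx to your .env file "
            "and call load_dotenv() before creating the client."
        )
    return value
```

For example, `OpenAI(api_key=require_env("OPENAI_API_KEY"))` would surface a missing key before any audio request is made.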

In [8]:
from openai import OpenAI
from dotenv import load_dotenv

# Please create a file named .env and place your
# OpenAI key as OPENAI_API_KEY=xxx
load_dotenv()

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="This audio will be used to extract the base speaker tone color embedding. " + \
        "Typically a very short audio should be sufficient, but increasing the audio " + \
        "length will also improve the output audio quality."
)

response.stream_to_file(f"{output_dir}/openai_source_output.mp3")



### Obtain Tone Color Embedding

The `source_se` is the tone color embedding of the base speaker.
It is meant to be an average over multiple sentences with multiple
emotions from the base speaker. The cell below extracts it from the
OpenAI clip generated above; feel free to extract `source_se` from your own base speaker audio instead.
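
To make the averaging idea concrete, here is a minimal sketch of an element-wise mean over several embeddings. It uses plain Python lists so that it runs without a GPU or checkpoints. In the notebook, the embeddings returned by `se_extractor.get_se` are expected to be PyTorch tensors, and `average_embeddings` is a hypothetical helper, not an OpenVoice function.

```python
from typing import List, Sequence


def average_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized embedding vectors (illustrative only)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    n = len(embeddings)
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


# Toy 3-dimensional vectors standing in for per-clip embeddings.
clips = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(average_embeddings(clips))  # [2.0, 3.0, 4.0]
```

With tensors, the same idea could be written as something like `torch.stack(list_of_se).mean(dim=0)`, assuming every embedding has the same shape.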

In [9]:
base_speaker = f"{output_dir}/openai_source_output.mp3"
source_se, audio_name = se_extractor.get_se(base_speaker, tone_color_converter, vad=True)

reference_speaker = 'resources/example_reference.mp3' # This is the voice you want to clone
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=True)

OpenVoice version: v1


Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip


[(0.0, 12.192)]
after vad: dur = 12.192


Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:873.)
  return _VF.stft(  # type: ignore[attr-defined]


OpenVoice version: v1
[(0.0, 58.8188125)]
after vad: dur = 58.81798185941043


### Inference

In [10]:
# Run the base speaker tts
text = [
    "MyShell is a decentralized and comprehensive platform for discovering, creating, and staking AI-native apps.",
    "MyShell es una plataforma descentralizada y completa para descubrir, crear y apostar por aplicaciones nativas de IA.",
    "MyShell est une plateforme décentralisée et complète pour découvrir, créer et miser sur des applications natives d'IA.",
    "MyShell ist eine dezentralisierte und umfassende Plattform zum Entdecken, Erstellen und Staken von KI-nativen Apps.",
    "MyShell è una piattaforma decentralizzata e completa per scoprire, creare e scommettere su app native di intelligenza artificiale.",
    "MyShellは、AIネイティブアプリの発見、作成、およびステーキングのための分散型かつ包括的なプラットフォームです。",
    "MyShell — это децентрализованная и всеобъемлющая платформа для обнаружения, создания и стейкинга AI-ориентированных приложений.",
    "MyShell هي منصة لامركزية وشاملة لاكتشاف وإنشاء ورهان تطبيقات الذكاء الاصطناعي الأصلية.",
    "MyShell是一个去中心化且全面的平台，用于发现、创建和投资AI原生应用程序。",
    "MyShell एक विकेंद्रीकृत और व्यापक मंच है, जो AI-मूल ऐप्स की खोज, सृजन और स्टेकिंग के लिए है।",
    "MyShell é uma plataforma descentralizada e abrangente para descobrir, criar e apostar em aplicativos nativos de IA."
]
src_path = f'{output_dir}/tmp.wav'

for i, t in enumerate(text):

    response = client.audio.speech.create(
        model="tts-1",
        voice="nova",
        input=t,
    )

    response.stream_to_file(src_path)

    save_path = f'{output_dir}/output_crosslingual_{i}.wav'

    # Run the tone color converter
    encode_message = "@MyShell"
    tone_color_converter.convert(
        audio_src_path=src_path,
        src_se=source_se,
        tgt_se=target_se,
        output_path=save_path,
        message=encode_message)

