# 🗣️ Speech-AI-Forge Colab

👋 This script is built on [Speech-AI-Forge](https://github.com/lenML/Speech-AI-Forge). If the project helps you, please give us a star on GitHub! Pull requests and issues are welcome too~

## How to run

1. In the menu bar, open **Runtime**.
2. Click **Run all**.

When the run finishes, look for the following line in the log below:

```
Running on public URL: https://**.gradio.live
```

That link is the public URL you can open in your browser.
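If the log is long, the link can also be pulled out programmatically. The sketch below assumes the log is available as a string (for example, copied from the cell output); `find_public_url` and the sample URL are hypothetical illustrations, not part of Speech-AI-Forge.

```python
import re
from typing import Optional

# Matches the share link Gradio prints, e.g. "Running on public URL: https://<id>.gradio.live"
PUBLIC_URL_RE = re.compile(r"Running on public URL:\s*(https://\S+\.gradio\.live)")


def find_public_url(log_text: str) -> Optional[str]:
    """Return the first Gradio public URL in log_text, or None if there is none."""
    match = PUBLIC_URL_RE.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt; the real URL is different on every run.
sample_log = (
    "INFO - NumExpr defaulting to 2 threads.\n"
    "Running on public URL: https://0123abcd.gradio.live\n"
)
print(find_public_url(sample_log))  # https://0123abcd.gradio.live
```

Returning `None` instead of raising makes it easy to poll a growing log until the link appears.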

> Note: If you are prompted to restart while packages are being installed, choose "No".

## Environment setup

```python
# 1. Clone the repository
!git clone https://github.com/lenML/Speech-AI-Forge

# 2. Change directory to the repository
%cd Speech-AI-Forge

# 3. Install ffmpeg and rubberband-cli
!apt-get update -y
!apt-get install -y ffmpeg rubberband-cli

# 4. Install PyTorch
!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 5. Install dependencies
!pip install -r requirements.txt
```
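Before moving on, it can help to confirm that what this cell installs is actually available. A minimal sketch, assuming the `rubberband-cli` package provides a `rubberband` binary; `check_environment` is a hypothetical helper, not part of Speech-AI-Forge.

```python
import importlib.util
import shutil

# Binaries installed by apt-get above (rubberband-cli is assumed to provide `rubberband`).
REQUIRED_BINARIES = ("ffmpeg", "rubberband")
# Python packages installed by the pip steps above.
REQUIRED_PACKAGES = ("torch", "torchaudio")


def check_environment(binaries=REQUIRED_BINARIES, packages=REQUIRED_PACKAGES):
    """Return (missing_binaries, missing_packages).

    Binaries are looked up on PATH; top-level packages are located with
    importlib.util.find_spec, which does not import them.
    """
    missing_bins = [name for name in binaries if shutil.which(name) is None]
    missing_pkgs = [name for name in packages if importlib.util.find_spec(name) is None]
    return missing_bins, missing_pkgs


missing_bins, missing_pkgs = check_environment()
print("missing binaries:", missing_bins or "none")
print("missing packages:", missing_pkgs or "none")
```

`find_spec` is used instead of `import` so the check stays fast and does not load CUDA libraries into the kernel.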


```
Cloning into 'Speech-AI-Forge'...
remote: Enumerating objects: 5996, done.
remote: Counting objects: 100% (452/452), done.
remote: Compressing objects: 100% (205/205), done.
remote: Total 5996 (delta 274), reused 303 (delta 246), pack-reused 5544 (from 1)
Receiving objects: 100% (5996/5996), 9.67 MiB | 19.42 MiB/s, done.
Resolving deltas: 100% (4100/4100), done.
/content/Speech-AI-Forge
Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [66.7 kB]
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubu
```

## Download models

```python
# @markdown ## Model download notes
# @markdown Most models are close to 2 GB each, so make sure you have enough disk space and network bandwidth.  <br/>
# @markdown **Note**: Select at least one TTS model. If none is selected, ChatTTS is downloaded by default.

# @markdown ### TTS models
# @markdown ChatTTS: [GitHub](https://github.com/2noise/ChatTTS) - A generative speech model for daily dialogue.
download_chattts = True  # @param {"type":"boolean", "placeholder":"Download the ChatTTS model"}
# @markdown F5-TTS: [GitHub](https://github.com/SWivid/F5-TTS) - A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
download_f5_tts = False  # @param {"type":"boolean"}
# @markdown CosyVoice: [GitHub](https://github.com/FunAudioLLM/CosyVoice) - Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
download_cosyvoice = False  # @param {"type":"boolean"}
# @markdown FireRedTTS: [GitHub](https://github.com/FireRedTeam/FireRedTTS) - An Open-Sourced LLM-empowered Foundation TTS System
download_fire_red_tts = False  # @param {"type":"boolean"}
# @markdown FishSpeech: [GitHub](https://github.com/fishaudio/fish-speech) - Brand-new TTS solution
download_fish_speech = False  # @param {"type":"boolean"}

# @markdown ### ASR models
# @markdown Whisper: [GitHub](https://github.com/openai/whisper) - Robust Speech Recognition via Large-Scale Weak Supervision
download_whisper = True  # @param {"type":"boolean"}

# @markdown ### Voice cloning models
# @markdown OpenVoice: [GitHub](https://github.com/myshell-ai/OpenVoice) - Instant voice cloning by MIT and MyShell.
download_open_voice = False  # @param {"type":"boolean"}

# @markdown ### Enhancement models
# @markdown resemble-enhance: [GitHub](https://github.com/resemble-ai/resemble-enhance) - AI powered speech denoising and enhancement
download_enhancer = True  # @param {"type":"boolean"}

def download_model(command):
    print(f"Executing: {command}")
    # IPython substitutes {command} and runs the result in a shell
    !{command}

# Make sure at least one TTS model is selected
if not any([download_chattts, download_fish_speech, download_cosyvoice, download_fire_red_tts, download_f5_tts]):
    print("No TTS model selected; downloading ChatTTS by default...")
    download_chattts = True

# TTS model downloads
if download_chattts:
    download_model("python -m scripts.dl_chattts --source huggingface")

if download_fish_speech:
    download_model("python -m scripts.downloader.fish_speech_1_2sft --source huggingface")

if download_cosyvoice:
    download_model("python -m scripts.dl_cosyvoice_instruct --source huggingface")

if download_fire_red_tts:
    download_model("python -m scripts.downloader.fire_red_tts --source huggingface")

if download_f5_tts:
    download_model("python -m scripts.downloader.f5_tts --source huggingface")
    download_model("python -m scripts.downloader.vocos_mel_24khz --source huggingface")

# ASR model download
if download_whisper:
    download_model("python -m scripts.downloader.faster_whisper --source huggingface")

# Voice cloning model download
if download_open_voice:
    download_model("python -m scripts.downloader.open_voice --source huggingface")

# Enhancement model download
if download_enhancer:
    download_model("python -m scripts.dl_enhance --source huggingface")

print("All selected models have been downloaded!")
```
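The download cell prints its final success message even if a command fails, because IPython's `!` does not raise on a non-zero exit status. The sketch below factors the same selection logic (including the ChatTTS fallback) into a pure function and runs the commands with `subprocess.run(..., check=True)` so the first failure stops the cell. The command strings are copied from the cell; `plan_downloads` and `run_downloads` are hypothetical helpers, not part of Speech-AI-Forge.

```python
import shlex
import subprocess
from typing import Dict, List

# Download commands copied from the cell, keyed by the form's checkbox names (order preserved).
DOWNLOAD_COMMANDS: Dict[str, List[str]] = {
    "download_chattts": ["python -m scripts.dl_chattts --source huggingface"],
    "download_fish_speech": ["python -m scripts.downloader.fish_speech_1_2sft --source huggingface"],
    "download_cosyvoice": ["python -m scripts.dl_cosyvoice_instruct --source huggingface"],
    "download_fire_red_tts": ["python -m scripts.downloader.fire_red_tts --source huggingface"],
    "download_f5_tts": [
        "python -m scripts.downloader.f5_tts --source huggingface",
        "python -m scripts.downloader.vocos_mel_24khz --source huggingface",
    ],
    "download_whisper": ["python -m scripts.downloader.faster_whisper --source huggingface"],
    "download_open_voice": ["python -m scripts.downloader.open_voice --source huggingface"],
    "download_enhancer": ["python -m scripts.dl_enhance --source huggingface"],
}
TTS_FLAGS = ("download_chattts", "download_fish_speech", "download_cosyvoice",
             "download_fire_red_tts", "download_f5_tts")


def plan_downloads(selection: Dict[str, bool]) -> List[str]:
    """Return the commands to run, falling back to ChatTTS when no TTS model is selected."""
    selection = dict(selection)  # do not mutate the caller's dict
    if not any(selection.get(flag, False) for flag in TTS_FLAGS):
        selection["download_chattts"] = True
    commands: List[str] = []
    for flag, flag_commands in DOWNLOAD_COMMANDS.items():
        if selection.get(flag, False):
            commands.extend(flag_commands)
    return commands


def run_downloads(commands: List[str]) -> None:
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    for command in commands:
        print(f"Executing: {command}")
        subprocess.run(shlex.split(command), check=True)
```

Inside the notebook (with the working directory still at the repository root) this could be called as `run_downloads(plan_downloads({"download_chattts": download_chattts, "download_whisper": download_whisper, "download_enhancer": download_enhancer}))`, adding the other checkbox variables the same way.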


```
Executing: python -m scripts.dl_chattts --source huggingface
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
Fetching 23 files:   0% 0/23 [00:00<?, ?it/s]
Decoder.safetensors:   0% 0.00/104M [00:00<?, ?B/s]
Embed.safetensors:   0% 0.00/146M [00:00<?, ?B/s]
DVAE.safetensors:   0% 0.00/60.4M [00:00<?, ?B/s]
DVAE.pt:   0% 0.00/27.7M [00:00<?, ?B/s]
README.md: 100% 1.93k/1.93k [00:00<00:00, 12.4MB/s]
Decoder.pt:   0% 0.00/104M [00:00<?, ?B/s]
.gitattributes: 100% 1.52k/1.52k [00:00<00:00, 12.0MB/s]
Fetching 23 files:   4% 1/23 [00:00<00:07,  2.97it/s]
DVAE_full.pt:   0% 0.00/60.4M [00:00<?, ?B/s]
GPT.pt:   0% 0.00/901M [00:00<?, ?B/s]
Decoder.safetensors:  10% 10.5M/104M [00:00<00:02, 40.4MB/s]
Embed.safetensors:   7% 10.5M/146M [00:00<00:03, 40.0MB/s]
DVAE.safetensors:  17% 10.5M/60.4M
```

## Run the WebUI

```python
!nvcc --version
```

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
```


```python
!nvidia-smi
```

```
Sun Feb  9 07:11:00 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   36C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
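`nvcc` and `nvidia-smi` report different CUDA versions here: 12.5 for the toolkit and 12.4 in the driver header, where the nvidia-smi figure is the highest CUDA version the driver supports. The sketch below extracts both numbers from the command output and adds a conservative comparison against a PyTorch version string such as `torch.__version__`. The helper names are hypothetical, and the check ignores CUDA minor-version compatibility, so a `False` result is a reason to look closer rather than proof of a problem.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def _search_version(pattern: str, text: str) -> Optional[Version]:
    """Return (major, minor) from the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_cuda_version(nvcc_output: str) -> Optional[Version]:
    """CUDA toolkit version from `nvcc --version`, e.g. "release 12.5" -> (12, 5)."""
    return _search_version(r"release (\d+)\.(\d+)", nvcc_output)


def driver_cuda_version(nvidia_smi_output: str) -> Optional[Version]:
    """Highest CUDA version the driver supports, from the `nvidia-smi` header."""
    return _search_version(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)


def wheel_cuda_version(torch_version: str) -> Optional[Version]:
    """CUDA version in a PyTorch local version tag, e.g. "2.5.1+cu124" -> (12, 4).

    Assumes the tag is "cu" + major + a single-digit minor (cu118, cu121, cu124).
    """
    return _search_version(r"\+cu(\d+)(\d)$", torch_version)


def wheel_within_driver(torch_version: str, driver: Version) -> bool:
    """Conservative check: the wheel's CUDA version is not newer than the driver's."""
    wheel = wheel_cuda_version(torch_version)
    return wheel is not None and wheel <= driver


# Values taken from the nvcc and nvidia-smi output above.
print(toolkit_cuda_version("Cuda compilation tools, release 12.5, V12.5.82"))  # (12, 5)
print(driver_cuda_version("Driver Version: 550.54.15      CUDA Version: 12.4     |"))  # (12, 4)
```

In a live session the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` and be compared with `torch.__version__`.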

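```python
# -y answers the "Proceed (Y/n)?" confirmation up front, so "Run all" does not stall waiting for input
!pip uninstall -y torchaudio
```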

```
Found existing installation: torchaudio 2.5.1+cu124
Uninstalling torchaudio-2.5.1+cu124:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/torchaudio-2.5.1+cu124.dist-info/*
    /usr/local/lib/python3.11/dist-packages/torchaudio/*
    /usr/local/lib/python3.11/dist-packages/torio/*
Proceed (Y/n)? y
  Successfully uninstalled torchaudio-2.5.1+cu124
```


```python
# Reinstall torchaudio (in the run below, pip also collected torch==2.6.0 as a dependency)
!pip install torchaudio -f https://download.pytorch.org/whl/torch_stable.html
```

```
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torchaudio
  Downloading torchaudio-2.6.0-cp311-cp311-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting torch==2.6.0 (from torchaudio)
  Downloading torch-2.6.0-cp311-cp311-manylinux1_x86_64.whl.metadata (28 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch==2.6.0->torchaudio)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch==2.6.0->torchaudio)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch==2.6.0->torchaudio)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch==2.6.0->torchaudio)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nv
```
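The uninstall/reinstall pair replaces torchaudio 2.5.1+cu124, and the log shows pip collecting torch 2.6.0 alongside torchaudio 2.6.0. A quick way to confirm the two packages line up afterwards is sketched below. It assumes torchaudio 2.x releases share torch's release number (as torch 2.6.0 and torchaudio 2.6.0 do in the log) and plain release versions without pre-release suffixes; `torch_torchaudio_match` is a hypothetical helper. If torch was already imported in this kernel before the reinstall, the printed versions reflect the old in-memory copy.

```python
from typing import Tuple


def base_version(version: str) -> Tuple[int, ...]:
    """Drop the local label ("+cu124") and return the numeric release, e.g. (2, 5, 1)."""
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split(".")[:3])


def torch_torchaudio_match(torch_version: str, torchaudio_version: str) -> bool:
    """True when torch and torchaudio share the same release number (assumed 2.x pairing)."""
    return base_version(torch_version) == base_version(torchaudio_version)


try:
    import torch
    import torchaudio
except (ImportError, OSError) as exc:  # a mismatched torchaudio build can fail while loading its native library
    print("could not import torch/torchaudio:", exc)
else:
    print(torch.__version__, torchaudio.__version__,
          torch_torchaudio_match(str(torch.__version__), str(torchaudio.__version__)))
```

Comparing only the release part keeps the check independent of the CUDA tag, which PyPI and the PyTorch index may label differently.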

```python
!python webui.py --share --language=zh-CN
```

```
2025-02-09 07:21:16,129 - numexpr.utils - INFO - NumExpr defaulting to 2 threads.
2025-02-09 07:21:35,247 - datasets - INFO - PyTorch version 2.6.0 available.
2025-02-09 07:21:35,264 - datasets - INFO - Polars version 1.9.0 available.
2025-02-09 07:21:35,265 - datasets - INFO - Duckdb version 1.1.3 available.
2025-02-09 07:21:35,266 - datasets - INFO - TensorFlow version 2.18.0 available.
2025-02-09 07:21:35,267 - datasets - INFO - JAX version 0.4.33 available.
2025-02-09 07:21:37.399117: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739085697.699742    6130 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739085697.776866    6130 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one
```