# 🎤 Voice Conversion with SoundLab

This notebook demonstrates text-to-speech (TTS) generation with XTTS-v2 and voice conversion with RVC.

**What you'll learn:**
- Generating speech from text using XTTS-v2
- Cloning voices from reference audio
- Converting voices using RVC (Retrieval-based Voice Conversion)
- Combining TTS + voice conversion for custom voices

## Setup

Install SoundLab with voice dependencies and import the modules.

In [None]:
# Install SoundLab (uncomment if running in Colab)
# !pip install soundlab[voice]

from pathlib import Path

from soundlab.io import load_audio, save_audio
from soundlab.voice import RVCConverter, SVCConfig, TTSConfig, XTTSGenerator

print("✅ SoundLab voice modules imported!")

## Part 1: Text-to-Speech with XTTS-v2

XTTS-v2 is Coqui's multilingual TTS model. It can clone a voice from just a few seconds of reference audio, with no per-speaker training.

### 1.1 Basic TTS Generation

Generate speech without voice cloning using the default voice.

In [None]:
# @title TTS Configuration
# @markdown Configure the text-to-speech settings.

TEXT = "Hello! This is a demonstration of SoundLab's text-to-speech capabilities."  # @param {type: "string"}
LANGUAGE = "en"  # @param ["en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru", "nl", "cs", "ar", "zh", "ja", "ko"]
SPEED = 1.0  # @param {type: "slider", min: 0.5, max: 2.0, step: 0.1}

# Create TTS config
tts_config = TTSConfig(
    language=LANGUAGE,
    speed=SPEED,
)

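print("🎛️ TTS Configuration:")
print(f"   Language: {tts_config.language}")
print(f"   Speed: {tts_config.speed}x")
# Truncate long text for display; only add an ellipsis when something was cut
print(f"   Text: {TEXT[:50]}{'...' if len(TEXT) > 50 else ''}")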

In [None]:
# Generate speech
tts = XTTSGenerator(tts_config)

print("🎙️ Generating speech...")
result = tts.generate(TEXT)

print("\n✅ Generation complete!")
print(f"   Duration: {result.audio.duration:.2f}s")
print(f"   Sample rate: {result.audio.sample_rate} Hz")
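
XTTS-v2 is designed for short inputs. In Coqui's reference implementation, text longer than a per-language character limit (250 characters for English) triggers a warning that the audio may be truncated. Whether SoundLab's `XTTSGenerator` splits long text for you is not covered in this notebook, so for long passages a common workaround is to split at sentence boundaries and generate each chunk separately. `chunk_text` below is a hypothetical helper, not a SoundLab API; a minimal sketch:

```python
import re


# Hypothetical helper (not a SoundLab API): split text into chunks of at most
# max_chars characters, preferring sentence boundaries.
def chunk_text(text: str, max_chars: int = 250) -> list[str]:
    # Split after ".", "!" or "?" when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Break an over-long sentence at the last space before the limit
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no usable space: hard cut
            head, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(head)
        # Pack sentences into the current chunk while they fit
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

You could then call `tts.generate` on each chunk and concatenate the resulting audio.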

In [None]:
# Preview the generated speech
from IPython.display import Audio, display

print("🎧 Generated Speech:")
display(Audio(result.audio.samples.T, rate=result.audio.sample_rate))

### 1.2 Voice Cloning

Clone a voice from reference audio. For best results:
- Use 5-15 seconds of clean speech
- No background noise or music
- Clear, natural speaking voice
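
These guidelines can be checked before you run cloning. The sketch below reads a 16-bit PCM WAV file with Python's standard `wave` module and flags clips outside 5-15 seconds, clips that reach full scale, and clips that are nearly silent. `analyze_reference` and `check_reference` are hypothetical helpers (not SoundLab APIs), and the level thresholds are illustrative assumptions, not values taken from XTTS-v2. It cannot detect background noise or music, so listening to the clip is still the best check.

```python
import array
import math
import sys
import wave


# Hypothetical helpers (not SoundLab APIs). Thresholds are illustrative.
def analyze_reference(samples, n_channels, rate, min_s=5.0, max_s=15.0):
    """Return (duration_s, warnings) for interleaved signed 16-bit samples."""
    duration = len(samples) / (n_channels * rate)
    peak = max((abs(s) for s in samples), default=0) / 32768
    rms = math.sqrt(sum(s * s for s in samples) / max(len(samples), 1)) / 32768
    warnings = []
    if not min_s <= duration <= max_s:
        warnings.append(f"duration {duration:.1f}s is outside {min_s:g}-{max_s:g}s")
    if peak >= 0.999:
        warnings.append("peak reaches full scale; the clip may be clipped")
    if rms < 0.01:  # about -40 dBFS: illustrative threshold
        warnings.append("very low level; the clip may be mostly silence")
    return duration, warnings


def check_reference(path, **limits):
    """Read a 16-bit PCM WAV file and run analyze_reference on it."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return analyze_reference(samples, n_channels, rate, **limits)
```

For example, `duration, warnings = check_reference("speaker_reference.wav")` returns the clip length and a list of any problems found.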

In [None]:
# @title Voice Cloning Settings
# @markdown Provide reference audio for voice cloning.

REFERENCE_AUDIO = "speaker_reference.wav"  # @param {type: "string"}
CLONE_TEXT = "This speech should sound like the reference speaker."  # @param {type: "string"}

# For Colab: uncomment to upload reference audio
# from google.colab import files
# uploaded = files.upload()
# REFERENCE_AUDIO = list(uploaded.keys())[0]

In [None]:
# Load reference audio
try:
    reference = load_audio(REFERENCE_AUDIO)
    print(f"📁 Reference loaded: {REFERENCE_AUDIO}")
    print(f"   Duration: {reference.duration:.2f}s")

    # Preview reference
    print("\n🎧 Reference Voice:")
    display(Audio(reference.samples.T, rate=reference.sample_rate))
except FileNotFoundError:
    print("⚠️ Reference file not found; voice cloning will be skipped.")
    reference = None

In [None]:
# Generate with cloned voice
if reference is not None:
    print("🎙️ Generating with cloned voice...")
    cloned_result = tts.generate(CLONE_TEXT, speaker_reference=reference)

    print("\n✅ Cloned speech generated!")
    print(f"   Duration: {cloned_result.audio.duration:.2f}s")

    print("\n🎧 Cloned Voice Output:")
    display(Audio(cloned_result.audio.samples.T, rate=cloned_result.audio.sample_rate))
else:
    print("⚠️ Skipping voice cloning (no reference audio)")

## Part 2: Voice Conversion with RVC

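RVC (Retrieval-based Voice Conversion) converts a recording of one voice so that it sounds like the target speaker of a trained RVC model, while keeping the words and intonation of the input.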

**Note:** RVC requires separate model files. See the [RVC project](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) for training custom models.

### 2.1 Set Up RVC Models

Download or specify paths to your RVC model files.

In [None]:
# @title RVC Model Configuration
# @markdown Specify paths to your RVC model files.

RVC_MODEL_PATH = "models/rvc/my_voice.pth"  # @param {type: "string"}
RVC_INDEX_PATH = "models/rvc/my_voice.index"  # @param {type: "string"}

# Check which files exist (the .index file is optional)
model_exists = Path(RVC_MODEL_PATH).exists()
index_exists = Path(RVC_INDEX_PATH).exists()

if model_exists:
    print("✅ RVC model found!")
    print(f"   Model: {RVC_MODEL_PATH}")
    if index_exists:
        print(f"   Index: {RVC_INDEX_PATH}")
    else:
        print("   Index: not found (optional; conversion will run without it)")
else:
    print("⚠️ RVC model not found.")
    print("")
    print("To use RVC, you need:")
    print("1. A trained .pth model file (required)")
    print("2. A matching .index file (optional, usually improves quality)")
    print("")
    print("See: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI")
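
If you keep several voices in one folder, a small helper can pair each `.pth` model with its `.index` file. Indexes produced by the RVC WebUI usually embed the experiment name in the filename (a name of the form `added_IVF<n>_Flat_nprobe_1_<name>_v2.index`), so this sketch matches by substring. `pair_rvc_files` is hypothetical and the naming pattern is an assumption; check the pairs it reports before relying on them.

```python
from pathlib import Path


# Hypothetical helper (not a SoundLab API): pair each .pth model with an
# .index file whose name contains the model's stem (None if nothing matches).
# Substring matching is a heuristic: a model named "voice" would also match
# an index built for "voice2", so check the result.
def pair_rvc_files(paths):
    paths = [Path(p) for p in paths]
    indexes = sorted(p for p in paths if p.suffix == ".index")
    pairs = {}
    for model in sorted(p for p in paths if p.suffix == ".pth"):
        pairs[model] = next((idx for idx in indexes if model.stem in idx.stem), None)
    return pairs


# Usage: list the pairs found in the model folder, if it exists
folder = Path("models/rvc")
if folder.is_dir():
    for model, index in pair_rvc_files(folder.iterdir()).items():
        print(f"{model.name} -> {index.name if index else 'no index'}")
```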

### 2.2 Configure RVC

In [None]:
# @title RVC Settings
# @markdown Tune the voice conversion parameters.

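# Transpose the input pitch in semitones (+12 = one octave up, -12 = one octave down)
PITCH_SHIFT = 0  # @param {type: "slider", min: -12, max: 12, step: 1}
# Blend weight for features retrieved from the .index file (0 = ignore the index)
INDEX_RATE = 0.75  # @param {type: "slider", min: 0.0, max: 1.0, step: 0.05}
# Pitch smoothing (upstream RVC: values >= 3 median-filter the harvest pitch curve, reducing breathiness)
FILTER_RADIUS = 3  # @param {type: "slider", min: 0, max: 7, step: 1}
# Protection for voiceless consonants and breaths (lower = stronger; 0.5 disables it in upstream RVC)
PROTECT = 0.33  # @param {type: "slider", min: 0.0, max: 0.5, step: 0.01}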

# Create SVC config
svc_config = SVCConfig(
    model_path=RVC_MODEL_PATH,
    index_path=RVC_INDEX_PATH if index_exists else None,
    pitch_shift=PITCH_SHIFT,
    index_rate=INDEX_RATE,
    filter_radius=FILTER_RADIUS,
    protect=PROTECT,
)

print("🎛️ RVC Configuration:")
print(f"   Pitch shift: {svc_config.pitch_shift} semitones")
print(f"   Index rate: {svc_config.index_rate}")
print(f"   Filter radius: {svc_config.filter_radius}")
print(f"   Protect: {svc_config.protect}")
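
`INDEX_RATE` controls how much the retrieval step shapes the result. In upstream RVC, each frame's content (HuBERT) feature is replaced by a weighted average of its 8 nearest neighbours in the `.index`, with weights proportional to 1 / score², where the FAISS score is the squared L2 distance; that average is then blended linearly with the original feature as `index_rate * retrieved + (1 - index_rate) * original`. The sketch below reproduces this blend with brute-force search on plain lists instead of FAISS. It illustrates the idea and is not SoundLab's internal code.

```python
# Simplified sketch of RVC-style retrieval blending. Upstream RVC searches a
# FAISS index; brute-force search over plain lists is used here instead.
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_and_blend(feature, index_vectors, index_rate=0.75, k=8, eps=1e-12):
    # 1. Find the k stored vectors closest to this frame's feature
    neighbours = sorted(index_vectors, key=lambda v: squared_distance(feature, v))[:k]
    # 2. Weight neighbours by 1 / score**2, where score is the squared L2
    #    distance (eps avoids division by zero on exact matches)
    weights = [1.0 / (squared_distance(feature, v) + eps) ** 2 for v in neighbours]
    total = sum(weights)
    retrieved = [
        sum(w * v[d] for w, v in zip(weights, neighbours)) / total
        for d in range(len(feature))
    ]
    # 3. Blend: index_rate = 1 uses only retrieved features, 0 ignores the index
    return [index_rate * r + (1.0 - index_rate) * f for r, f in zip(retrieved, feature)]
```

Higher `INDEX_RATE` pulls the timbre toward the training voice stored in the index; lower values keep more of the input's own features.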

### 2.3 Convert Voice

In [None]:
# @title Input Audio for Conversion
# @markdown Provide audio to convert.

INPUT_AUDIO = "input_voice.wav"  # @param {type: "string"}

# For Colab: uncomment to upload
# from google.colab import files
# uploaded = files.upload()
# INPUT_AUDIO = list(uploaded.keys())[0]

In [None]:
if model_exists:
    # Create the converter up front so Part 3 can reuse it,
    # even if the input file below is missing
    converter = RVCConverter(svc_config)

    try:
        # Load input audio
        input_audio = load_audio(INPUT_AUDIO)
        print(f"📁 Input loaded: {INPUT_AUDIO}")
        print(f"   Duration: {input_audio.duration:.2f}s")

        # Preview input
        print("\n🎧 Original Voice:")
        display(Audio(input_audio.samples.T, rate=input_audio.sample_rate))

        print("\n🎙️ Converting voice...")
        converted = converter.convert(input_audio)

        print("\n✅ Conversion complete!")
        print(f"   Duration: {converted.audio.duration:.2f}s")

        print("\n🎧 Converted Voice:")
        display(Audio(converted.audio.samples.T, rate=converted.audio.sample_rate))

    except FileNotFoundError:
        print(f"⚠️ Input file not found: {INPUT_AUDIO}")
else:
    print("⚠️ Skipping RVC conversion (model not found)")

## Part 3: TTS + Voice Conversion Pipeline

Chain the two stages: XTTS-v2 turns text into speech, then RVC converts that speech into the voice of your RVC model.

In [None]:
# @title Full Pipeline: Text → TTS → RVC → Output
# @markdown Speak the text below in your custom RVC voice.

PIPELINE_TEXT = "This text will be spoken in the custom RVC voice model."  # @param {type: "string"}

if model_exists:
    # Reuse the converter from Part 2.3, or create it if that cell was skipped
    if "converter" not in globals():
        converter = RVCConverter(svc_config)

    # Step 1: Generate TTS
    print("Step 1: Generating TTS...")
    tts_output = tts.generate(PIPELINE_TEXT)

    print("🎧 TTS Output (before conversion):")
    display(Audio(tts_output.audio.samples.T, rate=tts_output.audio.sample_rate))

    # Step 2: Convert with RVC
    print("\nStep 2: Converting with RVC...")
    final_output = converter.convert(tts_output.audio)

    print("\n🎧 Final Output (TTS + RVC):")
    display(Audio(final_output.audio.samples.T, rate=final_output.audio.sample_rate))

    print("\n✅ Pipeline complete!")
else:
    print("⚠️ Pipeline requires an RVC model. See Part 2 for setup.")
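
XTTS-v2 produces 24 kHz audio, while RVC models are commonly trained at 32, 40, or 48 kHz. This notebook assumes `RVCConverter.convert` handles any sample-rate mismatch internally; if you wire the two stages together yourself, you need a resampling step. The linear-interpolation resampler below is a hypothetical, illustration-only sketch: it applies no anti-aliasing filter, so a polyphase resampler such as `scipy.signal.resample_poly` is the better choice for real audio.

```python
# Illustration only: linear-interpolation resampling of mono samples.
# No anti-aliasing filter is applied, so downsampling can alias.
def resample_linear(samples, src_rate, dst_rate):
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    n_out = round(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Position of output sample i on the input sample grid
        pos = i * src_rate / dst_rate
        j = int(pos)
        frac = pos - j
        if j + 1 < len(samples):
            # Interpolate between the two surrounding input samples
            out.append(samples[j] * (1 - frac) + samples[j + 1] * frac)
        else:
            out.append(float(samples[-1]))  # hold the last sample at the edge
    return out
```

For example, `resample_linear(samples, 24000, 40000)` would map a 24 kHz TTS clip onto a 40 kHz grid.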

## Part 4: Save Results

In [None]:
# @title Export Generated Audio
# @markdown Save your generated audio to disk.

OUTPUT_DIR = "voice_output"  # @param {type: "string"}
OUTPUT_FORMAT = "wav"  # @param ["wav", "mp3", "flac"]

# Create output directory
output_path = Path(OUTPUT_DIR)
output_path.mkdir(parents=True, exist_ok=True)

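# Save TTS output
tts_file = output_path / f"tts_output.{OUTPUT_FORMAT}"
save_audio(result.audio, tts_file)
print(f"💾 Saved TTS: {tts_file}")

# Save cloned-voice output if it was generated
if "cloned_result" in globals():
    clone_file = output_path / f"cloned_output.{OUTPUT_FORMAT}"
    save_audio(cloned_result.audio, clone_file)
    print(f"💾 Saved cloned voice: {clone_file}")

# Save converted output if available
if model_exists and "final_output" in globals():
    rvc_file = output_path / f"rvc_output.{OUTPUT_FORMAT}"
    save_audio(final_output.audio, rvc_file)
    print(f"💾 Saved RVC: {rvc_file}")

print(f"\n✅ Files saved to {OUTPUT_DIR}/")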
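
`save_audio` handles export in this notebook. If you need a dependency-free way to write a WAV file outside SoundLab, the standard-library sketch below converts mono float samples in [-1, 1] to 16-bit PCM and writes them with the `wave` module. `float_to_pcm16` and `write_wav_pcm16` are hypothetical helpers, not SoundLab APIs; out-of-range values are clipped rather than normalized.

```python
import array
import sys
import wave


# Hypothetical helpers (not SoundLab APIs): write mono float audio in [-1, 1]
# to a 16-bit PCM WAV file using only the standard library.
def float_to_pcm16(samples):
    # Clip to [-1, 1] and scale to the signed 16-bit range
    return array.array("h", (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))


def write_wav_pcm16(path, samples, sample_rate):
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

Scaling by 32767 keeps +1.0 and -1.0 symmetric, at the cost of never using the value -32768.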

In [None]:
# For Colab: Download as ZIP
# import shutil
# from google.colab import files
#
# zip_path = shutil.make_archive("voice_output", "zip", OUTPUT_DIR)
# files.download(zip_path)

## 🎉 Done!

You've learned how to use SoundLab's voice generation capabilities.

**Summary:**
- **XTTS-v2**: Text-to-speech with voice cloning from reference audio
- **RVC**: Voice conversion using trained voice models
- **Pipeline**: Combine TTS + RVC for custom voice generation

**Next steps:**
- Try the [Stem Separation](./stem_separation.ipynb) notebook to extract vocals
- Explore the [MIDI Transcription](./midi_transcription.ipynb) notebook
- Check out the [SoundLab Studio](../soundlab_studio.ipynb) for the full pipeline

**Tips:**
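- For voice cloning, use 5-15 seconds of clean speech
- RVC conversion usually sounds best when the input voice is close in range and style to the voice the model was trained on
- Lower the `protect` value to better preserve voiceless consonants and breath sounds and reduce artifacts (in upstream RVC, 0.5 disables protection)
- Adjust `pitch_shift` for cross-gender conversion: +12 (one octave up) for male→female, -12 (one octave down) for female→male, as a starting point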
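
`pitch_shift` is expressed in semitones. In equal temperament a shift of n semitones multiplies every fundamental-frequency (F0) value by 2^(n/12), so +12 doubles F0 and -12 halves it; upstream RVC applies this scaling to the extracted F0 curve before synthesis. The sketch below shows the arithmetic; `semitones_to_ratio` and `shift_f0` are hypothetical helpers for illustration.

```python
# Equal temperament: each semitone multiplies frequency by 2 ** (1 / 12)
def semitones_to_ratio(semitones):
    return 2.0 ** (semitones / 12)


# Shift an F0 curve in Hz; unvoiced frames stored as 0 Hz stay at 0
def shift_f0(f0_hz, semitones):
    ratio = semitones_to_ratio(semitones)
    return [f * ratio for f in f0_hz]


print(shift_f0([110.0, 0.0, 220.0], 12))  # one octave up
```

A shift of +7 semitones (a perfect fifth) multiplies F0 by about 1.498, which is why small shifts can already change the perceived character of a voice.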