<a href="https://colab.research.google.com/github/thossai000/openai-tts/blob/main/OpenAI_TTS_Gradio_Demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
!pip install uv --quiet --progress-bar=off
!uv pip install --system --quiet gradio requests pydub openai

In [4]:
import requests
import gradio as gr
import os
from pydub import AudioSegment
from io import BytesIO

**requests** sends HTTP requests; here it posts text to the TTS endpoint.

**gradio** builds web UIs in Python with very little code.

**pydub** handles audio manipulation (for example, converting the response bytes if they arrive as MP3 and Gradio expects WAV). Decoding MP3 with pydub relies on ffmpeg being installed.

**io.BytesIO** lets us treat audio held in memory as a file-like object.
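
To make the role of `io.BytesIO` concrete, here is a minimal sketch that writes a short WAV clip into an in-memory buffer and reads it back, using only the standard library. The helper name `make_silent_wav` and the chosen sample rate are illustrative assumptions, not part of this notebook.

```python
import wave
from io import BytesIO

def make_silent_wav(duration_s=0.5, sample_rate=16000):
    """Return the bytes of a mono, 16-bit WAV clip containing silence."""
    n_frames = int(duration_s * sample_rate)
    buffer = BytesIO()
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)      # mono
        wav_out.setsampwidth(2)      # 2 bytes per sample = 16-bit audio
        wav_out.setframerate(sample_rate)
        wav_out.writeframes(b"\x00\x00" * n_frames)
    # The buffer stays open after the wave writer closes, so we can read it
    return buffer.getvalue()

wav_bytes = make_silent_wav()

# Wrap the raw bytes in BytesIO again to read them like a file on disk
with wave.open(BytesIO(wav_bytes), "rb") as wav_in:
    frame_count = wav_in.getnframes()
    frame_rate = wav_in.getframerate()

print(f"{frame_count} frames at {frame_rate} Hz")
```

The same pattern applies to the TTS response: the bytes never need to touch the disk before another library reads them.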

In [5]:
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
# Credentials are needed to authenticate requests to the TTS endpoint.
# This is the OpenAI API key generated in your account and saved as a Colab secret.

Store your key in a secret if you’re using a public environment like Hugging Face Spaces.
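
As a rough sketch of how one notebook could run both in Colab and on a host that exposes secrets as environment variables, the helper below tries Colab's `userdata` first and then falls back to `os.environ`. The function name `load_api_key` is hypothetical, and the fallback order is an assumption about what is convenient, not a requirement of either platform.

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook
        key = None
    key = key or os.environ.get(name)
    if not key:
        raise ValueError(f"{name} not found in Colab secrets or environment variables.")
    return key
```

Calling `os.environ["OPENAI_API_KEY"] = load_api_key()` would then replace the Colab-specific lines in the cell above.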

In [6]:
# Take text, choose a TTS model (e.g., tts-1 or tts-1-hd) and a voice,
# and return the raw audio bytes (MP3 by default).
def call_openai_tts(text, model="tts-1", voice="alloy"):
    """
    Call the OpenAI text-to-speech endpoint with the given text,
    returning raw audio content (MP3 bytes by default).
    """
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        raise ValueError("OpenAI API key not found in environment.")

    # Speech endpoint of the OpenAI REST API
    url = "https://api.openai.com/v1/audio/speech"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {openai_api_key}",
    }

    # The model, the input text, and the voice all go in the JSON body
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
    }

    response = requests.post(url, headers=headers, json=payload, timeout=60)

    if response.status_code != 200:
        raise RuntimeError(f"TTS request failed: {response.status_code} {response.text}")

    # The response body is the binary audio itself (MP3 unless another
    # response_format is requested), so no base64 decoding is needed.
    audio_bytes = response.content

    return audio_bytes
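
Long inputs may need to be split before they are sent. The sketch below assumes a limit of 4096 characters per request (worth checking against the current API documentation) and prefers to break at sentence boundaries. The helper `split_text` is hypothetical and not part of any library.

```python
import re

def split_text(text, max_chars=4096):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself too long
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate      # the sentence still fits in this chunk
        else:
            chunks.append(current)   # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks

chunks = split_text("One. Two. Three.", max_chars=10)
print(chunks)
```

Each chunk could then be sent to the TTS endpoint in turn and the resulting clips played or joined in order.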

Above, we wrote a function to call the TTS endpoint.

Next, we set up a Gradio UI.




In [None]:
import gradio as gr
import os
from openai import OpenAI
from pydub import AudioSegment
from io import BytesIO

# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def generate_speech(text, model_choice, voice_choice="alloy"):
    """
    Generate speech from text using OpenAI's TTS API

    Args:
        text (str): Text to convert to speech
        model_choice (str): Model to use (tts-1 or tts-1-hd)
        voice_choice (str): Voice to use (default: alloy)

    Returns:
        str: Path to the generated audio file
    """
    try:
        # Call OpenAI's TTS API using the official client library
        response = client.audio.speech.create(
            model=model_choice,
            voice=voice_choice,
            input=text
        )

        # Get the speech data
        speech_data = response.read()

        # Convert MP3 bytes to WAV for Gradio
        audio_segment = AudioSegment.from_file(BytesIO(speech_data), format="mp3")

        # Save the WAV file temporarily
        output_file = "output.wav"
        audio_segment.export(output_file, format="wav")

        return output_file

    except Exception as e:
        raise gr.Error(f"Failed to generate speech: {str(e)}")

# Define available voices and models
AVAILABLE_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]
AVAILABLE_MODELS = ["tts-1", "tts-1-hd"]

# Create the Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# OpenAI Text-to-Speech Demo")

    with gr.Row():
        with gr.Column():
            text_input = gr.Textbox(
                label="Enter text to synthesize",
                value="Hello! This is a test of OpenAI's text-to-speech technology.",
                lines=3
            )

            with gr.Row():
                model_input = gr.Radio(
                    choices=AVAILABLE_MODELS,
                    value="tts-1",
                    label="TTS Model"
                )
                voice_input = gr.Radio(
                    choices=AVAILABLE_VOICES,
                    value="alloy",
                    label="Voice"
                )

            generate_button = gr.Button("Generate Speech")

        with gr.Column():
            audio_output = gr.Audio(
                label="Generated Speech",
                type="filepath"
            )

    # Add some helpful information
    gr.Markdown("""
    ### Model Information
    - **tts-1**: Standard quality model, faster generation
    - **tts-1-hd**: Higher quality model, slightly slower generation

    ### Voice Descriptions
    - **alloy**: Versatile, balanced voice
    - **echo**: Warm, natural voice
    - **fable**: Expressive, youthful voice
    - **onyx**: Deep, authoritative voice
    - **nova**: Energetic, professional voice
    - **shimmer**: Clear, gentle voice
    """)

    # Set up the event handler
    generate_button.click(
        fn=generate_speech,
        inputs=[text_input, model_input, voice_input],
        outputs=audio_output
    )

# Launch the demo
if __name__ == "__main__":
    demo.launch(debug=True)

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://bc98d2cf2adb1b2bcf.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
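
One thing to watch with a public share link: `generate_speech` writes every result to the same `output.wav`, so two people using the demo at once could overwrite each other's audio. A hedged sketch of one way around this, using only the standard library, is to create a uniquely named temporary file per request. The helper `new_audio_path` is a hypothetical name introduced here for illustration.

```python
import os
import tempfile

def new_audio_path(suffix=".wav"):
    """Create an empty, uniquely named temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="tts_", suffix=suffix)
    os.close(fd)  # only the path is needed; the exporter reopens the file itself
    return path

first_path = new_audio_path()
second_path = new_audio_path()
print(first_path != second_path)
```

Inside `generate_speech`, `output_file = new_audio_path()` could then replace the fixed file name. The files are not deleted automatically, so a long-running deployment would want to clean them up.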
