# TTS Server on Google Colab

This notebook sets up a Flask-based Text-to-Speech server using Parler TTS on Google Colab.

## Features
- Runs a local TTS server exposed to the internet through an ngrok tunnel
- Uses AI4Bharat's Indic Parler-TTS model from Hugging Face (English and Indian languages)
- Uses FP16 inference on a GPU when one is available
- REST API for text-to-speech synthesis

## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install -q flask pyngrok
!pip install -q "parler-tts @ git+https://github.com/huggingface/parler-tts.git"
!pip install -q transformers accelerate

print("✅ All dependencies installed!")

## Step 2: Check GPU Availability

In [None]:
import torch

if torch.cuda.is_available():
    print(f"✅ GPU available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("⚠️ No GPU available. Using CPU (slower performance).")

## Step 3: Clone Model from Hugging Face (Optional)

Clone the model repository to local storage. This is useful if you want to:
- Save the model for later use
- Avoid re-downloading on reconnect
- Upload to Google Drive for persistent storage

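**Note:** To load directly from Hugging Face instead (faster first-time setup), leave the clone commands commented out, but still run this cell: it defines `LOCAL_MODEL_PATH`, which Step 4 needs.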

In [None]:
# Optional: Clone the model from Hugging Face
# Uncomment the following lines if you want to clone the model locally

# !git lfs install
# !git clone https://huggingface.co/ai4bharat/indic-parler-tts ./indic-parler-tts

# If you cloned the model, set LOCAL_MODEL_PATH to the cloned directory
# LOCAL_MODEL_PATH = "./indic-parler-tts"

# Otherwise, load directly from Hugging Face (recommended for Colab)
LOCAL_MODEL_PATH = None  # Set to "./indic-parler-tts" if you cloned locally

print("✅ Model path configured!")
if LOCAL_MODEL_PATH:
    print(f"   Using local model: {LOCAL_MODEL_PATH}")
else:
    print("   Will load directly from Hugging Face")
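
A mistyped `LOCAL_MODEL_PATH`, or a clone that never ran, only surfaces later as a loading error in Step 4. The sketch below is a hypothetical helper (`resolve_model_path` is not part of any library) that falls back to the Hugging Face ID when the local folder does not contain a `config.json`. It is a cheap heuristic under that assumption, not a full integrity check of the weights.

```python
import os

def resolve_model_path(local_path, hub_id="ai4bharat/indic-parler-tts"):
    """Return local_path if it looks like a cloned model folder, else the Hub ID.

    Hypothetical helper: a folder counts as usable when it contains config.json.
    The weight files themselves are not checked.
    """
    if local_path and os.path.isfile(os.path.join(local_path, "config.json")):
        return local_path
    # Missing, empty, or incomplete folder: load from Hugging Face instead
    return hub_id
```

With this helper, Step 4 could set `MODEL_PATH = resolve_model_path(LOCAL_MODEL_PATH)` instead of using the conditional expression.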

## Step 4: Load the TTS Model

This downloads the Indic Parler-TTS model from Hugging Face (or loads it from the local path if you cloned it in Step 3).

In [None]:
import numpy as np
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import os

# Model configuration
MODEL_NAME = "ai4bharat/indic-parler-tts"  # AI4Bharat's Indic Parler TTS (supports Indian languages)

# Use local path if set, otherwise load from HF
MODEL_PATH = LOCAL_MODEL_PATH if LOCAL_MODEL_PATH else MODEL_NAME

# Performance optimizations (interop threads can only be set once per process,
# so re-running this cell would otherwise raise a RuntimeError)
torch.set_num_threads(os.cpu_count())
try:
    torch.set_num_interop_threads(os.cpu_count())
except RuntimeError:
    pass

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"Loading model from: {MODEL_PATH}")
print(f"Using device: {device}")

# Load model (FP16 on GPU, FP32 on CPU)
model = ParlerTTSForConditionalGeneration.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    low_cpu_mem_usage=True,
).to(device)

# Enable backend optimizations
if torch.cuda.is_available():
    torch.backends.cudnn.benchmark = True
    torch.backends.cuda.matmul.allow_tf32 = True  # affects only FP32 matmuls on Ampere or newer GPUs
    print("✅ GPU optimizations enabled: FP16 precision, cuDNN auto-tuning")
else:
    torch.set_float32_matmul_precision('high')
    print("✅ CPU optimizations enabled")

model.eval()

# Load the tokenizer from the same location as the model (local clone or Hub ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
sample_rate = model.config.sampling_rate

print("✅ Model loaded successfully!")
print(f"   Sample rate: {sample_rate} Hz")

## Step 5: Test the Model (Optional)

Let's test the model with a simple text-to-speech conversion. Try with both English and Indian languages!

In [None]:
from IPython.display import Audio

# Test synthesis
test_text = "Hello! This is a test of the text to speech system."
description = "A clear, natural voice with moderate pace and good pronunciation."

print(f"Generating audio for: '{test_text}'")

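# Indic Parler-TTS tokenizes the voice description with its text encoder's tokenizer
# and the spoken text with the model's own tokenizer (as in the model card)
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

with torch.inference_mode():
    description_inputs = description_tokenizer(description, return_tensors="pt").to(device)
    prompt_inputs = tokenizer(test_text, return_tensors="pt").to(device)

    generation = model.generate(
        input_ids=description_inputs.input_ids,
        attention_mask=description_inputs.attention_mask,
        prompt_input_ids=prompt_inputs.input_ids,
        prompt_attention_mask=prompt_inputs.attention_mask,
        do_sample=True,
        temperature=1.0,
        use_cache=True,
    )

    # Cast FP16 output to FP32 before converting to NumPy
    audio_arr = generation.cpu().float().numpy().squeeze()

print("✅ Audio generated!")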
display(Audio(audio_arr, rate=sample_rate))

## Step 6: Create Flask Server

Now we'll create the Flask server with the TTS endpoint.

In [None]:
from flask import Flask, request, jsonify, send_file
from io import BytesIO

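# Indic Parler-TTS tokenizes the voice description with its text encoder's
# tokenizer. Loaded here so the server works even if Step 5 was skipped.
description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)

app = Flask(__name__)

@app.route('/synthesize', methods=['POST'])
def synthesize():
    # silent=True returns None (instead of raising) for a missing or non-JSON body
    data = request.get_json(silent=True) or {}
    text = data.get('text')
    
    if not text:
        return jsonify({"error": "No text provided"}), 400

    print(f"Synthesizing: {text}")

    try:
        with torch.inference_mode():
            # The text to speak and a natural-language description of the voice
            prompt = text
            description = data.get('description', "A clear, natural voice with moderate pace.")

            # Tokenize inputs (description and prompt use different tokenizers)
            description_inputs = description_tokenizer(
                description, return_tensors="pt"
            ).to(device)
            prompt_inputs = tokenizer(
                prompt, return_tensors="pt"
            ).to(device)

            # Generate audio
            generation = model.generate(
                input_ids=description_inputs.input_ids,
                attention_mask=description_inputs.attention_mask,
                prompt_input_ids=prompt_inputs.input_ids,
                prompt_attention_mask=prompt_inputs.attention_mask,
                do_sample=True,
                temperature=1.0,
                use_cache=True,
            )
            
            # Cast FP16 output to FP32 before converting to NumPy
            audio_arr = generation.cpu().float().numpy().squeeze()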
        
        # Normalize to int16 PCM
        if np.max(np.abs(audio_arr)) == 0:
            scaled_audio = np.zeros_like(audio_arr, dtype=np.int16)
        else:
            scaled_audio = (
                (audio_arr / np.max(np.abs(audio_arr))) * 32767
            ).astype(np.int16)

        pcm_data = scaled_audio.tobytes()
        
        return send_file(
            BytesIO(pcm_data),
            mimetype='application/octet-stream'
        )

    except Exception as e:
        print(f"Error during synthesis: {e}")
        return jsonify({"error": str(e)}), 500

@app.route('/info', methods=['GET'])
def info():
    return jsonify({"sample_rate": sample_rate, "channels": 1, "model": MODEL_NAME})

@app.route('/health', methods=['GET'])
def health():
    return jsonify({"status": "healthy", "device": device})

print("✅ Flask server configured!")
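
The `/synthesize` route returns raw, headerless 16-bit PCM. Pulling that conversion into small pure functions makes it easy to check without loading the model. The sketch below uses hypothetical names (`float_to_pcm16`, `pcm16_to_float`), applies the same peak scaling as the route (the loudest sample maps to ±32767 and silence stays silent), and pins the byte order to little-endian with `"<i2"`. That is an assumption, chosen because it matches what NumPy's native `int16` produces on Colab's x86 machines.

```python
import numpy as np

def float_to_pcm16(audio):
    """Peak-normalize a float waveform and return little-endian int16 PCM bytes."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Silent (or empty) input: avoid dividing by zero
        return np.zeros(audio.shape, dtype="<i2").tobytes()
    # Same scaling as the route: loudest sample -> +/-32767, truncated to int16
    scaled = (audio / peak) * 32767
    return scaled.astype("<i2").tobytes()

def pcm16_to_float(data):
    """Client-side inverse, as in Step 8: int16 bytes -> float32 in [-1, 1]."""
    return np.frombuffer(data, dtype="<i2").astype(np.float32) / 32767.0
```

Because the scaling is per request, every response is normalized to full scale; quiet and loud generations come back at the same peak level.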

## Step 7: Set Up ngrok for Public Access

ngrok creates a public URL that tunnels to your Colab instance.
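
Hard-coding the auth token in a shared notebook leaks it. A minimal sketch, assuming you stored the token in Colab's Secrets panel under the name `NGROK_AUTHTOKEN` (the name is an assumption; use whatever you chose): `google.colab.userdata.get` reads it inside Colab, and the hypothetical helper `get_ngrok_token` falls back to an environment variable of the same name elsewhere.

```python
import os

def get_ngrok_token(env_var="NGROK_AUTHTOKEN"):
    """Return an ngrok auth token from Colab Secrets or the environment (or None)."""
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get(env_var)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook
        pass
    return os.environ.get(env_var)
```

`ngrok.set_auth_token(get_ngrok_token())` could then replace the literal token string in the next cell.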

In [None]:
from pyngrok import ngrok
import threading

# Required: ngrok only opens tunnels for accounts with a verified auth token
# Get a free token from: https://dashboard.ngrok.com/get-started/your-authtoken
ngrok.set_auth_token("YOUR_NGROK_AUTH_TOKEN")

# Start Flask in a background thread
def run_flask():
    app.run(port=5001, threaded=True)

flask_thread = threading.Thread(target=run_flask, daemon=True)
flask_thread.start()

# Create ngrok tunnel; connect() returns an NgrokTunnel object, so keep only its URL string
public_url = ngrok.connect(5001).public_url

print("\n" + "="*60)
print("üöÄ TTS SERVER IS RUNNING!")
print("="*60)
print(f"\nüì° Public URL: {public_url}")
print(f"\nEndpoints:")
print(f"  ‚Ä¢ Health check: {public_url}/health")
print(f"  ‚Ä¢ Server info:  {public_url}/info")
print(f"  ‚Ä¢ Synthesize:   {public_url}/synthesize (POST)")
print("\n" + "="*60)
print("\nüí° Example cURL command:")
print(f'''curl -X POST {public_url}/synthesize \\
  -H "Content-Type: application/json" \\
  -d '{{
    "text": "Hello, this is a test.",
    "description": "A clear, natural voice."
  }}' \\
  --output audio.raw''')
print("\n" + "="*60)
print("\n‚ö†Ô∏è Keep this cell running to maintain the server!")
print("   Press the stop button to shut down the server.\n")
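
`app.run(port=5001, threaded=True)` lets Flask handle each request in its own thread, so two clients can call `model.generate` on the same GPU at the same time, which can exhaust a small Colab GPU's memory. One hedged option is to serialize generation with a `threading.Lock`. `SerializedCall` below is a hypothetical wrapper, not part of Flask or PyTorch; the trade-off is that concurrent requests queue up instead of running in parallel.

```python
import threading

class SerializedCall:
    """Wrap a callable so that only one thread executes it at a time."""

    def __init__(self, fn):
        self._fn = fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other request threads block here until the current call finishes
        with self._lock:
            return self._fn(*args, **kwargs)
```

In the server you would create `generate = SerializedCall(model.generate)` once, before starting Flask, and call `generate(...)` inside `/synthesize` in place of `model.generate(...)`.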

## Step 8: Test the API (Optional)

Test your deployed server directly from the notebook. Try both English and Indian languages!

In [None]:
import requests
from IPython.display import Audio
import numpy as np

# Test the server with English
print("Testing with English text...")
test_url = f"{public_url}/synthesize"
test_payload = {
    "text": "Welcome to the TTS server running on Google Colab!",
    "description": "An enthusiastic, clear voice with moderate pace."
}

print(f"Testing server at: {test_url}")
# Generation can take a while, so allow a generous timeout
response = requests.post(test_url, json=test_payload, timeout=300)

if response.status_code == 200:
    # Convert raw PCM bytes to numpy array
    audio_data = np.frombuffer(response.content, dtype=np.int16)
    # Convert int16 to float for playback
    audio_float = audio_data.astype(np.float32) / 32767.0
    
    print("✅ Audio received successfully!")
    display(Audio(audio_float, rate=sample_rate))
else:
    print(f"❌ Error: {response.status_code}")
    print(response.text)

# Test with Hindi (optional)
print("\nTesting with Hindi text...")
hindi_payload = {
    "text": "नमस्ते, यह एक परीक्षण है।",  # "Hello, this is a test."
    "description": "A clear Hindi voice with moderate pace."
}

response_hindi = requests.post(test_url, json=hindi_payload, timeout=300)
if response_hindi.status_code == 200:
    audio_data_hindi = np.frombuffer(response_hindi.content, dtype=np.int16)
    audio_float_hindi = audio_data_hindi.astype(np.float32) / 32767.0
    print("✅ Hindi audio received successfully!")
    display(Audio(audio_float_hindi, rate=sample_rate))
else:
    print(f"❌ Error: {response_hindi.status_code}")
    print(response_hindi.text)
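
The cURL example in Step 7 saves `audio.raw`, which most audio players cannot open because it has no header. Python's standard `wave` module can wrap the bytes in a WAV container. The sketch assumes the server's output format (mono, 16-bit samples) and takes the sample rate as a parameter, which you can read from the `/info` endpoint; `pcm16_to_wav_bytes` is a hypothetical name.

```python
import io
import wave

def pcm16_to_wav_bytes(pcm_data, sample_rate, channels=1):
    """Wrap headerless 16-bit PCM (as returned by /synthesize) in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = int16
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_data)
    # wave does not close a file object it did not open, so the buffer is still readable
    return buffer.getvalue()
```

For example, `open("audio.wav", "wb").write(pcm16_to_wav_bytes(response.content, sample_rate))` saves the English test response from this step as a playable file.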

## Stop the Server

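Run this cell to close the ngrok tunnel and stop the ngrok process. The Flask thread keeps running inside the runtime until you restart it, but it is no longer reachable from outside.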

In [None]:
# Close ngrok tunnels
ngrok.kill()
print("✅ ngrok tunnel closed. Restart the runtime to stop the Flask thread as well.")