# 🎙️ NotebookLM-Kokoro TTS with GPU Acceleration

This notebook implements the NotebookLM-Kokoro TTS system with GPU acceleration and multiprocessing support. It uses Kokoro for text-to-speech generation and Gradio for the user interface.

## Setup Environment and Dependencies

First, let's install the required packages and configure the GPU environment.

In [None]:
# Install required packages
# Quote the version specifier so the shell does not treat ">" as a redirect
!pip install "kokoro>=0.9.2" gradio soundfile torch PyPDF2 numpy openai

# Verify GPU availability 
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

In [None]:
# Import required libraries
import os
import tempfile
import gradio as gr
import soundfile as sf
import numpy as np
import ast
import shutil
import warnings
import concurrent.futures
import multiprocessing
import torch
from kokoro import KPipeline
warnings.filterwarnings("ignore")

# Configure device
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
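# One worker per CPU core. Each worker loads its own copy of the model,
# so lower this value if GPU memory is limited.
NUM_WORKERS = multiprocessing.cpu_count()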
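Since every worker process builds its own pipeline, one worker per CPU core can be too many when the model lives on a single GPU. The following is a minimal sketch of a more conservative choice. The helper name `choose_num_workers` and the default cap of 2 GPU workers are assumptions made for illustration, not values taken from Kokoro or from this notebook.

```python
import os
from typing import Optional


def choose_num_workers(device_type: str,
                       cpu_count: Optional[int] = None,
                       gpu_cap: int = 2) -> int:
    """Pick a worker count: all cores on CPU, a small cap on GPU (hypothetical heuristic)."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    cores = max(1, cores)
    if device_type == "cuda":
        # Each process keeps its own model copy in GPU memory, so stay small.
        return max(1, min(cores, gpu_cap))
    return cores
```

With this helper, `NUM_WORKERS = choose_num_workers(DEVICE.type)` would replace the plain `cpu_count()` call. The right cap depends on how much memory the GPU has, so treat 2 as a starting point.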

## Define Helper Functions and System Prompts

Set up the core system prompts and helper functions for transcript generation.

In [None]:
TRANSCRIPT_WRITER_SYSTEM_PROMPT = """
You are a world-class storyteller and you have worked as a ghost writer.
Welcome the listeners by talking about the Chapter Title.
You will be talking to a guest.

Do not address the other speaker as Speaker 1 or Speaker 2.

Instructions for Speaker 1:
Speaker 1: Leads the conversation and teaches the guest, giving incredible anecdotes and analogies when explaining.
Speaker 1: Do not address the guest as Speaker 2.

Instructions for Speaker 2:
Speaker 2: Keeps the conversation on track by asking follow up questions. Gets super excited or confused when asking questions.
Speaker 2: Do not address the other speaker as Speaker 1.
"""

TRANSCRIPT_REWRITER_SYSTEM_PROMPT = """
You are an international Oscar-winning screenwriter.
Your job is to use the transcript written below to re-write it for an AI Text-To-Speech Pipeline.

IMPORTANT FORMAT INSTRUCTIONS:
You must return a JSON array of arrays, where each inner array contains exactly two strings:
1. The speaker label (either "Speaker 1" or "Speaker 2")
2. The dialogue text

YOUR RESPONSE MUST BE VALID JSON.
"""

## Implement GPU-Accelerated TTS Pipeline

Create the Kokoro TTS pipeline with GPU support for faster processing.

In [None]:
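def create_gpu_pipeline(lang_code="a", repo_id="hexgrad/Kokoro-82M"):
    """Initialize a Kokoro pipeline on the GPU when one is available"""
    # KPipeline takes its target device as a constructor argument
    return KPipeline(lang_code=lang_code, repo_id=repo_id, device=DEVICE.type)

def process_segment_gpu(entry_and_voice_map):
    """Synthesize one (speaker, dialogue) entry and return its audio as a NumPy array"""
    entry, voice_map = entry_and_voice_map
    speaker, dialogue = entry
    chosen_voice = voice_map.get(speaker, "af_heart")  # default voice for unknown speakers

    # Note: this builds a new pipeline (and reloads the model) for every segment
    pipeline = create_gpu_pipeline()
    generator = pipeline(dialogue, voice=chosen_voice)

    segment_audio = []
    # Mixed precision only applies on CUDA. The generator is lazy, so inference runs inside this block.
    with torch.autocast(device_type=DEVICE.type, enabled=DEVICE.type == "cuda"):
        for _, _, audio in generator:
            if audio is None:
                continue
            # Convert torch tensors to NumPy so np.concatenate can join them
            if torch.is_tensor(audio):
                audio = audio.detach().cpu().float().numpy()
            segment_audio.append(audio)

    if segment_audio:
        return np.concatenate(segment_audio, axis=0)
    return None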

## Implement Parallel Audio Generation

Set up multiprocessing for parallel audio generation with GPU support.

In [None]:
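def generate_audio_parallel(script, speaker1_voice, speaker2_voice, output_file):
    """Generate audio using parallel processing with GPU acceleration"""
    import json  # local import: json is not part of the notebook's import cell

    voice_map = {"Speaker 1": speaker1_voice, "Speaker 2": speaker2_voice}

    try:
        # The rewriter prompt asks for JSON; fall back to Python literal syntax
        try:
            transcript_list = json.loads(script.strip())
        except json.JSONDecodeError:
            transcript_list = ast.literal_eval(script.strip())
        entries_with_voice_map = [(entry, voice_map) for entry in transcript_list]

        # executor.map preserves input order, so segments stay in script order.
        # Caveat: CUDA cannot be re-initialized in forked worker processes. If that
        # error appears, run the segments sequentially in this process or use the
        # "spawn" start method.
        with concurrent.futures.ProcessPoolExecutor(max_workers=NUM_WORKERS) as executor:
            results = list(executor.map(process_segment_gpu, entries_with_voice_map))

        all_audio_segments = [r for r in results if r is not None]
        if not all_audio_segments:
            return None

        # Insert a 1-second pause between consecutive segments
        sample_rate = 24000
        pause = np.zeros(sample_rate, dtype=np.float32)
        pieces = []
        for segment in all_audio_segments:
            pieces.extend([segment, pause])
        final_audio = np.concatenate(pieces[:-1])  # drop the trailing pause

        sf.write(output_file, final_audio, sample_rate)
        return output_file

    except Exception as e:
        print(f"Error in audio generation: {e}")
        return None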
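The pause-joining step can be pulled out into a small function that is easy to test without running any TTS. This is a sketch under the assumption that every segment is a 1-D array of samples at the same rate; the name `join_with_pauses` and its `pause_seconds` parameter are hypothetical.

```python
import numpy as np


def join_with_pauses(segments, sample_rate=24000, pause_seconds=1.0):
    """Concatenate 1-D audio arrays, inserting silence between (not after) segments."""
    if not segments:
        return np.zeros(0, dtype=np.float32)
    pause = np.zeros(int(sample_rate * pause_seconds), dtype=np.float32)
    pieces = []
    for index, segment in enumerate(segments):
        if index > 0:
            pieces.append(pause)  # silence goes before every segment except the first
        pieces.append(np.asarray(segment, dtype=np.float32))
    return np.concatenate(pieces)
```

Making the pause length a parameter lets shorter gaps be tried for fast-paced dialogue without touching the generation code.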

## Create Gradio Interface

Build the Gradio web interface with GPU monitoring.

In [None]:
def create_gradio_interface():
    """Create Gradio interface with GPU monitoring"""
    with gr.Blocks(theme=gr.themes.Soft()) as app:
        gr.Markdown("# 🎙️ NotebookLM-Kokoro TTS (GPU-Accelerated)")
        
        with gr.Row():
            with gr.Column():
                pdf_input = gr.File(label="Upload PDF", file_types=[".pdf"])
                speaker1_voice = gr.Dropdown(
                    choices=["af_heart", "af_bella", "hf_beta"],
                    value="af_heart",
                    label="Speaker 1 Voice"
                )
                speaker2_voice = gr.Dropdown(
                    choices=["af_nicole", "bf_emma"],
                    value="af_nicole",
                    label="Speaker 2 Voice"
                )
                
            with gr.Column():
                status = gr.Textbox(label="Status")
                audio_output = gr.Audio(label="Generated Audio")
                
        # Show static GPU details (name and total memory); these values do not update live
        if torch.cuda.is_available():
            gr.Markdown(f"""
            🖥️ GPU: {torch.cuda.get_device_name(0)}
            💾 Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB
            """)
            
    return app

# Launch the interface
if __name__ == "__main__":
    demo = create_gradio_interface()
    demo.launch(share=True)

## Run Demo

Test the complete pipeline with an example PDF.

In [None]:
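# Example usage
# Note: generate_podcast_script is not defined anywhere in this notebook.
# It must be defined or imported before this cell is run.
pdf_path = "example.pdf"
transcript, _ = generate_podcast_script(pdf_path)
audio_file = generate_audio_parallel(transcript, "af_heart", "af_nicole", "output.wav")
print(f"Audio generated: {audio_file}")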
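The demo cell calls `generate_podcast_script`, which this notebook never defines. The sketch below shows one plausible shape for it: a two-pass flow that drafts a transcript and then rewrites it for TTS. Everything here is an assumption: the `(script, error)` return pair is inferred only from the `transcript, _ = ...` unpacking, and `extract_text` and `chat` are hypothetical callables that the caller supplies (for example, a PyPDF2-based reader and a wrapper around a chat-completion API).

```python
from typing import Callable, Optional, Tuple


def generate_podcast_script(
    pdf_path: str,
    extract_text: Callable[[str], str],
    chat: Callable[[str, str], str],
    writer_prompt: str,
    rewriter_prompt: str,
) -> Tuple[Optional[str], Optional[str]]:
    """Draft a transcript from a PDF, then rewrite it for TTS (hypothetical sketch)."""
    try:
        source_text = extract_text(pdf_path)
        if not source_text.strip():
            return None, "No text could be extracted from the PDF."
        # Pass 1: turn the source text into a conversational draft.
        draft = chat(writer_prompt, source_text)
        # Pass 2: rewrite the draft into the [speaker, dialogue] format.
        final_script = chat(rewriter_prompt, draft)
        return final_script, None
    except Exception as exc:
        return None, f"Error generating script: {exc}"
```

In this notebook the two prompts would be `TRANSCRIPT_WRITER_SYSTEM_PROMPT` and `TRANSCRIPT_REWRITER_SYSTEM_PROMPT`. Because the demo cell passes only `pdf_path`, the remaining arguments would need to be bound first, for instance with `functools.partial`.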