# Conversation -> JSON -> Audio

This notebook generates multi-turn conversations using an LLM, saves them as JSON, and generates audio files from those conversations using Azure Text-to-Speech.

What this notebook does:

- Generates conversation JSON for several topics using configured prompts.
- Saves generated conversation JSON into `generated_data/`.
- Converts conversation JSON into SSML and uses Azure Cognitive Services Text-to-Speech to create WAV audio files.
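
The audio cell later in this notebook iterates over a top-level `conversation` list whose entries map a speaker name to the spoken text, and it skips any `order` key. A minimal sketch of that shape, assuming this inferred structure; the sample dialogue and the `iter_turns` helper are hypothetical and are not output from the configured prompts:

```python
# Hypothetical sample in the shape the audio cell appears to expect
# (inferred from its loop, not real model output).
sample_conversation = {
    "conversation": [
        {"order": 1, "clinician": "How have you been sleeping lately?"},
        {"order": 2, "patient": "Not well. I wake up several times a night."},
    ]
}

def iter_turns(conv_json: dict):
    """Yield (speaker, text) pairs, skipping the bookkeeping `order` key."""
    for entry in conv_json.get("conversation", []):
        for role, text in entry.items():
            if role == "order":
                continue
            yield role, text

turns = list(iter_turns(sample_conversation))
for speaker, text in turns:
    print(f"{speaker}: {text}")
```

If your prompts produce a different structure, the audio cell's loop would need to change accordingly.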

Prerequisites (set these in your environment or a `.env` file):

- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_API_VERSION (if required)
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_DEPLOYMENT (optional; the code falls back to `gpt-4o-mini`)
- AZURE_SPEECH_KEY
- AZURE_SPEECH_REGION
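
Before running the cells, it can help to confirm that the variables are actually visible to Python. A minimal sketch, assuming the names listed above; `missing_env_vars` is a hypothetical helper, not part of the notebook:

```python
import os

# Names taken from the prerequisites list. AZURE_OPENAI_API_VERSION is included
# because the client cell passes it to AzureOpenAI; drop it if your setup
# supplies the API version some other way (assumption).
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

If the values live in a `.env` file, call `load_dotenv()` before running this check so they are loaded into the environment first.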

Quick run (from repo root):

1. Ensure environment variables are set (or create a `.env` file).
2. In a Python environment with dependencies installed, run the cells top-to-bottom.

Helpful installs (run in your terminal if needed):
```bash
python -m pip install python-dotenv requests openai
```

Notes:

- Run cells in order. If any API calls fail, check your keys and quota.
- The final cell writes audio files to `./data/en`. Calling `generate_audio_from_conversation` with `use_non_en_voices=True` writes to `./data/non_en` instead.

## Azure OpenAI chat completion


In [None]:
from dotenv import load_dotenv
import os
# AzureOpenAI (from the `openai` package) is used to call the chat completions endpoint.
# Install the package with the pip command in the introduction if it is missing.
from openai import AzureOpenAI

load_dotenv()  # Load environment variables from .env if present

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

deployment = os.getenv('AZURE_OPENAI_DEPLOYMENT', 'gpt-4o-mini')  # Use deployment from env or fallback

def get_openai_response(prompt: str) -> str:
    """Call Azure OpenAI chat completion and return assistant text content.
    Args:
        prompt (str): The prompt to send as a user message.
    Returns:
        str: The assistant response content.
    """

    response = client.chat.completions.create(
        model=deployment,
        temperature=0.9,
        messages=[
            {"role": "system", "content": "Assistant is a large language model trained by OpenAI."},
            {"role": "user", "content": prompt}
        ]
    )

    # `message.content` is optional in the client's response type;
    # return an empty string instead of None so callers always get a str.
    return response.choices[0].message.content or ""

## Generate Conversation

In [None]:
from prompts.geriatric_health import CONVERSATION_PROMPT as geriatric_health_prompt
from prompts.golf_coaching import CONVERSATION_PROMPT as golf_coaching_prompt
from prompts.mental_health import CONVERSATION_PROMPT as mental_health_prompt
from prompts.physical_health import CONVERSATION_PROMPT as physical_health_prompt
import json
import os

def generate_conversation(prompt: str) -> str:
    """Generate a conversation string from LLM using the provided prompt.
    This function wraps the Azure OpenAI call and returns raw string output from the model.
    """
    print("Requesting conversation from model...")
    return get_openai_response(prompt)

os.makedirs("generated_data", exist_ok=True)  # ensure output folder exists

prompts = [
    ("mental_health", mental_health_prompt),
    ("physical_health", physical_health_prompt),
    ("geriatric_health", geriatric_health_prompt),
    ("golf_coaching", golf_coaching_prompt)
]

for name, prompt in prompts:
    print(f"Generating conversation for: {name}")
    conversation = generate_conversation(prompt)
    # Models often wrap JSON in a Markdown code fence; remove it if present.
    # (str.lstrip/rstrip strip a set of characters, not a prefix, so slice instead.)
    conversation = conversation.strip()
    if conversation.startswith("```"):
        conversation = conversation[3:]
        if conversation[:4].lower() == "json":
            conversation = conversation[4:]
        conversation = conversation.strip("`\n ")
    try:
        conv_json = json.loads(conversation)
    except json.JSONDecodeError as e:
        print(f"Failed to parse JSON response for {name}: {e}. Trying fallback replacements...")
        # Crude fallback: flatten newlines and swap single quotes for double quotes.
        # This also rewrites apostrophes inside text, so it can still fail.
        conv_json = json.loads(conversation.replace("\n", " ").replace("'", '"'))
    # Save output to file for later audio generation
    with open(f"generated_data/{name}.json", "w", encoding="utf-8") as f:
        json.dump(conv_json, f, ensure_ascii=False, indent=2)
    print(f"Saved generated conversation to generated_data/{name}.json")

In [None]:
import json
from dotenv import load_dotenv
import os
load_dotenv()

def load_json_from_file(file_path: str) -> dict:
    """Loads a JSON object from a file and returns it.
    Args:
        file_path (str): Path to the JSON file.
    Returns:
        dict: Parsed JSON content.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"JSON file not found: {file_path}")
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)

def convert_conv_to_json(conv: str) -> dict:
    """Convert a model output string that contains JSON into a dict.
    Handles code fences and minor formatting errors.
    Args:
        conv (str): Raw model output string.
    Returns:
        dict: Parsed JSON object.
    """
    # Remove a Markdown code fence (``` or ```json) if the model added one
    conv_cleaned = conv.strip()
    if conv_cleaned.startswith('```'):
        conv_cleaned = conv_cleaned[3:]
        # Drop the language tag too; otherwise "json" stays in front of the payload
        if conv_cleaned[:4].lower() == 'json':
            conv_cleaned = conv_cleaned[4:]
    conv_cleaned = conv_cleaned.strip('`\n ')
    # Try to parse normally, then try a crude repair
    try:
        return json.loads(conv_cleaned)
    except json.JSONDecodeError:
        # Fallback: flatten newlines and swap single quotes for double quotes.
        # This also rewrites apostrophes inside text, so it can still fail.
        safe = conv_cleaned.replace("\n", " ").replace("'", '"')
        return json.loads(safe)

In [None]:
import requests
import json
import os
import shutil
from dotenv import load_dotenv
import random
# Concatenate audio files using wave module
import wave

load_dotenv()

# Azure Text to Speech API details - ensure these are set in your environment
subscription_key = os.getenv("AZURE_SPEECH_KEY")
region = os.getenv("AZURE_SPEECH_REGION")

# Candidate voices used for TTS - change or extend as required
en_voices = [
    "en-US-JennyNeural", "en-US-GuyNeural", "en-US-AriaNeural", "en-US-DavisNeural",
    "en-GB-LibbyNeural", "en-GB-RyanNeural", "en-AU-NatashaNeural", "en-AU-WilliamNeural",
    "en-ZA-LeahNeural", "en-ZA-LukeNeural",
]

non_en_voices = [
    "es-ES-ElviraNeural", "es-MX-DaliaNeural", "zh-CN-XiaoxiaoNeural", "hi-IN-SwaraNeural",
    "pt-BR-FranciscaNeural", "de-DE-KatjaNeural", "es-ES-AlvaroNeural", "es-MX-JorgeNeural",
    "zh-CN-YunxiNeural", "hi-IN-MadhurNeural", "pt-BR-AntonioNeural", "de-DE-ConradNeural",
]

def generate_audio_from_conversation(conv_json: dict, output_name: str, agent1_name: str, agent2_name: str, use_non_en_voices: bool=False) -> None:
    """Generate audio for a conversation JSON and write concatenated WAV file.
    Args:
        conv_json (dict): Conversation JSON with `conversation` list of {role: text} entries.
        output_name (str): Base output name for produced audio file(s).
        agent1_name (str): Name of participant 1 as present in conversation JSON.
        agent2_name (str): Name of participant 2 as present in conversation JSON.
        use_non_en_voices (bool): If True, add non-English voices to the candidate pool
            and write output to ./data/non_en instead of ./data/en.
    """

    endpoint = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    # Select voice pools based on language preference
    if use_non_en_voices:
        folder = './data/non_en'
        voices = non_en_voices + en_voices
    else:
        folder = './data/en'
        voices = en_voices
    # Ensure directories exist
    os.makedirs(folder, exist_ok=True)
    os.makedirs('./.temp', exist_ok=True)

    # Pick two distinct voices so the two speakers never share a voice
    voice1, voice2 = random.sample(voices, 2)
    agent1 = {
        "name": agent1_name,
        "voice": voice1
    }
    agent2 = {
        "name": agent2_name,
        "voice": voice2
    }
    print(f"Using voices: {agent1['voice']} (for {agent1_name}), {agent2['voice']} (for {agent2_name})")

    # Build SSML for each message with slight variation in speed for naturalness
    from xml.sax.saxutils import escape  # escapes &, <, > so message text cannot break the SSML

    ssml_parts = []
    for line in conv_json.get('conversation', []):
        for role, text in line.items():
            if role == "order":
                continue
            voice = agent1['voice'] if role == agent1['name'] else agent2['voice']
            speed = round(random.uniform(1.1, 1.4), 2)
            ssml = f"""
            <speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
                <voice name='{voice}'>
                    <prosody rate='{speed}'>{escape(str(text))}</prosody>
                </voice>
            </speak>
            """
            ssml_parts.append(ssml)

    # Call Azure TTS for each SSML segment and save to temporary wav files
    audio_files = []
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/ssml+xml',
        'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm'
    }

    for i, ssml in enumerate(ssml_parts):
        # A timeout keeps the notebook from hanging indefinitely on a stalled request
        response = requests.post(endpoint, headers=headers, data=ssml.encode('utf-8'), timeout=60)
        if response.status_code == 200:
            temp_audio_path = f'./.temp/part_{i}.wav'
            with open(temp_audio_path, 'wb') as audio_file:
                audio_file.write(response.content)
            audio_files.append(temp_audio_path)
        else:
            print(f"Error generating audio for part {i}: {response.status_code} - {response.text}")

    if not audio_files:
        print("No audio files were generated; aborting concatenation.")
        return

    # Create final filename using voices for traceability
    file_name = f'{output_name}_{agent1["voice"][:5]}_{agent2["voice"][:5]}'
    final_audio_path = f'{folder}/{file_name}.wav'

    # Concatenate WAV files into a single file
    with wave.open(final_audio_path, 'wb') as final_audio:
        with wave.open(audio_files[0], 'rb') as part:
            final_audio.setparams(part.getparams())
            final_audio.writeframes(part.readframes(part.getnframes()))
        for audio_file in audio_files[1:]:
            with wave.open(audio_file, 'rb') as part:
                final_audio.writeframes(part.readframes(part.getnframes()))

    print(f"Saved final audio to {final_audio_path}")
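
Concatenating by copying the first part's parameters only yields valid audio when every part shares the same channel count, sample width, and frame rate. A minimal sketch of a guarded variant that uses only the standard `wave` module; `concat_wavs` and `write_silence` are hypothetical helpers for illustration, not part of the notebook:

```python
import os
import tempfile
import wave

def concat_wavs(part_paths, out_path):
    """Concatenate WAV files, refusing parts whose format differs from the first."""
    if not part_paths:
        raise ValueError("no input files to concatenate")
    with wave.open(out_path, "wb") as out:
        expected = None
        for path in part_paths:
            with wave.open(path, "rb") as part:
                fmt = (part.getnchannels(), part.getsampwidth(), part.getframerate())
                if expected is None:
                    # The first part defines the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has format {fmt}, expected {expected}")
                out.writeframes(part.readframes(part.getnframes()))

def write_silence(path, n_frames, rate=24000):
    """Write a mono 16-bit WAV of silence (a fixture for this sketch)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.wav")
    b = os.path.join(tmp, "b.wav")
    merged = os.path.join(tmp, "merged.wav")
    write_silence(a, 100)
    write_silence(b, 250)
    concat_wavs([a, b], merged)
    with wave.open(merged, "rb") as w:
        total_frames = w.getnframes()

print(total_frames)  # 100 + 250 frames
```

Since every request in this notebook asks for the same output format, a mismatch should be rare; the check mainly protects against mixing in files from another source.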

In [None]:
conversation_files = [
    ("generated_data/mental_health.json", "mental_health", "clinician", "patient"),
    ("generated_data/physical_health.json", "physical_health", "clinician", "patient"),
    ("generated_data/geriatric_health.json", "geriatric_health", "clinician", "patient"),
    ("generated_data/golf_coaching.json", "golf_coaching", "coach", "athlete"),
]

for file_path, output_name, agent1, agent2 in conversation_files:
    conv_json = load_json_from_file(file_path)
    generate_audio_from_conversation(conv_json, output_name, agent1, agent2)




## Run the full notebook

> After setting environment variables, run all code cells from top to bottom.

> If you only want to generate audio for existing JSON files, run the helper cell that defines `load_json_from_file`, the `generate_audio_from_conversation` cell, and the final loop cell.

Common errors and fixes:

- API key errors: check `.env` and environment variables are set correctly.
- Parsing errors: model returned non-JSON; inspect the raw string (print it) and fix the prompt or parsing logic.
- TTS failures: verify `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` and that your subscription supports the requested voice.
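
For the parsing errors listed above, `json.JSONDecodeError` carries the position of the failure, which makes it easier to see what the model returned around the offending character. A small sketch; `explain_json_error` is a hypothetical helper, not part of the notebook:

```python
import json

def explain_json_error(raw: str, context: int = 30):
    """Return None if `raw` parses as JSON, else a message showing where parsing failed."""
    try:
        json.loads(raw)
        return None
    except json.JSONDecodeError as err:
        # Show a window of the raw text around the failing character.
        start = max(err.pos - context, 0)
        snippet = raw[start:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: ...{snippet!r}..."

bad = "{'conversation': []}"  # single-quoted keys are not valid JSON
print(explain_json_error(bad))
print(explain_json_error('{"conversation": []}'))
```

Calling this on the raw model output before the fallback replacements run shows whether the problem is a leftover code fence, single quotes, or truncated output.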