# 🎤 CLARISSA Voice Input Showcase

**Talk to Your Reservoir Simulation**

This notebook demonstrates CLARISSA's voice interface: spoken commands are transcribed, parsed into structured intents, and executed against a small synthetic reservoir model.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wolfram-laube/clarissa/blob/main/docs/tutorials/notebooks/16_Voice_Input_Showcase.ipynb)

---

## What You'll Learn

1. **Speech-to-Text** - Convert voice to text with Whisper
2. **Intent Recognition** - Parse commands into structured intents
3. **Command Execution** - Trigger visualizations by voice
4. **Full Pipeline** - End-to-end voice control demo

## 1️⃣ Setup & Installation

In [None]:
# Install required packages
!pip install -q openai anthropic plotly numpy ipywidgets

# For local Whisper (optional - skip if using API)
# !pip install -q faster-whisper

print("✅ Packages installed (OpenAI + Anthropic)")

In [None]:
import os
import json
import base64
from dataclasses import dataclass, field
from typing import Dict, Any, Optional, List
from enum import Enum
import numpy as np
import plotly.graph_objects as go
from IPython.display import display, HTML, Audio
import ipywidgets as widgets

print("✅ Imports ready")

In [None]:
# API Key Setup - Choose your LLM provider
# ═══════════════════════════════════════════════════════════════
# CLARISSA supports both OpenAI and Anthropic (Claude) for intent parsing.
# Whisper (OpenAI) is always used for speech-to-text.
# ═══════════════════════════════════════════════════════════════

# Option 1: Colab secrets (recommended)
try:
    from google.colab import userdata
    
    # Try Anthropic first (CLARISSA's native LLM)
    try:
        os.environ['ANTHROPIC_API_KEY'] = userdata.get('ANTHROPIC_API_KEY')
        print("✅ Anthropic API key loaded (Claude)")
    except Exception:  # secret missing or notebook access not granted
        pass
    
    # Also try OpenAI (for Whisper STT + optional intent parsing)
    try:
        os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
        print("✅ OpenAI API key loaded (Whisper STT)")
    except Exception:  # secret missing or notebook access not granted
        pass
        
except ImportError:
    pass

# Check what's available
anthropic_key = os.getenv('ANTHROPIC_API_KEY')
openai_key = os.getenv('OPENAI_API_KEY')

print()
print("═" * 50)
print("LLM Configuration:")
print("═" * 50)

if anthropic_key:
    print("🟢 Claude (Anthropic): Available - will use for intent parsing")
    LLM_PROVIDER = "anthropic"
elif openai_key:
    print("🟢 GPT-4 (OpenAI): Available - will use for intent parsing")
    LLM_PROVIDER = "openai"
else:
    print("🟡 No LLM API key - using rule-based parsing only")
    print("   (Works for common commands, limited for complex queries)")
    LLM_PROVIDER = "rules"

if openai_key:
    print("🟢 Whisper STT: Available")
else:
    print("🟡 Whisper STT: Not available (no OpenAI key)")
    print("   (Text input mode only)")

print("═" * 50)
print()
print("💡 To add API keys in Colab:")
print("   1. Click 🔑 Secrets (left sidebar)")
print("   2. Add: ANTHROPIC_API_KEY and/or OPENAI_API_KEY")
print("   3. Enable notebook access")
print("   4. Restart runtime")

## 2️⃣ Core Voice Components

These are simplified versions of the full CLARISSA voice module.

In [None]:
# Intent Types
class IntentType(Enum):
    VISUALIZE_PROPERTY = "visualize_property"
    QUERY_VALUE = "query_value"
    NAVIGATE = "navigate"
    HELP = "help"
    CANCEL = "cancel"
    CONFIRM = "confirm"
    UNKNOWN = "unknown"

@dataclass
class Intent:
    """Parsed intent from voice command."""
    type: IntentType
    confidence: float
    slots: Dict[str, Any] = field(default_factory=dict)
    raw_text: str = ""

@dataclass 
class VoiceResponse:
    """Response to voice command."""
    success: bool
    text: str
    intent: Optional[Intent] = None
    visualization: Optional[go.Figure] = None

print("✅ Data classes defined")
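
To make the data model concrete, here is a minimal sketch of what a parsed command might look like. It redefines a trimmed-down `Intent` so that it runs on its own; `intent_to_dict` is a hypothetical helper (not part of CLARISSA) that flattens an intent into JSON-friendly values for logging.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Any, Dict


class IntentType(Enum):
    VISUALIZE_PROPERTY = "visualize_property"
    UNKNOWN = "unknown"


@dataclass
class Intent:
    type: IntentType
    confidence: float
    slots: Dict[str, Any] = field(default_factory=dict)
    raw_text: str = ""


def intent_to_dict(intent: Intent) -> Dict[str, Any]:
    """Flatten an Intent into JSON-friendly primitives (hypothetical helper)."""
    data = asdict(intent)
    data["type"] = intent.type.value  # Enum members are not JSON-serializable
    return data


# What "show porosity at layer 2" might look like once parsed
intent = Intent(
    type=IntentType.VISUALIZE_PROPERTY,
    confidence=0.95,
    slots={"property": "porosity", "layer": 2},
    raw_text="show porosity at layer 2",
)

record = intent_to_dict(intent)
print(json.dumps(record, indent=2))
```

Using `field(default_factory=dict)` gives every intent its own `slots` dictionary, so filling slots on one intent never leaks into another.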

In [None]:
# Domain vocabulary for better recognition
DOMAIN_VOCABULARY = """
Reservoir simulation terms: permeability, porosity, water saturation, 
oil saturation, pressure, BHP, bottomhole pressure, OOIP,
waterflood, injector, producer, PROD1, INJ1, INJ2, INJ3, INJ4,
millidarcy, mD, psi, bbl/day, STB, FOPT, FOPR, FWPT, FWPR, FWCT,
water cut, layer, grid, cell, timestep, 3D, cross-section, animation
"""

print("📚 Domain vocabulary loaded")

### 2.1 Speech-to-Text (Whisper)

In [None]:
def transcribe_audio(audio_path: str) -> str:
    """
    Transcribe audio file using OpenAI Whisper API.
    
    Args:
        audio_path: Path to audio file (WAV, MP3, etc.)
        
    Returns:
        Transcribed text
    """
    import openai
    
    client = openai.OpenAI()
    
    with open(audio_path, 'rb') as audio_file:
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=DOMAIN_VOCABULARY,
            language="en"
        )
    
    return response.text

print("🎤 Whisper transcription function ready")
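
For the optional local path mentioned in the setup cell, transcription could be sketched with `faster-whisper` as below. This is an assumption-laden sketch rather than CLARISSA's implementation: the function name `transcribe_audio_local`, the `"base"` model size, and the CPU/int8 settings are illustrative choices, and the `model` parameter exists only so that a preloaded (or fake) model can be passed in.

```python
from typing import Any, Optional


def transcribe_audio_local(audio_path: str,
                           model: Optional[Any] = None,
                           prompt: str = "") -> str:
    """Transcribe an audio file locally with faster-whisper (hypothetical helper)."""
    if model is None:
        # Lazy import so the notebook still runs without faster-whisper installed
        from faster_whisper import WhisperModel
        # "base" on CPU with int8 weights is an assumed lightweight default
        model = WhisperModel("base", device="cpu", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus an info object
    segments, _info = model.transcribe(
        audio_path,
        language="en",
        initial_prompt=prompt or None,  # plays the role of the API's `prompt`
    )
    # Consuming the generator runs the actual decoding
    return " ".join(segment.text.strip() for segment in segments)
```

Passing `DOMAIN_VOCABULARY` as `prompt` would bias the local model toward reservoir terms in the same way as the API call above. Loading the model once and reusing it via `model=` avoids paying the load cost on every command.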

### 2.2 Intent Parser (LLM-based)

In [None]:
INTENT_PROMPT = """You are a reservoir simulation assistant. Parse the user's voice command into a structured intent.

Available intents:
- visualize_property: Show reservoir properties (permeability, porosity, saturation, pressure)
- query_value: Ask about simulation values (rates, pressures, water cut, cumulative production)
- navigate: Go to different sections (results, sensitivity, model)
- help: Ask for help or guidance
- cancel: Stop or cancel current action
- confirm: Confirm a pending action

Slots to extract:
- property: permeability, porosity, water_saturation, oil_saturation, pressure
- layer: integer (1-5 typically)
- time_days: integer (simulation day)
- view_type: 3d, cross_section_xy, cross_section_xz, animation
- well: PROD1, INJ1, INJ2, INJ3, INJ4
- target: results, sensitivity, model, export

User said: "{text}"

Respond with ONLY valid JSON (no markdown, no explanation):
{{"intent": "<type>", "confidence": <0.0-1.0>, "slots": {{...}}}}"""

import re

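def parse_intent_rules(text: str) -> Optional[Intent]:
    """Rule-based intent parsing - works WITHOUT API key. Returns None if no rule matches."""
    # Whisper transcripts usually end with punctuation ("Stop."), so strip it before matching
    text_lower = text.lower().strip().rstrip(".!?")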
    slots = {}
    
    # Cancel / Confirm / Help
    if text_lower in ["stop", "cancel", "never mind", "abort", "quit"]:
        return Intent(IntentType.CANCEL, 1.0, {}, text)
    if text_lower in ["yes", "yeah", "yep", "confirm", "ok", "okay", "do it", "go ahead"]:
        return Intent(IntentType.CONFIRM, 1.0, {}, text)
    if text_lower == "help" or "what can" in text_lower:
        return Intent(IntentType.HELP, 1.0, {}, text)
    
    # Visualization
    viz_keywords = ["show", "display", "visualize", "plot", "view", "see"]
    if any(kw in text_lower for kw in viz_keywords):
        if "perm" in text_lower: slots["property"] = "permeability"
        elif "poro" in text_lower: slots["property"] = "porosity"
        elif "saturation" in text_lower or " sw" in text_lower: slots["property"] = "water_saturation"
        elif "pressure" in text_lower: slots["property"] = "pressure"
        
        layer_match = re.search(r'layer\s*(\d+)', text_lower)
        if layer_match: slots["layer"] = int(layer_match.group(1))
        
        time_match = re.search(r'(?:day|time)\s*(\d+)', text_lower)
        if time_match: slots["time_days"] = int(time_match.group(1))
        
        if not slots.get("property") and not slots.get("layer"):
            slots["property"] = "permeability"
        
        return Intent(IntentType.VISUALIZE_PROPERTY, 0.95, slots, text)
    
    # Query
    if any(kw in text_lower for kw in ["what", "how much", "tell me"]):
        if "oil rate" in text_lower: slots["property"] = "oil_rate"
        elif "water cut" in text_lower: slots["property"] = "water_cut"
        elif "water rate" in text_lower: slots["property"] = "water_rate"
        elif "pressure" in text_lower: slots["property"] = "pressure"
        if slots:
            return Intent(IntentType.QUERY_VALUE, 0.9, slots, text)
    
    return None


def parse_with_claude(text: str) -> Intent:
    """Parse intent using Claude (Anthropic)."""
    import anthropic
    client = anthropic.Anthropic()
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=200,
        messages=[{"role": "user", "content": INTENT_PROMPT.format(text=text)}]
    )
    
    result_text = response.content[0].text.strip()
    if result_text.startswith("```"):
        result_text = result_text.split("```")[1].replace("json", "", 1)
    
    data = json.loads(result_text)
    return Intent(
        IntentType(data.get("intent", "unknown")),
        float(data.get("confidence", 0.8)),
        data.get("slots", {}),
        text
    )


def parse_with_openai(text: str) -> Intent:
    """Parse intent using GPT-4 (OpenAI)."""
    import openai
    client = openai.OpenAI()
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Parse voice commands into JSON. Respond with ONLY valid JSON."},
            {"role": "user", "content": INTENT_PROMPT.format(text=text)}
        ],
        temperature=0.1,
        max_tokens=200
    )
    
    result_text = response.choices[0].message.content.strip()
    if result_text.startswith("```"):
        result_text = result_text.split("```")[1].replace("json", "", 1)
    
    data = json.loads(result_text)
    return Intent(
        IntentType(data.get("intent", "unknown")),
        float(data.get("confidence", 0.8)),
        data.get("slots", {}),
        text
    )


def parse_intent(text: str) -> Intent:
    """
    Parse text into structured intent.
    
    Priority:
    1. Rule-based (instant, no API)
    2. Claude (CLARISSA native)
    3. OpenAI GPT-4
    4. Fallback to unknown
    """
    # Try rules first
    rule_result = parse_intent_rules(text)
    if rule_result is not None:
        print(f"   📋 Parsed with rules")
        return rule_result
    
    # Try Claude (CLARISSA's native LLM)
    if os.getenv('ANTHROPIC_API_KEY'):
        try:
            result = parse_with_claude(text)
            print(f"   🤖 Parsed with Claude")
            return result
        except Exception as e:
            print(f"   ⚠️ Claude error: {e}")
    
    # Try OpenAI
    if os.getenv('OPENAI_API_KEY'):
        try:
            result = parse_with_openai(text)
            print(f"   🤖 Parsed with GPT-4")
            return result
        except Exception as e:
            print(f"   ⚠️ OpenAI error: {e}")
    
    # No LLM available
    print(f"   ℹ️ No LLM available for complex command")
    return Intent(IntentType.UNKNOWN, 0.0, {}, text)

print("🧠 Intent parser ready")
print("   Priority: Rules → Claude → GPT-4 → Fallback")
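
Both LLM parsers above strip Markdown code fences before calling `json.loads`, but a model may also wrap the JSON in a sentence of prose. A slightly more tolerant extraction step could look like the sketch below; `extract_json_object` is a hypothetical helper, not part of CLARISSA.

```python
import json
import re
from typing import Any, Dict


def extract_json_object(reply: str) -> Dict[str, Any]:
    """Pull the first JSON object out of an LLM reply (hypothetical helper)."""
    text = reply.strip()

    # Drop a surrounding Markdown code fence such as ```json ... ```
    fenced = re.match(r"^```[a-zA-Z]*\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)

    # Decode starting at the first "{" and ignore any trailing commentary
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    obj, _end = json.JSONDecoder().raw_decode(text[start:])
    return obj


plain = extract_json_object('{"intent": "help", "confidence": 1.0, "slots": {}}')
fenced = extract_json_object('```json\n{"intent": "cancel", "confidence": 0.9, "slots": {}}\n```')
chatty = extract_json_object('Sure! {"intent": "confirm", "confidence": 0.8, "slots": {}} Hope that helps.')
print(plain["intent"], fenced["intent"], chatty["intent"])
```

Malformed JSON still raises `json.JSONDecodeError` (a subclass of `ValueError`), so a caller that catches `Exception`, as `parse_intent` does, would fall through to the next parser.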

### 2.3 Visualization Generator

In [None]:
# Create synthetic reservoir data for demo
NX, NY, NZ = 10, 10, 5

def generate_demo_data():
    """Generate synthetic reservoir properties."""
    np.random.seed(42)
    
    # Permeability with channel
    perm = np.random.lognormal(mean=4.5, sigma=0.5, size=(NX, NY, NZ))
    perm[3:7, :, :] *= 3  # High-perm channel
    
    # Porosity correlated with perm
    poro = 0.15 + 0.1 * (np.log(perm) - 4) / 2
    poro = np.clip(poro, 0.05, 0.35)
    
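    # Injector cells at the grid corners (same positions as the markers in the cross-section plot)
    injectors = [(1, 1), (1, NY - 2), (NX - 2, 1), (NX - 2, NY - 2)]
    
    # Water saturation (varies with time)
    def get_saturation(time_days):
        progress = min(time_days / 1800, 1.0)
        sw = np.ones((NX, NY, NZ)) * 0.2  # Connate water
        # Water front advancing radially from the nearest injector
        for i in range(NX):
            for j in range(NY):
                dist = min(np.sqrt((i - ix)**2 + (j - iy)**2) for ix, iy in injectors)
                if dist < progress * NX * 0.7:
                    sw[i, j, :] = 0.2 + 0.5 * (1 - dist / (NX * 0.7))
        return np.clip(sw, 0.2, 0.8)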
    
    return {
        'permeability': perm,
        'porosity': poro,
        'get_saturation': get_saturation
    }

DEMO_DATA = generate_demo_data()
print("📊 Demo data generated")
print(f"   Grid: {NX}×{NY}×{NZ} = {NX*NY*NZ} cells")
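
On a 10×10×5 grid the per-cell Python loop is fast enough, but the same radial-front model can be written with NumPy broadcasting. The sketch below is one way to do that; `saturation_vectorized` and its `sources` parameter (the cells the front expands from, defaulting to the grid center) are hypothetical names introduced for illustration.

```python
import numpy as np

NX, NY, NZ = 10, 10, 5  # same grid size as the demo


def saturation_vectorized(time_days, sources=((NX // 2, NY // 2),)):
    """Radial water front around `sources`, computed without Python loops over cells."""
    progress = min(time_days / 1800, 1.0)
    reach = NX * 0.7  # maximum front radius in cells

    ii, jj = np.meshgrid(np.arange(NX), np.arange(NY), indexing="ij")
    # Distance from every (i, j) cell to its nearest source: shape (NX, NY)
    dist = np.min(
        [np.sqrt((ii - si) ** 2 + (jj - sj) ** 2) for si, sj in sources],
        axis=0,
    )

    # Swept cells get a saturation that decays with distance; the rest stay at connate water
    sw_2d = np.where(dist < progress * reach, 0.2 + 0.5 * (1 - dist / reach), 0.2)

    # Same areal pattern in every layer, clipped to the physical range
    sw = np.repeat(sw_2d[:, :, np.newaxis], NZ, axis=2)
    return np.clip(sw, 0.2, 0.8)


sw_500 = saturation_vectorized(500)
print(sw_500.shape, float(sw_500.min()), float(sw_500.max()))
```

Passing the four corner cells, for example `sources=((1, 1), (1, NY - 2), (NX - 2, 1), (NX - 2, NY - 2))`, gives a front that spreads from the injector positions used in the cross-section plot.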

In [None]:
def create_3d_visualization(prop_name: str, prop_data: np.ndarray) -> go.Figure:
    """Create 3D scatter plot of property."""
    # Flatten the (i, j, k) cell indices and the property values in matching C order
    i, j, k = np.indices(prop_data.shape)
    x, y, z = i.ravel(), j.ravel(), k.ravel()
    values = prop_data.ravel()
    
    fig = go.Figure(data=go.Scatter3d(
        x=x, y=y, z=z,
        mode='markers',
        marker=dict(
            size=8,
            color=values,
            colorscale='Viridis',
            colorbar=dict(title=prop_name.title()),
            opacity=0.8
        )
    ))
    
    fig.update_layout(
        title=f"3D {prop_name.title()} Distribution",
        scene=dict(
            xaxis_title="X",
            yaxis_title="Y", 
            zaxis_title="Layer",
            aspectmode='cube'
        ),
        height=500
    )
    
    return fig

def create_cross_section(prop_name: str, prop_data: np.ndarray, layer: int) -> go.Figure:
    """Create 2D heatmap cross-section."""
    layer_idx = max(0, min(layer - 1, NZ - 1))
    data_2d = prop_data[:, :, layer_idx]
    
    fig = go.Figure(data=go.Heatmap(
        z=data_2d.T,
        colorscale='Viridis',
        colorbar=dict(title=prop_name.title())
    ))
    
    # Add well markers
    fig.add_trace(go.Scatter(
        x=[NX//2], y=[NY//2],
        mode='markers+text',
        marker=dict(size=15, color='blue', symbol='circle'),
        text=['PROD1'], textposition='top center',
        name='Producer'
    ))
    
    # Injectors at corners
    inj_x = [1, 1, NX-2, NX-2]
    inj_y = [1, NY-2, 1, NY-2]
    inj_names = ['INJ1', 'INJ2', 'INJ3', 'INJ4']
    fig.add_trace(go.Scatter(
        x=inj_x, y=inj_y,
        mode='markers+text',
        marker=dict(size=12, color='red', symbol='triangle-up'),
        text=inj_names, textposition='top center',
        name='Injectors'
    ))
    
    fig.update_layout(
        title=f"{prop_name.title()} at Layer {layer_idx + 1}",  # report the clamped layer actually shown
        xaxis_title="X",
        yaxis_title="Y",
        height=450
    )
    
    return fig

print("🎨 Visualization functions ready")

### 2.4 Command Executor

In [None]:
def execute_intent(intent: Intent) -> VoiceResponse:
    """
    Execute parsed intent and return response.
    
    Args:
        intent: Parsed Intent object
        
    Returns:
        VoiceResponse with text and optional visualization
    """
    slots = intent.slots
    
    if intent.type == IntentType.VISUALIZE_PROPERTY:
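        # An LLM may return a slot as null, so treat None like a missing key
        prop = slots.get('property') or 'permeability'
        layer = slots.get('layer')
        time_days = slots.get('time_days')
        if time_days is None:
            time_days = 500
        view_type = slots.get('view_type') or '3d'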
        
        # Get property data
        if 'saturation' in prop or prop == 'sw':
            prop_data = DEMO_DATA['get_saturation'](time_days)
            prop_name = 'water_saturation'
        elif prop in ['permeability', 'perm']:
            prop_data = DEMO_DATA['permeability']
            prop_name = 'permeability'
        elif prop in ['porosity', 'poro']:
            prop_data = DEMO_DATA['porosity']
            prop_name = 'porosity'
        else:
            prop_data = DEMO_DATA['permeability']
            prop_name = 'permeability'
        
        # Create visualization
        if layer or 'cross' in str(view_type):
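            # Default to layer 3 and clamp to 1..NZ so the reply names the layer actually plotted
            layer = max(1, min(int(layer or 3), NZ))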
            fig = create_cross_section(prop_name, prop_data, layer)
            text = f"Showing {prop_name.replace('_', ' ')} at layer {layer}."
        else:
            fig = create_3d_visualization(prop_name, prop_data)
            text = f"Showing {prop_name.replace('_', ' ')} in 3D."
        
        return VoiceResponse(True, text, intent, fig)
    
    elif intent.type == IntentType.QUERY_VALUE:
        prop = slots.get('property', 'oil_rate')
        
        # Simulate query results
        values = {
            'oil_rate': ('1,250', 'bbl/day'),
            'water_rate': ('450', 'bbl/day'),
            'water_cut': ('26', '%'),
            'pressure': ('3,450', 'psi'),
            'bhp': ('3,450', 'psi'),
            'cumulative_oil': ('2.3', 'MMSTB'),
            'fopt': ('2.3', 'MMSTB'),
        }
        
        prop_key = prop.lower().replace(' ', '_')
        if prop_key in values:
            val, unit = values[prop_key]
            text = f"The {prop.replace('_', ' ')} is {val} {unit}."
            return VoiceResponse(True, text, intent)
        
        # Unknown quantity: report a failure rather than a successful empty answer
        return VoiceResponse(False, f"I don't have data for {prop}.", intent)
    
    elif intent.type == IntentType.HELP:
        text = """You can say things like:
• "Show me the permeability"
• "Show layer 3"
• "What's the oil rate?"
• "Show saturation at day 500"
• "Stop" or "Cancel" to abort"""
        return VoiceResponse(True, text, intent)
    
    elif intent.type == IntentType.CANCEL:
        return VoiceResponse(True, "Cancelled.", intent)
    
    elif intent.type == IntentType.CONFIRM:
        return VoiceResponse(True, "Confirmed.", intent)
    
    else:
        return VoiceResponse(
            False, 
            "I didn't understand that. Try saying 'help' for available commands.",
            intent
        )

print("⚡ Command executor ready")
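
As the number of intents grows, the `if`/`elif` chain in `execute_intent` can be replaced by a lookup table that maps each intent type to a handler. The sketch below shows the idea on a reduced example; `Reply`, `HANDLERS`, `dispatch`, and the `handle_*` functions are hypothetical names, and `Reply` stands in for `VoiceResponse`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class IntentType(Enum):
    HELP = "help"
    CANCEL = "cancel"
    CONFIRM = "confirm"
    UNKNOWN = "unknown"


@dataclass
class Reply:  # stand-in for VoiceResponse, reduced to two fields
    success: bool
    text: str


def handle_help(slots: dict) -> Reply:
    return Reply(True, "You can say things like: 'Show me the permeability'")


def handle_cancel(slots: dict) -> Reply:
    return Reply(True, "Cancelled.")


def handle_confirm(slots: dict) -> Reply:
    return Reply(True, "Confirmed.")


# One row per supported intent; supporting a new intent means adding a row here
HANDLERS: Dict[IntentType, Callable[[dict], Reply]] = {
    IntentType.HELP: handle_help,
    IntentType.CANCEL: handle_cancel,
    IntentType.CONFIRM: handle_confirm,
}


def dispatch(intent_type: IntentType, slots: dict) -> Reply:
    """Look up the handler for an intent; unknown intents get a fallback reply."""
    handler = HANDLERS.get(intent_type)
    if handler is None:
        return Reply(False, "I didn't understand that. Try saying 'help' for available commands.")
    return handler(slots)


print(dispatch(IntentType.CANCEL, {}).text)
print(dispatch(IntentType.UNKNOWN, {}).text)
```

Each handler can then be tested in isolation, and the fallback for unsupported intents lives in a single place.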

## 3️⃣ Voice Pipeline

The complete voice processing pipeline:

In [None]:
def process_voice_command(text_or_audio: str, is_audio: bool = False) -> VoiceResponse:
    """
    Complete voice command processing pipeline.
    
    Args:
        text_or_audio: Either text command or path to audio file
        is_audio: True if input is audio file path
        
    Returns:
        VoiceResponse with result
    """
    print("\n" + "="*50)
    
    # Step 1: Transcribe if audio
    if is_audio:
        print("🎤 Step 1: Transcribing audio...")
        try:
            text = transcribe_audio(text_or_audio)
            print(f"   Transcription: \"{text}\"")
        except Exception as e:
            return VoiceResponse(False, f"Transcription failed: {e}")
    else:
        text = text_or_audio
        print(f"📝 Input: \"{text}\"")
    
    # Step 2: Parse intent
    print("\n🧠 Step 2: Parsing intent...")
    intent = parse_intent(text)
    print(f"   Intent: {intent.type.value}")
    print(f"   Confidence: {intent.confidence:.0%}")
    if intent.slots:
        print(f"   Slots: {intent.slots}")
    
    # Step 3: Execute
    print("\n⚡ Step 3: Executing command...")
    response = execute_intent(intent)
    
    # Step 4: Response
    print(f"\n💬 Response: {response.text}")
    print("="*50)
    
    return response

print("🚀 Voice pipeline ready!")
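
The pipeline prints the intent's confidence but executes the command regardless of its value. One possible refinement, sketched here under assumptions, is to ask for confirmation when confidence is low. The threshold of 0.6 and the helper name `gate_on_confidence` are illustrative and not taken from CLARISSA.

```python
from typing import Tuple

CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off; tune against real transcripts


def gate_on_confidence(intent_name: str, confidence: float,
                       threshold: float = CONFIDENCE_THRESHOLD) -> Tuple[bool, str]:
    """Decide whether to execute a parsed intent or ask the user first.

    Returns (should_execute, message); message is empty when executing.
    """
    if intent_name == "unknown":
        return False, "I didn't understand that. Try saying 'help' for available commands."
    if confidence < threshold:
        readable = intent_name.replace("_", " ")
        return False, f"Did you mean '{readable}'? Say 'yes' to confirm."
    return True, ""


print(gate_on_confidence("visualize_property", 0.95))
print(gate_on_confidence("query_value", 0.40))
```

A gate like this would sit between Step 2 (parsing) and Step 3 (execution), with a follow-up `confirm` intent releasing the pending command.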

## 4️⃣ Interactive Demo

Try voice commands! (Text mode - type what you would say)

In [None]:
# Demo 1: Show permeability in 3D
response = process_voice_command("show me the permeability")
if response.visualization:
    response.visualization.show()

In [None]:
# Demo 2: Cross-section at specific layer
response = process_voice_command("show layer 3")
if response.visualization:
    response.visualization.show()

In [None]:
# Demo 3: Query a value
response = process_voice_command("what is the water cut?")

In [None]:
# Demo 4: Show saturation at specific time
response = process_voice_command("show water saturation at day 1000")
if response.visualization:
    response.visualization.show()

In [None]:
# Demo 5: Help command
response = process_voice_command("help")

## 5️⃣ Interactive Widget

Try your own commands:

In [None]:
# Create interactive widget
text_input = widgets.Text(
    placeholder='Type a command (e.g., "show porosity in 3D")',
    description='🎤 Say:',
    layout=widgets.Layout(width='80%')
)

output = widgets.Output()

def on_submit(change):
    with output:
        output.clear_output()
        if text_input.value:
            response = process_voice_command(text_input.value)
            if response.visualization:
                display(response.visualization)

text_input.on_submit(lambda x: on_submit(x))

submit_btn = widgets.Button(description="Process", button_style='primary')
submit_btn.on_click(lambda x: on_submit(x))

display(widgets.HBox([text_input, submit_btn]))
display(output)

print("\n💡 Type a command and press Enter or click Process")
print("\nExample commands:")
print('  • "show me porosity"')
print('  • "show saturation at layer 2"')
print('  • "what is the oil rate"')
print('  • "help"')

## 6️⃣ Audio File Demo

Upload an audio file to test real speech-to-text:

In [None]:
# Audio upload (Colab only)
def process_uploaded_audio():
    """Upload audio files and process each one. Returns a list of responses."""
    # Imported here so this cell also runs outside Colab
    from google.colab import files
    
    print("📁 Upload an audio file (WAV, MP3, M4A)...")
    uploaded = files.upload()
    
    responses = []
    for filename in uploaded.keys():
        print(f"\n🎵 Processing: {filename}")
        response = process_voice_command(filename, is_audio=True)
        
        if response.visualization:
            display(response.visualization)
        
        responses.append(response)
    
    return responses

# Uncomment to test with audio upload:
# process_uploaded_audio()

## 7️⃣ Test Suite

Evaluate intent recognition accuracy:

In [None]:
# Test cases
TEST_CASES = [
    ("show me the permeability", "visualize_property", {"property": "permeability"}),
    ("display porosity in 3D", "visualize_property", {"property": "porosity"}),
    ("show layer 3", "visualize_property", {"layer": 3}),
    ("what is the oil rate", "query_value", {"property": "oil_rate"}),
    ("what's the water cut", "query_value", {"property": "water_cut"}),
    ("help", "help", {}),
    ("stop", "cancel", {}),
    ("yes", "confirm", {}),
]

def run_tests():
    """Run test suite and report accuracy."""
    print("🧪 Running intent recognition tests...\n")
    
    passed = 0
    failed = 0
    
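    for text, expected_intent, expected_slots in TEST_CASES:
        intent = parse_intent(text)
        
        intent_match = intent.type.value == expected_intent
        # Every expected slot must be present with the expected value
        slots_match = all(intent.slots.get(k) == v for k, v in expected_slots.items())
        
        if intent_match and slots_match:
            passed += 1
            status = "✅"
        else:
            failed += 1
            status = "❌"
        
        print(f"{status} \"{text}\"")
        print(f"   Expected: {expected_intent} {expected_slots}")
        print(f"   Got:      {intent.type.value} {intent.slots}")
        if not intent_match:
            print(f"   Confidence: {intent.confidence:.0%}")
        print()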
    
    total = passed + failed
    accuracy = passed / total * 100 if total > 0 else 0
    
    print("="*50)
    print(f"📊 Results: {passed}/{total} passed ({accuracy:.0f}% accuracy)")
    
    return accuracy

# Uncomment to run tests (every case above is handled by the rule-based parser, so no API key is needed):
# run_tests()

## 📚 Summary

This notebook demonstrated:

1. **Speech-to-Text** - Whisper API transcription with domain vocabulary
2. **Intent Parsing** - Rule-based matching first, with an LLM fallback (Claude or OpenAI)
3. **Slot Extraction** - Property, layer, time parameters
4. **Visualization** - 3D and cross-section views triggered by voice

### Next Steps

- **Full Integration**: Connect to OPM Flow simulations
- **Live Microphone**: Browser-based audio capture
- **TTS Response**: Audio feedback with OpenAI TTS
- **Offline Mode**: Local Whisper for air-gapped deployments

### Resources

- [Voice Input Tutorial](../guides/voice-input-tutorial.md)
- [ADR-028: Voice Architecture](../../architecture/adr/ADR-028-voice-input-architecture.md)
- [CLARISSA Source Code](https://gitlab.com/wolfram_laube/blauweiss_llc/clarissa)

---

*Part of CLARISSA - Conversational Language Agent for Reservoir Simulation*