# Audio Moderation System: Personal Information Detection

This notebook demonstrates a mock audio moderation system that processes audio recordings to detect and flag disclosures of personal information. It uses Gemini for transcription and analysis, and PydanticAI for structured output handling.

## Setup and Dependencies

In [None]:
import os
from pathlib import Path
from typing import List, Optional
from enum import Enum

import soundfile as sf
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from pydantic_ai import Agent

# Load environment variables
assert load_dotenv("../.env"), "Could not load ../.env"
assert "GEMINI_API_KEY" in os.environ, "GEMINI_API_KEY not found in environment variables"

# Only needed on the Udacity workspace. Comment this out if running on another system.
os.environ['HF_HOME'] = '/voc/data/huggingface'
os.environ['OLLAMA_MODELS'] = '/voc/data/ollama/cache'
os.environ['HF_HUB_OFFLINE'] = '1'
os.environ['PATH'] = f"/voc/data/ollama/bin:/voc/data/ffmpeg/bin:{os.environ.get('PATH', '')}"
os.environ['LD_LIBRARY_PATH'] = f"/voc/data/ollama/lib:/voc/data/ffmpeg/lib:{os.environ.get('LD_LIBRARY_PATH', '')}"

import nest_asyncio
nest_asyncio.apply()

## Data Models and Policy Definitions

Let's assume that a company has a policy preventing people from exchanging customer information during unsecured calls or communications. One possible schema for such a policy:

In [None]:
from typing import Literal


class ViolationType(str, Enum):
    PHONE_NUMBER = "phone_number"
    EMAIL_ADDRESS = "email_address"
    SSN = "social_security_number"
    ADDRESS = "physical_address"
    CREDIT_CARD = "credit_card_number"
    DATE_OF_BIRTH = "date_of_birth"


class PolicyViolation(BaseModel):
    violation_type: ViolationType
    detected_text: str
    timestamp_start: Optional[float] = None
    context: str = Field(description="Surrounding context where violation was detected")


# TODO: Create the ModerationResult data model
# (aka output schema) for the model. Include the following
# elements:
# - transcript: str
# - violations_found: List[PolicyViolation]
# - risk_level: str (LOW, MEDIUM, or HIGH based on violations)
# - recommendations: List[str]
class ModerationResult(BaseModel):
    ... #complete
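
One way the TODO above might be completed is sketched below. It is an illustration, not the reference solution: the `Literal` constraint on `risk_level` and the field descriptions are assumptions, and `ViolationType` and `PolicyViolation` are repeated only so the block runs on its own.

```python
from enum import Enum
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class ViolationType(str, Enum):
    PHONE_NUMBER = "phone_number"
    EMAIL_ADDRESS = "email_address"
    SSN = "social_security_number"
    ADDRESS = "physical_address"
    CREDIT_CARD = "credit_card_number"
    DATE_OF_BIRTH = "date_of_birth"


class PolicyViolation(BaseModel):
    violation_type: ViolationType
    detected_text: str
    timestamp_start: Optional[float] = None
    context: str = Field(description="Surrounding context where violation was detected")


class ModerationResult(BaseModel):
    # Full transcript of the recording, as produced by the model.
    transcript: str
    # One entry per detected disclosure; an empty list means nothing was flagged.
    violations_found: List[PolicyViolation]
    # Assumption: a Literal restricts the value to the three levels named in
    # the TODO, so any other string fails validation.
    risk_level: Literal["LOW", "MEDIUM", "HIGH"] = Field(
        description="Overall risk based on the violations found"
    )
    # Suggested follow-up actions.
    recommendations: List[str]
```

Constraining `risk_level` with `Literal` rather than a plain `str` means a response such as `"SEVERE"` is rejected during validation instead of silently passing through to the report.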

## Audio Processing Utilities

Let's define some helper utilities that load an audio file and prepare it for the model:

In [None]:
import librosa
import io
from pydantic_ai import Agent, BinaryContent


def audio_to_bytes(audio_array, sample_rate):
    """Convert audio array to bytes for Pydantic AI."""
    buffer = io.BytesIO()
    sf.write(buffer, audio_array, sample_rate, format='WAV')
    buffer.seek(0)
    return buffer.getvalue()

def format_audio_for_gemini(file_path: Path):
    """Load an audio file and wrap it as WAV BinaryContent for Gemini."""
    audio_array, sample_rate = librosa.load(file_path, sr=None, mono=True)
    
    # Convert audio to bytes
    audio_bytes = audio_to_bytes(
        audio_array,
        sample_rate
    )
    
    # Create binary content for Pydantic AI
    # NOTE: to make this more efficient, we could upload
    # audio as mp3 instead of wav
    audio_content = BinaryContent(
        data=audio_bytes,
        media_type='audio/wav'
    )
    
    return audio_array, sample_rate, audio_content
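
To make the in-memory encoding step in `audio_to_bytes` more concrete, here is a stdlib-only sketch of the same pattern: write WAV data into a `BytesIO` buffer, then return the raw bytes. It uses the `wave` module and a synthetic sine tone in place of `soundfile` and a real recording. The helper name `sine_wav_bytes` and its parameters are hypothetical.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(freq_hz=440.0, duration_s=0.5, sample_rate=16000):
    """Encode a mono 16-bit sine tone as WAV bytes held entirely in memory."""
    n_samples = int(duration_s * sample_rate)
    # Half-amplitude sine wave, scaled to the signed 16-bit range.
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    pcm = b"".join(struct.pack("<h", s) for s in samples)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono, matching mono=True in librosa.load
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # The buffer was supplied by the caller, so it stays open after the
    # writer is closed and its contents can still be read.
    return buffer.getvalue()


wav_bytes = sine_wav_bytes()
print(wav_bytes[:4], len(wav_bytes))
```

Bytes produced this way start with the `RIFF` container marker, which is what the `audio/wav` media type on `BinaryContent` tells the model to expect.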

## PydanticAI Agent Configuration

In [None]:
# TODO: Initialize the PydanticAI agent with Gemini
# and the appropriate output type and system prompt.
# Use the 'gemini-2.5-flash-lite' model.
# The system prompt should instruct the model to:
# 1. Transcribe the audio
# 2. Analyze the transcript for personal information violations
# 3. Flag any instances of: phone numbers, email addresses, SSN, physical
#    addresses, credit card numbers, dates of birth
# 4. Assess the overall risk level and provide recommendations
moderation_agent = ... #complete
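
The four instructions listed in the TODO above could be turned into a system prompt along the following lines. This is a sketch with assumed wording: `build_system_prompt` and `SYSTEM_PROMPT` are hypothetical names, and `ViolationType` is repeated so the block runs on its own.

```python
from enum import Enum


class ViolationType(str, Enum):
    PHONE_NUMBER = "phone_number"
    EMAIL_ADDRESS = "email_address"
    SSN = "social_security_number"
    ADDRESS = "physical_address"
    CREDIT_CARD = "credit_card_number"
    DATE_OF_BIRTH = "date_of_birth"


def build_system_prompt(violation_types) -> str:
    """Assemble moderation instructions that name every category to flag."""
    # Derive the category list from the enum so the prompt and the
    # output schema cannot drift apart.
    categories = ", ".join(v.value.replace("_", " ") for v in violation_types)
    return "\n".join([
        "You are an audio content moderator enforcing a customer-privacy policy.",
        "1. Transcribe the audio.",
        "2. Analyze the transcript for disclosures of personal information.",
        f"3. Flag every instance of: {categories}.",
        "4. Assess the overall risk level (LOW, MEDIUM, or HIGH) and give recommendations.",
    ])


SYSTEM_PROMPT = build_system_prompt(ViolationType)
print(SYSTEM_PROMPT)
```

The resulting string could then be passed to the agent, for example `Agent("google-gla:gemini-2.5-flash-lite", output_type=ModerationResult, system_prompt=SYSTEM_PROMPT)`. The `google-gla:` provider prefix is an assumption about how the installed PydanticAI version names Gemini models, so check it against that version's documentation.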

## Core Moderation Functions

## Demonstration with Sample Audio

We load a sample file and apply our moderation system:

In [None]:
import textwrap

# Analyze a single audio file (modify the filename as needed)
sample_file = "../audio_moderation.mp3"

print(f"Analyzing: {sample_file}")

# TODO: Prepare audio for the API. Call the format_audio_for_gemini
# function on sample_file and collect the audio_array, sample_rate, and audio_content.
audio_array, sample_rate, audio_content = ... #complete

# TODO: create a list containing a prompt (asking the model to transcribe and
# analyze the audio) and the audio itself.
message_content = ... #complete

# Process with PydanticAI agent
result = moderation_agent.run_sync(message_content)

print("\n=== ANALYSIS RESULTS ===")
print(
    f"Transcript: {textwrap.fill(result.output.transcript, width=80)}"
)
print(f"Risk Level: {result.output.risk_level}")
print(f"Violations Found: {len(result.output.violations_found)}")

if result.output.violations_found:
    print("\n=== POLICY VIOLATIONS ===")
    for i, violation in enumerate(result.output.violations_found, 1):
        print(f"{i}. {violation.violation_type.value.replace('_', ' ').title()}")
        print(f"   Detected: {violation.detected_text}")
        print(f"   Context: {violation.context}")

print("\n=== RECOMMENDATIONS ===")
for rec in result.output.recommendations:
    print(f"- {rec}")