# Logging and monitoring a multimodal application

In this notebook, we take a minimal Gradio chatbot that uses Pydantic AI to talk to Google Gemini, and we add logging and monitoring to it.

First, set up your credentials:

```bash
cp env.example .env
```

Then add your Gemini API key to `.env`:

```bash
GEMINI_API_KEY=[your key]
```

Load the credentials:

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

# Verify API key is loaded
assert os.getenv("GEMINI_API_KEY"), "GEMINI_API_KEY not found. Please check your .env file."

## Defining the Gradio application

This is a minimal Gradio application defining a chatbot, using Pydantic AI to interact with the model:

In [2]:
from pydantic_ai import Agent
from pydantic_ai.messages import BinaryContent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
import gradio as gr

import uuid
import filetype

from rate_limiting import rate_limit


# These are the MIME types supported by Gemini as of today
# (see https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash)
SUPPORTED_MIME_TYPES = [
    # Image
    "image/png",
    "image/jpeg",
    "image/webp",
    # Text
    "application/pdf",
    "text/plain",
    # Video
    "video/x-flv",
    "video/quicktime",
    "video/mpeg",
    "video/mpegs",
    "video/mpg",
    "video/mp4",
    "video/webm",
    "video/wmv",
    "video/3gpp",
    # Audio
    "audio/x-aac",
    "audio/flac",
    "audio/mp3",
    "audio/m4a",
    "audio/mpeg",
    "audio/mpga",
    "audio/mp4",
    "audio/ogg",
    "audio/pcm",
    "audio/wav",
    "audio/x-wav",
    "audio/webm",
]


class ChatSession:
    
    def __init__(self, agent: Agent):

        # unique identifier for the session
        self.session_id = str(uuid.uuid4())

        # Pydantic AI agent (created by the caller) used to talk to Gemini
        self.agent = agent
        
    # This function will be called by Gradio when the user sends a message
    async def chat(
        self, message, history, past_messages
    ):
        """
        Args:
            message: dict with 'text' and optional 'files' keys
            history: Gradio's chat history (for display only, we ignore it)
            past_messages: Pydantic AI's message history (the one we actually use)
        """
        prompt_parts = []

        # See if the input coming from gradio (i.e., from the user)
        # has a text part
        if message.get("text"):

            prompt_parts.append(message["text"])

        # See if the input coming from gradio (i.e., from the user)
        # has any files attached
        if message.get("files"):

            for file_path in message["files"]:

                # Check that we have a valid media type
                kind = filetype.guess(file_path)

                if (
                    kind is None  # not recognized
                    or not kind.mime  # no MIME type
                    or kind.mime not in SUPPORTED_MIME_TYPES
                ):

                    # Not a supported media type: reject the whole request
                    raise IOError(
                        f"Unsupported file type: {file_path} ({kind.mime if kind else 'unknown'})"
                    )

                else:

                    with open(file_path, "rb") as f:
                        file_bytes = f.read()

                    # Add the file content as BinaryContent to the prompt parts
                    # Pydantic AI will handle uploading it to Gemini
                    # We also specify the media type (MIME type)
                    # Note: for large files, you will need to use the File API
                    # Here we do direct upload for simplicity
                    prompt_parts.append(
                        BinaryContent(data=file_bytes, media_type=kind.mime)
                    )

        if not prompt_parts:
            raise ValueError("Please provide a message or at least one file.")

        # Here we use await, since agent.run is an async method
        # This signals to python that this is a potentially long-running operation
        # (e.g., waiting for a response from the AI model) and
        # it should allow other tasks to run in the meantime, while we wait
        result = await self.agent.run(prompt_parts, message_history=past_messages)

        # After we finished waiting, we return both the response AND the updated history
        return result.output, result.all_messages()


class GradioWrapper:
    """
    Wrapper around ChatSession to handle errors in Gradio
    """

    def __init__(self, chat_session: ChatSession):
        self.chat_session = chat_session
    
    # Apply rate limiting to our chat function
    # (say 120 calls per minute, i.e., 2 calls per second on average)
    @rate_limit(calls=120, period=60)
    async def chat_gradio(self, message, history, past_messages):
        """
        Wrapper around chat() to catch and display errors in Gradio apps.
        """

        try:
            return await self.chat_session.chat(message, history, past_messages)
        except Exception as e:

            raise gr.Error(f"Error: {str(e)}")

agent = Agent(
    model=GoogleModel("gemini-2.5-flash-lite"),
    # You typically want to set a much more specific system prompt, depending on
    # your application. Here we keep it generic.
    system_prompt="You are a helpful AI assistant. Be concise and friendly.",
    # Good practice: limit the max response tokens to avoid excessive usage
    # You might also want to consider limiting the thinking budget
    # (a token is ~0.75 words of English, so 1024 tokens is about 750 words)
    model_settings=GoogleModelSettings(max_response_tokens=1024),
    retries=3
)

gradio_chat_session = GradioWrapper(ChatSession(agent))

# State to hold Pydantic AI's message history
past_messages_state = gr.State([])

# Create the chatbot interface
demo = gr.ChatInterface(
    # fn is the function that drives the chatbot
    # For us, it is the chat method
    # NOTE: gradio automatically recognizes async functions and handles them appropriately
    fn=gradio_chat_session.chat_gradio,
    type="messages",
    # Enable multimodal inputs
    multimodal=True,
    title="Multimodal AI Chatbot",
    description="Chat with Gemini! Upload images, audio, or video files along with your questions.",
    # This is the field where the user types messages or uploads files
    textbox=gr.MultimodalTextbox(
        # Allow uploading multiple files of the specified types
        file_count="multiple",
        # Allow image, audio, video, text, and PDF files
        file_types=["image", "audio", "video", "text", ".pdf"],
        # Placeholder text
        placeholder="Type a message or upload files (images, audio, video, text, pdf)...",
    ),
    # This is where the chat history is displayed
    chatbot=gr.Chatbot(
        height=300,
        show_copy_button=True,
        placeholder="üëã Hello! I'm your AI assistant. Upload media files or just chat with me!",
        type="messages",
    ),
    # Need to manage state for Pydantic AI's message history
    additional_inputs=[past_messages_state],  # Add the state as input
    additional_outputs=[past_messages_state],  # Add the state as output
)

# Launch the app
if __name__ == "__main__":
    demo.launch(share=False)

* Running on local URL:  http://127.0.0.1:7869
* To create a public link, set `share=True` in `launch()`.
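
The `rate_limit` decorator used above comes from a local `rate_limiting` module that is not shown in this notebook. As a rough idea of what such a decorator could look like, here is a minimal sketch that assumes a sliding-window policy and rejects excess calls with an exception. The `RateLimitExceeded` class and the rejection behavior are assumptions made for illustration; the real module may queue or delay calls instead. The demo at the bottom runs as a plain script; inside Jupyter, use `await main()` instead of `asyncio.run(main())`.

```python
import asyncio
import functools
import time
from collections import deque


class RateLimitExceeded(Exception):
    """Hypothetical error raised when the call budget is used up."""


def rate_limit(calls: int, period: float):
    """Allow at most `calls` invocations in any window of `period` seconds.

    Sketch only: the counter is shared by all callers of the decorated
    function and lives in memory, so it resets when the process restarts.
    """

    def decorator(fn):
        timestamps = deque()  # start times of recent calls, oldest first

        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Forget calls that have fallen out of the window
            while timestamps and now - timestamps[0] >= period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                raise RateLimitExceeded(
                    f"More than {calls} calls in {period} seconds"
                )
            timestamps.append(now)
            return await fn(*args, **kwargs)

        return wrapper

    return decorator


@rate_limit(calls=2, period=60)
async def echo(text: str) -> str:
    return text


async def main():
    results = [await echo("a"), await echo("b")]
    try:
        await echo("c")  # third call within the same minute
    except RateLimitExceeded:
        results.append("rejected")
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A per-function counter like this one throttles the application as a whole. Throttling each user separately would need one counter per session, for example keyed by `session_id`.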


## Instrument our chatbot

Now let's modify this code by adding our instrumentation. We will use Arize Phoenix for this, since it integrates well with Jupyter.

In production you would deploy a dedicated tracing server (Phoenix or something else) or use a commercial platform. Here, for convenience, we do everything inside Jupyter.

In [8]:
import gradio as gr
import phoenix as px

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.pydantic_ai import OpenInferenceSpanProcessor
import uuid
import filetype


# Launch a local Phoenix instance from within the notebook
session = px.launch_app()
print(f"Phoenix UI: {session.url}")

# We define a processing pipeline for the spans emitted by our
# agent:
#                        Pydantic AI
#                            |
#                            v
#                   OpenInferenceSpanProcessor
#   [rewrites Pydantic AI span attributes into the OpenInference
#    conventions that Phoenix understands]
#                            |
#                            v
#           SimpleSpanProcessor with OTLPSpanExporter
#      [sends each span to Phoenix over OTLP as soon as it ends]
tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)

# Translate Pydantic AI spans -> OpenInference attributes
tracer_provider.add_span_processor(OpenInferenceSpanProcessor())

# Configure exporter to send to Phoenix
exporter = OTLPSpanExporter(endpoint="http://127.0.0.1:6006/v1/traces")
tracer_provider.add_span_processor(SimpleSpanProcessor(exporter))

# Get tracer for manual span creation
# (this is the equivalent of getting a logger in Python, but for traces)
tracer = trace.get_tracer(__name__)


def add_media_link_to_span(file_path: str, media_type: str, index: int):
    """Save uploaded file and add link to span for reference."""
    from pathlib import Path
    import shutil

    current_span = trace.get_current_span()
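    # get_current_span() never returns None: without an active span it
    # returns a non-recording placeholder, so we check is_recording()
    if not current_span.is_recording():
        return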

    try:
        # Create uploads directory if it doesn't exist
        uploads_dir = Path("./uploaded_media")
        uploads_dir.mkdir(exist_ok=True)

        # Copy file with a unique name
        source_path = Path(file_path)
        unique_prefix = str(uuid.uuid4())[:8]
        dest_path = uploads_dir / f"{unique_prefix}_{source_path.name}"
        shutil.copy(file_path, dest_path)

        # Add file metadata to span with absolute file:// URL
        absolute_path = dest_path.resolve()
        current_span.set_attribute(
            f"input.{media_type}.{index}.url", f"file://{absolute_path}"
        )
        current_span.set_attribute(
            f"input.{media_type}.{index}.filename", source_path.name
        )
        current_span.set_attribute(
            f"input.{media_type}.{index}.size_bytes", source_path.stat().st_size
        )
    except Exception:
        # If file save fails, silently skip
        pass


class GradioWrapperWithTracing:

    def __init__(self, chat_session: ChatSession):
        
        self.chat_session = chat_session

        # Enable instrumentation in the agent
        self.chat_session.agent.instrument = True

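        # This is key: it allows us to keep every message within the
        # same chat organized in the same trace.
        # NOTE: this span is never ended in this notebook, so only its
        # child spans (one per chat turn) are exported. Call
        # self.conversation_span.end() when the session is over.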
        self.conversation_span = tracer.start_span(
            "conversation",
            attributes={"session.id": chat_session.session_id}
        )

    @rate_limit(calls=120, period=60)
    async def chat_gradio(self, message, history, past_messages):

        try:

            # Nests this span under the existing conversation span            
            with tracer.start_as_current_span(
                "chat_turn",
                context=trace.set_span_in_context(self.conversation_span),
                # Here we can add custom attributes
                attributes={"turn_number": len(history) // 2 + 1}
            ):
                
                # If there are files, add them to the span for reference
                for index, file_path in enumerate(message.get("files", [])):

                    # Check that we have a valid media type
                    kind = filetype.guess(file_path)
                    if not kind:
                        continue

                    # Add media link to span
                    add_media_link_to_span(file_path, kind.mime, index)
                
                # Call the real chatbot
                return await self.chat_session.chat(message, history, past_messages)
            
        except Exception as e:

            raise gr.Error(f"Error: {str(e)}")

gradio_wrapper_with_tracing = GradioWrapperWithTracing(ChatSession(agent))

# NOTE:
# This part is identical to before
past_messages_state = gr.State([])

demo = gr.ChatInterface(
    fn=gradio_wrapper_with_tracing.chat_gradio,
    type="messages",
    multimodal=True,
    title="Multimodal AI Chatbot",
    description="Chat with Gemini! Upload images, audio, or video files along with your questions.",
    textbox=gr.MultimodalTextbox(
        file_count="multiple",
        file_types=["image", "audio", "video", "text", ".pdf"],
        placeholder="Type a message or upload files (images, audio, video, text, pdf)...",
    ),
    chatbot=gr.Chatbot(
        height=300,
        show_copy_button=True,
        placeholder="Hello! I'm your AI assistant. Upload media files or just chat with me!",
        type="messages",
    ),
    additional_inputs=[past_messages_state],
    additional_outputs=[past_messages_state],
)

if __name__ == "__main__":
    demo.launch(share=False)

Existing running Phoenix instance detected! Shutting it down and starting a new instance...
Overriding of current TracerProvider is not allowed


🌍 To view the Phoenix app in your browser, visit http://localhost:6006/
📖 For more information on how to use Phoenix, check out https://arize.com/docs/phoenix
Phoenix UI: http://localhost:6006/
* Running on local URL:  http://127.0.0.1:7871
* To create a public link, set `share=True` in `launch()`.
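
The attributes that `add_media_link_to_span` attaches to a span can also be computed by a small pure function, which makes them easy to check without a running tracing backend. The sketch below follows the `input.{media_type}.{index}.*` naming used above. The helper name `media_span_attributes` is hypothetical, and it builds the link with `Path.as_uri()` rather than an f-string, so that characters such as spaces are percent-encoded.

```python
import tempfile
from pathlib import Path


def media_span_attributes(file_path: str, media_type: str, index: int) -> dict:
    """Build the span attributes describing one uploaded file.

    Hypothetical helper: it only computes the values and leaves the
    copying of the file and the span handling to the caller.
    """
    path = Path(file_path).resolve()  # as_uri() requires an absolute path
    prefix = f"input.{media_type}.{index}"
    return {
        f"{prefix}.url": path.as_uri(),
        f"{prefix}.filename": path.name,
        f"{prefix}.size_bytes": path.stat().st_size,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A 4-byte stand-in for an uploaded image
        sample = Path(tmp) / "my clip.png"
        sample.write_bytes(b"\x89PNG")
        attributes = media_span_attributes(str(sample), "image/png", 0)
        for key, value in attributes.items():
            print(key, "=", value)
```

Inside a recording span, the resulting dictionary could then be attached in one call with `current_span.set_attributes(attributes)`.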


In [10]:
session.view(height=800)

📺 Opening a view to the Phoenix app. The app is running at http://localhost:6006/
