|
1 |
| -# Overview of key Azure Services, overall Application Architecture and Codebase |
2 |
| - |
3 |
| -## Introduction to Semantic Kernel, Azure Open AI, AI Foundry, and Cosmos DB |
4 |
| - |
5 |
| -Semantic Kernel is a lightweight, open-source development kit that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase. It serves as an efficient middleware that enables rapid delivery of enterprise-grade solutions. Semantic Kernel combines prompts with existing APIs to perform actions, and it uses OpenAPI specifications to share extensions with other developers. It is designed to be future-proof, allowing you to swap out AI models without needing to rewrite your entire codebase. Here are some key points: |
6 |
| -- Enterprise Ready: Trusted by Microsoft and other Fortune 500 companies, Semantic Kernel is flexible, modular, and secure. It supports telemetry and other security features, ensuring responsible AI solutions at scale. |
7 |
| -- Automating Business Processes: It combines prompts with existing APIs to perform actions. By describing your existing code to AI models, Semantic Kernel translates model requests into function calls and returns the results. |
8 |
| -- Modular and Extensible: You can add your existing code as plugins, maximizing your investment by integrating AI services through out-of-the-box connectors. It uses OpenAPI specifications, allowing you to share extensions with other developers. |
9 |
| - |
10 |
| -Azure OpenAI is a service provided by Microsoft that allows businesses and developers to integrate powerful AI models into their applications using the Azure cloud platform. Here are some key points: |
11 |
| -- Access to Advanced AI Models: Azure OpenAI provides access to state-of-the-art AI models, including GPT-4, which can be used for a variety of tasks such as natural language processing, translation, and more. |
12 |
| -- Scalability and Reliability: Leveraging the Azure infrastructure, the service ensures high availability, scalability, and security, making it suitable for enterprise-level applications. |
13 |
| -- Integration with Azure Services: Azure OpenAI can be seamlessly integrated with other Azure services like Azure Cognitive Services, Azure Machine Learning, and Azure Data Factory, enabling comprehensive AI solutions. |
14 |
| -- Customization and Fine-Tuning: Users can customize and fine-tune AI models to better suit their specific needs, ensuring more accurate and relevant outputs. |
15 |
| -- Compliance and Security: Azure OpenAI adheres to strict compliance standards and provides robust security features to protect data and ensure responsible AI usage. |
16 |
| - |
17 |
| -Azure AI Foundry is a comprehensive platform provided by Microsoft that enables businesses and developers to create, deploy, and manage AI solutions. Here are some key points: |
18 |
| -- End-to-End AI Development: Azure AI Foundry offers tools and services for the entire AI lifecycle, from data preparation and model training to deployment and monitoring. |
19 |
| -- Integration with Azure Services: It seamlessly integrates with other Azure services like Azure Machine Learning, Azure Cognitive Services, and Azure Data Factory, providing a unified environment for AI development. |
20 |
| -- Scalability and Flexibility: The platform is designed to handle large-scale AI projects, offering robust infrastructure and flexible deployment options to meet diverse business needs. |
21 |
| -- Advanced AI Models: Users have access to cutting-edge AI models and can leverage pre-built solutions or customize models to suit specific requirements. |
22 |
| -- Security and Compliance: Azure AI Foundry adheres to stringent security standards and compliance regulations, ensuring the protection of data and responsible AI usage. |
| 1 | +# Real-Time Multi-Agent Customer Service System |
| 2 | + |
| 3 | +This architecture enables seamless, real-time customer service experiences via web and telephony, orchestrating multiple specialized AI agents with robust state management and enterprise data integration. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## 1. High-Level Architecture Overview |
| 8 | + |
23 | 9 |
|
24 |
| -Cosmos DB is a globally distributed, multi-model database service provided by Microsoft. Here are some key points: |
25 |
| -- Global Distribution: Azure Cosmos DB allows you to distribute your data across multiple regions worldwide, ensuring high availability and low latency. |
26 |
| -- Multi-Model Support: It supports various data models, including document, graph, key-value, and column-family, making it versatile for different types of applications. |
27 |
| -- Scalability: The service offers automatic scaling of throughput and storage, allowing you to handle large amounts of data and high traffic seamlessly. |
28 |
| -- Consistency Models: Azure Cosmos DB provides five consistency models (strong, bounded staleness, session, consistent prefix, and eventual), giving you control over the trade-off between consistency and performance. |
29 |
| -- Comprehensive Security: It includes features like encryption at rest, network isolation, and compliance with industry standards to ensure data security. |
30 | 10 |
|
31 |
| -## Introduction to AI Foundry's Evaluation Framework |
32 |
| -## Integrating Azure Content Safety |
33 |
| -## Ensuring secure and responsible AI practices |
34 |
| -#### Overview of the app architecture |
35 |
| -- Walkthrough of the codebase |
| 11 | + |
| 12 | +| Block | Role | Key Tech & Patterns | |
| 13 | +|----------------------|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------| |
| 14 | +| **User Experience** | Captures user speech/text and renders the agent’s reply instantly. | - React + Vite Web UI <br> - WebSocket (realtime conversational UI)<br> - Azure Communication Services (ACS) for phone calls | |
| 15 | +| **Frontend Service** | Manages unique session IDs across channels; bridges between users and agents. | - Session management (UUID/callerID as session keys)<br> - HTTPS/WebSocket multiplexing | |
| 16 | +| **Agent Service** | Runs a Router Agent for intent detection, orchestrates Domain Agents (Flight, Hotel, etc.), and manages state. | - Semantic Kernel (SK)<br> - Azure OpenAI GPT-4o (realtime streaming)<br> - Multi-agent orchestration<br> - Real-time session manager | |
| 17 | +| **Business Data** | Provides durable data sources (knowledge base and transactional DB) for agent tools. | - SQL (transactions)<br> - Vector DB (searchable knowledge) | |
| 18 | +| **Session State Store** | Persists session and chat history, enabling stateless horizontal scaling and resilience. | - In-memory (local demo)<br>- Redis (distributed/production) | |
| 19 | + |
| 20 | +**Session Identity:** |
| 21 | +Every request—regardless of channel—is tied to a unique session ID (UUID for web, callerId for phone), ensuring context continuity and correct agent matching. |
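
As an illustration of the session-identity rule, the sketch below derives a session key per channel. The helper name `make_session_key` and the channel labels are assumptions made for this example and do not come from the codebase.

```python
import uuid
from typing import Optional


def make_session_key(channel: str, caller_id: Optional[str] = None) -> str:
    """Return the session key for a request (hypothetical helper)."""
    if channel == "phone":
        if not caller_id:
            raise ValueError("phone sessions need a callerId")
        # Phone sessions reuse the callerId as the session key.
        return caller_id
    # Web sessions get a fresh random UUID (uuid4) per conversation.
    return str(uuid.uuid4())
```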
| 22 | + |
| 23 | +--- |
| 24 | + |
| 25 | +## 2. User Experience & Front-end Integration |
| 26 | + |
| 27 | +### 2.1 Web Interface |
| 28 | + |
| 29 | +- **React + Vite SPA:** |
| 30 | + - Initiates a new session (`uuid4`) on conversation start. |
| 31 | + - Establishes a persistent WebSocket to `/realtime?session_state_key={uuid}` for streaming audio and agent responses. |
| 32 | + - Streams mic audio as base64-encoded PCM frames. |
| 33 | + - Plays `response.audio.delta` audio chunks as soon as they arrive for low-latency voice.
| 34 | + - Renders live transcriptions and any cited “grounding” sources from agent tools. |
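
The audio framing described above can be sketched as follows. The browser client would do this in JavaScript; Python is used here only to stay consistent with the backend files. The field names (`audio`, `delta`) follow the Realtime API event shapes, and it is an assumption that the app forwards them unchanged. Both helper names are hypothetical.

```python
import base64
import json


def audio_append_event(pcm_frame: bytes) -> str:
    """Wrap one raw PCM frame as an `input_audio_buffer.append` message."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_frame).decode("ascii"),
    })


def audio_from_delta(message: str) -> bytes:
    """Decode the audio bytes carried by a `response.audio.delta` message."""
    event = json.loads(message)
    if event.get("type") != "response.audio.delta":
        return b""  # not an audio chunk; nothing to play
    return base64.b64decode(event["delta"])
```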
| 35 | + |
| 36 | +### 2.2 Phone Interface (Azure Communication Services) |
| 37 | + |
| 38 | +- **ACS Bridge (acs_realtime.py):** |
| 39 | + - Listens for incoming phone calls via Event Grid. |
| 40 | + - Answers and streams mixed-mono audio to `/realtime` over WebSocket using the callerId as the session key. |
| 41 | + - Forwards agent audio responses back to the caller; handles barge-in (interrupt) with StopAudio signaling. |
| 42 | + |
| 43 | +### 2.3 Session Lifecycle |
| 44 | + |
| 45 | +- **Session key is persistent** for the duration of the user interaction (browser tab or phone call). |
| 46 | +- **Session ends** cleanly when the user disconnects or stops: this triggers `input_audio_buffer.clear` and session cleanup.
| 47 | + |
| 48 | +--- |
| 49 | + |
| 50 | +## 3. Agent Service Design |
| 51 | + |
| 52 | +### 3.1 Multi-Agent Orchestration (rtmt.py) |
| 53 | + |
| 54 | +- **Router Agent** |
| 55 | + - Lightweight, always-on; performs `detect_intent()` on each new utterance. |
| 56 | + - Decides which Domain Agent (e.g., Flight, Hotel) should handle the request. |
| 57 | +- **Domain Agents (Flight, Hotel, etc.)** |
| 58 | + - Each has: |
| 59 | + - A YAML persona prompt (templated with user details). |
| 60 | + - Its own SK Kernel instance with specialized tools (function-calling enabled). |
| 61 | + - Capabilities for both information lookup and transactional actions. |
| 62 | + |
| 63 | +### 3.2 In-Session Intent Switching |
| 64 | + |
| 65 | +- **Intent Detection** is invoked on every transcript.
| 66 | +- **If intent changes** mid-session: |
| 67 | + - Flushes partial audio with `input_audio_buffer.clear`. |
| 68 | + - Initializes the new Domain Agent’s kernel and preserves conversation history for a seamless transfer.
36 | 69 |
|
37 |
| ----
38 |
| -#### Navigation: [Home](../../README.md) | [Next Section](../02_setup/README.md)
| 70 | +
| 71 | + - State flags (`transfer_conversation`, `active_response`) ensure only one agent responds at a time, preventing overlap or duplication. |
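
A minimal sketch of the switching logic, assuming a simplified `Session` record and a hypothetical `handle_transcript` function. The real logic in `rtmt.py` may differ; only the flag names and the `input_audio_buffer.clear` event come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    current_agent: str
    history: List[str] = field(default_factory=list)
    transfer_conversation: bool = False
    active_response: bool = False


def handle_transcript(session: Session, transcript: str,
                      detected_intent: str) -> List[Dict[str, str]]:
    """Return the realtime events to send for this transcript (sketch only)."""
    session.history.append(transcript)
    events: List[Dict[str, str]] = []
    if detected_intent != session.current_agent:
        # Intent changed: drop buffered audio, then hand over to the new agent.
        events.append({"type": "input_audio_buffer.clear"})
        session.transfer_conversation = True
        session.current_agent = detected_intent  # history is kept as-is
    if not session.active_response:
        # Only one response may be in flight at a time.
        session.active_response = True
        session.transfer_conversation = False
        events.append({"type": "response.create"})
    return events
```

In this sketch, a second transcript that arrives while `active_response` is still set produces no new events, which is how overlap between agents is avoided.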
| 72 | + |
| 73 | +### 3.3 Tool Execution & Grounding |
| 74 | + |
| 75 | +- **Tools** are Python coroutines decorated with `@kernel_function` and exposed to LLMs via function-calling. |
| 76 | +- **Example tool functions:** |
| 77 | + - `load_user_flight_info`, `check_flight_status` (SQL SELECT) |
| 78 | + - `confirm_flight_change`, `confirm_reservation_change` (SQL UPDATE/INSERT) |
| 79 | + - `search_*_knowledgebase` (vector search for top-k matching knowledge base chunks) |
| 80 | +- **Agent responses are grounded**: Agents must only cite facts retrieved by tools; if the relevant data is missing, agents return “I don’t know.” |
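
To make the tool and grounding pattern concrete, here is a sketch of a lookup tool backed by an in-memory SQLite table. The table schema, the sample row, and the extra `conn` parameter are assumptions for illustration. In the actual service the coroutine would additionally carry the `@kernel_function` decorator, which is omitted here so the example runs without Semantic Kernel installed.

```python
import asyncio
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical `flights` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (flight_no TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO flights VALUES ('ZZ100', 'On time')")
    return conn


async def check_flight_status(conn: sqlite3.Connection, flight_no: str) -> str:
    """Look up a flight status with a parameterized SELECT."""
    row = conn.execute(
        "SELECT status FROM flights WHERE flight_no = ?", (flight_no,)
    ).fetchone()
    # Grounding rule: report only what the query returned.
    return f"Flight {flight_no}: {row[0]}" if row else "I don't know."
```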
| 81 | + |
| 82 | +### 3.4 Real-Time WebSocket Loop |
| 83 | + |
| 84 | +- **Bidirectional streaming:** |
| 85 | + - Client → backend: `input_audio_buffer.append` streams audio for Whisper transcription. |
| 86 | + - Backend → client: `response.audio.delta` streams synthesized agent speech as incremental audio chunks.
| 87 | +- **Concurrency:** |
| 88 | + - Two asyncio tasks (`from_client_to_realtime`, `from_realtime_to_client`) run concurrently so that neither direction blocks the other, keeping interactions low-latency (target: under 200 ms per streamed chunk).
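
The two-task pattern can be sketched with plain `asyncio` queues standing in for the two WebSocket connections. The `pump` and `run_session` names and the sentinel-based shutdown are assumptions for this example, not the code in `rtmt.py`.

```python
import asyncio
from typing import List, Tuple

STOP = None  # sentinel marking end of stream


async def pump(source: asyncio.Queue, sink: List[str], label: str) -> None:
    """Forward items from source to sink until the STOP sentinel arrives."""
    while (item := await source.get()) is not STOP:
        sink.append(f"{label}:{item}")


async def run_session(client_msgs: List[str],
                      model_msgs: List[str]) -> Tuple[List[str], List[str]]:
    client_q: asyncio.Queue = asyncio.Queue()
    model_q: asyncio.Queue = asyncio.Queue()
    for m in client_msgs + [STOP]:
        client_q.put_nowait(m)
    for m in model_msgs + [STOP]:
        model_q.put_nowait(m)
    to_model: List[str] = []
    to_client: List[str] = []
    # Both directions run concurrently, so neither blocks the other.
    await asyncio.gather(
        pump(client_q, to_model, "from_client_to_realtime"),
        pump(model_q, to_client, "from_realtime_to_client"),
    )
    return to_model, to_client
```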
| 89 | + |
| 90 | +### 3.5 Session & Agent Instance Management |
| 91 | + |
| 92 | +- **Session Dict Structure:** |
| 93 | + ```python |
| 94 | + self.sessions[session_state_key] = {  # keys shown; values elided
| 95 | + "current_agent": ..., "current_kernel": ..., "history": ...,
| 96 | + "target_agent_name": ..., "transfer_conversation": ...,
| 97 | + "active_response": ..., "realtime_settings": ...,
| 98 | + "customer_name": ..., "customer_id": ...,
| 99 | + }
| 100 | + ``` |
| 101 | +- **Chat history** is truncated to the last `n` turns using `ChatHistoryTruncationReducer` for prompt efficiency. |
| 102 | +- **SessionState locks** safeguard concurrent access (e.g., if multiple sockets connect for a single session), ensuring state consistency. |
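
The locking and truncation behaviour might look roughly like the sketch below. It is a plain-Python stand-in, not Semantic Kernel's `ChatHistoryTruncationReducer`; `MAX_TURNS`, `add_turn`, and the module-level dictionaries are assumptions for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Dict, List

MAX_TURNS = 3  # assumed value for `n`

session_locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
histories: Dict[str, List[str]] = defaultdict(list)


async def add_turn(session_key: str, turn: str) -> List[str]:
    """Append a turn under the session's lock and keep only the last n."""
    async with session_locks[session_key]:
        history = histories[session_key]
        history.append(turn)
        del history[:-MAX_TURNS]  # drop everything older than the last n turns
        return list(history)
```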
| 103 | + |
| 104 | +--- |
| 105 | + |
| 106 | +## 4. Business Data Layer |
| 107 | + |
| 108 | +| Store | Content & Purpose | Access Pattern | |
| 109 | +|------------------|--------------------------------------------------------|-----------------------------------------------------------| |
| 110 | +| SQL DB | Flights, reservations, payments, user records, etc. | ACID RW via SQLAlchemy, used by agent tools under SK | |
| 111 | +| Vector DB | Policy and FAQ articles, embedded as vector chunks | Top-k cosine similarity search via `search_*_knowledgebase`| |
| 112 | + |
| 113 | +- **Abstraction:** |
| 114 | + - Business data access is **black-boxed** behind agent tools. |
| 115 | + - Agents issue tool calls—never raw SQL/embedding queries—providing security, data integrity, and clear provenance. |
| 116 | + - Enables easy migration between local JSON/SQLite and cloud-scale managed services. |
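
For the vector-store access pattern, a dependency-free sketch of top-k cosine similarity search is shown below. The function names, the dictionary-based chunk store, and the toy two-dimensional vectors are assumptions. A real `search_*_knowledgebase` tool would first embed the query text with an embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_knowledgebase(query_vec: Sequence[float],
                         chunks: Dict[str, Sequence[float]],
                         k: int = 2) -> List[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda cid: cosine(query_vec, chunks[cid]),
                    reverse=True)
    return ranked[:k]
```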
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## 5. Agent Session Store |
| 121 | + |
| 122 | +- **In-memory cache** for local single-instance runs. |
| 123 | +- **Redis-backed store** for distributed deployment, supporting: |
| 124 | + - Stateless, non-sticky load-balancing. |
| 125 | + - Session persistence and resumption after pod restarts or failures. |
| 126 | +- **Stores reduced chat history** (by default, last 3 turns), so any agent instance can resume a session seamlessly. |
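
One way to keep the two backends interchangeable is a small store interface, sketched below with only the in-memory variant implemented. The `SessionStore` protocol and its method names are assumptions. A Redis-backed class could expose the same two methods and store the same JSON string.

```python
import json
from typing import Dict, List, Optional, Protocol


class SessionStore(Protocol):
    def save(self, key: str, history: List[str]) -> None: ...
    def load(self, key: str) -> Optional[List[str]]: ...


class InMemorySessionStore:
    """Single-process store for local runs."""

    def __init__(self, max_turns: int = 3) -> None:
        self._data: Dict[str, str] = {}
        self._max_turns = max_turns

    def save(self, key: str, history: List[str]) -> None:
        # Serialize to JSON so the stored value is plain text,
        # which is what a remote store would also need.
        self._data[key] = json.dumps(history[-self._max_turns:])

    def load(self, key: str) -> Optional[List[str]]:
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None
```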
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## 6. Scalability & Deployment Guidelines |
| 131 | + |
| 132 | +| Component | Scaling Approach | |
| 133 | +|-----------------|----------------------------------------------------------------------------------------------| |
| 134 | +| Front-End | Static assets served from a CDN; scales horizontally. |
| 135 | +| ACS Bridge | Stateless app; deploy multiple replicas (App Service/Container Apps). | |
| 136 | +| Agent Service | Containerize (`rtmt.py`), deploy as multiple pods with Redis for session state. | |
| 137 | +| | SK manages WebSocket/GPT-4o connections; throughput scales with Azure OpenAI deployments. | |
| 138 | +| Transactional DB| Promote from SQLite (dev) to Azure SQL/PostgreSQL; use read replicas for high-QPS. | |
| 139 | +| Vector DB | Move from JSON/SciPy (local) to Azure AI Search, Pinecone, or Qdrant for horizontal scaling. | |
| 140 | +| Model Traffic | Multiple GPT-4o/-mini deployments; requests can be load-balanced across them. |
| 141 | + |
| 142 | +- **Design for Seamless Growth:** |
| 143 | + The architecture scales from a single-VM demo to a global, multi-region deployment with high availability and auto-scaling, without changes to the core code structure.
| 144 | + |
| 145 | +--- |
| 146 | + |
| 147 | +## 7. Key Architectural Strengths |
| 148 | + |
| 149 | +- **Multi-channel:** Supports both web and phone users with a unified, context-aware agent service.
| 150 | +- **Real-time & Low Latency:** Uses streaming WebSockets and incremental audio synthesis, targeting sub-200 ms latency per streamed chunk.
| 151 | +- **Intelligent Multi-Agent Routing:** Intent detection and on-the-fly agent transfer keep interactions efficient and relevant.
| 152 | +- **Enterprise-Grade Data Security:** Agents access business data only through secure, audited tools, never through direct queries.
| 153 | +- **Robust State Management:** Redis-powered session store enables stateless scaling and resilience. |
| 154 | +- **Modular & Extensible:** New domain agents (Rail, Insurance, etc.) or business tools can be plugged in easily. |
| 155 | + |
| 156 | +--- |
| 157 | + |
| 158 | +**This architecture combines cloud-native design, real-time interaction, secure AI orchestration, and practical scaling, making it suitable for both quick prototyping and robust enterprise deployments.**