Commit b4e5d70

author
James N.
committed
update archicture
1 parent df6fff0 commit b4e5d70

190 files changed, +1008 -17516 lines changed


docs/01_architecture/README.md

Lines changed: 155 additions & 35 deletions
@@ -1,38 +1,158 @@
1-
# Overview of key Azure Services, overall Application Architecture and Codebase
2-
3-
## Introduction to Semantic Kernel, Azure Open AI, AI Foundry, and Cosmos DB
4-
5-
Semantic Kernel is a lightweight, open-source development kit that lets you easily build AI agents and integrate the latest AI models into your C#, Python, or Java codebase. It serves as an efficient middleware that enables rapid delivery of enterprise-grade solutions. Semantic Kernel combines prompts with existing APIs to perform actions, and it uses OpenAPI specifications to share extensions with other developers. It is designed to be future-proof, allowing you to swap out AI models without needing to rewrite your entire codebase. Here are some key points:
6-
- Enterprise Ready: Trusted by Microsoft and other Fortune 500 companies, Semantic Kernel is flexible, modular, and secure. It supports telemetry and other security features, ensuring responsible AI solutions at scale.
7-
- Automating Business Processes: It combines prompts with existing APIs to perform actions. By describing your existing code to AI models, Semantic Kernel translates model requests into function calls and returns the results.
8-
- Modular and Extensible: You can add your existing code as plugins, maximizing your investment by integrating AI services through out-of-the-box connectors. It uses OpenAPI specifications, allowing you to share extensions with other developers.
9-
10-
Azure OpenAI is a service provided by Microsoft that allows businesses and developers to integrate powerful AI models into their applications using the Azure cloud platform. Here are some key points:
11-
- Access to Advanced AI Models: Azure OpenAI provides access to state-of-the-art AI models, including GPT-4, which can be used for a variety of tasks such as natural language processing, translation, and more.
12-
- Scalability and Reliability: Leveraging the Azure infrastructure, the service ensures high availability, scalability, and security, making it suitable for enterprise-level applications.
13-
- Integration with Azure Services: Azure OpenAI can be seamlessly integrated with other Azure services like Azure Cognitive Services, Azure Machine Learning, and Azure Data Factory, enabling comprehensive AI solutions.
14-
- Customization and Fine-Tuning: Users can customize and fine-tune AI models to better suit their specific needs, ensuring more accurate and relevant outputs.
15-
- Compliance and Security: Azure OpenAI adheres to strict compliance standards and provides robust security features to protect data and ensure responsible AI usage.
16-
17-
Azure AI Foundry is a comprehensive platform provided by Microsoft that enables businesses and developers to create, deploy, and manage AI solutions. Here are some key points:
18-
- End-to-End AI Development: Azure AI Foundry offers tools and services for the entire AI lifecycle, from data preparation and model training to deployment and monitoring.
19-
- Integration with Azure Services: It seamlessly integrates with other Azure services like Azure Machine Learning, Azure Cognitive Services, and Azure Data Factory, providing a unified environment for AI development.
20-
- Scalability and Flexibility: The platform is designed to handle large-scale AI projects, offering robust infrastructure and flexible deployment options to meet diverse business needs.
21-
- Advanced AI Models: Users have access to cutting-edge AI models and can leverage pre-built solutions or customize models to suit specific requirements.
22-
- Security and Compliance: Azure AI Foundry adheres to stringent security standards and compliance regulations, ensuring the protection of data and responsible AI usage.
1+
# Real-Time Multi-Agent Customer Service System
2+
3+
This architecture enables seamless, real-time customer service experiences via web and telephony, orchestrating multiple specialized AI agents with robust state management and enterprise data integration.
4+
5+
---
6+
7+
## 1. High-Level Architecture Overview
8+
![Logical architecture](../../media/logical_architecture.png)
23 9

24-
Cosmos DB is a globally distributed, multi-model database service provided by Microsoft. Here are some key points:
25-
- Global Distribution: Azure Cosmos DB allows you to distribute your data across multiple regions worldwide, ensuring high availability and low latency.
26-
- Multi-Model Support: It supports various data models, including document, graph, key-value, and column-family, making it versatile for different types of applications.
27-
- Scalability: The service offers automatic scaling of throughput and storage, allowing you to handle large amounts of data and high traffic seamlessly.
28-
- Consistency Models: Azure Cosmos DB provides five consistency models (strong, bounded staleness, session, consistent prefix, and eventual), giving you control over the trade-off between consistency and performance.
29-
- Comprehensive Security: It includes features like encryption at rest, network isolation, and compliance with industry standards to ensure data security.
3010

31-
## Introduction to AI Foundry's Evaluation Framework
32-
## Integrating Azure Content Safety
33-
## Ensuring secure and responsible AI practices
34-
#### Overview of the app architecture
35-
- Walkthrough of the codebase
11+
12+
| Block | Role | Key Tech & Patterns |
13+
|----------------------|------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|
14+
| **User Experience** | Captures user speech/text and renders the agent’s reply instantly. | - React + Vite Web UI <br> - WebSocket (realtime conversational UI)<br> - Azure Communication Services (ACS) for phone calls |
15+
| **Frontend Service** | Manages unique session IDs across channels; bridges between users and agents.                                    | - Session management (UUID/callerId as session keys)<br> - HTTPS/WebSocket multiplexing |
16+
| **Agent Service** | Runs a Router Agent for intent detection, orchestrates Domain Agents (Flight, Hotel, etc.), and manages state. | - Semantic Kernel (SK)<br> - Azure OpenAI GPT-4o (realtime streaming)<br> - Multi-agent orchestration<br> - Real-time session manager |
17+
| **Business Data** | Provides durable data sources (knowledge base and transactional DB) for agent tools. | - SQL (transactions)<br> - Vector DB (searchable knowledge) |
18+
| **Session State Store** | Persists session and chat history, enabling stateless horizontal scaling and resilience. | - In-memory (local demo)<br>- Redis (distributed/production) |
19+
20+
**Session Identity:**
21+
Every request—regardless of channel—is tied to a unique session ID (UUID for web, callerId for phone), ensuring context continuity and correct agent matching.
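
The session-key rule above can be sketched as a small helper. This is only an illustration: `make_session_key` and its `channel` argument are hypothetical names, not taken from the codebase.

```python
import uuid
from typing import Optional


def make_session_key(channel: str, caller_id: Optional[str] = None) -> str:
    """Return the session key for a new conversation (hypothetical helper)."""
    if channel == "web":
        # Web conversations get a fresh random UUID at conversation start.
        return str(uuid.uuid4())
    if channel == "phone":
        if not caller_id:
            raise ValueError("phone sessions require a callerId")
        # Phone conversations reuse the callerId as the session key.
        return caller_id
    raise ValueError(f"unknown channel: {channel!r}")
```

Both channels end up with a plain string key, so the same session lookup can serve either one.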
22+
23+
---
24+
25+
## 2. User Experience & Front-end Integration
26+
27+
### 2.1 Web Interface
28+
29+
- **React + Vite SPA:**
30+
- Initiates a new session (`uuid4`) on conversation start.
31+
- Establishes a persistent WebSocket to `/realtime?session_state_key={uuid}` for streaming audio and agent responses.
32+
- Streams mic audio as base64-encoded PCM frames.
33+
- Plays `response.audio.delta` tokens immediately for low-latency voice.
34+
- Renders live transcriptions and any cited “grounding” sources from agent tools.
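
As a rough sketch of the wire format described above, the snippet below builds the WebSocket URL and wraps one PCM frame in an `input_audio_buffer.append` message. It is written in Python for consistency with the backend, although the real client is the React app. The helper names and the `wss://` host are assumptions, and the `audio` field name follows the Azure OpenAI Realtime API event shape rather than anything stated in this README.

```python
import base64
import json
from urllib.parse import quote


def realtime_url(host: str, session_state_key: str) -> str:
    """Build the /realtime WebSocket URL; the key is percent-encoded."""
    return f"wss://{host}/realtime?session_state_key={quote(session_state_key, safe='')}"


def build_append_event(pcm_frame: bytes) -> str:
    """Wrap one raw PCM frame in an `input_audio_buffer.append` message."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Binary audio is base64-encoded so it can travel inside JSON text.
        "audio": base64.b64encode(pcm_frame).decode("ascii"),
    })
```

Percent-encoding matters mostly for the phone channel, where a callerId used as the key may contain characters such as `+`.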
35+
36+
### 2.2 Phone Interface (Azure Communication Services)
37+
38+
- **ACS Bridge (acs_realtime.py):**
39+
- Listens for incoming phone calls via Event Grid.
40+
- Answers and streams mixed-mono audio to `/realtime` over WebSocket using the callerId as the session key.
41+
- Forwards agent audio responses back to the caller; handles barge-in (interrupt) with StopAudio signaling.
42+
43+
### 2.3 Session Lifecycle
44+
45+
- **Session key is persistent** for the duration of the user interaction (browser tab or phone call).
46+
- **Session ends** cleanly when the user disconnects or stops: this triggers `input_audio_buffer.clear` and session cleanup.
47+
48+
---
49+
50+
## 3. Agent Service Design
51+
52+
### 3.1 Multi-Agent Orchestration (rtmt.py)
53+
54+
- **Router Agent**
55+
- Lightweight, always-on; performs `detect_intent()` on each new utterance.
56+
- Decides which Domain Agent (e.g., Flight, Hotel) should handle the request.
57+
- **Domain Agents (Flight, Hotel, etc.)**
58+
- Each has:
59+
- A YAML persona prompt (templated with user details).
60+
- Its own SK Kernel instance with specialized tools (function-calling enabled).
61+
- Capabilities for both information lookup and transactional actions.
62+
63+
### 3.2 In-Session Intent Switching
64+
65+
- **Intent Detection** invoked on every transcript.
66+
- **If intent changes** mid-session:
67+
- Flushes partial audio with `input_audio_buffer.clear`.
68+
- Initializes the new Domain Agent’s kernel; preserves conversation history for seamless transfer.
36 69

37-
---
38-
#### Navigation: [Home](../../README.md) | [Next Section](../02_setup/README.md)
71+
- State flags (`transfer_conversation`, `active_response`) ensure only one agent responds at a time, preventing overlap or duplication.
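
The switching logic above might look roughly like the following. This is a minimal sketch, not the code in `rtmt.py`: the keyword matcher stands in for the Router Agent's model call, and `Session`, `on_transcript`, and `complete_transfer` are hypothetical names. Only the flag names and the `input_audio_buffer.clear` event come from this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Session:
    """Subset of the per-session state that matters for agent transfer."""
    current_agent: Optional[str] = None
    target_agent_name: Optional[str] = None
    transfer_conversation: bool = False
    active_response: bool = False
    history: List[Tuple[str, str]] = field(default_factory=list)


def detect_intent(utterance: str) -> Optional[str]:
    """Toy keyword matcher standing in for the Router Agent's model call."""
    text = utterance.lower()
    if "hotel" in text or "room" in text:
        return "hotel"
    if "flight" in text or "boarding" in text:
        return "flight"
    return None  # no clear intent: stay with the current agent


def on_transcript(session: Session, utterance: str) -> List[str]:
    """Record the utterance and return the realtime events to emit."""
    session.history.append(("user", utterance))
    intent = detect_intent(utterance)
    if intent is None or intent == session.current_agent:
        return []
    events: List[str] = []
    if session.current_agent is not None:
        # Mid-session switch: drop audio buffered for the previous agent.
        events.append("input_audio_buffer.clear")
    session.target_agent_name = intent
    session.transfer_conversation = True
    session.active_response = False  # the previous agent must stop responding
    return events


def complete_transfer(session: Session) -> None:
    """Activate the target agent; history is kept so context carries over."""
    if session.transfer_conversation and session.target_agent_name:
        session.current_agent = session.target_agent_name
        session.target_agent_name = None
        session.transfer_conversation = False
```

Splitting the decision (`on_transcript`) from the activation (`complete_transfer`) leaves a window in which `transfer_conversation` is set, which is one way to keep two agents from answering the same utterance.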
72+
73+
### 3.3 Tool Execution & Grounding
74+
75+
- **Tools** are Python coroutines decorated with `@kernel_function` and exposed to LLMs via function-calling.
76+
- **Example tool functions:**
77+
- `load_user_flight_info`, `check_flight_status` (SQL SELECT)
78+
- `confirm_flight_change`, `confirm_reservation_change` (SQL UPDATE/INSERT)
79+
- `search_*_knowledgebase` (vector search for top-k matching knowledge base chunks)
80+
- **Agent responses are grounded**: Agents must only cite facts retrieved by tools; if the relevant data is missing, agents return “I don’t know.”
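
A tool of the kind listed above could be sketched as follows. The table schema, the `FlightTools` class, and the reply strings are invented for illustration, and the standard-library `sqlite3` module stands in for the project's actual data access layer. The `try`/`except` fallback only exists so the sketch runs where Semantic Kernel is not installed.

```python
import asyncio
import sqlite3

try:
    from semantic_kernel.functions import kernel_function
except ImportError:
    # Fallback no-op decorator so the sketch runs without Semantic Kernel.
    def kernel_function(func=None, **_metadata):
        if func is None:
            return lambda f: f
        return func


class FlightTools:
    """Hypothetical plugin; the schema it queries is an assumption."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    @kernel_function(
        name="check_flight_status",
        description="Look up the current status of a flight by its number.",
    )
    async def check_flight_status(self, flight_number: str) -> str:
        # Parameterized query: the model supplies a value, never SQL text.
        row = self._conn.execute(
            "SELECT status FROM flights WHERE flight_number = ?",
            (flight_number,),
        ).fetchone()
        if row is None:
            # No data to ground an answer on; the agent should say so.
            return f"No record found for flight {flight_number}."
        return f"Flight {flight_number} is {row[0]}."
```

Returning an explicit "no record" string gives the agent something concrete to act on when it has no fact to cite.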
81+
82+
### 3.4 Real-Time WebSocket Loop
83+
84+
- **Bidirectional streaming:**
85+
- Client → backend: `input_audio_buffer.append` streams audio for Whisper transcription.
86+
- Backend → client: `response.audio.delta` streams synthesized agent speech token-by-token.
87+
- **Concurrency:**
88+
- Two asyncio tasks (`from_client_to_realtime`, `from_realtime_to_client`) ensure continuous, low-latency (<200ms per token) interactions.
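
The two-task pattern can be illustrated with in-process queues in place of real sockets. Everything here apart from the two task names and the two event types is an assumption: `fake_realtime` is a stand-in that echoes each audio frame back as a delta, and the `audio` and `delta` field names follow the Azure OpenAI Realtime API event shape.

```python
import asyncio
from typing import List

STOP = None  # sentinel marking the end of a stream


async def from_client_to_realtime(client_in: asyncio.Queue, realtime_in: asyncio.Queue) -> None:
    """Forward client audio frames upstream as append events."""
    while True:
        frame = await client_in.get()
        if frame is STOP:
            await realtime_in.put(STOP)
            return
        await realtime_in.put({"type": "input_audio_buffer.append", "audio": frame})


async def from_realtime_to_client(realtime_out: asyncio.Queue, client_out: List[str]) -> None:
    """Relay audio deltas downstream as soon as they arrive."""
    while True:
        event = await realtime_out.get()
        if event is STOP:
            return
        if event["type"] == "response.audio.delta":
            client_out.append(event["delta"])


async def fake_realtime(realtime_in: asyncio.Queue, realtime_out: asyncio.Queue) -> None:
    """Stand-in for the model: answers every append with one audio delta."""
    while True:
        event = await realtime_in.get()
        if event is STOP:
            await realtime_out.put(STOP)
            return
        await realtime_out.put({"type": "response.audio.delta", "delta": event["audio"]})


async def run_session(frames: List[str]) -> List[str]:
    client_in, realtime_in, realtime_out = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    client_out: List[str] = []
    for frame in frames:
        client_in.put_nowait(frame)
    client_in.put_nowait(STOP)
    # Both directions run concurrently, so neither blocks the other.
    await asyncio.gather(
        from_client_to_realtime(client_in, realtime_in),
        fake_realtime(realtime_in, realtime_out),
        from_realtime_to_client(realtime_out, client_out),
    )
    return client_out
```

Calling `asyncio.run(run_session(["frame-1", "frame-2"]))` pushes two frames through the loop and collects the echoed deltas in order.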
89+
90+
### 3.5 Session & Agent Instance Management
91+
92+
- **Session Dict Structure:**
93+
```python
94+
self.sessions[session_state_key] = {
95+
current_agent, current_kernel, history,
96+
target_agent_name, transfer_conversation,
97+
active_response, realtime_settings,
98+
customer_name, customer_id
99+
}
100+
```
101+
- **Chat history** is truncated to the last `n` turns using `ChatHistoryTruncationReducer` for prompt efficiency.
102+
- **SessionState locks** safeguard concurrent access (e.g., if multiple sockets connect for a single session), ensuring state consistency.
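
The locking and truncation ideas above can be sketched without Semantic Kernel. In this sketch a plain list slice stands in for `ChatHistoryTruncationReducer`, each message counts as one turn, and `SessionRegistry` is a hypothetical name.

```python
import asyncio
from collections import defaultdict
from typing import Dict


class SessionRegistry:
    """Per-session state guarded by one asyncio.Lock per session key."""

    def __init__(self, max_turns: int = 3) -> None:
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.sessions: Dict[str, dict] = {}
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self._max_turns = max_turns

    async def append_turn(self, session_state_key: str, role: str, content: str) -> None:
        # The lock serializes writers if several sockets share one session.
        async with self._locks[session_state_key]:
            session = self.sessions.setdefault(session_state_key, {"history": []})
            session["history"].append({"role": role, "content": content})
            # Keep only the most recent turns to bound prompt size.
            del session["history"][:-self._max_turns]
```

Because the lock is keyed by session, unrelated sessions never wait on each other.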
103+
104+
---
105+
106+
## 4. Business Data Layer
107+
108+
| Store | Content & Purpose | Access Pattern |
109+
|------------------|--------------------------------------------------------|-----------------------------------------------------------|
110+
| SQL DB | Flights, reservations, payments, user records, etc. | ACID RW via SQLAlchemy, used by agent tools under SK |
111+
| Vector DB | Policy and FAQ articles, embedded as vector chunks | Top-k cosine similarity search via `search_*_knowledgebase`|
112+
113+
- **Abstraction:**
114+
- Business data access is **black-boxed** behind agent tools.
115+
- Agents issue tool calls—never raw SQL/embedding queries—providing security, data integrity, and clear provenance.
116+
- Enables easy migration between local JSON/SQLite and cloud-scale managed services.
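
To make the "top-k cosine similarity" access pattern concrete, here is a dependency-free sketch. The chunk texts and two-dimensional vectors in the usage note are made up, and a real deployment would get its embeddings from a model and delegate the search to the vector store.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, with chunks `[("baggage policy", [1.0, 0.0]), ("pet policy", [0.0, 1.0]), ("refund policy", [1.0, 1.0])]` and the query `[1.0, 0.0]`, `top_k(..., k=2)` ranks the baggage chunk first and the refund chunk second.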
117+
118+
---
119+
120+
## 5. Agent Session Store
121+
122+
- **In-memory cache** for local single-instance runs.
123+
- **Redis-backed store** for distributed deployment, supporting:
124+
- Stateless, non-sticky load-balancing.
125+
- Session persistence and resumption after pod restarts or failures.
126+
- **Stores reduced chat history** (by default, last 3 turns), so any agent instance can resume a session seamlessly.
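
One way to keep the two backends interchangeable is a small store interface. The sketch below shows only the in-memory side. `SessionStore`, `InMemorySessionStore`, and `save_reduced` are hypothetical names, and a Redis-backed class is assumed to implement the same two methods using string GET and SET.

```python
import json
from typing import Dict, List, Optional, Protocol


class SessionStore(Protocol):
    """Interface both the in-memory and a Redis-backed store would satisfy."""

    def load(self, key: str) -> Optional[dict]: ...

    def save(self, key: str, state: dict) -> None: ...


class InMemorySessionStore:
    """Local single-instance store; values are JSON strings, as Redis would hold."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def load(self, key: str) -> Optional[dict]:
        raw = self._data.get(key)
        return json.loads(raw) if raw is not None else None

    def save(self, key: str, state: dict) -> None:
        self._data[key] = json.dumps(state)


def save_reduced(store: SessionStore, key: str, history: List[dict], max_turns: int = 3) -> None:
    """Persist only the last `max_turns` entries of the chat history."""
    store.save(key, {"history": history[-max_turns:]})
```

Serializing to JSON even in memory keeps the stored state free of live objects, so whatever one instance writes another can read back.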
127+
128+
---
129+
130+
## 6. Scalability & Deployment Guidelines
131+
132+
| Component | Scaling Approach |
133+
|-----------------|----------------------------------------------------------------------------------------------|
134+
| Front-End | Static assets on CDN; unlimited horizontal scale. |
135+
| ACS Bridge | Stateless app; deploy multiple replicas (App Service/Container Apps). |
136+
| Agent Service | Containerize (`rtmt.py`), deploy as multiple pods with Redis for session state. |
137+
| | SK manages WebSocket/GPT-4o connections; throughput scales with Azure OpenAI deployments. |
138+
| Transactional DB| Promote from SQLite (dev) to Azure SQL/PostgreSQL; use read replicas for high-QPS. |
139+
| Vector DB | Move from JSON/SciPy (local) to Azure AI Search, Pinecone, or Qdrant for horizontal scaling. |
140+
| Model Traffic | Multiple GPT-4o/-mini deployments; SK can load-balance transparently. |
141+
142+
- **Design for Seamless Growth:**
143+
The architecture scales gracefully from a single-VM demo to a global, multi-region deployment with high availability and auto-scaling, all without changes to the core code structure.
144+
145+
---
146+
147+
## 7. Key Architectural Strengths
148+
149+
- **Multi-channel:** Supports both web and phone users with a unified, context-aware agent service.
150+
- **Real-time & Low Latency:** Uses streaming WebSocket and token-by-token synthesis for sub-200 ms per-token latency.
151+
- **Intelligent Multi-Agent Routing:** Intent detection and on-the-fly agent transfer keep interactions efficient and relevant.
152+
- **Enterprise-Grade Data Security:** Agents access business data only through secure, audited tools; never direct queries.
153+
- **Robust State Management:** Redis-powered session store enables stateless scaling and resilience.
154+
- **Modular & Extensible:** New domain agents (Rail, Insurance, etc.) or business tools can be plugged in easily.
155+
156+
---
157+
158+
**This architecture combines cloud-native design, real-time interaction, secure AI orchestration, and practical scaling, making it suitable for both quick prototyping and robust enterprise deployments.**

media/logical_architecture.png

151 KB