The core idea behind this project is to move beyond static, single-turn training data for language models. Real-world conversation is a dynamic, branching tree of possibilities, where each response can lead to a different outcome. Standard instruction-response datasets don't capture this complexity.
The Conversational Analysis Engine (CAE) addresses this by simulating these conversation trees. It explores different response paths to find the best one for a defined goal. The data generated from these simulations can be used to create rich, RL-friendly datasets that show why one conversational path is better than another, which is ideal for training methods like PPO.
Our approach is to combine the generative power of LLMs with the strategic planning of classical algorithms. Specifically, this project uses a Monte Carlo Tree Search (MCTS) algorithm—similar to the one used by AlphaGo—to explore the vast space of possible conversations.
LLMs are excellent at generating contextually relevant and human-like responses. However, they don't inherently optimize for long-term conversational goals. By pairing an LLM with MCTS, the system can look ahead, simulate future turns, and choose responses that lead to better long-term outcomes, rather than just picking the most probable next response.
CAE is a dual-purpose system. It serves as both a standard, production-ready chat API and a sophisticated analysis engine. This means you can use it for:
- Standard Chat: Use it as a regular chat API for sending messages, maintaining conversation history, and managing users.
- On-Demand Analysis: At any point in a conversation, you can trigger an MCTS analysis to explore optimal response paths from that state.
This design allows for seamless integration into existing applications. You can start with basic chat functionality and selectively apply sophisticated analysis where it matters most, such as customer support escalations, therapeutic conversations, or educational tutoring.
Here's a visual overview of how all components fit together:
```mermaid
graph TB
    subgraph "Client Layer"
        Client[HTTP Client]
    end

    subgraph "API Layer"
        FastAPI[FastAPI Server]
        UserAPI["/users"]
        ChatAPI["/chats"]
        AnalysisAPI["/analysis"]
    end

    subgraph "Service Layer"
        ChatService[Chat Service]
        LLMService[LLM Service]
        AnalysisService[Analysis Service]

        subgraph "MCTS Components"
            MCTSAlgo[MCTS Algorithm]
            RespGen[Response Generator]
            ConvSim[Conversation Simulator]
            Scorer[Conversation Scorer]
            Analyzer[Conversation Analyzer]
        end
    end

    subgraph "Data Layer"
        DB[(PostgreSQL)]
        Users[Users Table]
        Chats[Chats Table]
        Messages[Messages Table]
        Analyses[Analyses Table]
    end

    subgraph "External Services"
        LLM[OpenAI API]
    end

    Client --> FastAPI
    FastAPI --> UserAPI
    FastAPI --> ChatAPI
    FastAPI --> AnalysisAPI

    UserAPI --> DB
    ChatAPI --> ChatService
    AnalysisAPI --> AnalysisService

    ChatService --> LLMService
    ChatService --> DB
    AnalysisService --> MCTSAlgo
    AnalysisService --> DB

    MCTSAlgo --> RespGen
    MCTSAlgo --> ConvSim
    MCTSAlgo --> Scorer
    MCTSAlgo --> Analyzer

    RespGen --> LLMService
    ConvSim --> LLMService
    Scorer --> LLMService
    Analyzer --> LLMService

    LLMService --> LLM

    DB --> Users
    DB --> Chats
    DB --> Messages
    DB --> Analyses

    style Client fill:#e1f5fe
    style FastAPI fill:#fff3e0
    style ChatService fill:#f3e5f5
    style AnalysisService fill:#f3e5f5
    style LLMService fill:#e8f5e9
    style MCTSAlgo fill:#fce4ec
    style DB fill:#f5f5f5
    style LLM fill:#e3f2fd
```
The system is built as an async FastAPI application with three core components:
- LLM Service Layer: Interfaces with OpenAI-compatible APIs for language generation.
- MCTS Algorithm: Explores conversation trees to find optimal response paths.
- Analysis Engine: Scores conversations based on emotional intelligence and goal achievement.
Here's how the Monte Carlo Tree Search implementation works:
1. Selection → 2. Expansion → 3. Simulation → 4. Backpropagation
Starting from root conversation nodes (initial response options), the algorithm uses the UCB1 (Upper Confidence Bound) formula to balance exploitation vs exploration:
```
ucb1 = avg_score + exploration_constant * sqrt(2 * ln(parent_visits) / child_visits)
```
This ensures we don't just greedily follow the best-known path but also explore promising unknowns.
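To make the selection step concrete, here is a minimal, illustrative sketch of UCB1-based child selection. It is not the project's exact code; the node structure and field names are assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical conversation-tree node, used only for this illustration."""
    total_score: float = 0.0
    visits: int = 0
    children: list["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, exploration_constant: float = 1.414) -> float:
    # Unvisited children get priority so every branch is tried at least once.
    if child.visits == 0:
        return float("inf")
    avg_score = child.total_score / child.visits
    return avg_score + exploration_constant * math.sqrt(
        2 * math.log(parent_visits) / child.visits
    )


def select_child(parent: Node, exploration_constant: float = 1.414) -> Node:
    # Pick the child with the highest UCB1 value: either a high average score
    # (exploitation) or few visits relative to the parent (exploration).
    return max(parent.children, key=lambda c: ucb1(c, parent.visits, exploration_constant))
```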
When we reach a node that hasn't been fully explored, we generate new response branches using the LLM. The system prompts for diverse responses that serve the conversation goal:
```python
# From response_generator.py
async def generate_expansion_response(
    self,
    messages: list[Message],
    existing_responses: list[str],
    goal: Optional[str],
    max_tokens: int,
) -> Optional[str]:
    # Generates a new response different from existing branches
    ...
```
For each new branch, the system simulates how the conversation might unfold over several turns. This is a key step, as it allows the engine to evaluate the long-term consequences of a response, rather than just its immediate quality.
```python
# The simulator generates realistic user reactions and conversation continuations
simulation_data = await self.simulator.simulate_conversation(
    extended_messages,
    config["simulation_depth"],
    config.get("goal"),
    config["max_tokens"],
)
```
Scores from simulations propagate back up the tree, updating statistics for all ancestor nodes. This allows good downstream outcomes to influence earlier decision points.
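Backpropagation itself is the simplest part of MCTS. As a rough sketch (illustrative only; it assumes each node keeps `visits`, `total_score`, and a `parent` reference):

```python
def backpropagate(node, score: float) -> None:
    # Walk from the simulated leaf back up to the root, crediting every ancestor
    # so that good downstream outcomes raise the value of earlier choices.
    current = node
    while current is not None:
        current.visits += 1
        current.total_score += score
        current = current.parent
```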
Conversations are evaluated on multiple dimensions:
General Metrics:
- Clarity: How understandable the response is
- Relevance: How well it addresses the context
- Engagement: Likelihood to maintain interest
- Authenticity: How genuine and natural it feels
- Coherence: Logical flow of conversation
- Respectfulness: Appropriate tone
Goal-Specific Metrics: When a conversation goal is specified (e.g., "help the user feel better"), additional scoring factors are dynamically generated to measure progress toward that goal.
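As a rough illustration of how per-dimension scores might be folded into the single value MCTS needs per node (this is an assumption, not the project's actual scorer), a simple weighted blend could look like:

```python
def combine_scores(
    general_metrics: dict[str, float],
    goal_metrics: dict[str, float] | None = None,
    goal_weight: float = 0.5,
) -> float:
    """Collapse per-dimension scores (each in [0, 1]) into one scalar."""
    general = sum(general_metrics.values()) / len(general_metrics)
    if not goal_metrics:
        return general
    goal = sum(goal_metrics.values()) / len(goal_metrics)
    # Blend general conversational quality with progress toward the stated goal.
    return (1 - goal_weight) * general + goal_weight * goal
```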
Before diving into the analysis capabilities, it's important to understand that CAE provides a complete chat API that can be used independently:
```
# 1. Create a user
POST /users/
{"name": "Alice"}
→ {"id": "user-uuid", "name": "Alice", "created_at": "..."}

# 2. Start a conversation
POST /chats/
{
  "user_id": "user-uuid",
  "message": "I'm feeling overwhelmed with my workload"
}
→ [
  {"role": "user", "content": "I'm feeling overwhelmed..."},
  {"role": "assistant", "content": "I understand that feeling..."}
]

# 3. Continue the conversation
POST /chats/
{
  "user_id": "user-uuid",
  "chat_id": "chat-uuid",
  "message": "I don't know where to start"
}
→ [full conversation history with new messages]
```
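The same flow from Python, as a minimal sketch using `httpx` (the client library and local base URL are assumptions; any HTTP client works):

```python
import asyncio

import httpx

BASE_URL = "http://localhost:8000"  # assumed local deployment


async def main() -> None:
    async with httpx.AsyncClient(base_url=BASE_URL) as client:
        # 1. Create a user
        user = (await client.post("/users/", json={"name": "Alice"})).json()

        # 2. Start a conversation; the response is the full message history
        history = (
            await client.post(
                "/chats/",
                json={
                    "user_id": user["id"],
                    "message": "I'm feeling overwhelmed with my workload",
                },
            )
        ).json()
        print(history[-1]["content"])  # the assistant's reply


asyncio.run(main())
```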
The chat service handles:
- Session Management: Automatic session creation and persistence
- Message History: Full conversation tracking with timestamps
- User Context: Each user has their own isolated conversation space
- Async Processing: Non-blocking I/O for high throughput
- Error Recovery: Graceful handling of LLM failures with retries
The API follows RESTful principles with three main endpoint groups:
- `POST /users/` - Create a new user
- `GET /users/` - List all users
- `GET /users/{user_id}` - Get specific user
- `DELETE /users/{user_id}` - Delete user and all associated data
- `GET /users/{user_id}/chats` - Get user's chat sessions

- `POST /chats/` - Send a message (creates session if needed)
- `GET /chats/{chat_id}` - Get chat history
- `DELETE /chats/{chat_id}` - Delete chat session

- `POST /analysis/` - Analyze a conversation using MCTS
- `GET /analysis/{chat_id}` - Get all analyses for a chat
The analysis endpoint is where the magic happens. Here's what a typical request looks like:
```json
{
  "chat_id": "uuid-here",
  "conversation_goal": "Help the user process their emotions constructively",
  "num_branches": 5,
  "simulation_depth": 3,
  "mcts_iterations": 10,
  "exploration_constant": 1.414
}
```
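Triggering an analysis from code is a single POST. A hedged sketch (the endpoint path is as documented above; the `httpx` client and generous timeout are assumptions):

```python
import httpx

analysis = httpx.post(
    "http://localhost:8000/analysis/",
    json={
        "chat_id": "uuid-here",
        "conversation_goal": "Help the user process their emotions constructively",
        "num_branches": 5,
        "simulation_depth": 3,
        "mcts_iterations": 10,
        "exploration_constant": 1.414,
    },
    timeout=120.0,  # MCTS analyses can take tens of seconds
).json()

print(analysis["selected_response"])
```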
The PostgreSQL database uses async SQLAlchemy with careful relationship modeling:
- Users → has many → Chats
- Chats → has many → Messages
- Chats → has many → Analyses
All relationships use CASCADE deletes for data integrity.
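A minimal sketch of how those relationships might be declared with SQLAlchemy 2.0-style models (illustrative; the project's actual model names and columns may differ):

```python
import uuid

from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    name: Mapped[str]
    # Deleting a user removes their chats (and, through chats, messages and analyses).
    chats: Mapped[list["Chat"]] = relationship(back_populates="user", cascade="all, delete-orphan")


class Chat(Base):
    __tablename__ = "chats"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("users.id", ondelete="CASCADE"))
    user: Mapped["User"] = relationship(back_populates="chats")
    messages: Mapped[list["Message"]] = relationship(cascade="all, delete-orphan")
    analyses: Mapped[list["Analysis"]] = relationship(cascade="all, delete-orphan")


class Message(Base):
    __tablename__ = "messages"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    chat_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("chats.id", ondelete="CASCADE"))
    role: Mapped[str]
    content: Mapped[str]


class Analysis(Base):
    __tablename__ = "analyses"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    chat_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("chats.id", ondelete="CASCADE"))
    # Analysis payload columns (goal, branches, statistics, ...) omitted for brevity.
```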
One of the key optimizations is aggressive parallelization. During MCTS iterations, all branch evaluations happen concurrently:
```python
# From algorithm.py
tasks = [
    self._expand_and_simulate(base_messages, node, config)
    for _, node in nodes_to_process
]
results = await asyncio.gather(*tasks)
```
This dramatically speeds up the exploration process, especially when dealing with high-latency LLM calls.
The combination of chat + analysis opens up powerful applications:
```python
# Regular chat handling
response = await chat_api.send_message(
    "I've been waiting 3 days for my refund!"
)

# Trigger analysis when conversation gets heated
if detect_escalation(conversation):
    analysis = await analyze_conversation(
        chat_id=chat_id,
        conversation_goal="De-escalate and retain customer",
        num_branches=7,
        simulation_depth=4,
    )
    # Use the optimal response path
```
- Educational tutoring: Analyze different teaching approaches to find what resonates best with each student's learning style.
- Therapeutic conversations: Explore conversation paths that lead to positive emotional outcomes while maintaining therapeutic boundaries.
- Sales and support: Optimize for both customer satisfaction and conversion, finding the balance between being helpful and achieving business goals.
While not fully utilized in the current implementation, CAE includes a sophisticated tool-calling system inherited from the LLM service layer:
```python
class AbstractTool(ABC, BaseModel):
    """Base class for LLM tools"""

    tool_schema: ClassVar[ToolSchema]

    @classmethod
    @abstractmethod
    def tool_function(cls) -> Callable:
        """Return the tool's implementation function"""
        pass
```
This infrastructure allows for:
- Dynamic Tool Registration: Tools are automatically discovered via class inheritance
- Schema Validation: OpenAI-compatible function schemas
- Async Execution: All tools run asynchronously for performance
- Error Handling: Graceful degradation when tools fail
Future implementations could add tools for:
- Knowledge base queries during conversation simulation
- Sentiment analysis for better scoring
- External API calls for real-time information
- Database lookups for personalization
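For instance, a sentiment-analysis tool might be described with an OpenAI-style function schema along these lines. This is purely illustrative: how CAE's `ToolSchema` wraps such a schema, and the tool name and implementation, are assumptions.

```python
# Hypothetical tool definition in the OpenAI function-calling format.
sentiment_tool_schema = {
    "type": "function",
    "function": {
        "name": "analyze_sentiment",
        "description": "Estimate the emotional tone of a piece of user text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "The user message to analyze.",
                }
            },
            "required": ["text"],
        },
    },
}


async def analyze_sentiment(text: str) -> dict:
    """Hypothetical implementation: return a coarse sentiment label and score."""
    # A real tool would call a dedicated classifier or another LLM here.
    return {"label": "negative" if "!" in text else "neutral", "score": 0.5}
```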
The system is designed for production scale:
- Concurrent Users: Async architecture supports hundreds of simultaneous conversations
- Analysis Speed: MCTS analysis typically completes in 10-30 seconds for standard configurations
- Database Pooling: Connection pooling with 20 connections, 10 overflow
- Memory: ~100-500MB per active MCTS analysis depending on tree size
- CPU: Primarily bound by LLM API latency, not local computation
- Storage: Conversation history grows linearly; analyses are JSON-compressed
- Parallel Branch Evaluation: All MCTS branches evaluate simultaneously
- Progressive Pruning: Underperforming branches are pruned every 5 iterations
- Caching: (Future) LLM responses could be cached for similar prompts
- Batching: (Future) Multiple analyses could share LLM calls
Here's what you get from a conversation analysis:
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "chat_id": "chat-uuid",
  "conversation_goal": "Help user feel heard and provide constructive feedback",
  "selected_branch_index": 2,
  "selected_response": "I hear your frustration, and it's completely valid...",
  "analysis": "This response was selected because it achieves the highest balance of empathy (0.92) and constructive guidance (0.88). Unlike branch 1 which was overly sympathetic without offering solutions, or branch 3 which jumped too quickly to advice-giving, this response first validates the user's emotions before gently transitioning to actionable steps. The simulation showed users responding positively in 4 out of 5 scenarios, with improved emotional state and openness to suggestions.",
  "branches": [
    {
      "response": "That sounds really tough...",
      "score": 0.72,
      "visits": 8,
      "simulated_user_reactions": ["Thanks for listening", "But what do I do?"],
      "general_metrics": {
        "empathy": 0.95,
        "constructiveness": 0.45,
        "clarity": 0.80
      }
    },
    // ... more branches
  ],
  "mcts_statistics": {
    "total_iterations": 10,
    "nodes_evaluated": 45,
    "average_depth_explored": 3.2,
    "computation_time_seconds": 18.7
  }
}
```
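Because every branch carries its response, score, and visit count, turning an analysis into RL-friendly training data is largely a matter of flattening. A hedged sketch (field names taken from the example above; the pairing scheme is one possible choice, not the project's prescribed format):

```python
def branches_to_preference_pairs(analysis: dict) -> list[dict]:
    """Pair the selected branch against each alternative as (chosen, rejected) records."""
    branches = analysis["branches"]
    selected_index = analysis["selected_branch_index"]
    chosen = branches[selected_index]

    pairs = []
    for i, branch in enumerate(branches):
        if i == selected_index:
            continue
        pairs.append(
            {
                "chosen": chosen["response"],
                "rejected": branch["response"],
                "score_margin": chosen["score"] - branch["score"],
            }
        )
    return pairs
```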
- Embedding Support: The pgvector integration is commented out, meaning we can't do semantic search over conversation histories. This would enable finding similar past conversations to inform current analysis.
- Tool System: While the infrastructure for tool-calling is present, it's not fully utilized. Future iterations could allow the assistant to query external knowledge bases or perform actions during simulation.
- Pruning Strategy: The current branch pruning is somewhat naive (threshold-based). More sophisticated approaches like progressive widening or RAVE (Rapid Action Value Estimation) could improve efficiency.
- Scoring Calibration: The scoring metrics are currently evaluated by the same LLM doing generation. This could lead to self-serving biases. An independent evaluation model would be more robust.
- Memory Efficiency: Large conversation trees can consume significant memory. Implementing node recycling or disk-based storage for deep trees would improve scalability.
- Multi-Agent Simulation: Instead of simulating just user responses, model multiple conversation participants
- Adversarial Training: Use MCTS to find conversation paths that break the model, improving robustness
- Online Learning: Update scoring functions based on real user feedback
- Distributed MCTS: Spread tree exploration across multiple workers for massive parallelization
- Python 3.10+ (for local development)
- PostgreSQL 14+ (or use Docker)
- Docker and Docker Compose (for containerized deployment)
- An OpenAI-compatible LLM API endpoint
We provide convenient scripts to get you started quickly:
- Docker Quick Start: `./scripts/docker-start.sh` - Interactive setup with Docker
- Local Development: `./scripts/start.sh` - Sets up and runs locally with auto-configuration
The easiest way to get started is using Docker Compose:
1. Clone the repository:

   ```bash
   git clone https://github.com/MVPandey/CAE.git
   cd CAE
   ```

2. Copy the example environment file and configure it:

   ```bash
   cp env.example .env
   # Edit .env with your API keys and configuration
   ```

3. Start the application with Docker Compose:

   ```bash
   docker-compose up -d
   ```

The API will be available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

To stop the application:

```bash
docker-compose down
```

To view logs:

```bash
docker-compose logs -f cae
```
For local development without Docker:
1. Clone the repository:

   ```bash
   git clone https://github.com/MVPandey/CAE.git
   cd CAE
   ```

2. Run the start script:

   ```bash
   ./scripts/start.sh
   ```

   This script will:
   - Check Python version (3.10+ required)
   - Create a virtual environment if needed
   - Install all dependencies
   - Check for a `.env` file (create from template if missing)
   - Verify PostgreSQL connection
   - Start the FastAPI server with hot-reloading

3. If you don't have a `.env` file, the script will create one. Update it with your configuration:

   ```bash
   cp env.example .env
   # Edit .env with your actual values
   ```
If you prefer manual setup:
1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt --no-deps
   ```

3. Set up your environment variables:

   ```bash
   cp env.example .env
   # Edit .env with your configuration
   ```

4. Run the application:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
   ```
For debugging with VS Code:
- Open the project in VS Code
- Go to the Run and Debug view (Ctrl+Shift+D)
- Select "Debug FastAPI" from the launch configuration dropdown
- Press F5 to start the server with debugging enabled
To build just the Docker image:

```bash
docker build -t cae:latest .
```

To run the container manually:

```bash
docker run -d \
  --name cae \
  -p 8000:8000 \
  --env-file .env \
  cae:latest
```
For production deployment:
1. Use the production Docker setup:

   ```bash
   # Build with production Dockerfile
   docker build -f Dockerfile.prod -t cae:prod .

   # Or use docker-compose for production
   docker-compose -f docker-compose.prod.yml up -d
   ```

2. Production configuration:
   - Use strong passwords for the database
   - Set `LOG_LEVEL=WARNING` to reduce log verbosity
   - Configure proper API rate limiting
   - Use environment-specific `.env` files

3. With Nginx (recommended):

   ```bash
   # Start with nginx profile
   docker-compose -f docker-compose.prod.yml --profile with-nginx up -d
   ```

4. Security considerations:
   - Always use HTTPS in production (SSL/TLS termination)
   - Implement API authentication if needed
   - Regular security updates for base images
   - Use secrets management for API keys
   - Enable database SSL connections

5. Monitoring:
   - Health endpoint: http://your-domain/health
   - Application logs in the `./logs` directory
   - Configure log aggregation (ELK, CloudWatch, etc.)
   - Set up alerts for errors and high latency
6. Scaling:
   - Horizontal scaling with multiple app containers
   - Use connection pooling for the database
   - Consider a caching layer for LLM responses
   - Load balancer for multiple instances
| Variable | Description | Default |
|---|---|---|
| `LLM_API_KEY` | OpenAI API key | Required |
| `LLM_API_BASE_URL` | LLM API base URL | `https://api.openai.com/v1` |
| `LLM_MODEL_NAME` | LLM model to use | `gpt-4` |
| `EMBEDDING_MODEL_API_KEY` | Embedding API key | Required |
| `EMBEDDING_MODEL_BASE_URL` | Embedding API base URL | `https://api.openai.com/v1` |
| `EMBEDDING_MODEL_NAME` | Embedding model | `text-embedding-3-large` |
| `DB_HOST` | PostgreSQL host | `localhost` |
| `DB_PORT` | PostgreSQL port | `5432` |
| `DB_NAME` | Database name | `conversation_analysis` |
| `DB_USER` | Database user | Required |
| `DB_SECRET` | Database password | Required |
| `LOG_LEVEL` | Logging level | `INFO` |
| `LLM_TIMEOUT_SECONDS` | LLM request timeout | `600` |
On startup, the application will automatically create the database tables if they do not already exist.
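With async SQLAlchemy, that startup step typically looks something like the following sketch (the engine URL, import path, and metadata object are assumptions, not the project's exact code):

```python
from sqlalchemy.ext.asyncio import create_async_engine

from app.db.models import Base  # hypothetical import path for the declarative Base

engine = create_async_engine(
    "postgresql+asyncpg://user:secret@localhost:5432/conversation_analysis"
)


async def init_db() -> None:
    # create_all skips tables that already exist, so this is safe on every startup.
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
```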
The Conversational Analysis Engine is a practical tool for both production chat applications and advanced conversational research. It provides the infrastructure to go beyond surface-level responses and analyze the long-term impact of conversational choices.
The primary contribution of this project is the integration of MCTS with a standard chat API, allowing for on-demand, deep analysis of conversation states. This enables the creation of sophisticated, goal-oriented datasets suitable for reinforcement learning. Future work will focus on improving the efficiency of the MCTS algorithm, expanding the tool system, and integrating more advanced scoring models.