The core idea behind this project is to move beyond static, single-turn training data for language models. Real-world conversation is a dynamic, branching tree of possibilities, where each response can lead to a different outcome. Standard instruction-response datasets don't capture this complexity.
The Conversational Analysis Engine (CAE) addresses this by simulating these conversation trees. It explores different response paths to find the best one for a defined goal. The data generated from these simulations can be used to create rich, RL-friendly datasets that show why one conversational path is better than another, which is ideal for training methods like PPO.
Our approach is to combine the generative power of LLMs with the strategic planning of classical algorithms. Specifically, this project uses a Monte Carlo Tree Search (MCTS) algorithm—similar to the one used by AlphaGo—to explore the vast space of possible conversations.
LLMs are excellent at generating contextually relevant and human-like responses. However, they don't inherently optimize for long-term conversational goals. By pairing an LLM with MCTS, the system can look ahead, simulate future turns, and choose responses that lead to better long-term outcomes, rather than just picking the most probable next response.
CAE is a dual-purpose system. It serves as both a standard, production-ready chat API and a sophisticated analysis engine. This means you can use it for:
- Standard Chat: Use it as a regular chat API for sending messages, maintaining conversation history, and managing users.
- On-Demand Analysis: At any point in a conversation, you can trigger an MCTS analysis to explore optimal response paths from that state.
This design allows for seamless integration into existing applications. You can start with basic chat functionality and selectively apply sophisticated analysis where it matters most, such as customer support escalations, therapeutic conversations, or educational tutoring.
Here's a visual overview of how all components fit together:
```mermaid
graph TB
    subgraph "Client Layer"
        Client[HTTP Client]
    end

    subgraph "API Layer"
        FastAPI[FastAPI Server]
        UserAPI["/users"]
        ChatAPI["/chats"]
        AnalysisAPI["/analysis"]
    end

    subgraph "Service Layer"
        ChatService[Chat Service]
        LLMService[LLM Service]
        AnalysisService[Analysis Service]

        subgraph "MCTS Components"
            MCTSAlgo[MCTS Algorithm]
            RespGen[Response Generator]
            ConvSim[Conversation Simulator]
            Scorer[Conversation Scorer]
            Analyzer[Conversation Analyzer]
        end
    end

    subgraph "Data Layer"
        DB[(PostgreSQL)]
        Users[Users Table]
        Chats[Chats Table]
        Messages[Messages Table]
        Analyses[Analyses Table]
    end

    subgraph "External Services"
        LLM[OpenAI API]
    end

    Client --> FastAPI
    FastAPI --> UserAPI
    FastAPI --> ChatAPI
    FastAPI --> AnalysisAPI

    UserAPI --> DB
    ChatAPI --> ChatService
    AnalysisAPI --> AnalysisService

    ChatService --> LLMService
    ChatService --> DB
    AnalysisService --> MCTSAlgo
    AnalysisService --> DB

    MCTSAlgo --> RespGen
    MCTSAlgo --> ConvSim
    MCTSAlgo --> Scorer
    MCTSAlgo --> Analyzer

    RespGen --> LLMService
    ConvSim --> LLMService
    Scorer --> LLMService
    Analyzer --> LLMService

    LLMService --> LLM

    DB --> Users
    DB --> Chats
    DB --> Messages
    DB --> Analyses

    style Client fill:#e1f5fe
    style FastAPI fill:#fff3e0
    style ChatService fill:#f3e5f5
    style AnalysisService fill:#f3e5f5
    style LLMService fill:#e8f5e9
    style MCTSAlgo fill:#fce4ec
    style DB fill:#f5f5f5
    style LLM fill:#e3f2fd
```
The system is built as an async FastAPI application with three core components:
- LLM Service Layer: Interfaces with OpenAI-compatible APIs for language generation.
- MCTS Algorithm: Explores conversation trees to find optimal response paths.
- Analysis Engine: Scores conversations based on emotional intelligence and goal achievement.
Here's how the Monte Carlo Tree Search implementation works:
1. Selection → 2. Expansion → 3. Simulation → 4. Backpropagation
Starting from root conversation nodes (initial response options), the algorithm uses the UCB1 (Upper Confidence Bound) formula to balance exploitation vs exploration:
```
ucb1 = avg_score + exploration_constant * sqrt(2 * ln(parent_visits) / child_visits)
```
This ensures we don't just greedily follow the best-known path but also explore promising unknowns.
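To make the selection step concrete, here is a minimal, illustrative sketch of UCB1-based child selection. It is not the project's exact code; the node structure and field names are assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical conversation-tree node, used only for this illustration."""
    total_score: float = 0.0
    visits: int = 0
    children: list["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, exploration_constant: float = 1.414) -> float:
    # Unvisited children get priority so every branch is tried at least once.
    if child.visits == 0:
        return float("inf")
    avg_score = child.total_score / child.visits
    return avg_score + exploration_constant * math.sqrt(
        2 * math.log(parent_visits) / child.visits
    )


def select_child(parent: Node, exploration_constant: float = 1.414) -> Node:
    # Pick the child with the highest UCB1 value: either a high average score
    # (exploitation) or few visits relative to the parent (exploration).
    return max(parent.children, key=lambda c: ucb1(c, parent.visits, exploration_constant))
```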
When we reach a node that hasn't been fully explored, we generate new response branches using the LLM. The system prompts for diverse responses that serve the conversation goal:
```python
# From response_generator.py
async def generate_expansion_response(
    self,
    messages: list[Message],
    existing_responses: list[str],
    goal: Optional[str],
    max_tokens: int,
) -> Optional[str]:
    # Generates a new response different from existing branches
    ...
```
For each new branch, the system simulates how the conversation might unfold over several turns. This is a key step, as it allows the engine to evaluate the long-term consequences of a response, rather than just its immediate quality.
```python
# The simulator generates realistic user reactions and conversation continuations
simulation_data = await self.simulator.simulate_conversation(
    extended_messages,
    config["simulation_depth"],
    config.get("goal"),
    config["max_tokens"],
)
```
Scores from simulations propagate back up the tree, updating statistics for all ancestor nodes. This allows good downstream outcomes to influence earlier decision points.
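Backpropagation itself is the simplest part of MCTS. As a rough sketch (illustrative only; it assumes each node keeps `visits`, `total_score`, and a `parent` reference):

```python
def backpropagate(node, score: float) -> None:
    # Walk from the simulated leaf back up to the root, crediting every ancestor
    # so that good downstream outcomes raise the value of earlier choices.
    current = node
    while current is not None:
        current.visits += 1
        current.total_score += score
        current = current.parent
```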
Conversations are evaluated on multiple dimensions:
General Metrics:
- Clarity: How understandable the response is
- Relevance: How well it addresses the context
- Engagement: Likelihood to maintain interest
- Authenticity: How genuine and natural it feels
- Coherence: Logical flow of conversation
- Respectfulness: Appropriate tone
Goal-Specific Metrics: When a conversation goal is specified (e.g., "help the user feel better"), additional scoring factors are dynamically generated to measure progress toward that goal.
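As a rough illustration of how per-dimension scores might be folded into the single value MCTS needs per node (this is an assumption, not the project's actual scorer), a simple weighted blend could look like:

```python
def combine_scores(
    general_metrics: dict[str, float],
    goal_metrics: dict[str, float] | None = None,
    goal_weight: float = 0.5,
) -> float:
    """Collapse per-dimension scores (each in [0, 1]) into one scalar."""
    general = sum(general_metrics.values()) / len(general_metrics)
    if not goal_metrics:
        return general
    goal = sum(goal_metrics.values()) / len(goal_metrics)
    # Blend general conversational quality with progress toward the stated goal.
    return (1 - goal_weight) * general + goal_weight * goal
```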
Before diving into the analysis capabilities, it's important to understand that CAE provides a complete chat API that can be used independently:
```
# 1. Create a user
POST /users/
{"name": "Alice"}
→ {"id": "user-uuid", "name": "Alice", "created_at": "..."}

# 2. Start a conversation
POST /chats/
{
  "user_id": "user-uuid",
  "message": "I'm feeling overwhelmed with my workload"
}
→ [
  {"role": "user", "content": "I'm feeling overwhelmed..."},
  {"role": "assistant", "content": "I understand that feeling..."}
]

# 3. Continue the conversation
POST /chats/
{
  "user_id": "user-uuid",
  "chat_id": "chat-uuid",
  "message": "I don't know where to start"
}
→ [full conversation history with new messages]
```
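The same flow from Python, as a minimal sketch using `httpx` (the client library and local base URL are assumptions; any HTTP client works):

```python
import asyncio

import httpx

BASE_URL = "http://localhost:8000"  # assumed local deployment


async def main() -> None:
    async with httpx.AsyncClient(base_url=BASE_URL) as client:
        # 1. Create a user
        user = (await client.post("/users/", json={"name": "Alice"})).json()

        # 2. Start a conversation; the response is the full message history
        history = (
            await client.post(
                "/chats/",
                json={
                    "user_id": user["id"],
                    "message": "I'm feeling overwhelmed with my workload",
                },
            )
        ).json()
        print(history[-1]["content"])  # the assistant's reply


asyncio.run(main())
```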
The chat service handles:
- Session Management: Automatic session creation and persistence
- Message History: Full conversation tracking with timestamps
- User Context: Each user has their own isolated conversation space
- Async Processing: Non-blocking I/O for high throughput
- Error Recovery: Graceful handling of LLM failures with retries
The API follows RESTful principles with three main endpoint groups:
- `POST /users/` - Create a new user
- `GET /users/` - List all users
- `GET /users/{user_id}` - Get specific user
- `DELETE /users/{user_id}` - Delete user and all associated data
- `GET /users/{user_id}/chats` - Get user's chat sessions

- `POST /chats/` - Send a message (creates session if needed)
- `GET /chats/{chat_id}` - Get chat history
- `DELETE /chats/{chat_id}` - Delete chat session

- `POST /analysis/` - Analyze a conversation using MCTS
- `GET /analysis/{chat_id}` - Get all analyses for a chat
The analysis endpoint is where the magic happens. Here's what a typical request looks like:
```json
{
  "chat_id": "uuid-here",
  "conversation_goal": "Help the user process their emotions constructively",
  "num_branches": 5,
  "simulation_depth": 3,
  "mcts_iterations": 10,
  "exploration_constant": 1.414
}
```
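Triggering an analysis from code is a single POST. A hedged sketch (the endpoint path is as documented above; the `httpx` client and generous timeout are assumptions):

```python
import httpx

analysis = httpx.post(
    "http://localhost:8000/analysis/",
    json={
        "chat_id": "uuid-here",
        "conversation_goal": "Help the user process their emotions constructively",
        "num_branches": 5,
        "simulation_depth": 3,
        "mcts_iterations": 10,
        "exploration_constant": 1.414,
    },
    timeout=120.0,  # MCTS analyses can take tens of seconds
).json()

print(analysis["selected_response"])
```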
The PostgreSQL database uses async SQLAlchemy with careful relationship modeling:
- Users → has many → Chats
- Chats → has many → Messages
- Chats → has many → Analyses
All relationships use CASCADE deletes for data integrity.
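A minimal sketch of how those relationships might be declared with SQLAlchemy 2.0-style models (illustrative; the project's actual model names and columns may differ):

```python
import uuid

from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    name: Mapped[str]
    # Deleting a user removes their chats (and, through chats, messages and analyses).
    chats: Mapped[list["Chat"]] = relationship(back_populates="user", cascade="all, delete-orphan")


class Chat(Base):
    __tablename__ = "chats"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("users.id", ondelete="CASCADE"))
    user: Mapped["User"] = relationship(back_populates="chats")
    messages: Mapped[list["Message"]] = relationship(cascade="all, delete-orphan")
    analyses: Mapped[list["Analysis"]] = relationship(cascade="all, delete-orphan")


class Message(Base):
    __tablename__ = "messages"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    chat_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("chats.id", ondelete="CASCADE"))
    role: Mapped[str]
    content: Mapped[str]


class Analysis(Base):
    __tablename__ = "analyses"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    chat_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("chats.id", ondelete="CASCADE"))
    # Analysis payload columns (goal, branches, statistics, ...) omitted for brevity.
```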
One of the key optimizations is aggressive parallelization. During MCTS iterations, all branch evaluations happen concurrently:
```python
# From algorithm.py
tasks = [
    self._expand_and_simulate(base_messages, node, config)
    for _, node in nodes_to_process
]
results = await asyncio.gather(*tasks)
```
This dramatically speeds up the exploration process, especially when dealing with high-latency LLM calls.
The combination of chat + analysis opens up powerful applications:
```python
# Regular chat handling
response = await chat_api.send_message(
    "I've been waiting 3 days for my refund!"
)

# Trigger analysis when conversation gets heated
if detect_escalation(conversation):
    analysis = await analyze_conversation(
        chat_id=chat_id,
        conversation_goal="De-escalate and retain customer",
        num_branches=7,
        simulation_depth=4,
    )
    # Use the optimal response path
```
- Educational tutoring: Analyze different teaching approaches to find what resonates best with each student's learning style.
- Therapeutic conversations: Explore conversation paths that lead to positive emotional outcomes while maintaining therapeutic boundaries.
- Sales and support: Optimize for both customer satisfaction and conversion, finding the balance between being helpful and achieving business goals.
While not fully utilized in the current implementation, CAE includes a sophisticated tool-calling system inherited from the LLM service layer:
```python
class AbstractTool(ABC, BaseModel):
    """Base class for LLM tools"""

    tool_schema: ClassVar[ToolSchema]

    @classmethod
    @abstractmethod
    def tool_function(cls) -> Callable:
        """Return the tool's implementation function"""
        pass
```
This infrastructure allows for:
- Dynamic Tool Registration: Tools are automatically discovered via class inheritance
- Schema Validation: OpenAI-compatible function schemas
- Async Execution: All tools run asynchronously for performance
- Error Handling: Graceful degradation when tools fail
Future implementations could add tools for:
- Knowledge base queries during conversation simulation
- Sentiment analysis for better scoring
- External API calls for real-time information
- Database lookups for personalization
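For instance, a sentiment-analysis tool might be described with an OpenAI-style function schema along these lines. This is purely illustrative: how CAE's `ToolSchema` wraps such a schema, and the tool name and implementation, are assumptions.

```python
# Hypothetical tool definition in the OpenAI function-calling format.
sentiment_tool_schema = {
    "type": "function",
    "function": {
        "name": "analyze_sentiment",
        "description": "Estimate the emotional tone of a piece of user text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "The user message to analyze.",
                }
            },
            "required": ["text"],
        },
    },
}


async def analyze_sentiment(text: str) -> dict:
    """Hypothetical implementation: return a coarse sentiment label and score."""
    # A real tool would call a dedicated classifier or another LLM here.
    return {"label": "negative" if "!" in text else "neutral", "score": 0.5}
```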
The system is designed for production scale:
- Concurrent Users: Async architecture supports hundreds of simultaneous conversations
- Analysis Speed: MCTS analysis typically completes in 10-30 seconds for standard configurations
- Database Pooling: Connection pooling with 20 connections, 10 overflow
- Memory: ~100-500MB per active MCTS analysis depending on tree size
- CPU: Primarily bound by LLM API latency, not local computation
- Storage: Conversation history grows linearly; analyses are JSON-compressed
- Parallel Branch Evaluation: All MCTS branches evaluate simultaneously
- Progressive Pruning: Underperforming branches are pruned every 5 iterations
- Caching: (Future) LLM responses could be cached for similar prompts
- Batching: (Future) Multiple analyses could share LLM calls
Here's what you get from a conversation analysis:
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "chat_id": "chat-uuid",
  "conversation_goal": "Help user feel heard and provide constructive feedback",
  "selected_branch_index": 2,
  "selected_response": "I hear your frustration, and it's completely valid...",
  "analysis": "This response was selected because it achieves the highest balance of empathy (0.92) and constructive guidance (0.88). Unlike branch 1 which was overly sympathetic without offering solutions, or branch 3 which jumped too quickly to advice-giving, this response first validates the user's emotions before gently transitioning to actionable steps. The simulation showed users responding positively in 4 out of 5 scenarios, with improved emotional state and openness to suggestions.",
  "branches": [
    {
      "response": "That sounds really tough...",
      "score": 0.72,
      "visits": 8,
      "simulated_user_reactions": ["Thanks for listening", "But what do I do?"],
      "general_metrics": {
        "empathy": 0.95,
        "constructiveness": 0.45,
        "clarity": 0.80
      }
    },
    // ... more branches
  ],
  "mcts_statistics": {
    "total_iterations": 10,
    "nodes_evaluated": 45,
    "average_depth_explored": 3.2,
    "computation_time_seconds": 18.7
  }
}
```
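Because every branch carries its response, score, and visit count, turning an analysis into RL-friendly training data is largely a matter of flattening. A hedged sketch (field names taken from the example above; the pairing scheme is one possible choice, not the project's prescribed format):

```python
def branches_to_preference_pairs(analysis: dict) -> list[dict]:
    """Pair the selected branch against each alternative as (chosen, rejected) records."""
    branches = analysis["branches"]
    selected_index = analysis["selected_branch_index"]
    chosen = branches[selected_index]

    pairs = []
    for i, branch in enumerate(branches):
        if i == selected_index:
            continue
        pairs.append(
            {
                "chosen": chosen["response"],
                "rejected": branch["response"],
                "score_margin": chosen["score"] - branch["score"],
            }
        )
    return pairs
```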
- Embedding Support: The pgvector integration is commented out, meaning we can't do semantic search over conversation histories. This would enable finding similar past conversations to inform current analysis.
- Tool System: While the infrastructure for tool-calling is present, it's not fully utilized. Future iterations could allow the assistant to query external knowledge bases or perform actions during simulation.
- Pruning Strategy: The current branch pruning is somewhat naive (threshold-based). More sophisticated approaches like progressive widening or RAVE (Rapid Action Value Estimation) could improve efficiency.
- Scoring Calibration: The scoring metrics are currently evaluated by the same LLM doing generation. This could lead to self-serving biases. An independent evaluation model would be more robust.
- Memory Efficiency: Large conversation trees can consume significant memory. Implementing node recycling or disk-based storage for deep trees would improve scalability.
- Multi-Agent Simulation: Instead of simulating just user responses, model multiple conversation participants
- Adversarial Training: Use MCTS to find conversation paths that break the model, improving robustness
- Online Learning: Update scoring functions based on real user feedback
- Distributed MCTS: Spread tree exploration across multiple workers for massive parallelization
- Python 3.10+ (for local development)
- PostgreSQL 14+ (or use Docker)
- Docker and Docker Compose (for containerized deployment)
- An OpenAI-compatible LLM API endpoint
We provide convenient scripts to get you started quickly:
- Docker Quick Start: `./scripts/docker-start.sh` - Interactive setup with Docker
- Local Development: `./scripts/start.sh` - Sets up and runs locally with auto-configuration
The easiest way to get started is using Docker Compose:
1. Clone the repository:

   ```bash
   git clone https://github.com/MVPandey/CAE.git
   cd CAE
   ```

2. Copy the example environment file and configure it:

   ```bash
   cp env.example .env
   # Edit .env with your API keys and configuration
   ```

3. Start the application with Docker Compose:

   ```bash
   docker-compose up -d
   ```

The API will be available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.

To stop the application:

```bash
docker-compose down
```

To view logs:

```bash
docker-compose logs -f cae
```
For local development without Docker:
1. Clone the repository:

   ```bash
   git clone https://github.com/MVPandey/CAE.git
   cd CAE
   ```

2. Run the start script:

   ```bash
   ./scripts/start.sh
   ```

   This script will:
   - Check Python version (3.10+ required)
   - Create a virtual environment if needed
   - Install all dependencies
   - Check for a `.env` file (create from template if missing)
   - Verify PostgreSQL connection
   - Start the FastAPI server with hot-reloading

3. If you don't have a `.env` file, the script will create one. Update it with your configuration:

   ```bash
   cp env.example .env
   # Edit .env with your actual values
   ```
If you prefer manual setup:
1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt --no-deps
   ```

3. Set up your environment variables:

   ```bash
   cp env.example .env
   # Edit .env with your configuration
   ```

4. Run the application:

   ```bash
   uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
   ```
For debugging with VS Code:
- Open the project in VS Code
- Go to the Run and Debug view (Ctrl+Shift+D)
- Select "Debug FastAPI" from the launch configuration dropdown
- Press F5 to start the server with debugging enabled
To build just the Docker image:

```bash
docker build -t cae:latest .
```

To run the container manually:

```bash
docker run -d \
  --name cae \
  -p 8000:8000 \
  --env-file .env \
  cae:latest
```
For production deployment:
1. Use the production Docker setup:

   ```bash
   # Build with production Dockerfile
   docker build -f Dockerfile.prod -t cae:prod .

   # Or use docker-compose for production
   docker-compose -f docker-compose.prod.yml up -d
   ```

2. Production configuration:
   - Use strong passwords for the database
   - Set `LOG_LEVEL=WARNING` to reduce log verbosity
   - Configure proper API rate limiting
   - Use environment-specific `.env` files

3. With Nginx (recommended):

   ```bash
   # Start with nginx profile
   docker-compose -f docker-compose.prod.yml --profile with-nginx up -d
   ```

4. Security considerations:
   - Always use HTTPS in production (SSL/TLS termination)
   - Implement API authentication if needed
   - Regular security updates for base images
   - Use secrets management for API keys
   - Enable database SSL connections

5. Monitoring:
   - Health endpoint: http://your-domain/health
   - Application logs in the `./logs` directory
   - Configure log aggregation (ELK, CloudWatch, etc.)
   - Set up alerts for errors and high latency
6. Scaling:
   - Horizontal scaling with multiple app containers
   - Use connection pooling for the database
   - Consider a caching layer for LLM responses
   - Load balancer for multiple instances
| Variable | Description | Default |
|---|---|---|
| `LLM_API_KEY` | OpenAI API key | Required |
| `LLM_API_BASE_URL` | LLM API base URL | `https://api.openai.com/v1` |
| `LLM_MODEL_NAME` | LLM model to use | `gpt-4` |
| `EMBEDDING_MODEL_API_KEY` | Embedding API key | Required |
| `EMBEDDING_MODEL_BASE_URL` | Embedding API base URL | `https://api.openai.com/v1` |
| `EMBEDDING_MODEL_NAME` | Embedding model | `text-embedding-3-large` |
| `DB_HOST` | PostgreSQL host | `localhost` |
| `DB_PORT` | PostgreSQL port | `5432` |
| `DB_NAME` | Database name | `conversation_analysis` |
| `DB_USER` | Database user | Required |
| `DB_SECRET` | Database password | Required |
| `LOG_LEVEL` | Logging level | `INFO` |
| `LLM_TIMEOUT_SECONDS` | LLM request timeout | `600` |
On startup, the application will automatically create the database tables if they do not already exist.
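With async SQLAlchemy, that startup step typically looks something like the following sketch (the engine URL, import path, and metadata object are assumptions, not the project's exact code):

```python
from sqlalchemy.ext.asyncio import create_async_engine

from app.db.models import Base  # hypothetical import path for the declarative Base

engine = create_async_engine(
    "postgresql+asyncpg://user:secret@localhost:5432/conversation_analysis"
)


async def init_db() -> None:
    # create_all skips tables that already exist, so this is safe on every startup.
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)
```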
The Conversational Analysis Engine is a practical tool for both production chat applications and advanced conversational research. It provides the infrastructure to go beyond surface-level responses and analyze the long-term impact of conversational choices.
The primary contribution of this project is the integration of MCTS with a standard chat API, allowing for on-demand, deep analysis of conversation states. This enables the creation of sophisticated, goal-oriented datasets suitable for reinforcement learning. Future work will focus on improving the efficiency of the MCTS algorithm, expanding the tool system, and integrating more advanced scoring models.