A modular, production-ready backend framework for building AI chat applications powered by large language models (LLMs), using FastAPI and dependency injection. The framework supports multiple expert types and is designed for building scalable chatbot applications.
- 🧠 Multiple Expert Types: Support for different chat modes:
- QNA: Basic question-answering chatbot
- RAG: Retrieval-Augmented Generation with document processing
- DEEPRESEARCH: Advanced research assistant with web search capabilities
- 🔄 Flexible LLM Support: Multiple LLM providers (OpenAI, Azure OpenAI, LlamaCpp, Vertex AI)
- 🎯 Streaming Responses: Real-time streaming chat using Server-Sent Events (SSE)
- 📚 Document Processing: Built-in PDF processing and vector storage for RAG
- 🗄️ Multiple Memory Backends: In-memory, MongoDB, and SQL storage options
- 🔄 Conversation History: Persistent conversation management with user sessions
- 🧹 Memory Operations: Clear, retrieve, and manage conversation history
- 🚀 FastAPI Integration: Modern, async API with automatic OpenAPI documentation
- 🔌 Dependency Injection: Clean component management using the Injector pattern
- ⚡ Async Support: Full asynchronous operation for high-performance applications
- 🧪 Extensible Tool System: Easy integration of custom tools (web search, etc.)
- 📝 Comprehensive Logging: Detailed logging with configurable levels
- 🔒 Error Handling: Robust error management with custom middleware
- 🧪 Testing Infrastructure: Complete test suite with async support
- 🐳 Docker Support: Containerized deployment ready
- 🖥️ Multiple CLI Modes: Interactive command-line interfaces for different bot types
- 🎨 Colorized Output: Beautiful terminal interface with colored responses
- 📊 Streaming CLI: Real-time token-by-token response streaming
- Python 3.12+
- MongoDB (optional, for persistent memory)
- uv - Fast Python package installer and resolver
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd backend
  ```
- Install dependencies using uv:

  ```bash
  # Create and activate virtual environment
  uv venv
  source .venv/bin/activate   # On Unix/macOS
  # or: .venv\Scripts\activate   # On Windows

  # Install dependencies
  uv pip install -r requirements.txt
  ```
- Set up environment variables:

  ```bash
  # Core Configuration
  MODEL_TYPE=OPENAI  # Options: OPENAI, LLAMA, AZUREOPENAI, VERTEX

  # OpenAI Configuration
  OPENAI_API_KEY=your_openai_key
  BASE_MODEL_NAME=gpt-3.5-turbo

  # Azure OpenAI Configuration (if using Azure)
  AZURE_CHAT_MODEL_KEY=your_azure_key
  AZURE_CHAT_MODEL_VERSION=2024-02-15-preview
  AZURE_CHAT_MODEL_ENDPOINT=your_endpoint
  AZURE_CHAT_MODEL_DEPLOYMENT=your_deployment

  # MongoDB Configuration (optional)
  MONGO_URI=mongodb://localhost:27017/chatbot
  MONGO_DATABASE=langchain_bot
  MONGO_COLLECTION=chatbot

  # Vector Database Configuration
  VECTOR_DATABASE_TYPE=CHROMA
  VECTOR_DATABASE_CHROMA_PATH=./chroma_db

  # Embedding Configuration (for RAG)
  EMBEDDING_TYPE=AZUREOPENAI
  AZURE_EMBEDDING_MODEL_KEY=your_azure_key
  AZURE_EMBEDDING_MODEL_ENDPOINT=your_endpoint
  AZURE_EMBEDDING_MODEL_DEPLOYMENT=your_deployment
  AZURE_EMBEDDING_MODEL_VERSION=2024-02-15-preview

  # Tavily Search (for DEEPRESEARCH)
  TAVILY_API_KEY=your_tavily_key

  # Server Configuration
  PORT=8080
  HOST=0.0.0.0
  LOG_LEVEL=INFO
  ```
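These variables are read at startup by the configuration layer (`src/common/config.py`). As a rough sketch only (the real loading code and names may differ), consuming a few of these keys looks like this:

```python
import os

# Illustrative only: the real configuration logic lives in
# src/common/config.py and may use a different mechanism.
MODEL_TYPE = os.getenv("MODEL_TYPE", "OPENAI")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")   # required when MODEL_TYPE=OPENAI
MONGO_URI = os.getenv("MONGO_URI")             # optional persistent memory
PORT = int(os.getenv("PORT", "8080"))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")

if MODEL_TYPE == "OPENAI" and not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY must be set when MODEL_TYPE=OPENAI")
```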
Start the FastAPI server:
```bash
python app.py
```

The API will be available at http://localhost:8080, with automatic documentation at http://localhost:8080/docs.
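To verify the server is up, hit the health endpoint (listed with the other endpoints below). A minimal check with `httpx`, assuming the endpoint returns a JSON body:

```python
import httpx

# Assumes the server started by `python app.py` is listening on port 8080.
resp = httpx.get("http://localhost:8080/api/v1/health")
resp.raise_for_status()
print(resp.json())
```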
The framework provides a unified CLI with three specialized bot modes:
QNA Bot - Basic question-answering:
```bash
# Basic usage
python cli.py --mode qna

# With custom model and streaming
python cli.py --mode qna --model azureopenai --stream
```
RAG Bot - Document-aware chat:
```bash
# Basic RAG chat
python cli.py --mode rag

# Process a document and chat about it
python cli.py --mode rag --document path/to/document.pdf --stream

# With custom model
python cli.py --mode rag --model openai --conversation-id doc_session
```
DeepResearch Bot - Advanced research assistant:
```bash
# Research mode with web search capabilities
python cli.py --mode deepresearch

# With streaming and custom model
python cli.py --mode deepresearch --model azureopenai --stream
```
All CLI modes support:
- `exit` or `quit`: Exit the chat
- `clear`: Clear conversation history
- `Ctrl+C`: Force exit
- `--stream`: Enable real-time token streaming
- `--model`: Choose LLM provider (openai, llama, azureopenai)
- `--conversation-id`: Set custom session ID
RAG-specific features:
- `--document`: Process and index documents (PDF, TXT, etc.)
- Document question-answering based on uploaded content
DeepResearch-specific features:
- Web search integration for current information
- Advanced reasoning and research capabilities
The REST API provides the following endpoints:
- `POST /api/v1/chat/message` - Send a message and get a response
- `POST /api/v1/chat/stream` - Stream chat responses (SSE)
- `DELETE /api/v1/chat/clear` - Clear conversation history

- `GET /api/v1/experts/available` - List available expert types
- `POST /api/v1/experts/switch` - Switch between expert types

- `POST /api/v1/rag/upload` - Upload and process documents
- `GET /api/v1/rag/documents` - List processed documents
- `DELETE /api/v1/rag/documents/{doc_id}` - Delete a document

- `GET /api/v1/health` - Health check endpoint
- `GET /` - API documentation redirect
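As an example of calling these endpoints from Python, the sketch below sends a message and then consumes the SSE stream. The payload field names (`message`, `conversation_id`) are assumptions; check the generated docs at `/docs` for the actual request schemas.

```python
import httpx

BASE = "http://localhost:8080/api/v1"
# NOTE: field names below are assumed; consult /docs for the real schema.
payload = {"message": "What is RAG?", "conversation_id": "demo"}

# Single-shot message
resp = httpx.post(f"{BASE}/chat/message", json=payload)
resp.raise_for_status()
print(resp.json())

# Streaming via Server-Sent Events: read the response line by line
with httpx.stream("POST", f"{BASE}/chat/stream", json=payload, timeout=None) as r:
    for line in r.iter_lines():
        if line.startswith("data:"):
            print(line[len("data:"):].strip(), flush=True)
```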
```
backend/
├── api/                       # FastAPI application layer
│   ├── app.py                 # FastAPI app factory and configuration
│   ├── middleware/            # Custom middleware (error handling)
│   └── v1/                    # API v1 endpoints
│       ├── chat.py            # Chat endpoints
│       ├── experts.py         # Expert management
│       ├── rag.py             # RAG operations
│       ├── health.py          # Health checks
│       └── models.py          # API request/response models
├── src/                       # Core application logic
│   ├── base/                  # Base components and factories
│   │   ├── brains/            # LLM brain implementations
│   │   ├── components/        # Core components (LLMs, embeddings, etc.)
│   │   └── memories/          # Memory backend implementations
│   ├── experts/               # Expert implementations
│   │   ├── qna/               # Basic Q&A expert
│   │   ├── rag_bot/           # RAG-powered expert
│   │   └── deepresearch_bot/  # Research expert with web search
│   ├── common/                # Shared utilities
│   │   ├── config.py          # Configuration management
│   │   ├── logging.py         # Logging setup
│   │   ├── exceptions.py      # Custom exceptions
│   │   └── schemas.py         # Data models
│   ├── chat_engine.py         # Main chat engine
│   └── config_injector.py     # Dependency injection setup
├── clis/                      # Command line interfaces
│   ├── cli.py                 # QNA bot CLI
│   ├── rag_cli.py             # RAG bot CLI
│   └── deepresearch_cli.py    # DeepResearch bot CLI
├── tests/                     # Test suite
├── docs/                      # Documentation
├── app.py                     # Main application entry point
├── requirements.txt           # Python dependencies
├── pyproject.toml             # Project configuration
├── Dockerfile                 # Docker configuration
└── Makefile                   # Build and deployment scripts
```
The framework follows Clean Architecture principles with clear separation of concerns:
- Chat Engine: Orchestrates conversations between users and experts
- Experts: Specialized AI assistants (QNA, RAG, DeepResearch); a sketch of their shared interface follows this list
- Brains: LLM abstraction layer supporting multiple providers
- Memory: Conversation storage with pluggable backends
- Tools: External capabilities (web search, document processing)
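To make the separation concrete, here is a hypothetical sketch of a shared expert interface. The actual base class under `src/experts/` may use different names and signatures:

```python
from abc import ABC, abstractmethod
from typing import AsyncIterator

class Expert(ABC):
    """Hypothetical common interface for the QNA, RAG, and DeepResearch experts."""

    @abstractmethod
    async def answer(self, message: str, conversation_id: str) -> str:
        """Return a complete response for a single user message."""

    @abstractmethod
    def stream(self, message: str, conversation_id: str) -> AsyncIterator[str]:
        """Yield response tokens as they are generated (backs SSE and CLI streaming)."""
```

With an interface like this, the chat engine can route a request to whichever expert is active without knowing its internals.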
The framework uses the Injector pattern for clean dependency management (see the sketch after this list):
- Configuration-driven component initialization
- Easy testing with mock dependencies
- Modular and extensible architecture
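For orientation, this is roughly how the `injector` library wires components together. The module and class names below are illustrative, not the framework's actual bindings (see `src/config_injector.py`):

```python
from injector import Injector, Module, provider, singleton

class EchoBrain:
    """Stand-in for an LLM brain; real implementations live in src/base/brains/."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class AppModule(Module):
    @singleton
    @provider
    def provide_brain(self) -> EchoBrain:
        # In the framework, configuration (e.g. MODEL_TYPE) would decide
        # which concrete brain implementation to construct here.
        return EchoBrain()

injector = Injector([AppModule()])
brain = injector.get(EchoBrain)   # same instance on every call (singleton)
print(brain.complete("hello"))
```

In tests, a module that binds mock implementations can be passed instead, which is what makes the components easy to swap.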
Run the test suite and static checks with:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=src

# Run integration tests
pytest -m integration

# Run linting
flake8 .

# Type checking
mypy src/
```
Build and deployment tasks are wrapped in the Makefile:

```bash
# Clean up build files
make clean

# Build Docker image
make docker-build

# Run Docker container
make docker-run

# Docker Compose (if available)
docker-compose up
```
The application uses environment variables for configuration. Key settings:
- `MODEL_TYPE`: LLM provider (OPENAI, AZUREOPENAI, LLAMA, VERTEX)
- `OPENAI_API_KEY` or equivalent provider keys
- `MONGO_URI`: MongoDB connection for persistent memory
- `VECTOR_DATABASE_TYPE`: Vector database type (CHROMA, INMEMORY)
- `TAVILY_API_KEY`: Web search API key for DeepResearch
- `LOG_LEVEL`: Logging verbosity (DEBUG, INFO, WARNING, ERROR)
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes following the existing code style
- Add tests for new functionality
- Ensure all tests pass (`pytest`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Follow Clean Architecture principles
- Write comprehensive tests
- Use type hints throughout
- Follow existing code style and patterns
- Update documentation for new features
This project is licensed under the MIT License - see the LICENSE file for details.
📚 Detailed documentation is available in the docs/ folder:
- API Documentation - Complete API reference and examples
- Folder Structure - Project organization and architecture
- Expert Switching - Guide to switching between bot types