Ambiguous user queries pose a significant challenge to Retrieval-Augmented Generation (RAG) systems, often resulting in unclear or irrelevant responses. ClariGen is an end-to-end Large Language Model (LLM)-driven pipeline for interactive ambiguity resolution in Question-Answering (QA) systems.
The system employs a dual-model "fast-slow" architecture: a lightweight instruction-tuned model performs low-latency binary ambiguity detection, while a larger model conducts in-depth ambiguity reasoning using a modified CLAMBER taxonomy and generates targeted clarifying questions. By resolving ambiguity before retrieval, ClariGen ensures that RAG systems retrieve grounded, relevant information.
ClariGen leverages a tiered LLM approach to balance computational efficiency with reasoning depth:
- Fast Path (Detection): A lightweight 8B model (Llama-3.1-8B) acts as a high-throughput gatekeeper. It performs binary ambiguity detection to filter out clear queries in milliseconds, minimizing latency for the majority of traffic.
- Reasoning Path (Clarification): Queries flagged as ambiguous are escalated to a 70B model (Llama-3.3-70B). This model handles complex ambiguity diagnosis, generates clarifying questions via Chain-of-Thought (CoT) prompting, and reformulates queries based on user feedback.
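The fast-slow routing above can be sketched as follows. This is a minimal illustration of the control flow only: the function names, the toy detection heuristic, and the clarification template are all placeholders, not ClariGen's actual interfaces or prompts.

```python
# Illustrative sketch of fast-slow routing; the real ClariGen code,
# prompts, and model calls differ -- only the control flow is shown.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoutedQuery:
    query: str
    ambiguous: bool
    clarifying_question: Optional[str] = None

def detect_ambiguity(query: str) -> bool:
    """Fast path: stand-in for a binary yes/no call to the 8B model."""
    # A real implementation would send a zero-shot prompt to Llama-3.1-8B.
    return "?" not in query and len(query.split()) < 4  # toy heuristic

def generate_clarification(query: str) -> str:
    """Reasoning path: stand-in for a CoT call to the 70B model."""
    return f"Could you specify what aspect of '{query}' you mean?"

def route(query: str) -> RoutedQuery:
    if detect_ambiguity(query):  # lightweight gatekeeper on every query
        return RoutedQuery(query, True, generate_clarification(query))
    return RoutedQuery(query, False)  # clear query goes straight to retrieval
```

In the real system the clear-query branch proceeds directly to retrieval, so the 70B model is only paid for on the (typically small) fraction of ambiguous traffic.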
See assets/clarification-module-demo.mp4 for a demo of ClariGen in action.
The system has been evaluated on standard benchmarks and real-world case studies:
- Ambiguity Detection: Zero-shot prompting on Llama-3.1-8B proved highly effective, achieving an F1 score of 73.14% on the ClariQ benchmark. Experiments showed that zero-shot approaches outperformed few-shot prompting, which tended to introduce classification bias.
- Clarification Generation: The Ambiguity-Type Chain-of-Thought (AT-CoT) strategy achieved the highest semantic similarity to human-generated questions, with a BERTScore F1 of 0.9187 on ClariQ and 0.9093 on QULAC.
- Case Studies: In a domain-specific evaluation on fall prevention inquiries for older adults, ClariGen successfully resolved lexical and semantic ambiguities (e.g., clarifying "exercises" vs "home exercises"), leading to significantly more targeted and practical retrieval results compared to baseline RAG responses.
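A zero-shot detection prompt in the style evaluated above might look like the following. The exact wording ClariGen uses is not reproduced here; the prompt text and the yes/no parsing convention are assumptions for illustration.

```python
# Hypothetical zero-shot prompt for binary ambiguity detection; the exact
# wording and output convention used by ClariGen are assumptions.
def build_detection_prompt(query: str) -> str:
    """Compose a single-turn classification prompt for the 8B model."""
    return (
        "You are a query ambiguity classifier. Answer with a single word: "
        "'yes' if the query is ambiguous, 'no' if it is clear.\n"
        f"Query: {query}\n"
        "Ambiguous:"
    )

def parse_detection_output(text: str) -> bool:
    """Map the model's free-text answer to a binary decision."""
    return text.strip().lower().startswith("yes")
```

Keeping the output space to a single token is one reason the fast path can stay in the low-latency regime.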
Get ClariGen up and running in three steps:
```
# Clone and enter the repo
cp .env.example .env
# Edit .env and set your model URLs and API keys
```

ClariGen requires two model servers (ports 8368 and 8369 by default).
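A minimal `.env` could look like the fragment below. The variable names here are illustrative guesses; only the default ports come from this README, so consult `ENVIRONMENT.md` for the actual keys.

```shell
# Hypothetical .env fragment -- variable names are assumptions; the real
# keys are documented in ENVIRONMENT.md. Ports 8368/8369 are the defaults.
FAST_MODEL_URL=http://localhost:8368/v1       # Llama-3.1-8B detection server
REASONING_MODEL_URL=http://localhost:8369/v1  # Llama-3.3-70B clarification server
FAST_MODEL_API_KEY=changeme
REASONING_MODEL_API_KEY=changeme
```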
```
cd llm_hosting
./serve_models.sh  # Requires vLLM and GPUs
```

See LLM Hosting for remote setup or alternative hosting.
Run the backend and frontend in separate terminals:
```
# Terminal A: Start API
uvicorn apps.api.main:app --port 8370

# Terminal B: Start UI
streamlit run apps/frontend/app.py --server.port 8501
```

- Core Library: The heart of ClariGen (Detection & Orchestration).
- API Backend: FastAPI service providing programmatic access.
- Frontend: Streamlit-based interactive playground.
- Model Hosting: Scripts for serving Llama-3 models via vLLM.
- Deployment: Docker and SSH tunnel configurations for production-like setups.
- Evaluation: Tools to measure system performance on benchmark datasets.
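With the API backend running, a query can be submitted programmatically. The sketch below uses only the port from the quick-start; the `/query` endpoint path and JSON schema are assumptions, so check `apps/api/main.py` for the real routes.

```python
# Hypothetical client for the ClariGen API backend; the /query endpoint
# and request/response schema are assumptions, not taken from this README.
import json
import urllib.request

def build_request(query: str,
                  base_url: str = "http://localhost:8370") -> urllib.request.Request:
    """Construct a POST request carrying the user query as JSON."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_clarigen(query: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return json.load(resp)
```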
Detailed environment settings can be found in ENVIRONMENT.md.
Key Settings:
- `CLARIFICATION_STRATEGY`: `at_standard` (default), `at_cot` (advanced), or `vanilla`.
- `MAX_CLARIFICATION_ATTEMPTS`: Limits the number of clarification iterations.
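Reading these settings could be sketched as below. This is not ClariGen's actual configuration code; the fallback value of 3 attempts is an assumption, and only the variable names and the three strategy values come from this README.

```python
# Illustrative settings loader; ClariGen's real configuration code may
# differ. The default of 3 attempts is an assumption.
import os

VALID_STRATEGIES = {"at_standard", "at_cot", "vanilla"}

def load_settings(env=os.environ) -> dict:
    strategy = env.get("CLARIFICATION_STRATEGY", "at_standard")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unknown CLARIFICATION_STRATEGY: {strategy!r}")
    return {
        "strategy": strategy,
        # Cap on how many clarification rounds a user can be asked.
        "max_attempts": int(env.get("MAX_CLARIFICATION_ATTEMPTS", "3")),
    }
```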
Copyright © 2026, Wireless System Research Group (WiSeR), McMaster University. Licensed under CC BY-NC 4.0.