Ambiguous user queries pose a significant challenge to Retrieval-Augmented Generation (RAG) systems, often resulting in unclear or irrelevant responses. ClariGen is an end-to-end Large Language Model (LLM)-driven pipeline for interactive ambiguity resolution in Question-Answering (QA) systems.
The system employs a dual-model "fast-slow" architecture: a lightweight instruction-tuned model performs low-latency binary ambiguity detection, while a larger model conducts in-depth ambiguity reasoning using a modified CLAMBER taxonomy and generates targeted clarifying questions. By resolving ambiguity before retrieval, ClariGen ensures that RAG systems retrieve grounded, relevant information.
ClariGen leverages a tiered LLM approach to balance computational efficiency with reasoning depth:
- Fast Path (Detection): A lightweight 8B model (Llama-3.1-8B) acts as a high-throughput gatekeeper. It performs binary ambiguity detection to filter out clear queries in milliseconds, minimizing latency for the majority of traffic.
- Reasoning Path (Clarification): Queries flagged as ambiguous are escalated to a 70B model (Llama-3.3-70B). This model handles complex ambiguity diagnosis, generates clarifying questions via Chain-of-Thought (CoT) prompting, and reformulates queries based on user feedback.
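The fast-slow routing above can be sketched as follows. This is a minimal illustration of the control flow only: the function names, the toy detection heuristic, and the clarification template are all placeholders, not ClariGen's actual interfaces or prompts.

```python
# Illustrative sketch of fast-slow routing; the real ClariGen code,
# prompts, and model calls differ -- only the control flow is shown.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RoutedQuery:
    query: str
    ambiguous: bool
    clarifying_question: Optional[str] = None

def detect_ambiguity(query: str) -> bool:
    """Fast path: stand-in for a binary yes/no call to the 8B model."""
    # A real implementation would send a zero-shot prompt to Llama-3.1-8B.
    return "?" not in query and len(query.split()) < 4  # toy heuristic

def generate_clarification(query: str) -> str:
    """Reasoning path: stand-in for a CoT call to the 70B model."""
    return f"Could you specify what aspect of '{query}' you mean?"

def route(query: str) -> RoutedQuery:
    if detect_ambiguity(query):  # lightweight gatekeeper on every query
        return RoutedQuery(query, True, generate_clarification(query))
    return RoutedQuery(query, False)  # clear query goes straight to retrieval
```

In the real system the clear-query branch proceeds directly to retrieval, so the 70B model is only paid for on the (typically small) fraction of ambiguous traffic.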
See assets/clarification-module-demo.mp4 for a demo of ClariGen in action.
The system has been evaluated on standard benchmarks and real-world case studies:
- Ambiguity Detection: Zero-shot prompting on Llama-3.1-8B proved highly effective, achieving an F1 score of 73.14% on the ClariQ benchmark. Experiments showed that zero-shot approaches outperformed few-shot prompting, which tended to introduce classification bias.
- Clarification Generation: The Ambiguity-Type Chain-of-Thought (AT-CoT) strategy achieved the highest semantic similarity to human-generated questions, with a BERTScore F1 of 0.9187 on ClariQ and 0.9093 on QULAC.
- Case Studies: In a domain-specific evaluation on fall prevention inquiries for older adults, ClariGen successfully resolved lexical and semantic ambiguities (e.g., clarifying "exercises" vs "home exercises"), leading to significantly more targeted and practical retrieval results compared to baseline RAG responses.
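A zero-shot detection prompt in the style evaluated above might look like the following. The exact wording ClariGen uses is not reproduced here; the prompt text and the yes/no parsing convention are assumptions for illustration.

```python
# Hypothetical zero-shot prompt for binary ambiguity detection; the exact
# wording and output convention used by ClariGen are assumptions.
def build_detection_prompt(query: str) -> str:
    """Compose a single-turn classification prompt for the 8B model."""
    return (
        "You are a query ambiguity classifier. Answer with a single word: "
        "'yes' if the query is ambiguous, 'no' if it is clear.\n"
        f"Query: {query}\n"
        "Ambiguous:"
    )

def parse_detection_output(text: str) -> bool:
    """Map the model's free-text answer to a binary decision."""
    return text.strip().lower().startswith("yes")
```

Keeping the output space to a single token is one reason the fast path can stay in the low-latency regime.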
Get ClariGen up and running in three steps:
```
# Clone and enter the repo
cp .env.example .env
# Edit .env and set your model URLs and API keys
```

ClariGen requires two model servers (ports 8368 and 8369 by default).
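A minimal `.env` could look like the fragment below. The variable names here are illustrative guesses; only the default ports come from this README, so consult `ENVIRONMENT.md` for the actual keys.

```shell
# Hypothetical .env fragment -- variable names are assumptions; the real
# keys are documented in ENVIRONMENT.md. Ports 8368/8369 are the defaults.
FAST_MODEL_URL=http://localhost:8368/v1       # Llama-3.1-8B detection server
REASONING_MODEL_URL=http://localhost:8369/v1  # Llama-3.3-70B clarification server
FAST_MODEL_API_KEY=changeme
REASONING_MODEL_API_KEY=changeme
```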
```
cd llm_hosting
./serve_models.sh  # Requires vLLM and GPUs
```

See LLM Hosting for remote setup or alternative hosting.
Run the backend and frontend in separate terminals:
```
# Terminal A: Start API
uvicorn apps.api.main:app --port 8370

# Terminal B: Start UI
streamlit run apps/frontend/app.py --server.port 8501
```

- Core Library: The heart of ClariGen (Detection & Orchestration).
- API Backend: FastAPI service providing programmatic access.
- Frontend: Streamlit-based interactive playground.
- Model Hosting: Scripts for serving Llama-3 models via vLLM.
- Deployment: Docker and SSH tunnel configurations for production-like setups.
- Evaluation: Tools to measure system performance on benchmark datasets.
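With the API backend running, a query can be submitted programmatically. The sketch below uses only the port from the quick-start; the `/query` endpoint path and JSON schema are assumptions, so check `apps/api/main.py` for the real routes.

```python
# Hypothetical client for the ClariGen API backend; the /query endpoint
# and request/response schema are assumptions, not taken from this README.
import json
import urllib.request

def build_request(query: str,
                  base_url: str = "http://localhost:8370") -> urllib.request.Request:
    """Construct a POST request carrying the user query as JSON."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",  # assumed endpoint path
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_clarigen(query: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(query)) as resp:
        return json.load(resp)
```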
Detailed environment settings can be found in ENVIRONMENT.md.
Key Settings:
- `CLARIFICATION_STRATEGY`: `at_standard` (default), `at_cot` (advanced), or `vanilla`.
- `MAX_CLARIFICATION_ATTEMPTS`: Limits the number of clarification iterations.
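Reading these settings could be sketched as below. This is not ClariGen's actual configuration code; the fallback value of 3 attempts is an assumption, and only the variable names and the three strategy values come from this README.

```python
# Illustrative settings loader; ClariGen's real configuration code may
# differ. The default of 3 attempts is an assumption.
import os

VALID_STRATEGIES = {"at_standard", "at_cot", "vanilla"}

def load_settings(env=os.environ) -> dict:
    strategy = env.get("CLARIFICATION_STRATEGY", "at_standard")
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unknown CLARIFICATION_STRATEGY: {strategy!r}")
    return {
        "strategy": strategy,
        # Cap on how many clarification rounds a user can be asked.
        "max_attempts": int(env.get("MAX_CLARIFICATION_ATTEMPTS", "3")),
    }
```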
Copyright © 2026, Wireless System Research Group (WiSeR), McMaster University. Licensed under CC BY-NC 4.0.