📄 DocChat — Threaded RAG Document Assistant

A production-grade, multi-threaded Retrieval-Augmented Generation (RAG) application that lets you chat with your PDF and text documents using LLaMA 3.3 70B.

🎯 What Is This?

DocChat is a Retrieval-Augmented Generation (RAG) application that goes beyond the typical "upload a PDF and ask questions" pattern. It introduces concurrent multi-threaded document processing, Maximal Marginal Relevance (MMR) retrieval, and a polished dark-themed UI — built to demonstrate real-world engineering practices, not just a tutorial-level prototype.

Upload up to 5 documents → Documents are processed in parallel threads → Ask questions grounded strictly in your documents → Get accurate, hallucination-resistant answers.

🚀 What Makes This Different?

Most PDF chat applications follow a simple, sequential pipeline: load one file, chunk it, embed it, query it. DocChat is engineered differently.

⚡ 1. Multi-Threaded Document Processing

Traditional Approach          DocChat Approach
─────────────────────         ─────────────────────
File 1 → Load → Chunk        File 1 ─┐
File 2 → Load → Chunk        File 2 ─┤→ Parallel Threads
File 3 → Load → Chunk        File 3 ─┤   (concurrent I/O)
     (sequential)             File 4 ─┤
     Total: T1 + T2 + T3     File 5 ─┘
                              Total: max(T1..T5)

Each uploaded document is processed in its own dedicated thread (threading.Thread), eliminating the sequential bottleneck.
A shared threading.Lock ensures thread-safe aggregation of document chunks into a single unified list — no race conditions, no data corruption.
Daemon threads are used so background workers don't keep the process alive when the application shuts down.
Named threads (loader-{filename}) enable clean debugging and monitoring.

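import threading

# uploaded_files and _process_single_file are defined elsewhere in the app.
all_chunks = []            # shared list that every worker adds its chunks to
errors = []                # per-file errors reported by the workers (assumed shape)
lock = threading.Lock()    # guards all_chunks and errors

# One thread per file, so documents are loaded and chunked concurrently
threads = []
for uf in uploaded_files:
    t = threading.Thread(
        target=_process_single_file,
        args=(uf, all_chunks, errors, lock),
        name=f"loader-{uf.name}",
        daemon=True,
    )
    threads.append(t)

for t in threads:
    t.start()
for t in threads:
    t.join()   # barrier: wait for ALL files before embedding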
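The loop above hands each file to `_process_single_file`, whose body is not shown. A minimal sketch of what such a worker could look like follows, using only the standard library. `FakeUpload`, `split_text`, and `process_all` are hypothetical stand-ins for the Streamlit upload object, the real text splitter, and the surrounding app code; the actual implementation may differ.

```python
import threading
from dataclasses import dataclass


@dataclass
class FakeUpload:
    """Hypothetical stand-in for an uploaded file (name plus raw bytes)."""
    name: str
    data: bytes


def split_text(text: str, chunk_size: int = 40) -> list[str]:
    """Naive fixed-width splitter standing in for the real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def _process_single_file(uf, all_chunks, errors, lock):
    """Load and chunk one file, then publish the result under the lock."""
    try:
        text = uf.data.decode("utf-8")            # raises on invalid bytes
        chunks = [(uf.name, c) for c in split_text(text)]
        with lock:                                 # one short critical section per file
            all_chunks.extend(chunks)
    except Exception as exc:                       # record the failure instead of losing it
        with lock:
            errors.append(f"{uf.name}: {exc}")


def process_all(uploaded_files):
    """Run one named daemon thread per file and wait for all of them."""
    all_chunks, errors, lock = [], [], threading.Lock()
    threads = [
        threading.Thread(
            target=_process_single_file,
            args=(uf, all_chunks, errors, lock),
            name=f"loader-{uf.name}",
            daemon=True,
        )
        for uf in uploaded_files
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all_chunks, errors
```

Doing the slow work outside the lock and appending once at the end keeps the critical section short, which is why a single lock is enough here.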

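Why it matters: When processing 5 large PDFs, the sequential approach takes the sum of all processing times. With threading, the best case is the time of the slowest file. That bound is approached when the work is dominated by waiting on I/O; CPU-heavy parsing in pure Python is serialized by the GIL, so the observed speedup for large PDFs can be smaller.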
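The sum-versus-slowest-file comparison can be illustrated with a small timing sketch. It assumes each load spends its time waiting, which is simulated here with `time.sleep`; the durations and the helper names `simulated_load`, `run_sequential`, and `run_threaded` are made up for the illustration and are not measurements of DocChat.

```python
import threading
import time


def simulated_load(seconds: float) -> None:
    """Stand-in for a load that mostly waits; time.sleep releases the GIL."""
    time.sleep(seconds)


def run_sequential(durations):
    """Process the loads one after another and return elapsed seconds."""
    start = time.perf_counter()
    for d in durations:
        simulated_load(d)
    return time.perf_counter() - start


def run_threaded(durations):
    """Process the loads in one thread each and return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=simulated_load, args=(d,)) for d in durations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


durations = [0.05, 0.10, 0.15]
sequential = run_sequential(durations)   # roughly the sum of the durations
threaded = run_threaded(durations)       # roughly the largest duration
print(f"sequential={sequential:.2f}s threaded={threaded:.2f}s")
```

Replacing the sleep with a pure-Python CPU loop would shrink the gap considerably, which is worth checking against real documents before relying on the speedup.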

🎯 2. Maximal Marginal Relevance (MMR) Retrieval

Unlike basic similarity search (k nearest neighbors), DocChat uses MMR — a retrieval strategy that balances relevance with diversity.

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 10, "lambda_mult": 0.5}
)

| Parameter | Value | Purpose |
| --- | --- | --- |
| `search_type` | `"mmr"` | Activates Maximal Marginal Relevance |
| `k` | `3` | Final number of documents returned |
| `fetch_k` | `10` | Candidate pool size before diversity filtering |
| `lambda_mult` | `0.5` | Balance between relevance (1.0) and diversity (0.0) |

Why it matters: Basic similarity search often returns near-duplicate chunks from the same paragraph. MMR penalizes candidates that resemble chunks already selected, so the retrieved context is more likely to cover different aspects of the query, leading to more comprehensive and accurate answers.
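To make the relevance/diversity trade-off concrete, here is a simplified greedy MMR selection in plain Python. It is a sketch of the general technique, not LangChain's or ChromaDB's implementation: it assumes the candidate pool (the `fetch_k` step) has already been fetched, and `cosine` and `mmr_select` are names chosen for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k=3, lambda_mult=0.5):
    """Greedily pick up to k candidate indices, trading relevance for diversity."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already chosen (0.0 on the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [0.9, 0.10],   # index 0: highly relevant
    [0.9, 0.12],   # index 1: near-duplicate of index 0
    [0.7, -0.7],   # index 2: less relevant but different
]
print(mmr_select(query, candidates, k=2, lambda_mult=0.5))
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))
```

With `lambda_mult=0.5` the near-duplicate at index 1 is skipped in favor of index 2, while `lambda_mult=1.0` reduces to plain similarity ranking and returns indices 0 and 1.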

🛡️ 3. Strict Grounding — Zero Hallucinations by Design

The system prompt explicitly constrains the LLM to answer only from the provided context:

If the answer is not present in the context,
say: "I could not find the answer in the document."

This is a deliberate design choice for trustworthiness: the prompt steers the model away from fabricating information that isn't in your documents, although a prompt instruction alone cannot rule fabrication out entirely.
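As an illustration of how such a grounded prompt might be assembled, the sketch below joins retrieved chunks into a context block and appends the question. Only the two fallback lines come from the prompt quoted above; the remaining wording, `PROMPT_TEMPLATE`, and `build_prompt` are assumptions made for this example.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the answer is not present in the context,\n"
    'say: "I could not find the answer in the document."\n\n'
    "Context:\n{context}\n\n"
    "Question: {question}"
)


def build_prompt(chunks, question):
    """Join retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt(
    ["Paris is the capital of France."],
    "What is the capital of France?",
)
print(prompt)
```

Keeping the fallback sentence verbatim in the template makes it easy to detect "not found" answers downstream by simple string matching.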

🎨 4. Production-Quality UI

Custom dark theme with DM Serif Display + DM Sans typography
Real-time status indicators (document count pills, processing states)
Session management with full conversation history
Responsive layout with sidebar controls and main chat area

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                    │
│          (Dark Theme · Chat UI · File Upload)           │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────▼──────────────┐
        │   Multi-Threaded Loader     │
        │  ┌────────┐  ┌────────┐     │
        │  │Thread 1│  │Thread 2│ ... │   ← One thread per file
        │  │PDF Load│  │TXT Load│     │
        │  │+ Chunk │  │+ Chunk │     │
        │  └───┬────┘  └───┬────┘     │
        │      └─────┬─────┘          │
        │      Lock-Protected         │
        │      Chunk Aggregation      │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │     Embedding Pipeline      │
        │  sentence-transformers/     │
        │  all-MiniLM-L6-v2           │
        │  (HuggingFace, local)       │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │   ChromaDB Vector Store     │
        │   (In-Memory / Persisted)   │
        │   MMR Retrieval (k=3)       │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │      Groq LLM Engine        │
        │  LLaMA 3.3 70B Versatile    │
        │  (Grounded Prompt Template) │
        └─────────────────────────────┘

🛠️ Tech Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| LLM | Groq Cloud + LLaMA 3.3 70B | Fast hosted inference for document Q&A |
| Embeddings | HuggingFace `all-MiniLM-L6-v2` | Local, lightweight semantic embeddings |
| Vector Store | ChromaDB | Vector storage (in-memory for the web UI, persisted for the CLI) + MMR retrieval |
| Framework | LangChain | Orchestration of RAG pipeline components |
| Concurrency | Python `threading` | Parallel document loading & chunking |
| Frontend | Streamlit | Interactive chat UI with custom theming |
| Text Splitting | RecursiveCharacterTextSplitter | Context-aware document chunking |
| Document Loaders | PyPDFLoader, TextLoader | Multi-format document ingestion |
| Config | python-dotenv | Loads the API key from a local `.env` file instead of hard-coding it |

📁 Project Structure

pdf_chat_threaded_documents/
├── rag_application/
│   ├── app.py                  # 🚀 Main Streamlit app (threaded RAG pipeline)
│   ├── main.py                 # 🖥️  CLI-based RAG interface
│   ├── create-database.py      # 🗄️  Standalone vector DB creation script
│   ├── data/                   # 📂 Sample documents (PDF, TXT)
│   ├── loaders/                # 🔌 Document loader experiments
│   │   ├── pdf.py              #     PyPDF loader
│   │   ├── page.py             #     Web page loader
│   │   ├── loader_exp.py       #     Text file loader
│   │   └── docling_load.py     #     Docling-based loader
│   ├── splitters/              # ✂️  Text splitting strategies
│   │   ├── recursiveSplitter.py#     Recursive character splitting
│   │   ├── splitter.py         #     Character-based splitting
│   │   └── splitter2.py        #     Token-based splitting
│   ├── VECTORSTORE/            # 💾 Vector store experiments
│   │   └── db.py               #     ChromaDB operations
│   └── .gitignore
├── requirements.txt
└── README.md

⚙️ Quick Start

Prerequisites

Python 3.10+
A Groq API Key (free tier available)

1. Clone the Repository

git clone https://github.com/<your-username>/pdf_chat_threaded_documents.git
cd pdf_chat_threaded_documents

2. Create a Virtual Environment

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file inside rag_application/:

GROQ_API_KEY=your_groq_api_key_here
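A missing or placeholder key is easy to overlook, so a startup check can fail early with a clear message. The sketch below assumes the variable has already been loaded into the environment (python-dotenv's `load_dotenv()` is the usual way to do that from a `.env` file); `require_groq_key` is a hypothetical helper, not part of the project.

```python
import os


def require_groq_key(env=None):
    """Return GROQ_API_KEY, or raise if it is unset or still the placeholder."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to rag_application/.env"
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise in tests without touching the real environment.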

5. Launch the Application

Option A — Streamlit Web UI (Recommended)

cd rag_application
streamlit run app.py

Option B — CLI Interface

# First, create the vector database
cd rag_application
python create-database.py

# Then run the CLI chatbot
python main.py

💡 Usage

Upload — Drag & drop up to 5 PDF or TXT files in the sidebar.
Process — Click ⚡ Process Documents to trigger threaded ingestion.
Chat — Ask questions in the chat input; answers are grounded in your documents.
Clear — Hit 🗑 Clear Session to reset and upload new documents.

🔑 Key Engineering Decisions

| Decision | Rationale |
| --- | --- |
| Threading over Multiprocessing | Document loading involves file reads and PDF parsing. Threads overlap the I/O waits and avoid the overhead of process spawning and IPC serialization; CPU-heavy parsing in pure Python still contends for the GIL. |
| Lock-based synchronization | A single `threading.Lock` protects the shared chunk list. Contention is minimal because each thread writes once after processing. |
| Daemon threads | Workers do not keep the interpreter alive if the Streamlit process is terminated mid-ingestion. |
| MMR over Similarity Search | Reduces redundant context chunks, improving answer quality for complex queries. |
| In-memory ChromaDB for web UI | Each session gets a fresh vectorstore, so no stale data leaks between users. |
| Persisted ChromaDB for CLI | CLI mode persists the database for repeated querying without re-embedding. |
| `@st.cache_resource` | Embeddings model and LLM are loaded once and shared across reruns, avoiding redundant initialization. |
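The load-once behavior in the last row can be illustrated without a running Streamlit app. The sketch below uses `functools.lru_cache` as a stand-in for `@st.cache_resource`, which is a deliberate substitution: the two are not equivalent, and `FakeModel` and `get_embeddings` are hypothetical names used only to show that the expensive constructor runs once.

```python
from functools import lru_cache


class FakeModel:
    """Hypothetical expensive resource; counts how often it is constructed."""
    instances = 0

    def __init__(self, name: str):
        FakeModel.instances += 1
        self.name = name


@lru_cache(maxsize=None)
def get_embeddings(model_name: str = "all-MiniLM-L6-v2") -> FakeModel:
    """Build the model on the first call and reuse it on later calls."""
    return FakeModel(model_name)


first = get_embeddings()
second = get_embeddings()   # served from the cache, no second construction
```

In the Streamlit app the decorator plays the same role across script reruns, so each rerun reuses the already loaded model and LLM client.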

🧪 Experiments Included

The loaders/ and splitters/ directories contain standalone experiments exploring different LangChain components:

Loaders: PyPDF, TextLoader, WebBaseLoader, Docling (for URL-based academic papers)
Splitters: CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter
Vector Store: ChromaDB similarity search vs. retriever patterns

These experiments document the research process and demonstrate familiarity with the broader LangChain ecosystem.

📊 Performance Characteristics

| Aspect | Characteristic |
| --- | --- |
| Document processing | Parallel: one thread per file, so wall-clock time tends toward the slowest file rather than the sum of all files |
| Embedding model | Local: no network calls and no per-request API cost |
| LLM inference | Groq: low-latency responses served from dedicated LPU hardware |
| Retrieval strategy | MMR: diverse, non-redundant context selection |
| Max concurrent files | 5 (configurable) |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.

Built with ❤️ using LangChain, Groq, and Python Threading
