unthinkingFool/chat_pdf_threaded_application

📄 DocChat — Threaded RAG Document Assistant

A production-grade, multi-threaded Retrieval-Augmented Generation (RAG) application that lets you chat with your PDF and text documents using LLaMA 3.3 70B.

🎯 What Is This?

DocChat is a Retrieval-Augmented Generation (RAG) application that goes beyond the typical "upload a PDF and ask questions" pattern. It introduces concurrent multi-threaded document processing, Maximal Marginal Relevance (MMR) retrieval, and a polished dark-themed UI — built to demonstrate real-world engineering practices, not just a tutorial-level prototype.

Upload up to 5 documents → Documents are processed in parallel threads → Ask questions grounded strictly in your documents → Get accurate, hallucination-resistant answers.


🚀 What Makes This Different?

Most PDF chat applications follow a simple, sequential pipeline: load one file, chunk it, embed it, query it. DocChat is engineered differently.

⚡ 1. Multi-Threaded Document Processing

Traditional Approach          DocChat Approach
─────────────────────         ─────────────────────
File 1 → Load → Chunk        File 1 ─┐
File 2 → Load → Chunk        File 2 ─┤→ Parallel Threads
File 3 → Load → Chunk        File 3 ─┤   (concurrent I/O)
     (sequential)             File 4 ─┤
     Total: T1 + T2 + T3     File 5 ─┘
                              Total: max(T1..T5)
  • Each uploaded document is processed in its own dedicated thread (threading.Thread), eliminating the sequential bottleneck.
  • A shared threading.Lock ensures thread-safe aggregation of document chunks into a single unified list — no race conditions, no data corruption.
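  • Daemon threads are used so a stuck background worker cannot keep the Python process alive at exit. Daemon threads are stopped abruptly at shutdown, so the pipeline still joins every worker before embedding.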
  • Named threads (loader-{filename}) enable clean debugging and monitoring.
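import threading

# Shared state: workers append to these lists only while holding `lock`
all_chunks, errors = [], []
lock = threading.Lock()

# One thread per file: concurrent document processing
threads = []
for uf in uploaded_files:  # files returned by the Streamlit uploader
    t = threading.Thread(
        target=_process_single_file,
        args=(uf, all_chunks, errors, lock),
        name=f"loader-{uf.name}",
        daemon=True,
    )
    threads.append(t)

for t in threads:
    t.start()
for t in threads:
    t.join()   # barrier: wait for ALL files before embedding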

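Why it matters: When processing 5 large PDFs, the sequential approach takes the sum of all processing times. With one thread per file, the best case is the time of the slowest file. The gain is largest for the I/O-bound part of the work (file reads); to the extent that parsing is CPU-bound pure Python, it is still serialized by the GIL, so real speedups usually fall between those two bounds.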
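To make the lock-protected aggregation concrete, here is a minimal sketch of what a per-file worker could look like. It is not the code from app.py: `process_single_file_sketch`, `split_text`, and `process_files` are hypothetical names, the worker takes a plain `(name, text)` pair instead of a Streamlit upload object, and a fixed-size character splitter stands in for the real loader and splitter.

```python
import threading

def split_text(text, chunk_size=200):
    """Hypothetical stand-in splitter: fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def process_single_file_sketch(name, text, all_chunks, errors, lock):
    """Chunk one document, then merge the result into the shared lists."""
    try:
        # The slow work happens without holding the lock.
        chunks = [(name, chunk) for chunk in split_text(text)]
        # One short critical section per thread keeps contention low.
        with lock:
            all_chunks.extend(chunks)
    except Exception as exc:
        # A failing file is recorded instead of crashing the whole batch.
        with lock:
            errors.append((name, str(exc)))

def process_files(files):
    """Run one worker thread per file and wait for all of them."""
    all_chunks, errors, lock = [], [], threading.Lock()
    threads = [
        threading.Thread(
            target=process_single_file_sketch,
            args=(name, text, all_chunks, errors, lock),
            name=f"loader-{name}",
            daemon=True,
        )
        for name, text in files.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # barrier: every file is done before the caller continues
    return all_chunks, errors
```

Calling `process_files({"a.txt": "x" * 450, "b.txt": "y" * 100})` yields four chunks in total (three from the first file, one from the second). The order of chunks across files depends on thread scheduling, so downstream code should not rely on it.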

🎯 2. Maximal Marginal Relevance (MMR) Retrieval

Unlike basic similarity search (k nearest neighbors), DocChat uses MMR — a retrieval strategy that balances relevance with diversity.

retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 10, "lambda_mult": 0.5}
)
| Parameter | Value | Purpose |
|---|---|---|
| search_type | "mmr" | Activates Maximal Marginal Relevance |
| k | 3 | Final number of documents returned |
| fetch_k | 10 | Candidate pool size before diversity filtering |
| lambda_mult | 0.5 | Balance between relevance (1.0) and diversity (0.0) |

Why it matters: Basic similarity search often returns near-duplicate chunks from the same paragraph. MMR ensures retrieved context covers different aspects of the query, leading to more comprehensive and accurate answers.
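The selection rule behind MMR can be sketched in a few lines of plain Python. This is an illustration of the technique, not the vector store's internal implementation: `cosine` and `mmr_select` are hypothetical helpers, and the two-dimensional vectors are toy data standing in for real embeddings. Each step picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(query, docs, k=3, fetch_k=10, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by MMR."""
    relevance = [cosine(query, d) for d in docs]
    # Candidate pool: the fetch_k documents most similar to the query.
    candidates = sorted(
        range(len(docs)), key=lambda i: relevance[i], reverse=True
    )[:fetch_k]
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is 0 until something has been selected.
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [0.8, 0.6]
docs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # [1, 2]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [1, 0]
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking and returns both near-duplicates; at `0.5` the second pick shifts to the less similar but non-redundant document.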

🛡️ 3. Strict Grounding: Hallucination-Resistant by Design

The system prompt explicitly constrains the LLM to answer only from the provided context:

If the answer is not present in the context,
say: "I could not find the answer in the document."

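This is a deliberate design choice for trustworthiness: the prompt steers the model to refuse rather than fabricate information that isn't in your documents. A prompt-level constraint lowers the risk of hallucination; it does not guarantee that the model never strays from the context.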
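As a rough sketch of how such a grounded prompt might be assembled, the snippet below joins retrieved chunks into a context block and embeds the fallback instruction. Only the fallback sentence comes from the excerpt above; the surrounding wording and the `build_prompt` and `is_refusal` helpers are assumptions for illustration, not the template used in app.py.

```python
FALLBACK = "I could not find the answer in the document."

def build_prompt(chunks, question):
    """Assemble a context-only prompt from retrieved chunks (assumed wording)."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n"
        "If the answer is not present in the context,\n"
        f'say: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

def is_refusal(answer):
    """True when the model fell back to the fixed 'not found' sentence."""
    return FALLBACK in answer
```

Keeping the fallback sentence in a single constant makes refusals easy to detect, for example to show a different status in the UI or to count unanswered questions.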

🎨 4. Production-Quality UI

  • Custom dark theme with DM Serif Display + DM Sans typography
  • Real-time status indicators (document count pills, processing states)
  • Session management with full conversation history
  • Responsive layout with sidebar controls and main chat area

🏗️ Architecture

┌─────────────────────────────────────────────────────────┐
│                    Streamlit Frontend                    │
│          (Dark Theme · Chat UI · File Upload)           │
└──────────────────────┬──────────────────────────────────┘
                       │
        ┌──────────────▼──────────────┐
        │   Multi-Threaded Loader     │
        │  ┌────────┐  ┌────────┐     │
        │  │Thread 1│  │Thread 2│ ... │   ← One thread per file
        │  │PDF Load│  │TXT Load│     │
        │  │+ Chunk │  │+ Chunk │     │
        │  └───┬────┘  └───┬────┘     │
        │      └─────┬─────┘          │
        │      Lock-Protected         │
        │      Chunk Aggregation      │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │     Embedding Pipeline      │
        │  sentence-transformers/     │
        │  all-MiniLM-L6-v2           │
        │  (HuggingFace, local)       │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │   ChromaDB Vector Store     │
        │   (In-Memory / Persisted)   │
        │   MMR Retrieval (k=3)       │
        └──────────────┬──────────────┘
                       │
        ┌──────────────▼──────────────┐
        │      Groq LLM Engine        │
        │  LLaMA 3.3 70B Versatile    │
        │  (Grounded Prompt Template) │
        └─────────────────────────────┘

🛠️ Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| LLM | Groq Cloud + LLaMA 3.3 70B | Ultra-fast inference for document Q&A |
| Embeddings | HuggingFace all-MiniLM-L6-v2 | Local, lightweight semantic embeddings |
| Vector Store | ChromaDB | Vector storage (in-memory for the web UI, persisted for the CLI) + MMR retrieval |
| Framework | LangChain | Orchestration of RAG pipeline components |
| Concurrency | Python threading | Parallel document loading & chunking |
| Frontend | Streamlit | Interactive chat UI with custom theming |
| Text Splitting | RecursiveCharacterTextSplitter | Context-aware document chunking |
| Document Loaders | PyPDFLoader, TextLoader | Multi-format document ingestion |
| Config | python-dotenv | Secure API key management |

📁 Project Structure

pdf_chat_threaded_documents/
├── rag_application/
│   ├── app.py                  # 🚀 Main Streamlit app (threaded RAG pipeline)
│   ├── main.py                 # 🖥️  CLI-based RAG interface
│   ├── create-database.py      # 🗄️  Standalone vector DB creation script
│   ├── data/                   # 📂 Sample documents (PDF, TXT)
│   ├── loaders/                # 🔌 Document loader experiments
│   │   ├── pdf.py              #     PyPDF loader
│   │   ├── page.py             #     Web page loader
│   │   ├── loader_exp.py       #     Text file loader
│   │   └── docling_load.py     #     Docling-based loader
│   ├── splitters/              # ✂️  Text splitting strategies
│   │   ├── recursiveSplitter.py#     Recursive character splitting
│   │   ├── splitter.py         #     Character-based splitting
│   │   └── splitter2.py        #     Token-based splitting
│   ├── VECTORSTORE/            # 💾 Vector store experiments
│   │   └── db.py               #     ChromaDB operations
│   └── .gitignore
├── requirements.txt
└── README.md

⚙️ Quick Start

Prerequisites

1. Clone the Repository

git clone https://github.com/<your-username>/pdf_chat_threaded_documents.git
cd pdf_chat_threaded_documents

2. Create a Virtual Environment

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment Variables

Create a .env file inside rag_application/:

GROQ_API_KEY=your_groq_api_key_here
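
A small startup check can fail fast when the key is missing or still set to the placeholder. The sketch below is an assumption, not code from this repository: `get_groq_api_key` is a hypothetical helper that reads from a mapping (by default `os.environ`), and it presumes the .env file has already been loaded into the environment, for example by python-dotenv.

```python
import os

PLACEHOLDER = "your_groq_api_key_here"

def get_groq_api_key(env=os.environ):
    """Return the configured key, or raise if it is missing or a placeholder."""
    key = env.get("GROQ_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it to rag_application/.env"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.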

5. Launch the Application

Option A — Streamlit Web UI (Recommended)

cd rag_application
streamlit run app.py

Option B — CLI Interface

# First, create the vector database
cd rag_application
python create-database.py

# Then run the CLI chatbot
python main.py

💡 Usage

  1. Upload — Drag & drop up to 5 PDF or TXT files in the sidebar.
  2. Process — Click ⚡ Process Documents to trigger threaded ingestion.
  3. Chat — Ask questions in the chat input; answers are grounded in your documents.
  4. Clear — Hit 🗑 Clear Session to reset and upload new documents.

🔑 Key Engineering Decisions

| Decision | Rationale |
|---|---|
| Threading over Multiprocessing | Threads overlap blocking file reads without the overhead of process spawning and IPC serialization. CPU-heavy parsing steps gain less because of the GIL. |
| Lock-based synchronization | A single threading.Lock protects the shared chunk list; contention is minimal since each thread writes once after processing. |
| Daemon threads | Workers cannot keep the process alive if the Streamlit process is terminated mid-ingestion. |
| MMR over Similarity Search | Prevents redundant context chunks, improving answer quality for complex queries. |
| In-memory ChromaDB for web UI | Each session gets a fresh vectorstore, so no stale data leaks between users. |
| Persisted ChromaDB for CLI | CLI mode persists the database for repeated querying without re-embedding. |
| @st.cache_resource | Embeddings model and LLM are loaded once and shared across reruns, avoiding redundant initialization. |
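
The threading rationale can be checked with a small timing experiment. The sketch below is illustrative only: `fake_load` is a hypothetical stand-in that sleeps instead of reading a file, which models purely I/O-bound work (sleeping releases the GIL, as blocking reads do). Real PDF parsing includes CPU-bound steps, so measured gains in the application will likely be smaller.

```python
import threading
import time

def fake_load(seconds):
    """Stand-in for a blocking read; time.sleep releases the GIL."""
    time.sleep(seconds)

def run_sequential(durations):
    """Process tasks one after another; elapsed time is about sum(durations)."""
    start = time.perf_counter()
    for d in durations:
        fake_load(d)
    return time.perf_counter() - start

def run_threaded(durations):
    """Process tasks in one thread each; elapsed time is about max(durations)."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_load, args=(d,)) for d in durations]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    durations = [0.05, 0.10, 0.15]
    print(f"sequential: {run_sequential(durations):.2f}s")
    print(f"threaded:   {run_threaded(durations):.2f}s")
```

On an idle machine the sequential run should take roughly the sum of the durations and the threaded run roughly the longest one; exact numbers vary with scheduling.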

🧪 Experiments Included

The loaders/ and splitters/ directories contain standalone experiments exploring different LangChain components:

  • Loaders: PyPDF, TextLoader, WebBaseLoader, Docling (for URL-based academic papers)
  • Splitters: CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter
  • Vector Store: ChromaDB similarity search vs. retriever patterns

These experiments document the research process and demonstrate familiarity with the broader LangChain ecosystem.


📊 Performance Characteristics

| Metric | Value |
|---|---|
| Document processing | Parallel: one thread per file, so wall-clock time tends toward the slowest file rather than the sum of all files |
| Embedding model | Local: no API calls, no network latency, no per-request cost |
| LLM inference | Groq: low-latency responses via dedicated LPU hardware |
| Retrieval strategy | MMR: diverse, non-redundant context selection |
| Max concurrent files | 5 (configurable) |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📜 License

This project is licensed under the MIT License — see the LICENSE file for details.



Built with ❤️ using LangChain, Groq, and Python Threading

