vis-hal-git/RAG-Based-Chatbot-System


🧠 Multimodal RAG-Based Chatbot System

A state-of-the-art Multimodal Retrieval-Augmented Generation (RAG) application built with Python, FastAPI, and OpenAI. It supports uploads of text, images, and PDFs, blending OCR, local vision embeddings, and dense-vector search into a conversational API.


🌟 Key Capabilities

  1. Multimodal Ingestion Pipeline:
    • Dynamically parses uploaded .pdf, .png, .jpg, and .txt files.
    • Uses pdfplumber to extract text chunks with their bounding boxes, and pytesseract to OCR text from embedded images and graphics, mapping both back to page-level metadata.
  2. Text & Vision Embeddings:
    • Pure text retrieval uses OpenAIEmbeddings alongside a local FAISS CPU index.
    • Images are embedded locally with a Hugging Face sentence-transformers model (clip-ViT-B-32), turning them into searchable vectors without per-image calls to an external API.
  3. Hybrid Search System (RRF):
    • Combines sparse lexical retrieval (rank_bm25) with dense embedding search (FAISS).
    • Merges the two candidate lists with Reciprocal Rank Fusion (RRF) to return the best combined text matches.
  4. Cross-Modal Reranking Phase:
    • Both text chunks and extracted image paths are scored against the user's live query using cosine_similarity.
    • Ensures visually relevant diagrams are passed to the OpenAI generation step alongside the text context.
  5. Conversational Memory & State Persistence:
    • Chat history for the current session is kept in memory as chat_history.
    • Every conversation is autosaved to a MongoDB Atlas cloud database; previous chats appear in a "Chat History" sidebar and can be reloaded asynchronously.
  6. API-first design:
    • FastAPI endpoints for upload/chat/history plus Swagger docs at /docs.
  7. Per-Thread Context Persistence:
    • Each chat thread keeps its own retrieval context (vector index + image references) so users can reopen old chats and continue asking questions without re-uploading.
    • Original uploaded file bytes are not stored in chat history documents.
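The RRF merge in capability 3 can be sketched in a few lines. The k = 60 constant and the document-ID lists below are illustrative defaults, not taken from the repository:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge several ranked lists of document IDs.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by multiple retrievers float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A document ranked highly by both BM25 and FAISS wins overall:
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_a", "doc_d"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)  # doc_a and doc_b lead the fused ranking
```

Because RRF only uses ranks, not raw scores, it needs no score normalization between the BM25 and FAISS lists.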

🛠️ Tech Stack & Requirements

Infrastructure & Backends

  • Python 3.11+
  • FastAPI (API server)
  • MongoDB (chat-history persistence)

AI Tooling & Frameworks

  • OpenAI API (gpt-4o-mini for text/vision generation capability & text-embedding-3-small for dense vectors)
  • LangChain (FAISS vector-store and document abstractions)
  • Hugging Face (sentence-transformers) & torchvision (local CLIP models for cross-modal scoring)
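The cross-modal reranking mentioned above reduces to cosine similarity between the query embedding and each candidate's CLIP embedding. A stdlib-only sketch, with toy 3-D vectors and hypothetical file names standing in for real embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for CLIP embeddings of a query and two images:
query = [1.0, 0.0, 0.0]
candidates = {
    "diagram.png": [0.9, 0.1, 0.0],  # points nearly the same way as the query
    "photo.jpg":   [0.0, 1.0, 0.0],  # orthogonal, i.e. unrelated
}
ranked = sorted(candidates,
                key=lambda name: cosine_similarity(query, candidates[name]),
                reverse=True)
print(ranked)  # ['diagram.png', 'photo.jpg']
```

In the real pipeline the vectors come from the clip-ViT-B-32 model, so text queries and image candidates share one embedding space and can be compared directly.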

Parsing & Extraction

  • pytesseract, Pillow, pdf2image, pdfplumber

🚀 Setting Up the Project

1. Requirements

Ensure Tesseract OCR is installed on your host machine to enable image-to-text processing.

Create a virtual environment and install the application requirements:

python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt

2. Environment Configurations

Set the required keys in a .env file at the root of the project (you can start from .env.example):

# Required for text-generation & core vector embeddings
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
OPENAI_VISION_MODEL="gpt-4o-mini"

# Required for thread persistence in the UI 
MONGO_URI="mongodb+srv://<user>:<password>@cluster0...mongodb.net/YourDB"
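As a startup sanity check, a minimal stdlib-only sketch that verifies these variables are present in the process environment (variable names match the .env keys above; the check itself is illustrative, not part of the repository):

```python
import os

# Fail loudly at startup if required configuration is missing,
# rather than at the first OpenAI or MongoDB call.
REQUIRED_VARS = ["OPENAI_API_KEY", "MONGO_URI"]
missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
if missing:
    print("Missing environment variables:", ", ".join(missing))
```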

3. Execution

Launch the API server locally:

uvicorn server:app --host 0.0.0.0 --port 8000

Open http://localhost:8000/docs for the interactive Swagger UI.

🗂️ Codebase Architecture

  • server.py: FastAPI app exposing upload/chat/history endpoints and serving index.html.
  • app.py: Compatibility entrypoint (Streamlit removed).
  • ingestion.py: Parses uploaded PDFs and images; saves extracted image files into an internal extracted_images directory.
  • chunker.py: Splits long texts into manageable 1200-token chunks while keeping surrounding metadata intact.
  • hybrid_retriever.py: Core search algorithm, merging BM25 hits with dense FAISS scores.
  • reranker.py & llm_query.py: Score content with sentence-transformers CLIP embeddings, then query the OpenAI API via structured system prompts.
  • db_utils.py: MongoDB access layer for saving and recalling chat thread logs.
