A document chat application that lets you have conversations with your documents using Retrieval-Augmented Generation (RAG).
The backend uses LangGraph for orchestrating the RAG pipeline:
- **Document Processing**
  - Documents are uploaded and split into chunks
  - Chunks are embedded and stored in a Qdrant vector store
  - Each document gets a unique collection name
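  A minimal sketch of what this ingestion step can look like, assuming the LangChain integrations for OpenAI embeddings and Qdrant (`langchain_openai`, `langchain_qdrant`); the helper name and parameters below are illustrative, not the repository's actual code.

  ```python
  # Illustrative sketch only -- not this project's actual implementation.
  import uuid

  from langchain_community.document_loaders import PyPDFLoader
  from langchain_openai import OpenAIEmbeddings
  from langchain_qdrant import QdrantVectorStore
  from langchain_text_splitters import RecursiveCharacterTextSplitter


  def ingest_document(path: str, qdrant_url: str, qdrant_api_key: str) -> str:
      """Load a file, split it into chunks, and store the embeddings in Qdrant."""
      docs = PyPDFLoader(path).load()

      # Split into overlapping chunks (defaults mirror the README: 1000 / 200).
      splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
      chunks = splitter.split_documents(docs)

      # Each uploaded document gets its own collection, named with a fresh UUID.
      collection_name = str(uuid.uuid4())
      QdrantVectorStore.from_documents(
          chunks,
          embedding=OpenAIEmbeddings(),
          url=qdrant_url,
          api_key=qdrant_api_key,
          collection_name=collection_name,
      )
      return collection_name
  ```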
- **RAG Pipeline** (`rag_pipeline.py`)
  - Uses a state-based graph processing flow with two key nodes:
    - `retrieve`: Fetches semantically similar documents from the vector store
    - `generate`: Produces answers using the retrieved context and the LLM
  - Document processing capabilities:
    - Supports both semantic chunking (SemanticChunker) and recursive character splitting (see the splitter sketch after the code block below)
    - Configurable chunk size (default: 1000) and overlap (default: 200)
    - DirectoryLoader for handling multiple document formats
  - Vector store integration:
    - Uses Qdrant for document storage and retrieval
    - Configurable similarity search with top-k results (default: 10)
    - Collection-based document organization with UUID naming
  - LLM integration:
    - Configurable model selection (default: GPT-4)
    - Uses LangChain hub prompts (default: "rlm/rag-prompt")
    - Maintains conversation context through state management
  - Key methods:
    - `add_documents()`: Processes and stores new documents
    - `query()`: Executes the RAG pipeline for question answering
    - `_setup_graph()`: Configures the directed processing flow
    - `_initialize_components()`: Sets up the LLM, embeddings, and vector store
  ```python
  # Main components
  class RAGPipeline:
      def __init__(self, config, collection_name):
          # Initialize LLM, embeddings, and vector store
          self._setup_graph()  # Sets up the LangGraph processing flow

      def query(self, question):
          # 1. Retrieve relevant documents
          # 2. Generate answer using context
          return response
  ```
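  As referenced above, a hedged sketch of how the choice between semantic chunking and recursive character splitting might be wired up; the `use_semantic_chunking` flag is an illustrative name, not necessarily the config key used in this project.

  ```python
  # Illustrative sketch; the flag name and defaults are assumptions.
  from langchain_experimental.text_splitter import SemanticChunker
  from langchain_openai import OpenAIEmbeddings
  from langchain_text_splitters import RecursiveCharacterTextSplitter


  def build_splitter(use_semantic_chunking: bool,
                     chunk_size: int = 1000,
                     chunk_overlap: int = 200):
      """Return either a semantic or a recursive character splitter."""
      if use_semantic_chunking:
          # Splits where embedding similarity between sentences drops,
          # rather than at a fixed character count.
          return SemanticChunker(OpenAIEmbeddings())
      return RecursiveCharacterTextSplitter(
          chunk_size=chunk_size,
          chunk_overlap=chunk_overlap,
      )
  ```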
- **LangGraph Setup**
  - Creates a directed graph: `retrieve → generate`
  - On every user query, the graph retrieves the most relevant documents from the configured vector database and generates an answer using the configured LLM.
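A minimal sketch of what such a two-node graph can look like with LangGraph's `StateGraph`. The state fields, node bodies, and the Qdrant connection details are simplified assumptions for illustration, not the actual `_setup_graph()` implementation.

```python
# Illustrative retrieve -> generate graph; names and defaults are assumptions.
from typing import List, TypedDict

from langchain import hub
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
from langgraph.graph import END, START, StateGraph
from qdrant_client import QdrantClient

llm = ChatOpenAI(model="gpt-4")
prompt = hub.pull("rlm/rag-prompt")  # the default hub prompt mentioned above
vector_store = QdrantVectorStore(
    client=QdrantClient(url="http://localhost:6333"),  # illustrative URL
    collection_name="your-collection-uuid",            # per-document collection
    embedding=OpenAIEmbeddings(),
)


class RAGState(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: RAGState) -> dict:
    # Top-k similarity search against the document's collection (default k: 10).
    return {"context": vector_store.similarity_search(state["question"], k=10)}


def generate(state: RAGState) -> dict:
    # Stuff the retrieved chunks into the prompt and call the LLM.
    context = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": context})
    return {"answer": llm.invoke(messages).content}


builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
graph = builder.compile()

# answer = graph.invoke({"question": "What is this document about?"})["answer"]
```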
The frontend is a Streamlit application with two main components:
- **Document Upload**
  - Upload PDF or TXT files
  - Files are processed and stored in the vector database
- **Chat Interface**
  - Ask questions about the uploaded document
  - View AI-generated responses based on document content
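A hedged sketch of how this two-part Streamlit UI can be put together. The backend endpoint paths (`/upload`, `/query`) and response fields used here are hypothetical placeholders and may not match this project's actual API.

```python
# Hypothetical frontend sketch; endpoint paths and response fields are
# illustrative, not this project's actual API.
import requests
import streamlit as st

API_URL = "http://localhost:8000"  # DEV_API_URL from the env config

st.title("Chat with your document")

# --- Document Upload ---
uploaded = st.file_uploader("Upload a PDF or TXT file", type=["pdf", "txt"])
if uploaded is not None and "collection" not in st.session_state:
    resp = requests.post(f"{API_URL}/upload", files={"file": uploaded})  # hypothetical endpoint
    st.session_state["collection"] = resp.json()["collection_name"]
    st.success("Document processed and stored.")

# --- Chat Interface ---
if question := st.chat_input("Ask a question about the document"):
    st.chat_message("user").write(question)
    resp = requests.post(
        f"{API_URL}/query",  # hypothetical endpoint
        json={"question": question, "collection_name": st.session_state.get("collection")},
    )
    st.chat_message("assistant").write(resp.json()["answer"])
```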
- **Set up environment variables:**

  ```bash
  cp .env.example .env
  ```

  Backend environment variables:

  ```env
  QDRANT_API_URL=your_qdrant_url
  QDRANT_API_KEY=your_qdrant_api_key
  QDRANT_API_PORT=port_number
  QDRANT_API_PREFIX=your_qdrant_prefix
  TFY_API_KEY=your_truefoundry_key
  TFY_LLM_GATEWAY_BASE_URL=your_gateway_url
  ```
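  A small sketch of how the backend might consume these variables, assuming `python-dotenv` and an OpenAI-compatible TrueFoundry LLM gateway; the actual settings handling in the backend may differ.

  ```python
  # Illustrative sketch; the backend's actual settings handling may differ.
  import os

  from dotenv import load_dotenv
  from langchain_openai import ChatOpenAI
  from qdrant_client import QdrantClient

  load_dotenv()  # reads backend/.env

  qdrant = QdrantClient(
      url=os.environ["QDRANT_API_URL"],
      port=int(os.getenv("QDRANT_API_PORT", "6333")),
      prefix=os.getenv("QDRANT_API_PREFIX"),
      api_key=os.environ["QDRANT_API_KEY"],
  )

  # Assumption: the TrueFoundry LLM gateway exposes an OpenAI-compatible API.
  llm = ChatOpenAI(
      model="gpt-4",
      api_key=os.environ["TFY_API_KEY"],
      base_url=os.environ["TFY_LLM_GATEWAY_BASE_URL"],
  )
  ```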
  Frontend environment variables:

  ```env
  # If ENVIRONMENT is set to 'production', PROD_API_URL will be used
  # If ENVIRONMENT is set to 'development', DEV_API_URL will be used
  ENVIRONMENT=development  # or production
  DEV_API_URL=http://localhost:8000
  PROD_API_URL=https://your-production-api-url
  ```
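  A sketch of the selection logic this implies, assuming the frontend reads the variables with `os.getenv`; the actual code may handle this differently.

  ```python
  # Illustrative sketch of the environment-based API URL selection.
  import os

  from dotenv import load_dotenv

  load_dotenv()

  if os.getenv("ENVIRONMENT", "development") == "production":
      API_URL = os.getenv("PROD_API_URL")
  else:
      API_URL = os.getenv("DEV_API_URL", "http://localhost:8000")
  ```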
- **Install dependencies:**

  Backend:

  ```bash
  cd backend
  pip install -r requirements.txt
  ```

  Frontend:

  ```bash
  cd frontend
  pip install -r requirements.txt
  ```
- **Start the Services:**

  ```bash
  # Start Backend
  cd backend
  uvicorn main:app --host 0.0.0.0 --port 8000

  # Start Frontend
  cd ../frontend
  streamlit run main.py
  ```
- **Using the Application:**
  - Open the Streamlit interface (default: http://localhost:8501)
  - Upload your PDF or TXT file
  - Wait for processing confirmation
  - Start asking questions about your document
Build and run the services using Docker. Make sure to configure the environment variables as described in the Quick Start section.
Backend:

```bash
cd backend
docker build -t rag-demo-api .
docker run -p 8000:8000 --env-file .env rag-demo-api
```

Frontend:

```bash
cd frontend
docker build -t rag-demo-frontend .
docker run -p 8501:8501 --env-file .env rag-demo-frontend
```