Getting started with a RAG system using LangChain in Node.js. This project uses OpenAI for embeddings and Pinecone as the vector DB.

uyenvoaero/rag-langchain-nodejs


RAG Document Chat System

A modern web application that allows you to upload PDF documents and chat with them using AI. Built with Node.js, Express, and a beautiful responsive frontend.

Initial interface (screenshot): a clean, modern interface ready for document upload

Features

  • 📁 File Upload: Upload PDF documents with drag & drop support
  • 🤖 AI Chat: Chat with your documents using OpenAI's GPT models
  • 🗂️ Multiple Documents: Manage multiple documents with separate conversations
  • 🎯 Namespace Isolation: Each document gets its own Pinecone namespace for accurate retrieval
  • 💬 Conversation History: Maintain separate chat histories for each document
  • 📱 Responsive Design: Beautiful, modern UI that works on all devices
  • ⚡ Real-time Processing: Fast document processing and embedding generation
  • 🌍 Multi-language Support: Ask questions in English, Vietnamese, or other languages
  • 🔄 Auto-clean Data: Automatically clears old data on startup for fresh sessions

Tech Stack

Backend

  • Node.js with Express.js
  • LangChain for AI integration
  • OpenAI for embeddings and chat
  • Pinecone for vector database
  • Multer for file uploads
  • pdf-parse for document processing

Frontend

  • Vanilla JavaScript with modern ES6+
  • Responsive CSS with gradient design
  • Drag & Drop file upload
  • Real-time chat interface

Setup Instructions

1. Install Dependencies

npm install

2. Environment Variables

Create a .env file in the root directory with the following variables:

OPENAI_API_KEY=your_openai_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
PORT=3001
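
The server can fail fast if a key is missing. A minimal sketch of such a check (the real server.js may load these differently, e.g. via dotenv):

```javascript
// Sketch: verify required keys before starting the server. In the real app,
// require("dotenv").config() would populate process.env from the .env file first.
function assertEnv(env, keys) {
  const missing = keys.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return true;
}

// Usage: assertEnv(process.env, ["OPENAI_API_KEY", "PINECONE_API_KEY"]);
```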

3. Run the Application

# Start the server
npm start

# Or for development with auto-restart
npm run dev

The application will be available at http://localhost:3001

Demo

Chat interface (screenshot): example of chatting with a Vietnamese AWS Machine Learning document

How It Works

  1. Document Upload: Users upload PDF files through the web interface
  2. Processing: The system extracts text, chunks it, and generates embeddings
  3. Storage: Embeddings are stored in Pinecone with unique namespaces per document
  4. Chat: Users can select documents and ask questions
  5. Retrieval: The system retrieves relevant chunks from the specific document's namespace
  6. Generation: AI generates answers based on the retrieved context

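The retrieval step (step 5) can be sketched in plain JavaScript. Pinecone performs this scoring server-side in the real app; this illustrative version shows how top-K selection with a similarity threshold works (the chunk shape and function names are assumptions, not the project's actual code):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding, drop low-similarity matches,
// and keep the K best, mirroring "Top K Results" with similarity filtering.
function topKChunks(queryEmbedding, chunks, k = 10, threshold = 0.7) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter((c) => c.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```
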
API Endpoints

  • GET /api/files - Get list of uploaded files
  • POST /api/upload - Upload and process a PDF file
  • POST /api/chat - Send a chat message for a specific document
  • DELETE /api/files/:fileId - Delete a file and its data
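
From the browser (or any Node.js 18+ script), the chat endpoint can be called with fetch. The request field names (fileId, message) and the response shape used here are assumptions; check server.js for the actual contract:

```javascript
// Build the fetch options for POST /api/chat (field names are assumptions).
function buildChatRequest(fileId, message) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ fileId, message }),
  };
}

// Hypothetical client helper; requires the server to be running on port 3001.
async function askDocument(fileId, message) {
  const res = await fetch("http://localhost:3001/api/chat", buildChatRequest(fileId, message));
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  return res.json();
}
```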

File Structure

├── server.js              # Express server with API endpoints
├── public/
│   └── index.html         # Frontend application
├── src/
│   ├── chunk-texts.js     # Text chunking logic
│   ├── embed-texts.js     # Embedding generation
│   ├── generate-answer.js # AI answer generation
│   ├── parse-pdf.js       # PDF text extraction
│   └── vector-db.js       # Pinecone operations
├── uploads/               # Uploaded files storage
└── package.json

Usage

  1. Upload Documents: Click "Upload Document" or drag & drop PDF files
  2. Select Document: Click on any document in the sidebar to start chatting
  3. Ask Questions: Type your questions in the chat input (supports multiple languages)
  4. Switch Documents: Click different documents to have separate conversations
  5. Delete Documents: Use the delete button to remove documents

Multi-language Support

The system supports cross-language queries:

  • Upload a Vietnamese document and ask questions in English
  • Upload an English document and ask questions in Vietnamese
  • The AI will respond in the same language as your question

Configuration

The system uses the following default configurations:

  • Chunk Size: 1000 characters
  • Chunk Overlap: 200 characters
  • Embedding Model: text-embedding-3-large (3072 dimensions)
  • Chat Model: gpt-4o-mini
  • Top K Results: 10 relevant chunks (with similarity filtering)
  • File Size Limit: 10MB
  • Similarity Threshold: 0.7 for cross-language search
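
Chunk size and overlap interact as follows: each new chunk starts (chunk size minus overlap) characters after the previous one, so with the defaults a chunk begins every 800 characters. A simplified sketch (the project's src/chunk-texts.js may split on sentence or paragraph boundaries instead of fixed offsets):

```javascript
// Fixed-size chunking with overlap. Defaults mirror the configuration above:
// 1000-character chunks, 200 characters shared between neighbors.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  const step = chunkSize - overlap; // each chunk starts `step` chars after the last
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached end of text
  }
  return chunks;
}
```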

Troubleshooting

Common Issues

  1. "No matching chunks found": The document might not contain relevant information for your query
  2. Upload fails: Check file size (max 10MB) and ensure it's a valid PDF
  3. API errors: Verify your OpenAI and Pinecone API keys are correct
  4. Dimension mismatch errors: The system automatically handles this by recreating the Pinecone index
  5. Cross-language queries not working: Try asking in the same language as the document first

Environment Setup

Make sure you have:

  • Node.js 16+ installed
  • Valid OpenAI API key with sufficient credits
  • Valid Pinecone API key with an active project

Original Terminal Version

The original terminal-based version is still available in index.js. You can run it with:

node index.js

This will:

  • Create the Pinecone index if it does not exist
  • Process the sample PDF on the first run
  • Accept queries from the terminal and generate answers from the PDF

License

MIT License - feel free to use this project for your own applications!
