A modern web application that allows you to upload PDF documents and chat with them using AI. Built with Node.js, Express, and a beautiful responsive frontend.
Screenshot: clean, modern interface ready for document upload
- 📁 File Upload: Upload PDF documents with drag & drop support
- 🤖 AI Chat: Chat with your documents using OpenAI's GPT models
- 🗂️ Multiple Documents: Manage multiple documents with separate conversations
- 🎯 Namespace Isolation: Each document gets its own Pinecone namespace for accurate retrieval
- 💬 Conversation History: Maintain separate chat histories for each document
- 📱 Responsive Design: Beautiful, modern UI that works on all devices
- ⚡ Real-time Processing: Fast document processing and embedding generation
- 🌍 Multi-language Support: Ask questions in English, Vietnamese, or other languages
- 🔄 Auto-clean Data: Automatically clears old data on startup for fresh sessions
- Node.js with Express.js
- LangChain for AI integration
- OpenAI for embeddings and chat
- Pinecone for vector database
- Multer for file uploads
- PDF-parse for document processing
- Vanilla JavaScript with modern ES6+
- Responsive CSS with gradient design
- Drag & Drop file upload
- Real-time chat interface
npm install
Create a .env file in the root directory with the following variables:
OPENAI_API_KEY=your_openai_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
PORT=3001
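A startup check can catch a missing key before the first API call fails. The sketch below is not taken from this project: `loadConfig` is a hypothetical helper, the fallback to port 3001 is an assumption based on the sample value above, and it assumes the variables are already present in `process.env` (for example through a loader such as dotenv, which this README does not confirm is used).

```javascript
// Hypothetical helper: validate the variables listed in the .env example.
// Accepts an env object so it can be exercised without touching process.env.
function loadConfig(env = process.env) {
  const required = ['OPENAI_API_KEY', 'PINECONE_API_KEY'];
  const missing = required.filter((name) => !env[name] || env[name].trim() === '');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Assumption: fall back to 3001 when PORT is not set.
  const port = Number.parseInt(env.PORT ?? '3001', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  return {
    openaiApiKey: env.OPENAI_API_KEY,
    pineconeApiKey: env.PINECONE_API_KEY,
    port,
  };
}
```

Calling `loadConfig()` once at the top of the server entry point would surface configuration problems at startup rather than on the first upload.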
# Start the server
npm start
# Or for development with auto-restart
npm run dev
The application will be available at http://localhost:3001
Screenshot: example of chatting with a Vietnamese AWS Machine Learning document
- Document Upload: Users upload PDF files through the web interface
- Processing: The system extracts text, chunks it, and generates embeddings
- Storage: Embeddings are stored in Pinecone with unique namespaces per document
- Chat: Users can select documents and ask questions
- Retrieval: The system retrieves relevant chunks from the specific document's namespace
- Generation: AI generates answers based on the retrieved context
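The storage and retrieval steps above can be illustrated with an in-memory stand-in. This is a sketch only: the project stores vectors in Pinecone, whereas `NamespacedStore` here is a hypothetical class backed by a `Map`, and cosine similarity is assumed as the scoring metric. The defaults of 10 results and a 0.7 threshold mirror the configuration values listed in this README.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical in-memory store: one list of records per document namespace.
class NamespacedStore {
  constructor() {
    this.namespaces = new Map();
  }

  // Add records of the form { text, vector } to a document's namespace.
  upsert(namespace, records) {
    const existing = this.namespaces.get(namespace) ?? [];
    this.namespaces.set(namespace, existing.concat(records));
  }

  // Search only the given namespace, drop low-scoring matches, keep the top K.
  query(namespace, queryVector, { topK = 10, minScore = 0.7 } = {}) {
    const records = this.namespaces.get(namespace) ?? [];
    return records
      .map((r) => ({ text: r.text, score: cosineSimilarity(queryVector, r.vector) }))
      .filter((match) => match.score >= minScore)
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }

  // Remove every record belonging to one document.
  deleteNamespace(namespace) {
    this.namespaces.delete(namespace);
  }
}
```

Because each query names a single namespace, chunks from other documents can never appear in the results, which is the property the per-document namespaces are meant to provide.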
- GET /api/files - Get the list of uploaded files
- POST /api/upload - Upload and process a PDF file
- POST /api/chat - Send a chat message for a specific document
- DELETE /api/files/:fileId - Delete a file and its data
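The endpoints above could be called from a script with a small client such as the one sketched here. Several details are assumptions, since this README does not document the request or response shapes: `createClient` is a hypothetical helper, responses are assumed to be JSON, and the chat body field names `fileId` and `message` are guesses. The upload endpoint is left out because the multipart field name is not stated. The global `fetch` used as the default requires Node.js 18 or later.

```javascript
// Hypothetical client for the documented endpoints.
// fetchImpl is injectable so the client can be exercised without a server.
function createClient(baseUrl = 'http://localhost:3001', fetchImpl = globalThis.fetch) {
  async function request(method, path, body) {
    const options = { method, headers: {} };
    if (body !== undefined) {
      options.headers['Content-Type'] = 'application/json';
      options.body = JSON.stringify(body);
    }
    const res = await fetchImpl(`${baseUrl}${path}`, options);
    if (!res.ok) {
      throw new Error(`${method} ${path} failed with status ${res.status}`);
    }
    return res.json();
  }

  return {
    listFiles: () => request('GET', '/api/files'),
    // Assumed body shape: { fileId, message }.
    chat: (fileId, message) => request('POST', '/api/chat', { fileId, message }),
    deleteFile: (fileId) => request('DELETE', `/api/files/${encodeURIComponent(fileId)}`),
  };
}
```

For example, `createClient().listFiles().then(console.log)` would print whatever the server returns for the file list, assuming it is running locally on port 3001.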
├── server.js # Express server with API endpoints
├── public/
│ └── index.html # Frontend application
├── src/
│ ├── chunk-texts.js # Text chunking logic
│ ├── embed-texts.js # Embedding generation
│ ├── generate-answer.js # AI answer generation
│ ├── parse-pdf.js # PDF text extraction
│ └── vector-db.js # Pinecone operations
├── uploads/ # Uploaded files storage
└── package.json
- Upload Documents: Click "Upload Document" or drag & drop PDF files
- Select Document: Click on any document in the sidebar to start chatting
- Ask Questions: Type your questions in the chat input (supports multiple languages)
- Switch Documents: Click different documents to have separate conversations
- Delete Documents: Use the delete button to remove documents
The system supports cross-language queries:
- Upload a Vietnamese document and ask questions in English
- Upload an English document and ask questions in Vietnamese
- The AI will respond in the same language as your question
The system uses the following default configurations:
- Chunk Size: 1000 characters
- Chunk Overlap: 200 characters
- Embedding Model: text-embedding-3-large (3072 dimensions)
- Chat Model: gpt-4o-mini
- Top K Results: 10 relevant chunks (with similarity filtering)
- File Size Limit: 10MB
- Similarity Threshold: 0.7 for cross-language search
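To make the chunk size and overlap settings concrete, here is a minimal fixed-window splitter. It is an illustration, not the logic in src/chunk-texts.js, which may split on sentence or paragraph boundaries instead; `chunkText` is a hypothetical function name.

```javascript
// Split text into windows of chunkSize characters, where each window
// repeats the last chunkOverlap characters of the previous one.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, a 2,500-character text yields three chunks starting at offsets 0, 800, and 1600. The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, as long as it is shorter than the overlap.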
- "No matching chunks found": The document might not contain relevant information for your query
- Upload fails: Check file size (max 10MB) and ensure it's a valid PDF
- API errors: Verify your OpenAI and Pinecone API keys are correct
- Dimension mismatch errors: The system automatically handles this by recreating the Pinecone index
- Cross-language queries not working: Try asking in the same language as the document first
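When an upload fails, the two checks mentioned above (size and PDF validity) can be reproduced locally before retrying. The sketch below is a quick diagnostic, not the server's own validation: `checkPdfUpload` is a hypothetical helper, 10MB is assumed to mean 10 × 1024 × 1024 bytes, and the header test only confirms the file starts with `%PDF-`, not that the rest of the file is well formed.

```javascript
// Assumption: the 10MB limit is interpreted as 10 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024;

// Return { ok: true } or { ok: false, reason } for a file's contents.
function checkPdfUpload(buffer, maxBytes = MAX_FILE_SIZE_BYTES) {
  if (!Buffer.isBuffer(buffer) || buffer.length === 0) {
    return { ok: false, reason: 'empty or not a buffer' };
  }
  if (buffer.length > maxBytes) {
    return { ok: false, reason: 'file exceeds size limit' };
  }
  // PDF files begin with the header "%PDF-" followed by a version number.
  if (buffer.subarray(0, 5).toString('latin1') !== '%PDF-') {
    return { ok: false, reason: 'missing %PDF- header' };
  }
  return { ok: true };
}
```

A file read with `fs.readFileSync(path)` can be passed straight to `checkPdfUpload` to see which of the two conditions it fails.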
Make sure you have:
- Node.js 16+ installed
- Valid OpenAI API key with sufficient credits
- Valid Pinecone API key with an active project
The original terminal-based version is still available in index.js. You can run it with:
node index.js
This will:
- Create the Pinecone index if it does not already exist
- Process the sample PDF on the first run
- Accept queries from the terminal and generate answers from the PDF
MIT License - feel free to use this project for your own applications!