A fully custom chatbot built with Agentic RAG (Retrieval-Augmented Generation), combining Gemini models (free tier) with a local knowledge base for accurate, context-aware, and explainable responses. Features a lightweight, dependency-free frontend and a streamlined FastAPI backend for complete control and simplicity.
- Pure HTML/CSS/JavaScript frontend with no external dependencies
- FastAPI backend with Gemini integration
- Agentic RAG implementation with:
  - Context retrieval using embeddings and cosine similarity (sketched below)
  - Step-by-step reasoning with Chain of Thought
  - Function calling for dynamic context retrieval
- Comprehensive error handling and logging
- Type-safe implementation with Python type hints
- Configurable through environment variables
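For orientation, the retrieval step amounts to ranking knowledge-base chunks by the cosine similarity between the query embedding and each chunk embedding. A minimal sketch of that logic (the function names are illustrative, not necessarily the actual API of `backend/embeddings.py`):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_chunks(query_emb: np.ndarray, chunk_embs: list[np.ndarray],
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k knowledge-base chunks most similar to the query."""
    ranked = sorted(zip(chunk_embs, chunks),
                    key=lambda pair: cosine_similarity(query_emb, pair[0]),
                    reverse=True)
    return [chunk for _, chunk in ranked[:k]]
```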
```
agentic_rag/
├── backend/
│   ├── embeddings.py     # Embedding and similarity functions
│   ├── rag_engine.py     # Core RAG implementation
│   └── server.py         # FastAPI server
├── frontend/
│   └── index.html        # Web interface
├── requirements.txt      # Python dependencies
├── .env.sample           # Sample environment variables
└── README.md             # Documentation
```
- Python 3.11 or higher
- Gemini API key (available for free from Google AI Studio)
- Git (for version control)
- Clone the repository:
  ```bash
  git clone https://github.com/AndrewNgo-ini/agentic_rag.git
  cd agentic_rag
  ```
- Create and activate a virtual environment (recommended):
  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```
- Install Python dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  ```bash
  cp .env.sample .env
  ```
  Then edit `.env` with your configuration:
  ```
  GEMINI_API_KEY=your-api-key-here
  GEMINI_MODEL=gemini-2.0-flash              # or another compatible model
  GEMINI_EMBEDDING_MODEL=text-embedding-004  # or another compatible Gemini embedding model
  ```
- Start the backend server (a quick sanity check is sketched after these steps):
  ```bash
  cd backend
  python server.py
  ```
- Access the frontend:
  - Option 1: Open `frontend/index.html` directly in your web browser
  - Option 2: Serve it with Python's built-in server:
    ```bash
    cd frontend
    python -m http.server 3000
    ```
    Then visit http://localhost:3000 in your browser.
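With both pieces running, you can sanity-check the backend from the command line. The route and payload below are assumptions for illustration; since the backend is FastAPI, the interactive docs at http://localhost:8000/docs (enabled by default) list the actual endpoints:

```bash
# Hypothetical route and payload -- check backend/server.py for the real ones
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is Agentic RAG?"}'
```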
- Type your question in the input field
- The system will (see the pipeline sketch after this list):
  - Retrieve relevant context using embeddings
  - Generate a step-by-step reasoning process
  - Provide a final answer based on the context and reasoning
- View the intermediate steps and reasoning process in the response
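To make that pipeline concrete, here is a compact sketch in the spirit of `rag_engine.py`, using the `google-generativeai` SDK. The helper names, prompt wording, and embedding model are illustrative assumptions, not the project's exact code:

```python
import os
import numpy as np
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def embed(text: str) -> np.ndarray:
    """Embed text with the Gemini embedding API."""
    resp = genai.embed_content(model="models/text-embedding-004", content=text)
    return np.asarray(resp["embedding"])

def answer(question: str, chunks: list[str], k: int = 3) -> str:
    # 1. Retrieve: rank knowledge-base chunks by cosine similarity to the query.
    q = embed(question)
    sims = [float(np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)))
            for e in map(embed, chunks)]
    top = sorted(zip(sims, chunks), key=lambda p: p[0], reverse=True)[:k]
    context = "\n---\n".join(chunk for _, chunk in top)

    # 2 + 3. Chain-of-Thought prompt: reason step by step, then answer.
    model = genai.GenerativeModel(os.getenv("GEMINI_MODEL", "gemini-2.0-flash"))
    prompt = ("Answer using only the context below. First write out your "
              "step-by-step reasoning, then give a final answer.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return model.generate_content(prompt).text
```

In the full implementation, retrieval is also exposed to the model via function calling (see the feature list above), so the model can request additional context on demand rather than relying on a single upfront lookup.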
The application can be configured through environment variables:
| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Your Gemini API key | Required |
| `GEMINI_MODEL` | Gemini model to use | `gemini-2.0-flash` |
| `HOST` | Backend server host | `0.0.0.0` |
| `PORT` | Backend server port | `8000` |
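A typical way these values are loaded (a sketch; `python-dotenv` is an assumed dependency, and the project's actual handling may differ):

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # pull values from .env into the process environment

GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]                 # required; raises if missing
GEMINI_MODEL = os.getenv("GEMINI_MODEL", "gemini-2.0-flash")  # optional, with default
HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
```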
The application includes comprehensive error handling:
- API errors are logged and returned with appropriate status codes (see the sketch below)
- Frontend displays user-friendly error messages
- Detailed logging for debugging and monitoring
- Graceful fallbacks for common failure scenarios
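As an illustration of the first two points, a generic FastAPI catch-all handler (a pattern sketch, not the project's exact code):

```python
import logging
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

logger = logging.getLogger("agentic_rag")
app = FastAPI()

@app.exception_handler(Exception)
async def unhandled_error(request: Request, exc: Exception) -> JSONResponse:
    # Log the full traceback server-side; return a friendly message to the UI.
    logger.exception("Unhandled error on %s", request.url.path)
    return JSONResponse(status_code=500,
                        content={"error": "Something went wrong. Please try again."})
```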
This project is licensed under the MIT License - see the LICENSE file for details.
Common issues and solutions:
- **Gemini API Error**
  - Verify your API key is correct
  - Check your API usage limits
  - Ensure the model name is valid
- **Backend Connection Failed**
  - Confirm the backend server is running
  - Check the port is not in use
  - Verify firewall settings
- **Embedding Errors**
  - Ensure input text is not empty
  - Check for proper text encoding
  - Verify numpy installation
- Google, for the Gemini API and models
- The FastAPI framework
- Contributors and maintainers