# Agentic AI and RAG Demo

This notebook demonstrates two key AI concepts:

1. **Agentic AI**: Direct interaction with Large Language Models (LLMs) through an API, including function calling (tool use)
2. **RAG (Retrieval-Augmented Generation)**: AI responses grounded in a document knowledge base

---

## Part 1: Basic AI Agent

This section shows a simple AI agent that uses the OpenRouter API to interact with free LLMs.


### How the AI Agent Works:

1. **API Setup**: Uses the OpenRouter API to access multiple free LLMs
2. **Model Selection**: Choose from various free models (Grok, GPT-OSS, DeepSeek, etc.)
3. **Message Format**: Structures conversation with system and user messages
4. **API Call**: Sends request to OpenRouter with model, messages, and parameters
5. **Response Handling**: Parses and displays the AI's response

**Key Parameters:**
- `temperature`: Controls randomness (0.2 = more focused, 1.0 = more creative)
- `max_tokens`: Maximum length of the response, in tokens
- `model`: The specific LLM to use (free models available)
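
The message format and response handling described above can be sketched without any network call. The helper names `build_chat_payload` and `extract_reply` are hypothetical, introduced here only for illustration, and `sample_result` is a hand-written stand-in for a parsed response body rather than real API output.

```python
def build_chat_payload(model, user_text, system_text="You are a helpful assistant.",
                       temperature=0.2, max_tokens=4096):
    """Assemble the JSON body for a chat-completions request (no network call)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(result):
    """Return the assistant text from a parsed response, or None if it is missing."""
    choices = result.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


payload = build_chat_payload("openai/gpt-oss-20b:free", "Tell me something about AI in Africa.")
print(payload["messages"][1]["content"])

# Hand-written stand-in for a parsed response body, for illustration only
sample_result = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(extract_reply(sample_result))
```

Keeping payload construction separate from the HTTP call makes it easy to compare settings, for example the same question at `temperature=0.2` and `temperature=1.0`.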


In [1]:
import os
from dotenv import load_dotenv
import json
import requests
import textwrap


# Your OpenRouter API Key
load_dotenv()
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")


# Choose your model 
# Replace with any available model
MODEL = "x-ai/grok-4.1-fast:free"

'''
Free Models to use:
1. "openai/gpt-oss-20b:free" 
2. "tngtech/deepseek-r1t2-chimera:free"  
3. "google/gemma-3-27b-it:free"
4. "qwen/qwen3-coder:free"

'''

# Compose the conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me something about AI in Africa."}
]

# Send the request (the timeout keeps a stalled connection from hanging the cell)
try:
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {OPENROUTER_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": MODEL,
            "messages": messages,
            "temperature": 0.2,
            "max_tokens": 4096
        },
        timeout=60
    )
except requests.RequestException as exc:
    # Covers connection failures such as a dropped or refused connection
    print("Request failed:", exc)
else:
    # Parse and print
    if response.ok:
        result = response.json()
        result_text = result["choices"][0]["message"]["content"]
        print(f"AI Response:\n {textwrap.fill(result_text, width=100)}")
    else:
        print("Error:", response.status_code, response.text)


---

## Part 2: Function Calling (Tool Use) in Agentic AI

Function calling allows AI agents to use external tools and functions, making them more powerful and capable of performing actions beyond just text generation.

### How Function Calling Works:

1. **Define Functions**: Create functions that the AI can call (tools)
2. **Describe Functions**: Provide function schemas (name, description, parameters)
3. **AI Decides**: The LLM decides when and which function to call
4. **Execute Function**: Run the function with provided parameters
5. **Return Results**: Send function results back to the AI for final response
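
Steps 3 to 5 can be sketched offline. The snippet below is an illustration only: the `add` tool, the `run_tool_calls` helper, and the `simulated_reply` dictionary are all hypothetical, and the reply is hand-written to mimic the shape of an assistant message that requests a tool call.

```python
import json


def add(a, b):
    """Hypothetical tool: add two numbers."""
    return a + b


TOOLS = {"add": add}


def run_tool_calls(assistant_message, registry):
    """Execute each requested tool and return the 'tool' messages to send back."""
    tool_messages = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        # Arguments arrive as a JSON-encoded string, not as a dict
        args = json.loads(call["function"]["arguments"] or "{}")
        if name in registry:
            content = str(registry[name](**args))
        else:
            content = f"Error: unknown function {name}"
        tool_messages.append(
            {"role": "tool", "tool_call_id": call["id"], "content": content}
        )
    return tool_messages


# Hand-written stand-in for a model reply that asks for one tool call
simulated_reply = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'},
        }
    ],
}

tool_messages = run_tool_calls(simulated_reply, TOOLS)
print(tool_messages)
```

In a real exchange, these tool messages are appended to the conversation and sent back so the model can write its final answer.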

### Example Use Cases:
- Mathematical calculations
- Data lookups (databases, APIs)
- File operations
- External service integrations
- Custom business logic
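
For the mathematical-calculation use case, the expression string comes from the model, so it is worth limiting what can be evaluated. One possible approach, shown here as a sketch and not as the method this notebook uses, is to parse the expression with Python's `ast` module and allow only a fixed set of operators and functions. The name `safe_eval` and the allow-lists are assumptions made for this example; exponentiation is left out on purpose because very large powers can stall the interpreter.

```python
import ast
import math
import operator

# Allow-lists: anything not named here is rejected
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Mod: operator.mod}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "abs": abs, "round": round}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression):
    """Evaluate an arithmetic expression by walking its syntax tree."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("sqrt(144) + 25"))  # 37.0
```

Attribute access, imports, and calls to names outside the allow-list all raise `ValueError` instead of being executed.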

### Note on Free Models:
Some free models have limited or no function calling support. If you encounter errors:
- Try a different free model (e.g., "openai/gpt-oss-20b:free")
- Read the error message: the code below reports the status code and the full error text when a request fails
- Check the tool definitions: they follow the function calling format of the OpenRouter API


In [None]:
# Function Calling Demo: AI Agent with Tools

import os
from dotenv import load_dotenv
import json
import requests
import textwrap
from datetime import datetime
import math

load_dotenv()
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

# Use a model that supports function calling
# Note: Some free models may have limited or no function calling support
# If you get errors, try switching to a different model
MODEL = "x-ai/grok-4.1-fast:free"  
# Alternative free models to try:
# MODEL = "openai/gpt-oss-20b:free"
# MODEL = "google/gemma-3-27b-it:free"

# ========== Define Tools (Functions) ==========

def calculate(expression):
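    """
    Evaluates a mathematical expression with a restricted set of names.
    
    Args:
        expression: A mathematical expression as a string (e.g., "2 + 2", "sqrt(16)")
    
    Returns:
        The result of the calculation
    """
    try:
        # Restricted evaluation: only math functions and a few builtins are exposed.
        # eval() is still not a true sandbox, so treat this as demo-only code.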
        allowed_names = {
            k: v for k, v in math.__dict__.items() if not k.startswith("__")
        }
        allowed_names.update({"abs": abs, "round": round, "min": min, "max": max})
        result = eval(expression, {"__builtins__": {}}, allowed_names)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"

def get_current_time():
    """
    Gets the current date and time.
    
    Returns:
        Current date and time as a string
    """
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

def text_uppercase(text):
    """
    Converts text to uppercase.
    
    Args:
        text: The text to convert
    
    Returns:
        Uppercase version of the text
    """
    return text.upper()

def text_word_count(text):
    """
    Counts the number of words in a text.
    
    Args:
        text: The text to analyze
    
    Returns:
        Number of words in the text
    """
    words = text.split()
    return f"Word count: {len(words)}"

# Map function names to actual functions
available_functions = {
    "calculate": calculate,
    "get_current_time": get_current_time,
    "text_uppercase": text_uppercase,
    "text_word_count": text_word_count,
}

# ========== Define Function Schemas ==========
# These describe the functions to the AI model
# OpenRouter API requires tools to be wrapped with "type": "function"

tools = [
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Performs mathematical calculations. Supports basic operations (+, -, *, /) and math functions like sqrt, sin, cos, etc.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate (e.g., '2 + 2', 'sqrt(16)', 'sin(3.14/2)')"
                    }
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Gets the current date and time",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "text_uppercase",
            "description": "Converts text to uppercase letters",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The text to convert to uppercase"
                    }
                },
                "required": ["text"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "text_word_count",
            "description": "Counts the number of words in a given text",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The text to count words in"
                    }
                },
                "required": ["text"]
            }
        }
    }
]

# ========== Function Calling Handler ==========

def chat_with_functions(user_message, conversation_history=None):
    """
    Handles conversation with function calling support.
    """
    messages = conversation_history.copy() if conversation_history else []
    messages.append({"role": "user", "content": user_message})
    
    # First API call: Let AI decide if it needs to call a function
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {OPENROUTER_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": MODEL,
            "messages": messages,
            "tools": tools,  # Provide available functions
            "tool_choice": "auto",  # Let AI decide when to use tools
            "temperature": 0.2,
            "max_tokens": 4096
        }
    )
    
    if not response.ok:
        error_text = response.text
        # Provide helpful error message
        if "function" in error_text.lower() or "tool" in error_text.lower():
            error_msg = f"Error: {response.status_code}\n"
            error_msg += f"The model '{MODEL}' may not support function calling.\n"
            error_msg += "Try switching to a different model (e.g., 'openai/gpt-oss-20b:free')\n"
            error_msg += f"Full error: {error_text}"
            return error_msg, messages
        return f"Error: {response.status_code} - {error_text}", messages
    
    result = response.json()
    assistant_message = result["choices"][0]["message"]
    messages.append(assistant_message)
    
    # Check if AI wants to call a function
    # .get() also covers replies where "tool_calls" is present but empty or null
    if assistant_message.get("tool_calls"):
        print("🔧 AI wants to use a function!")
        
        # Execute each function call
        for tool_call in assistant_message["tool_calls"]:
            function_name = tool_call["function"]["name"]
            function_args = json.loads(tool_call["function"]["arguments"])
            
            print(f"  → Calling: {function_name}({function_args})")
            
            # Execute the function
            if function_name in available_functions:
                function_result = available_functions[function_name](**function_args)
                print(f"  ✓ Result: {function_result}")
                
                # Add function result to conversation
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call["id"],
                    "name": function_name,
                    "content": str(function_result)
                })
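            else:
                function_result = f"Error: function {function_name} not found"
                print(f"  ✗ Function {function_name} not found!")
                # Report the failure back to the model so the tool call does not go unanswered
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call["id"],
                    "name": function_name,
                    "content": function_result
                })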
        
        # Second API call: Get final response with function results
        print("\n🤖 Getting final response with function results...")
        response = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {OPENROUTER_API_KEY}",
                "Content-Type": "application/json"
            },
            json={
                "model": MODEL,
                "messages": messages,
                "tools": tools,
                "temperature": 0.2,
                "max_tokens": 4096
            }
        )
        
        if response.ok:
            result = response.json()
            final_message = result["choices"][0]["message"]["content"]
            messages.append({"role": "assistant", "content": final_message})
            return final_message, messages
        else:
            return f"Error: {response.status_code} - {response.text}", messages
    else:
        # No function calls needed, return direct response
        return assistant_message["content"], messages

# ========== Example Usage ==========

print("=" * 80)
print("Function Calling Demo")
print("=" * 80)

# Example 1: Mathematical calculation
print("\n📝 Example 1: Mathematical Calculation")
print("-" * 80)
query1 = "What is the square root of 144 plus 25?"
answer1, _ = chat_with_functions(query1)
print(f"\n💬 User: {query1}")
print(f"\n🤖 AI Response:\n{textwrap.fill(answer1, width=80)}")

# Example 2: Get current time
print("\n\n📝 Example 2: Get Current Time")
print("-" * 80)
query2 = "What time is it now?"
answer2, _ = chat_with_functions(query2)
print(f"\n💬 User: {query2}")
print(f"\n🤖 AI Response:\n{textwrap.fill(answer2, width=80)}")

# Example 3: Text processing
print("\n\n📝 Example 3: Text Processing")
print("-" * 80)
query3 = "Convert 'Hello World' to uppercase and count the words"
answer3, _ = chat_with_functions(query3)
print(f"\n💬 User: {query3}")
print(f"\n🤖 AI Response:\n{textwrap.fill(answer3, width=80)}")


### Try Your Own Function Calls!

Modify the query below to test different function calls. The AI will automatically decide which functions to use.


In [None]:
# Interactive Function Calling - Try your own queries!
your_query = "Calculate 15 * 8 and tell me what time it is"  # Modify this query

answer, conversation = chat_with_functions(your_query)
print(f"\n💬 User: {your_query}")
print(f"\n🤖 AI Response:\n{textwrap.fill(answer, width=80)}")


---

## Part 3: RAG (Retrieval-Augmented Generation) Implementation



This section implements RAG to query the ETIIAC_2025_forRAG.pdf document using free models.

## What is RAG?

**Retrieval-Augmented Generation (RAG)** is a technique that enhances AI responses by:
1. **Retrieving** relevant information from a knowledge base (your documents)
2. **Augmenting** the user's query with this retrieved context
3. **Generating** more accurate and context-aware answers

### RAG Workflow:
```
User Question → Search Vector Store → Retrieve Relevant Chunks → 
Combine with Question → Send to LLM → Get Context-Aware Answer
```

### Key Components:
- **Vector Store**: Stores document chunks as embeddings (numerical representations)
- **Embeddings**: Convert text into vectors that capture semantic meaning
- **Similarity Search**: Find the most relevant document chunks for a query
- **LLM Integration**: Use retrieved context to generate accurate answers
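
The "augment" step, where retrieved chunks are combined with the question, can be sketched as plain string assembly. The function name `build_rag_messages`, the prompt wording, and the sample chunks are assumptions made for this example; the `RAG` class used later in this notebook may build its prompt differently.

```python
def build_rag_messages(question, retrieved_chunks,
                       system_prompt="You are a helpful assistant."):
    """Combine retrieved chunks and the user question into chat messages."""
    # Number the chunks so an answer can point back to its sources
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    user_content = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Made-up chunks, for illustration only
chunks = ["Chunk about topic A.", "Chunk about topic B."]
rag_messages = build_rag_messages("What is topic A?", chunks)
print(rag_messages[1]["content"])
```

The resulting list has the same shape as the `messages` list from Part 1, so it can be sent to the same chat-completions endpoint.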


## Step 1: Create Vector Store from Documents

This step processes your PDF documents and creates a searchable vector store.

### What happens here:
1. **Load Documents**: Reads PDF files from the `docs` folder
2. **Chunk Text**: Splits documents into smaller, manageable pieces (500 tokens with 50 token overlap)
3. **Create Embeddings**: Converts each chunk into a vector using OpenAI's embedding model
4. **Save Vector Store**: Stores embeddings, text chunks, and metadata in a JSON file

### Why chunking?
- LLMs have token limits, so we break documents into smaller pieces
- Overlapping chunks ensure context isn't lost at boundaries
- Smaller chunks allow for more precise retrieval
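
A minimal sketch of overlapping chunks follows. It splits on whitespace and counts words as a stand-in for tokens, which is a simplification: the notebook's `VectorStore` class may count tokens differently. The name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=500, chunk_overlap=50):
    """Split text into word-based chunks that share `chunk_overlap` words."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_size must be positive and larger than chunk_overlap")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample_text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(sample_text, chunk_size=5, chunk_overlap=2):
    print(chunk)
```

Each chunk begins with the last two words of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.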

### Why embeddings?
- Embeddings convert text into numerical vectors
- Similar text has similar vectors (close in vector space)
- Enables semantic search (finding meaning, not just keywords)


In [None]:
# ---------------- Import the needed libraries ----------------
import os
from dotenv import load_dotenv
from VectorStore_v2 import VectorStore

# Your OpenAI API Key
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# ---------------- Example usage ----------------
kb_folder = r"docs"

vectorstore_path = "vectorstore"
vectorstore_name = "vector_store.json"

# ✅ Create folder only
os.makedirs(vectorstore_path, exist_ok=True)

# ✅ Join folder + file name
vector_store_path = os.path.join(vectorstore_path, vectorstore_name)

# Initialize Vector Store
store = VectorStore(api_key=OPENAI_API_KEY, chunk_size=500, chunk_overlap=50)

# Extract text, create and save Vector Store
store.exract_save_vector_store(store, kb_folder, vector_store_path)

## Step 2: Query the Knowledge Base with RAG

This step demonstrates how to ask questions and get answers based on your documents.

### What happens here:
1. **Load Vector Store**: Loads the previously created vector store from JSON
2. **Embed Query**: Converts your question into an embedding vector
3. **Similarity Search**: Finds the top-k most relevant document chunks using cosine similarity
4. **Retrieve Context**: Gets the actual text from the most similar chunks
5. **Generate Answer**: Sends query + context to the LLM for a context-aware response

### How similarity search works:
- **Cosine Similarity**: Measures the angle between query and document vectors
- **Top-k Retrieval**: Returns the k most similar chunks (here, k=3)
- **Context Assembly**: Combines retrieved chunks to provide comprehensive context
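
Cosine similarity and top-k retrieval can be sketched in pure Python. The three-dimensional vectors below are toy values chosen for readability; real embeddings have hundreds or thousands of dimensions. The function names are hypothetical and are not taken from `querykb_v2`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k (score, chunk) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors and labels, for illustration only
chunk_texts = ["chunk a", "chunk b", "chunk c"]
chunk_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query_vector = [1.0, 0.0, 0.0]

for score, chunk in top_k(query_vector, chunk_vectors, chunk_texts, k=2):
    print(f"{score:.3f}  {chunk}")
```

Because cosine similarity depends only on direction, a long chunk and a short chunk about the same topic can score equally well against a query.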

### Benefits:
- ✅ Answers are grounded in your actual documents
- ✅ Can cite specific sources (chunks used)
- ✅ Reduces hallucinations (AI making up information)
- ✅ Works with documents larger than LLM context windows


## Try Your Own Questions!

Modify the query below to ask questions about the ETIIAC_2025 document. The RAG system will:
- Find the most relevant sections from the document
- Use that context to generate an accurate answer
- Show you which chunks were used (for transparency)


In [None]:
# ---------------- Import the needed libraries ----------------
import os
from querykb_v2 import RAG
import textwrap

# ---------------- Example usage ----------------
vectorstore_path = "vectorstore"
vectorstore_name = "vector_store.json"

vectorstore_full_path = os.path.join(vectorstore_path, vectorstore_name)

rag = RAG(vector_store_path=vectorstore_full_path)


system_prompt = "You are a helpful assistant"
user_msg = "What is this document about?"

answer, used_context = rag.askAI(user_msg, system_prompt, k=3)

print("\nContext used:\n", used_context)
print("\n")
print(f"Reply Text:\n {textwrap.fill(answer, width=80)}")

## Summary

### What We've Demonstrated:

1. **Agentic AI**: Direct API interaction with LLMs using OpenRouter
   - Simple, stateless conversation
   - Access to multiple free models
   - Function calling, which lets the model invoke local tools (calculator, clock, text utilities)
   - No document context (general knowledge only)

2. **RAG System**: Enhanced AI with document knowledge
   - Document processing and chunking
   - Vector embeddings for semantic search
   - Context-aware question answering
   - Grounded in your specific documents

### Key Differences:

| Feature | Basic AI Agent | RAG System |
|---------|---------------|------------|
| Knowledge Source | Pre-trained model | Your documents |
| Accuracy | General knowledge | Document-specific |
| Hallucinations | Possible | Reduced |
| Context Window | Limited | Can handle large docs |
| Use Case | General Q&A | Domain-specific Q&A |

### Next Steps:
- Try different questions with the RAG system
- Experiment with different chunk sizes and overlap
- Adjust the `k` parameter (number of chunks retrieved)
- Try different free LLM models for comparison
