# RAG System Tutorial

This notebook explains how our RAG (Retrieval-Augmented Generation) system works, walking through each component and its role.

## 1. Introduction to RAG

RAG combines two steps:

1. **Retrieval**: Finding relevant information from a knowledge base
2. **Generation**: Using a language model to generate answers based on the retrieved information

Because the model answers from retrieved documents rather than from its training data alone, its responses can be more accurate and more current.

## 2. System Architecture

Our RAG system consists of several key components:

1. **Document Processor**: Handles loading and processing different types of documents
2. **Vector Store**: Manages the storage and retrieval of document embeddings
3. **RAG System**: Combines retrieval and generation
4. **User Interface**: Streamlit app for interacting with the system

Let's look at each component in detail.

## 3. Document Processing

The Document Processor (`document_processor.py`) handles:

- Loading different types of documents (PDF, TXT, web pages)
- Splitting documents into chunks
- Preparing text for embedding

Let's see how it works:

```python
from document_processor import DocumentProcessor

# Initialize the processor
processor = DocumentProcessor()

# Example: Process a text file
documents = processor.process_file("example.txt")
print(f"Number of chunks: {len(documents)}")
print(f"First chunk: {documents[0].page_content[:100]}...")
```
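
To make the chunk-splitting step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the code in `document_processor.py`; the function name `split_into_chunks` and its `chunk_size` and `overlap` parameters are assumptions chosen for illustration, and a real splitter would likely respect sentence or paragraph boundaries.

```python
def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_into_chunks(sample, chunk_size=10, overlap=3)
print(chunks)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so text near a boundary is never seen without some of its neighbors.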

## 4. Vector Store

The Vector Store (`vector_store.py`) manages:

- Document embeddings using Ollama
- Storage in the Qdrant vector database
- Similarity search for retrieval

Here's how it works:

```python
from vector_store import VectorStore

# Initialize the vector store
vector_store = VectorStore()

# Add documents
vector_store.add_documents(documents)

# Search for similar documents
results = vector_store.similarity_search("What is the main topic?")
print(f"Found {len(results)} relevant documents")
```
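
For intuition about what a similarity search does, the sketch below implements a brute-force, in-memory version. It is a hypothetical stand-in, not Qdrant and not the code in `vector_store.py`: the class `InMemoryVectorStore`, its methods, and the hand-written three-dimensional vectors are all assumptions. A real vector database uses indexes to avoid comparing the query against every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical toy store: keeps (vector, text) pairs and searches them all."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def search(self, query_vector: list[float], k: int = 2) -> list[tuple[float, str]]:
        # Score every stored vector, then keep the k highest scores.
        scored = [(cosine_similarity(query_vector, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "chunk about cats")
store.add([0.0, 1.0, 0.0], "chunk about databases")
store.add([0.9, 0.1, 0.0], "chunk about kittens")

top = store.search([1.0, 0.0, 0.0], k=2)
for score, text in top:
    print(f"{score:.3f}  {text}")
```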

## 5. RAG System

The RAG System (`rag_system.py`) combines:

- Document processing
- Vector storage and retrieval
- Language model generation

Let's see it in action:

```python
from rag_system import RAGSystem

# Initialize the RAG system
rag = RAGSystem()

# Add documents
rag.add_documents(["example.txt"])

# Ask a question
response = rag.query("What is the main topic?")
print(f"Answer: {response['answer']}")
print(f"Sources: {len(response['sources'])} relevant documents found")
```

## 6. User Interface

The Streamlit interface (`app.py`) provides:

- Document upload
- URL input
- Chat interface
- Source display

To run the interface:
```bash
streamlit run app.py
```

## 7. How It All Works Together

1. The user uploads documents or adds URLs
2. Documents are processed and split into chunks
3. Chunks are embedded and stored in Qdrant
4. When the user asks a question:
   - The system retrieves the relevant document chunks
   - The language model generates an answer from the retrieved context
   - The answer and its sources are displayed to the user
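
The question-answering step in the list above can be sketched as a single function. This is an illustration under assumptions, not the code in `rag_system.py`: `answer_question`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stubs stand in for Qdrant and the language model so the example runs on its own. The returned dictionary mirrors the `answer` and `sources` keys used earlier in this notebook.

```python
from typing import Callable


def answer_question(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> dict:
    """Retrieve context, build a prompt, and return the answer with its sources."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return {"answer": generate(prompt), "sources": chunks}


# Stubs so the sketch runs without a vector database or a language model.
def fake_retrieve(question: str) -> list[str]:
    return ["RAG grounds answers in retrieved documents."]


def fake_generate(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"


response = answer_question("What is RAG?", fake_retrieve, fake_generate)
print(response["answer"])
print(f"{len(response['sources'])} source(s)")
```

Passing the retriever and generator in as functions keeps the orchestration independent of any particular database or model.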

## 8. Key Concepts Explained

### Embeddings
- Numerical representations of text
- Capture semantic meaning
- Enable similarity search
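
To illustrate how a numerical representation enables similarity search, the sketch below uses word counts as a crude stand-in for an embedding. This is an assumption made for illustration: `toy_embed` only matches shared words, whereas a learned embedding model also places differently worded texts with similar meaning close together.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Map text to a sparse word-count vector (a toy, not a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = toy_embed("how do vector databases work")
related = toy_embed("vector databases store embeddings")
unrelated = toy_embed("the cat sat on the mat")

print(cosine(query, related) > cosine(query, unrelated))
# → True
```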

### Vector Database
- Stores document embeddings
- Enables fast similarity search
- Qdrant is our chosen solution

### Language Model
- Generates human-like text
- Uses retrieved context
- We use Ollama with Llama 2
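
As one possible way to call the model, the sketch below sends a prompt to Ollama's local HTTP endpoint using only the standard library. It assumes an Ollama server is running at its default address with the `llama2` model pulled; the helper names `build_payload` and `generate` are hypothetical, and this project may well reach Ollama through a wrapper library instead.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama2") -> dict:
    """Build the JSON body for a non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "llama2", timeout: float = 60.0) -> str:
    """Send the prompt to a local Ollama server and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

Calling `generate("Why is the sky blue?")` only works while the server is running; without it, `urlopen` raises `urllib.error.URLError`.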

## 9. Best Practices

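1. **Document Chunking**
   - Pick a chunk size small enough for the embedding model's input limit and large enough to hold a complete thought
   - Overlap adjacent chunks so that text near a boundary keeps some of its surrounding context
   - Split on natural boundaries (paragraphs, sentences) where possible to maintain context

2. **Prompt Engineering**
   - Give the model clear instructions, including what to do when the context lacks the answer
   - Format the retrieved context consistently and keep it separate from the question
   - Ask the model to attribute its answer to the sources

3. **System Design**
   - Keep components modular so each can be swapped or tested on its own
   - Define clear interfaces between components
   - Handle errors such as unreadable files or an unreachable model or database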
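
The prompt engineering points above can be sketched as a small template function. This is a hedged example, not the prompt used in `rag_system.py`: `build_prompt` and the exact wording of the instructions are assumptions, and the right phrasing depends on the model.

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Format (source, text) pairs as numbered context and ask for cited answers."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What does the vector store do?",
    [("example.txt", "The vector store holds document embeddings.")],
)
print(prompt)
```

Numbering each chunk gives the model a short handle to cite, which makes it easier to map an answer back to the documents shown in the UI.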

## 10. Next Steps

1. Try different document types
2. Experiment with chunk sizes
3. Modify the prompt template
4. Add more features to the UI
5. Explore different language models