# Complete Guide to Prompting Techniques

## 1. Basic Techniques

### 1.1 Clarity and Specificity
**Principle**: Be clear and specific rather than vague.

**❌ Vague prompt:**
```
Write about marketing
```

**✅ Specific prompt:**
```
Write an 800-word article on digital marketing strategies for small e-commerce businesses, focusing on social media, SEO, and email marketing.
```

### 1.2 Detailed Context
**Principle**: Provide relevant context to get more accurate responses.

**Example:**
```
Act as a human resources consultant with 10 years of experience. A 25-employee tech startup needs to implement a performance review system. Provide a detailed plan that takes into account the limited budget and the company's informal culture.
```

## 2. Structuring Techniques

### 2.1 Chain of Thought
**Principle**: Ask the AI to show its reasoning step by step.

**Example:**
```
Solve this problem step by step:
A store sells T-shirts at $15 each. If you buy 3 or more, there is a 20% discount. If you buy 5 or more, there is a 30% discount. How much would it cost to buy 7 T-shirts?

Think step by step:
1. Identify which discount applies
2. Calculate the discounted price
3. Calculate the total
```
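
When you ask for step-by-step reasoning, it helps to know the expected answer so you can check the model's work. The sketch below mirrors the three steps in the prompt. The function name `shirt_total` is made up for illustration, and it assumes the larger discount replaces the smaller one rather than stacking with it.

```python
def shirt_total(quantity: int, unit_price_cents: int = 1500) -> float:
    """Total price in dollars under the tiered discounts described in the prompt."""
    # Step 1: identify which discount applies (assumption: the largest tier wins, no stacking)
    if quantity >= 5:
        discount_pct = 30
    elif quantity >= 3:
        discount_pct = 20
    else:
        discount_pct = 0
    # Step 2: discounted price per shirt, kept in integer cents to avoid float drift
    discounted_cents = unit_price_cents * (100 - discount_pct) // 100
    # Step 3: total for the whole order, converted back to dollars
    return discounted_cents * quantity / 100


print(shirt_total(7))  # 73.5
```

Seven shirts qualify for the 30% tier, so each costs $10.50 and the order comes to $73.50.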

### 2.2 Few-Shot Learning
**Principle**: Provide examples to establish the desired pattern.

**Example:**
```
Convert these descriptions into e-commerce product format:

Example 1:
Input: "Red Nike sports shoes size 42"
Output: "Nike Running Shoes - Red | Size 42 | Ideal for running and training"

Example 2:
Input: "HP laptop 8GB RAM 256GB SSD"
Output: "HP Pavilion Laptop - 8GB RAM, 256GB SSD | Perfect for work and study"

Now convert:
Input: "Sony wireless noise-cancelling headphones"
```
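
If you reuse the same few-shot pattern often, you can assemble the prompt from a list of examples instead of typing it by hand. This is a minimal sketch; `build_few_shot_prompt` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    new_input: str,
) -> str:
    """Assemble the instruction, the worked examples, and the new input into one prompt."""
    parts = [instruction, ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts += [
            f"Example {i}:",
            f'Input: "{example_input}"',
            f'Output: "{example_output}"',
            "",
        ]
    # End with an open "Output:" so the model continues the established pattern
    parts += ["Now convert:", f'Input: "{new_input}"', "Output:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Convert these descriptions into e-commerce product format:",
    [
        ("Red Nike sports shoes size 42",
         "Nike Running Shoes - Red | Size 42 | Ideal for running and training"),
        ("HP laptop 8GB RAM 256GB SSD",
         "HP Pavilion Laptop - 8GB RAM, 256GB SSD | Perfect for work and study"),
    ],
    "Sony wireless noise-cancelling headphones",
)
print(prompt)
```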

### 2.3 Role Playing
**Principle**: Assign the AI a specific role to get more specialized responses.

**Example:**
```
Act as a professional chef specializing in Mediterranean cuisine. A client wants a 3-course menu for a romantic dinner using seasonal spring ingredients. The budget is moderate and one of the diners is vegetarian.
```

## 3. Advanced Techniques

### 3.1 Tree of Thoughts
**Principle**: Explore multiple lines of reasoning before reaching a conclusion.

**Example:**
```
I need to choose a platform for my online store. Analyze these options from multiple perspectives:

Options: Shopify, WooCommerce, Magento

Evaluate from these perspectives:
1. Technical perspective (ease of use, customization)
2. Financial perspective (upfront and recurring costs)
3. Growth perspective (scalability)
4. Marketing perspective (built-in tools)

For each perspective, analyze the pros and cons of each option, then synthesize a final recommendation.
```

### 3.2 Constitutional AI
**Principle**: Establish principles or rules that the response must follow.

**Example:**
```
Create a marketing plan for a fitness app following these principles:
1. It must be ethical and avoid manipulative tactics
2. It must be inclusive of all body types
3. It must focus on overall well-being, not just appearance
4. It must be honest about expected results
5. It must promote sustainable long-term habits
```

### 3.3 Prompt Chaining
**Principle**: Break complex tasks into sequential steps.

**Example:**
```
Step 1: Analyze current trends in the mental health mobile app market
Step 2: Based on the previous analysis, identify 3 specific niches with potential
Step 3: For the most promising niche, develop an app concept
Step 4: Create a launch plan for that app
```
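
In practice, chaining usually means sending each step as its own prompt and passing the previous answer forward. The sketch below shows that loop. `call_llm` is a hypothetical placeholder for whatever client function you use to query a model, and `fake_llm` is a stand-in so the example runs without one.

```python
from typing import Callable, List


def run_chain(steps: List[str], call_llm: Callable[[str], str]) -> List[str]:
    """Run the prompts in order, feeding each answer into the next prompt."""
    outputs: List[str] = []
    previous = ""
    for step in steps:
        if previous:
            # Later steps see the previous answer as context
            prompt = f"Previous result:\n{previous}\n\nTask: {step}"
        else:
            prompt = step
        previous = call_llm(prompt)
        outputs.append(previous)
    return outputs


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical): reports the prompt size."""
    return f"[answer to a {len(prompt)}-character prompt]"


steps = [
    "Analyze current trends in the mental health mobile app market",
    "Identify 3 specific niches with potential",
    "Develop an app concept for the most promising niche",
    "Create a launch plan for that app",
]
for answer in run_chain(steps, fake_llm):
    print(answer)
```

Keeping each step in a separate call makes it easier to inspect or correct an intermediate answer before it influences the next one.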

## 4. Output Control Techniques

### 4.1 Specific Format
**Principle**: Specify exactly how the response should be structured.

**Example:**
```
Analyze the content strategy of a sustainable clothing brand and present your analysis in this format:

## Content Strategy Analysis

**Brand:** [Name]
**Target audience:** [Description]

### Strengths
- [List of 3-4 points]

### Weaknesses
- [List of 3-4 points]

### Recommendations
1. **Short term (1-3 months):** [Specific actions]
2. **Medium term (3-6 months):** [Specific actions]
3. **Long term (6+ months):** [Specific actions]

### Success metrics
- [3-4 specific KPIs]
```

### 4.2 Length and Tone Constraints
**Example:**
```
Explain the concept of artificial intelligence in exactly 100 words, using a conversational tone as if you were talking to a 15-year-old. Avoid technical jargon and use simple analogies.
```

### 4.3 JSON/Structured Format
**Example:**
```
Analyze this product and return the result in JSON format:

{
  "producto": "[product name]",
  "categoria": "[main category]",
  "puntos_fuertes": ["[list of strengths]"],
  "puntos_debiles": ["[list of weaknesses]"],
  "precio_sugerido": "[price range]",
  "mercado_objetivo": "[description of the target audience]",
  "puntuacion_viabilidad": "[1-10]"
}
```
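
Asking for JSON pays off when the reply is consumed by code, and in that case it is worth validating the reply before trusting it. A minimal sketch follows, assuming the key names from the prompt above. `parse_product_analysis` and the sample reply are illustrative, not output from a real model.

```python
import json

REQUIRED_KEYS = {
    "producto", "categoria", "puntos_fuertes", "puntos_debiles",
    "precio_sugerido", "mercado_objetivo", "puntuacion_viabilidad",
}


def parse_product_analysis(raw: str) -> dict:
    """Parse a model reply and check that it matches the requested JSON shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = int(data["puntuacion_viabilidad"])  # accepts 7 or "7"
    if not 1 <= score <= 10:
        raise ValueError("puntuacion_viabilidad must be between 1 and 10")
    data["puntuacion_viabilidad"] = score
    return data


# Hand-written sample reply, used only to exercise the parser
sample_reply = json.dumps({
    "producto": "Reusable water bottle",
    "categoria": "Outdoor",
    "puntos_fuertes": ["durable"],
    "puntos_debiles": ["heavy"],
    "precio_sugerido": "15-25 USD",
    "mercado_objetivo": "Hikers",
    "puntuacion_viabilidad": "7",
})
analysis = parse_product_analysis(sample_reply)
print(analysis["puntuacion_viabilidad"])  # 7
```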

## 5. Optimization Techniques

### 5.1 Negative Prompts
**Principle**: Specify what NOT to do to avoid unwanted results.

**Example:**
```
Write an article about investing for beginners.

Do NOT include:
- Specific advice on stocks or cryptocurrencies
- Promises of guaranteed returns
- Complex financial jargon left unexplained
- Recommendations for specific financial products

DO include:
- Basic concepts explained clearly
- General investing principles
- Warnings about risks
- Practical steps to get started
```

### 5.2 Iteration and Refinement
**Example of an iterative process:**
```
First iteration:
"Create a content plan for social media"

Second iteration:
"The previous plan is too generic. Tailor it to a local artisan bakery, focusing on Instagram and Facebook, with 3 posts per week for 1 month"

Third iteration:
"Perfect. Now include the best times to post and specific hashtags for each post"
```

### 5.3 Meta-Prompting
**Principle**: Ask the AI to help improve the prompt.

**Example:**
```
I want to create a prompt to generate content ideas for my travel blog. My current prompt is: "Give me ideas for my travel blog"

How can I improve this prompt to get more specific and useful ideas? Suggest an improved version and explain why it would be more effective.
```

## 6. Techniques for Specific Cases

### 6.1 Analysis and Synthesis
```
Act as a market analyst. Analyze the following information about the food delivery industry and synthesize it:

[Data/information]

Structure your analysis as:
1. Main trends (3-4 key points)
2. Opportunities identified
3. Risks and challenges
4. Strategic recommendation (1 paragraph)
```

### 6.2 Guided Creativity
```
Generate 5 creative concepts for an advertising campaign for an organic coffee brand. For each concept include:

- Campaign title
- Core concept in 1 line
- Main channel (digital, traditional, experiential)
- Specific target audience
- Differentiating element

Criteria: It must be authentic, sustainable, and memorable.
```

### 6.3 Problem Solving
```
Problem: A small software company has high employee turnover (40% per year).

Using the "5 Whys" method, analyze the possible root causes and then propose 3 specific, actionable solutions, prioritized by impact and ease of implementation.
```

## 7. Best Practices

### 7.1 Checklist for a Good Prompt
- [ ] Clear context established
- [ ] Specific objective defined
- [ ] Output format specified
- [ ] Examples included (if needed)
- [ ] Constraints and limitations stated
- [ ] Success criteria established

### 7.2 Common Mistakes to Avoid
1. **Being too vague:** "Help me with marketing"
2. **Giving no context:** Assuming the AI knows your specific situation
3. **Asking for too much in a single prompt:** Trying to solve several complex problems at once
4. **Not specifying a format:** Letting the AI decide how to structure the response
5. **Not iterating:** Settling for the first response without refining it

### 7.3 Tips for More Effective Prompts
- **Use specific action verbs:** "Analyze", "Compare", "Design", "Evaluate"
- **Include metrics when relevant:** "Increase engagement by 25%"
- **Specify the level of detail:** "Basic explanation", "In-depth analysis"
- **Mention the target audience:** "For executives", "For beginners"
- **Set a deadline or urgency if relevant:** "To be implemented next week"

## 8. Useful Templates

### Template for Competitor Analysis
```
Analyze [COMPETITOR] in the [INDUSTRY] industry, considering:

**Context:** [Your company/situation]
**Objective:** [What you want to achieve with this analysis]

**Areas to analyze:**
1. Value proposition
2. Pricing strategy
3. Distribution channels
4. Marketing strategy
5. Strengths and weaknesses

**Delivery format:** [Specify desired format]
**Next steps:** Include 3 actionable recommendations
```

### Template for Brainstorming
```
I need to generate ideas for [PROBLEM/CHALLENGE].

**Context:**
- Industry: [industry]
- Audience: [description]
- Budget: [range]
- Constraints: [limitations]

**Type of ideas sought:** [specific, creative, practical, etc.]
**Quantity:** [number of ideas]
**Format:** For each idea include [title, description, pros/cons, difficulty level]
```

### Template for Problem Solving
```
**Problem:** [Clear description of the problem]
**Impact:** [How it affects the business/project]
**Context:** [Relevant information]
**Available resources:** [Time, budget, team]
**Constraints:** [Limitations]

**Requested process:**
1. Root cause analysis
2. 3-5 solution options
3. Evaluation of each option (pros/cons/resources)
4. Final recommendation with an implementation plan
```
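
Templates like these are easy to fill programmatically. The sketch below uses Python's standard `string.Template` on an English rendering of the problem-solving template. The names `PROBLEM_TEMPLATE` and `fill_problem_prompt`, and the sample field values, are illustrative assumptions.

```python
from string import Template

# English rendering of the problem-solving template; $-placeholders mark the fields to fill
PROBLEM_TEMPLATE = Template(
    "**Problem:** $problem\n"
    "**Impact:** $impact\n"
    "**Context:** $context\n"
    "**Available resources:** $resources\n"
    "**Constraints:** $constraints\n"
    "\n"
    "**Requested process:**\n"
    "1. Root cause analysis\n"
    "2. 3-5 solution options\n"
    "3. Evaluation of each option (pros/cons/resources)\n"
    "4. Final recommendation with an implementation plan\n"
)


def fill_problem_prompt(**fields: str) -> str:
    """Return the filled prompt; raises KeyError if any placeholder is left unfilled."""
    return PROBLEM_TEMPLATE.substitute(**fields)


prompt = fill_problem_prompt(
    problem="High employee turnover (40% per year)",
    impact="Rising hiring costs and delayed releases",
    context="Small software company",
    resources="One HR generalist, modest budget",
    constraints="No salary increases this quarter",
)
print(prompt)
```

Using `substitute` rather than `safe_substitute` is deliberate here: a forgotten field fails loudly instead of sending a prompt with a stray placeholder.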

---


## 🧠 What Is RAG and Why Should You Care?

**RAG (Retrieval-Augmented Generation)** is a technique for grounding a language model's answers in your own documents. Let's break it down:

| Component | What It Does | Why It Matters |
|-----------|--------------|----------------|
| **Retrieval** | Finds relevant information from your documents | Ensures answers come from *your* data, not just the AI's training |
| **Augmentation** | Enhances the AI's knowledge with this specific information | Makes responses accurate and up-to-date |
| **Generation** | Creates human-like responses using the retrieved information | Delivers insights in natural, easy-to-understand language |

<div style="background-color: #effaf5; border: 1px solid #0d9488; padding: 15px; margin: 20px 0; border-radius: 5px;">
<h4 style="color: #000000; margin-top: 0;">💡 Real-World Analogy</h4>
<p style="color: #000000;">Think of RAG as the difference between:</p>
<ul style="color: #000000;">
<li><strong>A general knowledge expert</strong> who studied years ago (standard LLM)</li>
<li><strong>A specialist with your documents open</strong> in front of them, referencing exact paragraphs as they answer your questions (RAG system)</li>
</ul>
</div>

## 🛠️ Our Exciting Toolkit

We'll be using several cutting-edge tools to build our RAG system:

| Tool | What It Is | Why It's Amazing |
|------|------------|------------------|
| **Ollama** | An open-source platform that runs AI models locally on your computer | Privacy (your data never leaves your machine), no API costs, and complete control |
| **ChromaDB** | A specialized database for storing and searching "vector embeddings" | Lightning-fast semantic search that understands meaning, not just keywords |
| **LangChain** | A framework that connects AI components together like building blocks | Makes complex AI workflows simple and customizable |
| **Gradio** | A tool for creating web interfaces for AI models | Turns your code into a professional-looking application in minutes |

```python
# Uncomment to install the dependencies from inside the notebook
#!pip install langchain langchain_ollama gradio chromadb pypdf langchain_community
```

## 🎯 What We'll Build Together

By the end of this tutorial, you'll have created:

```
📄 Documents → 🔪 Chunker → 🧮 Vector DB → 🔍 Retriever → 🤖 LLM → 💬 Answer
```

A complete RAG system that can:

1. **Process PDF documents** of your choice
2. **Break them into smart chunks** that preserve meaning
3. **Transform text into vectors** that capture semantic meaning
4. **Store everything efficiently** for lightning-fast retrieval
5. **Find the most relevant information** for any question
6. **Generate accurate, helpful responses** with proper citations

<div style="background-color: #ffe4e6; border-left: 6px solid #be123c; padding: 15px; margin: 20px 0; border-radius: 5px;">
<h3 style="color: #000000; margin-top: 0;">🔥 Why This Matters For Your Career</h3>
<p style="color: #000000;">RAG systems are at the forefront of practical AI applications. At MAIA Academy, we've seen how companies are rapidly adopting this technology to:</p>
<ul style="color: #000000;">
<li>Build intelligent document assistants</li>
<li>Create knowledge bases that actually answer questions</li>
<li>Develop customer support systems that handle complex queries</li>
<li>Implement research tools that synthesize information from multiple sources</li>
</ul>
<p style="color: #000000;">The skills you'll learn today are directly transferable to real-world AI projects and align perfectly with our <strong>Foundations of AI Development</strong> and <strong>Deep Learning & LLMs</strong> modules!</p>
</div>

```python
# Standard imports
import os
import logging
import time
import sys
import tempfile
from typing import List, Dict, Any

# LangChain imports
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import Chroma  # community package, same as the loader above
from langchain.prompts import PromptTemplate
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Gradio for web interface
import gradio as gr

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
```


## 2. Setting the Stage: Configuration

<div style="background-color: #2d333b; padding: 20px; border-radius: 8px; margin-bottom: 20px; border-left: 6px solid #58a6ff;">
  <h3 style="color: #ffffff; margin-top: 0;">System Configuration Parameters</h3>
  <p style="color: #ffffff;">Before we build our RAG system, we need to configure some important settings—like tuning a new instrument before a performance. These parameters will determine how our system processes and interacts with documents.</p>
</div>

<table style="width: 100%; border-collapse: collapse; margin: 20px 0; background-color: #22272e;">
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56; width: 200px;"><strong style="color: #58a6ff;">PERSIST_DIRECTORY</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">Where we'll store our "data safe" (the vector database) on disk. This allows our system to remember what it learned even after restarting.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">CHUNK_SIZE</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">How big each text piece will be (in characters). This affects how much context the AI has when answering questions.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">CHUNK_OVERLAP</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">How much the pieces overlap to maintain context between chunks and ensure no information is lost at the boundaries.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">PDF_URLS</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The documents we'll use as our knowledge base (our "reference library"). These are the sources the system will learn from.</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">LLM_MODEL</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The "brain" that processes the context and generates answers (like llama3 or other models available in Ollama).</td>
  </tr>
  <tr>
    <td style="padding: 15px; border: 1px solid #444c56;"><strong style="color: #58a6ff;">EMBEDDING_MODEL</strong></td>
    <td style="padding: 15px; border: 1px solid #444c56; color: #adbac7;">The "translator" that converts text into numerical vectors that capture meaning. Different models balance between speed and accuracy.</td>
  </tr>
</table>

<div style="background-color: #2d333b; border: 1px solid #444c56; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h3 style="color: #58a6ff; margin-top: 0;">💡 What's This Chunk Stuff?</h3>
  <p style="color: #adbac7;">Think of cutting a big sandwich. If the pieces are huge, you get more filling but it's hard to bite. If they're tiny, you bite easy but might miss the full flavor. Overlap is like leaving a bit of the last bite on the next one so you don't lose track of the overall taste.</p>
  
  <div style="display: flex; justify-content: space-between; margin-top: 20px; text-align: center;">
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Large Chunks (2000+)</strong></p>
      <p style="color: #adbac7;">✅ More context<br>✅ Better for complex topics<br>❌ Less precise retrieval<br>❌ Slower processing</p>
    </div>
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Medium Chunks (800-1200)</strong></p>
      <p style="color: #adbac7;">✅ Balanced approach<br>✅ Good for most cases<br>✅ Reasonable speed<br>✅ Decent precision</p>
    </div>
    <div style="flex: 1; margin: 0 10px;">
      <p style="color: #58a6ff;"><strong>Small Chunks (300-500)</strong></p>
      <p style="color: #adbac7;">✅ Very precise retrieval<br>✅ Fast processing<br>❌ Limited context<br>❌ May miss broader concepts</p>
    </div>
  </div>
</div>
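
To make chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification for illustration: `chunk_text` is a hypothetical helper that cuts purely by character position, whereas the splitter used later in this tutorial also tries to respect paragraph and sentence boundaries.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the one before it, which is the "bit of the last bite" from the sandwich analogy.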

<div style="text-align: center; background-color: #22272e; padding: 10px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #ff7b72; font-weight: bold;">⚠️ Warning</p>
  <p style="color: #adbac7;">This notebook is designed to be read with a dark background. If you program with a white background, just know that you're a complete psychopath and a danger to society.</p>
</div>

```python
# Configuration parameters
PERSIST_DIRECTORY = "chroma_db"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
PDF_URLS = [
    "https://www.ine.es/daco/daco42/ecp/ecp0123.pdf",
    "https://fundacionalternativas.org/wp-content/uploads/2023/10/PERSONAS_MIGRANTES_v02.pdf"
]
LLM_MODEL = "llama3.2:1b"  # Using a small Llama model for faster responses
EMBEDDING_MODEL = "all-minilm"  # Small, fast embedding model (22M parameters)
TEMPERATURE = 0.1  # Lower temperature for more deterministic outputs
```

## 3. The Heart of It: RAGSystem Class

Now let's create the main "robot" that does all the work: the `RAGSystem` class. Its constructor prepares the tools it needs: the LLM (with streaming output) and the embedding model.

```python
class RAGSystem:
    """Holds the documents, vector store, LLM, and chain for the RAG pipeline."""

    def __init__(self, pdf_urls: List[str], persist_directory: str = PERSIST_DIRECTORY):
        self.pdf_urls = pdf_urls
        self.persist_directory = persist_directory
        self.documents = []      # filled by load_documents()
        self.vectorstore = None  # filled by create_vectorstore()
        self.chain = None        # built once the vector store exists

        # Initialize the LLM with streaming capability
        callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
        self.llm = ChatOllama(
            model=LLM_MODEL,
            temperature=TEMPERATURE,
            callback_manager=callback_manager
        )

        # Initialize embeddings
        self.embeddings = OllamaEmbeddings(model=EMBEDDING_MODEL)

        logger.info(f"Initialized RAG system with {len(pdf_urls)} PDFs")
```

## 3. Building Our RAG System

<div style="background-color: #2d333b; padding: 5px; border-radius: 4px; margin-bottom: 10px;">
  <h3 style="color: #58a6ff; margin: 10px;">3.1 Loading and Chopping Documents</h3>
</div>

<div style="background-color: #22272e; padding: 15px; border-radius: 8px; border-left: 4px solid #7ee787; margin-bottom: 20px;">
  <p style="color: #adbac7;">The first crucial step in our RAG pipeline is to read PDFs and slice them into manageable chunks. This process transforms raw documents into pieces our system can effectively process.</p>
</div>

<div style="background-color: #2d333b; border: 1px solid #444c56; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h4 style="color: #7ee787; margin-top: 0;">📚 Why Do We Chop?</h4>
  
  <p style="color: #adbac7;">Imagine a huge cake: you can't eat it all at once, so you cut it into slices. Same with documents:</p>
  
  <ul style="color: #adbac7; margin-left: 20px;">
    <li><strong style="color: #d2a8ff;">AI has a "small tummy"</strong> (a context window that limits how much text it can process at once)</li>
    <li><strong style="color: #d2a8ff;">Small chunks help find exact answers fast</strong> (better retrieval precision)</li>
    <li><strong style="color: #d2a8ff;">It makes the system quicker and less likely to choke</strong> (more efficient processing)</li>
  </ul>
  
  <div style="margin-top: 25px; background-color: #22272e; padding: 15px; border-radius: 5px; border: 1px dashed #444c56;">
    <h5 style="color: #58a6ff; margin-top: 0;">🔍 Technical Insight: The Chunking Process</h5>
    <p style="color: #adbac7;">Our system uses a <code style="background-color: #2d333b; padding: 2px 5px; border-radius: 3px; color: #ff7b72;">RecursiveCharacterTextSplitter</code> that intelligently divides text based on:</p>
    <ul style="color: #adbac7;">
      <li>Natural boundaries (paragraphs, sentences)</li>
      <li>Configured chunk size (how many characters per chunk)</li>
      <li>Strategic overlap to maintain context between chunks</li>
    </ul>
    <p style="color: #adbac7;">This ensures that each chunk contains coherent, meaningful information rather than arbitrary text divisions.</p>
  </div>
</div>
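
The idea behind recursive splitting can be sketched in a few lines of plain Python. This is an illustrative approximation under stated assumptions, not LangChain's actual implementation: `recursive_split` is a hypothetical function, it ignores overlap, and it drops the separator wherever it cuts.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No usable separator left: fall back to hard cuts by character count
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small neighbours into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Still too big: retry this piece with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "Short intro.\n\nA much longer paragraph that will not fit in one chunk. It has two sentences.\n\nEnd."
for chunk in recursive_split(sample, chunk_size=40, separators=["\n\n", ". ", " "]):
    print(repr(chunk))
```

Paragraph breaks are tried first, then sentence ends, then spaces, so a chunk is only cut mid-sentence when nothing coarser fits within the size limit.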

<div style="display: flex; background-color: #22272e; border-radius: 8px; overflow: hidden; margin: 20px 0;">
  <div style="flex: 1; padding: 15px; border-right: 1px solid #444c56;">
    <p style="color: #58a6ff; font-weight: bold; margin-top: 0;">Document Loading</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Reading PDFs using PyPDFLoader</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Extracting text and metadata</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Handling multiple documents</p>
  </div>
  <div style="flex: 1; padding: 15px; border-right: 1px solid #444c56;">
    <p style="color: #58a6ff; font-weight: bold; margin-top: 0;">Document Chunking</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Splitting into smaller pieces</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Maintaining logical boundaries</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Creating overlapping sections</p>
  </div>
  <div style="flex: 1; padding: 15px;">
    <p style="color: #58a6ff; font-weight: bold; margin-top: 0;">Result</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Dozens or hundreds of chunks</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Each ~1000 characters long</p>
    <p style="color: #adbac7; margin-bottom: 0;">👉 Ready for embedding creation</p>
  </div>
</div>

<div style="background-color: #2d333b; border-left: 4px solid #f97583; padding: 15px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #adbac7; margin: 0;"><strong style="color: #f97583;">⚠️ Common Pitfall:</strong> Setting your chunk size too small (under 300 characters) or too large (over 2000 characters) can severely impact your system's performance. Start with ~1000 and adjust based on your specific documents and query needs.</p>
</div>

```python
# Intended as a method of RAGSystem (note the `self` parameter)
def load_documents(self) -> None:
    """Load and split PDF documents"""
    logger.info("Loading and processing PDFs...")

    # Text splitter for chunking documents; separators are tried from coarsest to finest
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
        separators=["\n\n", "\n", ". ", " ", ""]
    )

    all_pages = []
    for url in self.pdf_urls:
        try:
            loader = PyPDFLoader(url)
            pages = loader.load()
            logger.info(f"Loaded {len(pages)} pages from {url}")
            all_pages.extend(pages)
        except Exception as e:
            # A failed download or parse is logged and skipped so the other PDFs still load
            logger.error(f"Error loading PDF from {url}: {e}")

    # Split the documents into chunks
    self.documents = text_splitter.split_documents(all_pages)
    logger.info(f"Created {len(self.documents)} document chunks")
```

<div style="background-color: #2d333b; padding: 5px; border-radius: 4px; margin-bottom: 10px;">
  <h3 style="color: #58a6ff; margin: 10px;">3.2 Storing in a Vector Database</h3>
</div>

<div style="background-color: #22272e; padding: 15px; border-radius: 8px; border-left: 4px solid #f0883e; margin-bottom: 20px;">
  <p style="color: #adbac7;">After chunking our documents, we need to store them in a way that allows for intelligent searching. This is where vectors and ChromaDB come into play.</p>
</div>

<div style="background-color: #2d333b; border: 1px solid #444c56; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h4 style="color: #f0883e; margin-top: 0;">🧮 What Are Vectors?</h4>
  
  <p style="color: #adbac7;">Think of each chunk as a person, and we give it a unique "fingerprint" based on what it says. These fingerprints are actually lists of numbers that capture meaning.</p>
  
  <div style="display: flex; margin-top: 20px; background-color: #22272e; padding: 15px; border-radius: 8px;">
    <div style="flex: 1; padding-right: 15px;">
      <p style="color: #adbac7; font-style: italic; margin-top: 0;">"I like the sun"</p>
      <p style="color: #d2a8ff; font-family: monospace; font-size: 0.9em;">[0.12, -0.33, 0.65, ...]</p>
    </div>
    <div style="flex: 1; padding-left: 15px; border-left: 1px dashed #444c56;">
      <p style="color: #adbac7; font-style: italic; margin-top: 0;">"I love the heat"</p>
      <p style="color: #d2a8ff; font-family: monospace; font-size: 0.9em;">[0.15, -0.28, 0.61, ...]</p>
    </div>
  </div>
  
  <p style="color: #adbac7; margin-top: 20px;">These sentences get similar vector "fingerprints" because they express similar concepts. This lets us search by <strong>meaning</strong>, not just exact words.</p>
</div>
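
"Similar fingerprints" is usually measured with cosine similarity, which compares the direction of two vectors. The sketch below applies it to the truncated three-number vectors shown above. Real embeddings have hundreds of dimensions, and the `unrelated` vector is made up for contrast.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sun = [0.12, -0.33, 0.65]         # "I like the sun" (first three values only)
heat = [0.15, -0.28, 0.61]        # "I love the heat" (first three values only)
unrelated = [-0.70, 0.10, -0.05]  # made-up vector standing in for an unrelated sentence

print(round(cosine_similarity(sun, heat), 3))
print(round(cosine_similarity(sun, unrelated), 3))
```

The first score comes out close to 1 and the second is negative, which is how a vector database tells related chunks from unrelated ones.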

<div style="display: flex; background-color: #22272e; border-radius: 8px; overflow: hidden; margin: 20px 0; border: 1px solid #444c56;">
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center; border-right: 1px solid #444c56;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">1</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Convert</p>
    <p style="color: #adbac7; margin: 5px 0;">Text → Vector</p>
  </div>
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center; border-right: 1px solid #444c56;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">2</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Store</p>
    <p style="color: #adbac7; margin: 5px 0;">In ChromaDB</p>
  </div>
  <div style="flex: 1; padding: 15px; display: flex; flex-direction: column; align-items: center; text-align: center;">
    <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
      <span style="color: #58a6ff; font-weight: bold; font-size: 1.5em;">3</span>
    </div>
    <p style="color: #f0883e; font-weight: bold; margin: 5px 0;">Retrieve</p>
    <p style="color: #adbac7; margin: 5px 0;">By Similarity</p>
  </div>
</div>

<div style="background-color: #22272e; padding: 20px; border-radius: 8px; margin: 20px 0; border: 1px solid #444c56;">
  <h4 style="color: #58a6ff; margin-top: 0;">In Plain English:</h4>
  
  <p style="color: #adbac7;">1. <strong>We transform text into numbers</strong> using the embedding model (all-minilm)</p>
  <p style="color: #adbac7;">2. <strong>We store these numbers in ChromaDB</strong> along with the original text</p>
  <p style="color: #adbac7;">3. <strong>When you ask a question</strong>, we convert your question to a vector too</p>
  <p style="color: #adbac7;">4. <strong>ChromaDB finds chunks with similar vectors</strong> to your question</p>
  <p style="color: #adbac7;">5. <strong>These similar chunks</strong> likely contain the answer you need</p>
</div>
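
The five steps above can be sketched with a tiny in-memory store. Everything here is a stand-in for illustration: `toy_embed` uses plain word counts instead of a real embedding model, so it only matches shared words, and `ToyVectorStore` is a hypothetical class, not ChromaDB's API.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Stand-in for an embedding model: a sparse vector of word counts."""
    return {word: float(count) for word, count in Counter(text.lower().split()).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.rows: List[Tuple[Dict[str, float], str]] = []

    def add(self, chunks: List[str]) -> None:
        # Steps 1-2: convert each chunk to a vector and store it with the original text
        for chunk in chunks:
            self.rows.append((toy_embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> List[str]:
        # Step 3: embed the question; steps 4-5: return the k most similar chunks
        query_vector = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


store = ToyVectorStore()
store.add([
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat chased the mouse",
])
print(store.search("where is the cat", k=1))  # ['the cat sat on the mat']
```

Swapping `toy_embed` for a real embedding model is what lets the search match by meaning rather than by shared words.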

<div style="display: flex; background-color: #22272e; border-radius: 8px; margin: 20px 0;">
  <div style="flex: 1; padding: 20px;">
    <h5 style="color: #7ee787; margin-top: 0;">💡 Why This Is Cool</h5>
    <ul style="color: #adbac7; list-style-type: none; padding-left: 0;">
      <li style="margin-bottom: 8px;">✅ <strong>Finds similar concepts</strong>, even with different words</li>
      <li style="margin-bottom: 8px;">✅ <strong>Lightning-fast search</strong> of large document collections</li>
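      <li style="margin-bottom: 8px;">✅ <strong>Can work across languages</strong> when the embedding model is multilingual (Spanish "sol" ≈ English "sun")</li>
      <li>✅ <strong>Often more relevant</strong> than keyword search when the question is worded differently from the source</li>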
    </ul>
  </div>
</div>

<div style="background-color: #2d333b; border-left: 4px solid #d2a8ff; padding: 15px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #adbac7; margin: 0;"><strong style="color: #d2a8ff;">🚀 Pro Tip:</strong> Think of it like searching a music library - you find songs that "sound similar" to the one you like, not just songs with the exact same title.</p>
</div>

In [6]:
def create_vectorstore(self) -> None:
    """Create a fresh vector database"""
    # Remove any existing database
    if os.path.exists(self.persist_directory):
        import shutil
        logger.info(f"Removing existing vectorstore at {self.persist_directory}")
        shutil.rmtree(self.persist_directory, ignore_errors=True)
    
    # Create a new vectorstore
    logger.info("Creating new vectorstore...")
    if not self.documents:
        self.load_documents()
    
    # Create a temporary directory for the database
    # This helps avoid permission issues on some systems
    temp_dir = tempfile.mkdtemp()
    logger.info(f"Using temporary directory for initial database creation: {temp_dir}")
    
    try:
        # First create in temp directory
        self.vectorstore = Chroma.from_documents(
            documents=self.documents,
            embedding=self.embeddings,
            persist_directory=temp_dir
        )
        
        # Now create the real directory
        if not os.path.exists(self.persist_directory):
            os.makedirs(self.persist_directory)
            
        # And create the final vectorstore
        self.vectorstore = Chroma.from_documents(
            documents=self.documents,
            embedding=self.embeddings,
            persist_directory=self.persist_directory
        )
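        # Note: recent Chroma versions save to persist_directory automatically;
        # if this call warns or fails in your setup, it can likely be removed.
        self.vectorstore.persist()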
        
        logger.info(f"Vectorstore created successfully with {len(self.documents)} documents")
    except Exception as e:
        logger.error(f"Error creating vectorstore: {e}")
        raise
    finally:
        # Clean up temp directory
        if os.path.exists(temp_dir):
            import shutil
            shutil.rmtree(temp_dir, ignore_errors=True)

<div style="background-color: #2d333b; padding: 5px; border-radius: 4px; margin-bottom: 10px;">
  <h3 style="color: #58a6ff; margin: 10px;">3.3 Building the RAG Chain</h3>
</div>

<div style="background-color: #22272e; padding: 15px; border-radius: 8px; border-left: 4px solid #79c0ff; margin-bottom: 20px;">
  <p style="color: #adbac7;">Here's where our system turns into a "detective." We connect all the components into a sequence that transforms questions into accurate answers.</p>
</div>

<div style="background-color: #1c2128; padding: 20px; border-radius: 8px; margin: 20px 0; border: 1px solid #444c56;">
  <h4 style="color: #79c0ff; margin-top: 0; text-align: center; margin-bottom: 20px;">The RAG Chain Components</h4>
  
  <!-- Retriever Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">🔍</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Searcher (Retriever)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Turns your question into a fingerprint (an embedding) and finds the closest matching chunks in the database.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">retriever = vectorstore.as_retriever(search_kwargs={"k": 5})</code>
      </div>
    </div>
  </div>
  
  <!-- Prompt Template Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">📝</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Instructions (Prompt)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Like a recipe: "Be nice, use the chunks, cite your sources." This keeps answers helpful and trustworthy.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">prompt = PromptTemplate.from_template(template)</code>
      </div>
    </div>
  </div>
  
  <!-- LLM Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 15px; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">🧠</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The AI (LLM)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Writes the final response based on the instructions and retrieved chunks.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">llm = ChatOllama(model="llama3", temperature=0.1)</code>
      </div>
    </div>
  </div>
  
  <!-- Output Parser Component -->
  <div style="background-color: #22272e; border-radius: 8px; padding: 15px; margin-bottom: 0; border: 1px solid #444c56;">
    <div style="display: flex; align-items: center;">
      <div style="background-color: #2d333b; width: 50px; height: 50px; border-radius: 50%; display: flex; align-items: center; justify-content: center; margin-right: 15px;">
        <span style="font-size: 1.5em;">✨</span>
      </div>
      <div>
        <p style="color: #79c0ff; margin: 0; font-weight: bold;">The Formatter (Parser)</p>
      </div>
    </div>
    <div style="margin-top: 10px; padding-left: 65px;">
      <p style="color: #adbac7; margin: 0;">Extracts the plain text from the model's reply so it can be shown to the user.</p>
      <div style="background-color: #2d333b; padding: 8px; border-radius: 4px; margin-top: 10px;">
        <code style="color: #d2a8ff; font-size: 0.9em;">StrOutputParser()</code>
      </div>
    </div>
  </div>
</div>

<!-- Flow diagram -->
<div style="background-color: #22272e; padding: 20px; border-radius: 8px; margin: 20px 0;">
  <h4 style="color: #79c0ff; margin-top: 0; text-align: center;">How It All Flows Together</h4>
  
  <div style="display: flex; justify-content: center; align-items: center; flex-wrap: wrap; margin: 20px 0;">
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">❓</div>
      <div style="color: #adbac7;">Question</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">🔍</div>
      <div style="color: #adbac7;">Retriever</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">📝</div>
      <div style="color: #adbac7;">Prompt</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">🧠</div>
      <div style="color: #adbac7;">LLM</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">✨</div>
      <div style="color: #adbac7;">Parser</div>
    </div>
    <div style="font-size: 1.5em; margin: 0 10px; color: #adbac7;">→</div>
    <div style="text-align: center; background-color: #2d333b; padding: 15px; border-radius: 8px; margin: 5px;">
      <div style="font-size: 2em; margin-bottom: 5px;">💡</div>
      <div style="color: #adbac7;">Answer</div>
    </div>
  </div>
  
  <div style="background-color: #1c2128; padding: 15px; border-radius: 8px; margin-top: 20px;">
    <p style="color: #adbac7; margin: 0; text-align: center;">This entire chain is created with just a few lines of code:</p>
    <div style="background-color: #2d333b; border-radius: 5px; padding: 15px; margin-top: 10px; font-family: monospace;">
      <pre style="color: #d2a8ff; margin: 0; overflow-x: auto; font-size: 0.9em;">self.chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | self.llm
    | StrOutputParser()
)</pre>
    </div>
  </div>
</div>
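
The `|` syntax can look like magic. The sketch below is a toy re-implementation of the idea in plain Python, not LangChain's actual code: each `Step` wraps a function, and `|` builds a new step that feeds one output into the next. Every name here (`Step`, `fake_retriever`, `fake_prompt`, `fake_llm`, `fake_parser`) is a hypothetical stand-in.

```python
from typing import Any, Callable

class Step:
    """Wraps a function so steps can be chained with the | operator."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose: run self first, then pass its result to other
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

# Stand-ins for the real components (hypothetical)
fake_retriever = Step(lambda q: {"question": q, "context": "chunk A; chunk B"})
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda prompt_text: {"content": f"[answer based on] {prompt_text}"})
fake_parser = Step(lambda message: message["content"])

chain = fake_retriever | fake_prompt | fake_llm | fake_parser
print(chain.invoke("How many pages were loaded?"))
```

The point of the design is that each component only needs to agree on what it receives and what it returns, so any one of them can be swapped without touching the others.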

<div style="background-color: #2d333b; border-left: 4px solid #58a6ff; padding: 15px; border-radius: 5px; margin-top: 20px;">
  <p style="color: #adbac7; margin: 0;"><strong style="color: #adbac7;">💡 Pro Tip:</strong> The key to a good RAG system is balance. A great prompt template with poor retrieval won't work well, and perfect retrieval with bad instructions will still give bad answers. All pieces need to work together!</p>
</div>

In [7]:
def setup_chain(self) -> None:
    """Set up the RAG chain for question answering"""
    if not self.vectorstore:
        self.create_vectorstore()
    
    # Create retriever with search parameters
    retriever = self.vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={"k": 5}  # Return top 5 most relevant chunks
    )
    
    # Define the prompt template
    template = """
    ### INSTRUCTIONS: 
    You are an AI assistant dedicated to answering questions in a polite and professional manner. You must provide a helpful response to the user.
    
    (1) Be attentive to details: read the question and context thoroughly before answering.
    (2) Begin your response with a friendly tone and reiterate the question to ensure you understood it.
    (3) If the context allows you to answer the question, write a detailed, helpful, and easy-to-understand response, with sources referenced in the text. IF NOT: if you cannot find the answer, respond with an explanation, starting with: "I couldn't find the information in the documents I have access to."
    (4) Below your response, please list all referenced sources (i.e., document sections that support your claims).
    (5) Review your answer to ensure you answered the question, the response is helpful and professional, and it's formatted to be easily readable.
    
    THINK STEP BY STEP
    
    Answer the following question using the provided context.
    ### Question: {question} ###
    ### Context: {context} ###     
    ### Helpful Answer with Sources:
    """
    
    prompt = PromptTemplate.from_template(template)
    
    # Create the chain
    self.chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | self.llm
        | StrOutputParser()
    )
    
    logger.info("RAG chain setup complete")
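
One detail worth knowing: with `{"context": retriever, ...}`, the `{context}` placeholder receives a list of document objects, so the prompt may end up containing their raw Python representation. A common refinement is to join the chunk text into a single string first. The sketch below uses a stand-in `Doc` dataclass so that it runs without LangChain; the `format_docs` name and the `source` and `page` metadata keys are assumptions, not something the code above defines.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Minimal stand-in for a retrieved document chunk (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: list[Doc]) -> str:
    """Join chunks into one context string, labelling each with its source."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        parts.append(f"[{i}] ({source}, page {page})\n{doc.page_content.strip()}")
    return "\n\n".join(parts)

docs = [
    Doc("First chunk of text. ", {"source": "report.pdf", "page": 3}),
    Doc("Second chunk of text."),
]
print(format_docs(docs))
```

In the chain above, this could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, assuming the retriever returns objects with `page_content` and `metadata` attributes. Numbered, labelled chunks also give the model something concrete to cite when the prompt asks for sources.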

### 3.4 Answering Questions

Time to shine! The robot takes your question, processes it, and gives you an answer. If something goes wrong, it politely lets you know.

In [8]:
def answer_question(self, question: str) -> str:
    """
    Answer a question using the RAG chain
    
    Args:
        question: The question to answer
        
    Returns:
        The answer to the question
    """
    if not self.chain:
        self.setup_chain()
    
    logger.info(f"Answering question: {question}")
    try:
        answer = self.chain.invoke(question)
        return answer
    except Exception as e:
        logger.error(f"Error answering question: {e}")
        return f"Error processing your question: {str(e)}"

## 4. A Window to the World: Gradio Interface

Let’s make our robot user-friendly with a web interface.

### 🌐 Why Gradio?
It’s like building an app with Lego blocks: easy, fast, and you can use it from your phone or computer.

In [9]:
def create_gradio_interface(rag_system: RAGSystem) -> gr.Interface:
    """
    Create a Gradio interface for the RAG system
    
    Args:
        rag_system: The RAG system to use
        
    Returns:
        A Gradio interface
    """
    def get_answer(question: str) -> str:
        """Wrapper function for the Gradio interface"""
        return rag_system.answer_question(question)
    
    # Gradio interface configuration
    interface = gr.Interface(
        fn=get_answer,
        inputs=gr.Textbox(
            placeholder="Ask a question about immigration...",
            label="Your Question"
        ),
        outputs=gr.Markdown(label="Answer"),
        title="Document Intelligence System with LLM",
        description="Ask any question about immigration based on the loaded documents",
        theme=gr.themes.Soft(),
        allow_flagging="never",
        examples=[
            "How many immigrants arrive each year?",
            "What are the main countries of origin?",
            "What economic impact does immigration have?"
        ]
    )
    
    return interface
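
A small optional refinement, sketched under assumptions: the wrapper could reject blank input before it reaches the RAG system, which avoids a retrieval and LLM call for an empty textbox. `make_get_answer` and `fake_answer` are hypothetical names; `fake_answer` stands in for `rag_system.answer_question`.

```python
from typing import Callable, Optional

def make_get_answer(answer_fn: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Build a Gradio-friendly wrapper that rejects blank questions."""
    def get_answer(question: Optional[str]) -> str:
        cleaned = (question or "").strip()
        if not cleaned:
            # Skip the retrieval and LLM call entirely for empty input
            return "Please type a question first."
        return answer_fn(cleaned)
    return get_answer

# Stand-in for rag_system.answer_question (hypothetical stub)
def fake_answer(question: str) -> str:
    return f"(answer to: {question})"

get_answer = make_get_answer(fake_answer)
print(get_answer("   "))
print(get_answer("  What are the main countries of origin? "))
```

The returned function has the same one-string-in, one-string-out shape as the `get_answer` wrapper above, so it could be passed as `fn` in its place.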

## 5. Let’s Get It Running!

The “start button” lists the Ollama models that are available, builds the vector database, runs a control question, and opens the interface.

In [10]:
def main() -> None:
    """Main function to run the RAG system"""
    try:
        # Display available models
        print("\n==== CHECKING OLLAMA MODELS ====")
        try:
            import requests
            # A timeout keeps the check from hanging if Ollama is not responding
            response = requests.get("http://localhost:11434/api/tags", timeout=5)
            print("Available Ollama models:")
            if response.status_code == 200:
                for model in response.json().get("models", []):
                    print(f"- {model['name']}")
            else:
                print(f"Error checking Ollama models: {response.status_code}")
        except Exception as e:
            print(f"Error connecting to Ollama: {e}")
        
        print(f"\nUsing LLM model: {LLM_MODEL}")
        print(f"Using embedding model: {EMBEDDING_MODEL}")
        print("Make sure these models are available with 'ollama pull' commands.")
        
        # Create and initialize the RAG system
        rag_system = RAGSystem(pdf_urls=PDF_URLS)
        
        # Load documents and create vectorstore
        rag_system.load_documents()
        rag_system.create_vectorstore()
        
        # Test with a control question
        logger.info("Testing with a control question...")
        test_answer = rag_system.answer_question("How many immigrants arrive each year?")
        logger.info(f"Control answer received (length: {len(test_answer)})")
        
        # Create and launch Gradio interface
        logger.info("Launching Gradio interface...")
        interface = create_gradio_interface(rag_system)
        interface.launch(share=False)  # Set share=True to create a public link
    
    except Exception as e:
        logger.error(f"An error occurred in the main function: {e}")
        print(f"\n\nERROR: {str(e)}\n\n")
        print("\nTROUBLESHOOTING TIPS:")
        print("1. Make sure Ollama is running: 'ollama serve'")
        print("2. Make sure you have pulled the required models:")
        print(f"   - ollama pull {LLM_MODEL}")
        print(f"   - ollama pull {EMBEDDING_MODEL}")
        print("3. If you're still having dimension issues, try using a different embedding model by changing EMBEDDING_MODEL")
        print("4. Check that you have the required Python packages installed")
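
The model check in `main()` only prints what is installed. As a sketch of how the same response could be used to spot a missing model up front, the helper below compares the tag list against the names you need. `missing_models` is a hypothetical helper, the payload is hand-written in the shape that `main()` reads (a `models` list of entries with a `name`), and treating an untagged name as `:latest` is an assumption about how the names are matched.

```python
def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that the tag list does not contain."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # Assumption: a name without a tag refers to the ":latest" tag
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing

# Hand-written payload shaped like the response parsed in main()
payload = {"models": [{"name": "all-minilm:latest"}, {"name": "llama3.2:1b"}]}
print(missing_models(payload, ["llama3.2:1b", "all-minilm", "nomic-embed-text"]))
```

Failing early with the list of missing names is usually friendlier than letting the first embedding request fail partway through building the database.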

## 6. Putting It Together

Time to assemble our robot and make it run!

In [11]:
RAGSystem.load_documents = load_documents
RAGSystem.create_vectorstore = create_vectorstore
RAGSystem.setup_chain = setup_chain
RAGSystem.answer_question = answer_question

In [None]:
# Run the system. In a notebook cell __name__ is also "__main__",
# so the same guard works for both scripts and notebooks.
if __name__ == "__main__":
    main()


==== CHECKING OLLAMA MODELS ====
Available Ollama models:
- all-minilm:latest
- llama3.2:1b
- codeqwen:7b
- terminator:latest
- qwen3:0.6b
- qwen3:1.7b

Using LLM model: llama3.2:1b
Using embedding model: all-minilm
Make sure these models are available with 'ollama pull' commands.


2025-05-24 01:26:56,649 - __main__ - INFO - Initialized RAG system with 2 PDFs
2025-05-24 01:26:56,651 - __main__ - INFO - Loading and processing PDFs...
2025-05-24 01:26:59,419 - __main__ - INFO - Loaded 5 pages from https://www.ine.es/daco/daco42/ecp/ecp0123.pdf
2025-05-24 01:27:03,324 - __main__ - INFO - Loaded 37 pages from https://fundacionalternativas.org/wp-content/uploads/2023/10/PERSONAS_MIGRANTES_v02.pdf
2025-05-24 01:27:03,331 - __main__ - INFO - Created 165 document chunks
2025-05-24 01:27:03,333 - __main__ - INFO - Creating new vectorstore...
2025-05-24 01:27:03,334 - __main__ - INFO - Using temporary directory for initial database creation: C:\Users\demst\AppData\Local\Temp\tmpkcgs9sqf
2025-05-24 01:27:05,178 - chromadb.telemetry.product.posthog - INFO - Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
2025-05-24 01:27:17,659 - httpx - INFO - HTTP Request: POST http

I understand that you are seeking information on the number of immigrants who arrive each year in Spain.

According to the provided document, the estimated number of migrants arriving in Spain is around 180,000 per year (Figura 5: Número de adquisiciones de nacionalidad española de personas residentes, 2013-2022). This data is based on the information presented in Figure 6: Población extranjera por comunidades autónomas, 2022.

The document also mentions that the number of migrants arriving in Spain has been steadily increasing over the years, with a significant increase from 2013 to 2022. The exact breakdown of migrant arrivals by year is not provided, but it is mentioned that there were 225,793 migrants in 2013 and 205,880 in 2022.

It's worth noting that the document also provides information on the number of migrants arriving in different regions of Spain, with some areas having significantly higher numbers than others. For example, the region of Galicia has an estimated population

2025-05-24 01:27:54,695 - __main__ - INFO - Control answer received (length: 1485)
2025-05-24 01:27:54,696 - __main__ - INFO - Launching Gradio interface...


* Running on local URL:  http://127.0.0.1:7860


2025-05-24 01:27:55,594 - httpx - INFO - HTTP Request: GET http://127.0.0.1:7860/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-05-24 01:27:55,651 - httpx - INFO - HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"


* To create a public link, set `share=True` in `launch()`.


2025-05-24 01:27:56,057 - httpx - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2025-05-24 01:29:00,045 - __main__ - INFO - Answering question: cual fue la nacionalidad de inmigrantes que mas recibimos
2025-05-24 01:29:00,115 - httpx - INFO - HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
2025-05-24 01:29:17,700 - httpx - INFO - HTTP Request: POST http://127.0.0.1:11434/api/chat "HTTP/1.1 200 OK"


¡Hola! Me alegra poder ayudarte con tu pregunta.

La nacionalidad de los inmigrantes que más recibieron es un tema complejo y multifacético. Según el documento proporcionado, la participación de los inmigrantes en la sociedad de acogida y su integración en las instituciones públicas son fundamentales para hacer visibles sus contribuciones.

De acuerdo con el texto, "la integración implica el respeto de los valores básicos de la Unión Europea" (CoE 1991), lo que sugiere que la nacionalidad no es un factor determinante en la integración. Además, se menciona que "el primer PECI incorporaba la idea de establecer un 'sistema de recepción' para personas inmigrantes recién llegadas y aquellos en situaciones especialmente vulnerables" (CoE 1991).

En cuanto a las políticas públicas, particularmente en educación, empleo, servicios sociales, salud y vivienda, se enfatiza la necesidad de asegurar el acceso de la población inmigrante a estos servicios en igualdad de condiciones con la población au