<a href="https://colab.research.google.com/github/tivon-x/all-rag-techniques/blob/main/11_feedback_loop_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

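# Feedback loops in RAG

In this notebook I implement a RAG system with a feedback loop that lets it improve over time. By collecting and incorporating user feedback, the system learns to give more relevant, higher-quality responses with each interaction.

Traditional RAG systems are static: they retrieve information based only on embedding similarity. Introducing a feedback loop creates a dynamic system that can:

- Remember what worked (and what did not)
- Adjust document relevance scores over time
- Add successful question-answer pairs to its knowledge base
- Become smarter with every user interaction

Advantages of this approach
1. Continuous improvement: the system learns from every interaction and gradually improves its performance.
2. Personalization: by incorporating user feedback, the system can adapt over time to individual or group preferences.
3. Increased relevance: the feedback loop helps prioritize more relevant documents in future retrievals.
4. Quality control: as the system evolves, low-quality or irrelevant responses become less likely to be repeated.
5. Adaptability: the system can adapt over time to changes in user needs or document content.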


<div style="text-align: center;">

<img src="https://github.com/tivon-x/all-rag-techniques/blob/main/images/retrieval_with_feedback_loop.svg?raw=1" alt="retrieval with feedback loop" style="width:40%; height:auto;">
</div>

## Environment setup

In [None]:
# The fitz module is installed via the pymupdf package
%pip install --quiet --force-reinstall pymupdf


In [None]:
import fitz
import os
import numpy as np
import json
from openai import OpenAI
from datetime import datetime

## Extracting text

In [None]:
def extract_text_from_pdf(pdf_path):
    """
    Extracts text from a PDF file.

    Args:
    pdf_path (str): Path to the PDF file.

    Returns:
    str: Extracted text from the PDF.
    """
    # Open the PDF file
    mypdf = fitz.open(pdf_path)
    all_text = ""  # Initialize an empty string to store the extracted text

    # Iterate through each page in the PDF
    for page_num in range(mypdf.page_count):
        page = mypdf[page_num]  # Get the page
        text = page.get_text("text")  # Extract text from the page
        all_text += text  # Append the extracted text to the all_text string

    return all_text  # Return the extracted text

## Chunking

In [None]:
def chunk_text(text, n, overlap):
    """
    Chunks the given text into segments of n characters with overlap.

    Args:
    text (str): The text to be chunked.
    n (int): The number of characters in each chunk.
    overlap (int): The number of overlapping characters between chunks.

    Returns:
    List[str]: A list of text chunks.
    """
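    # A step of zero or less would make range() fail or return nothing
    if overlap >= n:
        raise ValueError("overlap must be smaller than n")

    chunks = []  # Initialize an empty list to store the chunks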

    # Loop through the text with a step size of (n - overlap)
    for i in range(0, len(text), n - overlap):
        # Append a chunk of text from index i to i + n to the chunks list
        chunks.append(text[i:i + n])

    return chunks  # Return the list of text chunks
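
To make the stride concrete, the sketch below applies the same slicing logic to a 10-character string. The helper `chunk_starts` is hypothetical and exists only for this illustration. It also shows why `overlap` has to stay smaller than `n`.

```python
def chunk_starts(text_length, n, overlap):
    """Return the start index of every chunk (hypothetical helper, not part of the notebook)."""
    step = n - overlap
    if step <= 0:
        # range() raises on a step of 0 and yields nothing on a negative step
        raise ValueError("overlap must be smaller than n")
    return list(range(0, text_length, step))


text = "abcdefghij"  # 10 characters
starts = chunk_starts(len(text), n=4, overlap=2)
chunks = [text[i:i + 4] for i in starts]

print(starts)
print(chunks)
```

With `n=4` and `overlap=2`, each chunk shares its last two characters with the start of the next one, and the final chunk is shorter than `n` because the slice runs past the end of the text.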

## OpenAI client

In [None]:
# Colab environment
from google.colab import userdata
# Use Volcano Engine (Ark) credentials
api_key = userdata.get("ARK_API_KEY")
base_url = userdata.get("ARK_BASE_URL")

In [None]:
model_name = "doubao-lite-128k-240828"
embedding_model = "doubao-embedding-text-240715"

In [None]:
client = OpenAI(
    base_url=base_url,
    api_key=api_key
)

## Vector store

In [None]:
class SimpleVectorStore:
    """
    A simple vector store implementation using NumPy.

    This class provides an in-memory storage and retrieval system for
    embedding vectors and their corresponding text chunks and metadata.
    It supports basic similarity search functionality using cosine similarity.
    """
    def __init__(self):
        """
        Initialize the vector store with empty lists for vectors, texts, and metadata.

        The vector store maintains three parallel lists:
        - vectors: NumPy arrays of embedding vectors
        - texts: Original text chunks corresponding to each vector
        - metadata: Optional metadata dictionaries for each item
        """
        self.vectors = []  # List to store embedding vectors
        self.texts = []    # List to store original text chunks
        self.metadata = [] # List to store metadata for each text chunk

    def add_item(self, text, embedding, metadata=None):
        """
        Add an item to the vector store.

        Args:
            text (str): The original text chunk to store.
            embedding (List[float]): The embedding vector representing the text.
            metadata (dict, optional): Additional metadata for the text chunk,
                                      such as source, timestamp, or relevance scores.
        """
        self.vectors.append(np.array(embedding))  # Convert and store the embedding
        self.texts.append(text)                   # Store the original text
        self.metadata.append(metadata or {})      # Store metadata (empty dict if None)

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """
        Find the most similar items to a query embedding using cosine similarity.

        Args:
            query_embedding (List[float]): Query embedding vector to compare against stored vectors.
            k (int): Number of most similar results to return.
            filter_func (callable, optional): Function to filter results based on metadata.
                                             Takes metadata dict as input and returns boolean.

        Returns:
            List[Dict]: Top k most similar items, each containing:
                - text: The original text
                - metadata: Associated metadata
                - similarity: Raw cosine similarity score
                - relevance_score: Either metadata-based relevance or calculated similarity

        Note: Returns empty list if no vectors are stored or none pass the filter.
        """
        if not self.vectors:
            return []  # Return empty list if vector store is empty

        # Convert query embedding to numpy array for vector operations
        query_vector = np.array(query_embedding)

        # Calculate cosine similarity between query and each stored vector
        similarities = []
        for i, vector in enumerate(self.vectors):
            # Skip items that don't pass the filter criteria
            if filter_func and not filter_func(self.metadata[i]):
                continue

            # Calculate cosine similarity: dot product / (norm1 * norm2)
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append((i, similarity))  # Store index and similarity score

        # Sort results by similarity score in descending order
        similarities.sort(key=lambda x: x[1], reverse=True)

        # Construct result dictionaries for the top k matches
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],
                "metadata": self.metadata[idx],
                "similarity": score,
                # Use pre-existing relevance score from metadata if available, otherwise use similarity
                "relevance_score": self.metadata[idx].get("relevance_score", score)
            })

        return results
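
As a small illustration of the ranking that `similarity_search` performs, the sketch below computes cosine similarity for three toy 2-D vectors using only the standard library. The vectors and their labels are made up for the example. Note that the formula divides by the vector norms, so an all-zero vector would need special handling.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
stored = {
    "same direction": [2.0, 0.0],   # differs from the query only in length
    "45 degrees": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}

# Rank the stored vectors by similarity to the query, highest first
ranked = sorted(stored, key=lambda name: cosine_similarity(query, stored[name]), reverse=True)

for name in ranked:
    print(f"{name}: {cosine_similarity(query, stored[name]):.4f}")
```

Because cosine similarity ignores vector length, the vector pointing in the same direction as the query ranks first even though it is twice as long.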

## Creating embeddings



In [None]:
def create_embeddings(text, model = None, batch_size = 10):
    """
    Creates embeddings for the given text.

    Args:
    text (str or List[str]): The input text(s) for which embeddings are to be created.
    model (str): The model to be used for creating embeddings.
    batch_size (int): batch size

    Returns:
    List[float] or List[List[float]]: The embedding vector(s).
    """
    if not model:
        model = embedding_model

    # Convert single string to list for uniform processing
    input_text = text if isinstance(text, list) else [text]

    all_embeddings = []

    for i in range(0, len(input_text), batch_size):
        batch = input_text[i : i + batch_size]
        # Create embeddings for the batch using the specified model
        response = client.embeddings.create(
            model=model,
            input=batch
        )
        all_embeddings.extend(item.embedding for item in response.data)

    return all_embeddings if isinstance(text, list) else all_embeddings[0]

## Feedback system functions

We now implement the core components of the feedback system.

In [None]:
def get_user_feedback(query, response, relevance, quality, comments=""):
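    """
    Format user feedback as a dictionary.

    Args:
        query (str): The user's query
        response (str): The system's response
        relevance (int): Relevance score (1-5)
        quality (int): Quality score (1-5)
        comments (str): Optional feedback comments

    Returns:
        Dict: The formatted feedback
    """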
    return {
        "query": query,
        "response": response,
        "relevance": int(relevance),
        "quality": int(quality),
        "comments": comments,
        "timestamp": datetime.now().isoformat()
    }

In [None]:
def store_feedback(feedback, feedback_file="feedback_data.json"):
    """
    Append feedback to a JSON file, one JSON object per line.

    Args:
        feedback (Dict): The feedback data
        feedback_file (str): Path to the feedback file, defaults to feedback_data.json
    """
    with open(feedback_file, "a") as f:
        json.dump(feedback, f)
        f.write("\n")

In [None]:
def load_feedback_data(feedback_file="feedback_data.json"):
    """
    Load feedback data from a file.

    Args:
        feedback_file (str): Path to the feedback file

    Returns:
        List[Dict]: The list of feedback entries
    """
    feedback_data = []
    try:
        with open(feedback_file, "r") as f:
            for line in f:
                if line.strip():
                    feedback_data.append(json.loads(line.strip()))
    except FileNotFoundError:
        print("No feedback data file found. Starting with empty feedback.")

    return feedback_data
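
The two functions above use a one-object-per-line layout, so the file is appended to rather than rewritten. The sketch below reproduces that round trip in a temporary directory. The records and the file name `feedback_demo.jsonl` are made up for the example.

```python
import json
import os
import tempfile

# Toy feedback records (illustrative only)
records = [
    {"query": "q1", "relevance": 5, "quality": 4},
    {"query": "q2", "relevance": 2, "quality": 3},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "feedback_demo.jsonl")  # hypothetical file name

    # Append one JSON object per line, as store_feedback does
    for record in records:
        with open(path, "a", encoding="utf-8") as f:
            json.dump(record, f)
            f.write("\n")

    # Read the file back line by line, as load_feedback_data does
    with open(path, "r", encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(loaded == records)
```

One consequence of this layout is that the file as a whole is not a single valid JSON document, so it has to be read line by line rather than with one `json.load` call.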

## Feedback-aware document processing

In [None]:
def process_document(pdf_path, chunk_size=1000, chunk_overlap=200):
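    """
    Process a document for retrieval-augmented generation (RAG) with a feedback loop.
    This function handles the complete document processing pipeline:
    1. Extract text from the PDF
    2. Split the text into overlapping chunks
    3. Create embeddings for the chunks
    4. Store them in the vector store together with metadata

    Args:
    pdf_path (str): Path to the PDF file to process
    chunk_size (int): Number of characters in each text chunk
    chunk_overlap (int): Number of overlapping characters between consecutive chunks

    Returns:
    Tuple[List[str], SimpleVectorStore]: A tuple containing:
        - The list of text chunks
        - The vector store populated with embeddings and metadata
    """
    # Step 1: Extract the raw text content from the PDF document
    print("Extracting text from PDF...")
    extracted_text = extract_text_from_pdf(pdf_path)

    # Step 2: Split the text into manageable, overlapping chunks to better preserve context
    print("Chunking text...")
    chunks = chunk_text(extracted_text, chunk_size, chunk_overlap)
    print(f"Created {len(chunks)} text chunks")

    # Step 3: Generate a vector embedding for each text chunk
    print("Creating embeddings for chunks...")
    chunk_embeddings = create_embeddings(chunks)

    # Step 4: Initialize the vector store
    store = SimpleVectorStore()

    # Step 5: Add each chunk and its embedding to the vector store,
    # including the metadata used for feedback-based improvement
    for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
        store.add_item(
            text=chunk,
            embedding=embedding,
            metadata={
                "index": i,                # Position in the source document
                "source": pdf_path,        # Path of the source file
                "relevance_score": 1.0,    # Initial relevance score (to be updated based on feedback)
                "feedback_count": 0        # Number of feedback entries received for this chunk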
            }
        )

    print(f"Added {len(chunks)} chunks to the vector store")
    return chunks, store

## Relevance adjustment based on feedback

In [None]:
def assess_feedback_relevance(query, doc_text, feedback):
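    """
    Use an LLM to assess whether a past feedback entry is relevant to the current query and document.

    This function sends the current query, the past query and its response, and the document
    content to the LLM for a relevance judgment, which helps decide which past feedback
    should influence the current retrieval.

    Args:
        query (str): The current user query that requires retrieval
        doc_text (str): The text content of the document being assessed
        feedback (Dict): Previous feedback data containing the 'query' and 'response' keys

    Returns:
        bool: True if the feedback is judged relevant to the current query/document, otherwise False
    """
    # Define the system prompt, instructing the LLM to make only a binary relevance judgment
    system_prompt = """You are an AI system that determines if a past feedback is relevant to a current query and document.
    Answer with ONLY 'yes' or 'no'. Your job is strictly to determine relevance, not to provide explanations."""

    # User prompt; if the context window is too small, truncate doc_text and feedback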
    user_prompt = f"""
    Current query: {query}
    Past query that received feedback: {feedback['query']}
    Document content: {doc_text}
    Past response that received feedback: {feedback['response']}

    Is this past feedback relevant to the current query and document? (yes/no)
    """

    # Call the LLM API with zero temperature for deterministic output
    response = client.chat.completions.create(
        model=model_name,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0  # Use temperature=0 for consistent, deterministic responses
    )

    answer = response.choices[0].message.content.strip().lower()
    return 'yes' in answer
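
The substring check `'yes' in answer` also matches replies that merely contain those letters, for example inside the word "eyes". One possible stricter parse, sketched here with a hypothetical helper `parse_yes_no` that is not part of the notebook, looks only at the first word of the reply.

```python
import re


def parse_yes_no(answer):
    """Return True only when the first alphabetic word of the reply is 'yes' (hypothetical helper)."""
    match = re.search(r"[a-z]+", answer.lower())
    return match is not None and match.group(0) == "yes"


replies = ["Yes.", "'yes'", "no", "No, the eyes have it", ""]
for reply in replies:
    # Compare the strict parse with the plain substring check
    print(repr(reply), parse_yes_no(reply), "yes" in reply.lower())
```

Whether the stricter rule is preferable depends on how closely the model follows the "ONLY 'yes' or 'no'" instruction. The sketch only illustrates where the two checks disagree.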

In [None]:
def adjust_relevance_scores(query, results, feedback_data):
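    """
    Adjust document relevance scores based on historical feedback to improve retrieval quality.

    This function analyzes past user feedback to dynamically adjust the relevance scores of
    retrieved documents. It identifies the feedback that is relevant to the current query context,
    computes a score modifier from the relevance ratings, and re-ranks the results accordingly.

    Args:
        query (str): The current user query
        results (List[Dict]): Retrieved documents with their original similarity scores
        feedback_data (List[Dict]): Historical feedback containing user ratings

    Returns:
        List[Dict]: The results with adjusted relevance scores, sorted
    """
    # If there is no feedback data, return the original results
    if not feedback_data:
        return results

    print("Adjusting relevance scores based on feedback history...")

    for i, result in enumerate(results):
        document_text = result["text"]
        relevant_feedback = []

        # Ask the LLM to assess each historical feedback entry and collect
        # the entries relevant to this specific document and query combination
        for feedback in feedback_data:
            is_relevant = assess_feedback_relevance(query, document_text, feedback)
            if is_relevant:
                relevant_feedback.append(feedback)

        # Adjust the score
        if relevant_feedback:
            # Compute the average relevance rating over all relevant feedback entries
            # Ratings use a 1-5 scale (1 = not relevant, 5 = highly relevant)
            avg_relevance = sum(f['relevance'] for f in relevant_feedback) / len(relevant_feedback)

            # Convert the average rating into a score modifier; for ratings of 1-5 it lies in 0.7-1.5
            # - Average ratings below 2.5 lower the original similarity (modifier < 1.0)
            # - Average ratings above 2.5 raise the original similarity (modifier > 1.0)
            modifier = 0.5 + (avg_relevance / 5.0)

            # Adjust the original similarity score
            original_score = result["similarity"]
            adjusted_score = original_score * modifier

            # Update the result
            result["original_similarity"] = original_score  # Original score
            result["similarity"] = adjusted_score           # New score
            result["relevance_score"] = adjusted_score      # New relevance score
            result["feedback_applied"] = True               # Mark that feedback has been applied
            result["feedback_count"] = len(relevant_feedback)  # Number of feedback entries used

            print(f"  Document {i+1}: Adjusted score from {original_score:.4f} to {adjusted_score:.4f} based on {len(relevant_feedback)} feedback(s)")

    # Sort by adjusted score, highest first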
    results.sort(key=lambda x: x["similarity"], reverse=True)

    return results
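
Evaluating the modifier formula `0.5 + (avg_relevance / 5.0)` for a few average ratings shows which values it can produce on the 1-5 scale and where it crosses 1.0. The function name `relevance_modifier` is introduced here only for the illustration.

```python
def relevance_modifier(avg_relevance):
    """Same formula as in adjust_relevance_scores: 0.5 + rating / 5."""
    return 0.5 + (avg_relevance / 5.0)


for rating in (1, 2, 2.5, 3, 4, 5):
    print(f"average rating {rating}: modifier {relevance_modifier(rating):.2f}")
```

The modifier is exactly 1.0 at an average rating of 2.5, so a rating of 3 already boosts the similarity by 10%, and the lowest possible rating of 1 scales it by 0.7.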

## Fine-tuning the index with feedback

In [None]:
def fine_tune_index(current_store, chunks, feedback_data):
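    """
    Enhance the vector store with high-quality feedback to improve retrieval quality.

    This function implements a continuous learning process by:
    1. Identifying high-quality feedback (highly rated question-answer pairs)
    2. Creating new retrieval items from successful interactions
    3. Adding these items to the vector store with a boosted relevance weight

    Args:
        current_store (SimpleVectorStore): The current vector store
        chunks (List[str]): The original document chunks
        feedback_data (List[Dict]): Historical user feedback containing relevance and quality ratings

    Returns:
        SimpleVectorStore: The enhanced vector store
    """
    print("Fine-tuning index with high-quality feedback...")

    # Keep only high-quality feedback (relevance and quality both at least 4)
    # This ensures that we learn from the most successful interactions
    good_feedback = [f for f in feedback_data if f['relevance'] >= 4 and f['quality'] >= 4]

    if not good_feedback:
        print("No high-quality feedback found for fine-tuning.")
        return current_store

    # Create a new vector store
    new_store = SimpleVectorStore()

    # Copy the existing items over
    for i in range(len(current_store.texts)):
        new_store.add_item(
            text=current_store.texts[i],
            embedding=current_store.vectors[i],
            metadata=current_store.metadata[i].copy()  # Copy so the original metadata is not shared
        )

    # Create enhanced content from the good feedback
    for feedback in good_feedback:
        # Format a new document that combines the question with its high-quality answer
        # This creates retrievable content that directly addresses user queries
        enhanced_text = f"Question: {feedback['query']}\nAnswer: {feedback['response']}"

        # Generate the embedding for the synthetic document
        embedding = create_embeddings(enhanced_text)

        new_store.add_item(
            text=enhanced_text,
            embedding=embedding,
            metadata={
                "type": "feedback_enhanced",  # Mark the item as derived from feedback
                "query": feedback["query"],   # The original query
                "relevance_score": 1.2,       # Higher initial relevance to prioritize these items
                "feedback_count": 1,          # Track how much feedback has been incorporated
                "original_feedback": feedback # The complete feedback record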
            }
        )

        print(f"Added enhanced content from feedback: {feedback['query'][:50]}...")

    print(f"Fine-tuned index now has {len(new_store.texts)} items (original: {len(chunks)})")
    return new_store
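
The selection step can be tried without any API calls. The sketch below applies the same threshold and text template to three made-up feedback records. Only the first passes, because both ratings have to be at least 4.

```python
# Toy feedback records (made up for illustration)
feedback_data = [
    {"query": "What is RAG?", "response": "Retrieval plus generation.", "relevance": 5, "quality": 4},
    {"query": "What is a chunk?", "response": "A piece of text.", "relevance": 4, "quality": 3},
    {"query": "What is an embedding?", "response": "A vector.", "relevance": 3, "quality": 5},
]

# Same threshold as fine_tune_index: both ratings must be at least 4
good_feedback = [f for f in feedback_data if f["relevance"] >= 4 and f["quality"] >= 4]

# Same text template as fine_tune_index for the synthetic documents
enhanced_texts = [f"Question: {f['query']}\nAnswer: {f['response']}" for f in good_feedback]

for text in enhanced_texts:
    print(text)
```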

## RAG with a feedback loop

In [None]:
def generate_response(query, context, model=None):
    """
    Generate a response based on the query and context.

    Args:
        query (str): User query
        context (str): Context text from retrieved documents
        model (str): LLM model to use

    Returns:
        str: Generated response
    """
    if not model:
        model = model_name

    # Define the system prompt to guide the AI's behavior
    system_prompt = """You are a helpful AI assistant. Answer the user's question based only on the provided context. If you cannot find the answer in the context, state that you don't have enough information."""

    # Create the user prompt by combining the context and the query
    user_prompt = f"""
        Context:
        {context}

        Question: {query}

        Please provide a comprehensive answer based only on the context above.
    """

    # Call the OpenAI API to generate a response based on the system and user prompts
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        temperature=0  # Use temperature=0 for consistent, deterministic responses
    )

    # Return the generated response content
    return response.choices[0].message.content

In [None]:
def rag_with_feedback_loop(query, vector_store, feedback_data, k=5, model=None):
    """
    Complete RAG pipeline incorporating feedback loop.

    Args:
        query (str): User query
        vector_store (SimpleVectorStore): Vector store with document chunks
        feedback_data (List[Dict]): History of feedback
        k (int): Number of documents to retrieve
        model (str): LLM model for response generation

    Returns:
        Dict: Results including query, retrieved documents, and response
    """
    if not model:
      model = model_name

    print(f"\n=== Processing query with feedback-enhanced RAG ===")
    print(f"Query: {query}")

    # Step 1: Create query embedding
    query_embedding = create_embeddings(query)

    # Step 2: Perform initial retrieval based on query embedding
    results = vector_store.similarity_search(query_embedding, k=k)

    # Step 3: Adjust relevance scores of retrieved documents based on feedback
    adjusted_results = adjust_relevance_scores(query, results, feedback_data)

    # Step 4: Extract texts from adjusted results for context building
    retrieved_texts = [result["text"] for result in adjusted_results]

    # Step 5: Build context for response generation by concatenating retrieved texts
    context = "\n\n---\n\n".join(retrieved_texts)

    # Step 6: Generate response using the context and query
    print("Generating response...")
    response = generate_response(query, context, model)

    # Step 7: Compile the final result
    result = {
        "query": query,
        "retrieved_documents": adjusted_results,
        "response": response
    }

    print("\n=== Response ===")
    print(response)

    return result

## Full workflow: from initial setup to feedback collection

In [None]:
def full_rag_workflow(pdf_path, query, feedback_data=None, feedback_file="feedback_data.json", fine_tune=False):
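    """
    Run a complete retrieval-augmented generation (RAG) workflow with feedback integration for continuous improvement.

    This function orchestrates the entire RAG process:
    1. Load historical feedback data
    2. Process and chunk the document
    3. Optionally fine-tune the vector index using previous feedback
    4. Perform retrieval and generation with feedback-adjusted relevance scores
    5. Collect new user feedback for future improvement
    6. Store the feedback to enable continuous learning

    Args:
        pdf_path (str): Path to the PDF document
        query (str): The user query
        feedback_data (List[Dict], optional): Preloaded feedback data; if None, it is loaded from the file
        feedback_file (str): Path to the JSON file that stores the feedback history
        fine_tune (bool): Whether to enhance the index with successful past question-answer pairs

    Returns:
        Dict: The result containing the response and retrieval metadata
    """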
    # Step 1: Load historical feedback for relevance adjustment if not explicitly provided
    if feedback_data is None:
        feedback_data = load_feedback_data(feedback_file)
        print(f"Loaded {len(feedback_data)} feedback entries from {feedback_file}")

    # Step 2: Process document through extraction, chunking and embedding pipeline
    chunks, vector_store = process_document(pdf_path)

    # Step 3: Fine-tune the vector index by incorporating high-quality past interactions
    # This creates enhanced retrievable content from successful Q&A pairs
    if fine_tune and feedback_data:
        vector_store = fine_tune_index(vector_store, chunks, feedback_data)

    # Step 4: Execute core RAG with feedback-aware retrieval
    # Note: rag_with_feedback_loop is defined in the previous section
    result = rag_with_feedback_loop(query, vector_store, feedback_data)

    # Step 5: Collect user feedback to improve future performance
    print("\n=== Would you like to provide feedback on this response? ===")
    print("Rate relevance (1-5, with 5 being most relevant):")
    relevance = input()

    print("Rate quality (1-5, with 5 being highest quality):")
    quality = input()

    print("Any comments? (optional, press Enter to skip)")
    comments = input()

    # Step 6: Format feedback into structured data
    feedback = get_user_feedback(
        query=query,
        response=result["response"],
        relevance=int(relevance),
        quality=int(quality),
        comments=comments
    )

    # Step 7: Persist feedback to enable continuous system learning
    store_feedback(feedback, feedback_file)
    print("Feedback recorded. Thank you!")

    return result
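
The workflow above passes the raw `input()` strings straight to `int()`, which raises `ValueError` on empty or non-numeric input. A minimal sketch of a more forgiving parse follows. The helper `parse_rating` and its default of 3 (the midpoint of the scale) are assumptions for illustration, not part of the notebook.

```python
def parse_rating(raw, default=3):
    """Convert raw user input to an integer rating in 1..5 (hypothetical helper)."""
    try:
        value = int(raw.strip())
    except ValueError:
        # Empty or non-numeric input falls back to the assumed default
        return default
    # Clamp out-of-range numbers to the 1-5 scale
    return max(1, min(5, value))


for raw in ["4", " 9 ", "0", "abc", ""]:
    print(repr(raw), parse_rating(raw))
```

Falling back silently to a default is only one option. Re-prompting the user until the input is valid would avoid recording ratings the user never gave.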

## Evaluating the feedback loop

In [None]:
def evaluate_feedback_loop(pdf_path, test_queries, reference_answers=None):
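    """
    Evaluate the impact of the feedback loop on RAG quality by comparing performance before and after feedback is incorporated.

    This function runs a controlled experiment to measure the effect of feedback on retrieval and generation:
    1. Round 1: run all test queries without feedback
    2. Generate synthetic feedback from the reference answers (if provided)
    3. Round 2: run the same queries with feedback-enhanced retrieval
    4. Compare the results of the two rounds to quantify the impact of the feedback
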
    Args:
        pdf_path (str): Path to the PDF document used as the knowledge base
        test_queries (List[str]): List of test queries to evaluate system performance
        reference_answers (List[str], optional): Reference/gold standard answers for evaluation
                                                and synthetic feedback generation

    Returns:
        Dict: Evaluation results containing:
            - round1_results: Results without feedback
            - round2_results: Results with feedback
            - comparison: Quantitative comparison metrics between rounds
    """
    print("=== Evaluating Feedback Loop Impact ===")

    # Create a temporary feedback file for this evaluation session only
    temp_feedback_file = "temp_evaluation_feedback.json"

    # Initialize feedback collection (empty at the start)
    feedback_data = []

    # ----------------------- FIRST EVALUATION ROUND -----------------------
    # Run all queries without any feedback influence to establish baseline performance
    print("\n=== ROUND 1: NO FEEDBACK ===")
    round1_results = []

    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")

        # Process document to create initial vector store
        chunks, vector_store = process_document(pdf_path)

        # Execute RAG without feedback influence (empty feedback list)
        result = rag_with_feedback_loop(query, vector_store, [])
        round1_results.append(result)

        # Generate synthetic feedback if reference answers are available
        # This simulates user feedback for training the system
        if reference_answers and i < len(reference_answers):
            # Calculate synthetic feedback scores based on similarity to reference answer
            similarity_to_ref = calculate_similarity(result["response"], reference_answers[i])
            # Convert similarity (0-1) to rating scale (1-5)
            relevance = max(1, min(5, int(similarity_to_ref * 5)))
            quality = max(1, min(5, int(similarity_to_ref * 5)))

            # Create structured feedback entry
            feedback = get_user_feedback(
                query=query,
                response=result["response"],
                relevance=relevance,
                quality=quality,
                comments=f"Synthetic feedback based on reference similarity: {similarity_to_ref:.2f}"
            )

            # Add to in-memory collection and persist to temporary file
            feedback_data.append(feedback)
            store_feedback(feedback, temp_feedback_file)

    # ----------------------- SECOND EVALUATION ROUND -----------------------
    # Run the same queries with feedback incorporation to measure improvement
    print("\n=== ROUND 2: WITH FEEDBACK ===")
    round2_results = []

    # Process document and enhance with feedback-derived content
    chunks, vector_store = process_document(pdf_path)
    vector_store = fine_tune_index(vector_store, chunks, feedback_data)

    for i, query in enumerate(test_queries):
        print(f"\nQuery {i+1}: {query}")

        # Execute RAG with feedback influence
        result = rag_with_feedback_loop(query, vector_store, feedback_data)
        round2_results.append(result)

    # ----------------------- RESULTS ANALYSIS -----------------------
    # Compare performance metrics between the two rounds
    comparison = compare_results(test_queries, round1_results, round2_results, reference_answers)

    # Clean up temporary evaluation artifacts
    if os.path.exists(temp_feedback_file):
        os.remove(temp_feedback_file)

    return {
        "round1_results": round1_results,
        "round2_results": round2_results,
        "comparison": comparison
    }
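
The synthetic ratings in round 1 come from the expression `max(1, min(5, int(similarity * 5)))`. The sketch below wraps it in a function, named `similarity_to_rating` only for this illustration, and evaluates it for a few similarity values.

```python
def similarity_to_rating(similarity):
    """Map a similarity value to an integer rating from 1 to 5, as in the evaluation loop."""
    return max(1, min(5, int(similarity * 5)))


for value in (0.05, 0.35, 0.79, 0.95, 1.0):
    print(f"similarity {value:.2f} -> rating {similarity_to_rating(value)}")
```

Because `int()` truncates, a similarity of 0.79 maps to a rating of 3 and 0.95 maps to 4. Since `fine_tune_index` only uses feedback rated 4 or higher on both scales, a synthetic entry is added to the index only when the response's similarity to the reference answer is about 0.8 or above.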

## Helper functions

In [None]:
def calculate_similarity(text1, text2):
    """
    Calculate semantic similarity between two texts using embeddings.

    Args:
        text1 (str): First text
        text2 (str): Second text

    Returns:
        float: Cosine similarity score in the range [-1, 1]
    """
    # Generate embeddings for both texts
    embedding1 = create_embeddings(text1)
    embedding2 = create_embeddings(text2)

    # Convert embeddings to numpy arrays
    vec1 = np.array(embedding1)
    vec2 = np.array(embedding2)

    # Calculate cosine similarity between the two vectors
    similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

    return similarity

In [None]:
def compare_results(queries, round1_results, round2_results, reference_answers=None):
    """
    Compare results from two rounds of RAG.

    Args:
        queries (List[str]): Test queries
        round1_results (List[Dict]): Results from round 1
        round2_results (List[Dict]): Results from round 2
        reference_answers (List[str], optional): Reference answers

    Returns:
        List[Dict]: One entry per query, with the keys 'query' and 'analysis'
    """
    print("\n=== COMPARING RESULTS ===")

    # System prompt to guide the AI's evaluation behavior
    system_prompt = """You are an expert evaluator of RAG systems. Compare responses from two versions:
        1. Standard RAG: No feedback used
        2. Feedback-enhanced RAG: Uses a feedback loop to improve retrieval

        Analyze which version provides better responses in terms of:
        - Relevance to the query
        - Accuracy of information
        - Completeness
        - Clarity and conciseness
    """

    comparisons = []

    # Iterate over each query and its corresponding results from both rounds
    for i, (query, r1, r2) in enumerate(zip(queries, round1_results, round2_results)):
        # Create a prompt for comparing the responses
        comparison_prompt = f"""
        Query: {query}

        Standard RAG Response:
        {r1["response"]}

        Feedback-enhanced RAG Response:
        {r2["response"]}
        """

        # Include reference answer if available
        if reference_answers and i < len(reference_answers):
            comparison_prompt += f"""
            Reference Answer:
            {reference_answers[i]}
            """

        comparison_prompt += """
        Compare these responses and explain which one is better and why.
        Focus specifically on how the feedback loop has (or hasn't) improved the response quality.
        """

        # Call the OpenAI API to generate a comparison analysis
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": comparison_prompt}
            ],
            temperature=0
        )

        # Append the comparison analysis to the results
        comparisons.append({
            "query": query,
            "analysis": response.choices[0].message.content
        })

        # Print a snippet of the analysis for each query
        print(f"\nQuery {i+1}: {query}")
        print(f"Analysis: {response.choices[0].message.content[:200]}...")

    return comparisons

## Evaluation

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# AI Document Path
pdf_path = "./drive/MyDrive/colab_data/AI_Information.pdf"

# Define test queries
test_queries = [
    "What is a neural network and how does it function?",

    #################################################################################
    ### Commented out queries to reduce the number of queries for testing purposes ###

    # "Describe the process and applications of reinforcement learning.",
    # "What are the main applications of natural language processing in today's technology?",
    # "Explain the impact of overfitting in machine learning models and how it can be mitigated."
]

# Define reference answers for evaluation
reference_answers = [
    "A neural network is a series of algorithms that attempt to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. It consists of layers of nodes, with each node representing a neuron. Neural networks function by adjusting the weights of connections between nodes based on the error of the output compared to the expected result.",

    ############################################################################################
    #### Commented out reference answers to reduce the number of queries for testing purposes ###

#     "Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions in an environment to maximize cumulative reward. It involves exploration, exploitation, and learning from the consequences of actions. Applications include robotics, game playing, and autonomous vehicles.",
#     "The main applications of natural language processing in today's technology include machine translation, sentiment analysis, chatbots, information retrieval, text summarization, and speech recognition. NLP enables machines to understand and generate human language, facilitating human-computer interaction.",
#     "Overfitting in machine learning models occurs when a model learns the training data too well, capturing noise and outliers. This results in poor generalization to new data, as the model performs well on training data but poorly on unseen data. Mitigation techniques include cross-validation, regularization, pruning, and using more training data."
]

# Run the evaluation
evaluation_results = evaluate_feedback_loop(
    pdf_path=pdf_path,
    test_queries=test_queries,
    reference_answers=reference_answers
)

=== Evaluating Feedback Loop Impact ===

=== ROUND 1: NO FEEDBACK ===

Query 1: What is a neural network and how does it function?
Extracting text from PDF...
Chunking text...
Created 42 text chunks
Creating embeddings for chunks...
Added 42 chunks to the vector store

=== Processing query with feedback-enhanced RAG ===
Query: What is a neural network and how does it function?
Generating response...

=== Response ===
A neural network is a type of artificial intelligence model inspired by the structure and function of the human brain. It uses multiple layers (deep neural networks) to analyze data.

Convolutional Neural Networks (CNNs) are a specific type of neural network particularly effective for processing images and videos. They use convolutional layers to automatically learn features from the input data and are widely used in object detection, facial recognition, and medical image analysis.

Recurrent Neural Networks (RNNs) are designed to process sequential data such as text and t

In [None]:
########################################
# # Run a full RAG workflow
########################################

# # Run an interactive example
# print("\n\n=== INTERACTIVE EXAMPLE ===")
# print("Enter your query about AI:")
# user_query = input()

# # Load accumulated feedback
# all_feedback = load_feedback_data()

# # Run full workflow
# result = full_rag_workflow(
#     pdf_path=pdf_path,
#     query=user_query,
#     feedback_data=all_feedback,
#     fine_tune=True
# )

########################################
# # Run a full RAG workflow
########################################

## Visualizing the impact of feedback

In [None]:
# Extract the comparison data which contains the analysis of feedback impact
comparisons = evaluation_results['comparison']

# Print out the analysis results to visualize feedback impact
print("\n=== FEEDBACK IMPACT ANALYSIS ===\n")
for i, comparison in enumerate(comparisons):
    print(f"Query {i+1}: {comparison['query']}")
    print(f"\nAnalysis of feedback impact:")
    print(comparison['analysis'])
    print("\n" + "-"*50 + "\n")

# Additionally, we can compare some metrics between rounds
# Collect the result lists of both rounds (round1_results and round2_results)
round_responses = [evaluation_results[f'round{round_num}_results'] for round_num in (1, 2)]
response_lengths = [[len(r["response"]) for r in round_results] for round_results in round_responses]

print("\nResponse length comparison (proxy for completeness):")
avg_lengths = [sum(lengths) / len(lengths) for lengths in response_lengths]
for round_num, avg_len in enumerate(avg_lengths, start=1):
    print(f"Round {round_num}: {avg_len:.1f} chars")

if len(avg_lengths) > 1:
    changes = [(avg_lengths[i] - avg_lengths[i-1]) / avg_lengths[i-1] * 100 for i in range(1, len(avg_lengths))]
    for round_num, change in enumerate(changes, start=2):
        print(f"Change from Round {round_num-1} to Round {round_num}: {change:.1f}%")


=== FEEDBACK IMPACT ANALYSIS ===

Query 1: What is a neural network and how does it function?

Analysis of feedback impact:
In terms of relevance to the query, both the standard RAG response and the feedback-enhanced RAG response are highly relevant. They provide a clear and comprehensive explanation of what a neural network is and how different types (CNNs and RNNs) function.

In terms of accuracy of information, both responses are accurate. They cover the main concepts and characteristics of neural networks, convolutional neural networks, and recurrent neural networks.

In terms of completeness, both responses are quite complete. They provide detailed explanations of the different types of neural networks and their applications.

However, in terms of clarity and conciseness, the reference answer might be slightly better. It provides a more concise and straightforward description of how neural networks function by emphasizing the adjustment of weights based on error. The standard RAG