# 用于RAG的分级索引

实现一种用于RAG系统的分级索引方法(Hierarchical Indices)。这种技术通过使用两级搜索方法来提高检索效果：首先通过摘要识别相关的文档部分，然后从这些部分中检索具体细节。

传统的RAG方法将所有文本块一视同仁，这可能导致：

- 当文本块过小时，上下文信息丢失
- 当文档集合较大时，检索结果无关
- 在整个语料库中搜索效率低下

分级检索解决了这些问题，具体方式如下：

- 为较大的文档部分创建简洁的摘要
- 首先搜索这些摘要以确定相关部分
- 然后仅从这些部分中检索详细信息
- 在保留具体细节的同时保持上下文信息

实现步骤：

- 从 PDF 中提取页面
- 为每一页创建摘要，将摘要文本和元数据添加到摘要列表中
- 为每一页创建详细块，将页面的文本切分为块
- 为以上两个创建嵌入，并行其存入向量存储中
- 使用查询分层检索相关块：先检索相关的摘要，收集来自相关摘要的页面，然后过滤掉不是相关页面的块，从这些相关页面检索详细块
- 根据检索到的块生成回答

<div style="text-align: center;">

<img src="images/hierarchical_indices.svg" alt="hierarchical_indices" style="width:50%; height:auto;">
</div>

<div style="text-align: center;">

<img src="images/hierarchical_indices_example.svg" alt="hierarchical_indices" style="width:100%; height:auto;">
</div>

## 环境配置

In [1]:
import os
import numpy as np
import json
import fitz
from openai import OpenAI
import re
import pickle

## OpenAI Client

In [None]:
# colab环境
from google.colab import userdata
# 使用火山引擎
api_key = userdata.get("ARK_API_KEY")
base_url = userdata.get("ARK_BASE_URL")

In [None]:
text_model = "doubao-1-5-lite-32k-250115"
image_model = "doubao-1.5-vision-lite-250315"
embedding_model = "doubao-embedding-large-text-240915"

In [None]:
client = OpenAI(
    base_url=base_url,
    api_key=api_key
)

## 文档处理函数

In [None]:
def extract_text_from_pdf(pdf_path):
    """
    从PDF文件中提取文本内容，并按页分离。

    Args:
        pdf_path (str): PDF文件的路径

    Returns:
        List[Dict]: 包含文本内容和元数据的页面列表
    """
    print(f"正在提取文本 {pdf_path}...")  # 打印正在处理的PDF路径
    pdf = fitz.open(pdf_path)  # 使用PyMuPDF打开PDF文件
    pages = []  # 初始化一个空列表，用于存储包含文本内容的页面

    # 遍历PDF中的每一页
    for page_num in range(len(pdf)):
        page = pdf[page_num]  # 获取当前页
        text = page.get_text()  # 从当前页提取文本

        # 跳过文本非常少的页面（少于50个字符）
        if len(text.strip()) > 50:
            # 将页面文本和元数据添加到列表中
            pages.append({
                "text": text,
                "metadata": {
                    "source": pdf_path,  # 源文件路径
                    "page": page_num + 1  # 页面编号（从1开始）
                }
            })

    print(f"已提取 {len(pages)} 页的内容")  # 打印已提取的页面数量
    return pages  # 返回包含文本内容和元数据的页面列表

In [None]:
def chunk_text(text, metadata, chunk_size=1000, overlap=200):
    """
    将文本分割为重叠的块，同时保留元数据。

    Args:
        text (str): 要分割的输入文本
        metadata (Dict): 要保留的元数据
        chunk_size (int): 每个块的大小（以字符为单位）
        overlap (int): 块之间的重叠大小（以字符为单位）

    Returns:
        List[Dict]: 包含元数据的文本块列表
    """
    chunks = []  # 初始化一个空列表，用于存储块

    # 按指定的块大小和重叠量遍历文本
    for i in range(0, len(text), chunk_size - overlap):
        chunk_text = text[i:i + chunk_size]  # 提取文本块

        # 跳过非常小的块（少于50个字符）
        if chunk_text and len(chunk_text.strip()) > 50:
            # 创建元数据的副本，并添加块特定的信息
            chunk_metadata = metadata.copy()
            chunk_metadata.update({
                "chunk_index": len(chunks),  # 块的索引
                "start_char": i,  # 块的起始字符索引
                "end_char": i + len(chunk_text),  # 块的结束字符索引
                "is_summary": False  # 标志，表示这不是摘要
            })

            # 将带有元数据的块添加到列表中
            chunks.append({
                "text": chunk_text,
                "metadata": chunk_metadata
            })

    return chunks  # 返回带有元数据的块列表

## 向量存储

In [None]:
class SimpleVectorStore:
    """
    使用NumPy实现的简单向量存储。
    """

    def __init__(self):
        """
        初始化向量存储。
        """
        self.vectors = []  # 用于存储嵌入向量的列表
        self.texts = []  # 用于存储原始文本的列表
        self.metadata = []  # 用于存储每个文本元数据的列表

    def add_item(self, text, embedding, metadata=None):
        """
        向向量存储中添加一个项目。

        Args:
            text (str): 原始文本。
            embedding (List[float]): 嵌入向量。
            metadata (dict, optional): 额外的元数据。
        """
        self.vectors.append(np.array(embedding))  # 将嵌入转换为numpy数组并添加到向量列表中
        self.texts.append(text)  # 将原始文本添加到文本列表中
        self.metadata.append(metadata or {})  # 添加元数据到元数据列表中，如果没有提供则使用空字典

    def similarity_search(self, query_embedding, k=5, filter_func=None):
        """
        查找与查询嵌入最相似的项目。

        Args:
            query_embedding (List[float]): 查询嵌入向量。
            k (int): 返回的结果数量。
            filter_func (callable, optional): 可选的过滤函数，用于筛选元数据。

        Returns:
            List[Dict]: 包含文本和元数据的前k个最相似项。
        """
        if not self.vectors:
            return []  # 如果没有存储向量，则返回空列表

        # 将查询嵌入转换为numpy数组
        query_vector = np.array(query_embedding)

        # 使用余弦相似度计算相似度
        similarities = []
        for i, vector in enumerate(self.vectors):
            # 如果存在过滤函数且该元数据不符合条件，则跳过该项
            if filter_func and not filter_func(self.metadata[i]):
                continue
            # 计算查询向量与存储向量之间的余弦相似度
            similarity = np.dot(query_vector, vector) / (np.linalg.norm(query_vector) * np.linalg.norm(vector))
            similarities.append((i, similarity))  # 添加索引和相似度分数

        # 按相似度排序（降序）
        similarities.sort(key=lambda x: x[1], reverse=True)

        # 返回前k个结果
        results = []
        for i in range(min(k, len(similarities))):
            idx, score = similarities[i]
            results.append({
                "text": self.texts[idx],  # 添加对应的文本
                "metadata": self.metadata[idx],  # 添加对应的元数据
                "similarity": score  # 添加相似度分数
            })

        return results  # 返回前k个最相似项的列表

## 创建嵌入

In [None]:
def create_embeddings(texts, model=None):
    """
    为给定文本创建嵌入向量。

    Args:
        texts (List[str]): 输入文本列表
        model (str): 嵌入模型名称

    Returns:
        List[List[float]]: 嵌入向量列表
    """

    model = model or embedding_model  # 如果未指定模型，则使用默认的嵌入模型

    # 处理空输入的情况
    if not texts:
        return []

    # 分批次处理（OpenAI API 的限制）
    batch_size = 10
    all_embeddings = []

    # 遍历输入文本，按批次生成嵌入
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]  # 获取当前批次的文本

        # 调用 OpenAI 接口生成嵌入
        response = client.embeddings.create(
            model=model,
            input=batch
        )

        # 提取当前批次的嵌入向量
        batch_embeddings = [item.embedding for item in response.data]
        all_embeddings.extend(batch_embeddings)  # 将当前批次的嵌入向量加入总列表

    return all_embeddings  # 返回所有嵌入向量

## 摘要函数

In [None]:
def generate_page_summary(page_text):
    """
    生成页面的简洁摘要。

    Args:
        page_text (str): 页面的文本内容

    Returns:
        str: 生成的摘要
    """
    # Define the system prompt to instruct the summarization model
    system_prompt = """You are an expert summarization system.
    Create a detailed summary of the provided text. 
    Focus on capturing the main topics, key information, and important facts.
    Your summary should be comprehensive enough to understand what the page contains
    but more concise than the original."""

    # 如果输入文本超过最大令牌限制，则截断
    max_tokens = 6000
    truncated_text = page_text[:max_tokens] if len(page_text) > max_tokens else page_text

    # 向OpenAI API发出请求以生成摘要
    response = client.chat.completions.create(
        model=text_model,  # Specify the model to use
        messages=[
            {"role": "system", "content": system_prompt},  # System message to guide the assistant
            {"role": "user", "content": f"Please summarize this text:\n\n{truncated_text}"}  # User message with the text to summarize
        ],
        temperature=0.3  # Set the temperature for response generation
    )
    
    # Return the generated summary content
    return response.choices[0].message.content

## 分级文档处理

In [None]:
def process_document_hierarchically(pdf_path, chunk_size=1000, chunk_overlap=200):
    """
    将文档处理为分层索引。

    Args:
        pdf_path (str): PDF 文件的路径
        chunk_size (int): 每个详细块的大小
        chunk_overlap (int): 块之间的重叠量

    Returns:
        Tuple[SimpleVectorStore, SimpleVectorStore]: 摘要和详细向量存储
    """
    # Extract pages from PDF
    pages = extract_text_from_pdf(pdf_path)
    
    # Create summaries for each page
    print("Generating page summaries...")
    summaries = []
    for i, page in enumerate(pages):
        print(f"Summarizing page {i+1}/{len(pages)}...")
        summary_text = generate_page_summary(page["text"])
        
        # Create summary metadata
        summary_metadata = page["metadata"].copy()
        summary_metadata.update({"is_summary": True})
        
        # Append the summary text and metadata to the summaries list
        summaries.append({
            "text": summary_text,
            "metadata": summary_metadata
        })
    
    # Create detailed chunks for each page
    detailed_chunks = []
    for page in pages:
        # Chunk the text of the page
        page_chunks = chunk_text(
            page["text"], 
            page["metadata"], 
            chunk_size, 
            chunk_overlap
        )
        # Extend the detailed_chunks list with the chunks from the current page
        detailed_chunks.extend(page_chunks)
    
    print(f"Created {len(detailed_chunks)} detailed chunks")
    
    # Create embeddings for summaries
    print("Creating embeddings for summaries...")
    summary_texts = [summary["text"] for summary in summaries]
    summary_embeddings = create_embeddings(summary_texts)
    
    # Create embeddings for detailed chunks
    print("Creating embeddings for detailed chunks...")
    chunk_texts = [chunk["text"] for chunk in detailed_chunks]
    chunk_embeddings = create_embeddings(chunk_texts)
    
    # Create vector stores
    summary_store = SimpleVectorStore()
    detailed_store = SimpleVectorStore()
    
    # Add summaries to summary store
    for i, summary in enumerate(summaries):
        summary_store.add_item(
            text=summary["text"],
            embedding=summary_embeddings[i],
            metadata=summary["metadata"]
        )
    
    # Add chunks to detailed store
    for i, chunk in enumerate(detailed_chunks):
        detailed_store.add_item(
            text=chunk["text"],
            embedding=chunk_embeddings[i],
            metadata=chunk["metadata"]
        )
    
    print(f"Created vector stores with {len(summaries)} summaries and {len(detailed_chunks)} chunks")
    return summary_store, detailed_store

## 分级检索

In [None]:
def retrieve_hierarchically(query, summary_store, detailed_store, k_summaries=3, k_chunks=5):
    """
    使用分层索引检索信息。

    Args:
        query (str): 用户查询
        summary_store (SimpleVectorStore): 文档摘要存储
        detailed_store (SimpleVectorStore): 详细块存储
        k_summaries (int): 要检索的摘要数量
        k_chunks (int): 每个摘要要检索的块数量

    Returns:
        List[Dict]: 检索到的带有相关性分数的块
    """
    print(f"Performing hierarchical retrieval for query: {query}")
    
    # Create query embedding
    query_embedding = create_embeddings(query)
    
    # First, retrieve relevant summaries
    summary_results = summary_store.similarity_search(
        query_embedding, 
        k=k_summaries
    )
    
    print(f"Retrieved {len(summary_results)} relevant summaries")
    
    # Collect pages from relevant summaries
    relevant_pages = [result["metadata"]["page"] for result in summary_results]
    
    # Create a filter function to only keep chunks from relevant pages
    def page_filter(metadata):
        return metadata["page"] in relevant_pages
    
    # Then, retrieve detailed chunks from only those relevant pages
    detailed_results = detailed_store.similarity_search(
        query_embedding, 
        k=k_chunks * len(relevant_pages),
        filter_func=page_filter
    )
    
    print(f"Retrieved {len(detailed_results)} detailed chunks from relevant pages")
    
    # For each result, add which summary/page it came from
    for result in detailed_results:
        page = result["metadata"]["page"]
        matching_summaries = [s for s in summary_results if s["metadata"]["page"] == page]
        if matching_summaries:
            result["summary"] = matching_summaries[0]["text"]
    
    return detailed_results

## 利用上下文生成回答

In [10]:
def generate_response(query, retrieved_chunks):
    """
    Generate a response based on the query and retrieved chunks.
    
    Args:
        query (str): User query
        retrieved_chunks (List[Dict]): Retrieved chunks from hierarchical search
        
    Returns:
        str: Generated response
    """
    # Extract text from chunks and prepare context parts
    context_parts = []
    
    for i, chunk in enumerate(retrieved_chunks):
        page_num = chunk["metadata"]["page"]  # Get the page number from metadata
        context_parts.append(f"[Page {page_num}]: {chunk['text']}")  # Format the chunk text with page number
    
    # Combine all context parts into a single context string
    context = "\n\n".join(context_parts)
    
    # Define the system message to guide the AI assistant
    system_message = """You are a helpful AI assistant answering questions based on the provided context.
Use the information from the context to answer the user's question accurately.
If the context doesn't contain relevant information, acknowledge that.
Include page numbers when referencing specific information."""

    # Generate the response using the OpenAI API
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",  # Specify the model to use
        messages=[
            {"role": "system", "content": system_message},  # System message to guide the assistant
            {"role": "user", "content": f"Context:\n\n{context}\n\nQuestion: {query}"}  # User message with context and query
        ],
        temperature=0.2  # Set the temperature for response generation
    )
    
    # Return the generated response content
    return response.choices[0].message.content

## 用分级检索实现完整的RAG流程

In [11]:
def hierarchical_rag(query, pdf_path, chunk_size=1000, chunk_overlap=200, 
                    k_summaries=3, k_chunks=5, regenerate=False):
    """
    Complete hierarchical RAG pipeline.
    
    Args:
        query (str): User query
        pdf_path (str): Path to the PDF document
        chunk_size (int): Size of each detailed chunk
        chunk_overlap (int): Overlap between chunks
        k_summaries (int): Number of summaries to retrieve
        k_chunks (int): Number of chunks to retrieve per summary
        regenerate (bool): Whether to regenerate vector stores
        
    Returns:
        Dict: Results including response and retrieved chunks
    """
    # Create store filenames for caching
    summary_store_file = f"{os.path.basename(pdf_path)}_summary_store.pkl"
    detailed_store_file = f"{os.path.basename(pdf_path)}_detailed_store.pkl"
    
    # Process document and create stores if needed
    if regenerate or not os.path.exists(summary_store_file) or not os.path.exists(detailed_store_file):
        print("Processing document and creating vector stores...")
        # Process the document to create hierarchical indices and vector stores
        summary_store, detailed_store = process_document_hierarchically(
            pdf_path, chunk_size, chunk_overlap
        )
        
        # Save the summary store to a file for future use
        with open(summary_store_file, 'wb') as f:
            pickle.dump(summary_store, f)
        
        # Save the detailed store to a file for future use
        with open(detailed_store_file, 'wb') as f:
            pickle.dump(detailed_store, f)
    else:
        # Load existing summary store from file
        print("Loading existing vector stores...")
        with open(summary_store_file, 'rb') as f:
            summary_store = pickle.load(f)
        
        # Load existing detailed store from file
        with open(detailed_store_file, 'rb') as f:
            detailed_store = pickle.load(f)
    
    # Retrieve relevant chunks hierarchically using the query
    retrieved_chunks = retrieve_hierarchically(
        query, summary_store, detailed_store, k_summaries, k_chunks
    )
    
    # Generate a response based on the retrieved chunks
    response = generate_response(query, retrieved_chunks)
    
    # Return results including the query, response, retrieved chunks, and counts of summaries and detailed chunks
    return {
        "query": query,
        "response": response,
        "retrieved_chunks": retrieved_chunks,
        "summary_count": len(summary_store.texts),
        "detailed_count": len(detailed_store.texts)
    }

## 标准 RAG（非分级，用于对比）

In [12]:
def standard_rag(query, pdf_path, chunk_size=1000, chunk_overlap=200, k=15):
    """
    Standard RAG pipeline without hierarchical retrieval.
    
    Args:
        query (str): User query
        pdf_path (str): Path to the PDF document
        chunk_size (int): Size of each chunk
        chunk_overlap (int): Overlap between chunks
        k (int): Number of chunks to retrieve
        
    Returns:
        Dict: Results including response and retrieved chunks
    """
    # Extract pages from the PDF document
    pages = extract_text_from_pdf(pdf_path)
    
    # Create chunks directly from all pages
    chunks = []
    for page in pages:
        # Chunk the text of the page
        page_chunks = chunk_text(
            page["text"], 
            page["metadata"], 
            chunk_size, 
            chunk_overlap
        )
        # Extend the chunks list with the chunks from the current page
        chunks.extend(page_chunks)
    
    print(f"Created {len(chunks)} chunks for standard RAG")
    
    # Create a vector store to hold the chunks
    store = SimpleVectorStore()
    
    # Create embeddings for the chunks
    print("Creating embeddings for chunks...")
    texts = [chunk["text"] for chunk in chunks]
    embeddings = create_embeddings(texts)
    
    # Add chunks to the vector store
    for i, chunk in enumerate(chunks):
        store.add_item(
            text=chunk["text"],
            embedding=embeddings[i],
            metadata=chunk["metadata"]
        )
    
    # Create an embedding for the query
    query_embedding = create_embeddings(query)
    
    # Retrieve the most relevant chunks based on the query embedding
    retrieved_chunks = store.similarity_search(query_embedding, k=k)
    print(f"Retrieved {len(retrieved_chunks)} chunks with standard RAG")
    
    # Generate a response based on the retrieved chunks
    response = generate_response(query, retrieved_chunks)
    
    # Return the results including the query, response, and retrieved chunks
    return {
        "query": query,
        "response": response,
        "retrieved_chunks": retrieved_chunks
    }

## 评估函数

In [13]:
def compare_approaches(query, pdf_path, reference_answer=None):
    """
    Compare hierarchical and standard RAG approaches.
    
    Args:
        query (str): User query
        pdf_path (str): Path to the PDF document
        reference_answer (str, optional): Reference answer for evaluation
        
    Returns:
        Dict: Comparison results
    """
    print(f"\n=== Comparing RAG approaches for query: {query} ===")
    
    # Run hierarchical RAG
    print("\nRunning hierarchical RAG...")
    hierarchical_result = hierarchical_rag(query, pdf_path)
    hier_response = hierarchical_result["response"]
    
    # Run standard RAG
    print("\nRunning standard RAG...")
    standard_result = standard_rag(query, pdf_path)
    std_response = standard_result["response"]
    
    # Compare results from hierarchical and standard RAG
    comparison = compare_responses(query, hier_response, std_response, reference_answer)
    
    # Return a dictionary with the comparison results
    return {
        "query": query,  # The original query
        "hierarchical_response": hier_response,  # Response from hierarchical RAG
        "standard_response": std_response,  # Response from standard RAG
        "reference_answer": reference_answer,  # Reference answer for evaluation
        "comparison": comparison,  # Comparison analysis
        "hierarchical_chunks_count": len(hierarchical_result["retrieved_chunks"]),  # Number of chunks retrieved by hierarchical RAG
        "standard_chunks_count": len(standard_result["retrieved_chunks"])  # Number of chunks retrieved by standard RAG
    }

In [14]:
def compare_responses(query, hierarchical_response, standard_response, reference=None):
    """
    Compare responses from hierarchical and standard RAG.
    
    Args:
        query (str): User query
        hierarchical_response (str): Response from hierarchical RAG
        standard_response (str): Response from standard RAG
        reference (str, optional): Reference answer
        
    Returns:
        str: Comparison analysis
    """
    # Define the system prompt to instruct the model on how to evaluate the responses
    system_prompt = """You are an expert evaluator of information retrieval systems. 
Compare the two responses to the same query, one generated using hierarchical retrieval
and the other using standard retrieval.

Evaluate them based on:
1. Accuracy: Which response provides more factually correct information?
2. Comprehensiveness: Which response better covers all aspects of the query?
3. Coherence: Which response has better logical flow and organization?
4. Page References: Does either response make better use of page references?

Be specific in your analysis of the strengths and weaknesses of each approach."""

    # Create the user prompt with the query and both responses
    user_prompt = f"""Query: {query}

Response from Hierarchical RAG:
{hierarchical_response}

Response from Standard RAG:
{standard_response}"""

    # If a reference answer is provided, include it in the user prompt
    if reference:
        user_prompt += f"""

Reference Answer:
{reference}"""

    # Add the final instruction to the user prompt
    user_prompt += """

Please provide a detailed comparison of these two responses, highlighting which approach performed better and why."""

    # Make a request to the OpenAI API to generate the comparison analysis
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[
            {"role": "system", "content": system_prompt},  # System message to guide the assistant
            {"role": "user", "content": user_prompt}  # User message with the query and responses
        ],
        temperature=0  # Set the temperature for response generation
    )
    
    # Return the generated comparison analysis
    return response.choices[0].message.content

In [15]:
def run_evaluation(pdf_path, test_queries, reference_answers=None):
    """
    Run a complete evaluation with multiple test queries.
    
    Args:
        pdf_path (str): Path to the PDF document
        test_queries (List[str]): List of test queries
        reference_answers (List[str], optional): Reference answers for queries
        
    Returns:
        Dict: Evaluation results
    """
    results = []  # Initialize an empty list to store results
    
    # Iterate over each query in the test queries
    for i, query in enumerate(test_queries):
        print(f"Query: {query}")  # Print the current query
        
        # Get reference answer if available
        reference = None
        if reference_answers and i < len(reference_answers):
            reference = reference_answers[i]  # Retrieve the reference answer for the current query
        
        # Compare hierarchical and standard RAG approaches
        result = compare_approaches(query, pdf_path, reference)
        results.append(result)  # Append the result to the results list
    
    # Generate overall analysis of the evaluation results
    overall_analysis = generate_overall_analysis(results)
    
    return {
        "results": results,  # Return the individual results
        "overall_analysis": overall_analysis  # Return the overall analysis
    }

In [16]:
def generate_overall_analysis(results):
    """
    Generate an overall analysis of the evaluation results.
    
    Args:
        results (List[Dict]): Results from individual query evaluations
        
    Returns:
        str: Overall analysis
    """
    # Define the system prompt to instruct the model on how to evaluate the results
    system_prompt = """You are an expert at evaluating information retrieval systems.
Based on multiple test queries, provide an overall analysis comparing hierarchical RAG 
with standard RAG.

Focus on:
1. When hierarchical retrieval performs better and why
2. When standard retrieval performs better and why
3. The overall strengths and weaknesses of each approach
4. Recommendations for when to use each approach"""

    # Create a summary of the evaluations
    evaluations_summary = ""
    for i, result in enumerate(results):
        evaluations_summary += f"Query {i+1}: {result['query']}\n"
        evaluations_summary += f"Hierarchical chunks: {result['hierarchical_chunks_count']}, Standard chunks: {result['standard_chunks_count']}\n"
        evaluations_summary += f"Comparison summary: {result['comparison'][:200]}...\n\n"

    # Define the user prompt with the evaluations summary
    user_prompt = f"""Based on the following evaluations comparing hierarchical vs standard RAG across {len(results)} queries, 
provide an overall analysis of these two approaches:

{evaluations_summary}

Please provide a comprehensive analysis of the relative strengths and weaknesses of hierarchical RAG 
compared to standard RAG, with specific focus on retrieval quality and response generation."""

    # Make a request to the OpenAI API to generate the overall analysis
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.2-3B-Instruct",
        messages=[
            {"role": "system", "content": system_prompt},  # System message to guide the assistant
            {"role": "user", "content": user_prompt}  # User message with the evaluations summary
        ],
        temperature=0  # Set the temperature for response generation
    )
    
    # Return the generated overall analysis
    return response.choices[0].message.content

## 分级RAG与标准RAG方法的评估

In [17]:
# Path to the PDF document containing AI information
pdf_path = "data/AI_Information.pdf"

# Example query about AI for testing the hierarchical RAG approach
query = "What are the key applications of transformer models in natural language processing?"
result = hierarchical_rag(query, pdf_path)

print("\n=== Response ===")
print(result["response"])

# Test query for formal evaluation (using only one query as requested)
test_queries = [
    "How do transformers handle sequential data compared to RNNs?"
]

# Reference answer for the test query to enable comparison
reference_answers = [
    "Transformers handle sequential data differently from RNNs by using self-attention mechanisms instead of recurrent connections. This allows transformers to process all tokens in parallel rather than sequentially, capturing long-range dependencies more efficiently and enabling better parallelization during training. Unlike RNNs, transformers don't suffer from vanishing gradient problems with long sequences."
]

# Run the evaluation comparing hierarchical and standard RAG approaches
evaluation_results = run_evaluation(
    pdf_path=pdf_path,
    test_queries=test_queries,
    reference_answers=reference_answers
)

# Print the overall analysis of the comparison
print("\n=== OVERALL ANALYSIS ===")
print(evaluation_results["overall_analysis"])

Processing document and creating vector stores...
Extracting text from data/AI_Information.pdf...
Extracted 15 pages with content
Generating page summaries...
Summarizing page 1/15...
Summarizing page 2/15...
Summarizing page 3/15...
Summarizing page 4/15...
Summarizing page 5/15...
Summarizing page 6/15...
Summarizing page 7/15...
Summarizing page 8/15...
Summarizing page 9/15...
Summarizing page 10/15...
Summarizing page 11/15...
Summarizing page 12/15...
Summarizing page 13/15...
Summarizing page 14/15...
Summarizing page 15/15...
Created 47 detailed chunks
Creating embeddings for summaries...
Creating embeddings for detailed chunks...
Created vector stores with 15 summaries and 47 chunks
Performing hierarchical retrieval for query: What are the key applications of transformer models in natural language processing?


  "similarity": float(score)  # Add the similarity score


Retrieved 3 relevant summaries
Retrieved 10 detailed chunks from relevant pages

=== Response ===
I couldn't find any information about transformer models in the provided context. The context appears to focus on various applications of Artificial Intelligence (AI) and Machine Learning (ML), including computer vision, deep learning, reinforcement learning, and more. However, transformer models are not mentioned.

If you're looking for information on transformer models, I'd be happy to try and help you find it. Alternatively, if you have any other questions based on the provided context, I'd be happy to try and assist you.
Query: How do transformers handle sequential data compared to RNNs?

=== Comparing RAG approaches for query: How do transformers handle sequential data compared to RNNs? ===

Running hierarchical RAG...
Loading existing vector stores...
Performing hierarchical retrieval for query: How do transformers handle sequential data compared to RNNs?
Retrieved 3 relevant summari