# Router Query Engine



Let's start with the simplest form of agentic RAG: a router.

Given a query, the router picks one of several query engines to execute it.

We will build a simple router over a single document that can handle both question answering and summarization. Let's dive in.

In [4]:
!pip3 uninstall llama-index
!pip3 install llama-index --upgrade --no-cache-dir --force-reinstall

Found existing installation: llama-index 0.10.33
Uninstalling llama-index-0.10.33:
  Would remove:
    /Users/wenda/anaconda3/bin/llamaindex-cli
    /Users/wenda/anaconda3/lib/python3.11/site-packages/llama_index-0.10.33.dist-info/*
    /Users/wenda/anaconda3/lib/python3.11/site-packages/llama_index/_bundle/*
Proceed (Y/n)? ^C
ERROR: Operation cancelled by user
Collecting llama-index
  Obtaining dependency information for llama-index from https://files.pythonhosted.org/packages/5c/e4/bd320411b7fbc09b6f6efcbaa785ce8ad5d645ae234c612fb1c9ec4bec6f/llama_index-0.10.43-py3-none-any.whl.metadata
  Downloading llama_index-0.10.43-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index)
  Obtaining dependency information for llama-index-agent-openai<0.3.0,>=0.1.4 from https://files.pythonhosted.org/packages/98/74/489ef00d285843bb1bddfd9c628997c68ab27ed04a4be1124a517260576c/llama_index_agent_openai-0.2.7-py3-none-any.whl.metadata
  

In [13]:
# Pinned installs to try if you run into version conflicts
"""
!pip install pydantic==2.7.0
!pip install pydantic_core==2.18.1
!pip install llama_index_readers_file==0.1.19
"""


- `import nest_asyncio`: Required so that async code can run inside the event loop that Jupyter notebooks already have running.

In [1]:
from dotenv import load_dotenv
import nest_asyncio
import os
nest_asyncio.apply()
load_dotenv()

True
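
The need for `nest_asyncio` can be illustrated with the standard library alone. Jupyter already runs an event loop, and plain `asyncio` refuses to start a second one inside it. The sketch below reproduces that situation in an ordinary script (outside the notebook and without `nest_asyncio` applied); the names `inner` and `outer` are illustrative only.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are now inside a running event loop, much like code in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second loop inside the first
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


result = asyncio.run(outer())
print(result)
```

After `nest_asyncio.apply()`, nested calls like this are permitted, which is what lets the async query engine used later run inside the notebook.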

In [2]:
# Read the OpenAI API key from the environment (populated by load_dotenv above)
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

**Loading data**
- Load Sample Document: Use the GPT-2 paper, "Language Models are Unsupervised Multitask Learners".
- Read PDF with LlamaIndex: Use `SimpleDirectoryReader` to load the PDF into document objects.


In [3]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["language_models_are_unsupervised_multitask_learners.pdf"]).load_data()



**Split the Document into Nodes**
- Sentence Splitter: Use LlamaIndex's `SentenceSplitter` to parse the documents into nodes, the chunks that get indexed.
  - To keep the chunks roughly even in size, the splitter breaks the text along sentence boundaries.
- Split Document: Set the chunk size to 1024 (measured in tokens) and call `splitter.get_nodes_from_documents` to split the documents into nodes.




In [4]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
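
To make the idea of sentence-based chunking concrete, here is a minimal standard-library sketch. It is not how `SentenceSplitter` is implemented: the real splitter counts tokens with a tokenizer and also supports overlap between chunks, whereas this toy version counts whitespace-separated words. The function name `split_into_chunks` and the `max_words` parameter are assumptions made for illustration.

```python
import re


def split_into_chunks(text: str, max_words: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and current_len + n_words > max_words:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "GPT-2 is a language model. It was trained on WebText. It performs many tasks zero-shot."
chunks = split_into_chunks(sample, max_words=10)
for chunk in chunks:
    print(repr(chunk))
```

Because sentences are never cut in half, chunk sizes are only approximately even: a chunk closes as soon as the next sentence would push it over the budget.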

**Optional Step**
- LLM and Embedding Model: Set the global config (defaults: `gpt-3.5-turbo` and `text-embedding-ada-002`).
- This gives you the groundwork to inject your own LLM and embedding model.
- We set them on the `Settings` object: `Settings.llm = OpenAI(...)` and `Settings.embed_model = OpenAIEmbedding(...)`.

In [5]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

**Define Summary Index and Vector Index over the Same Data**
- We can think of an index as a set of metadata over our data.
- You can query an index, and different indexes have different retrieval behaviors.
- Summary Index:
  - A very simple index.
  - Querying it returns all the nodes currently in the index, regardless of the user query.
- Vector Index:
  - Indexes nodes via text embeddings. It is a core abstraction in LlamaIndex and in any RAG system.
  - Returns the nodes whose embeddings are most similar to the query.
- Query Engines: In the next step, we convert these indexes into query engines.

In [6]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
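
The difference in retrieval behavior between the two indexes can be sketched without LlamaIndex. The example below is a toy under stated assumptions: the "embedding" is a bag-of-words count vector rather than a learned embedding, and `summary_retrieve`, `vector_retrieve`, and `toy_nodes` are hypothetical names, not LlamaIndex APIs.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (illustration only)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summary_retrieve(nodes: list[str], query: str) -> list[str]:
    # Summary-style retrieval ignores the query and returns every node.
    return list(nodes)


def vector_retrieve(nodes: list[str], query: str, top_k: int = 1) -> list[str]:
    # Vector-style retrieval ranks nodes by similarity to the query.
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(embed(n), q), reverse=True)
    return ranked[:top_k]


toy_nodes = [
    "the model was trained on webtext",
    "zero-shot results on reading comprehension",
    "the tokenizer uses byte pair encoding",
]
all_nodes = summary_retrieve(toy_nodes, "which tokenizer encoding")
best_nodes = vector_retrieve(toy_nodes, "which tokenizer encoding")
print(all_nodes)
print(best_nodes)
```

Returning everything suits summarization, where the whole document matters. Returning only the closest nodes suits targeted questions.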

**Define Query Engines and Set Metadata**
- Now let's turn these indexes into query engines and then into query tools.
- Each query engine is a query interface over the data stored in its index and combines retrieval with LLM synthesis.
- Each query engine is good at a certain type of question, which makes this a great use case for a router that dynamically routes between them.
- Summary Query Engine: For summarization questions. We set `use_async=True` to speed up response generation by leveraging async capabilities.
- Vector Query Engine: For retrieving specific context from the document.


In [7]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
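
Why async helps with tree-style summarization can be shown with a small simulation. This is a sketch under assumptions, not LlamaIndex's implementation: `fake_llm_summarize` stands in for an LLM call, and the fixed `fan_in` group size is a simplification of however the real `tree_summarize` mode groups chunks.

```python
import asyncio
import time


async def fake_llm_summarize(texts: list[str]) -> str:
    """Stand-in for an LLM call (assumption): waits briefly, then joins inputs."""
    await asyncio.sleep(0.1)
    return " + ".join(texts)


async def tree_summarize(chunks: list[str], fan_in: int = 2) -> str:
    """Summarize groups of chunks level by level until one summary remains."""
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        # Groups on the same level are independent, so run them concurrently.
        level = list(await asyncio.gather(*(fake_llm_summarize(g) for g in groups)))
    return level[0]


start = time.perf_counter()
summary = asyncio.run(tree_summarize(["c1", "c2", "c3", "c4"]))
elapsed = time.perf_counter() - start
print(summary, f"({elapsed:.2f}s)")
```

With four chunks there are three simulated calls in total, but the two calls on the first level overlap, so the wall-clock time is roughly that of two calls rather than three.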

- A query tool is just a query engine with metadata, specifically a description of what types of questions the tool can answer.
- We'll define a query tool for both the summary and vector query engines.
- The summary tool's description says it is useful for summarization questions related to the document.
- The vector tool's description says it is useful for retrieving specific context from the document.

In [8]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to GPT2 paper"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the GPT2 paper."
    ),
)
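
The descriptions matter because they are what a selector gets to see when choosing a tool. A minimal sketch of that idea follows. `ToyTool` and `format_choices` are hypothetical names, and the numbered layout is an assumption for illustration rather than the exact prompt LlamaIndex builds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    """Hypothetical stand-in for a query tool: an engine plus a description."""
    run: Callable[[str], str]
    description: str


def format_choices(tools: list[ToyTool]) -> str:
    # Number each description so a selector can answer with an index.
    return "\n".join(f"({i + 1}) {t.description}" for i, t in enumerate(tools))


toy_tools = [
    ToyTool(run=lambda q: f"summary for: {q}", description="Useful for summarization questions."),
    ToyTool(run=lambda q: f"context for: {q}", description="Useful for retrieving specific context."),
]
choices_text = format_choices(toy_tools)
print(choices_text)
```

A vague or overlapping description gives the selector little to distinguish the tools by, so it pays to write descriptions that clearly separate the kinds of questions each tool handles.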

**Define Router Query Engine**
- Now that we have our query engines and tools, we're ready to define our router.
- LlamaIndex provides several types of selectors for building a router, each with distinct attributes.
  - Option 1: LLM selectors prompt an LLM to output JSON, which is then parsed to determine which indexes to query.
  - Option 2: Pydantic selectors, instead of prompting the LLM for raw text, use the function calling APIs supported by providers like OpenAI to produce Pydantic selection objects rather than parsing raw JSON.
- Each type of selector comes in a single variant, which routes to one index, and a multi variant, which can route to several.
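
The JSON-parsing step of an LLM selector can be sketched as follows. The JSON shape (a list of objects with `choice` and `reason` keys, with 1-based choices) is assumed for illustration and may differ from what LlamaIndex actually requests; `parse_selection` is a hypothetical helper.

```python
import json


def parse_selection(llm_output: str, num_choices: int, allow_multiple: bool = False) -> list[int]:
    """Parse a JSON list of {"choice": n, "reason": ...} into zero-based indexes."""
    items = json.loads(llm_output)
    indexes = [item["choice"] - 1 for item in items]  # choices are 1-based here
    for idx in indexes:
        if not 0 <= idx < num_choices:
            raise ValueError(f"choice {idx + 1} is out of range")
    if not allow_multiple and len(indexes) != 1:
        raise ValueError("single selector expects exactly one choice")
    return indexes


raw = '[{"choice": 2, "reason": "The question asks for a specific detail."}]'
selected = parse_selection(raw, num_choices=2)
print(selected)
```

Parsing free-form text like this can fail when the model returns malformed JSON, which is the motivation for the Pydantic selectors: function calling returns structured output directly.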

Let's try the LLM-powered single selector, `LLMSingleSelector`. We import two modules:
- `RouterQueryEngine`
  - The router query engine takes a selector and a set of query engine tools.
- `LLMSingleSelector`
  - This selector prompts the LLM and makes a single selection. The query engine tools are the summary tool and the vector tool.

In [9]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

In [10]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to GPT2 paper.
The document covers the training and performance analysis of large language models, specifically focusing on the GPT-2 model trained on diverse datasets. It discusses the model's zero-shot capabilities across various NLP tasks, its potential for multitask learning, and the analysis of data overlap between training and test sets. Additionally, it explores the model's performance, fine-tuning potential, and the need for further research in understanding language models' capabilities.


### TLDR

In [11]:

from utils import get_router_query_engine
query_engine = get_router_query_engine("language_models_are_unsupervised_multitask_learners.pdf")


In [12]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: Ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results were not explicitly mentioned in the provided context information.
