# bRAG: Routing and Query Construction

![route_query_construction](./image/route_query_construction.png)

## Pre-requisites (optional but recommended)

### Only do the first step if you have never created a virtual environment for this repository. Otherwise, make sure that the Python Kernel that you selected is from your `venv/` folder.

In [31]:
# Create virtual environment
! python3 -m venv ../venv

In [32]:
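# Activate virtual Python environment
# (`!` runs the command in a subshell, so this does not switch the notebook kernel; select the venv kernel in your IDE)
! source ../venv/bin/activate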

In [33]:
# If your Python is not from your venv path, ensure that your IDE's kernel selection (on the top right corner) is set to the correct path 
# (your path output should contain "...venv/bin/python")

! which python

/Users/taha/Desktop/bRAGAI/code/gh/bRAG-langchain/venv/bin/python


In [34]:
# Install all packages
! pip3 install -r ../requirements.txt --quiet

### * If you choose to skip the pre-requisites and install only the packages specific to this notebook using your global Python path environment, execute the command below; otherwise, proceed to the next step.

In [35]:
! pip3 install --quiet langchain_community tiktoken langchain-openai langchainhub chromadb langchain youtube-transcript-api pytube yt_dlp

## Environment

`(1) Packages`

In [1]:
import os
from dotenv import load_dotenv

# Load all environment variables from .env file
load_dotenv()

# Access the environment variables
langchain_tracing_v2 = os.getenv('LANGCHAIN_TRACING_V2')
langchain_endpoint = os.getenv('LANGCHAIN_ENDPOINT')
langchain_api_key = os.getenv('LANGCHAIN_API_KEY')

## LLM
openai_api_key = os.getenv('OPENAI_API_KEY')


`(2) LangSmith`

https://docs.smith.langchain.com/

In [2]:
os.environ['LANGCHAIN_TRACING_V2'] = langchain_tracing_v2
os.environ['LANGCHAIN_ENDPOINT'] = langchain_endpoint
os.environ['LANGCHAIN_API_KEY'] = langchain_api_key

`(3) API Keys`

In [30]:
os.environ['OPENAI_API_KEY'] = openai_api_key
openai_model = "gpt-3.5-turbo"
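
One caveat with the cells above: `os.getenv` returns `None` for a variable that is missing from `.env`, and assigning `None` into `os.environ` raises a `TypeError`. A minimal sketch of a guard follows; `set_env_if_present` and the `DEMO_*` variable names are hypothetical and not part of this repository.

```python
import os
from typing import Optional


def set_env_if_present(name: str, value: Optional[str]) -> bool:
    """Export `value` under `name` only when it is a non-empty string.

    Hypothetical helper: returns True if the variable was set, False otherwise.
    """
    if not value:
        # os.environ[name] = None would raise TypeError, so skip instead
        return False
    os.environ[name] = value
    return True


# A missing key is skipped rather than crashing the cell
print(set_env_if_present("DEMO_MISSING_KEY", os.getenv("DEMO_MISSING_KEY")))
# An explicit value is exported as usual
print(set_env_if_present("DEMO_PRESENT_KEY", "placeholder-value"))
```

Skipping silently is a design choice for optional settings such as tracing; for a required key such as `OPENAI_API_KEY`, raising an explicit error may be the better behavior.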

## bRAG: Logical and Semantic routing 

Use function-calling for classification.

Flow: 

![routing](./image/routing.png)

Docs:

https://python.langchain.com/docs/how_to/routing/ 

In [5]:
from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

# Data model
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""

    datasource: Literal["python_docs", "js_docs", "golang_docs"] = Field(
        ...,
        description="Given a user question choose which datasource would be most relevant for answering their question",
    )

# LLM with function call 
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(RouteQuery)

# Prompt 
system = """You are an expert at routing a user question to the appropriate data source.

Based on the programming language the question is referring to, route it to the relevant data source."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)

# Define router 
router = prompt | structured_llm


Note: on langchain-core 0.3.0 and later, the cell above emits a `LangChainDeprecationWarning`, because `langchain_core.pydantic_v1` is a deprecated compatibility shim. To silence it, replace imports like `from langchain_core.pydantic_v1 import BaseModel` with `from pydantic import BaseModel`, or with `from pydantic.v1 import BaseModel` if you are working in a code base that has not been fully upgraded to Pydantic 2 yet.


Note: we used function calling to produce structured output.

![structured_output](./image/structured_output.png)
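
Conceptually, function calling asks the model to return JSON arguments that conform to a schema, and those arguments are then parsed into the data model. The sketch below imitates that last step with the standard library only. The `raw_arguments` payload is hand-written for illustration (it is not real model output), and `parse_route` is a hypothetical helper rather than a LangChain API.

```python
import json
from typing import Literal, get_args

# Same set of choices as the `datasource` field of RouteQuery
Datasource = Literal["python_docs", "js_docs", "golang_docs"]


def parse_route(raw_arguments: str) -> str:
    """Parse JSON tool-call arguments and check that the datasource is allowed."""
    payload = json.loads(raw_arguments)
    datasource = payload.get("datasource")
    allowed = get_args(Datasource)
    if datasource not in allowed:
        raise ValueError(f"datasource must be one of {allowed}, got {datasource!r}")
    return datasource


# Hand-written stand-in for the arguments a function call might return
raw_arguments = '{"datasource": "python_docs"}'
print(parse_route(raw_arguments))
```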

In [6]:
question = """Why doesn't the following code work:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(["human", "speak in {language}"])
prompt.invoke("french")
"""

result = router.invoke({"question": question})

In [7]:
result

RouteQuery(datasource='python_docs')

In [8]:
result.datasource

'python_docs'

Once we have this, it is straightforward to define a branch that uses `result.datasource`.

https://python.langchain.com/docs/how_to/routing/

In [11]:
def choose_route(result):
    if "python_docs" in result.datasource.lower():
        ### Logic here 
        return "chain for python_docs"
    elif "js_docs" in result.datasource.lower():
        ### Logic here 
        return "chain for js_docs"
    else:
        ### Logic here 
        return "chain for golang_docs"

from langchain_core.runnables import RunnableLambda

full_chain = router | RunnableLambda(choose_route)

In [12]:
full_chain.invoke({"question": question})

'chain for python_docs'

Trace:

https://smith.langchain.com/public/c2ca61b4-3810-45d0-a156-3d6a73e9ee2a/r
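
In a real application, each branch would return a runnable chain rather than a string. One way to sketch that dispatch is a dictionary keyed by datasource. The `Route` dataclass and the three `answer_with_*` functions are hypothetical stand-ins for the router output and the per-datasource chains.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Route:
    """Stand-in for the RouteQuery object returned by the router."""
    datasource: str


# Placeholder "chains": plain functions that tag the question with their source
def answer_with_python_docs(question: str) -> str:
    return f"[python_docs] {question}"


def answer_with_js_docs(question: str) -> str:
    return f"[js_docs] {question}"


def answer_with_golang_docs(question: str) -> str:
    return f"[golang_docs] {question}"


CHAINS: Dict[str, Callable[[str], str]] = {
    "python_docs": answer_with_python_docs,
    "js_docs": answer_with_js_docs,
    "golang_docs": answer_with_golang_docs,
}


def dispatch(route: Route, question: str) -> str:
    """Look up the chain registered for the routed datasource and run it."""
    try:
        chain = CHAINS[route.datasource.lower()]
    except KeyError:
        raise ValueError(f"No chain registered for {route.datasource!r}") from None
    return chain(question)


print(dispatch(Route(datasource="python_docs"), "How do I format a prompt?"))
```

Compared with an `if`/`elif` ladder, the dictionary fails loudly on an unknown datasource instead of falling through to a default branch.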

### Semantic routing

Flow:

![semantic_routing](./image/semantic_routing.png)

Docs:

https://python.langchain.com/docs/how_to/routing/

In [13]:
from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Two prompts
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

# Embed prompts
embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)

# Route question to prompt 
def prompt_router(input):
    # Embed question
    query_embedding = embeddings.embed_query(input["query"])
    # Compute similarity
    similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
    most_similar = prompt_templates[similarity.argmax()]
    # Chosen prompt 
    print("Using MATH" if most_similar == math_template else "Using PHYSICS")
    return PromptTemplate.from_template(most_similar)


chain = (
    {"query": RunnablePassthrough()}
    | RunnableLambda(prompt_router)
    | ChatOpenAI()
    | StrOutputParser()
)

print(chain.invoke("What's a black hole"))

Using PHYSICS
A black hole is a region in space where the gravitational pull is so strong that nothing, not even light, can escape from it. It is formed when a massive star collapses in on itself. The center of a black hole is called a singularity, where the mass is concentrated in an infinitely small space. Black holes can be of various sizes, with the supermassive ones found at the centers of galaxies being the largest.


Trace: 

https://smith.langchain.com/public/98c25405-2631-4de8-b12a-1891aded3359/r
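
The routing step reduces to computing the cosine similarity between the query embedding and each prompt embedding, then taking the index of the largest score. A dependency-free sketch follows, using made-up 3-dimensional vectors in place of real OpenAI embeddings (which are much higher-dimensional); `cosine` and `most_similar_index` are illustrative names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|). Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_index(query: Sequence[float], candidates: List[Sequence[float]]) -> int:
    """Return the index of the candidate with the highest cosine similarity."""
    scores = [cosine(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vectors, invented for illustration only
prompt_names = ["physics", "math"]
toy_prompt_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
toy_query_embedding = [0.8, 0.2, 0.1]

best = most_similar_index(toy_query_embedding, toy_prompt_embeddings)
print(f"Using {prompt_names[best].upper()}")
```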

# bRAG: Query Construction

![query_construction](./image/query_construction.png)

For query construction over graph and SQL databases, see these resources:

https://blog.langchain.dev/query-construction/

https://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/

## bRAG: Query structuring for metadata filters

Flow:

![metadata](./image/metadata.png)

Many vectorstores contain metadata fields. 

This makes it possible to filter for specific chunks based on metadata.

Let's look at some example metadata we might see in a database of YouTube transcripts.

Docs:

https://python.langchain.com/v0.1/docs/use_cases/query_analysis/

In [21]:
from langchain_community.document_loaders import YoutubeLoader
import yt_dlp

def fetch_video_info(url: str):
    """Fetch metadata of a YouTube video using yt-dlp."""
    ydl_opts = {
        "quiet": True,
        "format": "best",
    }
    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
    return info

# Fetch metadata using yt-dlp
video_url = "https://www.youtube.com/watch?v=pbAd8O1Lvm4"
video_info = fetch_video_info(video_url)

# Add metadata to the YoutubeLoader
loader = YoutubeLoader.from_youtube_url(
    video_url, add_video_info=False
)

# Manually attach metadata from yt-dlp
docs = loader.load()
for doc in docs:
    doc.metadata.update({
        "title": video_info.get("title", "Unknown"),
        "description": video_info.get("description", "No description available"),
        "uploader": video_info.get("uploader", "Unknown uploader"),
        "upload_date": video_info.get("upload_date", "Unknown date"),
    })

# Print the metadata
print(docs[0].metadata)


{'source': 'pbAd8O1Lvm4', 'title': 'Self-reflective RAG with LangGraph: Self-RAG and CRAG', 'description': 'Self-reflection can greatly enhance RAG, enabling correction of poor quality retrieval or generations. Several recent RAG papers focus on this theme, but implementing the ideas can be tricky. Here, we show that LangGraph can be easily used for "flow engineering" of self-reflective RAG pipelines. We provide cookbooks for implementing ideas from two interesting papers, Self-RAG and C-RAG.\n\nCode:\nhttps://github.com/langchain-ai/langgraph/tree/main/examples/rag', 'uploader': 'LangChain', 'upload_date': '20240207'}


Let’s assume we’ve built an index that:

1. Allows us to perform unstructured search over the `contents` and `title` of each document
2. Allows us to use range filtering on `view count`, `publication date`, and `length`

We want to convert natural language into structured search queries.

We can define a schema for structured search queries.

In [22]:
import datetime
from typing import Literal, Optional, Tuple
from langchain_core.pydantic_v1 import BaseModel, Field

class TutorialSearch(BaseModel):
    """Search over a database of tutorial videos about a software library."""

    content_search: str = Field(
        ...,
        description="Similarity search query applied to video transcripts.",
    )
    title_search: str = Field(
        ...,
        description=(
            "Alternate version of the content search query to apply to video titles. "
            "Should be succinct and only include key words that could be in a video "
            "title."
        ),
    )
    min_view_count: Optional[int] = Field(
        None,
        description="Minimum view count filter, inclusive. Only use if explicitly specified.",
    )
    max_view_count: Optional[int] = Field(
        None,
        description="Maximum view count filter, exclusive. Only use if explicitly specified.",
    )
    earliest_publish_date: Optional[datetime.date] = Field(
        None,
        description="Earliest publish date filter, inclusive. Only use if explicitly specified.",
    )
    latest_publish_date: Optional[datetime.date] = Field(
        None,
        description="Latest publish date filter, exclusive. Only use if explicitly specified.",
    )
    min_length_sec: Optional[int] = Field(
        None,
        description="Minimum video length in seconds, inclusive. Only use if explicitly specified.",
    )
    max_length_sec: Optional[int] = Field(
        None,
        description="Maximum video length in seconds, exclusive. Only use if explicitly specified.",
    )

    def pretty_print(self) -> None:
        for field in self.__fields__:
            if getattr(self, field) is not None and getattr(self, field) != getattr(
                self.__fields__[field], "default", None
            ):
                print(f"{field}: {getattr(self, field)}")

Now, we prompt the LLM to produce queries.

In [23]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a database query optimized to retrieve the most relevant results.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
structured_llm = llm.with_structured_output(TutorialSearch)
query_analyzer = prompt | structured_llm



In [26]:
query_analyzer.invoke({"question": "bRAGAI - Generative AI Platform coming soon"}).pretty_print()

content_search: bRAGAI Generative AI Platform
title_search: coming soon


In [27]:
query_analyzer.invoke(
    {"question": "videos on chat langchain published in 2023"}
).pretty_print()

content_search: chat langchain
title_search: 2023
earliest_publish_date: 2023-01-01
latest_publish_date: 2024-01-01


In [28]:
query_analyzer.invoke(
    {"question": "videos that are focused on the topic of chat langchain that are published before 2024"}
).pretty_print()

content_search: chat langchain
title_search: chat langchain
earliest_publish_date: 2024-01-01

Note that the model populated `earliest_publish_date` here, whereas "published before 2024" calls for `latest_publish_date: 2024-01-01`. Structured output from an LLM can be wrong, so check generated filters before applying them.


In [29]:
query_analyzer.invoke(
    {
        "question": "how to use multi-modal models in an agent, only videos under 5 minutes"
    }
).pretty_print()

content_search: multi-modal models agent
title_search: multi-modal models agent
max_length_sec: 300


To connect these structured queries to a vectorstore, see the [self-query retriever guide](https://python.langchain.com/docs/how_to/self_query).
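
Before wiring in a real vectorstore, it can help to see how such a structured query might act as a metadata filter. The sketch below is an illustration under stated assumptions, not the self-query retriever: `DateFilter` and `matches` are hypothetical, only the publish-date bounds are modeled, and the `YYYYMMDD` format of `upload_date` is taken from the yt-dlp metadata printed earlier. The bounds follow the schema's field descriptions: earliest is inclusive, latest is exclusive.

```python
import datetime
from dataclasses import dataclass
from typing import Optional


@dataclass
class DateFilter:
    """Publish-date subset of the TutorialSearch fields (hypothetical)."""
    earliest_publish_date: Optional[datetime.date] = None  # inclusive
    latest_publish_date: Optional[datetime.date] = None  # exclusive


def matches(metadata: dict, query: DateFilter) -> bool:
    """Return True if the document's upload date satisfies the date bounds."""
    published = datetime.datetime.strptime(metadata["upload_date"], "%Y%m%d").date()
    if query.earliest_publish_date is not None and published < query.earliest_publish_date:
        return False
    if query.latest_publish_date is not None and published >= query.latest_publish_date:
        return False
    return True


# Metadata values copied from the printed example above
video = {"source": "pbAd8O1Lvm4", "upload_date": "20240207"}

in_2023 = DateFilter(datetime.date(2023, 1, 1), datetime.date(2024, 1, 1))
from_2024 = DateFilter(earliest_publish_date=datetime.date(2024, 1, 1))

print(matches(video, in_2023))
print(matches(video, from_2024))
```

In practice the vectorstore applies such filters itself, so the comparison logic would be expressed in that store's filter syntax rather than in Python.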

# Conclusion

This notebook explored two key components of building advanced RAG systems:

1. **Routing Strategies**:
   - **Logical Routing**: Using function calling to classify and route queries to appropriate data sources
   - **Semantic Routing**: Leveraging embeddings and cosine similarity to match queries with relevant prompt templates

2. **Query Construction**:
   - **Metadata Filtering**: Building structured queries that combine semantic search with metadata filters
   - **Schema Definition**: Using Pydantic models to define structured search parameters
   - **Query Analysis**: Converting natural language questions into structured database queries

These techniques enable:
- More precise and relevant document retrieval
- Better handling of diverse data sources
- Structured filtering based on metadata
- Natural language interface for complex queries

The combination of intelligent routing and structured query construction forms the foundation for building more sophisticated and accurate RAG systems.