# Self Query Retriever

This notebook demonstrates **self-query retrieval**, where an LLM translates a natural language question into a structured database query.

Key benefits:
- Lets users ask questions in natural language
- Automatically converts those questions into structured database queries
- Supports filtering by multiple metadata fields

Implementation adapted from:
- [LangChain Self-Query Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
- [Query Construction Blog Post](https://blog.langchain.dev/query-construction/)

The core implementation is in [ai_chains/B_2_self_query.py](../python/ai_chains/B_2_self_query.py)

## Setup Environment

First, we configure the environment:
- Load environment variables (API keys, etc.)
- Enable automatic reloading for development
- Import required libraries
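
After loading the `.env` file, it can help to confirm that the keys the later cells rely on are actually set. The sketch below is a hypothetical helper, not part of the project. The variable names `OPENAI_API_KEY` and `AZURE_OPENAI_API_KEY` are assumptions based on the model identifiers used in this notebook, so adjust them to your own configuration.

```python
import os


def missing_env_vars(names: list[str]) -> list[str]:
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.environ.get(name)]


# Hypothetical key names; replace them with the ones your providers require.
missing = missing_env_vars(["OPENAI_API_KEY", "AZURE_OPENAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running a check like this right after `load_dotenv()` surfaces a missing key immediately, rather than as an authentication error in a later cell.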

In [None]:
from dotenv import load_dotenv
from rich import print as rprint

%load_ext autoreload
%autoreload 2

load_dotenv(verbose=True)

## Import Self-Query Components

We'll use two main functions from our implementation:

1. `get_query_constructor()` - Creates the LLM-powered query translator
2. `get_retriever()` - Configures the full retrieval pipeline

In [None]:
from genai_tk.chains.B_2_self_query import get_query_constructor, get_retriever

## Test Query Construction

Let's see how the query constructor translates natural language into a structured query:

- Input: Natural language question about movies
- Output: Structured query with filters for:
  - Genre (sci-fi)
  - Decade (1990s)
  - Director (Luc Besson)
  - Theme (taxi drivers)
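
To make the target concrete, here is a rough, library-free sketch of the kind of structure such a translation might produce. The `Comparison` and `StructuredQuery` classes and the field names (`genre`, `year`, `director`) are illustrative assumptions. So are the choices to express the decade as a year range and to keep the theme as free-text search. The actual output of `get_query_constructor()` depends on the metadata schema defined in the implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Comparison:
    """One metadata condition, e.g. year >= 1990."""

    attribute: str
    comparator: str  # one of: "eq", "gt", "gte", "lt", "lte"
    value: object


@dataclass
class StructuredQuery:
    """Free-text search string plus metadata conditions that must all hold."""

    query: str
    filters: list[Comparison] = field(default_factory=list)


# Hand-written decomposition of the example question (not LLM output).
expected = StructuredQuery(
    query="taxi drivers",
    filters=[
        Comparison("genre", "eq", "science fiction"),
        Comparison("year", "gte", 1990),
        Comparison("year", "lt", 2000),
        Comparison("director", "eq", "Luc Besson"),
    ],
)
print(expected)
```

Comparing the printed query from the cell below against a hand-written expectation like this is a quick way to spot attributes the LLM dropped or misread.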

In [None]:
from genai_blueprint.utils.config_mngr import global_config

# Set the default embeddings model for this session
global_config().set("embeddings.default", "ada_002_azure")

# Build the query constructor with an empty config
query_constructor = get_query_constructor({})
query = query_constructor.invoke(
    {"query": "What are some sci-fi movies from the 90's directed by Luc Besson about taxi drivers"}
)
print(query)

In [None]:
# Pretty-print the same structured query with rich (rprint was imported during setup)
rprint(query)

## Full Retrieval Pipeline

Now let's test the complete self-query retriever:

1. Configured with GPT-4o mini (`gpt_4omini_openai`) for query understanding
2. Takes natural language input
3. Returns matching documents, here those with a rating above 8.5

Try modifying the query to test different filters.
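
Conceptually, the rating condition ends up as a metadata filter over the stored documents. The sketch below imitates that step on an in-memory list. The document titles, the ratings, and the `filter_docs` helper are invented for illustration. A real self-query retriever would normally pass the filter to the vector store instead of looping in Python.

```python
from operator import eq, ge, gt, le, lt

# Map comparator names to Python operators.
COMPARATORS = {"gt": gt, "gte": ge, "lt": lt, "lte": le, "eq": eq}


def filter_docs(docs, attribute, comparator, value):
    """Keep documents whose metadata[attribute] satisfies the comparison."""
    compare = COMPARATORS[comparator]
    return [
        doc
        for doc in docs
        if attribute in doc["metadata"] and compare(doc["metadata"][attribute], value)
    ]


# Placeholder documents; titles and ratings are invented for the example.
docs = [
    {"page_content": "Movie A", "metadata": {"rating": 9.1}},
    {"page_content": "Movie B", "metadata": {"rating": 8.5}},
    {"page_content": "Movie C", "metadata": {"rating": 7.2}},
    {"page_content": "Movie D", "metadata": {}},
]

high_rated = filter_docs(docs, "rating", "gt", 8.5)
print([doc["page_content"] for doc in high_rated])  # ['Movie A']
```

Note that "higher than 8.5" maps to a strict comparison, so a document rated exactly 8.5 is excluded. Documents without a rating are skipped rather than raising an error.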

In [None]:
from devtools import debug

# retriever = get_retriever(config={"llm": None})
retriever = get_retriever(config={"llm": "gpt_4omini_openai"})
debug(retriever)  # inspect the retriever configuration

result = retriever.invoke("I want to watch a movie rated higher than 8.5")
rprint(result)