# HyDE: Hypothetical Document Embeddings

In traditional vector search, the query is converted into an embedding and compared with the embeddings stored in a database. HyDE changes the first step: it asks an LLM to generate a hypothetical answer, then embeds that answer instead of the raw query. This helps in cases where:

- Queries are ambiguous or too short
- There isn't a direct match in the vector database
- LLMs can generate useful contextual information before retrieval
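
As a point of reference, the "compare with stored embeddings" step of plain vector search can be sketched with a few toy vectors. Everything here is illustrative: the three-dimensional vectors, the `toy_store` dictionary, and the `nearest` helper are made up for this example, and cosine similarity is used only for simplicity (the vector database may rank with a different metric).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector, stored_vectors, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored_vectors,
        key=lambda name: cosine_similarity(query_vector, stored_vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical, hand-written "embeddings"; a real encoder produces hundreds of dimensions.
toy_store = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
toy_query_vector = [0.0, 1.0, 0.1]

print(nearest(toy_query_vector, toy_store))
```

HyDE leaves this ranking step untouched. It only changes which text is embedded to produce the query vector.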


In [None]:
# install some code utilities
import importlib.util

if not importlib.util.find_spec("beyond_the_hype"):
    !pip install -qqq git+https://github.com/xtreamsrl/beyond-the-hype

In [None]:
import os 

os.environ["OPENAI_API_KEY"] = ...

In [None]:
from beyond_the_hype.data import get_movies_dataset
from beyond_the_hype.judge import llm_as_a_judge, answer_multiple_questions

import openai
import polars as pl
from sentence_transformers import SentenceTransformer
import lancedb

In [None]:
movies = get_movies_dataset()

In [None]:
encoder = SentenceTransformer("all-MiniLM-L6-v2")

In [None]:
uri = "./data/movies_embeddings"
db = lancedb.connect(uri)

movies_table = db.create_table("movies", movies, mode="overwrite")

client = openai.OpenAI()

HyDE is composed of two main steps:
- Generate a hypothetical document: we ask an LLM to write a document that could answer the given question. The document is called **"hypothetical"** because it is not real and may contain factual errors or hallucinations, but it resembles an actual document closely enough to help retrieve real ones.
- Use the hypothetical document (instead of the user question) to search the vector store.

The embedding encoder works as a lossy compressor: it keeps the general meaning of the hypothetical document and drops most of the fine detail, including many of its errors.
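
The two steps above can be sketched without any external service. This is a minimal illustration under stated assumptions, not the notebook's implementation: `toy_embed` is a bag-of-words stand-in for a real encoder, `fake_llm` returns a hard-coded string instead of calling a model, and the two `toy_documents` are invented for the example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def toy_cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def hyde_search(question, generate, documents, k=1):
    """Step 1: generate a hypothetical document. Step 2: search with its embedding."""
    hypothetical = generate(question)
    query_vector = toy_embed(hypothetical)
    ranked = sorted(
        documents,
        key=lambda doc: toy_cosine(query_vector, toy_embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def fake_llm(question):
    # Hard-coded stand-in for an LLM call; the wording is deliberately inexact.
    return "a stockbroker is arrested for fraud and goes to prison"


toy_documents = [
    "a stockbroker builds a fortune through fraud and is sent to prison",
    "a young lion returns to reclaim his kingdom",
]
toy_question = "wall street wolf ending"

# The short question shares no words with the matching document...
print(toy_cosine(toy_embed(toy_question), toy_embed(toy_documents[0])))
# ...while the hypothetical document retrieves it.
print(hyde_search(toy_question, fake_llm, toy_documents))
```

The hypothetical text gets some details wrong ("arrested", "goes to"), yet it still lands closest to the right document, which is the intuition behind the lossy-compressor remark.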

In [None]:
def create_hyde_query(client: openai.OpenAI, query: str) -> str:
    """Ask the LLM for a hypothetical document that could answer `query`."""
    hyde_prompt = f"""
You are the best movie expert on the market.
Generate a document that could be used to answer the following question:
{query}
Give just the document. Don't add unnecessary information such as a title.
    """
    hyde_fake_reply = (
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "user", "content": hyde_prompt},
            ],
        )
        .choices[0]
        .message.content
    )
    return hyde_fake_reply

In [None]:
question = "How does The Wolf of Wall Street end?"

In [None]:
create_hyde_query(client, question)

In [None]:
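def get_records(query, *, encoder=encoder, db_table=movies_table, max_results=10):
    """Embed `query` and return the `max_results` nearest movies."""
    query_vector = encoder.encode(query).tolist()
    return (
        db_table.search(query_vector)
        .limit(max_results)  # honor the caller's limit instead of a hard-coded 10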
        .select(
            [
                "release_year",
                "title",
                "origin",
                "director",
                "cast",
                "genre",
                "plot",
                "_distance",
            ]
        )
        .to_list()
    )

In [None]:
SYSTEM_MESSAGE = """You are a movie expert, and your goal is to recommend a good movie for the user to watch.

RULES:
- You should reply to questions about movie plots or synopses and movie metadata (release date, cast, or director), and provide plot summaries;
- For every question outside this scope, reply politely that you're not able to provide a response and briefly describe your scope;
- Don't mention that you have a list of films as context. This should be transparent to the user;
- If the movie is not in your context, reply that you don't know the answer."""

In [None]:
prompt_template = """
  Here are some suggested movies (ranked by relevance) to help you with your choice.
  {context}

  Use these suggestions to answer this question:
  {question}
"""

context_template = """
Title: {title}
Release date: {release_year}
Director: {director}
Cast: {cast}
Genre: {genre}
Overview: {plot}
"""


def format_records_into_context(records, *, template):
    """Render each record with `template` and concatenate the results."""
    return "".join(
        template.format(
            title=rec["title"],
            release_year=rec["release_year"],
            director=rec["director"],
            cast=rec["cast"],
            genre=rec["genre"],
            plot=rec["plot"],
        )
        for rec in records
    )

In [None]:
def ask(
    question,
    *,
    max_results=10,
    system=SYSTEM_MESSAGE,
    prompt_template=prompt_template,
    context_template=context_template,
    db_table=movies_table,
    verbose=False,
):
    fake_hyde_reply = create_hyde_query(client, question)

    if verbose:
        print(f"FAKE HYDE REPLY:\n{fake_hyde_reply}\n\n")

    results = get_records(
        query=fake_hyde_reply, max_results=max_results, db_table=db_table
    )
    context = format_records_into_context(results, template=context_template)

    prompt = prompt_template.format(question=question, context=context)

    chat_completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    )

    answer = chat_completion
    if verbose:
        print(f"RETRIEVED CONTEXT: \n{context}\n\n")
        print(f"FINAL REPLY:\n{answer.choices[0].message.content}\n\n")

    return answer


answer = ask(question=question, verbose=True)

# But... Is Our RAG Improved?

Let's take our question/answer dataset and run our LLM-as-a-judge again.

In [None]:
questions_answers_df = pl.read_csv(
    source="eval_replies.csv"
).select(["question", "rag_answer"])

In [None]:
replied_answers = answer_multiple_questions(questions_answers_df, ask)

In [None]:
judged_questions_answer_df = llm_as_a_judge(questions_answers_df, client)

In [None]:
judged_questions_answer_df
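
To answer the question in the heading, one option is to compare the share of answers the judge accepts with and without HyDE. The sketch below assumes each judged answer can be reduced to a boolean verdict; the actual output of `llm_as_a_judge` may be shaped differently. The `acceptance_rate` helper is hypothetical, and the verdict lists are made-up placeholders, not results from this notebook.

```python
def acceptance_rate(verdicts):
    """Fraction of answers the judge marked as acceptable."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return sum(verdicts) / len(verdicts)


# Placeholder verdicts for illustration only (not measured results).
baseline_verdicts = [True, False, False, True]
hyde_verdicts = [True, True, False, True]

delta = acceptance_rate(hyde_verdicts) - acceptance_rate(baseline_verdicts)
print(f"Change in acceptance rate with HyDE: {delta:+.2f}")
```

With only a handful of questions, a difference like this is easily noise, so it is worth judging a larger set before drawing conclusions.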