# Notebook 3 - RAG System Prototype Logic Flow

## Introduction

### Notebook Overview
* **Purpose**: Develop a RAG (Retrieval-Augmented Generation) system prototype
* **Main Components**:
  * Part 1: Initial setup, imports and prompt functions
  * Part 2: Retrieval based on user input
  * Part 3: LLM integration for recommendations

### Summary
This notebook implements a real estate recommendation system using RAG architecture. It combines vector database retrieval (Chroma) with LLM capabilities (GPT-4.1) to provide personalized house recommendations based on user preferences. The system processes user inputs about neighborhood preferences, house requirements, size needs and budget constraints to search through real estate listings, generating tailored recommendations in a structured markdown format.


## Part 1 - Initial imports, setup and prompt functions

In [56]:
# Imports:
import pandas as pd
from dotenv import load_dotenv
from langchain.document_loaders.csv_loader import CSVLoader
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate, PipelinePromptTemplate
from langchain.chains.question_answering import load_qa_chain
# from langchain import  LLMChain
# from pathlib import Path

# Constants:
CSV_FILEPATH = '../data/real_estate_listings_formatted.csv'
VDB_PATH = '../vdb'
VDB_NAME = 'real_estate_listings'

# Environment variables (e.g. OPENAI_API_KEY read from a .env file):
load_dotenv()

# OpenAI models:
embeddings = OpenAIEmbeddings()
chat_llm = ChatOpenAI(temperature=0.0,
                      model="gpt-4.1",
                      max_tokens=2000,
                      max_retries=1)

# Loading the persisted Chroma DB:
db = Chroma(persist_directory=VDB_PATH,
            embedding_function=embeddings,
            collection_name=VDB_NAME)

# MMR retriever: fetch 15 candidates, then return the 5 that best balance relevance and diversity:
vdb = db.as_retriever(search_type="mmr",
                      search_kwargs={'k': 5, 'fetch_k': 15, 'lambda_mult': 0.5})
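
The retriever is configured with `search_type="mmr"` (maximal marginal relevance): `fetch_k` candidates are fetched by similarity, and `k` of them are then picked one at a time, trading relevance to the query against redundancy with the documents already picked. `lambda_mult` controls that trade-off (1 favors pure relevance, 0 favors maximum diversity). The following is a minimal pure-Python sketch of the selection step, not LangChain's actual code; the helper names `cosine` and `mmr_select` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k=5, lambda_mult=0.5):
    """Return the indices of up to k candidates chosen greedily by MMR.

    Hypothetical helper for illustration only.
    """
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet).
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Toy 2-D "embeddings": candidates 0 and 1 are near-duplicates, candidate 2 differs.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.5, 0.5]]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # relevance only
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # diversity favored
```

With a low `lambda_mult`, the second pick skips the near-duplicate in favor of the more distinct candidate, which is the behavior that keeps the five retrieved listings from all describing nearly the same house.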

Creating a prompt template suitable for semantic search

In [57]:
def prompt_to_retriever(nhood:str, prefs:str, size:str, cost:str)-> str:
    template = \
    """
    Description of the ideal neighborhood: {var0}
    Description of buyer's preferences: {var1}
    House size preference: {var2}
    Budget: {var3}
    """
    prompt_template = PromptTemplate.from_template(template)
    res = prompt_template.format(var0= nhood,
                                 var1= prefs,
                                 var2= size,
                                 var3= cost
                                 )
    return res
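
Because the template is an indented triple-quoted string, the text sent to the embedding model carries the leading spaces and blank lines of the source code. This is probably harmless, but a cleaner query string is easy to produce. A minimal sketch using only the standard library follows; `build_retrieval_query` is a hypothetical stand-in for `prompt_to_retriever`, and plain `str.format` replaces `PromptTemplate` purely to keep the example self-contained.

```python
from textwrap import dedent


def build_retrieval_query(nhood: str, prefs: str, size: str, cost: str) -> str:
    """Hypothetical variant of prompt_to_retriever without indentation residue."""
    # dedent strips the common leading whitespace from every template line.
    template = dedent("""\
        Description of the ideal neighborhood: {nhood}
        Description of buyer's preferences: {prefs}
        House size preference: {size}
        Budget: {cost}""")
    return template.format(nhood=nhood, prefs=prefs, size=size, cost=cost)


print(build_retrieval_query(
    "Nearby the sea. I like a peaceful neighborhood.",
    "Big windows house. I like sun and a beautiful garden.",
    "A medium size house for a family of three.",
    "I can afford to pay between 100,000 and 250,000 euros.",
))
```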

Creating a prompt template suitable for input to LLM

In [58]:
def prompt_to_llm(nhood: str, prefs: str, size: str, cost: str, context) -> str:
    template = \
    """
    You are a helpful real estate recommendation engine that helps buyers find their ideal home based on their preferences.
    According only to the context provided you must answer their query.

    -- Buyer's preferences:
    The user has the following preferences:
        > Description of the ideal neighborhood: {var0}
        > Description of buyer's preferences: {var1}
        > House size preference: {var2}
        > Budget: {var3}

    -- Output format rules:
    > Based on the preferences above and the provided content, recommend a house/houses that meets the buyer's needs.
    > Provide your recommendations strictly in markdown bullet points.
    > Always augment the description of real estate listings (context).
    > The augmentation should personalize the listing without changing factual information.
    > First, provide a summarisation of houses provided in the context (in bullet points). Second provide the final recommendation (recommended house).
    > The final recommendation's descriptions in markdown bullets should be unique, appealing, and tailored to the buyer's preferences.

    -- Provided Real Estate Listings Context:
    """
    # Fill in the buyer's preferences first, so that any braces inside the
    # listing text are never parsed as template variables.
    prompt_template = PromptTemplate.from_template(template)
    res = prompt_template.format(var0=nhood,
                                 var1=prefs,
                                 var2=size,
                                 var3=cost)

    # Append the retrieved listings (objects exposing .page_content) verbatim.
    for idx, doc in enumerate(context):
        res += \
            f"""Document {idx}:
            {doc.page_content}
            ---"""
    return res
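
One detail worth checking in a function like this is the order of operations: if listing text is concatenated into the template before formatting, any literal `{` or `}` in a listing is parsed as a template variable and formatting can fail. The sketch below illustrates the safer order (format the preference header first, then append the documents) with plain Python. `ListingDoc`, `format_context`, and `build_prompt` are hypothetical names, and `ListingDoc` only mimics the `page_content` attribute of a retrieved document.

```python
from dataclasses import dataclass


@dataclass
class ListingDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str


def format_context(docs) -> str:
    """Render each listing as a numbered block, separated by '---'."""
    blocks = [
        f"Document {idx}:\n{doc.page_content}\n---"
        for idx, doc in enumerate(docs)
    ]
    return "\n".join(blocks)


def build_prompt(header_template: str, values: dict, docs) -> str:
    """Format the header first, then append listing text untouched."""
    header = header_template.format(**values)
    return header + format_context(docs)


docs = [ListingDoc("3BR home {garage included}"), ListingDoc("2BR flat")]
print(build_prompt("Budget: {cost}\n", {"cost": "250,000 euros"}, docs))
```

Formatting the combined string instead would raise a `KeyError` on `{garage included}`, since no such variable is supplied.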


## Part 2 - Retrieval based on user input

### Questions and answers (user input)

**Questions**:

In [59]:
neighborhood_q = "What is the ideal neighborhood that you would like to live in?"
preferences_q = "What are your personal house preferences? Tell me whatever you imagine your house to be like."
house_size_q= "What is the size of your ideal house? (in square meters)"
house_cost_q= "What is the house cost you can afford? Is there any limit range for your budget?"

**Answers (user input)**:

In [60]:
neighborhood = 'Nearby the sea. I like a peaceful neighborhood.'
preferences= 'Big windows house. I like sun and a beautiful garden.'
house_size= 'A medium size house for a family of three.'
house_cost= 'I can afford to pay between 100,000 and 250,000 euros. The cheaper the better.'

Retrieving relevant documents based on user's preferences:

In [64]:
retrieval_query = prompt_to_retriever(neighborhood, preferences, house_size, house_cost)
docs = vdb.invoke(input=retrieval_query)
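
Semantic similarity alone does not enforce numeric constraints such as a budget ceiling, so the retriever can return listings priced above the buyer's range. If that becomes a problem, one option is a filter applied after retrieval. The sketch below assumes each document carries a numeric price under a metadata key named `price_eur`; that key, along with `ListingDoc` and `filter_by_budget`, is hypothetical and not taken from this notebook's data.

```python
from dataclasses import dataclass, field


@dataclass
class ListingDoc:
    """Hypothetical stand-in for a retrieved document with metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_budget(docs, max_price: float, price_key: str = "price_eur"):
    """Keep listings at or below max_price.

    Listings without a price are kept, so nothing is dropped silently.
    The metadata key is an assumption about how the listings were indexed.
    """
    kept = []
    for doc in docs:
        price = doc.metadata.get(price_key)
        if price is None or price <= max_price:
            kept.append(doc)
    return kept


sample = [
    ListingDoc("Listing A", {"price_eur": 295_000}),
    ListingDoc("Listing B", {"price_eur": 230_000}),
    ListingDoc("Listing C"),  # no price recorded
]
for doc in filter_by_budget(sample, max_price=250_000):
    print(doc.page_content)
```

Filtering after retrieval can leave fewer than `k` documents, so in practice it may be paired with a larger `k`.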

## Part 3 - Input to LLM

Creating the final prompt for input to the LLM:

In [77]:
query = prompt_to_llm(neighborhood, preferences, house_size, house_cost, docs)
print(query)


    You are a helpful real estate recommendation engine that helps buyers find their ideal home based on their preferences.
    According only to the context provided you must answer their query.

    -- Buyer's preferences:
    The user has the following preferences:
        > Description of the ideal neighborhood: Nearby the sea. I like a peaceful neighborhood.
        > Description of buyer's preferences: Big windows house. I like sun and a beautiful garden.
        > House size preference: A medium size house for a family of three.
        > Budget: I can afford to pay between 100,000 and 250,000 euros. The cheaper the better.

    -- Output format rules:
    > Based on the preferences above and the provided content, recommend a house/houses that meets the buyer's needs.
    > Provide your recommendations strictly in markdown bullet points.
    > Always augment the description of real estate listings (context).
    > The augmentation should personalize the listing without changing

In [76]:
# DEPRECATED
#rag = RetrievalQA.from_chain_type(
#    llm= chat_llm,
#    chain_type="stuff",
#    retriever= db.as_retriever(
#        search_type="mmr",
#        search_kwargs={'k': 5, 'fetch_k': 15, 'lambda_mult': 0.5}
#        ),
#    )
#gen_output = rag.invoke(query)
#print(gen_output['result'])

Generating the final answer using the LLM:

In [71]:
llm_output = chat_llm.invoke(input = query)

In [74]:
print(llm_output.content)

**Summary of Houses Provided in the Context:**

- **Seaside Village (295,000 euro, 3BR, 135 sqm):**
  - Steps from the beach, sunroom, updated kitchen, private garden.
  - Relaxed coastal lifestyle, sandy beaches, seafood restaurants, friendly community.
  - *Above budget.*

- **Sunnydale (270,000 euro, 3BR, 140 sqm):**
  - Modern home, open-concept, large windows, sleek kitchen, backyard for barbecues.
  - Peaceful suburb, excellent schools, quiet streets, community pool.
  - *Slightly above budget.*

- **Riverside Heights (320,000 euro, 4BR, 210 sqm):**
  - Spacious, modern kitchen, sun-drenched patio, private garden, master suite.
  - Scenic river views, parks, farmers' markets, cafes.
  - *Well above budget.*

- **Lavender Fields (230,000 euro, 3BR, 120 sqm):**
  - Charming, lavender gardens, updated kitchen, large backyard.
  - Family-friendly, parks, schools, strong community.
  - *Within budget.*

- **Elm Street (130,000 euro, 2BR, 80 sqm):**
  - Affordable, renovated kitchen, c