# Home Match

In this project I'd like to show my approach to building a GenAI-powered real estate agent. Getting access to real-world real estate listings is difficult, so I realised that an alternative way to get some test data is to generate it with Generative AI.

In the first part of the project I'll focus on using GPT-3.5 Turbo to generate 100 real estate listings in batches of 10, so that no single response hits the maximum token limit.
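
The batching arithmetic can be sketched as follows. This is a minimal illustration rather than part of the notebook, and the helper name `batch_sizes` is hypothetical.

```python
def batch_sizes(total, batch_size=10):
    """Split `total` requested items into batch sizes of at most `batch_size`."""
    if total < 0 or batch_size <= 0:
        raise ValueError("total must be >= 0 and batch_size must be > 0")
    full, remainder = divmod(total, batch_size)
    # Full batches first, then one smaller batch for any remainder
    return [batch_size] * full + ([remainder] if remainder else [])


print(batch_sizes(100))  # ten batches of 10
print(batch_sizes(25))   # [10, 10, 5]
```

The last batch shrinks when the total is not a multiple of the batch size, which mirrors the `min(10, remaining_listings)` call used later in the notebook.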

Let's get into it. To run this project you'll need to replace `YOUR_OPENAI_API_KEY` with a valid key.

In [12]:
import os
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

Next I'm initializing the OpenAI API client with a high maximum number of tokens.

In [2]:
from langchain.llms import OpenAI

model_name = "gpt-3.5-turbo"
temperature = 1
max_tokens = 4000
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=max_tokens)



## Generating Listings

In this section I load listings from a locally saved file, `listings.txt`. If the file is missing, empty, or holds fewer than the expected number of listings (100), I call ChatGPT to generate further batches of up to 10 listings until the target is reached.

In my prompt to ChatGPT I provide two example listings to show it the expected format and the kind of information I'm looking for in the newly generated listings.
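
As a rough sketch of this few-shot pattern, the prompt could be assembled from a list of example listings instead of being written inline. The helper name `build_prompt` is hypothetical, and the two examples are shortened placeholders rather than the full listings used in the notebook.

```python
LISTING_SEPARATOR = "===Listing==="


def build_prompt(examples, number):
    """Assemble a few-shot prompt from example listings and a request count."""
    parts = ["Given the following example please answer the question below", ""]
    for example in examples:
        # Each example is introduced by the same separator the model should reuse
        parts += [LISTING_SEPARATOR, example.strip(), ""]
    parts += [
        "===Question===",
        f"Generate {number} real estate listings in the same format as the example above.",
    ]
    return "\n".join(parts)


# Shortened placeholder examples (assumption: the real prompt uses full listings)
examples = [
    "Neighborhood: Green Oaks\nPrice: $800,000\nBedrooms: 3",
    "Neighborhood: Oakwood Hills\nPrice: $900,000\nBedrooms: 5",
]
prompt = build_prompt(examples, 10)
print(prompt)
```

Keeping the examples in a list makes it easy to swap them or add a third one without touching the rest of the prompt.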

The code automatically saves all listings generated by ChatGPT in the `listings.txt` file so that they can be reused.

In [6]:
import re

def generate_listings(number):
    input = f"""
Given the following example please answer the question below

===Listing===
Neighborhood: Green Oaks
Price: $800,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,000 sqft

Description: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.

Neighborhood Description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

===Listing===
Neighborhood: Oakwood Hills
Price: $900,000
Bedrooms: 5
Bathrooms: 4
House Size: 3,200 sqft

Description: Welcome to this luxurious 5-bedroom, 4-bathroom home in the prestigious Oakwood Hills neighborhood. This stunning property features a gourmet kitchen, spacious living areas, and a private backyard oasis with a pool and spa. Perfect for entertaining or relaxing in style.

Neighborhood Description: Oakwood Hills is an exclusive community known for its upscale homes and lush landscapes. Enjoy easy access to shopping, dining, and top-rated schools. Take a leisurely walk around the neighborhood lake or tee off at the nearby golf course.

===Question===
Generate {number} real estate listings in the same format as the example above.
    """
    listings = llm(input)
    with open("listings.txt", "a") as file:
        file.write(listings)
    listings = split_listings(listings)
    print("Example listing: ", listings[0])
    return listings
    

def load_existing_listings():
    listings_file_path = "listings.txt"
    if not os.path.exists(listings_file_path):
        return []
    # Use a context manager so the file handle is closed after reading
    with open(listings_file_path) as file:
        listings_text = file.read()
    return split_listings(listings_text)


def remove_empty_lines(items):
    # Drop empty and whitespace-only chunks left over from splitting
    return [item for item in items if item.strip()]


def split_listings(listings_text):
    listings = re.split("===Listing===\n", listings_text)
    return remove_empty_lines(listings)

listings = load_existing_listings()
print("Existing listings count: ", len(listings))

expected_number_of_listings = 100
remaining_listings = expected_number_of_listings - len(listings)
batch_number = 1
while remaining_listings > 0:
    print(f"Loading batch {batch_number}...")
    batch_number += 1
    listings.extend(generate_listings(min(10, remaining_listings)))
    remaining_listings = expected_number_of_listings - len(listings)

print(f"Loaded {len(listings)} listings")
print("")
print("Last listing:")
print("")
print(listings[-1])

Existing listings count:  100
Loaded 100 listings

Last listing:

Neighborhood: Riverbend
Price: $895,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 2,800 sqft

Description: Discover this stunning 4-bedroom, 3.5-bathroom home in the picturesque Riverbend neighborhood. This home features a gourmet kitchen, spacious bedrooms, and a backyard retreat with a deck and garden perfect for outdoor relaxation.

Neighborhood Description: Riverbend is a peaceful community with water views, nature trails, and a strong sense of community. Enjoy the convenience of nearby schools, shopping centers, and recreational facilities in this serene setting.
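
Since the listings are model-generated, it can be worth checking that each one actually follows the expected format. The sketch below is one possible check, not part of the notebook; `parse_listing` and `missing_fields` are hypothetical helpers, and the sample is a shortened version of the listing printed above.

```python
REQUIRED_FIELDS = (
    "Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size",
    "Description", "Neighborhood Description",
)


def parse_listing(text):
    """Return a dict of the known `Key: value` fields found in a listing."""
    fields = {}
    for line in text.splitlines():
        # Split at the first ": " so colons inside the value are preserved
        key, separator, value = line.strip().partition(": ")
        if separator and key in REQUIRED_FIELDS:
            fields[key] = value.strip()
    return fields


def missing_fields(text):
    """List the required fields that a listing does not contain."""
    parsed = parse_listing(text)
    return [name for name in REQUIRED_FIELDS if name not in parsed]


sample = """Neighborhood: Riverbend
Price: $895,000
Bedrooms: 4
Bathrooms: 3.5
House Size: 2,800 sqft

Description: Discover this stunning 4-bedroom, 3.5-bathroom home.

Neighborhood Description: Riverbend is a peaceful community with water views.
"""
print(parse_listing(sample)["Price"])  # $895,000
print(missing_fields(sample))          # []
```

Listings with a non-empty `missing_fields` result could be dropped or regenerated before they are stored.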


## Storing listings in the vector database

Once we have our test real estate listings, we can store them in a vector database for use by the query mechanism.
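
One optional extension, sketched here under assumptions, is to attach metadata to each listing when it is stored. The helper `build_payload` is hypothetical and only prepares plain Python data; it does not call Chroma itself.

```python
import re


def build_payload(listings):
    """Prepare documents, ids and metadatas for a vector-store insert."""
    ids, metadatas = [], []
    for index, listing in enumerate(listings):
        # Pull the neighborhood name from the "Neighborhood: ..." header line
        match = re.search(r"^Neighborhood: (.+)$", listing, flags=re.MULTILINE)
        ids.append(str(index))
        metadatas.append(
            {"neighborhood": match.group(1).strip() if match else "unknown"}
        )
    return {"documents": list(listings), "ids": ids, "metadatas": metadatas}


payload = build_payload([
    "Neighborhood: Riverbend\nPrice: $895,000",
    "Neighborhood: Green Oaks\nPrice: $800,000",
])
print(payload["ids"])
print(payload["metadatas"])
# Assumption: the result could then be passed on as
# listings_collection.add(**payload)
```

Stored metadata can later be used to narrow a search, for example through the `where` argument of a Chroma query.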

In the code below I create a new collection called "listings" and write all the listings to it.

In [8]:
import chromadb

chroma_client = chromadb.Client()
listings_collection = chroma_client.get_or_create_collection(name="listings")
listings_collection.add(
    documents = listings,
    ids = [str(index) for index in range(len(listings))]
)


## Searching listings

In the section below I'm building a ChatGPT-powered search engine that finds the listing that best matches the user's stated preferences. The search engine first queries the vector database with the answers provided by the user and loads the 10 most relevant listings. These listings are then embedded in the ChatGPT prompt, in which I ask it to pick the most suitable one based on the user's preferences. The response is the ID of the chosen listing, which I can then use to fetch the full listing from the database.
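
To make the retrieval step more concrete, here is a toy version of "find the k most similar documents" in plain Python. It is only an illustration: it uses word counts and cosine similarity, whereas the vector database relies on a learned embedding model, and the names `embed`, `cosine` and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, documents, k=10):
    """Return the indices of the k documents most similar to the query."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(doc)), i) for i, doc in enumerate(documents)]
    # Highest score first; ties are broken by the lower index
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


documents = [
    "lake house with private dock",
    "downtown loft near theaters",
    "quiet suburb with good schools",
]
print(top_k("quiet neighborhood good schools", documents, k=1))  # [2]
```

The shortlist produced this way plays the same role as the 10 listings returned by the database: it narrows the candidates before the language model makes the final choice.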

In [9]:
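def get_listings(answers):
    # Combine all answers into one query so the 10 results reflect every preference;
    # passing the list directly returns a separate result set per answer
    found_documents = listings_collection.query(
        query_texts=[" ".join(answers)],
        n_results=10,
    )
    result = ""
    for listing_id, document in zip(found_documents["ids"][0], found_documents["documents"][0]):
        result += "\n".join([
            f"===Listing {listing_id}===",
            document,
            "",
        ])
    return result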

def get_questions(questions, answers):
    result = ""
    # Pair each question with its answer
    for question, answer in zip(questions, answers):
        result += f"Question: {question}\n"
        result += f"Answer: {answer}\n\n"
    return result

def get_best_listing(questions, answers):
    input = f"""
Given the following real estate listings please answer the question below

{get_listings(answers)}

===Question===
Find the listing ID of the real estate listing that best matches the following criteria. Respond only with the Listing ID.

===Example response 1===
1

===Example response 2===
23

{get_questions(questions, answers)}
"""
    # Strip surrounding whitespace so the ID can be used directly as a lookup key
    return llm(input).strip()

Now that I have the search engine working I can fire some test requests against it.

In [10]:
questions = [   
    "How big do you want your house to be?",
    "What are 3 most important things for you in choosing this property?", 
    "Which amenities would you like?", 
    "Which transportation options are important to you?",
    "How urban do you want your neighborhood to be?",   
]
answers = [
    "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
    "A quiet neighborhood, good local schools, and convenient shopping options.",
    "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
    "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
    "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
]
listing_id = get_best_listing(questions, answers)
listing_id

'17'
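
The model is asked to respond with the ID only, but nothing guarantees that it will. A defensive sketch, assuming the response might contain extra text, is to pull out the first number and check it against the known IDs. The helper `extract_listing_id` is hypothetical.

```python
import re


def extract_listing_id(response, valid_ids):
    """Return the first number in the response if it is a known listing ID."""
    match = re.search(r"\d+", response)
    if match and match.group() in valid_ids:
        return match.group()
    # No number found, or the number is not one of the candidate IDs
    return None


valid_ids = {"3", "17", "42"}
print(extract_listing_id("17", valid_ids))                 # 17
print(extract_listing_id(" Listing ID: 17\n", valid_ids))  # 17
print(extract_listing_id("No suitable listing", valid_ids))  # None
```

A `None` result could trigger a retry or a fallback to the top result from the vector search.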

## Personalizing Listing Descriptions

In the section below I'm creating a component that returns a version of the listing rewritten by ChatGPT to emphasize the features the user is looking for.
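
Because the rewrite is free-form text produced by the model, one optional safeguard is to check that the factual header fields survive unchanged. The sketch below is an assumption about how that could be done, not part of the notebook; `extract_facts` and `changed_facts` are hypothetical helpers.

```python
FACT_FIELDS = ("Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size")


def extract_facts(listing):
    """Collect the factual `Key: value` header fields of a listing."""
    facts = {}
    for line in listing.splitlines():
        key, separator, value = line.strip().partition(": ")
        if separator and key in FACT_FIELDS:
            facts[key] = value.strip()
    return facts


def changed_facts(original, personalized):
    """Map each altered or dropped field to its (before, after) values."""
    before = extract_facts(original)
    after = extract_facts(personalized)
    return {
        key: (before[key], after.get(key))
        for key in before
        if before[key] != after.get(key)
    }


original = "Neighborhood: Lakeside Living\nPrice: $950,000\nBedrooms: 3"
same = "Neighborhood: Lakeside Living\nPrice: $950,000\nBedrooms: 3\n\nDescription: A quiet home."
altered = "Neighborhood: Lakeside Living\nPrice: $900,000\nBedrooms: 3"
print(changed_facts(original, same))     # {}
print(changed_facts(original, altered))  # {'Price': ('$950,000', '$900,000')}
```

An empty result means only the descriptive text changed, which is the intended behavior of the personalization step.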

For testing purposes the code below prints both the original and the personalized listing so that they can be compared.

In [11]:
def get_personalized_listing(listing_id, questions, answers):
    listing = listings_collection.get(ids=[listing_id])["documents"][0]
    print("Original listing:")
    print(listing)
    print("")
    input = f"""
Given the following real estate listing please answer the question below

===Questions===
{get_questions(questions, answers)}

===Listing===
{listing}

Return a version of that listing which highlights features that the user is looking for. Please maintain the original style of the listing.
    """
    return llm(input)


personalized_listing = get_personalized_listing(listing_id, questions, answers)
print("Personalized listing:")
print("")
print(personalized_listing)

Original listing:
Neighborhood: Lakeside Living
Price: $950,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,200 sqft

Description: Welcome to this lakeside 3-bedroom, 2-bathroom home in the serene Lakeside Living neighborhood. This quaint property features a cozy den, screened porch, and a private dock for boating and fishing. Perfect for enjoying lakefront living.

Neighborhood Description: Lakeside Living is a waterfront community with boating, fishing, and lakeside picnics. Swim in the crystal-clear waters or kayak along the shoreline. Close to waterfront dining, marinas, and lakeside trails for outdoor adventures.

Personalized listing:

Neighborhood: Lakeside Living
Price: $950,000
Bedrooms: 3
Bathrooms: 2
House Size: 2,200 sqft

Description: Welcome to this lakeside 3-bedroom, 2-bathroom home in the serene Lakeside Living neighborhood. This property offers a quiet and peaceful setting, perfect for those seeking suburban tranquility. The spacious kitchen is ideal for cooking and entert