# Personalized Real Estate Agent 

### Objective 

Create the "HomeMatch" application, which provides a personalized real estate property search.

### Instructions

Run the notebook cells in order, from top to bottom.

### Step 1: Setting Up the Python Application

In [None]:
!pip install -r ./requirements.txt

In [84]:
# set up LangChain and the model (OpenAI GPT)

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains.question_answering import load_qa_chain
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma

modelName = 'gpt-3.5-turbo'
reviewPath = './data/listings.csv'
temperature = 0.5
maxTokens = 3000
model = OpenAI(model_name=modelName, temperature=temperature, max_tokens=maxTokens)
embeddings = OpenAIEmbeddings()
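
The LangChain OpenAI wrappers used above read the API key from the `OPENAI_API_KEY` environment variable by default. A minimal sketch of a check that could run before the setup cell is shown here; the helper name `require_api_key` is hypothetical and not part of the original notebook.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the value of the environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running the notebook."
        )
    return value
```

Calling `require_api_key()` at the top of the notebook turns a missing key into an immediate, readable error instead of a failure in a later cell.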




### Step 2: Generating Real Estate Listings

In [None]:

dataGeneratiingTemplate = '''
Generate real estate listing data in CSV format. The number of listings is {num_reviews}. Each row has the following data:
- neighborhood (ex. "Green Oaks")
- price (ex. "$800,000")
- bedrooms (ex. "3". This should be an integer)
- bathrooms (ex. "2". This should be an integer)
- house size (ex. "2,000 sqft")
- description: (ex. Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.)
- neighborhood description (ex. Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.)
CSV format is a must. Do not use the example above for the output. Do not forget to quote (" ") the data to avoid mis-separation by commas within the data.

'''

dataGeneratingPrompt = PromptTemplate.from_template(dataGeneratiingTemplate)
reviews = model(dataGeneratingPrompt.format(num_reviews=20))

print(reviews)



In [5]:


with open(reviewPath, 'w') as file:
    file.write(reviews)
print(reviews)

"neighborhood","price","bedrooms","bathrooms","house size","description","neighborhood description"
"Green Oaks","$800,000",3,2,"2,000 sqft","Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.","Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze."
"Willow Cree

In [19]:
# data cleaning (removing incomplete data)

import pandas as pd

try:
    df = pd.read_csv(reviewPath)
except pd.errors.ParserError:
    with open(reviewPath, 'r') as file:
        lines = file.readlines()
    with open(reviewPath, 'w') as file:
        file.writelines(lines[:-1])
    df = pd.read_csv(reviewPath)

print(df) 

          neighborhood       price  bedrooms  bathrooms  house size  \
0           Green Oaks    $800,000         3          2  2,000 sqft   
1         Willow Creek    $650,000         4          3  2,500 sqft   
2      Oakwood Heights  $1,200,000         5          4  3,800 sqft   
3   Riverfront Estates    $950,000         4          3  3,200 sqft   
4    Maplewood Heights    $550,000         3          2  1,800 sqft   
5         Sunset Ridge    $750,000         4          3  2,300 sqft   
6          Harbor View    $900,000         5          4  3,500 sqft   
7     Rosewood Estates  $1,500,000         6          5  4,500 sqft   
8          Willowbrook    $600,000         3          2  1,900 sqft   
9     Oakridge Heights  $1,100,000         4          3  3,000 sqft   
10           Riverbend    $700,000         4          3  2,400 sqft   
11   Maplewood Terrace    $500,000         2          1  1,200 sqft   
12        Sunset Hills    $800,000         4          3  2,500 sqft   
13    
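
The cleaning cell above drops the last line only when pandas raises a `ParserError`. A truncated final row does not always trigger that error, so a stricter alternative is to keep only rows that have the expected number of fields. The following is a minimal sketch using the standard `csv` module; the function name `keep_complete_rows` and the three-column sample are illustrative assumptions, not part of the original notebook.

```python
import csv
import io


def keep_complete_rows(csv_text, expected_fields=7):
    """Parse CSV text and keep only rows with the expected number of fields."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if len(row) == expected_fields]


# Illustrative sample: a header, one complete row, and a row cut off mid-field.
sample = (
    '"neighborhood","price","bedrooms"\n'
    '"Green Oaks","$800,000",3\n'
    '"Willow Cree'
)
rows = keep_complete_rows(sample, expected_fields=3)
print(len(rows))  # header plus the one complete listing
```

Blank lines are dropped as well, because the reader returns them as empty rows.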

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   neighborhood              17 non-null     object
 1   price                     17 non-null     object
 2   bedrooms                  17 non-null     int64 
 3   bathrooms                 17 non-null     int64 
 4   house size                17 non-null     object
 5   description               17 non-null     object
 6   neighborhood description  17 non-null     object
dtypes: int64(2), object(5)
memory usage: 1.1+ KB
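
The `price` and `house size` columns have dtype `object` because values such as `$800,000` and `2,000 sqft` are stored as strings. If numeric filtering is needed later, they could be converted along these lines. This is a sketch that assumes every value contains a single whole number; `parse_amount` is a hypothetical helper.

```python
import re


def parse_amount(text):
    """Strip everything except digits, e.g. "$800,000" -> 800000."""
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)
```

In the notebook this could be applied column-wise, for example with `df["price"].map(parse_amount)`.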


### Step 3: Storing Listings in a Vector Database

In [23]:
# create embeddings based on the descriptions
from sentence_transformers import SentenceTransformer

def generate_embeddings(inputData, modelName='paraphrase-MiniLM-L6-v2'):
    model = SentenceTransformer(modelName)
    embeddings = model.encode(inputData)
    return embeddings 

inputData = 'test input'
ret = generate_embeddings(inputData)
print(ret)




[-0.10783821 -0.14378223 -0.5348072  -0.02631692 -0.52043283 -0.39965105
  0.88257116  0.18027401  0.03389021 -0.2658091   0.11634979 -0.35382175
  0.00744328  0.2513596  -0.41194478 -0.7058538  -0.06635842 -0.3761077
  0.2790929  -0.513758    0.13947694 -0.14968006 -0.37274134 -0.04928928
 -0.3051302   0.13620248  0.15860763  0.16724765 -0.18057713 -0.25717857
  0.05877491  0.22056748 -0.02603078 -0.2780549   0.10935203  0.09457114
 -0.01761572 -0.57578427 -0.5135119  -0.01670957 -0.12084795 -0.41969696
  0.16149658  0.35353518  0.6809484   0.15212993 -0.12898082  0.22405568
 -0.16410679  0.6917116  -0.12223716 -0.10629122  0.39906946 -0.5541727
  0.19759262 -0.24014977  0.37758708 -0.16340174 -0.09132338 -0.02430771
 -0.88047266 -0.15386726 -0.28327876  0.17452793 -0.18881442 -0.09343737
 -0.23294406 -0.1507564   0.02773398 -0.10185362 -0.14483353  0.09617867
 -0.45131776 -0.07651255 -0.0974674   0.22563973 -0.5208095  -0.5169152
  0.21719311 -0.38924542  0.33783868  0.59961164  0.10

In [27]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path=reviewPath)
data = loader.load()

print(data)

[Document(page_content='neighborhood: Green Oaks\nprice: $800,000\nbedrooms: 3\nbathrooms: 2\nhouse size: 2,000 sqft\ndescription: Welcome to this eco-friendly oasis nestled in the heart of Green Oaks. This charming 3-bedroom, 2-bathroom home boasts energy-efficient features such as solar panels and a well-insulated structure. Natural light floods the living spaces, highlighting the beautiful hardwood floors and eco-conscious finishes. The open-concept kitchen and dining area lead to a spacious backyard with a vegetable garden, perfect for the eco-conscious family. Embrace sustainable living without compromising on style in this Green Oaks gem.\nneighborhood description: Green Oaks is a close-knit, environmentally-conscious community with access to organic grocery stores, community gardens, and bike paths. Take a stroll through the nearby Green Oaks Park or grab a cup of coffee at the cozy Green Bean Cafe. With easy access to public transportation and bike lanes, commuting is a breeze.

In [31]:
# split the data in chunks

splitter = CharacterTextSplitter(chunk_size=10, chunk_overlap=0)
split_docs = splitter.split_documents(data)

print(len(split_docs))


17
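
The count of 17 matches the number of listings even though `chunk_size=10` is far smaller than any row. A likely explanation, assuming `CharacterTextSplitter` uses its default blank-line separator, is that it only splits at that separator and keeps an oversized piece whole; the documents produced by `CSVLoader` contain single newlines only. The sketch below is a simplified model of that behaviour, not LangChain's implementation, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text, separator="\n\n", chunk_size=10):
    """Simplified model: split on the separator, then merge pieces up to chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # an oversized piece is kept whole, not cut
    if current:
        chunks.append(current)
    return chunks


row = "neighborhood: Green Oaks\nprice: $800,000\nbedrooms: 3"
print(len(split_on_separator(row)))  # the row contains no blank line
```

Under this model each listing stays in one chunk, which suits retrieval here because a listing is only useful as a whole.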


In [32]:
# embedding 

embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(split_docs, embeddings)

In [53]:
emb = db._collection.get(ids=['e655d0e9-c4e0-11ee-828e-00155d004022'], include=['embeddings'])
print(emb['embeddings'])

[[0.011848695576190948, -0.0018651332939043641, -0.00027117624995298684, -0.004736846312880516, -0.01602633111178875, 0.005115136504173279, 0.0010764813050627708, -0.024578969925642014, -0.02936844900250435, -0.008460533805191517, 0.012868433259427547, -0.004434214439243078, -0.027447393164038658, -0.02718423493206501, -0.0035921086091548204, 0.0026562523562461138, 0.024710549041628838, 0.0007759053260087967, -0.0018782912520691752, 0.0019062517676502466, -0.042526356875896454, 0.020710544660687447, -0.0163289625197649, -0.011657905764877796, 0.009756588377058506, 0.004480267409235239, 0.01455264538526535, -0.01034869346767664, -0.02013159729540348, -0.013250011950731277, -0.0024128311779350042, -0.0034210558515042067, -0.028973711654543877, 0.011703957803547382, -0.017934227362275124, -0.008388165384531021, 0.003300989978015423, -0.005282899830490351, 0.025368444621562958, 0.00816448125988245, 0.023750022053718567, -0.03297371417284012, 0.010414483025670052, -0.016394751146435738, -0.
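
A vector store answers a query by comparing the query's embedding with stored vectors like the one printed above and returning the closest matches. The following sketch illustrates the idea with cosine similarity in plain Python. The metric is an assumption for illustration (the one Chroma actually uses depends on its configuration), and `top_k` is a hypothetical helper.

```python
import math


def top_k(query_vec, doc_vecs, k=5):
    """Return indices of the k stored vectors most similar to the query.

    Assumes all vectors are non-zero and have the same length.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    scored = [(cosine(query_vec, vec), index) for index, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]
```

For example, `top_k([1, 0], [[0, 1], [1, 0], [1, 1]], k=2)` ranks the second vector first because it points in the same direction as the query.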

In [54]:
query = '''
    Based on the listings, how many bedrooms does the house in Willow Creek have?
'''

In [55]:
similar_docs = db.similarity_search(query, k=5)
prompt = PromptTemplate(
    template="{query}\nContext: {context}",
    input_variables=["query", "context"],
)
chain = load_qa_chain(model, prompt=prompt, chain_type='stuff')
print(chain.run(input_documents=similar_docs, query=query))

The house in Willow Creek has 4 bedrooms.
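
With `chain_type='stuff'`, all retrieved documents are placed into a single prompt. The sketch below shows roughly what the model receives for the template used above. It assumes the documents are joined with a blank line, and `build_stuff_prompt` is a hypothetical helper rather than a LangChain function.

```python
def build_stuff_prompt(query, documents, template="{query}\nContext: {context}"):
    """Join all retrieved texts into one context string and fill the template."""
    context = "\n\n".join(documents)
    return template.format(query=query, context=context)
```

Because every retrieved listing goes into one prompt, the value of `k` in `similarity_search` directly controls how much of the model's context window is used.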


### Step 4: Building the User Preference Interface

In [58]:
questions = [
                "How big do you want your house to be?",
                "What are 3 most important things for you in choosing this property?",
                "Which amenities would you like?",
                "Which transportation options are important to you?",
                "How urban do you want your neighborhood to be?",
            ]

answers =   [
                "A comfortable three-bedroom house with a spacious kitchen and a cozy living room.",
                "A quiet neighborhood, good local schools, and convenient shopping options.",
                "A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.",
                "Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.",
                "A balance between suburban tranquility and access to urban amenities like restaurants and theaters."
            ]

In [127]:
questions_and_answers = ''

# pair each question with the answer at the same index
for i in range(len(questions)): 
    questions_and_answers += 'AI:{}\n'.format(questions[i])
    questions_and_answers += 'Human:{}\n'.format(answers[i])

print(questions_and_answers)

AI:How big do you want your house to be?
Human:A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
AI:What are 3 most important things for you in choosing this property?
Human:A quiet neighborhood, good local schools, and convenient shopping options.
AI:Which amenities would you like?
Human:A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
AI:Which transportation options are important to you?
Human:Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
AI:How urban do you want your neighborhood to be?
Human:A balance between suburban tranquility and access to urban amenities like restaurants and theaters.



### Step 5: Searching Based on Preferences / Step 6: Personalizing Listing Descriptions

In [133]:
query = '''
The following are questions and answers about the user's preferences.

{}
Based on the preferences, recommend the best item from the listings.
'''.format(questions_and_answers)

print(query)


The following are questions and answers about the user's preferences.

AI:How big do you want your house to be?
Human:A comfortable three-bedroom house with a spacious kitchen and a cozy living room.
AI:What are 3 most important things for you in choosing this property?
Human:A quiet neighborhood, good local schools, and convenient shopping options.
AI:Which amenities would you like?
Human:A backyard for gardening, a two-car garage, and a modern, energy-efficient heating system.
AI:Which transportation options are important to you?
Human:Easy access to a reliable bus line, proximity to a major highway, and bike-friendly roads.
AI:How urban do you want your neighborhood to be?
Human:A balance between suburban tranquility and access to urban amenities like restaurants and theaters.

Based on the preferences, recommend the best item from the listings.
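
Python concatenates adjacent string literals, so a missing comma inside a list silently merges two entries and shifts every later question against the wrong answer. A length check while assembling the transcript turns that mistake into an immediate error. The following is a sketch; `build_transcript` is a hypothetical helper that produces the same `AI:` / `Human:` format as the loop above.

```python
def build_transcript(questions, answers):
    """Pair questions with answers, failing loudly if the lists are misaligned."""
    if len(questions) != len(answers):
        raise ValueError(
            f"{len(questions)} questions but {len(answers)} answers"
        )
    lines = []
    for question, answer in zip(questions, answers):
        lines.append(f"AI:{question}")
        lines.append(f"Human:{answer}")
    return "\n".join(lines) + "\n"
```

Used as `questions_and_answers = build_transcript(questions, answers)`, it would replace the index-based loop.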



In [134]:
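# retrieve the listings closest to the preference query before asking for a recommendation
similar_docs = db.similarity_search(query, k=5)
print(chain.run(input_documents=similar_docs, query=query))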

Based on the preferences provided, the best recommendation would be the listing in the Willow Creek neighborhood. It meets the criteria of a comfortable three-bedroom house with a spacious kitchen and a cozy living room. The neighborhood description mentions top-rated schools and nearby shopping centers, which aligns with the desired amenities. Additionally, the neighborhood offers easy access to major highways, fulfilling the transportation options requirement.


In [136]:
print(df.iloc[1]['description'])

Welcome to this spacious family home in the desirable Willow Creek neighborhood. With 4 bedrooms and 3 bathrooms, there is plenty of space for everyone. The updated kitchen features granite countertops and stainless steel appliances, perfect for the home chef. The master suite offers a walk-in closet and a luxurious en-suite bathroom. The backyard is an entertainer's dream with a patio and a large pool. Don't miss out on this Willow Creek gem!
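
For the personalization part of Step 6, the retrieved description could be rewritten by the model so that it highlights what matters to the buyer while leaving the facts untouched. The following is a minimal sketch of a prompt builder for that purpose; the function name and the prompt wording are assumptions, not part of the original notebook.

```python
def build_personalization_prompt(listing_description, preferences):
    """Build a prompt asking the model to tailor a listing without altering facts."""
    return (
        "Rewrite the real estate listing below so that it emphasizes the aspects "
        "that match the buyer's preferences.\n"
        "Do not add, remove, or change any factual details such as price, size, "
        "number of rooms, or amenities.\n\n"
        f"Buyer preferences:\n{preferences}\n\n"
        f"Original listing:\n{listing_description}\n\n"
        "Personalized listing:"
    )
```

The resulting string could be passed to `model(...)` in the same way as the data-generation prompt in Step 2, for example with `df.iloc[1]['description']` and `questions_and_answers` as arguments.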
