**Colab notebook implementing the Personalized Real Estate Agent project.**

Start by installing the required libraries with pip.

In [13]:
!pip install pandas
!pip install chromadb
!pip install langchain
!pip install numpy
!pip install -U langchain-openai
!pip install pydantic
!pip install openai
# shutil is part of the Python standard library, so it is not installed with pip

Collecting openai<2.0.0,>=1.24.0 (from langchain-openai)
  Using cached openai-1.28.1-py3-none-any.whl (320 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 0.28.1
    Uninstalling openai-0.28.1:
      Successfully uninstalled openai-0.28.1
Successfully installed openai-1.28.1


In [14]:
import os
import pandas as pd
import shutil
from langchain.embeddings import OpenAIEmbeddings
from langchain.evaluation import load_evaluator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain.vectorstores.chroma import Chroma
from dataclasses import dataclass
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from typing import List, Union, Dict

**Generating Real Estate Listings**

The following code generates 20 real estate listings from a prompt that includes one sample listing as a format example. GPT-4 (the gpt-4-turbo-preview model) generates the content. These listings populate the database used for testing and developing "HomeMatch".

In [18]:
from openai import OpenAI
OPEN_AI_API_KEY ='your_api_key'
prompt='''
Craft a comprehensive CSV file encapsulating the essence of 20 distinct real estate listings. Each listing should be meticulously organized into columns, showcasing the following attributes:
- Neighborhood: Identify the neighborhood where the property resides, such as "Green Oaks."
- Price: State the property's market price in USD, formatted (e.g., "$800,000").
- Bedrooms: Enumerate the bedrooms within the property (e.g., 3).
- Bathrooms: Count the property's bathrooms (e.g., 2).
- House Size: Detail the property's square footage (e.g., "2,000 sqft").

Elevate each property's presentation with a comprehensive paragraph that accentuates its unique features, amenities, and sustainable qualities. Highlight elements like energy-efficient appliances, solar panel integration, use of sustainable materials, and the presence of gardens.

Example Entry Format:
Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description
Green Oaks,"$800,000",3,2,"2,000 sqft","Nestled in Green Oaks, this eco-friendly haven features a 3-bedroom, 2-bathroom layout with solar panels and efficient insulation. Highlights include abundant natural light, hardwood floors, and an open-concept kitchen that leads to a lush backyard, embodying a sanctuary for eco-conscious living. The neighborhood of Green Oaks is celebrated for its vibrant and environmentally-aware community, boasting organic stores, community gardens, and convenient transit options, rendering it perfect for those prioritizing sustainability and community engagement."


Ensure the description not only reflects the property's allure, such as its eco-friendly design and comfortable living spaces but also paints a vivid picture of the neighborhood's character. Emphasize community elements like organic grocery stores, parks, cafés, accessibility to public transportation, and commitment to environmental initiatives.

Structure the CSV with clear headers for each column. Follow the example provided to format each subsequent row with information specific to a different property listing.
Make sure you generate 20 unique listings.
'''

messages = [{"role": "system", "content": f"{prompt}"}]
client = OpenAI(api_key=OPEN_AI_API_KEY)

response = client.chat.completions.create(model="gpt-4-turbo-preview", messages=messages)
bot_response = response.choices[0].message.content
messages.append({"role": "assistant", "content": bot_response})
print(bot_response)

```
Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description
Green Oaks,"$800,000",3,2,"2,000 sqft","Nestled in Green Oaks, this eco-friendly haven features a 3-bedroom, 2-bathroom layout with solar panels and efficient insulation. Highlights include abundant natural light, hardwood floors, and an open-concept kitchen that leads to a lush backyard, embodying a sanctuary for eco-conscious living. The neighborhood of Green Oaks is celebrated for its vibrant and environmentally-aware community, boasting organic stores, community gardens, and convenient transit options, rendering it perfect for those prioritizing sustainability and community engagement."
Maple Ridge,"$650,000",4,3,"2,500 sqft","Situated in the serene Maple Ridge, this stunning 4-bedroom, 3-bathroom home spans 2,500 sqft, boasting an energy-efficient design with LED lighting and low-flow plumbing fixtures. The spacious living areas are complemented by large, energy-efficient windows, inviting natural light while minimiz
```

**Data format to Array**

The following code parses the CSV data held in the response string. It first wraps the string in a file-like object and reads it with the csv.reader function. The first row is skipped because it holds the opening Markdown code fence (three backticks) of the model's reply rather than data. The next row supplies the column headers, and the remaining rows are stored as lists of values. Finally, the code prints the headers and the rows so the parsed data can be inspected.

In [19]:
import csv
import io

csv_file = io.StringIO(bot_response)
# Read the CSV data using the csv.reader function
reader = csv.reader(csv_file)
# Skip the first row, which holds the opening code fence (three backticks)
next(reader)
# Extract the column headers
headers = next(reader)
# Extract the row values
rows = list(reader)
# Print the column headers
print(headers)
# Print the row values
for row in rows:
    print(row)

['Neighborhood', 'Price', 'Bedrooms', 'Bathrooms', 'House Size', 'Description']
['Green Oaks', '$800,000', '3', '2', '2,000 sqft', 'Nestled in Green Oaks, this eco-friendly haven features a 3-bedroom, 2-bathroom layout with solar panels and efficient insulation. Highlights include abundant natural light, hardwood floors, and an open-concept kitchen that leads to a lush backyard, embodying a sanctuary for eco-conscious living. The neighborhood of Green Oaks is celebrated for its vibrant and environmentally-aware community, boasting organic stores, community gardens, and convenient transit options, rendering it perfect for those prioritizing sustainability and community engagement.']
['Maple Ridge', '$650,000', '4', '3', '2,500 sqft', "Situated in the serene Maple Ridge, this stunning 4-bedroom, 3-bathroom home spans 2,500 sqft, boasting an energy-efficient design with LED lighting and low-flow plumbing fixtures. The spacious living areas are complemented by large, energy-efficient windo
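Skipping exactly one row assumes the reply opens with a code fence and nothing else surrounds the data. If the model also closes its reply with a fence, that line is parsed as an extra one-cell row and carried into the CSV file. The following is a minimal sketch of a more defensive parse that drops fence lines before reading; `parse_listing_csv` is a hypothetical helper name, not part of the original notebook, and it assumes no quoted field contains a line break.

```python
import csv
import io

FENCE = "`" * 3  # Markdown code-fence marker


def parse_listing_csv(text):
    """Return (headers, rows) from CSV text that may be wrapped in code fences."""
    # Keep only non-blank lines that are not fence lines.
    # Assumption: no quoted field contains a line break.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.strip().startswith(FENCE)
    ]
    reader = csv.reader(io.StringIO("\n".join(lines)))
    headers = next(reader)
    rows = list(reader)
    return headers, rows


sample = "\n".join([
    FENCE,
    "Neighborhood,Price,Bedrooms",
    'Green Oaks,"$800,000",3',
    FENCE,
])
headers, rows = parse_listing_csv(sample)
print(headers)
print(rows)
```

With this approach the number of parsed rows matches the number of listings regardless of whether the reply is fenced.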

**Store in CSV**

This code snippet uses the csv module to write the listings to a file named 'output.csv'. It defines the column headers as a list, opens the file in write mode, and creates a csv.writer object. The header row is written with the writerow method, and the parsed data (stored in the variable rows) is written with the writerows method. The with block closes the file automatically once writing finishes.

In [21]:
import csv
headers = ["Neighborhood", "Price", "Bedrooms", "Bathrooms", "House Size", "Description"]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    # Write the column headers
    writer.writerow(headers)
    # Write the row values
    writer.writerows(rows)

In [22]:
df=pd.read_csv('output.csv')
df.head(10)

Unnamed: 0,Neighborhood,Price,Bedrooms,Bathrooms,House Size,Description
0,Green Oaks,"$800,000",3.0,2.0,"2,000 sqft","Nestled in Green Oaks, this eco-friendly haven..."
1,Maple Ridge,"$650,000",4.0,3.0,"2,500 sqft","Situated in the serene Maple Ridge, this stunn..."
2,Pine Valley,"$1,200,000",5.0,4.0,"3,500 sqft",This luxurious property in Pine Valley offers ...
3,Willow Creek,"$750,000",3.0,3.0,"2,200 sqft",Discover this charming bungalow in Willow Cree...
4,Silver Lake,"$900,000",4.0,2.0,"2,600 sqft","In the heart of Silver Lake, this 4-bedroom, 2..."
5,Cedar Park,"$430,000",2.0,2.0,"1,500 sqft","This compact yet luxurious 2-bedroom, 2-bathro..."
6,Oakwood,"$980,000",4.0,3.5,"3,000 sqft","Located in Oakwood, this spacious 4-bedroom, 3..."
7,Birchwood,"$510,000",3.0,2.0,"1,800 sqft","This cozy 3-bedroom, 2-bathroom residence in B..."
8,Sunnydale,"$575,000",3.0,2.5,"2,100 sqft","Sunnydale's 3-bedroom, 2.5-bathroom townhouse ..."
9,Timber Heights,"$630,000",3.0,2.0,"2,300 sqft",Embrace the tranquility of Timber Heights with...
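Price and House Size are stored as formatted strings ("$800,000", "2,000 sqft"), so they cannot be compared numerically as they stand. The following is a small sketch of converting them, assuming the formats shown in the table above; `parse_number` is a hypothetical helper name.

```python
import re


def parse_number(text):
    """Extract the first number from a string such as '$800,000' or '2,000 sqft'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    value = float(match.group().replace(",", ""))
    # Return an int when the value has no fractional part.
    return int(value) if value.is_integer() else value


print(parse_number("$800,000"))
print(parse_number("2,000 sqft"))
```

Numeric columns of this kind would allow exact filters, such as a bedroom count or a price ceiling, alongside the semantic search.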


**Storing Listings in a Vector Database**

The following code uses the langchain library to inspect and compare text embeddings. It first sets the OpenAI API key so that requests can be authenticated, then creates an embedding function from the OpenAIEmbeddings class and uses it to embed the phrase "new york". The result is a single vector that represents the meaning of the whole phrase in a high-dimensional space; the code prints the vector and its length. Next, it loads langchain's pairwise embedding distance evaluator and passes it the two strings "new york" and "nyc". The evaluator embeds both strings itself and reports the distance between them, where a smaller distance means closer meaning. Finally, the code prints the result of the comparison.

In [20]:
from langchain.evaluation.loading import load_evaluator
os.environ["OPENAI_API_KEY"] ="your_api_key"
embedding_function = OpenAIEmbeddings()
vector = embedding_function.embed_query("new york")
print(f"Vector for 'new york': {vector}")
print(f"Vector length: {len(vector)}")

# Compare the embeddings of two phrases
evaluator = load_evaluator('pairwise_embedding_distance')
words = ("new york", "nyc")
x = evaluator.evaluate_string_pairs(prediction=words[0], prediction_b=words[1])
print(f"Comparing ({words[0]}, {words[1]}): {x}")

Vector for 'new york': [-0.0105735306474718, -0.014859732349775213, 0.008842825023329644, -0.041969616274890785, -0.02598762903857556, -0.0005826766179906593, -0.026176925941194435, 0.0136563508844861, -0.022823682222812156, -0.03612848397850375, 0.021904245180128772, 0.0011890152323933553, 0.0036067639277272572, 0.005844512531696467, -0.010830432256279441, -0.01239888472292339, 0.02028170801721123, -0.019984243584690326, 0.01729353716282895, -0.011094094335705962, -0.008342542746952113, 0.028205096929180076, -0.010215221048058422, -0.014454097593384537, 0.0009709867953055441, 0.0015811282326390196, -0.0019031003024759334, -0.010695220981256744, 0.0030456367172878975, -0.015765648451220842, 0.013710435580759696, -0.01711776306409299, -0.028962280814365252, -0.024243403404518235, 0.006770710510660016, -0.007044513761676142, -0.01068170004001899, -0.012412405664161144, 0.007260851615447943, -0.0020011286398488415, 0.010641136284983149, -0.03145017125501516, -0.003583101814899898, -0.0065
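For intuition, a distance of this kind can be computed by hand. The sketch below assumes cosine distance (1 minus cosine similarity), which is assumed here to be the metric the evaluator uses by default, and it works on made-up three-dimensional vectors rather than real OpenAI embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors (hypothetical values, not real embeddings).
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 1.0, 0.0]
print(cosine_distance(v1, v1))  # same direction
print(cosine_distance(v1, v2))  # orthogonal directions
```

Vectors pointing in the same direction give a distance near 0, and unrelated (orthogonal) vectors give a distance near 1.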

In [24]:
# Configuration
CHROMA_PATH = "chroma"
CSV_PATH = "output.csv"

df = pd.read_csv(CSV_PATH)
documents = []
for index, row in df.iterrows():
    documents.append(Document(page_content=row['Description'], metadata={'id': str(index)}))


# Split Text
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
    length_function=len,
    add_start_index=True,
)
chunks = text_splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks.")

if len(chunks) > 10:  # chunks[10] below needs at least 11 chunks
    document = chunks[10]
    print(document.page_content)
    print(document.metadata)

# Save to Chroma
if os.path.exists(CHROMA_PATH):
    shutil.rmtree(CHROMA_PATH)

db = Chroma.from_documents(
    chunks, OpenAIEmbeddings(), persist_directory=CHROMA_PATH
)
db.persist()
print(f"Saved {len(chunks)} chunks to {CHROMA_PATH}.")

Split 21 documents into 41 chunks.
design, while a rainwater harvesting system supports the verdant garden. Silver Lake is a hub for artists and musicians, known for its cultural vibrancy, eclectic shops, and farm-to-table restaurants, making it a haven for creative souls.
{'id': '4', 'start_index': 211}
Saved 41 chunks to chroma.
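To show what chunk_size and chunk_overlap mean, here is a deliberately simplified fixed-window splitter. It is not the algorithm RecursiveCharacterTextSplitter uses (that class prefers to break on separators such as paragraphs and spaces); `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size=300, chunk_overlap=100):
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        # Store the start index alongside the chunk, like add_start_index does.
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(700))
for start, chunk in sliding_chunks(text):
    print(start, len(chunk))
```

Each window begins chunk_size minus chunk_overlap characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.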


In [27]:
CHROMA_PATH = "chroma"

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""

query_text = "Would like to buy home with 2 bedrooms"

# Prepare the DB.
embedding_function = OpenAIEmbeddings()
db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

# Search the DB.
results = db.similarity_search_with_relevance_scores(query_text, k=3)
if len(results) == 0 or results[0][1] < 0.7:
    print(f"Unable to find matching results.")
else:
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)
    print(f"Generated Prompt:\n{prompt}")

    model = ChatOpenAI()
    response_text = model.predict(prompt)

    sources = [doc.metadata.get("id", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)

Generated Prompt:
Human: 
Answer the question based only on the following context:

This cozy 3-bedroom, 2-bathroom residence in Birchwood offers 1,800 sqft of living space, with a focus on sustainability through solar panel roofing and a vegetable garden. The home is a blend of comfort and modern eco-friendly practices, situated in a neighborhood known for its commitment to

---

Situated in the serene Maple Ridge, this stunning 4-bedroom, 3-bathroom home spans 2,500 sqft, boasting an energy-efficient design with LED lighting and low-flow plumbing fixtures. The spacious living areas are complemented by large, energy-efficient windows, inviting natural light while minimizing

---

This adorable 3-bedroom, 2-bathroom home in Pebble Creek utilizes 1,600 sqft efficiently, featuring a high-efficiency HVAC system and backyard wildlife habitat. The neighborhood is noted for its walkability, close-knit community, and annual eco-awareness workshops, making it ideal for first-time

---

Answer 
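The threshold check in the cell above looks only at the score of the top result, so lower-ranked chunks that score below 0.7 still reach the prompt. The following sketch filters every hit individually, using plain (text, score) tuples as stand-ins for the (Document, score) pairs returned by the search; `filter_by_score` and the sample scores are hypothetical.

```python
def filter_by_score(results, min_score=0.7):
    """Keep only (item, score) pairs whose score meets the threshold, best first."""
    kept = [(item, score) for item, score in results if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical search hits: (chunk text, relevance score).
hits = [("chunk A", 0.82), ("chunk B", 0.65), ("chunk C", 0.74)]
good = filter_by_score(hits)
context_text = "\n\n---\n\n".join(text for text, _score in good)
print(context_text)
```

An empty result from the filter can then be treated the same way as the "Unable to find matching results." branch.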

In [26]:
CHROMA_PATH = "chroma"

PROMPT_TEMPLATE = """
Answer the question based only on the following context:

{context}

---

Given the context provided above, craft a response that not only answers the question {question},
but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences.
Strive to present your insights in a manner that resonates with the audience's interests and requirements
"""

query_text = "Would like to buy home in calm neighbourhood"

# Prepare the DB.
embedding_function = OpenAIEmbeddings()
db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

# Search the DB.
results = db.similarity_search_with_relevance_scores(query_text, k=3)
if len(results) == 0 or results[0][1] < 0.7:
    print(f"Unable to find matching results.")
else:
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)
    print(f"Generated Prompt:\n{prompt}")

    model = ChatOpenAI()
    response_text = model.predict(prompt)

    sources = [doc.metadata.get("id", None) for doc, _score in results]
    formatted_response = f"Response: {response_text}\nSources: {sources}"
    print(formatted_response)

Generated Prompt:
Human: 
Answer the question based only on the following context:

close-knit community, and annual eco-awareness workshops, making it ideal for first-time homebuyers looking for a sustainable lifestyle.

---

The neighborhood prides itself on its conservation efforts, extensive hiking trails, and community co-op gardens, offering a peaceful yet engaged lifestyle to its residents.

---

comfort and modern eco-friendly practices, situated in a neighborhood known for its commitment to eco-conscious living, local produce markets, and vibrant community events.

---

Given the context provided above, craft a response that not only answers the question Would like to buy home in calm neighbourhood, 
but also ensures that your explanation is distinct, captivating, and customized to align with the specified preferences. 
Strive to present your insights in a manner that resonates with the audience's interests and requirements

Response: If you're looking to buy a home in a calm 
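The two query cells differ only in the prompt template and the query text, so the shared steps could live in one function. The following is a sketch under assumptions: `answer_query` is a hypothetical helper, and `search` and `generate` are injected callables standing in for the vector store search and the chat model call, so the example runs without an API key.

```python
def answer_query(query_text, template, search, generate, k=3, min_score=0.7):
    """Retrieve context for a query, fill the template, and return (answer, sources)."""
    results = search(query_text, k)  # list of (text, source_id, score), best first
    if not results or results[0][2] < min_score:
        return None, []
    context_text = "\n\n---\n\n".join(text for text, _id, _score in results)
    prompt = template.format(context=context_text, question=query_text)
    sources = [source_id for _text, source_id, _score in results]
    return generate(prompt), sources


# Stand-ins for the vector store and the chat model (hypothetical data).
def fake_search(query, k):
    return [("A quiet street near a park.", "7", 0.81)][:k]


def fake_generate(prompt):
    return f"(model reply to {len(prompt)} prompt characters)"


TEMPLATE = "Context:\n{context}\n\nQuestion: {question}"
answer, sources = answer_query("calm neighbourhood", TEMPLATE, fake_search, fake_generate)
print(answer, sources)
```

Passing the search and model functions as arguments keeps the retrieval logic testable and lets each cell supply only its own template and query.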

**Learnings**

Use of Natural Language Processing (NLP) techniques: The project builds a custom chatbot that answers questions using context retrieved from the generated real estate listings.

Integration of the OpenAI API: The OpenAI API supplies both the GPT-4 model that generates the listings and the embeddings that represent listing text as vectors.

Application of the langchain library: langchain provides the pairwise embedding distance evaluator, the text splitter, the Chroma vector store wrapper, and the prompt template used for retrieval and answering.

Semantic search and grounded answers: Relevant chunks are found by embedding similarity and passed to the chat model as the only context it may use for its answer.

Example Prompt Analysis: For the two-bedroom query, the three retrieved chunks describe 3-bedroom and 4-bedroom homes, so the chatbot determines that the provided context contains no home with exactly two bedrooms. The dataset does include a two-bedroom listing (Cedar Park), which the similarity search did not surface; the answer therefore reflects the retrieved context only, not the full set of listings.