# Using OpenAI for Data Analysis and Prediction
### Setting up the data with scikit-learn and pandas

In [1]:
!pip install --upgrade openai

Collecting openai
  Downloading openai-1.33.0-py3-none-any.whl.metadata (21 kB)
Downloading openai-1.33.0-py3-none-any.whl (325 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.32.0
    Uninstalling openai-1.32.0:
      Successfully uninstalled openai-1.32.0
Successfully installed openai-1.33.0


In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load and prepare the dataset
df = pd.read_csv('../csv/RDC_Inventory_Monthly_Core_Metrics_Country_History.csv')
df.dropna(inplace=True)
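# Reverse the row order so that shift() below looks back in time
# (this assumes the CSV is sorted newest first)
df = df.iloc[::-1].reset_index(drop=True)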

# Feature engineering: extract year and month from 'month_date_yyyymm'
df['year'] = df['month_date_yyyymm'].astype(str).str[:4].astype(int)
df['month'] = df['month_date_yyyymm'].astype(str).str[4:6].astype(int)

# Add lag features
df['lag_1'] = df['median_listing_price_mm'].shift(1)
df['lag_2'] = df['median_listing_price_mm'].shift(2)
df['lag_3'] = df['median_listing_price_mm'].shift(3)
df.dropna(inplace=True)  # Drop rows with NaN values resulting from lagging

features = ['year', 'month','lag_1', 'lag_2', 'lag_3']

# Define features and target variable
X = df[features]  # Features: year, month, and the three lagged target values
y = df['median_listing_price_mm']  # Target: month-over-month change in median listing price

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train a random forest regressor (single train/test split, no cross-validation)
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)

Mean Absolute Error (MAE): 0.0078449375
Mean Squared Error (MSE): 0.0001349643116874999
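
The split above shuffles rows at random and fits the scaler on the full dataset. With lag features this may let information from later months reach the training set, so the errors reported above could be optimistic. A minimal sketch of a chronological alternative follows. It runs on a synthetic series rather than the CSV, and the synthetic data, `n_test`, and the naive lag-1 baseline are assumptions added for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler

# Synthetic monthly series standing in for 'median_listing_price_mm' (assumption).
rng = np.random.default_rng(42)
demo = pd.DataFrame({"median_listing_price_mm": rng.normal(0.0, 0.01, size=72)})
for k in (1, 2, 3):
    demo[f"lag_{k}"] = demo["median_listing_price_mm"].shift(k)
demo = demo.dropna().reset_index(drop=True)

features = ["lag_1", "lag_2", "lag_3"]
X = demo[features].to_numpy()
y = demo["median_listing_price_mm"].to_numpy()

# Chronological split: hold out the last n_test months, no shuffling.
n_test = 12
X_train, X_test = X[:-n_test], X[-n_test:]
y_train, y_test = y[:-n_test], y[-n_test:]

# Fit the scaler on the training rows only, then apply it to the test rows.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_s, y_train)
pred = model.predict(X_test_s)

# Naive baseline: predict each month's value with the previous month's value (lag_1).
model_mae = mean_absolute_error(y_test, pred)
naive_mae = mean_absolute_error(y_test, X_test[:, 0])
print("Model MAE:", model_mae)
print("Naive lag-1 MAE:", naive_mae)
```

Comparing the model against the naive baseline gives the MAE some context: a model that does not beat "repeat last month" is probably not learning much from the lags.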


### Using OpenAI to analyze the data and make predictions

In [5]:
from openai import OpenAI
import os
from dotenv import load_dotenv

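# Read the API key and organization ID from a local .env file
load_dotenv()
gpt = os.getenv('gpt_token')
org = os.getenv('gpt_org')

# Create the API client used for the chat completion request below
client = OpenAI(api_key=gpt, organization=org)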

summary_prompt = f"""
Analyze the following predictions and real estate metrics data:
{df.to_dict()}
What are the key trends and insights? Can you also provide predictions of 
up to a year into the future based on the information for each section of the metrics.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": summary_prompt}
    ],
    max_tokens=2000
)
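
Note that `df.to_dict()` embeds every row and column in the prompt as a nested Python dict, which can use a large number of tokens on a multi-year dataset. One possible alternative is to send only selected columns and recent rows as CSV text. The sketch below is an assumption rather than what the notebook does: `build_summary_prompt` is a hypothetical helper, and the demo values and column names other than `month_date_yyyymm` are made up for illustration.

```python
import pandas as pd

def build_summary_prompt(frame: pd.DataFrame, columns: list, last_n: int = 24) -> str:
    """Hypothetical helper: serialize the most recent rows as CSV for the prompt."""
    recent = frame[columns].tail(last_n)
    table = recent.to_csv(index=False)
    return (
        "Analyze the following real estate metrics data (CSV, oldest row first):\n"
        f"{table}\n"
        "What are the key trends and insights?"
    )

# Tiny stand-in for the real dataset (values are made up for illustration).
demo = pd.DataFrame({
    "month_date_yyyymm": [202301, 202302, 202303],
    "median_listing_price": [400000, 410000, 420000],
    "active_listing_count": [600000, 580000, 560000],
})

prompt = build_summary_prompt(
    demo, ["month_date_yyyymm", "median_listing_price"], last_n=2
)
print(prompt)
```

The resulting string could be passed as the user message in place of `summary_prompt`; how many rows and which columns to keep depends on the model's context window and on which metrics matter for the question.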



In [6]:
print(response.choices[0].message.content)

### Key Trends and Insights:

1. **Median Listing Price**:
    - From October 2017 to October 2023, there has been a general upward trend in the median listing price with periodic corrections.
    - Notable jumps occurred during 2022, reaching a peak of $450,000 in June 2022.
    - Early 2023 has shown cooling down with prices around $429,950 in April 2023.

2. **Median Listing Price MoM (Month over Month) and YoY (Year over Year)**:
    - MoM changes show some volatility but a consistent upward trend until late 2022.
    - YoY changes peaked massively in early 2022 but have declined considerably by early 2023.

3. **Active Listing Count**:
    - There’s a significant decrease in the number of active listings from 2018 through 2021, reaching a low point in early 2022.
    - 2023 saw a resurgence in active listings, suggesting more inventory becoming available in the market.
    
4. **Days on Market**:
    - The median days on market have seen cyclical patterns. Significantly shorter da