# Using OpenAI for Data Analysis and Prediction
### Setting up the data with sklearn and pandas

In [1]:
!pip install --upgrade openai



In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load and prepare the dataset
df = pd.read_csv('../csv/RDC_Inventory_Monthly_Core_Metrics_Country_History.csv')
df.dropna(inplace=True)
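# Reverse the row order (assumed here to put the rows in chronological order,
# so that shift() below looks back in time)
df = df.iloc[::-1].reset_index(drop=True)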

# Feature engineering: extract year and month from 'month_date_yyyymm'
df['year'] = df['month_date_yyyymm'].astype(str).str[:4].astype(int)
df['month'] = df['month_date_yyyymm'].astype(str).str[4:6].astype(int)

# Add lag features
df['lag_1'] = df['median_listing_price_mm'].shift(1)
df['lag_2'] = df['median_listing_price_mm'].shift(2)
df['lag_3'] = df['median_listing_price_mm'].shift(3)
df.dropna(inplace=True)  # Drop rows with NaN values resulting from lagging

# Define features and target variable
features = ['year', 'month', 'lag_1', 'lag_2', 'lag_3']
X = df[features]  # Features: year, month, and the three lagged target values
y = df['median_listing_price_mm']  # Target variable

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features: fit the scaler on the training rows only, then apply it to the test rows
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the model (a single fit; no cross-validation is performed here)
model = RandomForestRegressor()
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Evaluate the model
mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)

Mean Absolute Error (MAE): 0.007888312499999998
Mean Squared Error (MSE): 0.00013951657881249995
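
The cell above evaluates on a random 80/20 split. Because the features are lags of the target, a random split can place test rows in between training rows in time, which may make the error look better than it would be on genuinely unseen months. A minimal sketch of a forward-chaining alternative follows, assuming scikit-learn's `TimeSeriesSplit` and a pipeline that refits the scaler on each training fold. The helper name `time_series_cv_mae` and the synthetic `demo` frame are hypothetical stand-ins, not the notebook's data or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def time_series_cv_mae(X, y, n_splits=5, random_state=42):
    """Return the per-fold MAE from a forward-chaining cross-validation."""
    splitter = TimeSeriesSplit(n_splits=n_splits)
    fold_maes = []
    for train_idx, test_idx in splitter.split(X):
        # The scaler lives inside the pipeline, so it is fit on training rows only.
        pipeline = make_pipeline(
            StandardScaler(),
            RandomForestRegressor(n_estimators=50, random_state=random_state),
        )
        pipeline.fit(X.iloc[train_idx], y.iloc[train_idx])
        preds = pipeline.predict(X.iloc[test_idx])
        fold_maes.append(mean_absolute_error(y.iloc[test_idx], preds))
    return fold_maes


# Synthetic stand-in for the notebook's data (hypothetical values, not the real CSV).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"target": rng.normal(0.0, 0.01, size=80)})
for lag in (1, 2, 3):
    demo[f"lag_{lag}"] = demo["target"].shift(lag)
demo = demo.dropna().reset_index(drop=True)

fold_maes = time_series_cv_mae(demo[["lag_1", "lag_2", "lag_3"]], demo["target"])
print("MAE per fold:", fold_maes)
print("Mean MAE:", sum(fold_maes) / len(fold_maes))
```

With the real data, the same helper could be called as `time_series_cv_mae(df[features], df['median_listing_price_mm'])`, provided the rows are in chronological order. Each fold trains only on rows that come before its test rows.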


### Using OpenAI to analyze the data and make predictions

In [6]:
from openai import OpenAI
import os
from dotenv import load_dotenv

load_dotenv()
gpt = os.getenv('gpt_token')
org = os.getenv('gpt_org')

client = OpenAI(api_key=gpt, organization=org)

summary_prompt = f"""
Analyze the following predictions and real estate metrics data:
{df.to_dict()}
What are the key trends and insights? Can you also provide predictions of 
up to a year into the future based on the information for each section of the metrics.
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": summary_prompt}
    ],
    max_tokens=2000
)
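
The prompt above embeds `df.to_dict()`, which repeats every row index for every column and prints floats at full precision, so the prompt can grow large quickly. One possible way to keep it smaller is to send a rounded CSV rendering of the most recent rows instead. The sketch below is an illustration only: `build_summary_prompt`, its `max_rows` and `float_format` defaults, and the small `demo` frame are hypothetical, and the prompt wording is shortened for the example.

```python
import numpy as np
import pandas as pd


def build_summary_prompt(frame, max_rows=120, float_format="%.4f"):
    """Build a prompt from a compact CSV rendering of the last `max_rows` rows."""
    recent = frame.tail(max_rows)
    table = recent.to_csv(index=False, float_format=float_format)
    return (
        "Analyze the following real estate metrics data (CSV):\n"
        f"{table}\n"
        "What are the key trends and insights?"
    )


# Small synthetic frame standing in for the notebook's df (hypothetical values).
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "month_date_yyyymm": [202001 + i for i in range(12)],
    "median_listing_price_mm": rng.normal(0.0, 0.01, size=12),
    "active_listing_count_mm": rng.normal(0.0, 0.05, size=12),
})

prompt = build_summary_prompt(demo)
dict_text = str(demo.to_dict())
csv_text = demo.to_csv(index=False, float_format="%.4f")
print(len(dict_text), "characters as dict vs", len(csv_text), "as CSV")
```

Rounding discards precision, so the format string is a trade-off between prompt size and the detail the model can see.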


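Since the request sets `max_tokens=2000`, a long reply can be cut off at the cap. In the Chat Completions API, a choice that stopped because of the token limit reports `finish_reason == "length"`. The sketch below shows one way to check for that before printing. The helper name is hypothetical, and `fake_response` is a hand-built stand-in with the same attribute layout, so no API call is made.

```python
from types import SimpleNamespace


def reply_text_with_truncation_flag(response):
    """Return (text, truncated) for the first choice of a chat completion."""
    choice = response.choices[0]
    # "length" means generation stopped at the max_tokens limit.
    truncated = choice.finish_reason == "length"
    return choice.message.content, truncated


# Hand-built stand-in mimicking the response attributes used above (not real API output).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="length",
            message=SimpleNamespace(content="### Key Trends and Insights"),
        )
    ]
)

text, truncated = reply_text_with_truncation_flag(fake_response)
if truncated:
    print("Warning: the reply hit the token limit and may be incomplete.")
print(text)
```

In the notebook, the same helper could be applied to `response` directly.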

In [4]:
print(response.choices[0].message.content)

### Key Trends and Insights

#### 1. **Median Listing Price:**
- **Trend:**
  - The median listing price steadily increased from $275,000 (Oct 2017) to $429,950 (Apr 2024).
  - The year-over-year growth peaked in Apr 2021 at 15.72% and later showed slight decreases.
  - Recently, monthly changes have been relatively stable with some minor fluctuations.
  
- **Insights:**
  - Overall, there has been a strong upward trend in prices.
  - Prices may be stabilizing but still showing a slow upward trend.

#### 2. **Active Listing Count:**
- **Trend:**
  - The active listing count shows significant fluctuations, peaking around Spring (Apr 2021) and bottoming around Winter (Dec 2021 onwards).
  - Year-over-year comparisons show recent increases, reflecting a strengthening market.

- **Insights:**
  - The active listing counts exhibit seasonality, typically higher in spring/summer and lower in winter.
  - The market appears to be stabilizing post-2021 downturns.

#### 3. **Median Days on Market