# 🏠 Canadian Housing Market Analysis Report

This notebook summarizes an end-to-end pipeline for analyzing and forecasting Canadian housing prices: scraping articles, extracting price mentions from their text, and fitting a Prophet time-series forecast.


## 📌 Project Steps
1. Web scraping of articles (BetterDwelling)
2. Cleaning article content
3. Extracting price mentions with regex
4. Normalizing and filtering values (200k–2M range)
5. Simulating monthly timestamps
6. Forecasting with Prophet (2023–2033)
7. Visualization of trends and confidence intervals
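
Steps 3–5 can be sketched with the standard library alone. This is an illustrative sketch, not the pipeline's actual code: the regex, the `extract_prices` and `simulate_monthly_dates` helpers, the suffix handling, and the start date are all assumptions. Only the 200k–2M filter range comes from the steps above.

```python
import re
from datetime import date

# Hypothetical pattern: matches "$650,000", "$1.2 million", "$850K", "$1.5M".
PRICE_RE = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)(?:\s?(million|k|m)\b)?",
    re.IGNORECASE,
)
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "million": 1_000_000}


def extract_prices(text, low=200_000, high=2_000_000):
    """Steps 3-4: find price mentions, normalize to dollars, keep low..high."""
    prices = []
    for number, suffix in PRICE_RE.findall(text):
        value = float(number.replace(",", ""))
        if suffix:
            value *= MULTIPLIERS[suffix.lower()]
        value = round(value)  # whole dollars, avoids float noise
        if low <= value <= high:
            prices.append(value)
    return prices


def simulate_monthly_dates(n, start):
    """Step 5 (assumed scheme): one month-start date per observation."""
    dates = []
    for i in range(n):
        months = start.month - 1 + i
        dates.append(date(start.year + months // 12, months % 12 + 1, 1))
    return dates


sample = (
    "Detached homes averaged $1.2 million, while condos sold for "
    "$650,000 and a parking spot for $45,000."
)
prices = extract_prices(sample)
# date(2020, 1, 1) is an arbitrary, hypothetical starting month.
rows = list(zip(simulate_monthly_dates(len(prices), date(2020, 1, 1)), prices))
print(rows)
```

The $45,000 mention is dropped by the range filter, which is the point of step 4: it removes figures that are unlikely to be home prices. Assigning sequential months like this gives Prophet a regular time axis, but the dates carry no real information about when each price was reported.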

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Load the filtered Prophet forecast.
# Columns used below: ds (date), yhat, yhat_lower, yhat_upper
forecast = pd.read_csv("../data/processed/forecast_2023_2033.csv")

## 📈 Forecast Visualization (2023–2033)

In [None]:
# Plot forecast with confidence interval
fig, ax = plt.subplots(figsize=(12, 6))

ax.fill_between(
    pd.to_datetime(forecast["ds"]),
    forecast["yhat_lower"],
    forecast["yhat_upper"],
    color="skyblue",
    alpha=0.5,
    label="Confidence Interval",
)

ax.plot(
    pd.to_datetime(forecast["ds"]), forecast["yhat"], color="blue", label="Forecast"
)
ax.set_title("Canadian Housing Price Forecast (2023–2033)", fontsize=16)
ax.set_xlabel("Year")
ax.set_ylabel("Price (CAD $)")

ax.xaxis.set_major_locator(mdates.YearLocator(1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.grid(True)
ax.legend()
plt.tight_layout()
plt.show()

## ✅ Key Observations
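- The model predicts moderate price fluctuations over 2023–2033.
- The confidence interval (the `yhat_lower`–`yhat_upper` band) is wide after ~2029, so forecasts for those later years are considerably less certain.
- After filtering (step 4), the scraped price mentions fall within the 200k–2M range.
- Because the monthly timestamps were simulated rather than taken from the articles, a cleaner dataset with real timestamps could improve accuracy.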
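
The second observation can be checked numerically rather than by eye. A minimal sketch, assuming the forecast has the `ds`, `yhat_lower`, and `yhat_upper` columns used in the plot: `interval_width_by_year` is a hypothetical helper, and the `demo` values are made up purely to show the call, not taken from the real forecast.

```python
import pandas as pd


def interval_width_by_year(forecast: pd.DataFrame) -> pd.Series:
    """Mean width of the yhat_lower..yhat_upper band for each calendar year."""
    width = forecast["yhat_upper"] - forecast["yhat_lower"]
    years = pd.to_datetime(forecast["ds"]).dt.year
    return width.groupby(years).mean()


# Synthetic rows for illustration only (not real forecast output).
demo = pd.DataFrame(
    {
        "ds": ["2028-01-01", "2028-07-01", "2030-01-01", "2030-07-01"],
        "yhat_lower": [700_000, 690_000, 600_000, 580_000],
        "yhat_upper": [800_000, 810_000, 900_000, 940_000],
    }
)
print(interval_width_by_year(demo))
```

Calling `interval_width_by_year(forecast)` on the loaded forecast would give one mean width per year, which makes it easy to see where the band starts to widen.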