# AAPL Market Pattern Analysis Report

## 1. Introduction

This report summarizes the analysis performed on historical Apple Inc. (AAPL) stock data obtained from `AAPL.csv`. The primary goal was to identify distinct market regimes and detect anomalous trading days using unsupervised machine learning techniques.

The analysis involved:
* **Feature Engineering:** Calculating time-series features like log returns, volatility, momentum, and volume changes.
* **Regime Detection:** Using K-Means clustering to group periods with similar volatility and momentum characteristics.
* **Anomaly Detection:** Employing Isolation Forest, Local Outlier Factor (LOF), and Z-Score methods to identify days with unusual trading patterns based on the engineered features.

Two different rolling window sizes were used for feature calculation to capture patterns over different time scales:
* **60-day window:** Capturing medium-term trends and volatility.
* **21-day window:** Capturing shorter-term (approximately monthly) dynamics.

## 2. Methodology Overview

* **Data Preparation:** The raw OHLCV (Open, High, Low, Close, Volume) data was loaded, dates were parsed, and basic cleaning (handling missing values, ensuring positive prices/volume) was performed. 'Adj Close' was used as the primary price series.
* **Feature Engineering:**
    * `log_return`: Daily log return, i.e. the natural log of the ratio of consecutive 'Adj Close' values.
    * `volatility`: Rolling standard deviation of log returns (annualized).
    * `momentum`: Rolling mean of log returns (annualized).
    * `volume_log_ratio`: Log ratio of daily volume to rolling mean volume.
* **Scaling:** Features were standardized (zero mean, unit variance) using `StandardScaler` before applying K-Means, Isolation Forest, and LOF, so that no single feature dominates the distance calculations because of its scale. Z-Score was applied to the scaled `log_return`.
* **Regime Detection (K-Means):** Clustered data points based on scaled `volatility` and `momentum`. Different numbers of clusters (K=3, 4, 5, 6 for 60-day; K=4 for 21-day) were explored.
* **Anomaly Detection:**
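    * `Isolation Forest`: Identifies anomalies by isolating observations with random recursive splits; points that can be isolated in fewer splits are scored as more anomalous. Effective for high-dimensional data.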
    * `Local Outlier Factor (LOF)`: Measures the local density deviation of a data point with respect to its neighbors.
    * `Z-Score`: Identifies outliers based on a threshold applied to the standard score of the `log_return` feature. Anomalies were flagged if the absolute Z-score exceeded 3.5.
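
The feature definitions above can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: the `engineer_features` helper, the 252-day annualization factor, and the synthetic price and volume series (used in place of `AAPL.csv` so the block runs on its own) are all assumptions.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor; the report does not state it


def engineer_features(prices: pd.Series, volume: pd.Series, window: int) -> pd.DataFrame:
    """Build the four features described above from adjusted close and volume."""
    out = pd.DataFrame(index=prices.index)
    # Log return: natural log of today's price over yesterday's price.
    out["log_return"] = np.log(prices / prices.shift(1))
    rolling_ret = out["log_return"].rolling(window)
    # Rolling standard deviation of log returns, scaled to a yearly figure.
    out["volatility"] = rolling_ret.std() * np.sqrt(TRADING_DAYS)
    # Rolling mean of log returns, scaled to a yearly figure.
    out["momentum"] = rolling_ret.mean() * TRADING_DAYS
    # Log of today's volume relative to its rolling mean.
    out["volume_log_ratio"] = np.log(volume / volume.rolling(window).mean())
    # Rows without a full lookback window contain NaN, so drop them.
    return out.dropna()


# Synthetic stand-in for AAPL.csv (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
adj_close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, 300))), index=dates)
volume = pd.Series(rng.integers(1_000_000, 5_000_000, 300).astype(float), index=dates)

features_60 = engineer_features(adj_close, volume, window=60)
features_21 = engineer_features(adj_close, volume, window=21)
print(features_60.shape, features_21.shape)
```

The shorter window leaves more usable rows because fewer leading days are lost to the lookback period.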

## 3. Results and Interpretation

### 3.1. Regime Detection (60-Day Window)

Using a 60-day window, we explored market regimes with K=3, 4, 5, and 6 clusters. The regimes are characterized by their average scaled volatility and momentum (refer to console output for precise means):

* **K=3 (`regimes_K=3_Win=60.png`):**
    * Regime 0: Low volatility, near-zero momentum (stable periods).
    * Regime 1: High volatility, strong negative momentum (significant downturns/corrections).
    * Regime 2: Moderate volatility, strong positive momentum (uptrends).
* **K=4 (`regimes_K=4_Win=60.png`):**
    * Splits the K=3 regimes further.
    * Regime 0: Moderate volatility, strong negative momentum.
    * Regime 1: Very low volatility, slightly positive momentum (calm growth).
    * Regime 2: Extremely high volatility, very strong negative momentum (market crashes/major crises).
    * Regime 3: Moderate volatility, strong positive momentum (strong uptrends).
* **K=5 (`regimes_K=5_Win=60.png`):**
    * Introduces more granularity, isolating an extremely high-volatility, strongly negative-momentum regime (Regime 3).
    * Further differentiates between low/moderate volatility states.
* **K=6 (`regimes_K=6_Win=60.png`):**
    * Continues to refine, potentially separating different types of uptrends or downturns based on volatility levels. Regime 4 remains the extreme crash state.

**Interpretation:** The 60-day window analysis clearly identifies distinct historical periods corresponding to calm markets, steady uptrends, volatile downturns, and major market stress events (like 1987, 2000-01, 2008-09). Increasing K allows for finer distinctions within these broader categories. K=4 seems to provide a good balance, capturing the major distinct states including crashes.

*(Visualizations: `regimes_K=3_Win=60.png`, `regimes_K=4_Win=60.png`, `regimes_K=5_Win=60.png`, `regimes_K=6_Win=60.png`)*
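
One way to make the choice of K less subjective is to compare a cluster-quality score across the candidate values. The sketch below is an assumption-laden illustration rather than the procedure used in this report: it runs on synthetic (volatility, momentum) pairs and uses scikit-learn's silhouette score, which the report itself does not mention.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic (volatility, momentum) pairs standing in for the engineered features.
rng = np.random.default_rng(42)
calm = rng.normal([0.20, 0.10], 0.03, size=(300, 2))
uptrend = rng.normal([0.35, 0.60], 0.05, size=(200, 2))
downturn = rng.normal([0.45, -0.50], 0.05, size=(150, 2))
crash = rng.normal([0.90, -1.50], 0.08, size=(50, 2))
X = np.vstack([calm, uptrend, downturn, crash])

# Standardize so volatility and momentum contribute on the same scale.
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in (3, 4, 5, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = model.fit_predict(X_scaled)
    # Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"K={k}: inertia={model.inertia_:.1f}, silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette at K =", best_k)
```

Inertia always falls as K grows, so it cannot pick K by itself; the silhouette score penalizes splits that do not improve separation, which makes it a reasonable cross-check on a visual judgment.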

### 3.2. Regime Detection (21-Day Window)

* **K=4 (`regimes_K=4_Win=21.png`):**
    * Regime 0: Moderate volatility, negative momentum (short-term pullbacks).
    * Regime 1: Low volatility, slightly positive momentum (calm short-term periods).
    * Regime 2: Moderate volatility, positive momentum (short-term rallies).
    * Regime 3: Extremely high volatility, very strong negative momentum (sharp, short crashes/spikes).

**Interpretation:** The 21-day window captures shorter-term fluctuations. The regimes identified are similar in character (low vol, high vol, +/- momentum) to the 60-day window but reflect more rapid shifts in market conditions. Regime 3 still captures extreme events, but they might appear more frequently as sharp, short-lived spikes compared to the sustained periods seen with the 60-day window.

*(Visualization: `regimes_K=4_Win=21.png`)*

### 3.3. Anomaly Detection (60-Day Window)

Anomalies (Label -1, typically red dots) were identified using three methods:

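* **Isolation Forest (`anomalies_IsolationForest_Win=60.png`):** Detected 1156 anomalies, by far the most of the three methods. It flags numerous points, often during transitions between regimes or periods of moderately high volatility. The count also depends on the model's contamination setting, so it should not be read as a property of the data alone.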
* **LOF (`anomalies_LOF_Win=60.png`):** Detected 80 anomalies. LOF focuses on local density, often highlighting points that are unusual compared to their immediate neighbors. It appears more selective than Isolation Forest, often pinpointing sharper spikes or drops.
* **Z-Score (`anomalies_ZScore_Win=60.png`):** Detected 70 anomalies, defined as days on which the absolute Z-score of `log_return` exceeded 3.5. This method specifically flags days with exceptionally large single-day price movements (positive or negative).

**Interpretation:** The different methods highlight different aspects of anomalous behavior. Z-Score focuses purely on the magnitude of daily returns. LOF considers the local context. Isolation Forest takes a broader view across all features. Visually, many anomalies cluster around known periods of market stress (e.g., 1987 crash, dot-com bubble burst, 2008 financial crisis, COVID-19 crash in 2020). There is some overlap, but also distinct points identified by each method, suggesting they offer complementary perspectives.

*(Visualizations: `anomalies_IsolationForest_Win=60.png`, `anomalies_LOF_Win=60.png`, `anomalies_ZScore_Win=60.png`)*
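
The overlap between the three methods can be quantified instead of judged visually. The following is a minimal sketch on synthetic data, not the report's pipeline: the feature matrix, the injected outliers, `contamination=0.02`, and `n_neighbors=20` are assumed values chosen only for illustration. The 3.5 threshold is the one stated above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Synthetic stand-in for the four engineered features (rows = trading days).
X = rng.normal(size=(1000, 4))
# Inject ten extreme days so each detector has something to find.
X[:10] += rng.choice([-8.0, 8.0], size=(10, 4))
X_scaled = StandardScaler().fit_transform(X)

# contamination and n_neighbors are assumed values, not the report's settings.
iso_flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(X_scaled) == -1
lof_flags = LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X_scaled) == -1

# Z-score on the first column, standing in for log_return; threshold 3.5 as in the report.
z = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
z_flags = np.abs(z) > 3.5

print("Isolation Forest:", iso_flags.sum())
print("LOF:", lof_flags.sum())
print("Z-Score:", z_flags.sum())
print("Flagged by all three:", (iso_flags & lof_flags & z_flags).sum())
print("Flagged by at least one:", (iso_flags | lof_flags | z_flags).sum())
```

Days flagged by all three methods are the strongest anomaly candidates, while days flagged by only one method show what that method is uniquely sensitive to.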

### 3.4. Anomaly Detection (21-Day Window)

* **Isolation Forest (`anomalies_IsolationForest_Win=21.png`):** Detected 1239 anomalies. Similar to the 60-day window, it identifies numerous points.

**Interpretation:** Using the shorter window, Isolation Forest again flags a significant number of points. These might correspond to sharp short-term reversals, earnings announcement reactions, or other events causing rapid changes in the short-term features. Comparing visually with the 60-day plot, some anomaly periods overlap, but the 21-day window might flag more frequent, less sustained deviations.

*(Visualization: `anomalies_IsolationForest_Win=21.png`)*

## 4. Key Learnings & Conclusions

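* **Market Regimes Exist:** AAPL's historical price action exhibits distinct periods (regimes) characterized by different combinations of volatility and momentum. K-Means clustering separated these regimes in a way that lines up with known market episodes. Because K-Means always returns exactly K clusters, that alignment with known events, rather than the clustering alone, is the evidence that the regimes are meaningful.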
* **Window Size Matters:** The choice of lookback window significantly impacts the analysis. Longer windows (60-day) identify broader, more sustained market states, while shorter windows (21-day) capture more frequent, shorter-term fluctuations.
* **Anomalies Highlight Stress Points:** Anomaly detection algorithms effectively pinpoint days or periods of unusual market activity, often coinciding with known market crises, crashes, or sharp rallies.
* **Complementary Anomaly Methods:** Different anomaly detection techniques (Isolation Forest, LOF, Z-Score) capture different types of outliers, providing a more robust view when used together. Isolation Forest was the most sensitive in this analysis.
* **Quantitative Insight:** This analysis provides a quantitative framework for segmenting AAPL's history based on market behavior, moving beyond simple visual inspection of price charts.

## 5. Limitations & Next Steps

* **Parameter Sensitivity:** The results (number of regimes, detected anomalies) are sensitive to parameter choices (K in K-Means, window sizes, anomaly detection thresholds/parameters).
* **Feature Selection:** The analysis relied on volatility, momentum, and volume features. Incorporating other technical indicators or fundamental data could yield different insights.
* **Descriptive, Not Predictive:** This analysis describes historical patterns; it does not inherently predict future regimes or anomalies.
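* **Label Interpretation:** Regime labels require interpretation based on feature means; they don't have inherent qualitative meaning. K-Means assigns label numbers arbitrarily, so the same number can denote different regimes across values of K (the crash state is Regime 2 at K=4, Regime 3 at K=5, and Regime 4 at K=6).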
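
A small helper can make regime labels easier to compare across runs by renumbering clusters in a fixed order. The sketch below is hypothetical: `describe_regimes` is not part of the original analysis, the demo data is synthetic, and ordering regimes from calmest to most volatile is just one reasonable convention.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def describe_regimes(features: pd.DataFrame, k: int, seed: int = 0) -> pd.DataFrame:
    """Cluster on volatility/momentum and return per-regime means, ordered by volatility."""
    cols = ["volatility", "momentum"]
    scaled = StandardScaler().fit_transform(features[cols])
    raw_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
    means = features[cols].groupby(raw_labels).mean()
    # K-Means numbers clusters arbitrarily, so renumber from calmest to most volatile.
    order = means["volatility"].sort_values().index
    remap = {old: new for new, old in enumerate(order)}
    regime = pd.Series(raw_labels, index=features.index).map(remap)
    # Report means in original (unscaled) units, plus the number of days per regime.
    summary = features[cols].groupby(regime).mean()
    summary["days"] = regime.value_counts().sort_index()
    summary.index.name = "regime"
    return summary


# Synthetic demo data (hypothetical), three well-separated groups.
rng = np.random.default_rng(1)
demo = pd.DataFrame(
    np.vstack([
        rng.normal([0.20, 0.10], 0.03, size=(300, 2)),   # calm
        rng.normal([0.40, 0.60], 0.05, size=(200, 2)),   # uptrend
        rng.normal([0.90, -1.50], 0.08, size=(50, 2)),   # crash-like
    ]),
    columns=["volatility", "momentum"],
)
summary = describe_regimes(demo, k=3)
print(summary)
```

With this convention, Regime 0 is always the calmest state and the highest-numbered regime is always the most volatile, regardless of K or the random seed.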

**Potential Next Steps:**
* Experiment with different window sizes and K values.
* Incorporate additional features (e.g., RSI, MACD, VIX correlation).
* Use different clustering or anomaly detection algorithms.
* Correlate identified regimes and anomalies with specific macroeconomic events or company news releases.
* Explore using these identified regimes/anomalies as features in predictive trading models (with caution).
