# Week 8: Deep Learning for Sequences - RNN/LSTM in Finance

---

Welcome to Week 8! This week we dive into the exciting world of **Deep Learning for Sequential Data**.

**What you'll learn:**
- **Recurrent Neural Networks (RNN)**: Understanding networks with memory
- **Long Short-Term Memory (LSTM)**: Advanced RNNs that solve long-term dependency problems
- **Sequence Modeling**: Predicting future values from historical patterns
- **Performance Evaluation**: Measuring and comparing model improvements

**Why this matters in Finance:**
- Financial data is inherently sequential (time series)
- Prices, volumes, and returns have temporal dependencies
- Most traditional ML models treat observations as independent and ignore their order unless lag features are added by hand
- LSTMs can capture complex patterns across different time horizons

**By the end of this notebook, you'll be able to:**

✅ Understand how RNNs maintain memory of past information  
✅ Build and train an LSTM model for price prediction  
✅ Compare LSTM performance against baseline models  
✅ Interpret and visualize LSTM predictions  
✅ Apply sequence modeling to real DeFi/crypto data  

---

### 🛠️ Setup: Importing Required Libraries

Before we begin, we need to import our tools:

- **NumPy & Pandas**: Data manipulation and numerical operations
- **Matplotlib & Seaborn**: Visualization
- **TensorFlow/Keras**: Deep learning framework for building neural networks
- **Scikit-learn**: Preprocessing and evaluation metrics

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

# Deep Learning libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, SimpleRNN
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Set visualization style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Confirm successful import
print("=" * 70)
print("✅ ALL LIBRARIES IMPORTED SUCCESSFULLY!")
print("=" * 70)
print("\nLibraries loaded:")
print("  • NumPy version:", np.__version__)
print("  • Pandas version:", pd.__version__)
print("  • TensorFlow version:", tf.__version__)
print("  • Keras: Ready for Deep Learning")
print("  • GPU Available:", "Yes" if len(tf.config.list_physical_devices('GPU')) > 0 else "No (CPU only)")
print("\n" + "=" * 70)

---

# Part 1: Understanding Sequential Data and the Need for RNNs

## 🎯 What Makes Financial Data Sequential?

### The Problem with Traditional Neural Networks

Imagine you're trying to predict tomorrow's Bitcoin price. Traditional neural networks treat each data point **independently**:

```
Traditional Neural Network:
Price_t-3 → NN → Prediction_t
Price_t-2 → NN → Prediction_t
Price_t-1 → NN → Prediction_t
```

**What's wrong with this?**
- Each prediction ignores the **order** of data
- No memory of what happened before
- Can't capture trends, momentum, or patterns across time

### The Sequential Nature of Finance

Financial data has **temporal dependencies**:
1. **Short-term momentum**: Returns can be positively autocorrelated, so yesterday's rise sometimes carries into today
2. **Trend patterns**: Prices can move in extended trends (bull/bear markets)
3. **Volatility clustering**: High-volatility days tend to cluster together
4. **Seasonal patterns**: Weekly/monthly regularities
5. **Mean reversion**: Some series (e.g. spreads, volatility) tend to drift back toward a long-run level
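
A quick way to check for the first kind of dependency is the lag-1 autocorrelation of returns. The sketch below is illustrative only: `lag_autocorr` is a hypothetical helper, and the "momentum" series is simulated with an assumed coefficient of 0.3 rather than taken from market data.

```python
import numpy as np

def lag_autocorr(x, lag=1):
    """Sample autocorrelation of a 1D series at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)

# Simulated returns with built-in momentum: r_t = 0.3 * r_{t-1} + noise
noise = rng.normal(0.0, 0.02, size=5000)
returns = np.zeros_like(noise)
for t in range(1, len(noise)):
    returns[t] = 0.3 * returns[t - 1] + noise[t]

print(f"lag-1 autocorrelation (momentum series): {lag_autocorr(returns):.3f}")
print(f"lag-1 autocorrelation (pure noise):      {lag_autocorr(noise):.3f}")
```

The momentum series should show a clearly positive value, while pure noise should sit near zero. Applying the same helper to squared returns is one rough check for volatility clustering.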

**We need a model that "remembers" the sequence!**

---

## 🧠 Enter Recurrent Neural Networks (RNNs)

### The Core Idea: Neural Networks with Memory

**RNNs process sequential data one step at a time, maintaining a "hidden state" (memory):**

```
RNN with Memory:
Price_t-3 → [RNN] → hidden_state_1
Price_t-2 → [RNN + hidden_state_1] → hidden_state_2
Price_t-1 → [RNN + hidden_state_2] → hidden_state_3
            [hidden_state_3] → Prediction_t
```

### Key Concepts:

1. **Hidden State (h_t)**: The "memory" that gets passed from one time step to the next
   - Contains information about all previous time steps
   - Updated at each new observation

2. **Recurrent Connection**: The hidden state feeds back in as an input at the next time step
   - Same weights applied at each time step
   - Learns patterns that work across different time positions

3. **Unrolling Through Time**: Conceptually, an RNN for T time steps is like a very deep network with T layers

---

## 📐 RNN Mathematics (Simplified)

At each time step t, an RNN does:

**Step 1:** Combine current input (x_t) with previous hidden state (h_t-1)
```
h_t = tanh(W_hh * h_t-1 + W_xh * x_t + b_h)
```

**Step 2:** Generate output from hidden state
```
y_t = W_hy * h_t + b_y
```

Where:
- **x_t**: Current input (e.g., today's price features)
- **h_t**: Hidden state (memory) at time t
- **y_t**: Output/prediction at time t
- **W**: Weight matrices (learned during training)
- **tanh**: Activation function (keeps values between -1 and 1)

### Intuition:
Think of the hidden state as a **summary of everything seen so far**, constantly updated with new information.
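
The two steps can be sketched in plain NumPy. This is a minimal illustration, not how Keras implements `SimpleRNN`: the layer sizes, the random weights, and the helper name `rnn_forward` are all assumptions, and the output equation is applied only to the last hidden state.

```python
import numpy as np

rng = np.random.default_rng(42)

input_size, hidden_size = 1, 4   # illustrative sizes, not taken from the notebook
W_xh = rng.normal(0, 0.5, (hidden_size, input_size))
W_hh = rng.normal(0, 0.5, (hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
W_hy = rng.normal(0, 0.5, (1, hidden_size))
b_y = np.zeros(1)

def rnn_forward(xs):
    """Apply Step 1 at every time step, then Step 2 to the final hidden state."""
    h = np.zeros(hidden_size)                        # initial memory is empty
    states = []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)     # Step 1: update hidden state
        states.append(h)
    y = W_hy @ h + b_y                               # Step 2: output from last state
    return y, np.array(states)

xs = np.array([[0.10], [0.12], [0.11]])   # three scaled "prices": (timesteps, features)
y, states = rnn_forward(xs)
print("hidden states shape:", states.shape)   # (3, 4)
print("prediction shape:", y.shape)           # (1,)
```

Note that the same `W_hh` and `W_xh` are reused at every step; only `h` changes as the sequence is consumed.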

---

## ⚠️ The Vanishing Gradient Problem

### Why Basic RNNs Struggle

**The Problem:**
When training RNNs through backpropagation, gradients need to flow backward through many time steps. With standard RNNs:

- Gradients get multiplied repeatedly
- They either **vanish** (→ 0) or **explode** (→ ∞)
- Network can't learn long-term dependencies

**Example in Finance:**
```
Imagine trying to predict Bitcoin price in 2024 based on:
- Yesterday's price (easy for RNN)
- Last week's trend (harder for RNN)
- The halving event 6 months ago (very hard for a basic RNN!)
```

Basic RNNs **forget long-term information** due to vanishing gradients.
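
To get a feel for the numbers, here is a toy calculation. It assumes the gradient is scaled by one fixed factor per time step, which is a simplification: in a real RNN the factor depends on the weights and activations and changes at every step. `gradient_scale` is a hypothetical helper introduced for illustration.

```python
def gradient_scale(factor, steps):
    """Size of a unit gradient after being multiplied by `factor` once per time step."""
    return factor ** steps

# 1 step = yesterday, 7 steps = last week, 180 steps = roughly six months of daily data
for steps in (1, 7, 180):
    shrinking = gradient_scale(0.9, steps)   # per-step factor below 1: gradient vanishes
    growing = gradient_scale(1.1, steps)     # per-step factor above 1: gradient explodes
    print(f"{steps:>3} steps back | factor 0.9 -> {shrinking:.2e} | factor 1.1 -> {growing:.2e}")
```

Even a factor only slightly below 1 leaves almost no learning signal after 180 steps, which is why the six-month-old event in the example is so hard for a basic RNN to use.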

---

## 🚀 Long Short-Term Memory (LSTM): The Solution

### What Makes LSTM Special?

**LSTMs are specifically designed to remember information for long periods!**

They largely mitigate the vanishing gradient problem through a clever architecture with **gates** that control information flow.

### The LSTM Cell Architecture

An LSTM cell has two states:
1. **Cell State (C_t)**: Long-term memory highway
2. **Hidden State (h_t)**: Short-term working memory

And three gates:
1. **Forget Gate (f_t)**: What to forget from long-term memory
2. **Input Gate (i_t)**: What new information to store
3. **Output Gate (o_t)**: What to output/remember short-term

### LSTM Intuition with a Trading Example

**Imagine you're a trader analyzing Bitcoin:**

1. **Cell State (Long-term memory):**
   - "Bitcoin tends to pump after halvings" (stored long-term)
   - "Institutional adoption is increasing" (stored long-term)
   - "Regulatory news impacts price" (stored long-term)

2. **Forget Gate:**
   - "That FUD tweet from 3 months ago? Not relevant anymore" → FORGET
   - "The bull market pattern? Still relevant" → KEEP

3. **Input Gate:**
   - "New: Major exchange got hacked" → IMPORTANT, STORE THIS
   - "Minor: Some random price fluctuation" → IGNORE

4. **Output Gate:**
   - "Based on long-term bull trend + recent hack news → Predict temporary dip"

---

## 📊 LSTM Mathematics (Step-by-Step)

At each time step t, LSTM performs these operations:

**Step 1: Forget Gate** (What to forget from cell state)
```
f_t = σ(W_f · [h_t-1, x_t] + b_f)
```
- Output: Values between 0 (completely forget) and 1 (completely remember)

**Step 2: Input Gate** (What new information to store)
```
i_t = σ(W_i · [h_t-1, x_t] + b_i)
C̃_t = tanh(W_C · [h_t-1, x_t] + b_C)
```
- i_t: How much to update
- C̃_t: Candidate values to add

**Step 3: Update Cell State** (Update long-term memory)
```
C_t = f_t ∗ C_t-1 + i_t ∗ C̃_t
```
- Forget old info (f_t ∗ C_t-1)
- Add new info (i_t ∗ C̃_t)

**Step 4: Output Gate** (What to output)
```
o_t = σ(W_o · [h_t-1, x_t] + b_o)
h_t = o_t ∗ tanh(C_t)
```
- Decides what parts of cell state to reveal

Where:
- **σ**: Sigmoid function (outputs 0 to 1)
- **tanh**: Hyperbolic tangent (outputs -1 to 1)
- **·**: Matrix multiplication
- **∗**: Element-wise multiplication

### The Key Insight:
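The cell state (C_t) is updated additively, so when the forget gate stays close to 1 it flows through time with only minor modifications and gradients can flow backward much more easily. This largely mitigates the vanishing gradient problem, although exploding gradients can still occur.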
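
Steps 1 to 4 translate almost line by line into NumPy. The sketch below is for intuition only and is not the Keras implementation: the sizes, the random weights, and the names `lstm_step` and `params` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following Steps 1-4."""
    concat = np.concatenate([h_prev, x_t])                      # [h_t-1, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # Step 1: forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # Step 2: input gate
    C_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # Step 2: candidate values
    C_t = f_t * C_prev + i_t * C_tilde                          # Step 3: update cell state
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # Step 4: output gate
    h_t = o_t * np.tanh(C_t)                                    # Step 4: new hidden state
    return h_t, C_t

rng = np.random.default_rng(0)
input_size, hidden_size = 1, 3   # illustrative sizes
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(0, 0.5, (hidden_size, hidden_size + input_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

# Feed three scaled "prices" through the cell, carrying both states forward
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x in ([0.10], [0.12], [0.11]):
    h, C = lstm_step(np.array(x), h, C, params)

print("h_t:", h.round(4))
print("C_t:", C.round(4))
```

In NumPy, `@` is the matrix product (·) and `*` is the element-wise product (∗), matching the notation in the legend.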

---

## 💡 When to Use LSTMs in Finance?

✅ **USE LSTMs when you have:**
- Time series data with temporal dependencies
- Need to capture long-term patterns (weeks/months)
- Multiple time-varying features
- Non-linear relationships in sequential data
- Sufficient data (typically 1000+ observations)

❌ **BE CAREFUL when:**
- You have very little data (< 500 observations)
- Relationships are clearly linear (simpler models may work)
- You need ultra-low-latency, real-time predictions
- Interpretability is critical (LSTMs are hard to interpret)
- The data is strongly non-stationary (for example, raw price levels with a drifting mean)

---

## 🌟 Real-World Finance Applications of LSTM

1. **Price Prediction**: Forecast stock/crypto prices using historical patterns
2. **Volatility Forecasting**: Predict future volatility for option pricing
3. **Trading Signal Generation**: Generate buy/sell signals from patterns
4. **Risk Management**: Predict Value-at-Risk (VaR) dynamically
5. **Market Regime Detection**: Identify shifts between bull/bear markets
6. **Sentiment Analysis**: Process sequential text data (news, tweets)
7. **Portfolio Optimization**: Dynamic rebalancing based on predicted returns
8. **Fraud Detection**: Identify unusual transaction patterns

---

## 📚 LSTM vs Traditional Methods

| Aspect | Traditional ML (RF, XGBoost) | LSTM |
|--------|------------------------------|------|
| **Sequential Memory** | ❌ No (treats rows independently) | ✅ Yes (maintains hidden state) |
| **Long-term Dependencies** | ❌ Hard to capture | ✅ Designed for this |
| **Feature Engineering** | Manual (lag features, rolling stats) | Largely automatic (learns patterns from raw sequences) |
| **Training Time** | Usually fast (seconds to minutes) | Slower (minutes to hours, depending on data size and hardware) |
| **Data Requirements** | Moderate (hundreds of rows) | High (thousands of rows or more) |
| **Interpretability** | High (feature importance) | Low (black box) |
| **Overfitting Risk** | Moderate | High (needs regularization) |

**Bottom Line:** LSTMs excel when you have enough data and complex temporal patterns. For simpler problems, traditional ML may be better!
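
To make the "Feature Engineering" row concrete, this is roughly what manual lag and rolling features look like for a tree-based model. The column names and the tiny price series are made up for illustration; every feature is shifted so it uses only values available before the target day.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 103.0, 105.0, 104.0, 106.0], name="Price")

features = pd.DataFrame({
    "lag_1": prices.shift(1),                           # yesterday's price
    "lag_2": prices.shift(2),                           # price two days ago
    "roll_mean_3": prices.shift(1).rolling(3).mean(),   # mean of the 3 days before today
    "target": prices,                                   # today's price (what we predict)
}).dropna()

print(features)
```

An LSTM receives the raw window instead and has to learn equivalent summaries on its own, which is one reason it tends to need more data.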

---

Now let's prepare the data for our first LSTM model! 🚀

---

# Part 2: Data Preparation for Sequence Modeling

## 🎯 Creating Realistic Financial Time Series Data

Before we can train an LSTM, we need to understand how to prepare sequential data. We'll:

1. Generate synthetic crypto price data with realistic properties
2. Create sequences (look-back windows) for training
3. Split data properly for time series (no random shuffling!)
4. Scale features appropriately

### Key Concept: Look-back Windows

LSTMs need **sequences** of data as input:

```
If look_back = 10 days:
Input:  [Price_t-10, Price_t-9, ..., Price_t-1]  (10 previous days)
Output: Price_t                                   (tomorrow's price)
```

We'll generate multiple such sequences from our time series data.

In [None]:
class FinancialDataGenerator:
    """
    Generate synthetic financial time series data with realistic properties
    for LSTM training and evaluation.
    """
    
    def __init__(self, n_samples=2000, start_price=100, seed=42):
        """
        Initialize the data generator.
        
        Parameters:
        -----------
        n_samples : int
            Number of time steps to generate
        start_price : float
            Initial price value
        seed : int
            Random seed for reproducibility
        """
        self.n_samples = n_samples
        self.start_price = start_price
        self.seed = seed
        np.random.seed(seed)
    
    def generate_crypto_prices(self):
        """
        Generate synthetic crypto prices with trend, momentum, and noise.
        
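        Each daily return combines:
        - A random shock (Gaussian white noise with constant volatility)
        - A trend component (constant upward drift)
        - A momentum component (short-term persistence from the previous return)
        - Note: volatility is constant, so volatility clustering is NOT simulated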
        """
        print("🔧 Generating synthetic crypto price data...")
        print("="*70)
        
        # Initialize
        prices = np.zeros(self.n_samples)
        prices[0] = self.start_price
        
        # Parameters
        base_volatility = 0.02  # 2% daily volatility
        trend_strength = 0.0003  # Slight upward bias
        momentum_strength = 0.3  # How much yesterday influences today
        
        print(f"📊 Parameters:")
        print(f"   • Samples: {self.n_samples}")
        print(f"   • Starting price: ${self.start_price}")
        print(f"   • Base volatility: {base_volatility*100:.1f}%")
        print(f"   • Trend strength: {trend_strength*100:.2f}%")
        print(f"   • Momentum: {momentum_strength*100:.0f}%")
        
        # Generate price series
        for t in range(1, self.n_samples):
            # Random shock (white noise)
            shock = np.random.randn() * base_volatility
            
            # Trend component (long-term drift)
            trend = trend_strength
            
            # Momentum component (yesterday's return influences today)
            if t > 1:
                prev_return = (prices[t-1] - prices[t-2]) / prices[t-2]
                momentum = momentum_strength * prev_return
            else:
                momentum = 0
            
            # Combine components
            total_return = trend + momentum + shock
            
            # Update price
            prices[t] = prices[t-1] * (1 + total_return)
        
        # Create DataFrame
        dates = pd.date_range(start='2020-01-01', periods=self.n_samples, freq='D')
        df = pd.DataFrame({'Date': dates, 'Price': prices})
        df.set_index('Date', inplace=True)
        
        # Calculate additional features
        df['Returns'] = df['Price'].pct_change()
        df['Log_Returns'] = np.log(df['Price'] / df['Price'].shift(1))
        df['Volatility_5d'] = df['Returns'].rolling(window=5).std()
        df['MA_10'] = df['Price'].rolling(window=10).mean()
        df['MA_50'] = df['Price'].rolling(window=50).mean()
        
        # Drop NaN rows
        df.dropna(inplace=True)
        
        print("\n✅ Data generation complete!")
        print(f"   • Final shape: {df.shape}")
        print(f"   • Price range: ${df['Price'].min():.2f} - ${df['Price'].max():.2f}")
        print(f"   • Mean daily return: {df['Returns'].mean()*100:.3f}%")
        print(f"   • Volatility (std): {df['Returns'].std()*100:.2f}%")
        print("="*70)
        
        return df
    
    def plot_price_series(self, df, figsize=(14, 10)):
        """
        Create comprehensive visualization of the generated price series.
        """
        fig, axes = plt.subplots(3, 1, figsize=figsize)
        
        # Plot 1: Price with Moving Averages
        axes[0].plot(df.index, df['Price'], label='Price', linewidth=1.5, alpha=0.8)
        axes[0].plot(df.index, df['MA_10'], label='MA(10)', linewidth=1, alpha=0.7)
        axes[0].plot(df.index, df['MA_50'], label='MA(50)', linewidth=1, alpha=0.7)
        axes[0].set_title('Generated Crypto Price Series with Moving Averages', fontsize=14, fontweight='bold')
        axes[0].set_ylabel('Price ($)', fontsize=11)
        axes[0].legend(loc='best')
        axes[0].grid(True, alpha=0.3)
        
        # Plot 2: Daily Returns
        axes[1].plot(df.index, df['Returns']*100, color='steelblue', alpha=0.6, linewidth=0.8)
        axes[1].axhline(y=0, color='red', linestyle='--', linewidth=1, alpha=0.5)
        axes[1].set_title('Daily Returns (%)', fontsize=14, fontweight='bold')
        axes[1].set_ylabel('Return (%)', fontsize=11)
        axes[1].grid(True, alpha=0.3)
        
        # Plot 3: Rolling Volatility
        axes[2].plot(df.index, df['Volatility_5d']*100, color='coral', linewidth=1.5)
        axes[2].set_title('5-Day Rolling Volatility', fontsize=14, fontweight='bold')
        axes[2].set_ylabel('Volatility (%)', fontsize=11)
        axes[2].set_xlabel('Date', fontsize=11)
        axes[2].grid(True, alpha=0.3)
        
        plt.tight_layout()
        return fig

# Generate data
print("\n" + "="*70)
print("STEP 1: DATA GENERATION")
print("="*70 + "\n")

data_gen = FinancialDataGenerator(n_samples=2000, start_price=100, seed=42)
crypto_df = data_gen.generate_crypto_prices()

print("\n📊 First few rows of generated data:")
display(crypto_df.head(10))

print("\n📈 Statistical Summary:")
display(crypto_df.describe())

# Visualize
print("\n🎨 Creating visualizations...")
fig = data_gen.plot_price_series(crypto_df)
plt.show()

print("\n💡 Key Observations:")
print("="*70)
print("   • Price follows a random walk with drift and short-term momentum")
print("   • Moving averages smooth out short-term noise")
print("   • Returns fluctuate around a small positive mean")
print("   • Rolling volatility varies with sampling noise (true volatility is constant here)")
print("   • A simplified stand-in for real crypto prices (no volatility clustering)")
print("="*70)

---

## 🔧 Creating Sequences for LSTM Training

### Understanding the Sequence Creation Process

**The Challenge:** LSTMs need 3D input data with shape `(samples, timesteps, features)`

**Our 1D price series:**
```
[100, 102, 101, 103, 105, 104, 106, ...]
```

**Must become 3D sequences:**
```
Sequence 1: [[100, 102, 101, 103, 105]] → Target: 104
Sequence 2: [[102, 101, 103, 105, 104]] → Target: 106
Sequence 3: [[101, 103, 105, 104, 106]] → Target: ...
...
```

### Important Considerations:

1. **Look-back Period**: How many past time steps to use
   - Too short: Miss long-term patterns
   - Too long: Overfitting, computational cost
   - Common choices: 10-60 days for daily data

2. **Time Series Split**: NEVER randomly shuffle!
   - Train on past data only
   - Validate on intermediate period
   - Test on most recent data

3. **Scaling**: Critical for neural networks
   - Fit scaler on training data only
   - Transform validation and test using training scaler
   - Prevents data leakage
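
Points 2 and 3 can be illustrated with a small worked example. It applies the min-max formula `(x - min) / (max - min)` by hand in NumPy, which is the same arithmetic a `MinMaxScaler` with range (0, 1) performs; the upward-trending series and the helper name `scale` are assumptions made for this sketch.

```python
import numpy as np

prices = np.linspace(100.0, 200.0, 11)   # hypothetical trending series: 100, 110, ..., 200
train, test = prices[:8], prices[8:]     # chronological split, no shuffling

# Fit the min-max parameters on the training portion only
lo, hi = train.min(), train.max()

def scale(x):
    """Min-max scaling with parameters learned from the training data."""
    return (x - lo) / (hi - lo)

train_scaled, test_scaled = scale(train), scale(test)
print(train_scaled.min(), train_scaled.max())   # 0.0 1.0
print(test_scaled)                              # all above 1: test prices exceed the training max
```

Scaled test values outside [0, 1] are expected when prices trend; they are a sign that the scaler has not seen the future, not a bug.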

In [None]:
class SequencePreparator:
    """
    Prepare sequential data for LSTM training.
    Handles sequence creation, train/val/test split, and scaling.
    """
    
    def __init__(self, look_back=30, train_ratio=0.7, val_ratio=0.15):
        """
        Initialize the sequence preparator.
        
        Parameters:
        -----------
        look_back : int
            Number of previous time steps to use as input
        train_ratio : float
            Proportion of data for training
        val_ratio : float
            Proportion of data for validation
        """
        self.look_back = look_back
        self.train_ratio = train_ratio
        self.val_ratio = val_ratio
        self.test_ratio = 1 - train_ratio - val_ratio
        self.scaler = MinMaxScaler(feature_range=(0, 1))
    
    def create_sequences(self, data, look_back):
        """
        Create sequences from 1D time series data.
        
        Parameters:
        -----------
        data : np.array
            Column array of shape (n, 1), as produced by the scaler
        look_back : int
            Number of previous time steps
        
        Returns:
        --------
        X, y : np.arrays
            X: 3D array (samples, timesteps, features)
            y: 2D array (samples, 1) - targets
        """
        X, y = [], []
        
        for i in range(len(data) - look_back):
            # Get sequence of 'look_back' previous values
            sequence = data[i:(i + look_back)]
            # Get the next value as target
            target = data[i + look_back]
            
            X.append(sequence)
            y.append(target)
        
        X = np.array(X)
        y = np.array(y)
        
        # Reshape X to (samples, timesteps, features)
        X = X.reshape((X.shape[0], X.shape[1], 1))
        
        return X, y
    
    def prepare_data(self, df, target_column='Price'):
        """
        Complete data preparation pipeline.
        
        Returns:
        --------
        Dictionary with train, val, test splits (scaled and unscaled)
        """
        print("\n" + "="*70)
        print("STEP 2: SEQUENCE PREPARATION")
        print("="*70)
        
        # Extract target values
        data = df[target_column].values.reshape(-1, 1)
        
        print(f"\n📊 Data Shape: {data.shape}")
        print(f"   • Total samples: {len(data)}")
        print(f"   • Look-back period: {self.look_back} days")
        print(f"   • Effective sequences: {len(data) - 3 * self.look_back} (each split loses {self.look_back})")
        
        # Split indices (time series split - no shuffling!)
        total_len = len(data)
        train_end = int(total_len * self.train_ratio)
        val_end = int(total_len * (self.train_ratio + self.val_ratio))
        
        print(f"\n🔀 Time Series Split:")
        print(f"   • Train: 0 to {train_end} ({self.train_ratio*100:.0f}%)")
        print(f"   • Val:   {train_end} to {val_end} ({self.val_ratio*100:.0f}%)")
        print(f"   • Test:  {val_end} to {total_len} ({self.test_ratio*100:.0f}%)")
        
        # Split data
        train_data = data[:train_end]
        val_data = data[train_end:val_end]
        test_data = data[val_end:]
        
        # Fit scaler on training data only (IMPORTANT!)
        print("\n⚖️  Scaling data...")
        train_scaled = self.scaler.fit_transform(train_data)
        val_scaled = self.scaler.transform(val_data)
        test_scaled = self.scaler.transform(test_data)
        
        print(f"   • Scaler fitted on training data")
        print(f"   • Min: {self.scaler.data_min_[0]:.2f}, Max: {self.scaler.data_max_[0]:.2f}")
        print(f"   • Scaled range: [0, 1] on train (val/test can fall outside)")
        
        # Create sequences
        print("\n🔄 Creating sequences...")
        X_train, y_train = self.create_sequences(train_scaled, self.look_back)
        X_val, y_val = self.create_sequences(val_scaled, self.look_back)
        X_test, y_test = self.create_sequences(test_scaled, self.look_back)
        
        print(f"\n✅ Sequences Created:")
        print(f"   • X_train: {X_train.shape} | y_train: {y_train.shape}")
        print(f"   • X_val:   {X_val.shape} | y_val: {y_val.shape}")
        print(f"   • X_test:  {X_test.shape} | y_test: {y_test.shape}")
        print("="*70)
        
        # Return everything
        return {
            'X_train': X_train, 'y_train': y_train,
            'X_val': X_val, 'y_val': y_val,
            'X_test': X_test, 'y_test': y_test,
            'train_data': train_data,
            'val_data': val_data,
            'test_data': test_data,
            'scaler': self.scaler
        }
    
    def visualize_splits(self, df, train_end_idx, val_end_idx, figsize=(14, 5)):
        """
        Visualize the train/val/test split.
        """
        fig, ax = plt.subplots(figsize=figsize)
        
        # Plot full data
        ax.plot(df.index, df['Price'], color='gray', alpha=0.3, label='Full Data')
        
        # Highlight splits
        train_data = df.iloc[:train_end_idx]
        val_data = df.iloc[train_end_idx:val_end_idx]
        test_data = df.iloc[val_end_idx:]
        
        ax.plot(train_data.index, train_data['Price'], color='blue', linewidth=2, label='Train')
        ax.plot(val_data.index, val_data['Price'], color='orange', linewidth=2, label='Validation')
        ax.plot(test_data.index, test_data['Price'], color='green', linewidth=2, label='Test')
        
        ax.set_title('Train / Validation / Test Split (Time Series)', fontsize=14, fontweight='bold')
        ax.set_xlabel('Date', fontsize=11)
        ax.set_ylabel('Price ($)', fontsize=11)
        ax.legend(loc='best')
        ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        return fig

# Prepare sequences
seq_prep = SequencePreparator(look_back=30, train_ratio=0.7, val_ratio=0.15)
data_dict = seq_prep.prepare_data(crypto_df, target_column='Price')

# Visualize splits
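print("\n🎨 Visualizing data splits...")
# Reuse the preparator's ratios so the plot matches the actual split
train_end = int(len(crypto_df) * seq_prep.train_ratio)
val_end = int(len(crypto_df) * (seq_prep.train_ratio + seq_prep.val_ratio))
fig = seq_prep.visualize_splits(crypto_df, train_end, val_end)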
plt.show()

print("\n💡 Key Points About Sequence Preparation:")
print("="*70)
print("   1. ⏰ Temporal Order Preserved:")
print("      • Train on past, validate on intermediate, test on future")
print("      • NO random shuffling (unlike regular ML!)")
print()
print("   2. 🔢 Sequence Structure:")
print("      • Each sample is a window of 30 consecutive days")
print("      • Target is the price on day 31")
print("      • Sliding window creates many overlapping sequences")
print()
print("   3. ⚖️  Scaling:")
print("      • Scaler fitted ONLY on training data")
print("      • Same scaler applied to val and test")
print("      • Prevents data leakage from future")
print()
print("   4. 📊 Shape Transformation:")
print(f"      • Original: (samples, 1)")
print(f"      • After sequences: (samples, timesteps, features)")
print(f"      • Ready for LSTM input!")
print("="*70)

---

# Part 3: Baseline Models - Know Your Competition!

## 🎯 Why Build Baseline Models?

**Before celebrating LSTM performance, we MUST establish baselines:**

### The Principle:
> "A complex model is only valuable if it beats simple alternatives!"

### Common Baselines in Finance:

1. **Naive Forecast**: "Tomorrow = Today"
   - Surprisingly hard to beat in random walks!
   - Also called "persistence model"

2. **Simple Moving Average**: Average of last N days
   - Smooths noise
   - Widely used in practice

3. **Linear Regression**: Fit a trend line
   - Assumes linear relationships
   - Fast and interpretable

### Why This Matters:
- If LSTM barely beats naive forecast → Not worth the complexity
- If LSTM beats all baselines → Strong evidence of value
- Helps quantify the "performance lift" from deep learning
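
One simple way to put a number on the "performance lift" is the percentage reduction in RMSE relative to a baseline. The sketch below is one possible definition, not a standard the notebook relies on: `lift_over_baseline` is a hypothetical helper and all prices are made up.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def lift_over_baseline(rmse_model, rmse_baseline):
    """Percentage RMSE reduction relative to a baseline (positive = model is better)."""
    return 100.0 * (1.0 - rmse_model / rmse_baseline)

actual     = np.array([101.0, 103.0, 102.0, 105.0])
naive_pred = np.array([100.0, 101.0, 103.0, 102.0])   # previous day's price
model_pred = np.array([100.5, 102.0, 102.5, 104.0])   # hypothetical model output

lift = lift_over_baseline(rmse(actual, model_pred), rmse(actual, naive_pred))
print(f"RMSE lift over naive: {lift:.1f}%")   # 59.2%
```

A lift close to zero, or negative, means the added complexity is not paying for itself on that test set.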

---

## 📊 Evaluation Metrics

We'll use multiple metrics because each captures different aspects:

1. **MAE (Mean Absolute Error)**: Average prediction error in dollars
   - Easy to interpret: "Off by $X on average"
   - Less sensitive to outliers than RMSE

2. **RMSE (Root Mean Squared Error)**: Penalizes large errors more
   - Standard metric in ML
   - Sensitive to outliers

3. **MAPE (Mean Absolute Percentage Error)**: Error as percentage
   - Scale-independent
   - Easy to understand: "Off by X%"

4. **R² Score**: Proportion of variance explained
   - 1.0 = perfect predictions
   - 0.0 = no better than mean
   - < 0.0 = worse than mean!
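
As a worked example, the four metrics can be computed by hand on a tiny set of made-up prices and predictions. The formulas are the standard definitions; the numbers are chosen only so the arithmetic is easy to follow.

```python
import numpy as np

y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 101.0, 103.0, 104.0])   # hypothetical predictions

errors = y_true - y_pred                          # [-1, 1, -2, 1]
mae = np.mean(np.abs(errors))                     # (1 + 1 + 2 + 1) / 4
rmse = np.sqrt(np.mean(errors ** 2))              # sqrt((1 + 1 + 4 + 1) / 4)
mape = np.mean(np.abs(errors / y_true)) * 100     # average error as % of actual price
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)   # 1 - 7/14

print(f"MAE:  {mae:.2f}")     # 1.25
print(f"RMSE: {rmse:.2f}")    # 1.32
print(f"MAPE: {mape:.2f}%")
print(f"R2:   {r2:.2f}")      # 0.50
```

RMSE comes out above MAE because the single error of 2 is squared before averaging. MAPE divides by the actual value, so it becomes unstable when actual values are close to zero.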

In [None]:
class BaselineModels:
    """
    Baseline models for time series forecasting.
    Essential for evaluating if LSTM provides real value.
    """
    
    def __init__(self):
        self.models = {}
        self.predictions = {}
        self.metrics = {}
    
    def calculate_metrics(self, y_true, y_pred, model_name):
        """
        Calculate comprehensive evaluation metrics.
        """
        # Ensure correct shapes
        y_true = y_true.flatten()
        y_pred = y_pred.flatten()
        
        mae = mean_absolute_error(y_true, y_pred)
        rmse = np.sqrt(mean_squared_error(y_true, y_pred))
        mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
        r2 = r2_score(y_true, y_pred)
        
        self.metrics[model_name] = {
            'MAE': mae,
            'RMSE': rmse,
            'MAPE': mape,
            'R2': r2
        }
        
        return mae, rmse, mape, r2
    
    def naive_forecast(self, X, y):
        """
        Naive forecast: Tomorrow's price = Today's price
        (Use the last value from each sequence)
        """
        # Last value in each sequence is our prediction
        predictions = X[:, -1, 0]  # Last timestep of each sequence
        self.predictions['Naive'] = predictions
        return predictions
    
    def moving_average_forecast(self, X, window=5):
        """
        Moving average of last 'window' days as prediction.
        """
        predictions = np.mean(X[:, -window:, 0], axis=1)
        self.predictions['MovingAverage'] = predictions
        return predictions
    
    def linear_regression_forecast(self, X_train, y_train, X_test):
        """
        Linear regression using the sequence values as features.
        """
        # Flatten sequences for linear regression
        X_train_flat = X_train.reshape(X_train.shape[0], -1)
        X_test_flat = X_test.reshape(X_test.shape[0], -1)
        
        # Train
        lr = LinearRegression()
        lr.fit(X_train_flat, y_train)
        self.models['LinearRegression'] = lr
        
        # Predict
        predictions = lr.predict(X_test_flat)
        self.predictions['LinearRegression'] = predictions
        return predictions
    
    def evaluate_all_baselines(self, X_train, y_train, X_test, y_test, scaler):
        """
        Train and evaluate all baseline models.
        """
        print("\n" + "="*70)
        print("STEP 3: BASELINE MODELS EVALUATION")
        print("="*70)
        
        results = {}
        
        # 1. Naive Forecast
        print("\n1️⃣ Naive Forecast (Persistence Model)")
        print("-" * 70)
        pred_naive = self.naive_forecast(X_test, y_test)
        
        # Inverse transform to original scale
        pred_naive_original = scaler.inverse_transform(pred_naive.reshape(-1, 1))
        y_test_original = scaler.inverse_transform(y_test.reshape(-1, 1))
        
        mae, rmse, mape, r2 = self.calculate_metrics(
            y_test_original, pred_naive_original, 'Naive'
        )
        
        print(f"   • MAE:  ${mae:.2f}")
        print(f"   • RMSE: ${rmse:.2f}")
        print(f"   • MAPE: {mape:.2f}%")
        print(f"   • R²:   {r2:.4f}")
        results['Naive'] = pred_naive_original
        
        # 2. Moving Average
        print("\n2️⃣ Moving Average (5-day)")
        print("-" * 70)
        pred_ma = self.moving_average_forecast(X_test, window=5)
        pred_ma_original = scaler.inverse_transform(pred_ma.reshape(-1, 1))
        
        mae, rmse, mape, r2 = self.calculate_metrics(
            y_test_original, pred_ma_original, 'MovingAverage'
        )
        
        print(f"   • MAE:  ${mae:.2f}")
        print(f"   • RMSE: ${rmse:.2f}")
        print(f"   • MAPE: {mape:.2f}%")
        print(f"   • R²:   {r2:.4f}")
        results['MovingAverage'] = pred_ma_original
        
        # 3. Linear Regression
        print("\n3️⃣ Linear Regression")
        print("-" * 70)
        pred_lr = self.linear_regression_forecast(X_train, y_train, X_test)
        pred_lr_original = scaler.inverse_transform(pred_lr.reshape(-1, 1))
        
        mae, rmse, mape, r2 = self.calculate_metrics(
            y_test_original, pred_lr_original, 'LinearRegression'
        )
        
        print(f"   • MAE:  ${mae:.2f}")
        print(f"   • RMSE: ${rmse:.2f}")
        print(f"   • MAPE: {mape:.2f}%")
        print(f"   • R²:   {r2:.4f}")
        results['LinearRegression'] = pred_lr_original
        
        print("\n" + "="*70)
        print("✅ BASELINE EVALUATION COMPLETE")
        print("="*70)
        
        return results, y_test_original
    
    def plot_baseline_comparison(self, results, y_true, figsize=(14, 6)):
        """
        Visualize baseline predictions vs actual.
        """
        fig, axes = plt.subplots(1, 2, figsize=figsize)
        
        # Plot 1: Predictions vs Actual
        plot_range = slice(0, 100)  # First 100 points for clarity
        axes[0].plot(y_true[plot_range], label='Actual', linewidth=2, alpha=0.8, color='black')
        
        colors = ['blue', 'orange', 'green']
        for i, (name, pred) in enumerate(results.items()):
            axes[0].plot(pred[plot_range], label=name, linewidth=1.5, alpha=0.7, color=colors[i])
        
        axes[0].set_title('Baseline Predictions vs Actual (First 100 points)', 
                         fontsize=12, fontweight='bold')
        axes[0].set_xlabel('Time Step')
        axes[0].set_ylabel('Price ($)')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Plot 2: Metrics Comparison
        metrics_df = pd.DataFrame(self.metrics).T
        metrics_df[['MAE', 'RMSE']].plot(kind='bar', ax=axes[1], color=['steelblue', 'coral'])
        axes[1].set_title('Baseline Model Performance (MAE & RMSE)', 
                         fontsize=12, fontweight='bold')
        axes[1].set_xlabel('Model')
        axes[1].set_ylabel('Error ($)')
        axes[1].legend(['MAE', 'RMSE'])
        axes[1].grid(True, alpha=0.3, axis='y')
        plt.xticks(rotation=45)
        
        plt.tight_layout()
        return fig

# Evaluate baseline models
baseline = BaselineModels()
baseline_results, y_test_original = baseline.evaluate_all_baselines(
    data_dict['X_train'], 
    data_dict['y_train'],
    data_dict['X_test'], 
    data_dict['y_test'],
    data_dict['scaler']
)

# Create summary table
print("\n📊 Baseline Models Summary:")
metrics_df = pd.DataFrame(baseline.metrics).T
metrics_df = metrics_df.round(4)
display(metrics_df)

# Visualize
print("\n🎨 Creating comparison visualizations...")
fig = baseline.plot_baseline_comparison(baseline_results, y_test_original)
plt.show()

print("\n💡 Baseline Insights:")
print("="*70)
print("   • Naive forecast is often surprisingly good for financial data")
print("   • Moving average smooths out noise but lags behind trends")
print("   • Linear regression captures overall trend but misses non-linearities")
print("   • These are the benchmarks our LSTM must beat!")
print("="*70)

# Find best baseline
best_baseline = min(baseline.metrics.items(), key=lambda x: x[1]['RMSE'])
print(f"\n🏆 Best Baseline: {best_baseline[0]}")
print(f"   • RMSE: ${best_baseline[1]['RMSE']:.2f}")
print(f"\n👉 Our LSTM needs to beat this to be worthwhile!")

---

# Part 4: Building and Training LSTM Models

## 🏗️ LSTM Architecture Design

### Key Decisions When Building LSTMs:

1. **Number of LSTM Layers**: 
   - Single layer: Simple patterns
   - Multiple layers: Complex hierarchical patterns
   - More layers ≠ always better (overfitting risk)

2. **Number of Units (Neurons)**:
   - Too few: Underfitting (can't capture patterns)
   - Too many: Overfitting (memorizes training data)
   - Common choices: 50-200 for financial data

3. **Dropout Regularization**:
   - Randomly "drops" neurons during training
   - Prevents overfitting
   - Common values: 0.2-0.3 (20-30%)

4. **Return Sequences**:
   - True: Output at each timestep (for stacked LSTMs)
   - False: Output only at end (for prediction)
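
To make the `return_sequences` choice concrete, here is a minimal NumPy sketch of the output shapes involved. It uses a simplified tanh recurrent layer rather than a real LSTM, and `toy_rnn` is a hypothetical helper written only for this illustration. The batch size of 8 is arbitrary; the 30 timesteps and 50 units mirror the models built below.

```python
import numpy as np

def toy_rnn(x, units, return_sequences=False, seed=0):
    """Simplified tanh recurrent layer (NOT an LSTM), used only to show output shapes."""
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, units))   # input weights
    w_rec = rng.normal(scale=0.1, size=(units, units))     # recurrent weights
    h = np.zeros((batch, units))                           # initial hidden state
    outputs = []
    for t in range(timesteps):
        # The hidden state is updated from the current input and the previous state
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # one output per timestep
    return h                              # only the final timestep

x = np.random.default_rng(1).normal(size=(8, 30, 1))  # (batch, timesteps, features)
seq_out = toy_rnn(x, units=50, return_sequences=True)
last_out = toy_rnn(x, units=50, return_sequences=False)

print(seq_out.shape)   # (8, 30, 50): a 3D sequence a second recurrent layer can consume
print(last_out.shape)  # (8, 50): a 2D summary a Dense layer can consume
```

The last timestep of the full sequence equals the single output, which is why only the final recurrent layer in a stack leaves `return_sequences` at `False`.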

---

## 🎨 Our LSTM Architecture

We'll build two models for comparison:

### Model 1: Simple LSTM
```
Input (30 timesteps, 1 feature)
    ↓
LSTM Layer (50 units)
    ↓
Dropout (20%)
    ↓
Dense Layer (1 unit)
    ↓
Output (price prediction)
```

### Model 2: Deep LSTM (Stacked)
```
Input (30 timesteps, 1 feature)
    ↓
LSTM Layer 1 (50 units, return_sequences=True)
    ↓
Dropout (20%)
    ↓
LSTM Layer 2 (50 units)
    ↓
Dropout (20%)
    ↓
Dense Layer (1 unit)
    ↓
Output (price prediction)
```
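
As a rough sanity check on how much larger the stacked model is, the trainable parameter counts for both architectures can be worked out by hand. The sketch below assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and a bias) and Dense layers with a bias; `lstm_param_count` and `dense_param_count` are hypothetical helper names. Dropout layers add no parameters.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters in one standard LSTM layer:
    4 gates x (input weights + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(units, input_dim):
    """Trainable parameters in a Dense layer with bias."""
    return units * input_dim + units

# Model 1: LSTM(50) on 1 feature, then Dense(1)
simple_total = lstm_param_count(50, 1) + dense_param_count(1, 50)

# Model 2: LSTM(50) on 1 feature, LSTM(50) on the 50 outputs of layer 1, then Dense(1)
deep_total = (
    lstm_param_count(50, 1)
    + lstm_param_count(50, 50)
    + dense_param_count(1, 50)
)

print(simple_total)  # 10451
print(deep_total)    # 30651
```

With default layer settings these totals should agree with what `model.summary()` reports, so the deep model has roughly three times as many parameters to fit on the same data.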

---

## 🎓 Training Configuration

### Loss Function: Mean Squared Error (MSE)
- Standard for regression problems
- Penalizes large errors

### Optimizer: Adam
- Adaptive learning rate
- Works well for LSTMs
- Learning rate: 0.001 (default)

### Callbacks:
1. **Early Stopping**: Stop if validation loss stops improving
   - Patience: Wait 15 epochs before stopping
   - Prevents overfitting

2. **Learning Rate Reduction**: Reduce LR when stuck
   - Factor: 0.5 (halve the learning rate)
   - Patience: 10 epochs
   - Helps fine-tune towards end of training
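
The interaction between the two callbacks is easier to see on a made-up validation curve. The following is a simplified pure-Python sketch of the patience logic, not the Keras implementation (for example, it ignores `min_delta` and shares one "best loss" between both rules). `simulate_callbacks` and the loss values are hypothetical.

```python
def simulate_callbacks(val_losses, lr=0.001, stop_patience=15,
                       lr_patience=10, factor=0.5, min_lr=1e-5):
    """Replay a validation-loss curve and report when training would stop."""
    best = float("inf")
    best_epoch = 0
    stop_wait = 0   # epochs without improvement, for early stopping
    lr_wait = 0     # epochs without improvement, for LR reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)  # halve the learning rate, with a floor
            lr_wait = 0
        if stop_wait >= stop_patience:
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "final_lr": lr}
    return {"stopped_epoch": len(val_losses), "best_epoch": best_epoch, "final_lr": lr}

# Hypothetical curve: improves for 4 epochs, then plateaus
val_losses = [1.0, 0.8, 0.6, 0.5] + [0.55] * 30
result = simulate_callbacks(val_losses)
print(result)
```

On this curve the learning rate is halved once, 10 epochs after the best epoch, and training stops 15 epochs after it. Because LR patience is shorter than stopping patience, the model always gets at least one attempt at a lower learning rate before training is abandoned.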

In [None]:
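# Optimizer and callbacks used below (a repeat import is harmless if the setup cell already has them)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

class LSTMModels: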
    """
    Build, train, and evaluate LSTM models for time series forecasting.
    """
    
    def __init__(self):
        self.models = {}
        self.histories = {}
        self.predictions = {}
    
    def build_simple_lstm(self, input_shape, units=50, dropout=0.2):
        """
        Build a simple LSTM model.
        
        Parameters:
        -----------
        input_shape : tuple
            (timesteps, features)
        units : int
            Number of LSTM units
        dropout : float
            Dropout rate for regularization
        """
        model = Sequential([
            LSTM(units=units, input_shape=input_shape, name='LSTM_Layer'),
            Dropout(dropout, name='Dropout'),
            Dense(1, name='Output_Layer')
        ], name='Simple_LSTM')
        
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )
        
        return model
    
    def build_deep_lstm(self, input_shape, units=50, dropout=0.2):
        """
        Build a stacked (deep) LSTM model.
        """
        model = Sequential([
            LSTM(units=units, return_sequences=True, input_shape=input_shape, 
                 name='LSTM_Layer_1'),
            Dropout(dropout, name='Dropout_1'),
            LSTM(units=units, name='LSTM_Layer_2'),
            Dropout(dropout, name='Dropout_2'),
            Dense(1, name='Output_Layer')
        ], name='Deep_LSTM')
        
        model.compile(
            optimizer=Adam(learning_rate=0.001),
            loss='mse',
            metrics=['mae']
        )
        
        return model
    
    def train_model(self, model, X_train, y_train, X_val, y_val, 
                   epochs=100, batch_size=32, verbose=1):
        """
        Train LSTM model with callbacks.
        """
        # Callbacks
        early_stop = EarlyStopping(
            monitor='val_loss',
            patience=15,
            restore_best_weights=True,
            verbose=1
        )
        
        reduce_lr = ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=10,
            min_lr=0.00001,
            verbose=1
        )
        
        # Train
        history = model.fit(
            X_train, y_train,
            validation_data=(X_val, y_val),
            epochs=epochs,
            batch_size=batch_size,
            callbacks=[early_stop, reduce_lr],
            verbose=verbose
        )
        
        return history
    
    def plot_training_history(self, history, model_name, figsize=(14, 5)):
        """
        Plot training and validation loss curves.
        """
        fig, axes = plt.subplots(1, 2, figsize=figsize)
        
        # Plot 1: Loss
        axes[0].plot(history.history['loss'], label='Training Loss', linewidth=2)
        axes[0].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
        axes[0].set_title(f'{model_name}: Loss Curves', fontsize=12, fontweight='bold')
        axes[0].set_xlabel('Epoch')
        axes[0].set_ylabel('Loss (MSE)')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Plot 2: MAE
        axes[1].plot(history.history['mae'], label='Training MAE', linewidth=2)
        axes[1].plot(history.history['val_mae'], label='Validation MAE', linewidth=2)
        axes[1].set_title(f'{model_name}: MAE Curves', fontsize=12, fontweight='bold')
        axes[1].set_xlabel('Epoch')
        axes[1].set_ylabel('MAE')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        return fig
    
    def evaluate_model(self, model, X_test, y_test, scaler, model_name):
        """
        Evaluate model and calculate metrics.
        """
        # Predict
        y_pred_scaled = model.predict(X_test, verbose=0)
        
        # Inverse transform to original scale
        y_pred = scaler.inverse_transform(y_pred_scaled)
        y_true = scaler.inverse_transform(y_test.reshape(-1, 1))
        
        # Calculate metrics
        mae = mean_absolute_error(y_true, y_pred)
        rmse = np.sqrt(mean_squared_error(y_true, y_pred))
        mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
        r2 = r2_score(y_true, y_pred)
        
        metrics = {
            'MAE': mae,
            'RMSE': rmse,
            'MAPE': mape,
            'R2': r2
        }
        
        self.predictions[model_name] = y_pred
        
        return y_pred, y_true, metrics

print("\n" + "="*70)
print("STEP 4: BUILDING AND TRAINING LSTM MODELS")
print("="*70)

# Initialize
lstm_models = LSTMModels()

# Extract data
X_train = data_dict['X_train']
y_train = data_dict['y_train']
X_val = data_dict['X_val']
y_val = data_dict['y_val']
X_test = data_dict['X_test']
y_test = data_dict['y_test']
scaler = data_dict['scaler']

input_shape = (X_train.shape[1], X_train.shape[2])  # (timesteps, features)

print(f"\n📊 Input Shape: {input_shape}")
print(f"   • Timesteps (look-back): {input_shape[0]}")
print(f"   • Features: {input_shape[1]}")

# ============= Model 1: Simple LSTM =============
print("\n" + "="*70)
print("MODEL 1: SIMPLE LSTM")
print("="*70)

print("\n🏗️  Building Simple LSTM...")
simple_lstm = lstm_models.build_simple_lstm(input_shape, units=50, dropout=0.2)

print("\n📋 Model Architecture:")
simple_lstm.summary()

print("\n🎓 Training Simple LSTM...")
print("-" * 70)
history_simple = lstm_models.train_model(
    simple_lstm, X_train, y_train, X_val, y_val,
    epochs=100, batch_size=32, verbose=1
)
lstm_models.histories['Simple_LSTM'] = history_simple

print("\n✅ Training Complete!")
print(f"   • Total epochs: {len(history_simple.history['loss'])}")
print(f"   • Final training loss: {history_simple.history['loss'][-1]:.6f}")
print(f"   • Final validation loss: {history_simple.history['val_loss'][-1]:.6f}")

# Plot training history
print("\n🎨 Plotting training history...")
fig = lstm_models.plot_training_history(history_simple, 'Simple LSTM')
plt.show()

# Evaluate on test set
print("\n📊 Evaluating on Test Set...")
pred_simple, y_true, metrics_simple = lstm_models.evaluate_model(
    simple_lstm, X_test, y_test, scaler, 'Simple_LSTM'
)

print("\n🎯 Test Set Performance:")
print("-" * 70)
print(f"   • MAE:  ${metrics_simple['MAE']:.2f}")
print(f"   • RMSE: ${metrics_simple['RMSE']:.2f}")
print(f"   • MAPE: {metrics_simple['MAPE']:.2f}%")
print(f"   • R²:   {metrics_simple['R2']:.4f}")

# ============= Model 2: Deep LSTM =============
print("\n" + "="*70)
print("MODEL 2: DEEP LSTM (STACKED)")
print("="*70)

print("\n🏗️  Building Deep LSTM...")
deep_lstm = lstm_models.build_deep_lstm(input_shape, units=50, dropout=0.2)

print("\n📋 Model Architecture:")
deep_lstm.summary()

print("\n🎓 Training Deep LSTM...")
print("-" * 70)
history_deep = lstm_models.train_model(
    deep_lstm, X_train, y_train, X_val, y_val,
    epochs=100, batch_size=32, verbose=1
)
lstm_models.histories['Deep_LSTM'] = history_deep

print("\n✅ Training Complete!")
print(f"   • Total epochs: {len(history_deep.history['loss'])}")
print(f"   • Final training loss: {history_deep.history['loss'][-1]:.6f}")
print(f"   • Final validation loss: {history_deep.history['val_loss'][-1]:.6f}")

# Plot training history
print("\n🎨 Plotting training history...")
fig = lstm_models.plot_training_history(history_deep, 'Deep LSTM')
plt.show()

# Evaluate on test set
print("\n📊 Evaluating on Test Set...")
pred_deep, _, metrics_deep = lstm_models.evaluate_model(
    deep_lstm, X_test, y_test, scaler, 'Deep_LSTM'
)

print("\n🎯 Test Set Performance:")
print("-" * 70)
print(f"   • MAE:  ${metrics_deep['MAE']:.2f}")
print(f"   • RMSE: ${metrics_deep['RMSE']:.2f}")
print(f"   • MAPE: {metrics_deep['MAPE']:.2f}%")
print(f"   • R²:   {metrics_deep['R2']:.4f}")

# Store models
lstm_models.models['Simple_LSTM'] = simple_lstm
lstm_models.models['Deep_LSTM'] = deep_lstm

print("\n" + "="*70)
print("✅ ALL LSTM MODELS TRAINED AND EVALUATED!")
print("="*70)

---

# Part 5: Performance Comparison & Analysis

## 🎯 The Million Dollar Question:

**"Is the LSTM worth the complexity?"**

We need to compare:
1. Baseline models (simple, interpretable)
2. LSTM models (complex, powerful)

### What to Look For:

✅ **LSTM adds value if:**
- Significantly lower RMSE/MAE than baselines
- Higher R² score
- Captures patterns baselines miss
- Performance improvement justifies training time

❌ **LSTM may not be worth it if:**
- Only marginally better than naive forecast
- Overfits (great on train, poor on test)
- Too computationally expensive
- Can't be deployed in production

### Key Metrics for Comparison:

1. **Absolute Performance**: Which model has lowest error?
2. **Relative Improvement**: How much better is LSTM?
3. **Consistency**: Does LSTM beat baselines on all metrics?
4. **Practical Significance**: Is the improvement meaningful?

In [None]:
print("\n" + "="*70)
print("STEP 5: COMPREHENSIVE PERFORMANCE COMPARISON")
print("="*70)

# Combine all metrics
all_metrics = baseline.metrics.copy()
all_metrics['Simple_LSTM'] = metrics_simple
all_metrics['Deep_LSTM'] = metrics_deep

# Create comprehensive comparison DataFrame
comparison_df = pd.DataFrame(all_metrics).T
comparison_df = comparison_df[['MAE', 'RMSE', 'MAPE', 'R2']]  # Reorder columns

print("\n📊 COMPLETE PERFORMANCE COMPARISON")
print("="*70)
display(comparison_df.round(4))

# Find best model for each metric
print("\n🏆 BEST PERFORMERS BY METRIC")
print("="*70)

best_mae = comparison_df['MAE'].idxmin()
best_rmse = comparison_df['RMSE'].idxmin()
best_mape = comparison_df['MAPE'].idxmin()
best_r2 = comparison_df['R2'].idxmax()

print(f"   • Lowest MAE:  {best_mae} (${comparison_df.loc[best_mae, 'MAE']:.2f})")
print(f"   • Lowest RMSE: {best_rmse} (${comparison_df.loc[best_rmse, 'RMSE']:.2f})")
print(f"   • Lowest MAPE: {best_mape} ({comparison_df.loc[best_mape, 'MAPE']:.2f}%)")
print(f"   • Highest R²:  {best_r2} ({comparison_df.loc[best_r2, 'R2']:.4f})")

# Calculate improvement over best baseline
print("\n📈 LSTM IMPROVEMENT OVER BEST BASELINE")
print("="*70)

baseline_models = ['Naive', 'MovingAverage', 'LinearRegression']
best_baseline_rmse = comparison_df.loc[baseline_models, 'RMSE'].min()
best_baseline_name = comparison_df.loc[baseline_models, 'RMSE'].idxmin()

simple_lstm_improvement = ((best_baseline_rmse - metrics_simple['RMSE']) / best_baseline_rmse) * 100
deep_lstm_improvement = ((best_baseline_rmse - metrics_deep['RMSE']) / best_baseline_rmse) * 100

print(f"\nBest Baseline: {best_baseline_name} (RMSE: ${best_baseline_rmse:.2f})")
print(f"\nSimple LSTM:")
print(f"   • RMSE: ${metrics_simple['RMSE']:.2f}")
print(f"   • Improvement: {simple_lstm_improvement:+.2f}%")

print(f"\nDeep LSTM:")
print(f"   • RMSE: ${metrics_deep['RMSE']:.2f}")
print(f"   • Improvement: {deep_lstm_improvement:+.2f}%")

# Visualize comparison
print("\n🎨 Creating comprehensive visualizations...")

fig = plt.figure(figsize=(16, 10))
gs = fig.add_gridspec(3, 2, hspace=0.3, wspace=0.3)

# Plot 1: RMSE Comparison (Bar Chart)
ax1 = fig.add_subplot(gs[0, 0])
colors = ['steelblue', 'steelblue', 'steelblue', 'coral', 'darkred']
bars = ax1.bar(comparison_df.index, comparison_df['RMSE'], color=colors, alpha=0.7)
ax1.set_title('RMSE Comparison (Lower is Better)', fontsize=12, fontweight='bold')
ax1.set_ylabel('RMSE ($)')
ax1.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=45, ha='right')

# Highlight best
best_idx = comparison_df['RMSE'].argmin()
bars[best_idx].set_edgecolor('gold')
bars[best_idx].set_linewidth(3)

# Plot 2: MAE Comparison (Bar Chart)
ax2 = fig.add_subplot(gs[0, 1])
bars = ax2.bar(comparison_df.index, comparison_df['MAE'], color=colors, alpha=0.7)
ax2.set_title('MAE Comparison (Lower is Better)', fontsize=12, fontweight='bold')
ax2.set_ylabel('MAE ($)')
ax2.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=45, ha='right')

# Plot 3: R² Comparison (Bar Chart)
ax3 = fig.add_subplot(gs[1, 0])
bars = ax3.bar(comparison_df.index, comparison_df['R2'], color=colors, alpha=0.7)
ax3.set_title('R² Score Comparison (Higher is Better)', fontsize=12, fontweight='bold')
ax3.set_ylabel('R² Score')
ax3.axhline(y=0, color='red', linestyle='--', alpha=0.5, linewidth=1)
ax3.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=45, ha='right')

# Plot 4: MAPE Comparison (Bar Chart)
ax4 = fig.add_subplot(gs[1, 1])
bars = ax4.bar(comparison_df.index, comparison_df['MAPE'], color=colors, alpha=0.7)
ax4.set_title('MAPE Comparison (Lower is Better)', fontsize=12, fontweight='bold')
ax4.set_ylabel('MAPE (%)')
ax4.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=45, ha='right')

# Plot 5: Predictions vs Actual (First 100 points)
ax5 = fig.add_subplot(gs[2, :])
plot_range = slice(0, 100)

ax5.plot(y_true[plot_range], label='Actual', linewidth=2.5, color='black', alpha=0.8)
ax5.plot(baseline_results['Naive'][plot_range], label='Naive', linewidth=1.5, alpha=0.6)
ax5.plot(pred_simple[plot_range], label='Simple LSTM', linewidth=1.5, alpha=0.7)
ax5.plot(pred_deep[plot_range], label='Deep LSTM', linewidth=1.5, alpha=0.7)

ax5.set_title('Predictions vs Actual (First 100 Test Points)', fontsize=12, fontweight='bold')
ax5.set_xlabel('Time Step')
ax5.set_ylabel('Price ($)')
ax5.legend(loc='best', fontsize=10)
ax5.grid(True, alpha=0.3)

plt.suptitle('Comprehensive Model Performance Comparison', 
             fontsize=16, fontweight='bold', y=0.995)

plt.show()

# Final verdict
print("\n" + "="*70)
print("🎓 FINAL VERDICT")
print("="*70)

if simple_lstm_improvement > 5 or deep_lstm_improvement > 5:
    print("\n✅ LSTM ADDS SIGNIFICANT VALUE")
    print("-" * 70)
    print("   • At least one LSTM shows >5% RMSE improvement over the best baseline")
    print("   • Deep learning looks justified on this test set")
    print("   • Recommendation: Confirm on further out-of-sample data before deploying")
elif simple_lstm_improvement > 0 or deep_lstm_improvement > 0:
    print("\n⚠️  LSTM ADDS MODERATE VALUE")
    print("-" * 70)
    print("   • At least one LSTM shows a modest (0-5%) RMSE improvement")
    print("   • Consider trade-off: complexity vs. performance")
    print("   • Recommendation: Use LSTM if resources permit")
else:
    print("\n❌ LSTM DOES NOT ADD VALUE")
    print("-" * 70)
    print("   • Neither LSTM beats the best baseline on RMSE")
    print(f"   • Stick with the simpler model ({best_baseline_name})")
    print("   • Recommendation: More data or better features needed")

print("\n💡 Key Insights:")
print("="*70)
print("   1. Always establish strong baselines before celebrating LSTM results")
print("   2. Small improvements may not justify deployment complexity")
print("   3. Deep LSTM vs Simple LSTM trade-off: performance vs. complexity")
print("   4. Consider your business context: How valuable is each % improvement?")
print("="*70)

---

# 🎯 Week 8 Summary: What We Learned

## Key Concepts Covered:

### 1. Sequential Data & RNNs
- ✅ Financial data has temporal dependencies
- ✅ Traditional ML ignores sequence order
- ✅ RNNs maintain "memory" through hidden states
- ✅ Vanishing gradient problem limits basic RNNs

### 2. LSTM Architecture
- ✅ Gates control information flow
- ✅ Cell state acts as long-term memory
- ✅ Mitigates the vanishing gradient problem
- ✅ Can capture patterns across long time horizons

### 3. Practical Implementation
- ✅ Sequence creation with look-back windows
- ✅ Proper time series train/val/test split
- ✅ Scaling data without leakage
- ✅ Building models with Keras/TensorFlow
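
As a compact reminder of the split-and-scale points above, here is a minimal NumPy sketch. The 70/15/15 fractions, the helper name `split_and_scale`, and the synthetic price series are assumptions for illustration and may differ from the preprocessing used earlier in this notebook.

```python
import numpy as np

def split_and_scale(prices, train_frac=0.70, val_frac=0.15):
    """Chronological split, then min-max scaling using the training slice only."""
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    train_end = int(round(n * train_frac))
    val_end = train_end + int(round(n * val_frac))

    # No shuffling: earlier observations train the model, later ones evaluate it
    train = prices[:train_end]
    val = prices[train_end:val_end]
    test = prices[val_end:]

    # Fit the scaling range on training data only to avoid look-ahead leakage
    lo, hi = train.min(), train.max()

    def scale(values):
        return (values - lo) / (hi - lo)

    return scale(train), scale(val), scale(test)

# Synthetic upward-trending series, for illustration only
prices = np.linspace(100.0, 199.0, 100)
train_s, val_s, test_s = split_and_scale(prices)

print(len(train_s), len(val_s), len(test_s))
print(train_s.min(), train_s.max())
print(test_s.max() > 1.0)
```

Scaled validation and test values can fall outside [0, 1] when prices move beyond the training range. That is expected, and it is the price of not letting future data influence the scaler.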

### 4. Model Evaluation
- ✅ Baseline models are ESSENTIAL
- ✅ Multiple metrics provide different insights
- ✅ Performance improvement must justify complexity
- ✅ Visualizations help understand model behavior

---

## 🚀 Next Steps & Extensions

**Try these on your own:**

1. **Add More Features**: 
   - Include volume, volatility, moving averages
   - Use multiple cryptocurrencies (multivariate LSTM)

2. **Experiment with Architecture**:
   - Try different numbers of LSTM units
   - Adjust dropout rates
   - Add more layers or Dense layers

3. **Alternative Models**:
   - Try GRU (simpler alternative to LSTM)
   - Explore bidirectional LSTMs
   - Test Transformer models (attention mechanism)

4. **Production Considerations**:
   - Model saving and loading
   - Real-time prediction pipeline
   - Monitoring model drift
   - Retraining strategies

5. **Advanced Topics**:
   - Sequence-to-sequence models
   - Multi-step ahead forecasting
   - Uncertainty quantification
   - Attention mechanisms

---

## 💼 Real-World Applications

**Where LSTMs excel in Finance:**

1. **Trading Strategies**: Generate buy/sell signals from patterns
2. **Risk Management**: Forecast volatility and Value-at-Risk
3. **Portfolio Optimization**: Dynamic allocation based on predicted returns
4. **Sentiment Analysis**: Process sequential text data (news, tweets)
5. **Fraud Detection**: Identify unusual transaction sequences
6. **Credit Scoring**: Analyze payment history patterns

---

## 🎓 Final Thoughts

**Remember:**
- 🧠 LSTMs are powerful but not magic
- ⚖️ Always compare against simple baselines
- 📊 More data usually helps more than complex architecture
- 🔍 Understand WHY your model works, not just THAT it works
- 💡 In finance, consistent small improvements compound over time

**The real skill** is knowing when to use deep learning and when simpler methods suffice!

---

## 📚 Further Reading

- Original LSTM Paper: Hochreiter & Schmidhuber (1997)
- "Understanding LSTM Networks" by Chris Olah
- "Deep Learning" by Goodfellow, Bengio, Courville
- "Advances in Financial Machine Learning" by Marcos López de Prado

---

# 🎉 Congratulations!

You've completed Week 8 and learned how to apply state-of-the-art deep learning to financial time series!

**Keep practicing, keep learning, and keep building! 🚀**

In [None]:
# Optional: Save your trained models
print("\n💾 Saving trained models...")
print("="*70)

# Save models
lstm_models.models['Simple_LSTM'].save('simple_lstm_model.h5')
lstm_models.models['Deep_LSTM'].save('deep_lstm_model.h5')

print("✅ Models saved successfully!")
print("   • simple_lstm_model.h5")
print("   • deep_lstm_model.h5")
print("\n💡 You can load these models later with: model = keras.models.load_model('model_name.h5')")
print("="*70)

print("\n🎊 WEEK 8 COMPLETE! 🎊")
print("\n" + "="*70)
print("Thank you for learning with us!")
print("Next week: Advanced Topics in Deep Learning for Finance")
print("="*70)