# Energy Market Forecasting Framework

## Interpretable Elastic Net Models for Gas, Power, and Carbon Markets

**Framework Overview:**
1. Define 3 targets (Natural Gas, Electricity, Carbon) with log-return transformation
2. Build shared feature engine with supply/demand blocks
3. Add target-specific features
4. Train interpretable Elastic Net models with lags
5. Walk-forward validation for robust performance assessment
6. Regime-ready structure for future enhancements

**Target Transformation:**
$$y_t = \Delta \log(P_t) = \log(P_t) - \log(P_{t-1})$$

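Log returns are close to stationary, so the three price series no longer share a common trend that could create spurious relationships. Each coefficient can then be read as an effect on the daily percentage change (for small moves, $\Delta \log P_t \approx (P_t - P_{t-1}) / P_{t-1}$).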
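A minimal sketch of the transformation with pandas, using hypothetical prices (not taken from the dataset). It shows that the first return is undefined and that log returns stay close to simple percentage returns for small moves.

```python
import numpy as np
import pandas as pd

# Hypothetical daily settlement prices (EUR/MWh), for illustration only
prices = pd.Series([40.0, 44.0, 42.0, 42.0],
                   index=pd.date_range("2024-01-01", periods=4, freq="D"))

# y_t = log(P_t) - log(P_{t-1}); the first value is NaN (no prior price)
log_returns = np.log(prices).diff()

# Simple returns for comparison: P_t / P_{t-1} - 1
simple_returns = prices.pct_change()

print(pd.DataFrame({"log_return": log_returns, "simple_return": simple_returns}))
```

Log returns also add up over time: their sum over the sample equals $\log(P_T / P_0)$, which is one reason they are convenient as a regression target.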

## Step 1: Import Libraries and Load Data

Import required modules and load raw energy market data.

In [None]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Import custom modules
from data_extractor import DataExtractor
from feature_engineering import FeatureEngineer
from target_transforms import TargetTransformer, prepare_all_targets
from model_framework import EnergyMarketModel, WalkForwardValidator
import config

# Set display options
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_rows', 100)
sns.set_style('whitegrid')

print("Libraries imported successfully!")
print(f"Working directory: {os.getcwd()}")

In [None]:
# Load and process raw data
extractor = DataExtractor("raw_data_2021_2024.csv")
df_raw = extractor.load_data()

# Handle missing data with interpolation
df = extractor.handle_missing_data(method='interpolate')

# Display data summary
print(f"\nData shape: {df.shape}")
print("\nFirst few rows:")
print(df.head())
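The internals of `DataExtractor.handle_missing_data` are not shown here. Assuming it wraps something like pandas `interpolate`, note that interpolation fills a gap using the *next* valid observation, i.e. data published after the missing dates. The sketch below, on a hypothetical storage series, contrasts it with a forward fill, which only uses values known at each date.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=5, freq="D")
# Hypothetical storage level series with a two-day reporting gap
storage = pd.Series([50.0, np.nan, np.nan, 56.0, 57.0], index=idx)

# Time-weighted interpolation: fills the gap using the later value 56.0,
# so the filled values depend on data published after the missing dates
interpolated = storage.interpolate(method="time", limit_area="inside")

# Forward fill: uses only the last value known at each date
forward_filled = storage.ffill()

print(pd.DataFrame({"raw": storage, "interpolated": interpolated, "ffill": forward_filled}))
```

For a forecasting pipeline, a forward fill (or interpolation restricted to the training window) is the safer choice if look-ahead bias is a concern.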

## Step 2: Define and Transform Targets

**Target Definitions:**
- **Natural Gas (TTF):** `BM_TTF_M1_CLOSE_EUR_MWH`
- **Electricity (Germany Power):** `BM_GERMANY_POWER_M1_CLOSE_EUR_MWH`  
- **Carbon (EUA):** `BM_EUA_CO2_CAL1_PRICE_EUR_TON`

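**Transformation:** Convert prices to daily log returns in order to:
- Remove the shared, non-stationary price trends
- Make coefficients interpretable as effects on percentage change
- Put all three markets on a comparable, scale-free footing regardless of price level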

In [None]:
# Prepare all three targets with transformations
targets_data = prepare_all_targets(df)

# Extract individual components for each target
gas_data = targets_data['gas']
power_data = targets_data['power']
carbon_data = targets_data['carbon']

print("\n" + "="*70)
print("TARGETS PREPARED")
print("="*70)
for target_name, target_info in targets_data.items():
    print(f"\n{target_name.upper()}:")
    print(f"  Return column: {target_info['return_column']}")
    print(f"  Non-null returns: {target_info['data'][target_info['return_column']].notna().sum()}")

In [None]:
# Visualize targets (prices and returns)
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

targets = ['gas', 'power', 'carbon']
titles = ['Natural Gas (TTF)', 'Electricity (Germany)', 'Carbon (EUA)']

for idx, (target, title) in enumerate(zip(targets, titles)):
    # Plot prices
    targets_data[target]['prices'].plot(ax=axes[0, idx], title=f"{title} - Prices")
    axes[0, idx].set_ylabel('Price')
    axes[0, idx].grid(True)
    
    # Plot returns
    targets_data[target]['data'][targets_data[target]['return_column']].plot(
        ax=axes[1, idx], title=f"{title} - Log Returns"
    )
    axes[1, idx].set_ylabel('Log Return')
    axes[1, idx].axhline(y=0, color='r', linestyle='--', alpha=0.3)
    axes[1, idx].grid(True)

plt.tight_layout()
plt.show()

print("Target visualizations complete!")

## Step 3: Feature Engineering

Build features in **4 blocks** + target-specific additions:

**1. Demand Block** (seasonal)
- Temperature anomalies (actual - normal)
- Load (Germany, France)
- Gas consumption anomalies

**2. Supply Block**
- Norway outages, Russian flows, LNG imports
- EU storage levels
- Renewables and nuclear generation

**3. Market Positioning Block**
- COT net positions (TTF, EUA)
- Options implied volatility
- Fuel switch indicators

**4. Cross-Commodity Block**
- Coal, Brent, JKM, Henry Hub
- Price returns for cross-market dynamics
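`FeatureEngineer` does not expose its formulas here, so the following is only a sketch of two of the block types described above, with hypothetical column names: a demand-block temperature anomaly (actual minus a supplied seasonal normal) and cross-commodity daily log returns.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
# Hypothetical raw columns; the real column names in the dataset will differ
raw = pd.DataFrame({
    "temp_actual_de": [2.0, -1.0, 0.5, 3.0],
    "temp_normal_de": [1.0, 1.0, 1.2, 1.2],
    "coal_price": [100.0, 102.0, 101.0, 101.0],
    "brent_price": [80.0, 80.0, 82.0, 81.0],
}, index=idx)

features = pd.DataFrame(index=raw.index)

# Demand block: temperature anomaly = actual - seasonal normal
features["temp_anomaly_de"] = raw["temp_actual_de"] - raw["temp_normal_de"]

# Cross-commodity block: daily log returns of related markets
for col in ["coal_price", "brent_price"]:
    features[f"{col}_logret"] = np.log(raw[col]).diff()

print(features)
```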

In [None]:
# Build features for Natural Gas (TTF)
print("="*70)
print("BUILDING FEATURES FOR NATURAL GAS")
print("="*70)

fe_gas = FeatureEngineer(df)
X_gas = fe_gas.build_all_features(target='gas', add_lags_to_features=True)

print(f"\nGas features shape: {X_gas.shape}")
print(f"Feature names (first 20): {X_gas.columns[:20].tolist()}")

In [None]:
# Build features for Electricity (Power)
print("\n" + "="*70)
print("BUILDING FEATURES FOR ELECTRICITY")
print("="*70)

fe_power = FeatureEngineer(df)
X_power = fe_power.build_all_features(target='power', add_lags_to_features=True)

print(f"\nPower features shape: {X_power.shape}")

In [None]:
# Build features for Carbon (EUA)
print("\n" + "="*70)
print("BUILDING FEATURES FOR CARBON")
print("="*70)

fe_carbon = FeatureEngineer(df)
X_carbon = fe_carbon.build_all_features(target='carbon', add_lags_to_features=True)

print(f"\nCarbon features shape: {X_carbon.shape}")

## Step 4: Train Elastic Net Models with Lags

**Model specification:**
$$y_t = \alpha + \sum_{i=1}^{p} \phi_i y_{t-i} + X_t^T \beta + \varepsilon_t$$

**Elastic Net regularization:**
- Balances L1 (Lasso) and L2 (Ridge) penalties
- Produces sparse, interpretable coefficients
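- Handles correlated features: the L2 term spreads weight across correlated drivers instead of arbitrarily keeping one, as pure Lasso tends to do
- Selects the penalty strength (alpha) by cross-validation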
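How `EnergyMarketModel` fits its Elastic Net is not shown. A minimal scikit-learn sketch of the same idea, on synthetic data, assuming `ElasticNetCV` with a `TimeSeriesSplit` so that each validation fold comes after its training fold in time. The `l1_ratio=0.5` is a placeholder, not the value in `config.MODEL_PARAMS`.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 300, 10
X = rng.normal(size=(n, p))
# Synthetic target driven by two features only; the other eight are noise
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n)

model = make_pipeline(
    StandardScaler(),  # common scale so the penalty treats features equally
    ElasticNetCV(
        l1_ratio=0.5,                    # placeholder mix of L1 and L2
        cv=TimeSeriesSplit(n_splits=5),  # validation folds always follow training folds
    ),
)
model.fit(X, y)

enet = model[-1]
print("chosen alpha:", enet.alpha_)
print("non-zero coefficients:", np.flatnonzero(enet.coef_))
```

A plain K-fold split would let the alpha search train on later data and validate on earlier data, which is why a time-ordered splitter is used here.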

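**Lags included:**
- Target lags: [1, 2, 3, 5, 10] days (so $p = 10$, with $\phi_i$ estimated only at these lags and fixed at zero otherwise)
- Feature lags: [1, 2, 5] days for key drivers (the lagged columns enter the model through $X_t$)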
In [None]:
# Initialize models for each target
model_gas = EnergyMarketModel(target_name='gas', use_cv=True)
model_power = EnergyMarketModel(target_name='power', use_cv=True)
model_carbon = EnergyMarketModel(target_name='carbon', use_cv=True)

print("Models initialized!")
print("\nModel parameters:")
print(f"  L1 ratio: {config.MODEL_PARAMS['elastic_net']['l1_ratio']}")
print(f"  CV folds: {config.MODEL_PARAMS['elastic_net']['cv_folds']}")
print(f"  Target lags: {config.MODEL_PARAMS['lags']['target_lags']}")
print(f"  Feature lags: {config.MODEL_PARAMS['lags']['feature_lags']}")

## Step 5: Walk-Forward Validation

**Time series validation approach:**
- **Initial train size:** 252 days (~1 year)
- **Step size:** 21 days (~1 month)
- **Method:** Rolling window (`expanding=False` in the cells below); an expanding window is the alternative
- **Horizon:** 1-step ahead forecast

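Each fold's model is trained only on observations that precede its test window, so every reported metric is out-of-sample. This prevents look-ahead bias, provided the features themselves use only information available at forecast time.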
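`WalkForwardValidator`'s fold logic is not shown. The sketch below assumes each fold tests on the `step_size` observations that follow its training window, and shows how rolling and expanding windows differ; `walk_forward_splits` is a hypothetical helper.

```python
def walk_forward_splits(n_obs, initial_train_size=252, step_size=21, expanding=False):
    """Yield (train_idx, test_idx) ranges for walk-forward validation.

    Assumption: each fold tests on the `step_size` observations that follow
    its training window; WalkForwardValidator may define folds differently.
    """
    train_end = initial_train_size
    while train_end < n_obs:
        # Rolling: fixed-length window; expanding: always start at 0
        train_start = 0 if expanding else train_end - initial_train_size
        test_end = min(train_end + step_size, n_obs)
        yield range(train_start, train_end), range(train_end, test_end)
        train_end += step_size

splits = list(walk_forward_splits(n_obs=315))
for train, test in splits:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

With a rolling window the model forgets observations older than one year, which can help when market relationships drift; an expanding window keeps all history and usually gives more stable coefficients.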

In [None]:
# Validate Natural Gas model
print("="*70)
print("VALIDATING NATURAL GAS MODEL")
print("="*70)

validator_gas = WalkForwardValidator(
    initial_train_size=252,
    step_size=21,
    horizon=1
)

y_gas = gas_data['data'][gas_data['return_column']]
results_gas = validator_gas.validate(model_gas, X_gas, y_gas, expanding=False)

# Display results
print("\nValidation Results Summary:")
print(results_gas[['fold', 'train_size', 'test_size', 'rmse', 'mae', 'r2']].head(10))

In [None]:
# Validate Electricity model
print("\n" + "="*70)
print("VALIDATING ELECTRICITY MODEL")
print("="*70)

validator_power = WalkForwardValidator(
    initial_train_size=252,
    step_size=21,
    horizon=1
)

y_power = power_data['data'][power_data['return_column']]
results_power = validator_power.validate(model_power, X_power, y_power, expanding=False)

print("\nValidation Results Summary:")
print(results_power[['fold', 'train_size', 'test_size', 'rmse', 'mae', 'r2']].head(10))

In [None]:
# Validate Carbon model
print("\n" + "="*70)
print("VALIDATING CARBON MODEL")
print("="*70)

validator_carbon = WalkForwardValidator(
    initial_train_size=252,
    step_size=21,
    horizon=1
)

y_carbon = carbon_data['data'][carbon_data['return_column']]
results_carbon = validator_carbon.validate(model_carbon, X_carbon, y_carbon, expanding=False)

print("\nValidation Results Summary:")
print(results_carbon[['fold', 'train_size', 'test_size', 'rmse', 'mae', 'r2']].head(10))

## Step 6: Model Interpretation - Top Features

Analyze which features have the strongest impact on each target market.
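Ranking by absolute coefficient, as `get_top_features` does via its `abs_coefficient` column, is only meaningful when the features share a common scale; whether `EnergyMarketModel` standardizes internally is not shown. The synthetic sketch below illustrates the point: on raw units the storage coefficient looks tiny, but after standardization it has the largest effect per one-standard-deviation move.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
X = pd.DataFrame({
    "storage_twh": rng.normal(scale=100.0, size=n),   # large-unit driver
    "temp_anomaly_c": rng.normal(scale=1.0, size=n),  # small-unit driver
})
y = 0.001 * X["storage_twh"] + 0.05 * X["temp_anomaly_c"] + rng.normal(scale=0.05, size=n)

# Coefficients on standardized features = effect of a 1-SD move on the return
X_std = StandardScaler().fit_transform(X)
enet = ElasticNet(alpha=0.001, l1_ratio=0.5).fit(X_std, y)

top = (pd.DataFrame({"feature": X.columns, "coefficient": enet.coef_})
       .assign(abs_coefficient=lambda d: d["coefficient"].abs())
       .sort_values("abs_coefficient", ascending=False)
       .reset_index(drop=True))
print(top)
```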

In [None]:
# Get top features for each model
print("="*70)
print("TOP 15 FEATURES BY ABSOLUTE COEFFICIENT")
print("="*70)

print("\n--- NATURAL GAS (TTF) ---")
print(model_gas.get_top_features(15))

print("\n--- ELECTRICITY (GERMANY POWER) ---")
print(model_power.get_top_features(15))

print("\n--- CARBON (EUA) ---")
print(model_carbon.get_top_features(15))

In [None]:
# Visualize coefficient importance
fig, axes = plt.subplots(1, 3, figsize=(20, 6))

models = [model_gas, model_power, model_carbon]
titles = ['Natural Gas', 'Electricity', 'Carbon']

for idx, (model, title) in enumerate(zip(models, titles)):
    top_features = model.get_top_features(15).iloc[1:]  # Exclude intercept
    
    axes[idx].barh(range(len(top_features)), top_features['abs_coefficient'].values)
    axes[idx].set_yticks(range(len(top_features)))
    axes[idx].set_yticklabels(top_features['feature'].values, fontsize=8)
    axes[idx].set_xlabel('Absolute Coefficient')
    axes[idx].set_title(f'{title} - Top 15 Features')
    axes[idx].invert_yaxis()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 7: Compare Performance Across Markets

Compare validation metrics across all three markets.
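The comparison table averages per-fold metrics. A small sketch with hypothetical fold predictions shows how RMSE, MAE and R² are computed per fold, and that the mean of fold RMSEs differs from an RMSE pooled over all test points. With short test folds, per-fold R² can be noisy, and values at or below zero mean the model does no better than that fold's mean return.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical out-of-sample (y_true, y_pred) pairs from two folds
folds = {
    1: (np.array([0.01, -0.02, 0.00, 0.03]), np.array([0.00, -0.01, 0.01, 0.02])),
    2: (np.array([-0.01, 0.02, 0.01, -0.03]), np.array([0.00, 0.01, 0.00, -0.01])),
}

rows = []
for fold, (y_true, y_pred) in folds.items():
    rows.append({
        "fold": fold,
        "rmse": np.sqrt(mean_squared_error(y_true, y_pred)),
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    })
results = pd.DataFrame(rows)

# Fold average (as in the comparison table) vs. RMSE pooled over all test points
all_true = np.concatenate([t for t, _ in folds.values()])
all_pred = np.concatenate([p for _, p in folds.values()])
pooled_rmse = np.sqrt(mean_squared_error(all_true, all_pred))
print(results)
print("mean fold RMSE:", results["rmse"].mean(), "pooled RMSE:", pooled_rmse)
```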

In [None]:
# Compare average metrics across markets
comparison = pd.DataFrame({
    'Market': ['Natural Gas', 'Electricity', 'Carbon'],
    'Avg RMSE': [
        results_gas['rmse'].mean(),
        results_power['rmse'].mean(),
        results_carbon['rmse'].mean()
    ],
    'Avg MAE': [
        results_gas['mae'].mean(),
        results_power['mae'].mean(),
        results_carbon['mae'].mean()
    ],
    'Avg R²': [
        results_gas['r2'].mean(),
        results_power['r2'].mean(),
        results_carbon['r2'].mean()
    ],
    'N Folds': [
        len(results_gas),
        len(results_power),
        len(results_carbon)
    ]
})

print("="*70)
print("PERFORMANCE COMPARISON ACROSS MARKETS")
print("="*70)
print(comparison.to_string(index=False))

# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

metrics = ['Avg RMSE', 'Avg MAE', 'Avg R²']
for idx, metric in enumerate(metrics):
    axes[idx].bar(comparison['Market'], comparison[metric])
    axes[idx].set_title(metric)
    axes[idx].set_ylabel(metric)
    axes[idx].tick_params(axis='x', rotation=45)
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Step 8: Future Extensions - Regime Models

**Framework is regime-ready:**

The current setup can be extended with regime models:

**Hard Regimes (Two Models):**
$$y_t = \alpha_{s_t} + X_t^T \beta_{s_t} + \varepsilon_t$$

Fit separate coefficients per regime, providing interpretable "beta tables" for each market state.
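A minimal sketch of the hard-regime idea on synthetic data. It assumes the regime labels $s_t$ are already known; in practice they would come from a regime-detection step that the framework does not yet implement. One Elastic Net is fitted per regime and the coefficients are collected into a beta table.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
n = 400
X = pd.DataFrame({"storage_anomaly": rng.normal(size=n),
                  "coal_logret": rng.normal(size=n)})

# Hypothetical regime labels (0 = calm, 1 = stressed), assumed known here
regime = np.repeat([0, 1], n // 2)
beta_true = {0: [0.2, 0.1], 1: [0.8, -0.1]}
B = np.array([beta_true[s] for s in regime])  # true betas for each row
y = (X.to_numpy() * B).sum(axis=1) + rng.normal(scale=0.1, size=n)

# Fit one Elastic Net per regime and collect coefficients side by side
beta_table = {}
for s in np.unique(regime):
    mask = regime == s
    fit = ElasticNet(alpha=0.001, l1_ratio=0.5).fit(X[mask], y[mask])
    beta_table[f"regime_{s}"] = pd.Series(fit.coef_, index=X.columns)

beta_table = pd.DataFrame(beta_table)
print(beta_table.round(3))
```

The trade-off is that each regime model sees only part of the sample, so its coefficients are estimated from fewer observations.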

**Soft Regimes (More Stable):**
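Blend the regime-specific forecasts using the estimated regime probabilities, so the forecast changes smoothly as the probabilities shift instead of jumping whenever the regime label flips.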
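The soft-regime blend can be written as $\hat y_t = \sum_k \pi_{k,t}\, \hat y_t^{(k)}$, where $\pi_{k,t}$ is the probability of regime $k$ at time $t$ and $\hat y_t^{(k)}$ is that regime model's forecast. A sketch, assuming the probabilities come from some regime model (for example a hidden Markov model) that is not part of the current framework; `soft_regime_forecast` is a hypothetical helper.

```python
import numpy as np

def soft_regime_forecast(regime_forecasts, regime_probs):
    """Blend per-regime forecasts with regime probabilities.

    regime_forecasts: array (n_obs, n_regimes), forecast from each regime model
    regime_probs:     array (n_obs, n_regimes), probabilities summing to 1 per row
    """
    regime_forecasts = np.asarray(regime_forecasts, dtype=float)
    regime_probs = np.asarray(regime_probs, dtype=float)
    if not np.allclose(regime_probs.sum(axis=1), 1.0):
        raise ValueError("regime probabilities must sum to 1 in each row")
    # Probability-weighted average across regimes, row by row
    return (regime_forecasts * regime_probs).sum(axis=1)

forecasts = np.array([[0.010, -0.020],
                      [0.005,  0.030]])
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
blended = soft_regime_forecast(forecasts, probs)
print(blended)
```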

**Configuration available in `config.py`:**
- Set `REGIME_PARAMS['enabled'] = True`
- Specify regime features and number of states
- Choose hard vs soft regime approach

## Summary and Key Takeaways

**Framework Highlights:**
1. ✅ **3 Targets Defined:** Natural Gas (TTF), Electricity (Germany), Carbon (EUA)
2. ✅ **Log-Return Transform:** Removes trends, interpretable coefficients
3. ✅ **4 Feature Blocks:** Demand, Supply, Market Positioning, Cross-Commodity
4. ✅ **Target-Specific Features:** Each market gets custom must-have drivers
5. ✅ **Elastic Net with Lags:** Interpretable, regularized, handles dynamics
6. ✅ **Walk-Forward Validation:** Robust out-of-sample testing
7. ✅ **Regime-Ready:** Structure supports future regime modeling

**Next Steps:**
- Fine-tune hyperparameters (alpha, l1_ratio, lag selection)
- Add more sophisticated anomaly detection
- Implement regime detection and switching models
- Expand to multi-step ahead forecasts
- Add trading strategy backtests based on forecasts