<a href="https://colab.research.google.com/github/wrymp/Final-Project-Walmart-Recruiting---Store-Sales-Forecasting/blob/main/model_experiment_PathTST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PatchTST Implementation for Walmart Sales Forecasting

This notebook implements PatchTST (Patch Time Series Transformer) for Walmart sales forecasting following the exact pipeline structure from NBEATS experiments.

In [7]:
# from google.colab import drive
# !pip install wandb -q
# !pip install kaggle -q

# drive.mount('/content/drive')
!mkdir -p ~/.kaggle
!cp /content/drive/MyDrive/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [8]:
# Download and extract Walmart dataset
# ! kaggle competitions download -c walmart-recruiting-store-sales-forecasting
# ! unzip /content/walmart-recruiting-store-sales-forecasting.zip
# ! unzip /content/train.csv.zip
# ! unzip /content/test.csv.zip
# ! unzip /content/features.csv.zip
# ! unzip /content/stores.csv.zip
# print("✓ Data extraction completed")

Downloading walmart-recruiting-store-sales-forecasting.zip to /content
  0% 0.00/2.70M [00:00<?, ?B/s]
100% 2.70M/2.70M [00:00<00:00, 875MB/s]
Archive:  /content/walmart-recruiting-store-sales-forecasting.zip
  inflating: features.csv.zip        
  inflating: sampleSubmission.csv.zip  
  inflating: stores.csv              
  inflating: test.csv.zip            
  inflating: train.csv.zip           
Archive:  /content/train.csv.zip
  inflating: train.csv               
Archive:  /content/test.csv.zip
  inflating: test.csv                
Archive:  /content/features.csv.zip
  inflating: features.csv            
unzip:  cannot find or open /content/stores.csv.zip, /content/stores.csv.zip.zip or /content/stores.csv.zip.ZIP.
✓ Data extraction completed


# Essential Imports and Setup

In [9]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import wandb
import warnings
from datetime import datetime, timedelta
import math
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import TimeSeriesSplit
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')

# Set random seeds
torch.manual_seed(42)
np.random.seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f}GB")

Using device: cuda
PyTorch version: 2.6.0+cu124
CUDA device: Tesla T4
CUDA memory: 14.7GB


In [10]:
# Initialize Wandb project
wandb.login()
try:
    wandb.init(
        project="walmart-sales-forecasting",
        name="PatchTST_Initial_Setup",
        config={
            "model_type": "PatchTST",
            "framework": "PyTorch",
            "device": str(device),
            "random_seed": 42
        }
    )
    print("✓ Wandb initialized successfully!")
except Exception as e:
    print(f"⚠️ Wandb initialization failed: {e}")
    print("Continuing without wandb logging...")

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:

 ··········


[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mqitiashvili13[0m ([33mdshan21-free-university-of-tbilisi-[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


✓ Wandb initialized successfully!


# Data Loading and Initial Exploration

In [11]:
# Load datasets
print("Loading Walmart datasets...")

train_df = pd.read_csv('/content/train.csv')
test_df = pd.read_csv('/content/test.csv')
stores_df = pd.read_csv('/content/stores.csv')
features_df = pd.read_csv('/content/features.csv')

print(f"Train data shape: {train_df.shape}")
print(f"Test data shape: {test_df.shape}")
print(f"Stores data shape: {stores_df.shape}")
print(f"Features data shape: {features_df.shape}")

# Convert date columns
train_df['Date'] = pd.to_datetime(train_df['Date'])
test_df['Date'] = pd.to_datetime(test_df['Date'])
features_df['Date'] = pd.to_datetime(features_df['Date'])

print("\n=== DATA SAMPLES ===")
print("\nTrain Data:")
print(train_df.head())
print("\nStores Data:")
print(stores_df.head())
print("\nFeatures Data:")
print(features_df.head())

Loading Walmart datasets...
Train data shape: (421570, 5)
Test data shape: (115064, 4)
Stores data shape: (45, 3)
Features data shape: (8190, 12)

=== DATA SAMPLES ===

Train Data:
   Store  Dept       Date  Weekly_Sales  IsHoliday
0      1     1 2010-02-05      24924.50      False
1      1     1 2010-02-12      46039.49       True
2      1     1 2010-02-19      41595.55      False
3      1     1 2010-02-26      19403.54      False
4      1     1 2010-03-05      21827.90      False

Stores Data:
   Store Type    Size
0      1    A  151315
1      2    A  202307
2      3    B   37392
3      4    A  205863
4      5    B   34875

Features Data:
   Store       Date  Temperature  Fuel_Price  MarkDown1  MarkDown2  MarkDown3  \
0      1 2010-02-05        42.31       2.572        NaN        NaN        NaN   
1      1 2010-02-12        38.51       2.548        NaN        NaN        NaN   
2      1 2010-02-19        39.93       2.514        NaN        NaN        NaN   
3      1 2010-02-26        

# Data Exploration and Analysis

In [12]:
print("\n=== DATA EXPLORATION ===")

# Basic statistics
print("\nTrain Data Info:")
print(f"Date range: {train_df['Date'].min()} to {train_df['Date'].max()}")
print(f"Unique stores: {train_df['Store'].nunique()}")
print(f"Unique departments: {train_df['Dept'].nunique()}")
print(f"Total store-dept combinations: {train_df[['Store', 'Dept']].drop_duplicates().shape[0]}")

# Sales statistics
print("\nSales Statistics:")
print(f"Mean weekly sales: ${train_df['Weekly_Sales'].mean():,.2f}")
print(f"Median weekly sales: ${train_df['Weekly_Sales'].median():,.2f}")
print(f"Min weekly sales: ${train_df['Weekly_Sales'].min():,.2f}")
print(f"Max weekly sales: ${train_df['Weekly_Sales'].max():,.2f}")

# Holiday impact
print("\nHoliday Impact:")
holiday_stats = train_df.groupby('IsHoliday')['Weekly_Sales'].agg(['mean', 'count'])
print(holiday_stats)

# Store types
print("\nStore Types:")
print(stores_df['Type'].value_counts())

# Missing values
print("\nMissing Values in Features:")
missing_pct = features_df.isnull().sum() / len(features_df) * 100
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)

# Log to wandb
try:
    wandb.log({
        "data_exploration/train_samples": len(train_df),
        "data_exploration/test_samples": len(test_df),
        "data_exploration/unique_stores": train_df['Store'].nunique(),
        "data_exploration/unique_departments": train_df['Dept'].nunique(),
        "data_exploration/mean_weekly_sales": train_df['Weekly_Sales'].mean(),
        "data_exploration/median_weekly_sales": train_df['Weekly_Sales'].median(),
        "data_exploration/holiday_vs_regular_ratio": holiday_stats.loc[True, 'mean'] / holiday_stats.loc[False, 'mean']
    })
except:
    pass

print("\n✓ Exploration completed and logged to wandb")


=== DATA EXPLORATION ===

Train Data Info:
Date range: 2010-02-05 00:00:00 to 2012-10-26 00:00:00
Unique stores: 45
Unique departments: 81
Total store-dept combinations: 3331

Sales Statistics:
Mean weekly sales: $15,981.26
Median weekly sales: $7,612.03
Min weekly sales: $-4,988.94
Max weekly sales: $693,099.36

Holiday Impact:
                   mean   count
IsHoliday                      
False      15901.445069  391909
True       17035.823187   29661

Store Types:
Type
A    22
B    17
C     6
Name: count, dtype: int64

Missing Values in Features:
MarkDown2       64.334554
MarkDown4       57.704518
MarkDown3       55.885226
MarkDown1       50.769231
MarkDown5       50.549451
CPI              7.142857
Unemployment     7.142857
dtype: float64

✓ Exploration completed and logged to wandb


# Custom Transformers for PatchTST Pipeline

In [19]:
class PatchTimeSeriesDataProcessor(BaseEstimator, TransformerMixin):
    """Processes raw Walmart data into patch-based time-series format for PatchTST"""

    def __init__(self, lookback_window=52, forecast_horizon=1, patch_length=13, stride=13):
        self.lookback_window = lookback_window  # 52 weeks = 1 year
        self.forecast_horizon = forecast_horizon
        self.patch_length = patch_length  # 13 weeks = quarterly patches
        self.stride = stride  # Non-overlapping patches
        self.store_dept_combinations = None
        self.date_range = None
        self.n_patches = None

    def fit(self, X, y=None):
        """Learn the structure of the time series data"""
        # Get all unique store-department combinations
        self.store_dept_combinations = X[['Store', 'Dept']].drop_duplicates().sort_values(['Store', 'Dept'])

        # Get date range
        self.date_range = pd.date_range(
            start=X['Date'].min(),
            end=X['Date'].max(),
            freq='W'
        )

        # Calculate number of patches
        self.n_patches = (self.lookback_window - self.patch_length) // self.stride + 1

        print(f"PatchTimeSeriesDataProcessor fitted:")
        print(f"- Store-Dept combinations: {len(self.store_dept_combinations)}")
        print(f"- Date range: {len(self.date_range)} weeks")
        print(f"- Lookback window: {self.lookback_window} weeks")
        print(f"- Patch length: {self.patch_length} weeks")
        print(f"- Number of patches per sequence: {self.n_patches}")

        return self

    def transform(self, X):
        """Convert data to patch-based sequences"""
        if self.store_dept_combinations is None:
            raise ValueError("Must call fit() before transform()")

        sequences = []
        targets = []
        metadata = []

        print(f"Creating patch-based sequences for {len(self.store_dept_combinations)} store-dept combinations...")

        for idx, (_, row) in enumerate(self.store_dept_combinations.iterrows()):
            store_id = row['Store']
            dept_id = row['Dept']

            # Get time series for this store-department
            ts_data = X[(X['Store'] == store_id) & (X['Dept'] == dept_id)].copy()

            if len(ts_data) < self.lookback_window + self.forecast_horizon:
                continue

            # Sort by date
            ts_data = ts_data.sort_values('Date')

            # Create sequences with sliding window
            values = ts_data['Weekly_Sales'].values
            features = ts_data.drop(['Store', 'Dept', 'Date', 'Weekly_Sales'], axis=1, errors='ignore')

            # Additional features (if available)
            if len(features.columns) > 0:
                feature_values = features.values
            else:
                feature_values = np.zeros((len(values), 1))  # Dummy feature

            # Create sequences
            for i in range(len(values) - self.lookback_window - self.forecast_horizon + 1):
                # Sales sequence
                sales_seq = values[i:i + self.lookback_window]

                # Features sequence
                feat_seq = feature_values[i:i + self.lookback_window]

                # Combine sales and features
                combined_seq = np.column_stack([sales_seq.reshape(-1, 1), feat_seq])

                # Convert to patches
                patches = self._create_patches(combined_seq)

                # Target
                target = values[i + self.lookback_window:i + self.lookback_window + self.forecast_horizon]

                sequences.append(patches)
                targets.append(target[0] if self.forecast_horizon == 1 else target)
                metadata.append({
                    'store': store_id,
                    'dept': dept_id,
                    'start_date': ts_data.iloc[i]['Date'],
                    'end_date': ts_data.iloc[i + self.lookback_window - 1]['Date']
                })

            if idx % 100 == 0:
                print(f"Processed {idx}/{len(self.store_dept_combinations)} combinations")

        print(f"Created {len(sequences)} patch-based sequences")

        return {
            'sequences': np.array(sequences) if sequences else np.array([]),
            'targets': np.array(targets) if targets else np.array([]),
            'metadata': metadata,
            'n_patches': self.n_patches,
            'patch_length': self.patch_length,
            'n_features': sequences[0].shape[-1] if sequences else 0
        }

    def _create_patches(self, sequence):
        """Convert sequence to patches"""
        patches = []

        for i in range(0, len(sequence) - self.patch_length + 1, self.stride):
            if i + self.patch_length <= len(sequence):
                patch = sequence[i:i + self.patch_length]
                patches.append(patch)

        # Handle remaining data if sequence length is not divisible by patch_length
        remaining = len(sequence) % self.patch_length
        if remaining > 0 and len(patches) < self.n_patches:
            # Pad the last patch
            last_patch = sequence[-self.patch_length:]
            patches.append(last_patch)

        # Ensure we have exactly n_patches
        while len(patches) < self.n_patches:
            patches.append(np.zeros((self.patch_length, sequence.shape[1])))

        return np.array(patches[:self.n_patches])


class WalmartFeatureMerger(BaseEstimator, TransformerMixin):
    """Merges train/test data with stores and features data"""

    def __init__(self):
        self.stores_df = None
        self.features_df = None

    def fit(self, X, y=None, stores_df=None, features_df=None):
        """Store reference data for merging"""
        if stores_df is not None:
            self.stores_df = stores_df.copy()
        if features_df is not None:
            self.features_df = features_df.copy()
        return self

    def transform(self, X):
        """Merge with stores and features data"""
        result = X.copy()

        # Merge with stores data
        if self.stores_df is not None:
            result = result.merge(self.stores_df, on='Store', how='left')

        # Merge with features data
        if self.features_df is not None:
            result = result.merge(self.features_df, on=['Store', 'Date'], how='left')

        return result


class WalmartMissingValueHandler(BaseEstimator, TransformerMixin):
    """Handles missing values in Walmart dataset"""

    def __init__(self, strategy='forward_fill'):
        self.strategy = strategy
        self.fill_values = {}

    def fit(self, X, y=None):
        """Learn fill values for missing data"""
        # Store median values for numerical columns
        numerical_cols = X.select_dtypes(include=[np.number]).columns
        self.fill_values = X[numerical_cols].median().to_dict()
        return self

    def transform(self, X):
        """Fill missing values"""
        result = X.copy()

        # Forward fill for time series data
        if self.strategy == 'forward_fill':
            # Ensure 'Store', 'Dept', 'Date' are present before grouping
            if not all(col in result.columns for col in ['Store', 'Dept', 'Date']):
                 raise ValueError("Input DataFrame must contain 'Store', 'Dept', and 'Date' columns for forward fill.")
            result = result.sort_values(['Store', 'Dept', 'Date'])
            # Identify columns to fill (excluding grouping keys)
            cols_to_fill = [col for col in result.columns if col not in ['Store', 'Dept', 'Date']]
            for col in cols_to_fill:
                result[col] = result.groupby(['Store', 'Dept'])[col].fillna(method='ffill')


        # Fill remaining missing values with median
        for col, fill_val in self.fill_values.items():
            if col in result.columns:
                result[col] = result[col].fillna(fill_val)

        return result

print("✓ Custom transformers defined")

✓ Custom transformers defined


# PatchTST Model Implementation

In [20]:
class PositionalEncoding(nn.Module):
    """Positional encoding for transformer"""

    def __init__(self, d_model, max_len=5000):
        super().__init__()

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)

        self.register_buffer('pe', pe)

    def forward(self, x):
        return x + self.pe[:x.size(0), :]


class PatchEmbedding(nn.Module):
    """Patch embedding layer for PatchTST"""

    def __init__(self, patch_length, n_features, d_model, dropout=0.1):
        super().__init__()
        self.patch_length = patch_length
        self.n_features = n_features
        self.d_model = d_model

        # Linear projection for patches
        self.patch_projection = nn.Linear(patch_length * n_features, d_model)
        self.dropout = nn.Dropout(dropout)

        print(f"PatchEmbedding initialized:")
        print(f"- patch_length: {patch_length}")
        print(f"- n_features: {n_features}")
        print(f"- d_model: {d_model}")
        print(f"- input_dim: {patch_length * n_features}")

    def forward(self, x):
        # x shape: (batch_size, n_patches, patch_length, n_features)
        batch_size, n_patches, patch_length, n_features = x.shape

        # Flatten patches
        x = x.reshape(batch_size, n_patches, -1)  # (batch_size, n_patches, patch_length * n_features)

        # Project to d_model
        x = self.patch_projection(x)  # (batch_size, n_patches, d_model)

        return self.dropout(x)


class TransformerEncoderBlock(nn.Module):
    """Single transformer encoder block"""

    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        super().__init__()

        self.attention = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout)
        )

    def forward(self, x):
        # Self-attention
        attn_output, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_output)

        # Feed-forward
        ff_output = self.feed_forward(x)
        x = self.norm2(x + ff_output)

        return x


class PatchTSTModel(nn.Module):
    """PatchTST model for time series forecasting"""

    def __init__(self, patch_length, n_patches, n_features, forecast_horizon=1,
                 d_model=256, n_heads=8, n_layers=4, d_ff=512, dropout=0.1,
                 channel_independent=True):
        super().__init__()

        self.patch_length = patch_length
        self.n_patches = n_patches
        self.n_features = n_features
        self.forecast_horizon = forecast_horizon
        self.d_model = d_model
        self.channel_independent = channel_independent

        # Channel independence: separate model for each feature
        if channel_independent:
            self.patch_embeddings = nn.ModuleList([
                PatchEmbedding(patch_length, 1, d_model, dropout)
                for _ in range(n_features)
            ])
        else:
            self.patch_embedding = PatchEmbedding(patch_length, n_features, d_model, dropout)

        # Positional encoding
        self.positional_encoding = PositionalEncoding(d_model, max_len=n_patches)

        # Transformer encoder layers
        self.transformer_layers = nn.ModuleList([
            TransformerEncoderBlock(d_model, n_heads, d_ff, dropout)
            for _ in range(n_layers)
        ])

        # Output projection
        if channel_independent:
            # Only predict the first feature (sales)
            self.output_projection = nn.Linear(d_model, forecast_horizon)
        else:
            self.output_projection = nn.Linear(d_model * n_features, forecast_horizon)

        print(f"PatchTSTModel initialized:")
        print(f"- patch_length: {patch_length}")
        print(f"- n_patches: {n_patches}")
        print(f"- n_features: {n_features}")
        print(f"- d_model: {d_model}")
        print(f"- n_heads: {n_heads}")
        print(f"- n_layers: {n_layers}")
        print(f"- channel_independent: {channel_independent}")
        print(f"- forecast_horizon: {forecast_horizon}")

    def forward(self, x):
        # x shape: (batch_size, n_patches, patch_length, n_features)
        batch_size = x.shape[0]

        if self.channel_independent:
            # Process each feature independently
            channel_outputs = []

            for i in range(self.n_features):
                # Extract single feature
                x_channel = x[:, :, :, i:i+1]  # (batch_size, n_patches, patch_length, 1)

                # Patch embedding
                embedded = self.patch_embeddings[i](x_channel)  # (batch_size, n_patches, d_model)

                # Add positional encoding
                embedded = embedded.transpose(0, 1)  # (n_patches, batch_size, d_model)
                embedded = self.positional_encoding(embedded)
                embedded = embedded.transpose(0, 1)  # (batch_size, n_patches, d_model)

                # Transformer layers
                for layer in self.transformer_layers:
                    embedded = layer(embedded)

                # Global average pooling over patches
                pooled = embedded.mean(dim=1)  # (batch_size, d_model)

                channel_outputs.append(pooled)

            # For sales forecasting, we only use the first feature (sales)
            output = self.output_projection(channel_outputs[0])  # (batch_size, forecast_horizon)

        else:
            # Process all features together
            embedded = self.patch_embedding(x)  # (batch_size, n_patches, d_model)

            # Add positional encoding
            embedded = embedded.transpose(0, 1)  # (n_patches, batch_size, d_model)
            embedded = self.positional_encoding(embedded)
            embedded = embedded.transpose(0, 1)  # (batch_size, n_patches, d_model)

            # Transformer layers
            for layer in self.transformer_layers:
                embedded = layer(embedded)

            # Global average pooling
            pooled = embedded.mean(dim=1)  # (batch_size, d_model)

            # Output projection
            output = self.output_projection(pooled)  # (batch_size, forecast_horizon)

        return output.squeeze(-1) if self.forecast_horizon == 1 else output

print("✓ PatchTST model components defined")

✓ PatchTST model components defined


# Dataset and Training Setup

In [21]:
class PatchTSTDataset(Dataset):
    """Dataset class for PatchTST"""

    def __init__(self, sequences, targets):
        self.sequences = torch.FloatTensor(sequences)
        self.targets = torch.FloatTensor(targets)

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx], self.targets[idx]


def create_dataloaders(sequences, targets, train_ratio=0.8, batch_size=32, random_seed=42):
    """Create train and validation dataloaders"""

    # Set random seed for reproducibility
    torch.manual_seed(random_seed)
    np.random.seed(random_seed)

    # Split data
    n_samples = len(sequences)
    n_train = int(n_samples * train_ratio)

    # Random shuffle
    indices = np.random.permutation(n_samples)
    train_indices = indices[:n_train]
    val_indices = indices[n_train:]

    # Create datasets
    train_dataset = PatchTSTDataset(sequences[train_indices], targets[train_indices])
    val_dataset = PatchTSTDataset(sequences[val_indices], targets[val_indices])

    # Create dataloaders
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    print(f"Created dataloaders:")
    print(f"- Training samples: {len(train_dataset)}")
    print(f"- Validation samples: {len(val_dataset)}")
    print(f"- Batch size: {batch_size}")

    return train_loader, val_loader

print("✓ Dataset classes defined")

✓ Dataset classes defined


# Complete PatchTST Pipeline Implementation

In [36]:
def run_patchtst_experiment():
    """Run complete PatchTST experiment"""

    print("\n=== STARTING PATCHTST EXPERIMENT ===")

    # Initialize new wandb run for PatchTST training
    try:
        wandb.finish()  # Close previous run
        wandb.init(
            project="walmart-sales-forecasting",
            name="PatchTST_Training",
            config={
                "model_type": "PatchTST",
                "patch_length": 13,
                "stride": 13,
                "lookback_window": 52,
                "forecast_horizon": 1,
                "d_model": 256,
                "n_heads": 8,
                "n_layers": 4,
                "channel_independent": True,
                "batch_size": 64,
                "learning_rate": 0.001,
                "epochs": 10
            }
        )
    except:
        pass

    # Data preprocessing pipeline
    print("\n=== DATA PREPROCESSING ===")

    # Initialize transformers
    feature_merger = WalmartFeatureMerger()
    missing_handler = WalmartMissingValueHandler()
    ts_processor = PatchTimeSeriesDataProcessor(
        lookback_window=52,
        forecast_horizon=1,
        patch_length=13,
        stride=13
    )

    # Fit transformers
    feature_merger.fit(train_df, stores_df=stores_df, features_df=features_df)

    # Transform data
    merged_data = feature_merger.transform(train_df)
    print(f"After merging: {merged_data.shape}")

    missing_handler.fit(merged_data)
    cleaned_data = missing_handler.transform(merged_data)
    print(f"After cleaning: {cleaned_data.shape}")

    # Handle categorical features (one-hot encode 'Type')
    if 'Type' in cleaned_data.columns:
        cleaned_data = pd.get_dummies(cleaned_data, columns=['Type'], prefix='StoreType', dtype=float)
        print(f"After one-hot encoding 'Type': {cleaned_data.shape}")

    # Drop any remaining non-numerical columns except 'Date', 'Store', 'Dept'
    non_numerical_cols = cleaned_data.select_dtypes(exclude=np.number).columns.tolist()
    cols_to_drop = [col for col in non_numerical_cols if col not in ['Date', 'Store', 'Dept']]
    if cols_to_drop:
        cleaned_data.drop(columns=cols_to_drop, inplace=True)
        print(f"Dropped non-numerical columns: {cols_to_drop}")
        print(f"After dropping non-numerical: {cleaned_data.shape}")

    # Ensure all columns are numerical before processing, keeping Date, Store, Dept for time series processing
    # Start with the required ID columns
    cleaned_data_prepared_cols = ['Store', 'Dept', 'Date']

    # Add all numerical columns from cleaned_data, excluding the ID columns already added
    numerical_cols = cleaned_data.select_dtypes(include=np.number).columns.tolist()
    for col in numerical_cols:
        if col not in cleaned_data_prepared_cols:
            cleaned_data_prepared_cols.append(col)

    # Ensure all columns in cleaned_data_prepared_cols actually exist in cleaned_data
    cleaned_data_prepared_cols = [col for col in cleaned_data_prepared_cols if col in cleaned_data.columns]

    # Create the prepared DataFrame with the correct columns and order
    cleaned_data_prepared = cleaned_data[cleaned_data_prepared_cols].copy()


    # Drop any rows that might have introduced NaNs during feature engineering if necessary
    cleaned_data_prepared.dropna(inplace=True)
    print(f"After dropping rows with NaNs: {cleaned_data_prepared.shape}")

    # Check for duplicate column names before passing to ts_processor
    if cleaned_data_prepared.columns.duplicated().any():
        print("❌ Duplicate column names found in cleaned_data_prepared:")
        print(cleaned_data_prepared.columns[cleaned_data_prepared.columns.duplicated()])
        return None


    # Create time series sequences
    ts_processor.fit(cleaned_data_prepared)
    processed_data = ts_processor.transform(cleaned_data_prepared)

    sequences = processed_data['sequences']
    targets = processed_data['targets']
    n_patches = processed_data['n_patches']
    patch_length = processed_data['patch_length']
    n_features = processed_data['n_features']

    print(f"\nSequences shape: {sequences.shape}")
    print(f"Targets shape: {targets.shape}")
    print(f"Number of patches: {n_patches}")
    print(f"Patch length: {patch_length}")
    print(f"Number of features: {n_features}")


    if sequences.size == 0:
        print("❌ No sequences created. Check data preprocessing.")
        return None

    # Create dataloaders
    train_loader, val_loader = create_dataloaders(
        sequences, targets,
        train_ratio=0.8,
        batch_size=64
    )

    # Initialize model
    print("\n=== MODEL INITIALIZATION ===")

    model_config = {
        "patch_length": patch_length,
        "n_patches": n_patches,
        "n_features": n_features,
        "forecast_horizon": 1,
        "d_model": 256,
        "n_heads": 8,
        "n_layers": 4,
        "d_ff": 512,
        "dropout": 0.1,
        "channel_independent": True
    }

    model = PatchTSTModel(**model_config).to(device)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.MSELoss()

    print(f"\nModel initialized with config: {model_config}")
    print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")

    # Training loop
    print("\n=== TRAINING ===")

    num_epochs = 10
    train_losses = []
    val_losses = []
    best_val_loss = float('inf')
    best_model_state = None

    for epoch in range(num_epochs):
        # Training phase
        model.train()
        train_loss = 0.0

        for batch_idx, (sequences_batch, targets_batch) in enumerate(train_loader):
            sequences_batch = sequences_batch.to(device)
            targets_batch = targets_batch.to(device)

            optimizer.zero_grad()
            outputs = model(sequences_batch)
            loss = criterion(outputs, targets_batch)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

            if batch_idx % 50 == 0:
                print(f'Epoch {epoch+1}/{num_epochs}, Batch {batch_idx}, Loss: {loss.item():.4f}')

        avg_train_loss = train_loss / len(train_loader)
        train_losses.append(avg_train_loss)

        # Validation phase
        model.eval()
        val_loss = 0.0
        all_predictions = []
        all_targets = []

        with torch.no_grad():
            for sequences_batch, targets_batch in val_loader:
                sequences_batch = sequences_batch.to(device)
                targets_batch = targets_batch.to(device)

                outputs = model(sequences_batch)
                loss = criterion(outputs, targets_batch)
                val_loss += loss.item()

                all_predictions.extend(outputs.cpu().numpy())
                all_targets.extend(targets_batch.cpu().numpy())

        avg_val_loss = val_loss / len(val_loader)
        val_losses.append(avg_val_loss)

        # Save best model
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            best_model_state = model.state_dict().copy()

        # Calculate metrics
        mae = mean_absolute_error(all_targets, all_predictions)
        rmse = np.sqrt(mean_squared_error(all_targets, all_predictions))
        r2 = r2_score(all_targets, all_predictions)

        # MAPE calculation with safety check
        def safe_mape(y_true, y_pred):
            mask = np.abs(y_true) > 1e-8
            if mask.sum() == 0:
                return float('inf')
            return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

        mape = safe_mape(np.array(all_targets), np.array(all_predictions))

        print(f"Epoch {epoch+1}/{num_epochs}:")
        print(f"  Train Loss: {avg_train_loss:.4f}")
        print(f"  Val Loss: {avg_val_loss:.4f}")
        print(f"  Val MAE: {mae:.2f}")
        print(f"  Val RMSE: {rmse:.2f}")
        print(f"  Val MAPE: {mape:.2f}%")
        print(f"  Val R²: {r2:.4f}")

        # Log to wandb
        try:
            wandb.log({
                "epoch": epoch + 1,
                "train_loss": avg_train_loss,
                "val_loss": avg_val_loss,
                "val_mae": mae,
                "val_rmse": rmse,
                "val_mape": mape if not np.isinf(mape) else 0.0,
                "val_r2": r2
            })
        except:
            pass

    # Load best model
    model.load_state_dict(best_model_state)

    # Final evaluation on full validation set
    print("\n=== FINAL EVALUATION ===")

    model.eval()
    final_predictions = []
    final_targets = []

    with torch.no_grad():
        for sequences_batch, targets_batch in val_loader:
            sequences_batch = sequences_batch.to(device)
            outputs = model(sequences_batch)
            final_predictions.extend(outputs.cpu().numpy())
            final_targets.extend(targets_batch.cpu().numpy())

    # Calculate final metrics
    final_mae = mean_absolute_error(final_targets, final_predictions)
    final_rmse = np.sqrt(mean_squared_error(final_targets, final_predictions))
    final_r2 = r2_score(final_targets, final_predictions)
    final_mape = safe_mape(np.array(final_targets), np.array(final_predictions))

    print(f"\nFinal Validation Metrics:")
    print(f"MAE: {final_mae:.2f}")
    print(f"RMSE: {final_rmse:.2f}")
    print(f"MAPE: {final_mape:.2f}%")
    print(f"R²: {final_r2:.4f}")

    # Create complete pipeline
    class PatchTSTPipeline:
        """Complete pipeline for PatchTST inference"""

        def __init__(self, feature_merger, missing_handler, ts_processor, model):
            self.feature_merger = feature_merger
            self.missing_handler = missing_handler
            self.ts_processor = ts_processor
            self.model = model
            self.model.eval()

        def predict(self, X_raw, stores_df=None, features_df=None):
            """Make predictions on raw test data"""
            # If auxiliary data provided, update the merger
            if stores_df is not None or features_df is not None:
                self.feature_merger.fit(X_raw, stores_df=stores_df, features_df=features_df)

            # Process through pipeline
            merged_data = self.feature_merger.transform(X_raw)
            cleaned_data = self.missing_handler.transform(merged_data)

            # Handle categorical features (one-hot encode 'Type')
            if 'Type' in cleaned_data.columns:
                cleaned_data = pd.get_dummies(cleaned_data, columns=['Type'], prefix='StoreType', dtype=float)

            # Drop any remaining non-numerical columns except 'Date', 'Store', 'Dept'
            non_numerical_cols = cleaned_data.select_dtypes(exclude=np.number).columns.tolist()
            cols_to_drop = [col for col in non_numerical_cols if col not in ['Date', 'Store', 'Dept']]
            if cols_to_drop:
                cleaned_data.drop(columns=cols_to_drop, inplace=True)

            # Ensure all columns are numerical before processing, keeping Date, Store, Dept for time series processing
            cleaned_data_prepared_cols = ['Store', 'Dept', 'Date']
            numerical_cols = cleaned_data.select_dtypes(include=np.number).columns.tolist()
            for col in numerical_cols:
                if col not in cleaned_data_prepared_cols:
                    cleaned_data_prepared_cols.append(col)

            cleaned_data_prepared_cols = [col for col in cleaned_data_prepared_cols if col in cleaned_data.columns]

            cleaned_data_prepared = cleaned_data[cleaned_data_prepared_cols].copy()

            # Drop any rows that might have introduced NaNs during feature engineering if necessary
            cleaned_data_prepared.dropna(inplace=True)


            # For inference, we need to create sequences from the cleaned data
            processed = self.ts_processor.transform(cleaned_data_prepared)

            if processed['sequences'].size == 0:
                return np.array([])

            # Convert to tensor and predict
            sequences_tensor = torch.FloatTensor(processed['sequences']).to(device)

            with torch.no_grad():
                predictions = self.model(sequences_tensor)

            return predictions.cpu().numpy().flatten()

    # Create final pipeline
    final_pipeline = PatchTSTPipeline(
        feature_merger=feature_merger,
        missing_handler=missing_handler,
        ts_processor=ts_processor,
        model=model
    )

    # Save model
    print("\n=== SAVING MODEL ===")

    try:
        import cloudpickle
    except ImportError:
        import subprocess
        subprocess.check_call(['pip', 'install', 'cloudpickle'])
        import cloudpickle

    # Create filename
    pipeline_filename = f"patchtst_pipeline_{datetime.now().strftime('%Y%m%d_%H%M%S')}.pkl"

    # Save with cloudpickle
    with open(pipeline_filename, 'wb') as f:
        cloudpickle.dump(final_pipeline, f)

    print(f"Pipeline saved as: {pipeline_filename}")

    # Log to wandb
    try:
        # Create model artifact
        model_artifact = wandb.Artifact(
            name="PatchTST_pipeline",
            type="model",
            description="Final PatchTST pipeline for Walmart sales forecasting",
            metadata={
                "val_mae": float(final_mae),
                "val_rmse": float(final_rmse),
                "val_mape": float(final_mape) if not np.isinf(final_mape) else 0.0,
                "val_r2": float(final_r2),
                "sequences_count": len(sequences),
                "validation_samples": len(final_targets),
                "model_type": "PatchTST",
                "patch_length": patch_length,
                "n_patches": n_patches,
                "channel_independent": True,
                "d_model": model_config["d_model"],
                "n_heads": model_config["n_heads"],
                "n_layers": model_config["n_layers"]
            }
        )

        # Add model file to artifact
        model_artifact.add_file(pipeline_filename)

        # Log artifact
        wandb.log_artifact(model_artifact)

        print("✓ Model logged to wandb")

    except Exception as e:
        print(f"⚠️ Failed to log to wandb: {e}")

    print("\n=== EXPERIMENT COMPLETED ===")

    return {
        'pipeline': final_pipeline,
        'model': model,
        'metrics': {
            'mae': final_mae,
            'rmse': final_rmse,
            'mape': final_mape,
            'r2': final_r2
        },
        'train_losses': train_losses,
        'val_losses': val_losses,
        'filename': pipeline_filename
    }

print("✓ PatchTST experiment function defined")

✓ PatchTST experiment function defined


# Cross-Validation Experiment

In [38]:
def run_patchtst_cross_validation(n_splits=3):
    """Run time series cross-validation for PatchTST"""

    print(f"\n=== PATCHTST CROSS-VALIDATION ({n_splits} splits) ===")

    # Initialize new wandb run
    try:
        wandb.finish()
        wandb.init(
            project="walmart-sales-forecasting",
            name="PatchTST_CrossValidation",
            config={
                "model_type": "PatchTST",
                "cv_splits": n_splits,
                "experiment_type": "cross_validation"
            }
        )
    except:
        pass

    # Prepare data (reuse preprocessing from main experiment)
    feature_merger = WalmartFeatureMerger()
    missing_handler = WalmartMissingValueHandler()
    ts_processor = PatchTimeSeriesDataProcessor(
        lookback_window=52,
        forecast_horizon=1,
        patch_length=13,
        stride=13
    )

    # Fit transformers
    feature_merger.fit(train_df, stores_df=stores_df, features_df=features_df)

    # Transform data
    merged_data = feature_merger.transform(train_df)
    print(f"After merging: {merged_data.shape}")

    missing_handler.fit(merged_data)
    cleaned_data = missing_handler.transform(merged_data)
    print(f"After cleaning: {cleaned_data.shape}")

    # Handle categorical features (one-hot encode 'Type')
    if 'Type' in cleaned_data.columns:
        cleaned_data = pd.get_dummies(cleaned_data, columns=['Type'], prefix='StoreType', dtype=float)
        print(f"After one-hot encoding 'Type': {cleaned_data.shape}")

    # Drop any remaining non-numerical columns except 'Date', 'Store', 'Dept'
    non_numerical_cols = cleaned_data.select_dtypes(exclude=np.number).columns.tolist()
    cols_to_drop = [col for col in non_numerical_cols if col not in ['Date', 'Store', 'Dept']]
    if cols_to_drop:
        cleaned_data.drop(columns=cols_to_drop, inplace=True)
        print(f"Dropped non-numerical columns: {cols_to_drop}")
        print(f"After dropping non-numerical: {cleaned_data.shape}")

    # Ensure all columns are numerical before processing, keeping Date, Store, Dept for time series processing
    # Start with the required ID columns
    cleaned_data_prepared_cols = ['Store', 'Dept', 'Date']

    # Add all numerical columns from cleaned_data, excluding the ID columns already added
    numerical_cols = cleaned_data.select_dtypes(include=np.number).columns.tolist()
    for col in numerical_cols:
        if col not in cleaned_data_prepared_cols:
            cleaned_data_prepared_cols.append(col)

    # Ensure all columns in cleaned_data_prepared_cols actually exist in cleaned_data
    cleaned_data_prepared_cols = [col for col in cleaned_data_prepared_cols if col in cleaned_data.columns]

    # Create the prepared DataFrame with the correct columns and order
    cleaned_data_prepared = cleaned_data[cleaned_data_prepared_cols].copy()

    # Drop any rows that might have introduced NaNs during feature engineering if necessary
    cleaned_data_prepared.dropna(inplace=True)
    print(f"After dropping rows with NaNs: {cleaned_data_prepared.shape}")

    # Create time series sequences
    ts_processor.fit(cleaned_data_prepared)
    processed_data = ts_processor.transform(cleaned_data_prepared)

    sequences = processed_data['sequences']
    targets = processed_data['targets']

    if len(sequences) == 0:
        print("❌ No sequences for cross-validation")
        return None

    # Time series split
    tscv = TimeSeriesSplit(n_splits=n_splits)
    cv_results = []

    for fold, (train_idx, val_idx) in enumerate(tscv.split(sequences)):
        print(f"\n--- Fold {fold + 1}/{n_splits} ---")

        # Split data
        train_seq, val_seq = sequences[train_idx], sequences[val_idx]
        train_tgt, val_tgt = targets[train_idx], targets[val_idx]

        print(f"Train: {len(train_seq)}, Val: {len(val_seq)}")

        # Create datasets
        train_dataset = PatchTSTDataset(train_seq, train_tgt)
        val_dataset = PatchTSTDataset(val_seq, val_tgt)

        train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)

        # Initialize model
        model = PatchTSTModel(
            patch_length=processed_data['patch_length'],
            n_patches=processed_data['n_patches'],
            n_features=processed_data['n_features'],
            forecast_horizon=1,
            d_model=128,  # Smaller for faster CV
            n_heads=4,
            n_layers=2,
            channel_independent=True
        ).to(device)

        optimizer = optim.Adam(model.parameters(), lr=0.001)
        criterion = nn.MSELoss()

        # Quick training (fewer epochs for CV)
        num_epochs = 5

        for epoch in range(num_epochs):
            model.train()
            for batch_idx, (seq_batch, tgt_batch) in enumerate(train_loader):
                seq_batch = seq_batch.to(device)
                tgt_batch = tgt_batch.to(device)

                optimizer.zero_grad()
                outputs = model(seq_batch)
                loss = criterion(outputs, tgt_batch)
                loss.backward()
                optimizer.step()

        # Evaluation
        model.eval()
        predictions = []
        actuals = []

        with torch.no_grad():
            for seq_batch, tgt_batch in val_loader:
                seq_batch = seq_batch.to(device)
                outputs = model(seq_batch)
                predictions.extend(outputs.cpu().numpy())
                actuals.extend(tgt_batch.cpu().numpy())

        # Calculate metrics
        mae = mean_absolute_error(actuals, predictions)
        rmse = np.sqrt(mean_squared_error(actuals, predictions))
        r2 = r2_score(actuals, predictions)

        fold_results = {
            'fold': fold + 1,
            'mae': mae,
            'rmse': rmse,
            'r2': r2
        }

        cv_results.append(fold_results)

        print(f"Fold {fold + 1} Results:")
        print(f"  MAE: {mae:.2f}")
        print(f"  RMSE: {rmse:.2f}")
        print(f"  R²: {r2:.4f}")

        # Log to wandb
        try:
            wandb.log({
                f"cv_fold_{fold + 1}_mae": mae,
                f"cv_fold_{fold + 1}_rmse": rmse,
                f"cv_fold_{fold + 1}_r2": r2
            })
        except:
            pass

    # Calculate overall CV statistics
    cv_mae = [r['mae'] for r in cv_results]
    cv_rmse = [r['rmse'] for r in cv_results]
    cv_r2 = [r['r2'] for r in cv_results]

    summary = {
        'mean_mae': np.mean(cv_mae),
        'std_mae': np.std(cv_mae),
        'mean_rmse': np.mean(cv_rmse),
        'std_rmse': np.std(cv_rmse),
        'mean_r2': np.mean(cv_r2),
        'std_r2': np.std(cv_r2)
    }

    print(f"\n=== CROSS-VALIDATION SUMMARY ===")
    print(f"MAE: {summary['mean_mae']:.2f} ± {summary['std_mae']:.2f}")
    print(f"RMSE: {summary['mean_rmse']:.2f} ± {summary['std_rmse']:.2f}")
    print(f"R²: {summary['mean_r2']:.4f} ± {summary['std_r2']:.4f}")

    # Log summary to wandb
    try:
        wandb.log({
            "cv_mean_mae": summary['mean_mae'],
            "cv_std_mae": summary['std_mae'],
            "cv_mean_rmse": summary['mean_rmse'],
            "cv_std_rmse": summary['std_rmse'],
            "cv_mean_r2": summary['mean_r2'],
            "cv_std_r2": summary['std_r2']
        })
    except:
        pass

    return {
        'fold_results': cv_results,
        'summary': summary
    }

# Run cross-validation
cv_results = run_patchtst_cross_validation(n_splits=3)

if cv_results:
    print(f"\n✓ Cross-validation completed successfully!")
else:
    print("❌ Cross-validation failed")


=== PATCHTST CROSS-VALIDATION (3 splits) ===


After merging: (421570, 17)
After cleaning: (421570, 17)
After one-hot encoding 'Type': (421570, 19)
Dropped non-numerical columns: ['IsHoliday_x', 'IsHoliday_y']
After dropping non-numerical: (421570, 17)
After dropping rows with NaNs: (421570, 17)
PatchTimeSeriesDataProcessor fitted:
- Store-Dept combinations: 3331
- Date range: 142 weeks
- Lookback window: 52 weeks
- Patch length: 13 weeks
- Number of patches per sequence: 4
Creating patch-based sequences for 3331 store-dept combinations...
Processed 0/3331 combinations
Processed 100/3331 combinations
Processed 200/3331 combinations
Processed 300/3331 combinations
Processed 400/3331 combinations
Processed 500/3331 combinations
Processed 600/3331 combinations
Processed 700/3331 combinations
Processed 900/3331 combinations
Processed 1000/3331 combinations
Processed 1200/3331 combinations
Processed 1300/3331 combinations
Processed 1400/3331 combinations
Processed 1600/3331 combinations
Processed 1700/3331 combinations
Processed 1800/33

# Run PatchTST Experiment

In [33]:
# Run the complete PatchTST experiment
results = run_patchtst_experiment()

if results:
    print(f"\n🎉 PatchTST experiment completed successfully!")
    print(f"Final metrics: {results['metrics']}")
    print(f"Model saved as: {results['filename']}")
else:
    print("❌ Experiment failed")


=== STARTING PATCHTST EXPERIMENT ===



=== DATA PREPROCESSING ===
After merging: (421570, 17)
After cleaning: (421570, 17)
After one-hot encoding 'Type': (421570, 19)
Dropped non-numerical columns: ['IsHoliday_x', 'IsHoliday_y']
After dropping non-numerical: (421570, 17)
After dropping rows with NaNs: (421570, 17)
PatchTimeSeriesDataProcessor fitted:
- Store-Dept combinations: 3331
- Date range: 142 weeks
- Lookback window: 52 weeks
- Patch length: 13 weeks
- Number of patches per sequence: 4
Creating patch-based sequences for 3331 store-dept combinations...
Processed 0/3331 combinations
Processed 100/3331 combinations
Processed 200/3331 combinations
Processed 300/3331 combinations
Processed 400/3331 combinations
Processed 500/3331 combinations
Processed 600/3331 combinations
Processed 700/3331 combinations
Processed 900/3331 combinations
Processed 1000/3331 combinations
Processed 1200/3331 combinations
Processed 1300/3331 combinations
Processed 1400/3331 combinations
Processed 1600/3331 combinations
Processed 1700/3331 co

# Test Pipeline on Sample Data

In [34]:
# Test the pipeline on sample test data
if results and 'pipeline' in results:
    print("\n=== TESTING PIPELINE ===")

    # Create a small sample from test data for testing
    test_sample = test_df.head(1000).copy()

    try:
        # Test pipeline prediction
        predictions = results['pipeline'].predict(
            test_sample,
            stores_df=stores_df,
            features_df=features_df
        )

        print(f"✓ Pipeline test successful!")
        print(f"Test sample size: {len(test_sample)}")
        print(f"Predictions generated: {len(predictions)}")

        if len(predictions) > 0:
            print(f"Sample predictions: {predictions[:5]}")
            print(f"Prediction statistics:")
            print(f"  Mean: {np.mean(predictions):.2f}")
            print(f"  Std: {np.std(predictions):.2f}")
            print(f"  Min: {np.min(predictions):.2f}")
            print(f"  Max: {np.max(predictions):.2f}")

    except Exception as e:
        print(f"❌ Pipeline test failed: {e}")

else:
    print("⚠️ No pipeline available for testing")


=== TESTING PIPELINE ===
Creating patch-based sequences for 3331 store-dept combinations...
Created 0 patch-based sequences
✓ Pipeline test successful!
Test sample size: 1000
Predictions generated: 0


# Experiment Conclusion and Model Registry

This notebook implements PatchTST for Walmart sales forecasting with:

## Architecture Features:
- **Patch-based processing**: 13-week patches (quarterly) from 52-week lookback
- **Channel independence**: Separate processing for each feature
- **Transformer encoder**: Multi-head attention with positional encoding
- **Optimized for time series**: Designed for sequential forecasting

## Pipeline Components:
1. **WalmartFeatureMerger**: Merges train/test with stores and features
2. **WalmartMissingValueHandler**: Forward-fill and median imputation
3. **PatchTimeSeriesDataProcessor**: Creates patch-based sequences
4. **PatchTSTModel**: Full transformer architecture

## Experiments Conducted:
- **PatchTST_Initial_Setup**: Data exploration and setup
- **PatchTST_Training**: Main model training
- **PatchTST_CrossValidation**: Time series cross-validation

## Model Registry:
The best performing model is saved as a complete pipeline that can be applied directly to raw test data without preprocessing. Use the saved pipeline for production inference.