# OHLCV Scaler with Close Included in OHLC Collective Normalization

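This notebook tests a collective z-score normalization in which Close joins Open, High, and Low in a single group.
All four price series share one mean and one standard deviation, computed over their pooled observations. Volume is normalized on its own, and the two time features receive fixed location and scale values.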

## Step 0: Setup and Data Preparation

In [1]:
import torch
import numpy as np
from einops import rearrange, repeat, reduce
from uni2ts.common.torch_util import safe_div

# Set seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✓ All dependencies imported successfully")

# Create sample OHLCV data with Close included
# Format: [open, high, low, close, volume, minutes_since_open, day_of_week]
time_steps = 10
num_variates = 7  # [open, high, low, close, volume, minutes_since_open, day_of_week]
patch_size = 1

# Generate realistic OHLCV data
open_data = torch.tensor([100.0, 104.0, 107.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0])
high_data = torch.tensor([105.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0, 122.0, 124.0])
low_data = torch.tensor([99.0, 103.0, 106.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0])
close_data = torch.tensor([102.0, 106.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0, 123.0])
volume_data = torch.tensor([1000000, 1200000, 900000, 1100000, 950000, 1050000, 1150000, 1250000, 1350000, 1450000], dtype=torch.float32)
minutes_data = torch.tensor([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0])
dow_data = torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Combine all features [time, dim]
# Format: [open, high, low, close, volume, minutes_since_open, day_of_week]
features = torch.stack([open_data, high_data, low_data, close_data, volume_data, minutes_data, dow_data], dim=1)
print(f"\nFeatures shape: {features.shape}")
print(f"Features:\n{features}")

# Add patch dimension [time, dim, patch]
features = features.unsqueeze(-1)

# Reshape to packed format: [time, dim, patch] -> [(dim * time), patch]
target_packed = rearrange(features, "t d p -> (d t) p")
print(f"\nPacked target shape: {target_packed.shape}")

# Create sample_id (all same sample)
sample_id = torch.ones(target_packed.shape[0], dtype=torch.long)

# Create variate_id
variate_id = repeat(torch.arange(num_variates), "d -> (d t)", t=time_steps)
print(f"Variate ID shape: {variate_id.shape}")
print(f"Unique variate IDs: {torch.unique(variate_id).tolist()}")

# All observed
observed_mask = torch.ones_like(target_packed, dtype=torch.bool)

print(f"\n✓ Data prepared successfully")

✓ All dependencies imported successfully

Features shape: torch.Size([10, 7])
Features:
tensor([[1.0000e+02, 1.0500e+02, 9.9000e+01, 1.0200e+02, 1.0000e+06, 0.0000e+00,
         0.0000e+00],
        [1.0400e+02, 1.0800e+02, 1.0300e+02, 1.0600e+02, 1.2000e+06, 5.0000e+00,
         0.0000e+00],
        [1.0700e+02, 1.1000e+02, 1.0600e+02, 1.0900e+02, 9.0000e+05, 1.0000e+01,
         0.0000e+00],
        [1.0900e+02, 1.1200e+02, 1.0800e+02, 1.1100e+02, 1.1000e+06, 1.5000e+01,
         0.0000e+00],
        [1.1100e+02, 1.1400e+02, 1.1000e+02, 1.1300e+02, 9.5000e+05, 2.0000e+01,
         0.0000e+00],
        [1.1300e+02, 1.1600e+02, 1.1200e+02, 1.1500e+02, 1.0500e+06, 2.5000e+01,
         1.0000e+00],
        [1.1500e+02, 1.1800e+02, 1.1400e+02, 1.1700e+02, 1.1500e+06, 3.0000e+01,
         1.0000e+00],
        [1.1700e+02, 1.2000e+02, 1.1600e+02, 1.1900e+02, 1.2500e+06, 3.5000e+01,
         1.0000e+00],
        [1.1900e+02, 1.2200e+02, 1.1800e+02, 1.2100e+02, 1.3500e+06, 4.0000e+01,
       

## Step 1: Create Group Mapping for OHLC Collective Normalization

Map each variate to a normalization group:
- OHLC (0, 1, 2, 3) → group 0 (collective: one shared mean and std)
- Volume (4) → group 1 (individual)
- Others (5, 6) → groups 7 and 8 (individual, assigned as variate index + 2)
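
The same mapping can also be written as a lookup table indexed by variate. The sketch below is one possible formulation, not the notebook's own code; `variate_to_group` is a hypothetical name introduced for illustration, and `repeat_interleave` stands in for the `einops.repeat` call used in Step 0.

```python
import torch

num_variates, time_steps = 7, 10

# Same packed layout as the notebook: each variate index repeated over time.
variate_id = torch.arange(num_variates).repeat_interleave(time_steps)

# Hypothetical lookup table: entry v holds the group of variate v.
# OHLC -> 0, volume -> 1, remaining variates -> variate index + 2.
variate_to_group = torch.tensor([0, 0, 0, 0, 1, 7, 8])

# Fancy indexing maps every packed position to its group in one step.
group_id = variate_to_group[variate_id]

groups, counts = torch.unique(group_id, return_counts=True)
print(groups.tolist(), counts.tolist())
```

A table like this keeps the grouping rule in one place, which may be convenient if further variates are added later.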

In [2]:
print("\n" + "="*70)
print("STEP 1: Create Group Mapping for OHLC Collective Normalization")
print("="*70)

# Create group mapping
group_id = torch.zeros_like(variate_id, dtype=torch.long)

# Define indices
open_idx, high_idx, low_idx, close_idx = 0, 1, 2, 3
volume_idx = 4
minutes_idx, dow_idx = 5, 6

# Create masks
ohlc_mask = torch.isin(variate_id, torch.tensor([open_idx, high_idx, low_idx, close_idx]))
volume_mask = (variate_id == volume_idx)
other_mask = ~(ohlc_mask | volume_mask)

# Assign group IDs
group_id[ohlc_mask] = 0  # OHLC group
group_id[volume_mask] = 1  # Volume group
group_id[other_mask] = variate_id[other_mask] + 2  # Individual groups for others

print(f"\nGroup mapping created:")
print(f"  OHLC mask count: {ohlc_mask.sum().item()}")
print(f"  Volume mask count: {volume_mask.sum().item()}")
print(f"  Other mask count: {other_mask.sum().item()}")

print(f"\nGroup ID distribution:")
for g_id in torch.unique(group_id):
    count = (group_id == g_id).sum().item()
    print(f"  Group {g_id}: {count} positions")

print(f"\n✓ Group mapping created successfully")


STEP 1: Create Group Mapping for OHLC Collective Normalization

Group mapping created:
  OHLC mask count: 40
  Volume mask count: 10
  Other mask count: 20

Group ID distribution:
  Group 0: 40 positions
  Group 1: 10 positions
  Group 7: 10 positions
  Group 8: 10 positions

✓ Group mapping created successfully


## Step 2: Create Identity Mask for Sample and Group

In [3]:
print("\n" + "="*70)
print("STEP 2: Create Identity Mask")
print("="*70)

# Create identity mask for sample and group
id_mask = torch.logical_and(
    torch.eq(sample_id.unsqueeze(-1), sample_id.unsqueeze(-2)),
    torch.eq(group_id.unsqueeze(-1), group_id.unsqueeze(-2)),
)

print(f"\nIdentity mask shape: {id_mask.shape}")
print(f"Identity mask dtype: {id_mask.dtype}")

# Count True values per group
print(f"\nIdentity mask statistics:")
print(f"  Total True values: {id_mask.sum().item()}")
print(f"  Total positions: {id_mask.shape[0] * id_mask.shape[1]}")

print(f"\n✓ Identity mask created successfully")


STEP 2: Create Identity Mask

Identity mask shape: torch.Size([70, 70])
Identity mask dtype: torch.bool

Identity mask statistics:
  Total True values: 1900
  Total positions: 4900

✓ Identity mask created successfully
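
The count of 1900 True entries can be checked by hand: each group of size n contributes an n × n block, so 40² + 3 × 10² = 1900. The sketch below reproduces this with plain broadcasting, using the group sizes from Step 1; it is an illustrative check rather than part of the original pipeline.

```python
import torch

# Group sizes from Step 1: OHLC (40 positions) plus three groups of 10.
group_id = torch.tensor([0] * 40 + [1] * 10 + [7] * 10 + [8] * 10)
sample_id = torch.ones_like(group_id)

# Position i may pool statistics with position j only if both the
# sample and the group match.
id_mask = (sample_id[:, None] == sample_id[None, :]) & (
    group_id[:, None] == group_id[None, :]
)

# Every group of size n contributes an n x n block of True entries.
_, counts = torch.unique(group_id, return_counts=True)
expected_true = int((counts ** 2).sum())

print(int(id_mask.sum()), expected_true)
```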


## Step 3: Compute Total Observations per Group

In [4]:
print("\n" + "="*70)
print("STEP 3: Compute Total Observations per Group")
print("="*70)

# Compute total observations per group using einops.reduce
tobs = reduce(
    id_mask * reduce(observed_mask, "... seq dim -> ... 1 seq", "sum"),
    "... seq1 seq2 -> ... seq1 1",
    "sum",
)

print(f"\nTotal observations shape: {tobs.shape}")
print(f"Total observations dtype: {tobs.dtype}")

# Show unique values
unique_tobs = torch.unique(tobs)
print(f"\nUnique observation counts: {unique_tobs.tolist()}")

# Show distribution
print(f"\nObservation count distribution:")
for count in unique_tobs:
    num_positions = (tobs == count).sum().item()
    print(f"  {count.item()} observations: {num_positions} positions")

print(f"\n✓ Total observations computed successfully")


STEP 3: Compute Total Observations per Group

Total observations shape: torch.Size([70, 1])
Total observations dtype: torch.int64

Unique observation counts: [10, 40]

Observation count distribution:
  10 observations: 30 positions
  40 observations: 40 positions

✓ Total observations computed successfully
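
Because every value is observed here, the counts simply equal the group sizes. The toy sketch below (made-up groups and mask, plain tensor operations instead of `einops.reduce`) suggests how a missing value would lower the count for its whole group.

```python
import torch

# Toy example: group 0 has four positions (one unobserved), group 1 has two.
group_id = torch.tensor([0, 0, 0, 0, 1, 1])
observed_mask = torch.tensor([[True], [True], [False], [True], [True], [True]])

id_mask = group_id[:, None] == group_id[None, :]

# Observed entries per position (sum over the patch dimension) ...
obs_per_pos = observed_mask.sum(dim=-1)  # shape [seq]
# ... then summed over all positions that share the same group.
tobs = (id_mask * obs_per_pos[None, :]).sum(dim=-1, keepdim=True)  # shape [seq, 1]

print(tobs.squeeze(-1).tolist())
```

Each position in group 0 sees a count of 3 rather than 4, which is the denominator the later mean and variance steps would use.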


## Step 4: Compute Group-wise Mean (Location Parameter)

In [5]:
print("\n" + "="*70)
print("STEP 4: Compute Group-wise Mean")
print("="*70)

# Compute group-wise mean using einops.reduce
loc_grouped = reduce(
    id_mask * reduce(target_packed * observed_mask, "... seq dim -> ... 1 seq", "sum"),
    "... seq1 seq2 -> ... seq1 1",
    "sum",
)
loc_grouped = safe_div(loc_grouped, tobs)

print(f"\nGroup-wise mean shape: {loc_grouped.shape}")
print(f"Group-wise mean dtype: {loc_grouped.dtype}")

# Show unique values
unique_means = torch.unique(loc_grouped)
print(f"\nUnique mean values: {unique_means.tolist()}")

# Show statistics
print(f"\nMean statistics:")
print(f"  Min: {loc_grouped.min().item():.6f}")
print(f"  Max: {loc_grouped.max().item():.6f}")
print(f"  Mean: {loc_grouped.mean().item():.6f}")

# Show per-group means
print(f"\nMeans per group:")
for g_id in torch.unique(group_id):
    mask = (group_id == g_id)
    group_means = loc_grouped[mask]
    unique_group_means = torch.unique(group_means)
    print(f"  Group {g_id}: {unique_group_means.tolist()}")

print(f"\n✓ Group-wise mean computed successfully")


STEP 4: Compute Group-wise Mean

Group-wise mean shape: torch.Size([70, 1])
Group-wise mean dtype: torch.float32

Unique mean values: [0.5, 22.5, 112.67500305175781, 1140000.0]

Mean statistics:
  Min: 0.500000
  Max: 1140000.000000
  Mean: 162924.781250

Means per group:
  Group 0: [112.67500305175781]
  Group 1: [1140000.0]
  Group 7: [22.5]
  Group 8: [0.5]

✓ Group-wise mean computed successfully
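
Written as a formula, the cell above appears to compute a masked group mean. The notation is introduced here for illustration and does not come from the notebook: $x_{j,p}$ is the packed target at position $j$ and patch index $p$, $m_{j,p} \in \{0, 1\}$ is the observed mask, and $g$ is the set of positions sharing a sample and group.

$$
N_g = \sum_{j \in g} \sum_{p} m_{j,p},
\qquad
\mu_g = \frac{1}{N_g} \sum_{j \in g} \sum_{p} m_{j,p}\, x_{j,p}
$$

For the OHLC group, all 40 values are observed and sum to 4507, so $\mu_g = 4507 / 40 = 112.675$, matching the output for Group 0.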


## Step 5: Compute Group-wise Standard Deviation (Scale Parameter)

In [6]:
print("\n" + "="*70)
print("STEP 5: Compute Group-wise Standard Deviation")
print("="*70)

# Compute group-wise variance using einops.reduce
var_grouped = reduce(
    id_mask
    * reduce(
        ((target_packed - loc_grouped) ** 2) * observed_mask,
        "... seq dim -> ... 1 seq",
        "sum",
    ),
    "... seq1 seq2 -> ... seq1 1",
    "sum",
)
var_grouped = safe_div(var_grouped, (tobs - 1))  # Bessel's correction
scale_grouped = torch.sqrt(var_grouped + 1e-5)  # Add minimum_scale

print(f"\nGroup-wise std shape: {scale_grouped.shape}")
print(f"Group-wise std dtype: {scale_grouped.dtype}")

# Show unique values
unique_stds = torch.unique(scale_grouped)
print(f"\nUnique std values: {unique_stds.tolist()}")

# Show statistics
print(f"\nStd statistics:")
print(f"  Min: {scale_grouped.min().item():.6f}")
print(f"  Max: {scale_grouped.max().item():.6f}")
print(f"  Mean: {scale_grouped.mean().item():.6f}")

# Show per-group stds
print(f"\nStds per group:")
for g_id in torch.unique(group_id):
    mask = (group_id == g_id)
    group_stds = scale_grouped[mask]
    unique_group_stds = torch.unique(group_stds)
    print(f"  Group {g_id}: {unique_group_stds.tolist()}")

print(f"\n✓ Group-wise std computed successfully")


STEP 5: Compute Group-wise Standard Deviation

Group-wise std shape: torch.Size([70, 1])
Group-wise std dtype: torch.float32

Unique std values: [0.5270557999610901, 6.564815521240234, 15.138252258300781, 176068.171875]

Std statistics:
  Min: 0.527056
  Max: 176068.171875
  Mean: 25158.582031

Stds per group:
  Group 0: [6.564815521240234]
  Group 1: [176068.171875]
  Group 7: [15.138252258300781]
  Group 8: [0.5270557999610901]

✓ Group-wise std computed successfully
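
As a sanity check, the Group 0 and Group 1 values above can be reproduced with built-in reductions. This is an independent cross-check under the same settings (Bessel's correction, `1e-5` added to the variance), not the notebook's implementation.

```python
import torch

open_data = torch.tensor([100.0, 104.0, 107.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0])
high_data = torch.tensor([105.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0, 122.0, 124.0])
low_data = torch.tensor([99.0, 103.0, 106.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0])
close_data = torch.tensor([102.0, 106.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0, 123.0])
volume_data = torch.tensor([1000000.0, 1200000.0, 900000.0, 1100000.0, 950000.0,
                            1050000.0, 1150000.0, 1250000.0, 1350000.0, 1450000.0])

# Pool all four price series into one flat vector of 40 values.
ohlc = torch.cat([open_data, high_data, low_data, close_data])

# torch.var divides by N - 1 by default, i.e. it applies Bessel's correction.
ohlc_mean = ohlc.mean()
ohlc_std = torch.sqrt(ohlc.var() + 1e-5)
volume_std = torch.sqrt(volume_data.var() + 1e-5)

print(f"OHLC mean={ohlc_mean.item():.4f}, std={ohlc_std.item():.4f}")
print(f"Volume std={volume_std.item():.2f}")
```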


## Step 6: Apply Group-wise Statistics to All Positions

In [7]:
print("\n" + "="*70)
print("STEP 6: Apply Group-wise Statistics to All Positions")
print("="*70)

# Initialize loc and scale tensors
loc = torch.zeros_like(target_packed, dtype=target_packed.dtype)
scale = torch.ones_like(target_packed, dtype=target_packed.dtype)

print(f"\nInitialized loc shape: {loc.shape}")
print(f"Initialized scale shape: {scale.shape}")

# For each position, find its group and apply the corresponding statistics
for i in range(target_packed.shape[0]):
    s_id = sample_id[i]
    g_id = group_id[i]
    
    # Find the group statistics for this sample and group
    mask = torch.logical_and(
        torch.eq(sample_id, s_id),
        torch.eq(group_id, g_id),
    )
    
    if mask.any():
        # Get the first position with this sample and group
        idx = mask.nonzero(as_tuple=True)[0][0]
        loc[i] = loc_grouped[idx]
        scale[i] = scale_grouped[idx]

print(f"\nApplied group-wise statistics to all positions")

# Show statistics per variate
print(f"\nStatistics per variate:")
for v_id in torch.unique(variate_id):
    mask = (variate_id == v_id)
    var_locs = loc[mask]
    var_scales = scale[mask]
    
    unique_loc = torch.unique(var_locs)
    unique_scale = torch.unique(var_scales)
    
    print(f"\n  Variate {v_id}:")
    print(f"    Unique loc values: {unique_loc.tolist()}")
    print(f"    Unique scale values: {unique_scale.tolist()}")

print(f"\n✓ Group-wise statistics applied successfully")


STEP 6: Apply Group-wise Statistics to All Positions

Initialized loc shape: torch.Size([70, 1])
Initialized scale shape: torch.Size([70, 1])

Applied group-wise statistics to all positions

Statistics per variate:

  Variate 0:
    Unique loc values: [112.67500305175781]
    Unique scale values: [6.564815521240234]

  Variate 1:
    Unique loc values: [112.67500305175781]
    Unique scale values: [6.564815521240234]

  Variate 2:
    Unique loc values: [112.67500305175781]
    Unique scale values: [6.564815521240234]

  Variate 3:
    Unique loc values: [112.67500305175781]
    Unique scale values: [6.564815521240234]

  Variate 4:
    Unique loc values: [1140000.0]
    Unique scale values: [176068.171875]

  Variate 5:
    Unique loc values: [22.5]
    Unique scale values: [15.138252258300781]

  Variate 6:
    Unique loc values: [0.5]
    Unique scale values: [0.5270557999610901]

✓ Group-wise statistics applied successfully
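
Since `loc_grouped` and `scale_grouped` already have one row per packed position (shape `[70, 1]`), the loop above may be unnecessary. The toy sketch below, with made-up values and a simplified group mean, suggests that copying the grouped tensor gives the same result as looking up the first position of each group.

```python
import torch

group_id = torch.tensor([0, 0, 1, 1, 1])
values = torch.tensor([[1.0], [3.0], [10.0], [20.0], [30.0]])

id_mask = group_id[:, None] == group_id[None, :]

# Per-position group mean, shape [seq, 1], analogous to loc_grouped.
group_sum = (id_mask * values.squeeze(-1)[None, :]).sum(dim=-1, keepdim=True)
group_count = id_mask.sum(dim=-1, keepdim=True)
loc_grouped = group_sum / group_count

# Loop-based broadcast: look up the first position of each group.
loc_loop = torch.zeros_like(values)
for i in range(values.shape[0]):
    first = (group_id == group_id[i]).nonzero(as_tuple=True)[0][0]
    loc_loop[i] = loc_grouped[first]

# Loop-free alternative: the grouped tensor is already aligned with positions.
loc_direct = loc_grouped.clone()

print(torch.equal(loc_loop, loc_direct))
```

If this equivalence holds for the real data as well, Step 6 would reduce to two `clone()` calls.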


## Step 7: Apply Mid-Range Normalization for Time Features

In [8]:
print("\n" + "="*70)
print("STEP 7: Apply Mid-Range Normalization for Time Features")
print("="*70)

# Define mid-range parameters
minutes_mid = 195.0
minutes_range = 97.5
dow_mid = 2.0
dow_range = 1.0

# Apply minutes_since_open normalization
minutes_mask = (variate_id == minutes_idx)
if minutes_mask.any():
    loc[minutes_mask] = minutes_mid
    scale[minutes_mask] = minutes_range
    print(f"\nApplied minutes_since_open normalization:")
    print(f"  Positions: {minutes_mask.sum().item()}")
    print(f"  Mid: {minutes_mid}")
    print(f"  Range: {minutes_range}")

# Apply day_of_week normalization
dow_mask = (variate_id == dow_idx)
if dow_mask.any():
    loc[dow_mask] = dow_mid
    scale[dow_mask] = dow_range
    print(f"\nApplied day_of_week normalization:")
    print(f"  Positions: {dow_mask.sum().item()}")
    print(f"  Mid: {dow_mid}")
    print(f"  Range: {dow_range}")

# Verify time features
print(f"\nVerification:")
minutes_loc_unique = torch.unique(loc[minutes_mask])
minutes_scale_unique = torch.unique(scale[minutes_mask])
print(f"  Minutes loc: {minutes_loc_unique.tolist()}")
print(f"  Minutes scale: {minutes_scale_unique.tolist()}")

dow_loc_unique = torch.unique(loc[dow_mask])
dow_scale_unique = torch.unique(scale[dow_mask])
print(f"  Day of week loc: {dow_loc_unique.tolist()}")
print(f"  Day of week scale: {dow_scale_unique.tolist()}")

print(f"\n✓ Mid-range normalization applied successfully")


STEP 7: Apply Mid-Range Normalization for Time Features

Applied minutes_since_open normalization:
  Positions: 10
  Mid: 195.0
  Range: 97.5

Applied day_of_week normalization:
  Positions: 10
  Mid: 2.0
  Range: 1.0

Verification:
  Minutes loc: [195.0]
  Minutes scale: [97.5]
  Day of week loc: [2.0]
  Day of week scale: [1.0]

✓ Mid-range normalization applied successfully
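
To see what these constants do to the data, the sketch below applies them to the sample time features. The constants are taken from the cell above. The remark about a 390-minute session and a Monday-to-Friday week is an assumption, since the notebook does not state where the constants come from.

```python
import torch

minutes = torch.tensor([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0])
dow = torch.tensor([0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0])

# Fixed loc/scale constants from Step 7.
minutes_mid, minutes_range = 195.0, 97.5
dow_mid, dow_range = 2.0, 1.0

minutes_norm = (minutes - minutes_mid) / minutes_range
dow_norm = (dow - dow_mid) / dow_range

print(minutes_norm[0].item(), minutes_norm[-1].item())
print(torch.unique(dow_norm).tolist())
```

Under the assumption that minutes run from 0 to 390 and days from 0 to 4, both features would land in the interval [-2, 2], independent of the statistics of any particular window.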


## Step 8: Verify OHLC Collective Normalization

In [9]:
print("\n" + "="*70)
print("STEP 8: Verify OHLC Collective Normalization")
print("="*70)

print(f"\nVerification of OHLC Collective Normalization:")
open_loc = torch.unique(loc[variate_id == open_idx])
high_loc = torch.unique(loc[variate_id == high_idx])
low_loc = torch.unique(loc[variate_id == low_idx])
close_loc = torch.unique(loc[variate_id == close_idx])

print(f"  Open loc: {open_loc.tolist()}")
print(f"  High loc: {high_loc.tolist()}")
print(f"  Low loc: {low_loc.tolist()}")
print(f"  Close loc: {close_loc.tolist()}")

# Verify they all have the same value
assert len(open_loc) == 1 and len(high_loc) == 1 and len(low_loc) == 1 and len(close_loc) == 1, "OHLC should have single loc value"
assert torch.isclose(open_loc[0], high_loc[0], atol=1e-4), "Open and High should have same loc"
assert torch.isclose(open_loc[0], low_loc[0], atol=1e-4), "Open and Low should have same loc"
assert torch.isclose(open_loc[0], close_loc[0], atol=1e-4), "Open and Close should have same loc"
print(f"  ✓ OHLC collective normalization verified!")

# Verify Volume is different
print(f"\nVerification of Volume Individual Normalization:")
volume_loc = torch.unique(loc[variate_id == volume_idx])
print(f"  Volume loc: {volume_loc.tolist()}")
assert not torch.isclose(volume_loc[0], open_loc[0], atol=1e-4), "Volume should differ from OHLC"
print(f"  ✓ Volume individual normalization verified!")

print(f"\n✓ All verifications passed!")


STEP 8: Verify OHLC Collective Normalization

Verification of OHLC Collective Normalization:
  Open loc: [112.67500305175781]
  High loc: [112.67500305175781]
  Low loc: [112.67500305175781]
  Close loc: [112.67500305175781]
  ✓ OHLC collective normalization verified!

Verification of Volume Individual Normalization:
  Volume loc: [1140000.0]
  ✓ Volume individual normalization verified!

✓ All verifications passed!
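
The assertions above cover `loc` only. A further check, sketched below with built-in reductions rather than the notebook's tensors, is to apply the shared statistics and confirm that the pooled result is standardized while the individual series keep their relative offsets.

```python
import torch

open_data = torch.tensor([100.0, 104.0, 107.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0])
high_data = torch.tensor([105.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0, 122.0, 124.0])
low_data = torch.tensor([99.0, 103.0, 106.0, 108.0, 110.0, 112.0, 114.0, 116.0, 118.0, 120.0])
close_data = torch.tensor([102.0, 106.0, 109.0, 111.0, 113.0, 115.0, 117.0, 119.0, 121.0, 123.0])
ohlc = torch.stack([open_data, high_data, low_data, close_data])  # shape [4, time]

# One shared loc/scale for the whole OHLC group, as in Steps 4 and 5.
loc = ohlc.mean()
scale = torch.sqrt(ohlc.var() + 1e-5)
normalized = (ohlc - loc) / scale

# Pooled over all four series, the result has mean ~0 and std ~1 ...
print(f"pooled mean={normalized.mean().item():.4f}, pooled std={normalized.std().item():.4f}")

# ... but each series keeps its own offset, so High stays above Low.
per_series_mean = normalized.mean(dim=1)
print(per_series_mean.tolist())
```

Because the transform is a single affine map, the ordering Low ≤ Open, Close ≤ High within each bar is preserved after normalization.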


## Summary

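The notebook implements OHLC collective normalization with Close included:

✓ Open, High, Low, Close share the same mean (112.675) and std (≈6.565) as Group 0
✓ Volume has its own mean and std (Group 1)
✓ Time features use fixed mid-range values (minutes: 195.0 / 97.5, day of week: 2.0 / 1.0), which replace the group statistics computed for them in Steps 4-6
✓ Group statistics (Steps 3-5) are computed without Python loops using einops.reduce; only the broadcast in Step 6 iterates over positions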