# OTUS | $p p > Z > e^+ e^-$

This notebook evaluates the trained model on the validation data and saves the results to an npz file for the ablation study. Details about the problem follow.

This notebook applies OTUS to our first test case: a $Z$ boson decaying into an electron ($e^-$) and a positron ($e^+$).

Our physical latent space consists of the $e^+$, $e^-$ 4-momenta produced by the program MadGraph, and our data space consists of the $e^+$, $e^-$ 4-momenta produced by the program Delphes.

We arrange this information into 8-dimensional vectors:
- Latent space (z): [$p^{\mu}_{e^-}$,$p^{\mu}_{e^+}$]
- Data space (x):   [$p^{\mu}_{e^-}$,$p^{\mu}_{e^+}$]

where $p^{\mu}=[p_x, p_y, p_z, E]$ is the 4-momentum of the given particle.
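As a quick illustration of this layout, the sketch below splits an 8-dimensional event vector into its two 4-momenta and computes the invariant mass of the pair, $m_{ee}^2 = (E_{e^-} + E_{e^+})^2 - |\vec{p}_{e^-} + \vec{p}_{e^+}|^2$. The function name `dilepton_invariant_mass` and the toy event (with energies assumed to be in GeV) are illustrative assumptions, not part of the OTUS code.

```python
import numpy as np

def dilepton_invariant_mass(events):
    """events: (N, 8) array laid out as [px, py, pz, E] for e-, then [px, py, pz, E] for e+."""
    p_minus, p_plus = events[:, :4], events[:, 4:]
    total = p_minus + p_plus                       # 4-momentum of the e+ e- pair
    m2 = total[:, 3] ** 2 - np.sum(total[:, :3] ** 2, axis=1)
    return np.sqrt(np.clip(m2, 0.0, None))         # clip guards against tiny negative round-off

# Toy event (hypothetical): two back-to-back massless leptons with 45 units of energy each
toy_event = np.array([[0.0, 0.0, 45.0, 45.0, 0.0, 0.0, -45.0, 45.0]])
toy_mass = dilepton_invariant_mass(toy_event)
print(toy_mass)
```

For real events, this quantity should peak near the $Z$ boson mass, which makes it a natural check on both the latent-space and data-space samples.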

###### Additional Losses and Constraints:
We impose the following additional losses and constraints in this problem.

First, we impose a constraint on the learned mappings via "anchor losses". This constrains the direction of the electron's 3-momenta when transforming from x-space to z-space and vice versa.

Second, we explicitly enforce the Minkowski metric in the output of the networks. Namely, the networks predict the 3-momenta ($\vec{p}$) of the particles. Energy information is then restored using the Minkowski metric: $E^2 = |\vec{p}|^2 + m^2$.
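The energy-restoration step can be sketched as follows. This is a minimal NumPy illustration under the assumption that the network emits the six 3-momentum components for the two particles and that the target masses are given per particle; the helper name `restore_energy` is hypothetical, and the notebook's model performs the equivalent step internally.

```python
import numpy as np

def restore_energy(three_momenta, masses):
    """Append E = sqrt(|p|^2 + m^2) to each particle's 3-momentum.

    three_momenta: (N, 2, 3) array of [px, py, pz] for the two particles.
    masses:        (2,) array of target invariant masses.
    Returns an (N, 8) array laid out as [px, py, pz, E] per particle.
    """
    p_squared = np.sum(three_momenta ** 2, axis=-1)           # (N, 2)
    energy = np.sqrt(p_squared + masses ** 2)                 # (N, 2)
    four_momenta = np.concatenate([three_momenta, energy[..., None]], axis=-1)
    return four_momenta.reshape(len(three_momenta), -1)

# Toy input (hypothetical values) with massless particles
toy_p = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 5.0]]])
toy_four = restore_energy(toy_p, masses=np.zeros(2))
print(toy_four)
```

Because the energy is computed rather than predicted, every output particle is on-shell by construction.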

See the paper for more details: https://arxiv.org/abs/2101.08944.

# Load Required Libraries

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import torch
import numpy as np
import os

root_dir = '../../../../'

#-- Add utilityFunctions/ to easily use utility .py files --#
import sys
sys.path.append(os.path.join(root_dir, "utilityFunctions/"))

#-- Determine if using GPU or CPU --#
os.environ['CUDA_VISIBLE_DEVICES'] = '2'  # Set to '-1' to disable GPU
from configs import device, data_dims

print('Using device:', device)

Using device: cpu


# Meta Parameters

In [2]:
#-- Set appropriate beta value --#
# allBetas    = [0, 10, 20, 50, 100, 200]
selectBetas = [0, 50, 100]
beta = selectBetas[0]
print('beta = ', beta)

beta =  0


In [3]:
#-- Set directory short-cuts --#
data_directory    = os.path.join(root_dir, "data/")
dataset_name      = 'ppzee'

#-- Set random seeds --#
seed = 1
torch.manual_seed(seed)
np.random.seed(seed)

#-- Set data type --#
from configs import float_type
print('Using data type: ', float_type)

Using data type:  float32


# Load Validation Data for Ablation Study

In [4]:
from func_utils import get_dataset, standardize
from torch.utils.data import DataLoader

#-- Get training and validation dataset --#
dataset = get_dataset(dataset_name, data_dir=data_directory) 
z_data, x_data = dataset['z_data'], dataset['x_data']

MET = False  # Exclude Missing Transverse Energy (MET) from x-space data
if not MET:
    x_data = x_data[:, :-4]  # Drop the last 4 columns
print("Data total shapes: ",z_data.shape, x_data.shape)

x_dim = int(x_data.shape[1])
z_dim = int(z_data.shape[1])

#-- Split into training and validation sets --#
train_size = 291699
val_size = 40000  # Validation set used to evaluate/tune models

x_train = x_data[:train_size, :]
x_val = x_data[train_size:train_size+val_size, :]

z_train = z_data[:train_size, :]
z_val = z_data[train_size:train_size+val_size, :]

#-- Convert data to proper type --#
x_train, x_val, z_train, z_val = [arr.astype(float_type) for arr in (x_train, x_val, z_train, z_val)]

#-- Obtain mean and std information --#
# This is needed to standardize/unstandardize data
x_train_mean, x_train_std = np.mean(x_train, axis=0), np.std(x_train, axis=0) 
z_train_mean, z_train_std = np.mean(z_train, axis=0), np.std(z_train, axis=0)

#-- Set evaluation parameters --#
eval_batch_size = 20000  # Always use high batch size on validation set to accurately assess performance
eval_loaders = DataLoader(dataset=x_val, batch_size=eval_batch_size, shuffle=True), \
               DataLoader(dataset=z_val, batch_size=eval_batch_size, shuffle=True)

print("z_train shape, x_train shape: ", z_train.shape, x_train.shape)
print("z_val   shape, x_val   shape: ", z_val.shape, x_val.shape)


Data total shapes:  (331699, 8) (331699, 8)
z_train shape, x_train shape:  (291699, 8) (291699, 8)
z_val   shape, x_val   shape:  (40000, 8) (40000, 8)
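The per-feature mean and standard deviation computed above are what make standardizing and unstandardizing possible. The sketch below shows the round trip on toy data. The helper names `standardize_sketch` and `unstandardize_sketch` are hypothetical, and the actual signature of `standardize` in `func_utils` may differ.

```python
import numpy as np

def standardize_sketch(data, mean, std):
    """Shift and scale each feature to zero mean and unit variance."""
    return (data - mean) / std

def unstandardize_sketch(data, mean, std):
    """Invert standardize_sketch to recover the raw features."""
    return data * std + mean

# Toy stand-in for a training array (not the real dataset)
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 8)).astype(np.float32)
toy_mean, toy_std = np.mean(toy_train, axis=0), np.std(toy_train, axis=0)

toy_scaled = standardize_sketch(toy_train, toy_mean, toy_std)
toy_restored = unstandardize_sketch(toy_scaled, toy_mean, toy_std)
```

Using statistics from the training split only, as the cell above does, keeps information from the validation set out of the preprocessing.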


In [5]:
#-- Define dictionary object for easy reference --#
all_arrs = {'train': {}, 'val': {}}  # This will store all numpy arrays of interest
all_arrs['train']['x'] = x_train
all_arrs['train']['z'] = z_train
all_arrs['val']['x']   = x_val
all_arrs['val']['z']   = z_val

In [6]:
#-- Define target invariant masses --#
x_inv_masses = np.zeros(2)
z_inv_masses = np.zeros(2)

## Import Training Specific Libraries and Functions

In [7]:
from torch import optim
from ppzee_utils import train_and_val

## Define Meta Network Parameters

In [8]:
cond_noise = True  # Whether to use conditional Gaussian (instead of standard normal) for noise in enc/dec
if cond_noise:
    from models import CondNoiseAutoencoder
    Autoencoder = CondNoiseAutoencoder  # Define alias 
else:
    from models import Autoencoder

## Define Model and Hyperparameters

###### Latent loss function: 
Finite-sample approximation of the Sliced Wasserstein Distance (SWD) between $p(z)$ and $p_E(z) = \int_x p(x) p_E(z|x)\,dx$
- $L_{latent}(Z, \tilde{Z}) = \frac{1}{L \cdot M} \sum_{l=1}^{L} \sum_{m=1}^{M} c((\theta_l \cdot z_m)_{\text{sorted}}, (\theta_l \cdot \tilde{z}_m)_{\text{sorted}})$ 

where $c(\cdot, \cdot) = |\cdot - \cdot|^2$, $\theta_l$ is the $l$-th random projection direction, $L$ is the number of slices, and $M$ is the number of samples.
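The latent loss can be sketched in a few lines of NumPy. This is an illustrative version only: it assumes the projection directions are drawn uniformly on the unit sphere (by normalizing Gaussian vectors) and that both sample sets have the same size $M$. The function name `sliced_wasserstein_sketch` is hypothetical and is not the implementation used for training.

```python
import numpy as np

def sliced_wasserstein_sketch(z, z_tilde, num_slices=50, seed=0):
    """Finite-sample SWD estimate between two (M, d) sample sets with squared-difference cost."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(num_slices, z.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)    # unit-norm directions (assumption)
    proj = np.sort(z @ theta.T, axis=0)                      # (M, L), sorted within each slice
    proj_tilde = np.sort(z_tilde @ theta.T, axis=0)
    return np.mean((proj - proj_tilde) ** 2)                 # averages over all L * M terms

# Toy samples (not physics data)
rng = np.random.default_rng(1)
toy_z = rng.normal(size=(256, 8))
swd_same = sliced_wasserstein_sketch(toy_z, toy_z[::-1])     # same points, different order
swd_shifted = sliced_wasserstein_sketch(toy_z, toy_z + 1.0)  # shifted distribution
print(swd_same, swd_shifted)
```

Sorting each one-dimensional projection is what makes the loss depend only on the two distributions rather than on how individual samples are paired, so reordering a sample leaves the value unchanged (up to round-off).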

###### Data loss function: 
- $L_{data}(X, \tilde{X}) = \frac{1}{M} \sum_{m=1}^M c(x_m,  \tilde{x}_m)$

where $c(\cdot, \cdot) = |\cdot - \cdot|^2$

###### Additional loss functions: 
Encoder and decoder anchor losses
- $L_{A}(X, \tilde{Z}) = \frac{1}{M} \sum_{m=1}^M c_A(x_m, \tilde{z}_m)$
- $L_{A}(Z, \tilde{X}') = \frac{1}{M} \sum_{m=1}^M c_A(z_m, \tilde{x}'_m)$

See the paper for additional details about $c_A$.

###### Full loss function:
- $L_{tot} = \beta L_{data}(X, \tilde{X}) + \lambda L_{latent}(Z, \tilde{Z}) + \nu_e L_{A}(X, \tilde{Z}) + \nu_d L_{A}(Z, \tilde{X}')$
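To make the role of $\beta$ in the ablation concrete, the sketch below combines already-computed loss terms into $L_{tot}$. The function name `total_loss_sketch`, the default values of `nu_e` and `nu_d`, and the numeric loss values are placeholders chosen for illustration; they are not results or settings from this notebook.

```python
def total_loss_sketch(l_data, l_latent, l_anchor_enc, l_anchor_dec,
                      beta, lamb=1.0, nu_e=1.0, nu_d=1.0):
    """Weighted sum of the four loss terms (nu_e and nu_d defaults are hypothetical)."""
    return beta * l_data + lamb * l_latent + nu_e * l_anchor_enc + nu_d * l_anchor_dec

# Placeholder loss values, used only to show how beta reweights the data term
placeholder_terms = dict(l_data=0.02, l_latent=0.5, l_anchor_enc=0.1, l_anchor_dec=0.1)
for beta_value in [0, 50, 100]:
    print(beta_value, total_loss_sketch(beta=beta_value, **placeholder_terms))
```

With $\beta = 0$ the data reconstruction term drops out entirely, so the model is trained on the latent and anchor terms alone.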

###### Core Hyperparameters
The hyperparameter definitions are as follows:
- num_hidden_layers:    The number of hidden layers in both the encoder and decoder networks
- dim_per_hidden_layer: The dimensions per hidden layer in both the encoder and decoder networks
- lr: The learning rate of the networks
- lamb: The $\lambda$ coefficient in front of the latent loss term
- num_slices: Number of random projections used for computing SWD
- epochs: The number of epochs used during training

Hyperparameters for other losses that were tried; using them during main training is currently discouraged:
- tau: Coefficient in front of the alternate data-space loss ("alt_x_loss"), which is the SWD between $p(x)$ and $p_D(x):=\int_z p(z) p_D(x|z)\,dz$
- rho: Coefficient in front of an additional decoder constraint loss (based on soft-penalty approach to learning hard thresholds/ttbar_constraints)

###### Joint Training Hyperparameters

- beta: Coefficient in front of data loss, $L_{data}$     
- nu_e: Coefficient in front of the encoder "anchor loss" 
- nu_d: Coefficient in front of the decoder "anchor loss" 

In [9]:
config = {
    'num_hidden_layers': 1,
    'dim_per_hidden_layer': 128,
    'lr': 0.001,
    'beta': beta,  # weight of the data reconstruction loss
    'lamb': 1.,    # weight of the latent space matching loss
    'tau': 0,  
    'rho': 0,  
}



num_hidden_layers, dim_per_hidden_layer = config['num_hidden_layers'], config['dim_per_hidden_layer']
hidden_layer_dims = num_hidden_layers * [dim_per_hidden_layer]

activation = torch.nn.ReLU
sigma_fun = 'softplus'  # Default is 'exp'
model = Autoencoder(x_dim=x_dim, z_dim=z_dim, hidden_layer_dims=hidden_layer_dims, raw_io=True,
                    x_stats=np.stack([x_train_mean, x_train_std]), z_stats=np.stack([z_train_mean, z_train_std]),
                    x_inv_masses=x_inv_masses, z_inv_masses=z_inv_masses,
                    stoch_enc=True, stoch_dec=True, activation=activation, sigma_fun=sigma_fun)

In [10]:
# Print model 
model

CondNoiseAutoencoder(
  (encoder): CondNoiseMLP(
    (sigma_fun): Softplus(beta=1, threshold=20)
    (output_nn): Sequential(
      (0): Linear(in_features=14, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=6, bias=True)
    )
    (cond_noise_nn): Sequential(
      (0): Linear(in_features=8, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=12, bias=True)
    )
  )
  (decoder): CondNoiseMLP(
    (sigma_fun): Softplus(beta=1, threshold=20)
    (output_nn): Sequential(
      (0): Linear(in_features=14, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=6, bias=True)
    )
    (cond_noise_nn): Sequential(
      (0): Linear(in_features=8, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=12, bias=True)
    )
  )
)

# Evaluate Trained Model on Validation Data

For easier downstream analysis, we evaluate the trained model on the validation dataset and save the results to an npz file.

In [11]:
#-- Reset random seeds --#
seed = 1
torch.manual_seed(seed)
np.random.seed(seed)

#-- Evaluate trained model on validation dataset --#

# Use CPU instead of GPU
model.to('cpu')
model.encoder.output_stats.to('cpu')
model.decoder.output_stats.to('cpu')

#-- Set save directory location for npz files --#
save_dir      = './npzFiles/'
save_filename = f'swae-beta={beta}.npz'

#-- Load model's trained weights and set to evaluation mode --#
model.load_state_dict(torch.load(f'swae-beta={beta}.pkl', map_location=torch.device('cpu')))
model.eval()

CondNoiseAutoencoder(
  (encoder): CondNoiseMLP(
    (sigma_fun): Softplus(beta=1, threshold=20)
    (output_nn): Sequential(
      (0): Linear(in_features=14, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=6, bias=True)
    )
    (cond_noise_nn): Sequential(
      (0): Linear(in_features=8, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=12, bias=True)
    )
  )
  (decoder): CondNoiseMLP(
    (sigma_fun): Softplus(beta=1, threshold=20)
    (output_nn): Sequential(
      (0): Linear(in_features=14, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=6, bias=True)
    )
    (cond_noise_nn): Sequential(
      (0): Linear(in_features=8, out_features=128, bias=True)
      (1): ReLU()
      (2): Linear(in_features=128, out_features=12, bias=True)
    )
  )
)

In [12]:
#-- Validation arrays are already stored in all_arrs --#
# z_val and x_val were defined above
print(all_arrs['val']['x'].shape, all_arrs['val']['z'].shape)

# Evaluate trained model on validation dataset
arrs = all_arrs['val']

arrs['z_decoded'] = model.decode(torch.from_numpy(arrs['z'])) # p_D(x) = \int_z p(z) p_D(x|z)  "x_pred_truth"
arrs['x_encoded'] = model.encode(torch.from_numpy(arrs['x'])) # p_E(z) = \int_x p(x) p_E(z|x)  "z_pred"
arrs['x_reconstructed'] = model.decode(arrs['x_encoded'])     # p_D(y) = \int_x \int_z p(x) p_E(z|x) p_D(y|z) "x_pred"
       
# Feed the same z input to the decoder multiple times and study the stochastic output
num_repeats = 100
num_diff_zs = 100

arrs['z_rep'] = np.array([np.repeat(arrs['z'][i:i+1], num_repeats, axis=0) for i in range(num_diff_zs)])       # "z_fixed"
z_rep_tensor = torch.from_numpy(arrs['z_rep'])                                                                 # tmp
arrs['z_decoded_rep'] = np.array([model.decode(z_rep_tensor[i]).detach().numpy() for i in range(num_diff_zs)]) # "x_pred_truth_fixed"
arrs['x_rep'] = np.array([np.repeat(arrs['x'][i:i+1], num_repeats, axis=0) for i in range(num_diff_zs)])       # "x_fixed"

# Convert all results to numpy arrays
for (field, arr) in arrs.items():
    if isinstance(arr, torch.Tensor):
        arrs[field] = arr.detach().numpy()

(40000, 8) (40000, 8)
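The repeated-input arrays built above have shape `(num_diff_zs, num_repeats, 8)`. The sketch below reproduces that construction on toy data, shows an equivalent vectorized form, and computes a per-input spread across the repeats. The noisy "decoder" here is only a stand-in for `model.decode`, not the trained model.

```python
import numpy as np

num_repeats, num_diff_zs = 100, 100
rng = np.random.default_rng(0)
z_toy = rng.normal(size=(500, 8)).astype(np.float32)   # stand-in for arrs['z']

# Loop form, as in the cell above
z_rep_loop = np.array([np.repeat(z_toy[i:i+1], num_repeats, axis=0) for i in range(num_diff_zs)])

# Equivalent vectorized form: add a repeat axis, then repeat along it
z_rep_vec = np.repeat(z_toy[:num_diff_zs, None, :], num_repeats, axis=1)

# Stand-in stochastic decoder: adds Gaussian noise to each repeated input
decoded_toy = z_rep_vec + rng.normal(scale=0.1, size=z_rep_vec.shape)
spread = decoded_toy.std(axis=1)    # (num_diff_zs, 8): spread of outputs per fixed input
print(z_rep_vec.shape, spread.shape)
```

Reducing over axis 1 gives one summary per fixed input, which is a convenient way to inspect how much the stochastic decoder varies for a given $z$.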


In [13]:
os.makedirs(save_dir, exist_ok=True)  # np.savez fails if the directory does not exist
save_path = os.path.join(save_dir, save_filename)
np.savez(save_path, **all_arrs['val'])
print('Model results saved at', save_path)

Model results saved at ./npzFiles/swae-beta=0.npz
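For downstream analysis, the saved file can be read back with `np.load`, which exposes each array under the key it was saved with. The sketch below demonstrates the round trip on a toy dictionary written to a temporary directory; the toy arrays are placeholders, and the real file contains the keys of `all_arrs['val']`.

```python
import os
import tempfile
import numpy as np

# Toy stand-in for all_arrs['val'] (placeholder arrays, not model output)
toy_results = {
    'x': np.zeros((4, 8), dtype=np.float32),
    'z_decoded': np.ones((4, 8), dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, 'swae-beta=0.npz')
    np.savez(toy_path, **toy_results)
    # NpzFile loads arrays lazily, so copy them out before the file is closed
    with np.load(toy_path) as saved:
        loaded_keys = sorted(saved.files)
        loaded = {key: saved[key] for key in saved.files}

print(loaded_keys)
```

Copying the arrays into a plain dictionary inside the `with` block avoids holding the file open during later analysis.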
