# Bayesian D-PDDM Tutorial

In this tutorial, we will use Bayesian D-PDDM to monitor deteriorating shifts in the UCI Heart Disease dataset. The preprocessed dataset is available in ``data/uci_data/``.

In [1]:
import os
os.chdir('../')
import torch
import numpy as np

SEED = 9927
np.random.seed(SEED)
torch.random.manual_seed(SEED)

<torch._C.Generator at 0x77793fb2a5b0>

### Import data

We begin by importing the data:

In [2]:
from data.data_utils import UCIDataset
data_dict = torch.load('data/uci_data/uci_heart_torch.pt')
uci_dict = {}
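for k, data in data_dict.items():
    # each split is a sequence of (features, label) pairs; unzip into two tuples
    data = list(zip(*data))
    X, y = torch.stack(data[0]), torch.tensor(data[1], dtype=torch.int)

    # min-max normalize each feature to [0, 1], using this split's own statistics
    # (assumes no feature is constant within a split, otherwise max_ - min_ is 0)
    min_ = torch.min(X, dim=0).values
    max_ = torch.max(X, dim=0).values
    X = (X - min_) / (max_ - min_)
    uci_dict[k] = UCIDataset(X, y)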
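
The cell above scales every split with its own minimum and maximum, and a feature that is constant within a split would produce a zero denominator. Whether either point matters for the preprocessed UCI file is not stated here, so the following is only a minimal sketch of a common alternative: fit the statistics on the training rows, reuse them for every other split, and map constant columns to 0. The helper names `fit_min_max` and `apply_min_max` are hypothetical and not part of this repository; the sketch uses plain Python lists so that it runs without `torch`.

```python
def fit_min_max(rows):
    """Return per-column minima and maxima computed from a list of feature rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs, eps=1e-12):
    """Scale rows with the given statistics; (near-)constant columns map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if (hi - lo) > eps else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Toy data: the second column is constant in the training rows.
train_rows = [[1.0, 5.0, 3.0], [3.0, 5.0, 7.0]]
test_rows = [[2.0, 5.0, 9.0]]

# Statistics come from the training rows only and are reused for the test rows.
mins, maxs = fit_min_max(train_rows)
train_scaled = apply_min_max(train_rows, mins, maxs)
test_scaled = apply_min_max(test_rows, mins, maxs)
```

With shared statistics, a test value outside the training range lands outside [0, 1], which keeps the shift visible instead of rescaling it away.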

We will use ``train`` to train the base model and ``valid`` to validate it. ``valid`` will then be reused to train Phi, the distribution of i.i.d. disagreement rates.

We will monitor both ``valid`` (the split used to train Phi, stored below as ``dpddm_train``) and ``iid_test`` in order to validate that our monitor is well-calibrated, i.e. that it resists flagging in-distribution samples, including unseen ones.

Finally, we monitor ``ood_test`` to check that our monitor detects deteriorating changes in the data.

In [11]:
dataset_dict = {}
dataset_dict['train'] = uci_dict['train']
dataset_dict['valid'] = uci_dict['val']
dataset_dict['dpddm_train'] = uci_dict['val']
dataset_dict['dpddm_id'] = uci_dict['iid_test']
dataset_dict['dpddm_ood'] = uci_dict['ood_test']

### Import Bayesian D-PDDM components

Bayesian D-PDDM involves two primary components: 
    
- the base model
- the monitor

The base model depends on the data modality: for tabular data we work with ``MLPModel``, and for images we work with ``ConvModel``.
The monitor takes in a base model, training and validation datasets, and a training configuration. We load the configuration using ``hydra-core``.

In [4]:
import hydra
from omegaconf import DictConfig
from bayesian_dpddm import MLPModel, DPDDMBayesianMonitor
from experiments.utils import get_configs

In [5]:
hydra.initialize(config_path='../experiments/configs', version_base='1.2')
args = hydra.compose(config_name="uci")

In [6]:
model_config, train_config = get_configs(args)

In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
base_model = MLPModel(model_config, train_size=len(dataset_dict['train']))
monitor = DPDDMBayesianMonitor(
        model=base_model,
        trainset=dataset_dict['train'],
        valset=dataset_dict['valid'],
        train_cfg=train_config,
        device=device,
    )

### Base Classifier Training

We are now ready to train the model per our configurations. Simply run ``monitor.train_model``. 

In [8]:
output_metrics = monitor.train_model(tqdm_enabled=True)

  2%|▏         | 1/50 [00:00<00:13,  3.61it/s]

Epoch:  0, train loss: 240.5318
Epoch:  0, valid loss: 2.1850


 22%|██▏       | 11/50 [00:01<00:04,  7.96it/s]

Epoch: 10, train loss: 214.5692
Epoch: 10, valid loss: 1.3339


 42%|████▏     | 21/50 [00:02<00:03,  8.31it/s]

Epoch: 20, train loss: 201.3016
Epoch: 20, valid loss: 1.1879


 62%|██████▏   | 31/50 [00:03<00:02,  9.17it/s]

Epoch: 30, train loss: 192.7989
Epoch: 30, valid loss: 1.0537


 82%|████████▏ | 41/50 [00:05<00:01,  8.80it/s]

Epoch: 40, train loss: 186.4880
Epoch: 40, valid loss: 0.9390


100%|██████████| 50/50 [00:05<00:00,  8.34it/s]


### Training of the maximum i.i.d. disagreement rate distribution

We now have a base Bayesian model fitted to the training data. Next, we estimate how strongly models can disagree with our base classifier when they:

- agree with our base classifier on the training data
- disagree with our base classifier on unseen i.i.d. data

To achieve this, we repeatedly draw a batch of samples from our belief over the decision surface and select the sampled decision surface with the highest disagreement rate. We collect these maximum disagreement rates into ``model.Phi``.
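
The sampling loop described above can be sketched on a toy problem. This is not the library's implementation: it assumes a one-dimensional threshold classifier, a Gaussian "posterior" over the threshold, and bootstrap resampling of the i.i.d. pool, and it leaves out both the temperature parameter and the constraint that sampled models agree with the base classifier on the training data. The functions `predict`, `max_disagreement_rate`, and `build_phi` are hypothetical; only the argument names `n_post_samples`, `data_sample_size`, and `phi_size` echo the configuration keys used in this tutorial.

```python
import random


def predict(threshold, x):
    """Toy classifier: label 1 if x exceeds the threshold, else 0."""
    return 1 if x > threshold else 0


def max_disagreement_rate(base_threshold, xs, n_post_samples, rng, posterior_std=0.1):
    """Sample candidate thresholds and return the largest disagreement rate with the base."""
    base_preds = [predict(base_threshold, x) for x in xs]
    best = 0.0
    for _ in range(n_post_samples):
        # Assumption: the belief over the decision surface is a Gaussian around the base.
        candidate = rng.gauss(base_threshold, posterior_std)
        disagreements = sum(predict(candidate, x) != p for x, p in zip(xs, base_preds))
        best = max(best, disagreements / len(xs))
    return best


def build_phi(base_threshold, pool, n_post_samples, data_sample_size, phi_size, rng):
    """Collect phi_size maximum disagreement rates, each on a fresh bootstrap sample."""
    phi = []
    for _ in range(phi_size):
        xs = [rng.choice(pool) for _ in range(data_sample_size)]
        phi.append(max_disagreement_rate(base_threshold, xs, n_post_samples, rng))
    return phi


rng = random.Random(0)
iid_pool = [rng.random() for _ in range(500)]  # stand-in for in-distribution data
phi = build_phi(0.5, iid_pool, n_post_samples=20, data_sample_size=50, phi_size=100, rng=rng)
```

Each entry of `phi` is the worst-case disagreement found on one in-distribution sample, so the list as a whole approximates how much disagreement is normal when no shift has occurred.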


In [9]:
monitor.pretrain_disagreement_distribution(dataset=dataset_dict['dpddm_train'],
                                           n_post_samples=args.dpddm.n_post_samples,
                                           data_sample_size=args.dpddm.data_sample_size,
                                           Phi_size=args.dpddm.Phi_size, 
                                           temperature=args.dpddm.temp,
                                           )

100%|██████████| 500/500 [00:02<00:00, 189.10it/s]


### Compute FPRs and TPRs

We are essentially done. Our monitor now has both components it needs:

- a trained base classifier on i.i.d. data
- a trained distribution of i.i.d. disagreement rates

The base classifier is now ready to be deployed, as long as we periodically monitor incoming data by running either ``monitor.dpddm_test`` or ``monitor.repeat_tests`` (for repeated testing, useful to compute statistics).
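
To make the role of Phi at deployment time concrete, here is a minimal sketch of one plausible decision rule: flag a deployment sample when its maximum disagreement rate exceeds the (1 - alpha) empirical quantile of Phi. That rule is an assumption made for illustration, and so are the names `empirical_quantile`, `flags_deterioration`, and `alpha`; the tutorial does not show how ``monitor.dpddm_test`` makes its decision internally.

```python
import math


def empirical_quantile(values, q):
    """Return the smallest value whose empirical CDF is at least q (0 < q <= 1)."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[index]


def flags_deterioration(phi, observed_rate, alpha=0.05):
    """Assumed rule: flag when the observed rate exceeds the (1 - alpha) quantile of phi."""
    return observed_rate > empirical_quantile(phi, 1.0 - alpha)


# Toy Phi: in-distribution maximum disagreement rates between 0.01 and 0.20.
phi = [i / 100 for i in range(1, 21)]

in_distribution_flag = flags_deterioration(phi, observed_rate=0.05)
shifted_flag = flags_deterioration(phi, observed_rate=0.60)
```

Under this rule, `alpha` bounds how often an in-distribution sample is flagged by chance, which is the quantity the false positive rates below measure.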

In [10]:
stats = {}
dis_rates = {}

for k,dataset in {
    'dpddm_train': dataset_dict['dpddm_train'],
    'dpddm_id': dataset_dict['dpddm_id'],
    'dpddm_ood': dataset_dict['dpddm_ood']
}.items():
    rate, max_dis_rates = monitor.repeat_tests(n_repeats=args.dpddm.n_repeats,
                                    dataset=dataset, 
                                    n_post_samples=args.dpddm.n_post_samples,
                                    data_sample_size=args.dpddm.data_sample_size,
                                    temperature=args.dpddm.temp
                                    )
    print(f"{k}: {rate}")
    stats[k] = rate
    dis_rates[k] = (np.mean(max_dis_rates), np.std(max_dis_rates))


100%|██████████| 100/100 [00:00<00:00, 188.84it/s]


dpddm_train: 0.04


100%|██████████| 100/100 [00:00<00:00, 188.21it/s]


dpddm_id: 0.0


100%|██████████| 100/100 [00:00<00:00, 189.40it/s]

dpddm_ood: 1.0





In the above, we notice that the TPR for OOD samples is 1.0, i.e. our Bayesian D-PDDM monitor correctly identifies the deteriorating change in the data distribution in every repeated test.

We further notice that the monitor never flags the held-out in-distribution sample (``dpddm_id``: 0.0) and flags the data used to train Phi in only 4% of the tests (``dpddm_train``: 0.04). Indeed, the base classifier achieves similar performance on this held-out set as on the validation set, which is consistent with Bayesian D-PDDM incurring low false positive rates on in-distribution samples.
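
The printed rates can be read as the fraction of repeated tests in which the monitor raised a flag: a false positive rate on in-distribution data and a true positive rate on shifted data. The sketch below shows that arithmetic on made-up flag lists chosen to reproduce 0.04 and 1.0. The per-repeat outcomes of ``monitor.repeat_tests`` are not shown in this tutorial, and `flag_rate` is a hypothetical helper.

```python
def flag_rate(flags):
    """Fraction of repeated tests in which the monitor raised a flag."""
    if not flags:
        raise ValueError("flags must contain at least one test outcome")
    return sum(bool(f) for f in flags) / len(flags)


# Illustrative outcomes for 100 repeats each (not the actual per-repeat results).
in_distribution_flags = [False] * 96 + [True] * 4
ood_flags = [True] * 100

fpr = flag_rate(in_distribution_flags)  # flags on in-distribution data are false positives
tpr = flag_rate(ood_flags)              # flags on shifted data are true positives
```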