# Computing Entropy

In this notebook we will walk through entropy computations and some of the options associated with them.

### Imports

For all entropy calculations we will use the neural tangent kernel (NTK). Therefore, all networks are built exclusively with the Neural Tangents library.
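To make the object of interest concrete, here is a minimal NumPy sketch of an empirical NTK for a toy one-hidden-layer network with a scalar output. The names `jacobian_row` and `empirical_ntk` and the toy network itself are illustrative assumptions; this is not how Neural Tangents or ZnRND compute the kernel internally. The empirical NTK of a set of inputs is the Gram matrix of the parameter gradients of the network output.

```python
import numpy as np


def jacobian_row(x, W, v):
    """Gradient of f(x) = v . tanh(W x) with respect to all parameters (W, v)."""
    a = np.tanh(W @ x)                        # hidden activations, shape (h,)
    grad_W = np.outer(v * (1.0 - a**2), x)    # df/dW, shape (h, d)
    grad_v = a                                # df/dv, shape (h,)
    return np.concatenate([grad_W.ravel(), grad_v])


def empirical_ntk(inputs, W, v):
    """NTK[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta> for a scalar-output network."""
    J = np.stack([jacobian_row(x, W, v) for x in inputs])  # (n, n_params)
    return J @ J.T


# Hypothetical toy data: 10 points with 9 features, 32 hidden units.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(10, 9))
W = rng.normal(size=(32, 9)) / np.sqrt(9)
v = rng.normal(size=32) / np.sqrt(32)

ntk = empirical_ntk(inputs, W, v)  # symmetric, positive semi-definite, shape (10, 10)
```

Because the kernel is a Gram matrix, it is symmetric and positive semi-definite, which is what allows its eigenvalues to be interpreted as a distribution when computing an entropy.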

In [1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

import znrnd
from neural_tangents import stax
import optax

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import jax
jax.default_backend()


'cpu'

### Data generators

For the sake of coverage, we will look at the entropy of all the data generators on small networks.

In [2]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=' ', skipinitialspace=True)

dataset = raw_dataset.copy()
dataset = dataset.dropna()
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')


dataset = (dataset-dataset.mean())/dataset.std()

class MPGDataGenerator(znrnd.data.DataGenerator):
    """
    Data generator for the MPG dataset.
    """
    def __init__(self, dataset: pd.DataFrame):
        """
        Constructor for the data generator.

        Parameters
        ----------
        dataset : pd.DataFrame
                Standardized MPG dataset, split here into 80% train and 20% test.
        """
        train_ds = dataset.sample(frac=0.8, random_state=0)
        train_labels = train_ds.pop("MPG")
        test_ds = dataset.drop(train_ds.index)
        test_labels = test_ds.pop("MPG")
        
        self.train_ds = {"inputs": train_ds.to_numpy(), "targets": train_labels.to_numpy().reshape(-1, 1)}
        self.test_ds = {"inputs": test_ds.to_numpy(), "targets": test_labels.to_numpy().reshape(-1, 1)}
        
        self.data_pool = self.train_ds["inputs"]


In [3]:
fuel_generator = MPGDataGenerator(dataset)

### Networks and Models

Now we can define the network architectures for which we will compute the entropy of the data. Let's use a dense network for the fuel data and a convolutional network for the others.

In [4]:
dense_network = stax.serial(
    stax.Dense(32),
    stax.Relu(),
    stax.Dense(32),
)

In [5]:
fuel_model = znrnd.models.NTModel(
    nt_module=dense_network,
    optimizer=optax.adam(learning_rate=0.001),
    loss_fn=znrnd.loss_functions.MeanPowerLoss(order=2),
    input_shape=(9,),
    training_threshold=0.001,
    batch_size=5
)

### Computing the Entropy

Let's compute the entropy of a small subset of the data, say 10 points from each generator. To do so, we will perform the following steps:

1. Select the subset of data.
2. Compute the NTK matrix for each model.
3. Instantiate an entropy calculator for each matrix.
4. Compute the entropy for each matrix.
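For reference, the quantity computed in the last step is the standard von Neumann entropy of the kernel matrix, written here in terms of its eigenvalues $\lambda_i$ for an $N \times N$ matrix. The normalized and effective forms below are an assumed reading of the `normalize_eig` and `effective` options used later in this notebook, not something the notebook states explicitly.

$$
\tilde{\lambda}_i = \frac{\lambda_i}{\sum_{j=1}^{N} \lambda_j}, \qquad
S = -\sum_{i=1}^{N} \tilde{\lambda}_i \ln \tilde{\lambda}_i, \qquad
S_{\text{eff}} = \frac{S}{\ln N}
$$

Under this reading, $S_{\text{eff}}$ lies between 0 (a single non-zero eigenvalue) and 1 (all eigenvalues equal).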

In [6]:
# Step 1

fuel_data = fuel_generator[0:10]

In [7]:
# Step 2

fuel_ntk = fuel_model.compute_ntk(fuel_data, normalize=False)["empirical"]

In [8]:
# Step 3

fuel_calculator = znrnd.analysis.EntropyAnalysis(matrix=fuel_ntk)

In [9]:
# Step 4

fuel_entropy = fuel_calculator.compute_von_neumann_entropy(
    effective=True, normalize_eig=True
)

In [10]:
print(fuel_entropy)

0.73508817
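
As a rough cross-check of what such a calculation involves, the following NumPy sketch computes a von Neumann entropy directly from the eigenvalues of a symmetric matrix. The function `von_neumann_entropy` is a hypothetical stand-in, and the meaning given to `normalize_eig` (rescale the eigenvalues to sum to one) and `effective` (divide by the maximum entropy, the log of the matrix size) is an assumption based on the option names, not a confirmed description of the ZnRND implementation.

```python
import numpy as np


def von_neumann_entropy(matrix, effective=True, normalize_eig=True):
    """Entropy -sum(l * log(l)) over the eigenvalues l of a symmetric PSD matrix."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    # Clip tiny negative values caused by floating point round-off.
    eigenvalues = np.clip(eigenvalues, 0.0, None)

    if normalize_eig:
        # Assumption: treat the spectrum as a probability distribution.
        eigenvalues = eigenvalues / eigenvalues.sum()

    # Zero eigenvalues contribute nothing (0 * log 0 is taken as 0).
    nonzero = eigenvalues[eigenvalues > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))

    n = len(eigenvalues)
    if effective and n > 1:
        # Assumption: scale by the maximum possible entropy, log(n).
        entropy = entropy / np.log(n)
    return float(entropy)


# Identity matrix: all eigenvalues equal, so the effective entropy is 1 (up to rounding).
identity_entropy = von_neumann_entropy(np.eye(4))

# Rank-one matrix: a single non-zero eigenvalue, so the entropy is 0 (up to rounding).
rank_one_entropy = von_neumann_entropy(np.ones((4, 4)))
```

With this convention an effective entropy lies between 0 and 1, which is consistent with the value printed above, although the sketch is not guaranteed to reproduce that number.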
