## Gaussian Network Moments (GNMs)

In [1]:
import torch
from torch import nn
from pprint import pprint
from types import ModuleType
import network_moments.torch as nm

seed = 73  # for reproducibility

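def traverse(obj, exclude=()):
    """Recursively list the public names of a module and its submodules."""
    data = []
    if not isinstance(obj, ModuleType):
        return data  # only modules are expanded; other objects are leaves
    for e in dir(obj):
        # skip private names and anything explicitly excluded
        if not e.startswith('_') and e not in exclude:
            sub = traverse(obj.__dict__[e], exclude)
            data.append(e if len(sub) == 0 else {e: sub})
    return data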

print(nm.__doc__)
print('Network Moments Structure:')
pprint(traverse(nm, exclude=['tests', 'general']))

Network Moments (NMs).

Let x be a random variable with some mean M and covariance S.
x can be multivariate of size (N), so S of size (N, N) and M of size (N).
S = C - outer_product(M, M), where C is the correlation matrix.
M and C are the expectations of x and outer_product(x, x), respectively.
The n-th moment of x is the expectation of x_to_the_power_n.
The diagonal of S and C are the variance and second_moment, respectively.
The second_moment is the expectation of x_squared.
The variance = second_moment - M_squared.

For any function acting on x (e.g., f(x)),
we want to compute its probability density function (i.e., of f(x)).
A simpler task maybe is to find the n-th-moment of the function for all n > 0.

This module is trying to find closed form expressions for the output
probabilistic moments of some functions given some input distributions.

Network Moments Structure:
[{'gaussian': [{'affine': ['covariance', 'mean', 'variance']},
               {'affine_relu_affine': ['mean',
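
The identities stated in the module docstring can be checked numerically on sampled data. The sketch below is not part of `network_moments`; it uses only the standard library, and the sampling distribution is an arbitrary choice for illustration. Note that the docstring's "correlation matrix" `C` is the uncentered second-moment matrix, i.e. the expectation of `outer_product(x, x)`, not the normalized Pearson correlation.

```python
import random

random.seed(0)

# Draw correlated 2-D samples: the second component depends on the first.
samples = []
for _ in range(10000):
    first = random.gauss(1.0, 2.0)
    samples.append((first, 0.5 * first + random.gauss(-1.0, 1.0)))

n = len(samples)
dim = 2

# M: expectation of x; C: expectation of outer_product(x, x).
M = [sum(s[i] for s in samples) / n for i in range(dim)]
C = [[sum(s[i] * s[j] for s in samples) / n for j in range(dim)]
     for i in range(dim)]

# S = C - outer_product(M, M)
S = [[C[i][j] - M[i] * M[j] for j in range(dim)] for i in range(dim)]

# Direct (biased) covariance estimate, for comparison with S.
S_direct = [[sum((s[i] - M[i]) * (s[j] - M[j]) for s in samples) / n
             for j in range(dim)] for i in range(dim)]

# The diagonals of C and S are the second_moment and the variance.
second_moment = [C[i][i] for i in range(dim)]
variance = [second_moment[i] - M[i] ** 2 for i in range(dim)]

print('S        :', S)
print('S_direct :', S_direct)
print('variance :', variance)
```

Both covariance estimates agree up to floating-point rounding, and `variance` matches the diagonal of `S`.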
   

### Testing the tightness of the expressions on ReLU

In [2]:
help(nm.gaussian.relu.tests.tightness)

Help on function tightness in module network_moments.gaussian.relu.tests:

tightness(length=1, count=1000000, seed=None, dtype=torch.float64, device=None)
    Test the tightness of the expressions against Monte-Carlo estimations.
    
    The expressions are for the ReLU function f(x) = max(x, 0).
    
    Args:
        length: Size of the vector.
        count: Number of samples for Monte-Carlo estimation.
        seed: Seed for the random number generator.
        dtype: The data type.
        device: In which device.
    
    Returns:
        (out_mu, mc_mu), (out_var, mc_var)
        out_mu: The output mean computed using the expressions.
        mc_mu: The output mean estimated using Monte-Carlo sampling.
        out_var: The output variance computed using the expressions.
        mc_var: The output variance estimated using Monte-Carlo sampling.



In [3]:
(out_mu, mc_mu), (out_var, mc_var) = nm.gaussian.relu.tests.tightness(1, seed=seed)
print('Monte-Carlo mean / Analytical mean:')
print((mc_mu / out_mu).cpu().numpy())
print('Monte-Carlo variance / Analytical variance:')
print((mc_var / out_var).cpu().numpy())

Monte-Carlo mean / Analytical mean:
[0.99934677]
Monte-Carlo variance / Analytical variance:
[0.99964285]
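
For a single Gaussian input, the kind of comparison made by `tightness` can be reproduced by hand. The sketch below uses the standard textbook closed forms for the mean and second moment of `max(x, 0)` when `x ~ N(mu, sigma^2)`; whether `nm.gaussian.relu` computes them in exactly this way is an assumption. The helper name `relu_gaussian_moments` and the values of `mu`, `sigma`, and `count` are hypothetical.

```python
import math
import random


def relu_gaussian_moments(mu, sigma):
    """Mean and variance of max(x, 0) for x ~ N(mu, sigma^2)."""
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    return mean, second_moment - mean * mean


random.seed(73)
mu, sigma, count = 0.3, 1.5, 200000

# Monte-Carlo estimation of the output mean and variance
draws = [max(random.gauss(mu, sigma), 0.0) for _ in range(count)]
mc_mu = sum(draws) / count
mc_var = sum((d - mc_mu) ** 2 for d in draws) / (count - 1)

# analytical output mean and variance
out_mu, out_var = relu_gaussian_moments(mu, sigma)

print('Monte-Carlo mean / Analytical mean:', mc_mu / out_mu)
print('Monte-Carlo variance / Analytical variance:', mc_var / out_var)
```

As in the cell above, ratios close to one indicate that the closed forms and the sampled estimates agree.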


### Testing the tightness of the expressions on the affine transformation

In [4]:
(out_mu, mc_mu), (out_var, mc_var) = nm.gaussian.affine.tests.tightness(3, seed=seed)
print('Monte-Carlo mean / Analytical mean:')
print((mc_mu / out_mu).cpu().numpy())
print('Monte-Carlo variance / Analytical variance:')
print((mc_var / out_var).cpu().numpy())

Monte-Carlo mean / Analytical mean:
[0.99832705 0.99963103 0.99941707]
Monte-Carlo variance / Analytical variance:
[0.99934219 0.99814858 1.00044002]
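
For an affine map `y = A*x + b` the output moments are exact for any input distribution with mean `mu` and covariance `cov`: the mean is `A*mu + b` and the covariance is `A*cov*A^T`. The sketch below works through a small example with plain Python lists; the helper names (`matvec`, `matmul`, `transpose`, `affine_moments`) and the numbers are hypothetical and are not the `nm.gaussian.affine` API.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def matmul(A, B):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def affine_moments(mu, cov, A, b):
    """Mean, covariance and variance of y = A*x + b."""
    mean = [m + c for m, c in zip(matvec(A, mu), b)]
    covariance = matmul(matmul(A, cov), transpose(A))
    variance = [covariance[i][i] for i in range(len(covariance))]
    return mean, covariance, variance


# input mean and covariance
mu = [1.0, -2.0]
cov = [[2.0, 0.5], [0.5, 1.0]]

# variables
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b = [0.5, 0.0, -1.0]

mean, covariance, variance = affine_moments(mu, cov, A, b)
print(mean)      # [-2.5, 2.0, 0.0]
print(variance)  # [8.0, 1.0, 22.0]
```

Because these expressions are exact, the ratios printed by the cell above differ from one only through Monte-Carlo sampling error.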


### Checking the correctness of the batch implementation

In [5]:
print(nm.general.affine.tests.batch_mean.__doc__)

Test the correctness of batch implementation of mean().

    This function will stack `[1 * mu, 2 * mu, ..., batch * mu]`.
    Then, it will see whether the batch output is accurate or not.

    Args:
        size: Tuple size of matrix A.
        batch: The batch size > 0.
        dtype: data type.
        device: In which device.
        mu: To test a specific mean mu.
        A: To test a specific A matrix.
        b: To test a specific bias b.

    Returns:
        A scalar, the closer it is to zero,
        the more accurate the implementation.
    


In [6]:
print('If these values are close to zero, the batch implementation is correct:')
print(nm.general.affine.tests.batch_mean())
print(nm.general.affine.tests.batch_variance())
print(nm.general.affine.tests.batch_covariance())
print(nm.gaussian.relu.tests.batch_mean())
print(nm.gaussian.relu.tests.batch_zero_mean_correlation())
print(nm.gaussian.relu.tests.batch_zero_mean_covariance())

If these values are close to zero, the batch implementation is correct:
2.220446049250313e-16
(0.0, 0.0)
(2.220446049250313e-16, 5.551115123125783e-17)
(0.0, 0.0, 0.0)
0.0
0.0


### Testing the tightness of the expressions on affine-ReLU-affine networks

In [7]:
rand = nm.utils.rand
gnm = nm.gaussian.affine_relu_affine
print(gnm.special_variance.__doc__)

Output variance of Affine-ReLU-Affine for special Gaussian input.

    f(x) = B*max(A*x+c1, 0)+c2, where c1 = -A*input_mean.

    For this specific c1, this function doesn't depend on
    neither the input mean nor the biases.

    Args:
        covariance: Input covariance matrix (Batch, Size, Size)
            or variance vector (Batch, Size) for diagonal covariance.
        A: The A matrix (M, Size).
        B: The B matrix (N, M).
        variance: Whether the input covariance is a diagonal matrix.
        stability: For accurate results this should be zero
            if used in training, use a value like 1e-4 for stability.

    Returns:
        Output variance of Affine-ReLU-Affine for Gaussian input
        with mean = `mean` and covariance matrix = `covariance`
        where the bias of the first affine = -A*`mean`.
    


In [8]:
length = 3
count = 1000000
dtype = torch.float64
device = torch.device('cpu', 0)
torch.manual_seed(seed)

# input mean and covariance
mu = torch.randn(length, dtype=dtype, device=device)
cov = rand.definite(length, dtype=dtype, device=device,
                    positive=True, semi=False, norm=1.0)

# variables
A = torch.randn(length, length, dtype=dtype, device=device)
c1 = -A.matmul(mu)  # special bias c1 = -A*mu assumed by special_variance
B = torch.randn(length, length, dtype=dtype, device=device)
c2 = torch.randn(length, dtype=dtype, device=device)

# analytical output mean and variance
out_mu = gnm.mean(mu, cov, A, c1, B, c2)
out_var = gnm.special_variance(cov, A, B)

# Monte-Carlo estimation of the output mean and variance
normal = torch.distributions.MultivariateNormal(mu, cov)
samples = normal.sample((count,))
out_samples = samples.matmul(A.t()) + c1
out_samples = torch.relu(out_samples)
out_samples = out_samples.matmul(B.t()) + c2
mc_mu = torch.mean(out_samples, dim=0)
mc_var = torch.var(out_samples, dim=0)

# printing the ratios
print('Monte-Carlo mean / Analytical mean:')
print((mc_mu / out_mu).cpu().numpy())
print('Monte-Carlo variance / Analytical variance:')
print((mc_var / out_var).cpu().numpy())

Monte-Carlo mean / Analytical mean:
[0.99859776 0.9986586  1.00122188]
Monte-Carlo variance / Analytical variance:
[1.00007968 0.99763406 0.99830018]
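
With the special bias `c1 = -A*input_mean`, the pre-activation `A*x + c1` is a zero-mean Gaussian with covariance `A*covariance*A^T`, which is why neither the input mean nor the biases affect the output variance. The sketch below rebuilds the output mean and variance from standard closed forms for the ReLU of a zero-mean Gaussian (the cross term is the well-known arc-cosine identity). It is an independent illustration under those assumptions, not necessarily the expression used inside `gnm.special_variance`; the helper `relu_zero_mean_moments` and all numbers are hypothetical, and the input covariance is taken to be diagonal to keep the sampling simple.

```python
import math
import random


def relu_zero_mean_moments(cov):
    """Mean vector and covariance matrix of max(z, 0) for z ~ N(0, cov)."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    mean = [s / math.sqrt(2.0 * math.pi) for s in std]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            rho = max(-1.0, min(1.0, cov[i][j] / (std[i] * std[j])))
            theta = math.acos(rho)
            # E[max(z_i, 0) * max(z_j, 0)] for zero-mean jointly Gaussian z
            cross = (std[i] * std[j] / (2.0 * math.pi)
                     * (math.sin(theta) + (math.pi - theta) * rho))
            out[i][j] = cross - mean[i] * mean[j]
    return mean, out


random.seed(73)
idx = range(2)

# input mean and (diagonal) covariance
mu = [0.5, -1.0]
var = [1.0, 0.25]

# variables
A = [[1.0, 2.0], [-1.0, 0.5]]
B = [[0.5, -1.0], [2.0, 1.0]]
c1 = [-sum(A[i][k] * mu[k] for k in idx) for i in idx]  # c1 = -A*mu
c2 = [1.0, -0.5]

# analytical output mean and variance
cov_z = [[sum(A[i][k] * var[k] * A[j][k] for k in idx) for j in idx]
         for i in idx]
relu_mean, relu_cov = relu_zero_mean_moments(cov_z)
out_mu = [sum(B[i][k] * relu_mean[k] for k in idx) + c2[i] for i in idx]
out_var = [sum(B[i][k] * relu_cov[k][m] * B[i][m] for k in idx for m in idx)
           for i in idx]

# Monte-Carlo estimation of the output mean and variance
count = 100000
total = [0.0, 0.0]
total_sq = [0.0, 0.0]
for _ in range(count):
    x = [random.gauss(mu[k], math.sqrt(var[k])) for k in idx]
    h = [max(sum(A[i][k] * x[k] for k in idx) + c1[i], 0.0) for i in idx]
    y = [sum(B[i][k] * h[k] for k in idx) + c2[i] for i in idx]
    for i in idx:
        total[i] += y[i]
        total_sq[i] += y[i] * y[i]
mc_mu = [t / count for t in total]
mc_var = [total_sq[i] / count - mc_mu[i] ** 2 for i in idx]

print('Monte-Carlo mean / Analytical mean:',
      [mc_mu[i] / out_mu[i] for i in idx])
print('Monte-Carlo variance / Analytical variance:',
      [mc_var[i] / out_var[i] for i in idx])
```

The output variance only involves `cov_z` and `B`, mirroring the signature `special_variance(covariance, A, B)`.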


### Linearization

In [9]:
batch = 1
num_classes = 10
image_size = (28, 28)
dtype = torch.float64
device = torch.device('cpu', 0)

size = torch.prod(torch.tensor(image_size)).item()
x = torch.rand(batch, *image_size, dtype=dtype, device=device)
model = nn.Sequential(
    nm.utils.flatten,
    nn.Linear(size, num_classes),
)
model.type(dtype)
model.to(device)

jac, bias = nm.utils.linearize(model, x)
A = list(model.children())[1].weight
print('Tightness of A (best is zero): {}'.format(
    torch.max(torch.abs(jac - A)).item()))

b = list(model.children())[1].bias
print('Tightness of b (best is zero): {}'.format(
    torch.max(torch.abs(bias - b)).item()))

Tightness of A (best is zero): 0.0
Tightness of b (best is zero): 0.0
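
The check above suggests that `linearize` returns the first-order Taylor expansion of the model around `x`, so that `model(x) ≈ jac*x + bias` with `bias = model(x) - jac*x`; for a purely affine model this expansion is exact, which is why both tightness values are zero. The sketch below illustrates the same idea without PyTorch. It swaps in central finite differences for automatic differentiation, so it is not how `nm.utils.linearize` is implemented; `linearize_fd`, `affine`, `W`, and `b` are hypothetical names.

```python
def linearize_fd(f, x0, eps=1e-6):
    """Return (jac, bias) such that f(x) ~ jac*x + bias near x0.

    The Jacobian is approximated with central finite differences.
    """
    y0 = f(x0)
    jac = [[0.0] * len(x0) for _ in y0]
    for j in range(len(x0)):
        hi = list(x0)
        lo = list(x0)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(len(y0)):
            jac[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    # bias = f(x0) - jac*x0
    bias = [y0[i] - sum(jac[i][j] * x0[j] for j in range(len(x0)))
            for i in range(len(y0))]
    return jac, bias


# A small affine "model" whose weights and bias should be recovered.
W = [[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]]
b = [0.25, -1.0]


def affine(x):
    return [sum(w * v for w, v in zip(row, x)) + c for row, c in zip(W, b)]


jac, bias = linearize_fd(affine, [0.3, -0.7, 1.2])
tight_A = max(abs(jac[i][j] - W[i][j]) for i in range(2) for j in range(3))
tight_b = max(abs(bias[i] - b[i]) for i in range(2))
print('Tightness of A (best is zero): {}'.format(tight_A))
print('Tightness of b (best is zero): {}'.format(tight_b))
```

Unlike the autograd-based result above, the finite-difference version is only accurate up to rounding error, so the printed values are small but not necessarily exactly zero.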


### Two-stage linearization

In [10]:
count = 10000
num_classes = 10
image_size = (28, 28)
dtype = torch.float64
device = torch.device('cpu', 0)
gnm = nm.gaussian.affine_relu_affine

size = torch.prod(torch.tensor(image_size)).item()
x = torch.rand(1, *image_size, dtype=dtype, device=device)

# deep model
first_part = nn.Sequential(
    nm.utils.flatten,
    nn.Linear(size, 500),
    nn.ReLU(),
    nn.Linear(500, 500),
    nn.ReLU(),
    nn.Linear(500, 300),
)
first_part.type(dtype)
relu = nn.Sequential(
    nn.ReLU(),
)
relu.type(dtype)
second_part = nn.Sequential(
    nn.Linear(300, 100),
    nn.ReLU(),
    nn.Linear(100, num_classes),
)
second_part.type(dtype)
for part in (first_part, relu, second_part):
    part.to(device)
def model(x):
    return second_part(relu(first_part(x)))

# variables
A, c1 = nm.utils.linearize(first_part, x)
B, c2 = nm.utils.linearize(second_part, relu(first_part(x)).detach())
x.requires_grad_(False)
A.squeeze_()
c1.squeeze_()
B.squeeze_()
c2.squeeze_()

# analytical output mean and variance
mean = x.view(-1)
covariance = rand.definite(size, norm=0.1, dtype=dtype, device=device)
out_mu = gnm.mean(mean, covariance, A, c1, B, c2)
out_var = gnm.special_variance(covariance, A, B)

# Monte-Carlo estimation of the output mean and variance
normal = torch.distributions.MultivariateNormal(mean, covariance)
samples = normal.sample((count,))
out_samples = model(samples.view(-1, *image_size)).detach()
mc_mu = torch.mean(out_samples, dim=0)
mc_var = torch.var(out_samples, dim=0)

# printing the ratios
print('Monte-Carlo mean / Analytical mean:')
print((mc_mu / out_mu).cpu().numpy())
print('Monte-Carlo variance / Analytical variance:')
print((mc_var / out_var).cpu().numpy())

Monte-Carlo mean / Analytical mean:
[0.99706612 0.99764148 0.99880015 1.00019028 1.00096126 0.99965541
 0.99916396 0.99982956 0.99944559 0.99710325]
Monte-Carlo variance / Analytical variance:
[1.28257357 1.52839872 1.34012299 0.98683863 1.08856951 1.27928087
 1.26852927 1.32427244 1.1481256  1.2438608 ]
