# Robust-NTF Applied to Missing Data

Here, we generate a synthetic low-rank 3-dimensional tensor from known signals. Some of the data is then removed (set to NaN), mirroring the missing data found in hyperspectral autofluorescence image cubes. The data is then factorized with robust NTF (rNTF).
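
As a rough preview of what "removed (set to NaN)" means here, the sketch below applies the same staircase indexing used later in this notebook to a tensor of ones and counts the missing entries. It runs on the CPU with NumPy only; the name `cube` and the all-ones content are placeholders for illustration, not part of the notebook's pipeline.

```python
import numpy as np

# Placeholder tensor with the same shape as the synthetic data (50 x 96 x 20).
cube = np.ones((50, 96, 20))

# Staircase pattern: slices further along the last mode lose more rows
# from the end of the first mode.
for i in range(cube.shape[-1]):
    cube[-int(1.5 * i) - 1:, :, i + 1:] = np.nan

missing = np.isnan(cube)
missing_fraction = missing.mean()

# Number of missing mode-1 rows in each mode-3 slice (any mode-2 column works).
rows_missing_per_slice = missing[:, 0, :].sum(axis=0)

print(f"missing fraction: {missing_fraction:.3f}")
print(rows_missing_per_slice)
```

The first mode-3 slice stays fully observed, and the number of missing rows never decreases from one slice to the next.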

## Setup

In [None]:
import torch
import numpy as np
import tensorly as tl
import matplotlib.pyplot as plt
import sys
from torch.nn.functional import normalize
from scipy import signal
from scipy.stats import gamma
from tensorly.kruskal_tensor import kruskal_to_tensor
from tensorly.decomposition.candecomp_parafac import non_negative_parafac
from tensorly.tenalg.outer_product import outer

sys.path.append("..")
from robust_ntf.robust_ntf import RntfConfig, RobustNTF, RntfStats

# Use the GPU at fp64 by default:
torch.set_default_tensor_type(torch.cuda.DoubleTensor)

# Make TensorLy use PyTorch:
tl.set_backend('pytorch')

# Set RNG seeds:
torch.manual_seed(33)
np.random.seed(33)

# Set an epsilon to protect against zeros:
eps = 1e-6

## Part 1: Generate synthetic tensor

### Generate ground truth factors:

Here, we generate the ground truth factor matrices used to build a rank-3 synthetic tensor. They include:

* The real part, imaginary part, and envelope of a Gaussian-modulated sinusoid.
* Three different chirp signals.
* Three different Gamma PDFs.
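
To make the role of the factor matrices concrete, here is a minimal NumPy sketch of how three factor matrices with three columns each define a rank-3 tensor as a sum of outer products. The matrices `A`, `B`, and `C` are random placeholders, not the signals listed above, and the sketch runs on the CPU independently of the TensorLy/PyTorch setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder non-negative factor matrices, one column per component.
A = rng.random((50, 3))  # mode-1
B = rng.random((96, 3))  # mode-2
C = rng.random((20, 3))  # mode-3

# Sum over the shared component index r of the outer product a_r o b_r o c_r.
tensor = np.einsum('ir,jr,kr->ijk', A, B, C)

# The same tensor, written as an explicit sum of three rank-1 terms.
explicit = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(3)
)

print(tensor.shape, np.allclose(tensor, explicit))
```

Every unfolding of such a tensor has matrix rank at most 3, which is what "low-rank" refers to in this notebook.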

In [None]:
#######################
## Mode-1 generation ##
#######################

# Sample 50 points:
mode1_support = np.linspace(-1, 1, 2*25, endpoint=False)

# Generate the signal (real part, imaginary part, envelope) and rectify:
x1, x2, x3 = signal.gausspulse(mode1_support, fc=3,
                               retquad=True, retenv=True)
x1 = 2 * np.abs(x1)
x2 = 2 * np.abs(x2)
x3 = 2 * np.abs(x3)

#######################
## Mode-2 generation ##
#######################

mode2_support = np.linspace(-1, 1, 96, endpoint=False)
y1 = signal.chirp(mode2_support, f0=4, t1=-0.5, f1=4)
y2 = signal.chirp(mode2_support, f0=2, t1=0.5, f1=3)
y3 = signal.chirp(mode2_support, f0=1, t1=0.1, f1=2)

y1 = y1 - y1.min()
y2 = y2 - y2.min()
y3 = y3 - y3.min()

#######################
## Mode-3 generation ##
#######################

mode3_support = np.linspace(0, 10, 20)

z1 = gamma(7).pdf(mode3_support)
z2 = gamma(2).pdf(mode3_support)
z3 = gamma(4).pdf(mode3_support)

### Plot ground truth factors:

In [None]:
# Set up figure size:
fig = plt.figure(figsize=(15,8))

# Plot factors:
plt.subplot(2,2,1)
plt.plot(mode1_support, x1,
         mode1_support, x2,
         mode1_support, x3)
plt.gca().set_title('Mode-1 factors')

plt.subplot(2,2,2)
plt.plot(mode2_support, y1,
         mode2_support, y2,
         mode2_support, y3)
plt.gca().set_title('Mode-2 factors')

plt.subplot(2,2,3)
plt.plot(mode3_support, z1,
         mode3_support, z2,
         mode3_support, z3)
plt.gca().set_title('Mode-3 factors')

### Cast factors to PyTorch and/or make positive:

In [None]:
# Mode-1:
X = np.array([x1, x2, x3])
X = torch.from_numpy(X).cuda() + eps

# Mode-2:
Y = np.array([y1, y2, y3])
Y = torch.from_numpy(Y).cuda() + eps

# Mode-3:
Z = np.array([z1, z2, z3])
Z = torch.from_numpy(Z).cuda() + eps

### Construct ground truth tensor to factorize:

In [None]:
# Construct Kruskal tensor in TensorLy format:
ktens = (None, [X.t(), Y.t(), Z.t()])

# Construct dense tensor:
data = kruskal_to_tensor(ktens)

In [None]:
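# Set a staircase of entries to NaN: later mode-3 slices lose more trailing mode-1 rows.
for i in range(data.shape[-1]):
    data[-int(1.5*i)-1:, :, i+1:] = np.nan
# Fraction of entries that are now missing:
np.isnan(data.cpu().numpy()).sum() / data.cpu().numpy().size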

### Visualize some slices of the tensor in false color:

In [None]:
fig = plt.figure(figsize=(15,8))

# XY
plt.subplot(2,2,1)
XY = data[:, :, 0:3].data.cpu().numpy()
XY = XY / np.nanmax(XY)
plt.imshow(XY)

# XZ
plt.subplot(2,2,2)
XZ = data[:, 0:3, :].data.cpu().numpy()
XZ = XZ.transpose([0, 2, 1])
XZ = XZ / np.nanmax(XZ)
plt.imshow(XZ)

# ZY
plt.subplot(2,2,3)
ZY = data[0:3, :, :].data.cpu().numpy()
ZY = ZY.transpose([2, 1, 0])
ZY = ZY / np.nanmax(ZY)
plt.imshow(ZY)

## Part 2: Compare methods

Run the cells below with error tolerance 1e-2 and then 1e-3 and compare.

In [None]:
ERROR_TOLERANCE = 1e-2

cfg = RntfConfig(3, 2, 0.1, ERROR_TOLERANCE, max_iter=200000, print_every=100, save_every=100, save_folder="./out")
rntf = RobustNTF(cfg)
rntf.run(data)
rntf_01_factors = rntf.matrices
rntf_01_outlier = rntf.outlier
vals = rntf.stats

Visualize the objective, error, and reconstruction accuracy statistics. With a tolerance of 1e-3, the error curve should show local minima that dip below 1e-2; at a tolerance of 1e-2, these dips trigger early stopping. The regularization term also drops sharply at one point, a possible indicator of convergence.
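
The early-stopping effect can be illustrated with a small standalone sketch. It assumes the error is the relative change `|obj[t-1] - obj[t]| / obj[t-1]` and that iteration stops the first time this falls below the tolerance; the exact rule inside `RobustNTF` is not shown in this notebook, so treat the helper `first_stop` and the objective trace below as hypothetical.

```python
import numpy as np

def first_stop(objective, tol):
    """Return the first iteration whose relative change in objective is
    below `tol`, or None if the tolerance is never reached."""
    objective = np.asarray(objective, dtype=float)
    rel_change = np.abs(np.diff(objective)) / objective[:-1]
    below = np.flatnonzero(rel_change < tol)
    return int(below[0]) + 1 if below.size else None

# Hypothetical objective trace: a brief plateau, then further descent.
obj = [100.0, 60.0, 59.7, 59.6, 40.0, 20.0, 19.99, 19.985]

stop_loose = first_stop(obj, 1e-2)  # stops on the plateau
stop_tight = first_stop(obj, 1e-3)  # continues past the plateau
print(stop_loose, stop_tight)
```

With the looser tolerance the run halts on the plateau at iteration 2, while the tighter tolerance lets it continue to iteration 6, after the objective has dropped much further.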

In [None]:
fig = plt.figure(figsize=(8,15))

# Use 1-based iteration indices so that log10 is finite at the first entry:
inds = np.arange(1, len(vals["error"]) + 1)
x_end = np.log10(inds[-1]) - 0.02

plt.subplot(4,1,1)
obj = vals[RntfStats.OBJ].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(obj[0:]))
plt.annotate("Objective", xy=(x_end, np.log10(obj[-1])+0.1), horizontalalignment="right")
fit = vals[RntfStats.FIT].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(fit[0:]), linestyle="dashed", color="gray")
plt.annotate("Fitness (Beta Divergence)", xy=(x_end, np.log10(fit[-1])+0.1), horizontalalignment="right")
reg = vals[RntfStats.REG].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(reg[0:]), linestyle=":", color="gray")
plt.annotate("Regularization Term ($L_{2,1}$ Norm)", xy=(x_end, np.log10(reg[-1])-0.08), horizontalalignment="right", verticalalignment="top")
plt.title("Objective Function")

plt.subplot(4,1,2)
err = vals[RntfStats.ERR].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(err[0:]))
plt.title("Relative Change in Objective Function (Error)")

plt.subplot(4,1,3)
L2_acc = vals[RntfStats.L2_ACC].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(L2_acc[0:]))
plt.title("Accuracy ($L_{2}$ Norm)")

plt.subplot(4,1,4)
Linf_acc = vals[RntfStats.LINF_ACC].to_numpy()
plt.plot(np.log10(inds[0:]), np.log10(Linf_acc[0:]))
plt.title("Accuracy ($L_{inf}$ Norm)")

fig.tight_layout()

## Plot some factors:
Here, the mode-3 factors recovered by rNTF are plotted. At a tolerance of 1e-2 they are noisy; at 1e-3 they are very close to the input mode-3 factors.
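
Closeness to the input factors can also be checked numerically. Recovered components may come back in a different order and at a different scale, so the sketch below normalizes each column and matches components by cosine similarity. The helper `match_factors` and the synthetic `truth`/`estimate` arrays are illustrative assumptions, not part of `robust_ntf`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_factors(truth, estimate):
    """Match columns of `estimate` to columns of `truth` by cosine similarity.

    Returns the matched column index for each truth column and the
    similarity of each matched pair.
    """
    t = truth / np.linalg.norm(truth, axis=0, keepdims=True)
    e = estimate / np.linalg.norm(estimate, axis=0, keepdims=True)
    similarity = t.T @ e  # components x components
    # Negate so the assignment maximizes total similarity.
    rows, cols = linear_sum_assignment(-similarity)
    return cols, similarity[rows, cols]

rng = np.random.default_rng(0)
truth = rng.random((20, 3))

# Hypothetical "recovered" factors: permuted, rescaled, slightly perturbed.
estimate = truth[:, [2, 0, 1]] * np.array([5.0, 0.1, 2.0])
estimate = estimate + 1e-4 * rng.random((20, 3))

order, scores = match_factors(truth, estimate)
print(order, scores.round(4))
```

In this notebook, `truth` would correspond to `Z.t()` and `estimate` to `rntf_01_factors[2]`, both moved to the CPU and converted to NumPy arrays first.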

In [None]:
# Set up figure size:
fig = plt.figure(figsize=(15,5))

plt.plot(normalize(rntf_01_factors[2], dim=0).data.cpu().numpy())
plt.gca().set_title('robust-NTF Mode-3 results')

## Visualize reconstructions: Data vs rNTF

At a tolerance of 1e-2 the reconstruction looks nothing like the original dataset. At 1e-3 the two are nearly identical.
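
A visual comparison can be complemented by a single number. Because the data contains NaNs, any such metric should be computed over the observed entries only. The following is a minimal sketch of one possible choice, a relative Frobenius error; the helper `masked_relative_error` is hypothetical and is not necessarily how the `L2_ACC` statistic above is computed.

```python
import numpy as np

def masked_relative_error(data, recon):
    """Relative Frobenius error over the observed (non-NaN) entries of `data`."""
    observed = ~np.isnan(data)
    diff = (data - recon)[observed]
    return np.linalg.norm(diff) / np.linalg.norm(data[observed])

rng = np.random.default_rng(0)
full = rng.random((5, 6, 4))

# Hypothetical missing block, set to NaN as in the notebook's data.
data = full.copy()
data[-2:, :, 2:] = np.nan

err_perfect = masked_relative_error(data, full)       # exact reconstruction
err_scaled = masked_relative_error(data, full * 1.1)  # 10% too large everywhere
print(err_perfect, err_scaled)
```

To apply this to the notebook's tensors, `data` and `rntf_recon` would first need to be converted with `.cpu().numpy()`.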

In [None]:
# Reconstruct rNTF factors:
rntf_recon = torch.zeros(50,96,20)

for i in range(3):
    rntf_recon = rntf_recon + outer([rntf_01_factors[0][:,i],
                                     rntf_01_factors[1][:,i],
                                     rntf_01_factors[2][:,i]])

## Plot results:
# Set up figure size:
fig = plt.figure(figsize=(10, 15))

# Plot original data:
plt.subplot(3,1,1)
plt.imshow(data[:, 0, :].data.cpu().numpy())
plt.gca().set_title('Original data slices')
# Plot rNTF reconstruction:
plt.subplot(3,1,2)
plt.imshow(rntf_recon[:, 0, :].data.cpu().numpy())
plt.gca().set_title('Robust-NTF reconstruction slices')

## Visualize outliers

The percentage value is the maximum value of the outliers divided by the maximum value of the original data. This is a measure of the severity of the outliers. Since there is no noise in the original data, we expect a near-perfect reconstruction and a very small maximum outlier value.

At a tolerance of 1e-2 the outliers are noisy and contain considerable information that belongs in the reconstruction, which is reflected in the large percentage. At 1e-3 almost no information is left in the outliers.

In [None]:
## Plot results:
# Set up figure size:
fig = plt.figure(figsize=(10, 15))

# Plot original data:
plt.subplot(3,1,1)
plt.imshow(data[:, 0, :].data.cpu().numpy())
plt.gca().set_title('Original data slices')

# Plot rNTF outlier tensor:
plt.subplot(3,1,2)
plt.imshow(rntf_01_outlier[25, :, :].data.cpu().numpy())
plt.gca().set_title('Robust-NTF outlier slices')

out = np.nanmax(rntf_01_outlier[:, :, :].cpu().numpy()) / np.nanmax(data[:, :, :].cpu().numpy())
print("Outlier max / data max: {:.1%}".format(out.item()))