# 10-714 Project: Needle-DAGMM

Needle (NEcessary Elements of Deep Learning) Implementation for the Deep Autoencoding Gaussian Mixture Model (DAGMM)

## Introduction

In [1]:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/
!mkdir -p 10714
%cd /content/drive/MyDrive/10714/project
!pip3 install pybind11

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive
/content/drive/MyDrive/10714/project
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pybind11
  Downloading pybind11-2.10.2-py3-none-any.whl (222 kB)
     |████████████████████████████████| 222 kB 30.7 MB/s
Installing collected packages: pybind11
Successfully installed pybind11-2.10.2


In [2]:
!make

-- Found pybind11: /usr/local/lib/python3.8/dist-packages/pybind11/include (found version "2.10.2")
-- Found cuda, building cuda backend
Sun Dec 25 00:27:01 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                               

In [3]:
import sys
sys.path.append('./python')

## Needle Framework

The following operators are implemented in `python/needle/ops.py`:

- `squeeze`
- `unsqueeze`
- `bmm`
- `norm`
- `cosine_similarity`
- `pairwise_distance`
- `relative_distance`
- `softmax`
- `Inverse` and `inv`
- `Det` and `det`
- `Diagonal` and `diagonal`
- `Phi` and `phi`
- `Cholesky` and `cholesky`



### Squeeze and Unsqueeze

`needle.ops.squeeze` and `needle.ops.unsqueeze` return a new tensor with a size-1 dimension removed or inserted at the specified position. Both operators are verified against [`torch.squeeze`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) and [`torch.unsqueeze`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html).
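The shape semantics can be sketched in plain NumPy. This is a reference sketch only, not the Needle implementation: the function names mirror the operators, but the signatures are assumptions, and `unsqueeze` here assumes a non-negative `axis`.

```python
import numpy as np

def squeeze(x, axis=None):
    """Remove size-1 dimensions: all of them if axis is None, else only `axis`."""
    if axis is None:
        return x.reshape([s for s in x.shape if s != 1])
    if x.shape[axis] != 1:
        return x  # nothing to remove at this position
    new_shape = list(x.shape)
    del new_shape[axis]
    return x.reshape(new_shape)

def unsqueeze(x, axis):
    """Insert a size-1 dimension at position `axis` (assumed non-negative)."""
    new_shape = list(x.shape)
    new_shape.insert(axis, 1)
    return x.reshape(new_shape)

x = np.zeros((2, 1, 3, 1))
print(squeeze(x).shape)       # (2, 3)
print(squeeze(x, 1).shape)    # (2, 3, 1)
print(unsqueeze(x, 0).shape)  # (1, 2, 1, 3, 1)
```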

In [4]:
!python3 -m pytest -l -v -k "op_squeeze"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 108 deselected

tests/test_ops.py::test_op_squeeze[forward-cpu-x_shape0-None] PASSED     [  6%]
tests/test_ops.py::test_op_squeeze[forward-cpu-x_shape1-0] PASSED        [ 12%]
tests/test_ops.py::test_op_squeeze[forward-cpu-x_shape2-1] PASSED        [ 18%]
tests/test_ops.py::test_op_squeeze[forward-cpu-x_shape3-1] PASSED        [ 25%]
tests/test_ops.py::test_op_squeeze[forward-cuda-x_shape0-None] PASSED    [ 31%]
tests/test_ops.py::test_op_squeeze[forward-cuda-x_shape1-0] PASSED       [ 37%]
tests/test_ops.py::test_op_squeeze[forward-cuda-x_shape2-1] PASSED       [ 43%]
tests/test_ops.py::test_op_squeez

In [5]:
!python3 -m pytest -l -v -k "op_unsqueeze"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 104 deselected

tests/test_ops.py::test_op_unsqueeze[forward-cpu-x_shape0-0] PASSED      [  5%]
tests/test_ops.py::test_op_unsqueeze[forward-cpu-x_shape1-1] PASSED      [ 10%]
tests/test_ops.py::test_op_unsqueeze[forward-cpu-x_shape2-0] PASSED      [ 15%]
tests/test_ops.py::test_op_unsqueeze[forward-cpu-x_shape3-1] PASSED      [ 20%]
tests/test_ops.py::test_op_unsqueeze[forward-cpu-x_shape4-2] PASSED      [ 25%]
tests/test_ops.py::test_op_unsqueeze[forward-cuda-x_shape0-0] PASSED     [ 30%]
tests/test_ops.py::test_op_unsqueeze[forward-cuda-x_shape1-1] PASSED     [ 35%]
tests/test_ops.py::test_op_unsque

### Norm

`needle.ops.norm` computes the vector norm of a tensor. The operator is verified against [`torch.norm`](https://pytorch.org/docs/stable/generated/torch.norm.html).

In [6]:
!python3 -m pytest -l -v -k "op_norm"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 112 deselected

tests/test_ops.py::test_op_norm[forward-cpu-x_shape0-0] PASSED           [  8%]
tests/test_ops.py::test_op_norm[forward-cpu-x_shape1-0] PASSED           [ 16%]
tests/test_ops.py::test_op_norm[forward-cpu-x_shape2-1] PASSED           [ 25%]
tests/test_ops.py::test_op_norm[forward-cuda-x_shape0-0] PASSED          [ 33%]
tests/test_ops.py::test_op_norm[forward-cuda-x_shape1-0] PASSED          [ 41%]
tests/test_ops.py::test_op_norm[forward-cuda-x_shape2-1] PASSED          [ 50%]
tests/test_ops.py::test_op_norm[backward-cpu-x_shape0-0] PASSED          [ 58%]
tests/test_ops.py::test_op_norm[b

### Cosine Similarity

`needle.ops.cosine_similarity` computes the cosine similarity of two vectors:
$$ \frac{\mathbf{x} \cdot \mathbf{x}'}{\lVert \mathbf{x} \rVert_2 \lVert \mathbf{x}' \rVert_2} . $$
The operator is verified against [`torch.nn.functional.cosine_similarity`](https://pytorch.org/docs/stable/generated/torch.nn.functional.cosine_similarity.html).
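As a point of reference, the formula can be written row-wise in NumPy. This is a minimal sketch rather than the Needle operator; the `axis` and `eps` parameters are assumptions, with `eps` guarding against division by zero for zero-length vectors.

```python
import numpy as np

def cosine_similarity(x1, x2, axis=1, eps=1e-8):
    """Cosine similarity between corresponding vectors of x1 and x2 along `axis`."""
    dot = np.sum(x1 * x2, axis=axis)
    norm1 = np.linalg.norm(x1, axis=axis)
    norm2 = np.linalg.norm(x2, axis=axis)
    # Clamp the denominator so zero vectors do not produce NaN (assumed convention).
    return dot / np.maximum(norm1 * norm2, eps)

a = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([[0.0, 1.0], [2.0, 2.0]])
print(cosine_similarity(a, b))  # orthogonal pair, then parallel pair
```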

In [7]:
!python3 -m pytest -l -v -k "op_cosine_similarity"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 112 deselected

tests/test_ops.py::test_op_cosine_similarity[forward-cpu-x_shape0-0] PASSED [  8%]
tests/test_ops.py::test_op_cosine_similarity[forward-cpu-x_shape1-0] PASSED [ 16%]
tests/test_ops.py::test_op_cosine_similarity[forward-cpu-x_shape2-1] PASSED [ 25%]
tests/test_ops.py::test_op_cosine_similarity[forward-cuda-x_shape0-0] PASSED [ 33%]
tests/test_ops.py::test_op_cosine_similarity[forward-cuda-x_shape1-0] PASSED [ 41%]
tests/test_ops.py::test_op_cosine_similarity[forward-cuda-x_shape2-1] PASSED [ 50%]
tests/test_ops.py::test_op_cosine_similarity[backward-cpu-x_shape0-0] PASSED [ 58%]
tests/te

### Pairwise Distance

`needle.ops.pairwise_distance` computes the pairwise distance between input vectors. The operator is verified against [`torch.nn.functional.pairwise_distance`](https://pytorch.org/docs/stable/generated/torch.nn.functional.pairwise_distance.html).

In [8]:
!python3 -m pytest -l -v -k "op_pairwise_distance"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 116 deselected

tests/test_ops.py::test_op_pairwise_distance[forward-cpu-x_shape0-0] PASSED [ 12%]
tests/test_ops.py::test_op_pairwise_distance[forward-cpu-x_shape1-1] PASSED [ 25%]
tests/test_ops.py::test_op_pairwise_distance[forward-cuda-x_shape0-0] PASSED [ 37%]
tests/test_ops.py::test_op_pairwise_distance[forward-cuda-x_shape1-1] PASSED [ 50%]
tests/test_ops.py::test_op_pairwise_distance[backward-cpu-x_shape0-0] PASSED [ 62%]
tests/test_ops.py::test_op_pairwise_distance[backward-cpu-x_shape1-1] PASSED [ 75%]
tests/test_ops.py::test_op_pairwise_distance[backward-cuda-x_shape0-0] PASSED [ 87%]
tests/

### Softmax

`needle.ops.softmax` applies the softmax function along a given dimension. The operator is verified against [`torch.nn.functional.softmax`](https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html).
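A numerically stable softmax can be sketched in NumPy as follows. This is an illustration, not the Needle code; subtracting the per-row maximum before exponentiating is a common stabilisation and is assumed here rather than taken from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the maximum to avoid overflow in exp."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

logits = np.array([[1000.0, 1000.0], [0.0, np.log(3.0)]])
print(softmax(logits, axis=1))  # each row sums to 1, with no overflow for large inputs
```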

In [9]:
!python3 -m pytest -l -v -k "op_softmax"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 112 deselected

tests/test_ops.py::test_op_softmax[forward-cpu-x_shape0-0] PASSED        [  8%]
tests/test_ops.py::test_op_softmax[forward-cpu-x_shape1-1] PASSED        [ 16%]
tests/test_ops.py::test_op_softmax[forward-cpu-x_shape2-1] PASSED        [ 25%]
tests/test_ops.py::test_op_softmax[forward-cuda-x_shape0-0] PASSED       [ 33%]
tests/test_ops.py::test_op_softmax[forward-cuda-x_shape1-1] PASSED       [ 41%]
tests/test_ops.py::test_op_softmax[forward-cuda-x_shape2-1] PASSED       [ 50%]
tests/test_ops.py::test_op_softmax[backward-cpu-x_shape0-0] PASSED       [ 58%]
tests/test_ops.py::test_op_softma

### Batch Matrix Multiplication

`needle.ops.bmm` computes the batched matrix-matrix product of two 3-D tensors. The operator is verified against [`torch.bmm`](https://pytorch.org/docs/stable/generated/torch.bmm.html).
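For reference, the forward product and its standard reverse-mode rule ($ \bar{A} = \bar{C} B^{T} $, $ \bar{B} = A^{T} \bar{C} $ per batch element) can be sketched in NumPy. The function names and shapes below are illustrative assumptions, not the Needle API.

```python
import numpy as np

def bmm(A, B):
    """Batched product: A is (b, n, m), B is (b, m, p), result is (b, n, p)."""
    return np.einsum('bnm,bmp->bnp', A, B)

def bmm_backward(A, B, C_bar):
    """Adjoints of A and B given the adjoint C_bar of C = bmm(A, B)."""
    A_bar = np.einsum('bnp,bmp->bnm', C_bar, B)  # C_bar @ B^T for each batch element
    B_bar = np.einsum('bnm,bnp->bmp', A, C_bar)  # A^T @ C_bar for each batch element
    return A_bar, B_bar

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3))
B = rng.standard_normal((4, 3, 5))
C = bmm(A, B)
A_bar, B_bar = bmm_backward(A, B, np.ones_like(C))
print(C.shape, A_bar.shape, B_bar.shape)
```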

In [10]:
!python3 -m pytest -l -v -k "op_bmm"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 116 deselected

tests/test_ops.py::test_op_bmm[forward-cpu-A_shape0-B_shape0] PASSED     [ 12%]
tests/test_ops.py::test_op_bmm[forward-cpu-A_shape1-B_shape1] PASSED     [ 25%]
tests/test_ops.py::test_op_bmm[forward-cuda-A_shape0-B_shape0] PASSED    [ 37%]
tests/test_ops.py::test_op_bmm[forward-cuda-A_shape1-B_shape1] PASSED    [ 50%]
tests/test_ops.py::test_op_bmm[backward-cpu-A_shape0-B_shape0] PASSED    [ 62%]
tests/test_ops.py::test_op_bmm[backward-cpu-A_shape1-B_shape1] PASSED    [ 75%]
tests/test_ops.py::test_op_bmm[backward-cuda-A_shape0-B_shape0] PASSED   [ 87%]
tests/test_ops.py::test_op_bmm[ba

### Matrix Inverse

`needle.ops.Inverse` and `needle.ops.inv` compute the inverse of a square matrix. In reverse mode, the adjoint is $ \bar{A} = - C^{T} \bar{C} C^{T} $, where $ C = A^{-1} $. The operator is verified against [`torch.linalg.inv`](https://pytorch.org/docs/stable/generated/torch.linalg.inv.html).
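The reverse-mode rule $ \bar{A} = - C^{T} \bar{C} C^{T} $ can be sanity-checked against central finite differences. The NumPy sketch below is illustrative only: the helper names, the test matrix, and the scalar loss $ \sum_{ij} W_{ij} C_{ij} $ (so that $ \bar{C} = W $) are all assumptions made for the check.

```python
import numpy as np

def inv_backward(A, C_bar):
    """Adjoint of A for C = inv(A), given the adjoint C_bar of C."""
    C = np.linalg.inv(A)
    return -C.T @ C_bar @ C.T

def numerical_grad(f, M, h=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix M."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (f(M + E) - f(M - E)) / (2 * h)
    return G

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])  # invertible (det = 16)
W = np.random.default_rng(0).standard_normal((3, 3))

loss = lambda M: np.sum(W * np.linalg.inv(M))  # dloss/dC = W
analytic = inv_backward(A, W)
numeric = numerical_grad(loss, A)
print(np.max(np.abs(analytic - numeric)))  # should be close to zero
```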

In [11]:
!python3 -m pytest -l -v -k "op_inv"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 116 deselected

tests/test_ops.py::test_op_inv[forward-cpu-A_shape0] PASSED              [ 12%]
tests/test_ops.py::test_op_inv[forward-cpu-A_shape1] PASSED              [ 25%]
tests/test_ops.py::test_op_inv[forward-cuda-A_shape0] PASSED             [ 37%]
tests/test_ops.py::test_op_inv[forward-cuda-A_shape1] PASSED             [ 50%]
tests/test_ops.py::test_op_inv[backward-cpu-A_shape0] PASSED             [ 62%]
tests/test_ops.py::test_op_inv[backward-cpu-A_shape1] PASSED             [ 75%]
tests/test_ops.py::test_op_inv[backward-cuda-A_shape0] PASSED            [ 87%]
tests/test_ops.py::test_op_inv[ba

### Matrix Determinant

`needle.ops.Det` and `needle.ops.det` compute the determinant of a square matrix. In reverse mode, the adjoint is $ \bar{A} = \bar{C} C A^{-T} $, where $ C = \det A $. The operator is verified against [`torch.linalg.det`](https://pytorch.org/docs/stable/generated/torch.linalg.det.html).
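The determinant rule $ \bar{A} = \bar{C} C A^{-T} $ admits the same kind of finite-difference check. Again this is a NumPy sketch under assumptions: the helper names and the test matrix are hypothetical, and the loss is the determinant itself, so $ \bar{C} = 1 $.

```python
import numpy as np

def det_backward(A, C_bar):
    """Adjoint of A for C = det(A), given the scalar adjoint C_bar of C."""
    C = np.linalg.det(A)
    return C_bar * C * np.linalg.inv(A).T

def numerical_grad(f, M, h=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix M."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = h
            G[i, j] = (f(M + E) - f(M - E)) / (2 * h)
    return G

A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])  # invertible (det = 16)

analytic = det_backward(A, 1.0)
numeric = numerical_grad(np.linalg.det, A)
print(np.max(np.abs(analytic - numeric)))  # should be close to zero
```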

In [12]:
!python3 -m pytest -l -v -k "op_det"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 116 deselected

tests/test_ops.py::test_op_det[forward-cpu-A_shape0] PASSED              [ 12%]
tests/test_ops.py::test_op_det[forward-cpu-A_shape1] PASSED              [ 25%]
tests/test_ops.py::test_op_det[forward-cuda-A_shape0] PASSED             [ 37%]
tests/test_ops.py::test_op_det[forward-cuda-A_shape1] PASSED             [ 50%]
tests/test_ops.py::test_op_det[backward-cpu-A_shape0] PASSED             [ 62%]
tests/test_ops.py::test_op_det[backward-cpu-A_shape1] PASSED             [ 75%]
tests/test_ops.py::test_op_det[backward-cuda-A_shape0] PASSED            [ 87%]
tests/test_ops.py::test_op_det[ba

### Matrix Diagonal

`needle.ops.Diagonal` and `needle.ops.diagonal` extract the diagonal of a square matrix. The operator is verified against [`torch.diag`](https://pytorch.org/docs/stable/generated/torch.diag.html).

In [13]:
!python3 -m pytest -l -v -k "op_diagonal"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 120 deselected

tests/test_ops.py::test_op_diagonal[forward-cpu-A_shape0] PASSED         [ 25%]
tests/test_ops.py::test_op_diagonal[forward-cuda-A_shape0] PASSED        [ 50%]
tests/test_ops.py::test_op_diagonal[backward-cpu-A_shape0] PASSED        [ 75%]
tests/test_ops.py::test_op_diagonal[backward-cuda-A_shape0] PASSED       [100%]



### Cholesky Decomposition

The Cholesky decomposition of a symmetric positive definite matrix $ \Sigma $ computes the unique lower-triangular matrix $ L $ with positive diagonal elements such that $ \Sigma = L L^{T} $. The reverse mode of the Cholesky decomposition is given by $ \text{tril} \left( \bar{\Sigma} \right) = \Phi \left( L^{-T} \left( P + P^{T} \right) L^{-1} \right) $, where $ P = \Phi \left(L^{T} \bar{L} \right) $ and $ \Phi $ takes the lower-triangular part of a matrix and halves its diagonal:
$$
\Phi_{ij} (A) = \begin{cases}
A_{ij} & i > j \\
A_{ii} / 2 & i = j \\
0 & i < j
\end{cases} .
$$
`needle.ops.Cholesky` and `needle.ops.cholesky` are implemented with the auxiliary functions `needle.ops.Phi` and `needle.ops.phi`. The operator is verified against [`torch.cholesky`](https://pytorch.org/docs/stable/generated/torch.cholesky.html).
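The backward formula above can be checked numerically. The NumPy sketch below is not the Needle implementation: `phi` and `cholesky_backward` are illustrative stand-ins, and the loss $ \sum_{ij} W_{ij} L_{ij} $ with lower-triangular $ W $ is an assumption chosen so that $ \bar{L} = W $. Because the result is the gradient with respect to the lower-triangular entries of $ \Sigma $, each off-diagonal entry is perturbed together with its mirror image.

```python
import numpy as np

def phi(A):
    """Lower-triangular part of A with the diagonal halved."""
    return np.tril(A) - 0.5 * np.diag(np.diag(A))

def cholesky_backward(L, L_bar):
    """tril(Sigma_bar) for Sigma = L L^T, given the adjoint L_bar of L."""
    P = phi(L.T @ L_bar)
    L_inv = np.linalg.inv(L)
    return phi(L_inv.T @ (P + P.T) @ L_inv)

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])  # symmetric, strictly diagonally dominant
W = np.tril(np.random.default_rng(0).standard_normal((3, 3)))

loss = lambda S: np.sum(W * np.linalg.cholesky(S))  # dloss/dL = W
analytic = cholesky_backward(np.linalg.cholesky(Sigma), W)

# Finite differences over the independent (lower-triangular) entries of Sigma.
numeric = np.zeros((3, 3))
h = 1e-6
for i in range(3):
    for j in range(i + 1):
        E = np.zeros((3, 3))
        E[i, j] = h
        E[j, i] = h  # keep the perturbed matrix symmetric
        numeric[i, j] = (loss(Sigma + E) - loss(Sigma - E)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))  # should be close to zero
```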

In [14]:
!python3 -m pytest -l -v -k "op_cholesky"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 116 deselected

tests/test_ops.py::test_op_cholesky[forward-cpu-A_shape0] PASSED         [ 12%]
tests/test_ops.py::test_op_cholesky[forward-cpu-A_shape1] PASSED         [ 25%]
tests/test_ops.py::test_op_cholesky[forward-cuda-A_shape0] PASSED        [ 37%]
tests/test_ops.py::test_op_cholesky[forward-cuda-A_shape1] PASSED        [ 50%]
tests/test_ops.py::test_op_cholesky[backward-cpu-A_shape0] PASSED        [ 62%]
tests/test_ops.py::test_op_cholesky[backward-cpu-A_shape1] PASSED        [ 75%]
tests/test_ops.py::test_op_cholesky[backward-cuda-A_shape0] PASSED       [ 87%]
tests/test_ops.py::test_op_choles

## Deep Autoencoding Gaussian Mixture Model

### Model Overview

The Deep Autoencoding Gaussian Mixture Model (DAGMM) consists of two components:
* a compression network, and
* an estimation network.

### Compression Network

Encoder: $ \mathbf{z}_c = h \left( \mathbf{x}; \theta_e \right) $, where $ \theta_e $ is the parameter of the encoder.

Decoder: $ \mathbf{x}' = g \left( \mathbf{z}_c; \theta_d \right) $, where $ \theta_d $ is the parameter of the decoder.

Reconstruction features: $ \mathbf{z}_r = f(\mathbf{x}, \mathbf{x}') $, where the relative Euclidean distance $ \frac{\lVert \mathbf{x} - \mathbf{x}' \rVert_2}{\lVert \mathbf{x} \rVert_2} $ and cosine similarity $ \frac{\mathbf{x} \cdot \mathbf{x}'}{\lVert \mathbf{x} \rVert_2 \lVert \mathbf{x}' \rVert_2} $ are used. 

Low-dimensional representation: $ \mathbf{z} = \left[ \mathbf{z}_c, \mathbf{z}_r \right] $.
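The reconstruction features and the concatenated representation can be sketched in NumPy for a batch of samples. The function names, the `(N, D)` batch layout, and the `eps` guard against zero norms are assumptions for illustration; the encoder and decoder outputs are taken as given inputs.

```python
import numpy as np

def reconstruction_features(x, x_hat, eps=1e-12):
    """z_r = [relative Euclidean distance, cosine similarity] for each row."""
    x_norm = np.linalg.norm(x, axis=1)
    x_hat_norm = np.linalg.norm(x_hat, axis=1)
    rel_dist = np.linalg.norm(x - x_hat, axis=1) / np.maximum(x_norm, eps)
    cos_sim = np.sum(x * x_hat, axis=1) / np.maximum(x_norm * x_hat_norm, eps)
    return np.stack([rel_dist, cos_sim], axis=1)  # shape (N, 2)

def low_dim_representation(z_c, x, x_hat):
    """z = [z_c, z_r]: latent code concatenated with reconstruction features."""
    return np.concatenate([z_c, reconstruction_features(x, x_hat)], axis=1)

x = np.array([[3.0, 4.0], [1.0, 0.0]])
x_hat = np.array([[3.0, 4.0], [0.0, 1.0]])
z_c = np.zeros((2, 1))  # placeholder latent codes
print(low_dim_representation(z_c, x, x_hat))
```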


### Estimation Network

Estimation network: $ \mathbf{p} = \text{MLN} \left( \mathbf{z}; \theta_m \right) $, where $ \theta_m $ is the parameter of the estimation network.

Soft mixture component membership prediction: $ \hat{\gamma} = \text{softmax} (\mathbf{p}) $.

Gaussian Mixture Model (GMM) parameters:
* Mixture probability: $$ \hat{\phi}_k = \sum_{i=1}^{N} \frac{\hat{\gamma}_{ik}}{N} .$$
* Mean: $$ \hat{\mu}_k = \frac{\sum_{i=1}^{N} \hat{\gamma}_{ik} \mathbf{z}_i}{\sum_{i=1}^{N} \hat{\gamma}_{ik}} . $$
* Covariance: $$ \hat{\mathbf{\Sigma}}_k = \frac{\sum_{i=1}^{N} \hat{\gamma}_{ik} \left( \mathbf{z}_i - \hat{\mu}_k \right) \left( \mathbf{z}_i - \hat{\mu}_k \right)^{T}}{\sum_{i=1}^{N} \hat{\gamma}_{ik}} . $$
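The three estimates above translate directly into array operations. The following NumPy sketch assumes `gamma` has shape `(N, K)` and `z` has shape `(N, D)`; the function name and layout are illustrative and not taken from the project code.

```python
import numpy as np

def gmm_parameters(gamma, z):
    """Mixture probabilities (K,), means (K, D), covariances (K, D, D)."""
    n_samples = gamma.shape[0]
    gamma_sum = gamma.sum(axis=0)                    # sum_i gamma_ik
    phi = gamma_sum / n_samples
    mu = (gamma.T @ z) / gamma_sum[:, None]
    diff = z[:, None, :] - mu[None, :, :]            # (N, K, D)
    # Weighted sum of outer products (z_i - mu_k)(z_i - mu_k)^T over samples.
    sigma = np.einsum('nk,nkd,nke->kde', gamma, diff, diff)
    sigma = sigma / gamma_sum[:, None, None]
    return phi, mu, sigma

z = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [10.0, 10.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hard assignments
phi, mu, sigma = gmm_parameters(gamma, z)
print(phi)  # [0.75 0.25]
print(mu)
```

With hard (one-hot) memberships, each component reduces to the sample mean and the biased sample covariance of its assigned points, which gives a simple way to test the function.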

Sample energy:
$$
E(\mathbf{z}) = - \log \left( \sum_{k=1}^{K} \hat{\phi}_k \frac{\exp \left( -\frac{1}{2} \left( \mathbf{z} - \hat{\mu}_k \right)^{T} \hat{\mathbf{\Sigma}}_k^{-1} \left( \mathbf{z} - \hat{\mu}_k \right) \right)}{\sqrt{\left| 2 \pi \hat{\mathbf{\Sigma}}_k \right|}} \right) .
$$
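A NumPy sketch of the sample energy is given below. It evaluates the sum in log space with the log-sum-exp trick, which is a numerical-stability choice assumed here and not stated in the source; the function name and array shapes (`z`: `(N, D)`, `phi`: `(K,)`, `mu`: `(K, D)`, `sigma`: `(K, D, D)`) are likewise illustrative.

```python
import numpy as np

def sample_energy(z, phi, mu, sigma):
    """E(z) for each row of z under a GMM with parameters (phi, mu, sigma)."""
    diff = z[:, None, :] - mu[None, :, :]                       # (N, K, D)
    sigma_inv = np.linalg.inv(sigma)                            # (K, D, D)
    # Quadratic form (z - mu_k)^T Sigma_k^{-1} (z - mu_k) for all pairs (i, k).
    maha = np.einsum('nkd,kde,nke->nk', diff, sigma_inv, diff)
    log_det = np.linalg.slogdet(2.0 * np.pi * sigma)[1]         # log|2 pi Sigma_k|
    log_terms = np.log(phi)[None, :] - 0.5 * maha - 0.5 * log_det[None, :]
    # Log-sum-exp over components for numerical stability.
    m = log_terms.max(axis=1, keepdims=True)
    return -(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1)))

z = np.array([[0.0], [1.0]])
energy = sample_energy(z, np.array([1.0]), np.array([[0.0]]), np.array([[[1.0]]]))
print(energy)  # negative log-density of a standard normal at 0 and 1
```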

In [15]:
!python3 -m pytest -l -v -k "gmm_parameters"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 122 deselected

tests/test_dagmm.py::test_gmm_parameters[cpu-16-120-3-4] PASSED          [ 50%]
tests/test_dagmm.py::test_gmm_parameters[cuda-16-120-3-4] PASSED         [100%]



In [16]:
!python3 -m pytest -l -v -k "sample_energy"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 122 deselected

tests/test_dagmm.py::test_sample_energy[cpu-16-120-3-4] PASSED           [ 50%]
tests/test_dagmm.py::test_sample_energy[cuda-16-120-3-4] PASSED          [100%]



### Objective Function

Objective function:
$$
J(\theta_e, \theta_d, \theta_m) = \frac{1}{N} \sum_{i=1}^{N} L \left( \mathbf{x}_i, \mathbf{x}'_i \right) + \frac{\lambda_1}{N} \sum_{i=1}^{N} E \left( \mathbf{z}_i \right) + \lambda_2 P \left( \hat{\mathbf{\Sigma}} \right) ,
$$
where $ L \left( \mathbf{x}_i, \mathbf{x}'_i \right) $ is the reconstruction error, $ E \left( \mathbf{z}_i \right) $ is the sample energy, and $ P \left( \hat{\mathbf{\Sigma}} \right) $ penalizes small diagonal entries of the covariance matrices: $ P \left( \hat{\mathbf{\Sigma}} \right) = \sum_{k=1}^{K} \sum_{j=1}^{d} \frac{1}{\hat{\mathbf{\Sigma}}_{kjj}} $. The meta parameters are $ \lambda_1 = 0.1 $ and $ \lambda_2 = 0.005 $.
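The objective can be assembled from its three terms as in the NumPy sketch below. The function names are hypothetical, the sample energies are taken as a precomputed array, and the reconstruction error is assumed to be the squared Euclidean distance $ \lVert \mathbf{x}_i - \mathbf{x}'_i \rVert_2^2 $, which the text above does not specify.

```python
import numpy as np

def covariance_penalty(sigma):
    """P(Sigma): sum of reciprocals of the diagonal entries of each covariance."""
    diag = np.diagonal(sigma, axis1=1, axis2=2)  # (K, d)
    return np.sum(1.0 / diag)

def dagmm_objective(x, x_hat, energy, sigma, lambda_1=0.1, lambda_2=0.005):
    """Reconstruction error + lambda_1 * mean energy + lambda_2 * penalty."""
    # Assumed reconstruction error: squared L2 distance per sample.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    return recon + lambda_1 * np.mean(energy) + lambda_2 * covariance_penalty(sigma)

x = np.zeros((2, 3))
x_hat = np.ones((2, 3))
energy = np.array([1.0, 3.0])
sigma = np.stack([np.eye(2), np.eye(2)])
print(dagmm_objective(x, x_hat, energy, sigma))  # 3 + 0.1 * 2 + 0.005 * 4
```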

In [17]:
!python3 -m pytest -l -v -k "model_dagmm"

platform linux -- Python 3.8.16, pytest-3.6.4, py-1.11.0, pluggy-0.7.1 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/10714/project, inifile:
plugins: typeguard-2.7.1
collected 124 items / 122 deselected

tests/test_dagmm.py::test_model_dagmm[cpu-16-120-3-4] PASSED             [ 50%]
tests/test_dagmm.py::test_model_dagmm[cuda-16-120-3-4] PASSED            [100%]



## Experiment

### KDD CUP Dataset

In [18]:
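import pandas as pd
from sklearn.datasets import fetch_kddcup99

# Load the 10% subset of KDD Cup 1999 as a DataFrame, keeping the original row order.
data = fetch_kddcup99(shuffle=False, percent10=True, as_frame=True)
# Convert each column to the best-fitting nullable dtype (e.g. Int64, Float64).
df_kddcup = data.frame.convert_dtypes()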

In [19]:
df_kddcup

Unnamed: 0,duration,protocol_type,service,flag,src_bytes,dst_bytes,land,wrong_fragment,urgent,hot,...,dst_host_srv_count,dst_host_same_srv_rate,dst_host_diff_srv_rate,dst_host_same_src_port_rate,dst_host_srv_diff_host_rate,dst_host_serror_rate,dst_host_srv_serror_rate,dst_host_rerror_rate,dst_host_srv_rerror_rate,labels
0,0,b'tcp',b'http',b'SF',181,5450,0,0,0,0,...,9,1.0,0.0,0.11,0.0,0.0,0.0,0.0,0.0,b'normal.'
1,0,b'tcp',b'http',b'SF',239,486,0,0,0,0,...,19,1.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,b'normal.'
2,0,b'tcp',b'http',b'SF',235,1337,0,0,0,0,...,29,1.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,b'normal.'
3,0,b'tcp',b'http',b'SF',219,1337,0,0,0,0,...,39,1.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,b'normal.'
4,0,b'tcp',b'http',b'SF',217,2032,0,0,0,0,...,49,1.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,b'normal.'
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
494016,0,b'tcp',b'http',b'SF',310,1881,0,0,0,0,...,255,1.0,0.0,0.01,0.05,0.0,0.01,0.0,0.0,b'normal.'
494017,0,b'tcp',b'http',b'SF',282,2286,0,0,0,0,...,255,1.0,0.0,0.17,0.05,0.0,0.01,0.0,0.0,b'normal.'
494018,0,b'tcp',b'http',b'SF',203,1200,0,0,0,0,...,255,1.0,0.0,0.06,0.05,0.06,0.01,0.0,0.0,b'normal.'
494019,0,b'tcp',b'http',b'SF',291,1200,0,0,0,0,...,255,1.0,0.0,0.04,0.05,0.04,0.01,0.0,0.0,b'normal.'


In [20]:
df_kddcup.dtypes

duration                         Int64
protocol_type                   object
service                         object
flag                            object
src_bytes                        Int64
dst_bytes                        Int64
land                             Int64
wrong_fragment                   Int64
urgent                           Int64
hot                              Int64
num_failed_logins                Int64
logged_in                        Int64
num_compromised                  Int64
root_shell                       Int64
su_attempted                     Int64
num_root                         Int64
num_file_creations               Int64
num_shells                       Int64
num_access_files                 Int64
num_outbound_cmds                Int64
is_host_login                    Int64
is_guest_login                   Int64
count                            Int64
srv_count                        Int64
serror_rate                    Float64
srv_serror_rate          

### Model Training and Testing

### Experiment Results

## Conclusion

## References

* Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection: [paper](https://openreview.net/forum?id=BJJLHbb0-).
* Differentiation of the Cholesky decomposition: [paper](https://arxiv.org/abs/1602.07527).
* A PyTorch implementation for the Deep Autoencoding Gaussian Mixture Model: [GitHub repository](https://github.com/mperezcarrasco/PyTorch-DAGMM).
* Another PyTorch implementation for the Deep Autoencoding Gaussian Mixture Model: [GitHub repository](https://github.com/lixiangwang/DAGMM-pytorch).
* Differentiating the Cholesky decomposition: [GitHub repository](https://github.com/imurray/chol-rev).
* An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation: [report](https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf).
* PyTorch: [documentation](https://pytorch.org/docs/stable/index.html).
* KDD Cup 1999 Data: [dataset](https://archive.ics.uci.edu/ml/datasets/kdd+cup+1999+data).
* Loading the KDD Cup 1999 dataset: [API](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_kddcup99.html).
