Merge pull request #428 from yzhao062/development

v1.0.4
yzhao062 · Jul 29, 2022 · 0027221
2 parents c2839e0 + fe5eb15

Showing 22 changed files with 1,660 additions and 83 deletions.
diff --git a/CHANGES.txt b/CHANGES.txt
@@ -164,4 +164,7 @@ v<1.0.2>, <06/21/2022> -- Add GMM detector (#402).
 v<1.0.2>, <06/23/2022> -- Add ADBench Benchmark.
 v<1.0.3>, <06/27/2022> -- Change default generation to new behaviors (#409).
 v<1.0.3>, <07/04/2022> -- Add AnoGAN (#412).
+v<1.0.4>, <07/29/2022> -- General improvement of code quality and test coverage.
+v<1.0.4>, <07/29/2022> -- Add LUNAR (#413).
+v<1.0.4>, <07/29/2022> -- Add LUNAR (#415).
 
diff --git a/README.rst b/README.rst
@@ -233,7 +233,7 @@ Key Attributes of a fitted model:
 ADBench Benchmark
 ^^^^^^^^^^^^^^^^^
 
-We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_.
+We just released the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_ (36 pages) [#Han2022ADBench]_.
 The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
 
 The organization of **ADBench** is provided below:
@@ -353,6 +353,8 @@ Neural Networks      SO_GAAL             Single-Objective Generative Adversarial
 Neural Networks      MO_GAAL             Multiple-Objective Generative Adversarial Active Learning                                               2019   [#Liu2019Generative]_
 Neural Networks      DeepSVDD            Deep One-Class Classification                                                                           2018   [#Ruff2018Deep]_
 Neural Networks      AnoGAN              Anomaly Detection with Generative Adversarial Networks                                                  2017   [#Schlegl2017Unsupervised]_
+Graph-based          R-Graph             Outlier detection by R-graph                                                                            2017   [#You2017Provable]_
+Graph-based          LUNAR               LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks                               2022   [#Goodge2022Lunar]_
 ===================  ==================  ======================================================================================================  =====  ========================================
 
 
@@ -579,8 +581,12 @@ Reference
 
 .. [#Goldstein2012Histogram] Goldstein, M. and Dengel, A., 2012. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. In *KI-2012: Poster and Demo Track*\ , pp.59-63.
 
+.. [#Goodge2022Lunar] Goodge, A., Hooi, B., Ng, S.K. and Ng, W.S., 2022, June. Lunar: Unifying local outlier detection methods via graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
+
 .. [#Gopalan2019PIDForest] Gopalan, P., Sharan, V. and Wieder, U., 2019. PIDForest: Anomaly Detection via Partial Identification. In Advances in Neural Information Processing Systems, pp. 15783-15793.
 
+.. [#Han2022ADBench] Han, S., Hu, X., Huang, H., Jiang, M. and Zhao, Y., 2022. ADBench: Anomaly Detection Benchmark. arXiv preprint arXiv:2206.09426.
+
 .. [#Hardin2004Outlier] Hardin, J. and Rocke, D.M., 2004. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. *Computational Statistics & Data Analysis*\ , 44(4), pp.625-638.
 
 .. [#He2003Discovering] He, Z., Xu, X. and Deng, S., 2003. Discovering cluster-based local outliers. *Pattern Recognition Letters*\ , 24(9-10), pp.1641-1650.
@@ -633,6 +639,8 @@ Reference
 
 .. [#Wang2020adVAE] Wang, X., Du, Y., Lin, S., Cui, P., Shen, Y. and Yang, Y., 2019. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. *Knowledge-Based Systems*.
 
+.. [#You2017Provable] You, C., Robinson, D.P. and Vidal, R., 2017. Provable self-representation based outlier detection in a union of subspaces. In Proceedings of the IEEE conference on computer vision and pattern recognition.
+
 .. [#Zhao2018XGBOD] Zhao, Y. and Hryniewicki, M.K. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. *IEEE International Joint Conference on Neural Networks*\ , 2018.
 
 .. [#Zhao2019LSCP] Zhao, Y., Nasrullah, Z., Hryniewicki, M.K. and Li, Z., 2019, May. LSCP: Locally selective combination in parallel outlier ensembles. In *Proceedings of the 2019 SIAM International Conference on Data Mining (SDM)*, pp. 585-593. Society for Industrial and Applied Mathematics.

diff --git a/docs/benchmark.rst b/docs/benchmark.rst
@@ -4,7 +4,7 @@ Benchmarks
 Latest ADBench (2022)
 ---------------------
 
-We just released a 36-page, the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_.
+We just released the most comprehensive `anomaly detection benchmark paper <https://www.andrew.cmu.edu/user/yuezhao2/papers/22-preprint-adbench.pdf>`_ (36 pages) :cite:`a-han2022adbench`.
 The fully `open-sourced ADBench <https://github.com/Minqi824/ADBench>`_ compares 30 anomaly detection algorithms on 55 benchmark datasets.
 
 The organization of **ADBench** is provided below:

diff --git a/docs/index.rst b/docs/index.rst
@@ -200,6 +200,8 @@ Neural Networks      SO_GAAL           Single-Objective Generative Adversarial A
 Neural Networks      MO_GAAL           Multiple-Objective Generative Adversarial Active Learning                                               2019   :class:`pyod.models.mo_gaal.MO_GAAL`                 :cite:`a-liu2019generative`
 Neural Networks      DeepSVDD          Deep One-Class Classification                                                                           2018   :class:`pyod.models.deep_svdd.DeepSVDD`              :cite:`a-ruff2018deepsvdd`
 Neural Networks      AnoGAN            Anomaly Detection with Generative Adversarial Networks                                                  2017   :class:`pyod.models.anogan.AnoGAN`                   :cite:`a-schlegl2017unsupervised`
+Graph-based          R-Graph           Outlier detection by R-graph                                                                            2017   :class:`pyod.models.rgraph.RGraph`                   :cite:`a-you2017provable`
+Graph-based          LUNAR             LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks                               2022   :class:`pyod.models.lunar.LUNAR`                     :cite:`a-goodge2022lunar`
 ===================  ================  ======================================================================================================  =====  ===================================================  ======================================================
 
 

diff --git a/docs/pyod.models.rst b/docs/pyod.models.rst
@@ -34,7 +34,7 @@ pyod.models.auto\_encoder\_torch module
 
 .. automodule:: pyod.models.auto_encoder_torch
     :members:
-    :undoc-members:
+    :exclude-members: inner_autoencoder
     :show-inheritance:
     :inherited-members:
 
@@ -209,6 +209,15 @@ pyod.models.loci module
     :show-inheritance:
     :inherited-members:
 
+pyod.models.lunar module
+------------------------
+
+.. automodule:: pyod.models.lunar
+    :members:
+    :exclude-members: SCORE_MODEL, WEIGHT_MODEL
+    :show-inheritance:
+    :inherited-members:
+
 pyod.models.lscp module
 -----------------------
 
@@ -267,6 +276,14 @@ pyod.models.pca module
     :show-inheritance:
     :inherited-members:
 
+pyod.models.rgraph module
+-------------------------
+
+.. automodule:: pyod.models.rgraph
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :inherited-members:
 
 pyod.models.rod module
 ----------------------

diff --git a/docs/zreferences.bib b/docs/zreferences.bib
@@ -433,4 +433,29 @@ @inproceedings{schlegl2017unsupervised
   pages={146--157},
   year={2017},
   organization={Springer}
+}
+
+@inproceedings{goodge2022lunar,
+  title={Lunar: Unifying local outlier detection methods via graph neural networks},
+  author={Goodge, Adam and Hooi, Bryan and Ng, See-Kiong and Ng, Wee Siong},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={36},
+  number={6},
+  pages={6737--6745},
+  year={2022}
+}
+
+@article{han2022adbench,
+  title={ADBench: Anomaly Detection Benchmark},
+  author={Han, Songqiao and Hu, Xiyang and Huang, Hailiang and Jiang, Mingqi and Zhao, Yue},
+  journal={arXiv preprint arXiv:2206.09426},
+  year={2022}
+}
+
+@inproceedings{you2017provable,
+  title={Provable self-representation based outlier detection in a union of subspaces},
+  author={You, Chong and Robinson, Daniel P and Vidal, Ren{\'e}},
+  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
+  pages={3395--3404},
+  year={2017}
 }
diff --git a/examples/ALL.png b/examples/ALL.png
diff --git a/examples/compare_all_models.py b/examples/compare_all_models.py
@@ -37,6 +37,10 @@
 from pyod.models.ocsvm import OCSVM
 from pyod.models.pca import PCA
 from pyod.models.lscp import LSCP
+from pyod.models.inne import INNE
+from pyod.models.gmm import GMM
+from pyod.models.kde import KDE
+from pyod.models.lmdd import LMDD
 
 # TODO: add neural networks, LOCI, SOS, COF, SOD
 
@@ -87,26 +91,20 @@
         contamination=outliers_fraction),
     'Average KNN': KNN(method='mean',
                        contamination=outliers_fraction),
-    # 'Median KNN': KNN(method='median',
-    #                   contamination=outliers_fraction),
     'Local Outlier Factor (LOF)':
         LOF(n_neighbors=35, contamination=outliers_fraction),
-    # 'Local Correlation Integral (LOCI)':
-    #     LOCI(contamination=outliers_fraction),
     'Minimum Covariance Determinant (MCD)': MCD(
         contamination=outliers_fraction, random_state=random_state),
     'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),
     'Principal Component Analysis (PCA)': PCA(
         contamination=outliers_fraction, random_state=random_state),
-    # 'Stochastic Outlier Selection (SOS)': SOS(
-    #     contamination=outliers_fraction),
     'Locally Selective Combination (LSCP)': LSCP(
         detector_list, contamination=outliers_fraction,
         random_state=random_state),
-    # 'Connectivity-Based Outlier Factor (COF)':
-    #     COF(n_neighbors=35, contamination=outliers_fraction),
-    # 'Subspace Outlier Detection (SOD)':
-    #     SOD(contamination=outliers_fraction),
+    'INNE': INNE(contamination=outliers_fraction),
+    'GMM': GMM(contamination=outliers_fraction),
+    'KDE': KDE(contamination=outliers_fraction),
+    'LMDD': LMDD(contamination=outliers_fraction),
 }
 
 # Show all detectors
@@ -125,7 +123,7 @@
     X = np.r_[X, np.random.uniform(low=-6, high=6, size=(n_outliers, 2))]
 
     # Fit the model
-    plt.figure(figsize=(15, 12))
+    plt.figure(figsize=(15, 16))
     for i, (clf_name, clf) in enumerate(classifiers.items()):
         print()
         print(i + 1, 'fitting', clf_name)
@@ -139,11 +137,11 @@
 
         Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1
         Z = Z.reshape(xx.shape)
-        subplot = plt.subplot(3, 4, i + 1)
+        subplot = plt.subplot(4, 4, i + 1)
         subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
                          cmap=plt.cm.Blues_r)
-        a = subplot.contour(xx, yy, Z, levels=[threshold],
-                            linewidths=2, colors='red')
+        # a = subplot.contour(xx, yy, Z, levels=[threshold],
+        #                     linewidths=2, colors='red')
         subplot.contourf(xx, yy, Z, levels=[threshold, Z.max()],
                          colors='orange')
         b = subplot.scatter(X[:-n_outliers, 0], X[:-n_outliers, 1], c='white',
@@ -152,8 +150,12 @@
                             s=20, edgecolor='k')
         subplot.axis('tight')
         subplot.legend(
-            [a.collections[0], b, c],
-            ['learned decision function', 'true inliers', 'true outliers'],
+            [
+                # a.collections[0],
+                b, c],
+            [
+                # 'learned decision function',
+                'true inliers', 'true outliers'],
             prop=matplotlib.font_manager.FontProperties(size=10),
             loc='lower right')
         subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name, n_errors))
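
The hunks above grow the grid by hand, from `plt.subplot(3, 4, ...)` to `plt.subplot(4, 4, ...)` and from `figsize=(15, 12)` to `figsize=(15, 16)`, after four detectors are added. A minimal sketch of deriving both values from the number of detectors instead. `grid_layout` and its parameters are hypothetical names for illustration, not part of this commit or of PyOD, and the 4-units-per-row height is inferred from the two figure sizes in the diff:

```python
import math


def grid_layout(n_detectors, n_cols=4, width=15, row_height=4):
    """Return (n_rows, n_cols, figsize) for a grid of detector subplots.

    Hypothetical helper: rows are rounded up so every detector gets a
    cell, and the figure height scales with the number of rows.
    """
    if n_detectors < 1 or n_cols < 1:
        raise ValueError("n_detectors and n_cols must be positive")
    n_rows = math.ceil(n_detectors / n_cols)
    return n_rows, n_cols, (width, row_height * n_rows)


if __name__ == "__main__":
    # 12 detectors reproduce the old layout, 16 the new one.
    print(grid_layout(12))
    print(grid_layout(16))
```

In the example script this could look like `n_rows, n_cols, figsize = grid_layout(len(classifiers))`, followed by `plt.figure(figsize=figsize)` and `plt.subplot(n_rows, n_cols, i + 1)`, so that adding a detector no longer requires touching the layout constants.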

diff --git a/examples/lscp_example.py b/examples/lscp_example.py
@@ -17,7 +17,6 @@
 
 from pyod.models.lscp import LSCP
 from pyod.models.lof import LOF
-from pyod.utils.utility import standardizer
 from pyod.utils.data import generate_data
 from pyod.utils.data import evaluate_print
 from pyod.utils.example import visualize

diff --git a/examples/lunar_example.py b/examples/lunar_example.py
@@ -0,0 +1,52 @@
+# -*- coding: utf-8 -*-
+"""Example of using LUNAR for outlier detection
+"""
+# Author: Adam Goodge <a.goodge@u.nus.edu>
+#
+
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+
+# temporary solution for relative imports in case pyod is not installed
+# if pyod is installed, no need to use the following line
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+
+from pyod.models.lunar import LUNAR
+from pyod.utils.data import generate_data
+from pyod.utils.data import evaluate_print
+
+if __name__ == "__main__":
+    contamination = 0.1  # fraction of outliers
+    n_train = 5000  # number of training points
+    n_test = 1000  # number of testing points
+    n_features = 100  # number of features
+
+    # Generate sample data
+    X_train, X_test, y_train, y_test = \
+        generate_data(n_train=n_train,
+                      n_test=n_test,
+                      n_features=n_features,
+                      contamination=contamination,
+                      random_state=42)
+
+    # train LUNAR detector
+    clf_name = 'LUNAR'
+    clf = LUNAR()
+    clf.fit(X_train)
+
+    # get the prediction labels and outlier scores of the training data
+    y_train_pred = clf.labels_  # binary labels (0: inliers, 1: outliers)
+    y_train_scores = clf.decision_scores_  # raw outlier scores
+
+    # get the prediction on the test data
+    y_test_pred = clf.predict(X_test)  # outlier labels (0 or 1)
+    y_test_scores = clf.decision_function(X_test)  # outlier scores
+
+    # evaluate and print the results
+    print("\nOn Training Data:")
+    evaluate_print(clf_name, y_train, y_train_scores)
+    print("\nOn Test Data:")
+    evaluate_print(clf_name, y_test, y_test_scores)
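
The example reads both `clf.labels_` (binary) and `clf.decision_scores_` (raw scores) but does not show how the two relate. The sketch below assumes the labels come from thresholding the scores at the `100 * (1 - contamination)` percentile, which is how PyOD detectors are generally understood to behave. That behaviour is an assumption here and is not confirmed by this diff. `percentile` and `labels_from_scores` are hypothetical pure-Python helpers, not PyOD functions:

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile of `values` for q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def labels_from_scores(scores, contamination=0.1):
    """Flag scores strictly above the (1 - contamination) percentile.

    Assumed rule for illustration; returns (threshold, labels) with
    1 marking an outlier and 0 an inlier.
    """
    threshold = percentile(scores, 100 * (1 - contamination))
    labels = [1 if s > threshold else 0 for s in scores]
    return threshold, labels


if __name__ == "__main__":
    scores = [float(i) for i in range(10)]
    threshold, labels = labels_from_scores(scores, contamination=0.1)
    print(threshold, labels)
```

Under this assumption, `contamination` only moves the cutoff. The ROC-based numbers printed by `evaluate_print` depend on the ranking of the raw scores rather than on the binary labels.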