
Commit bb3a14e
Merge pull request #407 from yzhao062/development
V1.0.1
yzhao062 committed May 13, 2022
2 parents 7695979 + 940721d commit bb3a14e
Showing 16 changed files with 499 additions and 20 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@ pyod.egg-info/
.cache/
.pytest_cache
__pycache__
.idea/
.vscode/
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -158,4 +158,6 @@ v<1.0.0>, <04/04/2022> -- Add KDE detector (#382).
v<1.0.0>, <04/06/2022> -- Disable the bias term in DeepSVDD (#385).
v<1.0.0>, <04/21/2022> -- Fix a set of issues of autoencoders (#313, #390, #391).
v<1.0.0>, <04/23/2022> -- Add sampling based detector (#384).
v<1.0.1>, <04/27/2022> -- Add INNE (#396).
v<1.0.1>, <05/13/2022> -- Urgent fix for iForest (#406).

12 changes: 8 additions & 4 deletions README.rst
@@ -172,9 +172,9 @@ Alternatively, you could clone and run setup.py file:
* Python 3.6+
* combo>=0.1.3
* joblib
* numpy>=1.13
* numba>=0.35
* scipy>=1.3.1
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.20.0
* six
* statsmodels
@@ -324,6 +324,7 @@ Proximity-Based MedKNN Median kNN (use the median distance to
Proximity-Based SOD Subspace Outlier Detection 2009 [#Kriegel2009Outlier]_
Proximity-Based ROD Rotation-based Outlier Detection 2020 [#Almardeny2020A]_
Outlier Ensembles IForest Isolation Forest 2008 [#Liu2008Isolation]_
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
@@ -343,11 +344,12 @@ Neural Networks DeepSVDD Deep One-Class Classification
=================== ================ ===================================================================================================== ===== ========================================
Type Abbr Algorithm Year Ref
=================== ================ ===================================================================================================== ===== ========================================
Outlier Ensembles Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles FB Feature Bagging 2005 [#Lazarevic2005Feature]_
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 [#Zhao2019LSCP]_
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 [#Zhao2018XGBOD]_
Outlier Ensembles LODA Lightweight On-line Detector of Anomalies 2016 [#Pevny2016Loda]_
Outlier Ensembles SUOD SUOD: Accelerating Large-scale Unsupervised Heterogeneous Outlier Detection **(Acceleration)** 2021 [#Zhao2021SUOD]_
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 [#Bandaragoda2018Isolation]_
Combination Average Simple combination by averaging the scores 2015 [#Aggarwal2015Theoretical]_
Combination Weighted Average Simple combination by averaging the scores with detector weights 2015 [#Aggarwal2015Theoretical]_
Combination Maximization Simple combination by taking the maximum scores 2015 [#Aggarwal2015Theoretical]_
@@ -539,6 +541,8 @@ Reference
.. [#Arning1996A] Arning, A., Agrawal, R. and Raghavan, P., 1996, August. A Linear Method for Deviation Detection in Large Databases. In *KDD* (Vol. 1141, No. 50, pp. 972-981).
.. [#Bandaragoda2018Isolation] Bandaragoda, T. R., Ting, K. M., Albrecht, D., Liu, F. T., Zhu, Y., and Wells, J. R., 2018, Isolation-based anomaly detection using nearest-neighbor ensembles. *Computational Intelligence*\ , 34(4), pp. 968-998.
.. [#Breunig2000LOF] Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. *ACM Sigmod Record*\ , 29(2), pp. 93-104.
.. [#Burgess2018Understanding] Burgess, C.P., Higgins, I., Pal, A., Matthey, L., Watters, N., Desjardins, G. and Lerchner, A., 2018. Understanding disentangling in beta-VAE. arXiv preprint arXiv:1804.03599.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -169,6 +169,7 @@ Proximity-Based MedKNN Median kNN (use the median distance to k
Proximity-Based SOD Subspace Outlier Detection 2009 :class:`pyod.models.sod.SOD` :cite:`a-kriegel2009outlier`
Proximity-Based ROD Rotation-based Outlier Detection 2020 :class:`pyod.models.rod.ROD` :cite:`a-almardeny2020novel`
Outlier Ensembles IForest Isolation Forest 2008 :class:`pyod.models.iforest.IForest` :cite:`a-liu2008isolation,a-liu2012isolation`
Outlier Ensembles INNE Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles 2018 :class:`pyod.models.inne.INNE` :cite:`a-bandaragoda2018isolation`
Outlier Ensembles FB Feature Bagging 2005 :class:`pyod.models.feature_bagging.FeatureBagging` :cite:`a-lazarevic2005feature`
Outlier Ensembles LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles 2019 :class:`pyod.models.lscp.LSCP` :cite:`a-zhao2019lscp`
Outlier Ensembles XGBOD Extreme Boosting Based Outlier Detection **(Supervised)** 2018 :class:`pyod.models.xgbod.XGBOD` :cite:`a-zhao2018xgbod`
6 changes: 3 additions & 3 deletions docs/install.rst
@@ -28,9 +28,9 @@ Alternatively, you could clone and run setup.py file:
* Python 3.6+
* combo>=0.1.3
* joblib
* numpy>=1.13
* numba>=0.35
* scipy>=1.3.1
* numpy>=1.19
* numba>=0.51
* scipy>=1.5.1
* scikit_learn>=0.20.0
* six
* statsmodels
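
The minimum versions of numpy, numba, and scipy are bumped here and in README.rst above. As a quick, hypothetical sanity check (not part of this commit) that an existing environment meets the new minimums, assuming plain release version strings:

import numba
import numpy
import scipy

# hypothetical helper: compare installed versions against the bumped minimums
for module, minimum in [(numpy, (1, 19)), (numba, (0, 51)), (scipy, (1, 5, 1))]:
    installed = tuple(int(p) for p in
                      module.__version__.split('.')[:len(minimum)])
    if installed < minimum:
        raise RuntimeError(module.__name__ + ' ' + module.__version__ +
                           ' is older than the required minimum')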
10 changes: 10 additions & 0 deletions docs/pyod.models.rst
@@ -125,6 +125,16 @@ pyod.models.iforest module
:show-inheritance:
:inherited-members:

pyod.models.inne module
-----------------------

.. automodule:: pyod.models.inne
:members:
:exclude-members:
:undoc-members:
:show-inheritance:
:inherited-members:

pyod.models.kde module
----------------------

4 changes: 2 additions & 2 deletions docs/requirements.txt
@@ -4,10 +4,10 @@ joblib
keras
matplotlib
nose
numpy>=1.13
numpy>=1.19
numba==0.53 # pinned for now; need to lift this pin later, see the GitHub issue
pytest
scipy>=1.3.1
scipy>=1.5.1
scikit_learn>=0.20.0
six
sphinx-rtd-theme
11 changes: 11 additions & 0 deletions docs/zreferences.bib
@@ -413,4 +413,15 @@ @article{sugiyama2013rapid
journal={Advances in neural information processing systems},
volume={26},
year={2013}
}

@article{bandaragoda2018isolation,
title={Isolation-based anomaly detection using nearest-neighbor ensembles},
author={Bandaragoda, Tharindu R and Ting, Kai Ming and Albrecht, David and Liu, Fei Tony and Zhu, Ye and Wells, Jonathan R},
journal={Computational Intelligence},
volume={34},
number={4},
pages={968--998},
year={2018},
publisher={Wiley Online Library}
}
57 changes: 57 additions & 0 deletions examples/inne_example.py
@@ -0,0 +1,57 @@
"""Example of using INNE for outlier detection
"""
# Author: Xin Han <xinhan197@gmail.com>
# License: BSD 2 clause

from __future__ import division
from __future__ import print_function

import os
import sys

# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
sys.path.append(
os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))

from pyod.models.inne import INNE
from pyod.utils.data import generate_data

from pyod.utils.data import evaluate_print
from pyod.utils.example import visualize

if __name__ == "__main__":
contamination = 0.1 # percentage of outliers
n_train = 200 # number of training points
n_test = 100 # number of testing points

# Generate sample data
X_train, y_train, X_test, y_test = \
generate_data(n_train=n_train,
n_test=n_test,
n_features=2,
contamination=contamination,
random_state=42)

# train INNE detector
clf_name = 'INNE'
clf = INNE(contamination=contamination, max_samples=4)
clf.fit(X_train)

# get the prediction labels and outlier scores of the training data
y_train_pred = clf.labels_ # binary labels (0: inliers, 1: outliers)
y_train_scores = clf.decision_scores_ # raw outlier scores

# get the prediction on the test data
y_test_pred = clf.predict(X_test) # outlier labels (0 or 1)
y_test_scores = clf.decision_function(X_test) # outlier scores

# evaluate and print the results
print("\nOn Training Data:")
evaluate_print(clf_name, y_train, y_train_scores)
print("\nOn Test Data:")
evaluate_print(clf_name, y_test, y_test_scores)

# visualize the results
visualize(clf_name, X_train, y_train, X_test, y_test, y_train_pred,
y_test_pred, show_figure=True, save_figure=False)
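
As a short follow-up sketch (not part of this commit), the fitted detector also exposes PyOD's shared BaseDetector probability interface, which maps the raw scores above to outlier probabilities:

    # sketch assuming the shared pyod BaseDetector API; not in this diff
    y_test_proba = clf.predict_proba(X_test, method='linear')
    print("\nOutlier probability for the first five test points:")
    print(y_test_proba[:5, 1])  # column 1: probability of being an outlier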
4 changes: 1 addition & 3 deletions pyod/models/iforest.py
@@ -10,7 +10,6 @@
import numpy as np
from joblib import Parallel
from joblib.parallel import delayed
from sklearn.utils.fixes import _joblib_parallel_args

from sklearn.ensemble import IsolationForest
from sklearn.utils.validation import check_is_fitted
@@ -306,8 +305,7 @@ def feature_importances_(self):
"""
check_is_fitted(self)
all_importances = Parallel(
n_jobs=self.n_jobs, **_joblib_parallel_args(prefer="threads")
)(
n_jobs=self.n_jobs)(
delayed(getattr)(tree, "feature_importances_")
for tree in self.detector_.estimators_
if tree.tree_.node_count > 1
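
Context for this urgent fix (#406): sklearn.utils.fixes._joblib_parallel_args is a private scikit-learn helper that newer scikit-learn releases no longer ship, which is why the import above was dropped. The helper only forwarded backend arguments, so if the thread preference is still wanted, joblib's Parallel accepts it directly. A minimal sketch, assuming joblib >= 0.12; this is an illustration, not the committed fix:

from joblib import Parallel, delayed

def collect_importances(trees, n_jobs):
    # gather per-tree feature importances with thread-based workers,
    # no private sklearn helper required
    return Parallel(n_jobs=n_jobs, prefer="threads")(
        delayed(getattr)(tree, "feature_importances_") for tree in trees)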
