# Outlier Analysis and Anomaly Detection

## abalone Dataset Analysis

### Data Processing

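<br/><font color=Blue>Structure of the abalone benchmark files:</font>
<br/><font color=Blue>&emsp; Each file is made up of several columns, of which:</font>
<br/><font color=Blue>&emsp; the ground.truth label indicates whether the data point is an outlier</font>
<br/><font color=Blue>&emsp; original.label gives the point's class in the original dataset</font>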


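<br/><font color=Blue>Therefore, ground.truth is used as the label. After removing the provenance columns 'point.id', 'motherset', 'origin' and 'original.label', the remaining columns serve as input features for training the outlier detection models.</font>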
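The split can be illustrated on a toy table. In the minimal sketch below, the column names follow the benchmark files described above. The feature columns `V1`/`V2`, the row values and the outlier label value `'anomaly'` are made up for illustration. As in the notebook, any label other than `'nominal'` is encoded as 1.

```python
import pandas as pd

# Provenance columns that must not be used as model inputs
META_COLS = ['point.id', 'motherset', 'origin', 'original.label']

def split_features_labels(df):
    """Return (x, y): the feature matrix and 0/1 outlier labels."""
    # Everything except the label and the provenance columns is a feature
    x = df.drop(columns=['ground.truth'] + META_COLS).to_numpy()
    # 'nominal' -> 0 (inlier); any other value -> 1 (outlier)
    y = (df['ground.truth'] != 'nominal').astype(int).to_numpy()
    return x, y

# Toy frame: V1/V2 and all values are hypothetical
toy = pd.DataFrame({
    'point.id': ['p1', 'p2', 'p3'],
    'motherset': ['abalone'] * 3,
    'origin': ['made-up'] * 3,
    'original.label': [7, 12, 9],
    'ground.truth': ['nominal', 'anomaly', 'nominal'],
    'V1': [0.1, 0.5, 0.3],
    'V2': [1.0, 2.0, 1.5],
})
x, y = split_features_labels(toy)
print(x.shape, y.tolist())  # (3, 2) [0, 1, 0]
```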

In [None]:
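import pandas as pd
# Process one benchmark file (file_lst is built in the Integration section below)
file = file_lst[0]
df = pd.read_csv(file)
# Drop the label column and the provenance columns; the rest are the input features
x = df.drop(['ground.truth','point.id','motherset','origin','original.label'],axis = 1).values
# Encode labels: 'nominal' -> 0 (inlier), anything else -> 1 (outlier)
y = df['ground.truth'].values
y = [0 if label == 'nominal' else 1 for label in y]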

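<br/><font color=Blue>The dataset is split into training and test sets at a 3:2 ratio (test_size=0.4). The models and algorithms in the outlier detection package 'PyOD' are mainly unsupervised, so the test set and the labels are used only to evaluate the trained models.</font>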
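With `test_size=0.4`, 60% of the rows are used for training and 40% for testing. `standardizer` then z-scores both sets using statistics from the training set only, so the test set does not influence the scaling. The NumPy sketch below illustrates that idea. It is not PyOD's code, which as far as we know wraps scikit-learn's `StandardScaler`. For brevity it also slices the rows without the shuffling that `train_test_split` performs.

```python
import numpy as np

def zscore_train_test(x_train, x_test):
    """Standardize both sets with the training set's column mean and std."""
    mu = x_train.mean(axis=0)
    sd = x_train.std(axis=0)          # population std (ddof=0)
    sd = np.where(sd == 0, 1.0, sd)   # leave constant columns unscaled
    return (x_train - mu) / sd, (x_test - mu) / sd

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
n_test = int(round(0.4 * len(x)))          # test_size=0.4 -> 40 test rows
x_train, x_test = x[n_test:], x[:n_test]    # 60 train rows, 40 test rows (unshuffled)
x_train_norm, x_test_norm = zscore_train_test(x_train, x_test)
print(x_train.shape, x_test.shape)  # (60, 3) (40, 3)
```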

In [None]:
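import numpy as np
from sklearn.model_selection import train_test_split
from pyod.utils.utility import standardizer
random_state = np.random.RandomState(42)  # fixed seed for reproducible splits
# 60% training / 40% test (a 3:2 split)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4,random_state=random_state)
# z-score both sets using the training set's mean and standard deviation
x_train_norm, x_test_norm = standardizer(x_train, x_test)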

### Model Construction

<br/><font color=Blue>Outlier detection is performed with models provided by the PyOD toolkit:</font>
<br/><font color=Blue>Linear models for outlier detection:</font>
<br/><font color=Blue>&emsp; PCA</font>
<br/><font color=Blue>&emsp; MCD</font>
<br/><font color=Blue>&emsp; OCSVM</font>
<br/><font color=Blue>Proximity-based outlier detection models:</font>
<br/><font color=Blue>&emsp; LOF</font>
<br/><font color=Blue>&emsp; CBLOF</font>
<br/><font color=Blue>&emsp; K-NN</font>
<br/><font color=Blue>&emsp; HBOS</font>
<br/><font color=Blue>Probabilistic models for outlier detection:</font>
<br/><font color=Blue>&emsp; ABOD</font>
<br/><font color=Blue>Outlier ensembles:</font>
<br/><font color=Blue>&emsp; Isolation Forest</font>
<br/><font color=Blue>&emsp; Feature Bagging</font>
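The proximity-based idea can be made concrete with the KNN detector. PyOD's `KNN`, with its default settings (`n_neighbors=5`, `method='largest'`), scores each point by the distance to its k-th nearest neighbour, so points far from their neighbours get high scores. Below is a brute-force NumPy sketch of that scoring rule on training points. It is illustrative only and is not PyOD's implementation, which uses a nearest-neighbour index.

```python
import numpy as np

def knn_outlier_scores(x, k=5):
    """Euclidean distance from each row of x to its k-th nearest other row."""
    # Pairwise distances, shape (n, n)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point's zero distance to itself
    np.fill_diagonal(dist, np.inf)
    # After sorting each row, column k-1 holds the k-th nearest neighbour distance
    return np.sort(dist, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(50, 2))
outlier = np.array([[8.0, 8.0]])           # one injected, far-away point
x = np.vstack([inliers, outlier])
scores = knn_outlier_scores(x, k=5)
print(int(np.argmax(scores)))  # 50, the injected outlier
```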

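<br/><font color=Blue>First, the contamination (the fraction of outliers) of the current dataset is computed from the 'ground.truth' labels and capped at 0.5. The models are then created through the toolkit's model interfaces.</font>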
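PyOD detectors only accept a `contamination` value in (0, 0.5], which is why the observed outlier fraction is capped at 0.5. Several abalone files in the results below report an outlier percentage of exactly 50.0. The minimal sketch below shows the computation. The check for a file without outliers is an extra guard that the notebook code does not have.

```python
import numpy as np

def contamination_from_labels(y, cap=0.5):
    """Observed outlier fraction, capped at `cap` (PyOD requires (0, 0.5])."""
    frac = np.count_nonzero(y) / len(y)
    if frac == 0:
        # A contamination of 0 would be rejected by PyOD
        raise ValueError("no outliers in labels; contamination would be 0")
    return min(frac, cap)

print(contamination_from_labels([0, 0, 0, 1]))               # 0.25
print(contamination_from_labels([1, 1, 1, 0, 0]))            # 0.5 (0.6 capped)
print(round(contamination_from_labels([0, 1, 1]) * 100, 4))  # 50.0
```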

In [None]:
import numpy as np
from pyod.models.abod import ABOD
from pyod.models.cblof import CBLOF
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.hbos import HBOS
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.mcd import MCD
from pyod.models.ocsvm import OCSVM
from pyod.models.pca import PCA
outliers_fraction = min(np.count_nonzero(y) / len(y),0.5)
outliers_percentage = round(outliers_fraction * 100, ndigits=4)

classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(
    contamination=outliers_fraction),
    'Cluster-based Local Outlier Factor': CBLOF(
        contamination=outliers_fraction, check_estimator=False,
        random_state=random_state),
    'Feature Bagging': FeatureBagging(contamination=outliers_fraction,
                                      random_state=random_state),
    'Histogram-based Outlier Detection (HBOS)': HBOS(
        contamination=outliers_fraction),
    'Isolation Forest': IForest(contamination=outliers_fraction,
                                random_state=random_state),
    'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),
    'Local Outlier Factor (LOF)': LOF(
        contamination=outliers_fraction),
    'Minimum Covariance Determinant (MCD)': MCD(
        contamination=outliers_fraction, random_state=random_state),
    'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),
    'Principal Component Analysis (PCA)': PCA(
        contamination=outliers_fraction, random_state=random_state),
}

<br/><font color=Blue>Fit each model on the training data, then record its execution time, its precision @ rank n (PRN) and its ROC AUC on the test set.</font>
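Both metrics use the raw outlier scores from `decision_function`, where a higher score means more anomalous. ROC AUC is the probability that a randomly chosen outlier scores above a randomly chosen inlier. Precision @ rank n, the quantity `precision_n_scores` reports, sets n to the number of true outliers, flags the n highest-scoring points, and measures the fraction of them that are true outliers. The NumPy sketch below implements both definitions directly. Its top-n step breaks ties differently from PyOD, so treat it as illustrative.

```python
import numpy as np

def roc_auc_pairwise(y, scores):
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly; ties count 1/2."""
    y, scores = np.asarray(y), np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def precision_at_n(y, scores):
    """Precision among the n top-scored points, with n = number of true outliers."""
    y = np.asarray(y)
    n = int(y.sum())
    top_n = np.argsort(scores)[::-1][:n]   # indices of the n largest scores
    return y[top_n].mean()

y = [0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.9, 0.3, 0.8]
print(roc_auc_pairwise(y, scores))  # 5 of 6 pairs ordered correctly -> 0.8333...
print(precision_at_n(y, scores))    # top-2 = points 2 and 4, one outlier -> 0.5
```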

In [None]:
from time import time
from pyod.utils.utility import precision_n_scores
from sklearn.metrics import roc_auc_score

# One entry per model, in the order of the classifiers dict
time_list, roc_list, prn_list = [], [], []
for clf_name, clf in classifiers.items():
    try:
        t0 = time()
        clf.fit(x_train_norm)                             # unsupervised: labels are not used
        test_scores = clf.decision_function(x_test_norm)  # higher score = more anomalous
        t1 = time()
        duration = round(t1 - t0, ndigits=4)
        roc = round(roc_auc_score(y_test, test_scores), ndigits=4)
        prn = round(precision_n_scores(y_test, test_scores), ndigits=4)
    except Exception as e:
        # A failed model is recorded as 0 for all three metrics
        print(f'{clf_name} failed: {e}')
        roc = 0
        prn = 0
        duration = 0

    time_list.append(duration)
    roc_list.append(roc)
    prn_list.append(prn)

### Integration
The steps above process a single file. To process all files, the procedure is wrapped into one function.
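Inside the function, each result table is grown with `pd.concat`, which copies the accumulated table for every file. A common alternative is to collect one row per file in a list and build each table once at the end. This also lets pandas infer a dtype per column. The sketch below uses placeholder values: the file IDs, metadata and scores are hypothetical, and only two model columns are kept.

```python
import pandas as pd

df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc', 'KNN', 'LOF']

rows = []
for file_id in ['0001', '0002', '0003']:   # hypothetical file IDs
    # Placeholder metadata and scores; in the notebook these come from the detectors
    rows.append([file_id, 100, 9, 10.0, 0.80, 0.75])

# Build the table once instead of calling pd.concat for every file
roc_df = pd.DataFrame(rows, columns=df_columns)
print(roc_df.shape)  # (3, 6)
```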

In [84]:
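# List all files in the benchmark directory
# (fm is a helper module providing get_filelist; it is not shown in this notebook)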
import fm
file_lst = fm.get_filelist('../../data/abalone/benchmarks/',[])
file_lst.sort()
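`fm.get_filelist` comes from a helper module that is not shown in this notebook. Here it is assumed to return the paths of all files under the given directory. If that module is unavailable, a standard-library stand-in could look like the sketch below. The recursive search, the `*.csv` pattern and the demo file names are all assumptions.

```python
import tempfile
from pathlib import Path

def get_filelist(root, pattern='*.csv'):
    """Hypothetical stand-in for fm.get_filelist: sorted paths of files under `root`."""
    # rglob searches `root` and all of its subdirectories
    return sorted(str(p) for p in Path(root).rglob(pattern) if p.is_file())

# Demo with made-up file names in a temporary directory
with tempfile.TemporaryDirectory() as d:
    for name in ['abalone_benchmark_0002.csv', 'abalone_benchmark_0001.csv', 'notes.txt']:
        (Path(d) / name).write_text('x\n')
    files = get_filelist(d)
    names = [Path(f).name for f in files]
    first_id = files[0][-8:-4]   # the ID the notebook stores in the 'Data' column

print(names)     # ['abalone_benchmark_0001.csv', 'abalone_benchmark_0002.csv']
print(first_id)  # 0001
```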

In [85]:
df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',
          'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',
          'OCSVM', 'PCA']
roc_df = pd.DataFrame(columns=df_columns)
prn_df = pd.DataFrame(columns=df_columns)
time_df = pd.DataFrame(columns=df_columns)
random_state = np.random.RandomState(42)

In [125]:
def detect_file(file_lst,start,roc_df,prn_df,time_df,random_state):
    for i in range(start,len(file_lst)):
        file = file_lst[i]
        
        df = pd.read_csv(file)
        x = df.drop(['ground.truth','point.id','motherset','origin','original.label'],axis = 1).values
        y = df['ground.truth'].values
        y = [0 if label == 'nominal' else 1 for label in y]

        outliers_fraction = min(np.count_nonzero(y) / len(y),0.5)
        outliers_percentage = round(outliers_fraction * 100, ndigits=4)

        # file[-8:-4]: the four characters before the file extension (the file number), stored as 'Data'
        roc_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]
        prn_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]
        time_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]

        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4,random_state=random_state)
        x_train_norm, x_test_norm = standardizer(x_train, x_test)



        classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(
            contamination=outliers_fraction),
            'Cluster-based Local Outlier Factor': CBLOF(
                contamination=outliers_fraction, check_estimator=False,
                random_state=random_state),
            'Feature Bagging': FeatureBagging(contamination=outliers_fraction,
                                              random_state=random_state),
            'Histogram-based Outlier Detection (HBOS)': HBOS(
                contamination=outliers_fraction),
            'Isolation Forest': IForest(contamination=outliers_fraction,
                                        random_state=random_state),
            'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),
            'Local Outlier Factor (LOF)': LOF(
                contamination=outliers_fraction),
            'Minimum Covariance Determinant (MCD)': MCD(
                contamination=outliers_fraction, random_state=random_state),
            'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),
            'Principal Component Analysis (PCA)': PCA(
                contamination=outliers_fraction, random_state=random_state),
        }

        for clf_name, clf in classifiers.items():
            try:
                t0 = time()
                clf.fit(x_train_norm)
                test_scores = clf.decision_function(x_test_norm)
                t1 = time()
                duration = round(t1 - t0, ndigits=4)
                roc = round(roc_auc_score(y_test, test_scores), ndigits=4)
                prn = round(precision_n_scores(y_test, test_scores), ndigits=4)
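            except Exception as e:
                # Record a failed model as 0 for every metric so each row keeps its length
                print(f'{file}: {clf_name} failed: {e}')
                roc = 0
                prn = 0
                duration = 0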
                
            time_list.append(duration)
            roc_list.append(roc)
            prn_list.append(prn)

        temp_df = pd.DataFrame(time_list).transpose()
        temp_df.columns = df_columns
        time_df = pd.concat([time_df, temp_df], axis=0)

        temp_df = pd.DataFrame(roc_list).transpose()
        temp_df.columns = df_columns
        roc_df = pd.concat([roc_df, temp_df], axis=0)

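        temp_df = pd.DataFrame(prn_list).transpose()
        temp_df.columns = df_columns
        prn_df = pd.concat([prn_df, temp_df], axis=0)

    # pd.concat rebinds the local names, so the accumulated tables must be returned
    return time_df, roc_df, prn_df

In [97]:
# Run the detectors on every file (starting from index 0)
time_df, roc_df, prn_df = detect_file(file_lst, 0, roc_df, prn_df, time_df, random_state)
# Save the result tables
time_df.to_csv("abalone-time.csv",index = False)
roc_df.to_csv("abalone-roc.csv",index = False)
prn_df.to_csv("abalone-prn.csv",index = False)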

### Result Analysis

In [6]:
time_1 = pd.read_csv("abalone-time.csv",index_col = "Data")
roc_1 = pd.read_csv("abalone-roc.csv",index_col = "Data")
prn_1 = pd.read_csv("abalone-prn.csv",index_col = "Data")

#### Runtime

In [20]:
time_1.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1888 | 9 | 48.3581 | 0.3604 | 0.112 | 0.2147 | 0.0041 | 0.2934 | 0.0585 | 0.024 | 0.9736 | 0.1037 | 0.0024 |
| 2 | 1888 | 9 | 50.0 | 0.3754 | 0.1149 | 0.2082 | 0.0033 | 0.2888 | 0.0601 | 0.026 | 0.9395 | 0.0962 | 0.0027 |
| 3 | 1888 | 9 | 50.0 | 0.3478 | 0.1294 | 0.1966 | 0.0033 | 0.2902 | 0.0573 | 0.0241 | 0.7979 | 0.0897 | 0.0024 |
| 4 | 1888 | 9 | 47.1928 | 0.3303 | 0.1057 | 0.2101 | 0.0034 | 0.2983 | 0.0607 | 0.026 | 1.0447 | 0.0886 | 0.0024 |
| 5 | 1888 | 9 | 49.3644 | 0.3392 | 0.1273 | 0.198 | 0.0033 | 0.2809 | 0.0581 | 0.0251 | 0.8198 | 0.104 | 0.0022 |


In [18]:
ans = time_1.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values()

PCA        0.003870
HBOS       0.005087
LOF        0.056318
KNN        0.084102
OCSVM      0.090292
CBLOF      0.116211
IForest    0.258679
ABOD       0.316479
FB         0.454158
MCD        1.021404
dtype: float64

<br/><font color=Blue>By mean runtime (in seconds), 'PCA' is the fastest (≈0.004 s) and 'MCD' the slowest (≈1.02 s).</font>

#### ROC/AUC

In [21]:
roc_1.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1888 | 9 | 48.3581 | 0.7144 | 0.5919 | 0.5696 | 0.4839 | 0.4096 | 0.7449 | 0.5617 | 0.7942 | 0.4871 | 0.4556 |
| 2 | 1888 | 9 | 50.0 | 0.7427 | 0.5308 | 0.5321 | 0.4566 | 0.3959 | 0.7416 | 0.5299 | 0.7536 | 0.4536 | 0.4412 |
| 3 | 1888 | 9 | 50.0 | 0.7414 | 0.6171 | 0.6115 | 0.356 | 0.3996 | 0.7673 | 0.6114 | 0.7724 | 0.4896 | 0.4259 |
| 4 | 1888 | 9 | 47.1928 | 0.7388 | 0.6124 | 0.5685 | 0.49 | 0.4539 | 0.7699 | 0.5807 | 0.8433 | 0.4773 | 0.4703 |
| 5 | 1888 | 9 | 49.3644 | 0.7366 | 0.5853 | 0.5662 | 0.4495 | 0.3473 | 0.7606 | 0.5531 | 0.7625 | 0.4519 | 0.4241 |


In [24]:
ans = roc_1.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values(ascending=False)

MCD        0.818319
KNN        0.807786
FB         0.779019
ABOD       0.778553
LOF        0.767699
OCSVM      0.767049
HBOS       0.748110
CBLOF      0.738439
PCA        0.719367
IForest    0.716047
dtype: float64

<br/><font color=Blue>By mean ROC AUC, all models score roughly between 0.72 and 0.82.</font>
<br/><font color=Blue>'MCD' is the highest (≈0.82), closely followed by 'KNN' (≈0.81).</font>

#### PRN

In [28]:
prn_1.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 1888 | 9 | 48.3581 | 0.6438 | 0.562 | 0.5462 | 0.4776 | 0.4222 | 0.6755 | 0.5462 | 0.7124 | 0.496 | 0.4697 |
| 2 | 1888 | 9 | 50.0 | 0.664 | 0.5079 | 0.5265 | 0.4921 | 0.4101 | 0.6587 | 0.5317 | 0.6772 | 0.455 | 0.4603 |
| 3 | 1888 | 9 | 50.0 | 0.6812 | 0.5784 | 0.6041 | 0.4072 | 0.4473 | 0.6864 | 0.6093 | 0.7044 | 0.4884 | 0.4447 |
| 4 | 1888 | 9 | 47.1928 | 0.6453 | 0.5503 | 0.5168 | 0.4777 | 0.4302 | 0.6732 | 0.5251 | 0.743 | 0.4525 | 0.4497 |
| 5 | 1888 | 9 | 49.3644 | 0.6477 | 0.5501 | 0.5393 | 0.4399 | 0.374 | 0.6694 | 0.523 | 0.6775 | 0.4309 | 0.4255 |


In [32]:
ans = prn_1.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values(ascending=False)

KNN        0.416733
MCD        0.377072
OCSVM      0.374527
FB         0.362075
CBLOF      0.362065
LOF        0.354011
ABOD       0.335495
PCA        0.310596
HBOS       0.309602
IForest    0.290866
dtype: float64

<br/><font color=Blue>By mean precision @ rank n, the best model is KNN (≈0.42) and the worst is IForest (≈0.29).</font>

## spambase Dataset Analysis

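<br/><font color=Blue>The processing is identical to that of the abalone dataset, so it is not explained step by step. Only the complete integrated code is given.</font>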

In [None]:
random_state = np.random.RandomState(42)
df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',
              'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',
              'OCSVM', 'PCA']
roc_df = pd.DataFrame(columns=df_columns)
prn_df = pd.DataFrame(columns=df_columns)
time_df = pd.DataFrame(columns=df_columns)
file_lst = fm.get_filelist('../../data/spambase/benchmarks/',[])
file_lst.sort()
for file in file_lst:
    df = pd.read_csv(file)
    x = df.drop(['ground.truth','point.id','motherset','origin','original.label'],axis = 1).values
    y = df['ground.truth'].values
    y = [0 if i == 'nominal' else 1 for i in y]

    outliers_fraction = min(np.count_nonzero(y) / len(y),0.5)
    outliers_percentage = round(outliers_fraction * 100, ndigits=4)
    
    roc_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]
    prn_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]
    time_list = [file[-8:-4], x.shape[0], x.shape[1], outliers_percentage]
    
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4,random_state=random_state)
    x_train_norm, x_test_norm = standardizer(x_train, x_test)
    
    classifiers = {'Angle-based Outlier Detector (ABOD)': ABOD(
        contamination=outliers_fraction),
        'Cluster-based Local Outlier Factor': CBLOF(
            contamination=outliers_fraction, check_estimator=False,
            random_state=random_state),
        'Feature Bagging': FeatureBagging(contamination=outliers_fraction,
                                          random_state=random_state),
        'Histogram-based Outlier Detection (HBOS)': HBOS(
            contamination=outliers_fraction),
        'Isolation Forest': IForest(contamination=outliers_fraction,
                                    random_state=random_state),
        'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),
        'Local Outlier Factor (LOF)': LOF(
            contamination=outliers_fraction),
        'Minimum Covariance Determinant (MCD)': MCD(
            contamination=outliers_fraction, random_state=random_state),
        'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),
        'Principal Component Analysis (PCA)': PCA(
            contamination=outliers_fraction, random_state=random_state),
    }
    
    for clf_name, clf in classifiers.items():
        try:
            t0 = time()
            clf.fit(x_train_norm)
            test_scores = clf.decision_function(x_test_norm)
            t1 = time()
            duration = round(t1 - t0, ndigits=4)
            roc = round(roc_auc_score(y_test, test_scores), ndigits=4)
            prn = round(precision_n_scores(y_test, test_scores), ndigits=4)
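        except Exception as e:
            # Record a failed model as 0 for every metric so each row keeps its length
            print(f'{file}: {clf_name} failed: {e}')
            roc = 0
            prn = 0
            duration = 0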
        
        time_list.append(duration)
        roc_list.append(roc)
        prn_list.append(prn)

    temp_df = pd.DataFrame(time_list).transpose()
    temp_df.columns = df_columns
    time_df = pd.concat([time_df, temp_df], axis=0)

    temp_df = pd.DataFrame(roc_list).transpose()
    temp_df.columns = df_columns
    roc_df = pd.concat([roc_df, temp_df], axis=0)

    temp_df = pd.DataFrame(prn_list).transpose()
    temp_df.columns = df_columns
    prn_df = pd.concat([prn_df, temp_df], axis=0)
time_df.to_csv("spambase-time.csv",index = False)
roc_df.to_csv("spambase-roc.csv",index = False)
prn_df.to_csv("spambase-prn.csv",index = False)
print("finished")

### Result Analysis

In [35]:
time_2 = pd.read_csv("spambase-time.csv",index_col = "Data")
roc_2 = pd.read_csv("spambase-roc.csv",index_col = "Data")
prn_2 = pd.read_csv("spambase-prn.csv",index_col = "Data")

#### Runtime

In [36]:
time_2.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 2511 | 58 | 41.0593 | 0.0 | 1.3066 | 2.8639 | 0.9236 | 0.3353 | 0.406 | 0.3844 | 1.4961 | 0.3488 | 0.0141 |
| 2 | 2511 | 58 | 39.7849 | 0.0 | 0.1994 | 2.7086 | 0.0159 | 0.3188 | 0.3953 | 0.3867 | 1.2631 | 0.3351 | 0.0129 |
| 3 | 2511 | 58 | 39.1079 | 0.0 | 0.138 | 2.9073 | 0.0167 | 0.3312 | 0.3949 | 0.3791 | 1.5047 | 0.3367 | 0.0128 |
| 4 | 2511 | 58 | 39.6256 | 0.0 | 0.1494 | 2.8392 | 0.0173 | 0.3313 | 0.3784 | 0.3898 | 1.4111 | 0.3478 | 0.0121 |
| 5 | 2511 | 58 | 39.6256 | 0.0 | 0.1469 | 2.6739 | 0.0166 | 0.3522 | 0.4064 | 0.3606 | 1.4056 | 0.3451 | 0.0128 |


In [37]:
ans = time_2.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values()

HBOS       0.033642
PCA        0.042355
CBLOF      0.245409
IForest    0.361053
OCSVM      0.515216
LOF        0.661631
KNN        0.686198
ABOD       0.826201
MCD        3.015449
FB         5.057830
dtype: float64

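<br/><font color=Blue>By mean runtime (in seconds), 'HBOS' is the fastest (≈0.034 s) and 'FB' the slowest (≈5.06 s).</font>
<br/><font color=Blue>As on the abalone dataset, 'FB', 'MCD' and 'ABOD' take comparatively long, which suggests that these models have a higher computational cost. The 0.0 runtimes for ABOD in the table above are most likely failed runs recorded as 0 by the exception handler, so ABOD's mean understates its real cost.</font>
<br/><font color=Blue>Runtimes on this dataset are also clearly longer than on abalone. The most likely reason is that each spambase file has more samples (2511 vs 1888) and many more features (58 vs 9).</font>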

#### ROC/AUC  

In [38]:
roc_2.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 2511 | 58 | 41.0593 | 0.0 | 0.5881 | 0.3256 | 0.7159 | 0.6634 | 0.6022 | 0.3882 | 0.4575 | 0.5747 | 0.5894 |
| 2 | 2511 | 58 | 39.7849 | 0.0 | 0.596 | 0.3369 | 0.7131 | 0.6261 | 0.6093 | 0.3814 | 0.4904 | 0.5633 | 0.5806 |
| 3 | 2511 | 58 | 39.1079 | 0.0 | 0.6007 | 0.3611 | 0.7054 | 0.6732 | 0.6105 | 0.4086 | 0.436 | 0.571 | 0.5834 |
| 4 | 2511 | 58 | 39.6256 | 0.0 | 0.6291 | 0.3663 | 0.7068 | 0.686 | 0.6025 | 0.4275 | 0.4886 | 0.5844 | 0.5942 |
| 5 | 2511 | 58 | 39.6256 | 0.0 | 0.5398 | 0.351 | 0.7244 | 0.6632 | 0.6009 | 0.4063 | 0.4172 | 0.5369 | 0.5515 |


In [39]:
ans = roc_2.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values(ascending=False)

IForest    0.675383
MCD        0.667410
KNN        0.662475
HBOS       0.654777
OCSVM      0.643906
LOF        0.642503
PCA        0.641658
CBLOF      0.639057
FB         0.628230
ABOD       0.493942
dtype: float64

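<br/><font color=Blue>By mean ROC AUC, the models score roughly 0.63 to 0.68, with ABOD (≈0.49) as the exception. This is clearly worse than on abalone. Failed runs drag ABOD's value down: it shows 0.0 for runtime, ROC and PRN on all five files listed above, which is the value the exception handler records.</font>
<br/><font color=Blue>'IForest', the weakest model on abalone, is the best on spambase with a ROC AUC of about 0.68. This shows that the most suitable model depends on the dataset.</font>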
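Because a failed run is stored as 0, a plain column mean mixes real scores with failures. One way to average only successful runs is to treat exact zeros as missing. The NumPy sketch below does this on made-up numbers. It assumes that a genuine score is never exactly 0, which is reasonable for ROC AUC but not guaranteed for a rounded runtime.

```python
import numpy as np

# Made-up results: rows = benchmark files, columns = models; 0.0 marks a failed run
roc = np.array([
    [0.0, 0.60, 0.66],
    [0.0, 0.58, 0.64],
    [0.70, 0.62, 0.65],
])

naive_mean = roc.mean(axis=0)                       # failures counted as 0
# Treat exact zeros as missing and average only the successful runs
valid_mean = np.nanmean(np.where(roc == 0.0, np.nan, roc), axis=0)
n_failed = (roc == 0.0).sum(axis=0)                 # failed runs per model
print(naive_mean.round(4), valid_mean.round(4), n_failed)
```

On the notebook's tables the same idea would read `roc_2.loc[:, "ABOD":"PCA"].replace(0, np.nan).mean()`, since pandas skips NaN when averaging. Reporting `n_failed` next to the mean makes a model that rarely succeeds easy to spot.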

#### PRN

In [41]:
prn_2.head()

| Data | #Samples | # Dimensions | Outlier Perc | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 2511 | 58 | 41.0593 | 0.0 | 0.46 | 0.2615 | 0.6005 | 0.5254 | 0.4746 | 0.3075 | 0.339 | 0.4636 | 0.46 |
| 2 | 2511 | 58 | 39.7849 | 0.0 | 0.447 | 0.25 | 0.5859 | 0.4747 | 0.4596 | 0.2753 | 0.3434 | 0.4419 | 0.4495 |
| 3 | 2511 | 58 | 39.1079 | 0.0 | 0.4219 | 0.25 | 0.5911 | 0.5182 | 0.4427 | 0.2786 | 0.2812 | 0.4167 | 0.4375 |
| 4 | 2511 | 58 | 39.6256 | 0.0 | 0.5036 | 0.3126 | 0.5943 | 0.568 | 0.5036 | 0.358 | 0.3675 | 0.4821 | 0.4821 |
| 5 | 2511 | 58 | 39.6256 | 0.0 | 0.399 | 0.243 | 0.5882 | 0.5294 | 0.4399 | 0.289 | 0.2737 | 0.4153 | 0.399 |


In [42]:
ans = prn_2.loc[:,"ABOD":"PCA"].apply(lambda x: x.mean())
ans.sort_values(ascending=False)

IForest    0.269408
KNN        0.259158
PCA        0.248830
CBLOF      0.245386
OCSVM      0.244198
MCD        0.243359
HBOS       0.243084
LOF        0.214420
FB         0.209865
ABOD       0.207996
dtype: float64

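<br/><font color=Blue>By mean precision @ rank n, the best model is IForest (≈0.27) and the worst is ABOD (≈0.21, again affected by failed runs).</font>
<br/><font color=Blue>IForest performs worst on abalone but best on spambase, which again shows that different datasets call for different models.</font>
<br/><font color=Blue>KNN performs relatively well on both datasets and ranks in the top three for both ROC AUC and PRN on each, which points to its stability.</font>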
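The stability claim can be checked by ranking the ten models within each of the four mean summaries above (rank 1 = best) and averaging the ranks. The sketch below uses the mean values printed in this notebook. ABOD's spambase means include failed runs recorded as 0, so its ranks there are pessimistic.

```python
# Mean scores per model, copied from the sorted outputs above (higher is better)
means = {
    'abalone ROC': {'MCD': 0.818319, 'KNN': 0.807786, 'FB': 0.779019, 'ABOD': 0.778553,
                    'LOF': 0.767699, 'OCSVM': 0.767049, 'HBOS': 0.748110, 'CBLOF': 0.738439,
                    'PCA': 0.719367, 'IForest': 0.716047},
    'abalone PRN': {'KNN': 0.416733, 'MCD': 0.377072, 'OCSVM': 0.374527, 'FB': 0.362075,
                    'CBLOF': 0.362065, 'LOF': 0.354011, 'ABOD': 0.335495, 'PCA': 0.310596,
                    'HBOS': 0.309602, 'IForest': 0.290866},
    'spambase ROC': {'IForest': 0.675383, 'MCD': 0.667410, 'KNN': 0.662475, 'HBOS': 0.654777,
                     'OCSVM': 0.643906, 'LOF': 0.642503, 'PCA': 0.641658, 'CBLOF': 0.639057,
                     'FB': 0.628230, 'ABOD': 0.493942},
    'spambase PRN': {'IForest': 0.269408, 'KNN': 0.259158, 'PCA': 0.248830, 'CBLOF': 0.245386,
                     'OCSVM': 0.244198, 'MCD': 0.243359, 'HBOS': 0.243084, 'LOF': 0.214420,
                     'FB': 0.209865, 'ABOD': 0.207996},
}

def ranks(scores):
    """Map each model to its rank, 1 = highest score (these tables contain no ties)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: pos + 1 for pos, model in enumerate(ordered)}

all_ranks = {table: ranks(scores) for table, scores in means.items()}
models = sorted(means['abalone ROC'])
avg_rank = {m: sum(r[m] for r in all_ranks.values()) / len(all_ranks) for m in models}

for m in sorted(avg_rank, key=avg_rank.get):
    print(f'{m:8s} {avg_rank[m]:.2f}')
```

With these numbers KNN has the best average rank (2.0, and it is never worse than 3rd), followed by MCD (2.75). IForest's average of 5.5 hides ranks of 10 on abalone and 1 on spambase.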