# Anomaly Detection


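## Experiment Overview
### Data
- Dataset: UNSW-NB15
- Attack types (7 classes): Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance
- Feature dimension: 196
- Data split: training set train_data.csv, test set test_data.csv
### Task
- Perform anomaly detection on the UNSW-NB15 dataset using three classic unsupervised anomaly detection models: iForest, LOF, and DeepSVDD.

### Requirements
- Evaluate with the AUROC and AUPRC metrics.
- Existing library implementations may be used.
- The DeepOD repository may be used directly:
     https://github.com/xuhongzuo/DeepOD
- The lab report must include a table comparing the detection performance and running time of these anomaly detection methods.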
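
The requirements above ask for both detection quality and running time. The following is a minimal sketch of one way to collect both for a detector that exposes `fit` and `decision_function`. The helper name `evaluate_detector`, the toy `CentroidDetector` class, and the synthetic data are illustrative assumptions and are not part of the assignment.

```python
import time

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score


class CentroidDetector:
    """Toy stand-in detector (hypothetical): score = distance to the training mean."""

    def fit(self, X):
        self.center_ = X.mean(axis=0)
        return self

    def decision_function(self, X):
        # Higher score means more anomalous.
        return np.linalg.norm(X - self.center_, axis=1)


def evaluate_detector(model, X_train, X_test, y_test):
    """Fit a detector, score the test set, and report metrics plus wall-clock time."""
    start = time.perf_counter()
    model.fit(X_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    scores = model.decision_function(X_test)
    score_seconds = time.perf_counter() - start

    return {
        "AUROC": roc_auc_score(y_test, scores),
        "AUPRC": average_precision_score(y_test, scores),
        "fit_s": fit_seconds,
        "score_s": score_seconds,
    }


# Synthetic data for illustration only: inliers near 0, outliers shifted away.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 5))
X_test = np.vstack([rng.normal(size=(95, 5)), rng.normal(loc=6.0, size=(5, 5))])
y_test = np.array([0] * 95 + [1] * 5)

row = evaluate_detector(CentroidDetector(), X_train, X_test, y_test)
print(row)
```

Collecting one such dictionary per model and passing the collection to `pd.DataFrame` would give a single table with both the metric and the timing columns.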

## Experiment Procedure

In [8]:
import pandas as pd
import torch
from sklearn.metrics import roc_auc_score, average_precision_score
from pyod.models.iforest import IForest
from pyod.models.lof import LOF
from deepod.models.tabular import DeepSVDD

# Load the training and test sets
train_data = pd.read_csv('NB15/train_data.csv')
test_data = pd.read_csv('NB15/test_data.csv')

In [12]:
# # Map the string labels to integers
# label_mapping = {
#     'Normal': 0,
#     'Fuzzers': 1,
#     'Analysis': 2,
#     'Backdoors': 3,
#     'DoS': 4,
#     'Exploits': 5,
#     'Generic': 6,
#     'Reconnaissance': 7
# }

# # Apply the mapping to the attack_cat column of the training and test sets
# train_data['attack_cat'] = train_data['attack_cat'].replace(label_mapping)
# test_data['attack_cat'] = test_data['attack_cat'].replace(label_mapping)
# Split features and labels. 'label' is used as the binary anomaly flag;
# 'attack_cat' is dropped as well so it does not leak into the features.
X_train = train_data.drop(columns=['label', 'attack_cat'])
y_train = train_data['label']
X_test = test_data.drop(columns=['label', 'attack_cat'])
y_test = test_data['label']

# Define the models (default hyperparameters)

iForest = IForest()
lof = LOF()
Deepsvdd = DeepSVDD()


In [33]:
# Compute the fraction of anomalies in the training set
outlier_ratio = y_train.value_counts(normalize=True)[1]

print("Outlier ratio:", outlier_ratio)

Outlier ratio: 0.04999278950151421
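
The ratio above is measured on the training set. The same quantity on the test set is a useful reference point for AUPRC, because a scorer with no ranking information reaches an average precision equal to the anomaly fraction. The sketch below illustrates this on made-up labels, not on the UNSW-NB15 data.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Synthetic binary labels for illustration: 5 anomalies among 100 samples.
y = np.array([1] * 5 + [0] * 95)

# For 0/1 labels the mean is the anomaly fraction.
prevalence = y.mean()

# A scorer that gives every sample the same score carries no ranking information.
constant_scores = np.zeros(len(y))
baseline_auprc = average_precision_score(y, constant_scores)

print(prevalence, baseline_auprc)
```

An AUPRC value is therefore easier to interpret when it is reported next to the anomaly fraction of the set it was computed on.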


In [13]:
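# Train and evaluate the models
results = {}
# iForest
iForest.fit(X_train)
# NOTE: predict() returns binary labels (0 = inlier, 1 = outlier), so the
# metrics below are computed from hard labels, not from continuous scores.
y_pred = iForest.predict(X_test)
# print(y_pred,y_test)
# print(len(y_pred),len(y_test))
auc_roc = roc_auc_score(y_test, y_pred)
auc_pr = average_precision_score(y_test, y_pred)
results['iForest'] = {'AUROC': auc_roc, 'AUPRC': auc_pr}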





In [14]:
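# LOF
lof.fit(X_train)
# NOTE: predict() returns binary labels (0 = inlier, 1 = outlier), so the
# metrics below are computed from hard labels, not from continuous scores.
y_pred = lof.predict(X_test)
# print(y_pred,y_test)
# print(len(y_pred),len(y_test))
auc_roc = roc_auc_score(y_test, y_pred)
auc_pr = average_precision_score(y_test, y_pred)
results['lof'] = {'AUROC': auc_roc, 'AUPRC': auc_pr}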
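
The iForest and LOF cells above pass the binary output of `predict()` to the ranking metrics. AUROC and AUPRC are defined over a ranking of samples, so hard labels discard information, and PyOD detectors also offer `decision_function()` for continuous outlier scores. The sketch below uses a small hand-written AUROC (the pairwise, Mann-Whitney formulation) on made-up numbers to show how thresholding can lower the metric. The function name `auroc` and the toy values are illustrative assumptions.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and scores, made up for illustration.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.8, 0.7, 0.9]

# Hard labels obtained by thresholding the scores at 0.75.
labels = [1 if s > 0.75 else 0 for s in scores]

auroc_from_scores = auroc(y_true, scores)  # 0.875
auroc_from_labels = auroc(y_true, labels)  # 0.625
print(auroc_from_scores, auroc_from_labels)
```

On this toy example the same detector output yields 0.875 when ranked by score and 0.625 after thresholding, which is one possible reason for AUROC values close to 0.5 when hard labels are scored.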

In [29]:
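# Convert the feature DataFrames to tensors for DeepSVDD.
# NOTE: astype(int) truncates any fractional feature values.
X_train = X_train.astype(int)
X_train_tensor = torch.tensor(X_train.values)
X_test = X_test.astype(int)
X_test_tensor = torch.tensor(X_test.values)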

In [31]:
# DeepSVDD
Deepsvdd.fit(X_train_tensor, y=None)
# decision_function() returns continuous anomaly scores
y_pred = Deepsvdd.decision_function(X_test_tensor)

# print(y_pred,y_test)
# print(len(y_pred),len(y_test))
auc_roc = roc_auc_score(y_test, y_pred)
auc_pr = average_precision_score(y_test, y_pred)
results['DeepSVDD'] = {'AUROC': auc_roc, 'AUPRC': auc_pr}

Start Training...
ensemble size: 1
MLPnet(
  (network): Sequential(
    (0): LinearBlock(
      (linear): Linear(in_features=197, out_features=100, bias=False)
      (act_layer): ReLU()
    )
    (1): LinearBlock(
      (linear): Linear(in_features=100, out_features=50, bias=False)
      (act_layer): ReLU()
    )
    (2): LinearBlock(
      (linear): Linear(in_features=50, out_features=128, bias=False)
      (act_layer): Identity()
    )
  )
)
epoch  1, training loss: 1544587348597306.750000, time: 1.7s
epoch 10, training loss: 579896224539832.625000, time: 1.7s
epoch 20, training loss: 509487074070460.875000, time: 1.8s
epoch 30, training loss: 464311823544387.125000, time: 1.9s
epoch 40, training loss: 423255203162078.437500, time: 1.9s
epoch 50, training loss: 391129490393188.750000, time: 1.9s
epoch 60, training loss: 357890607623721.937500, time: 1.9s
epoch 70, training loss: 328246405095021.125000, time: 1.9s
epoch 80, training loss: 309279280715272.375000, time: 1.9s
epoch 90, t

testing: 100%|██████████| 976/976 [00:00<00:00, 1633.88it/s]
testing: 100%|██████████| 354/354 [00:00<00:00, 1682.08it/s]
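
The training loss in the log above is on the order of 1e14 to 1e15. This may indicate that the input features live on very different scales, although the log alone does not confirm it. A minimal sketch of standardizing features with training-set statistics and casting to float32 before fitting a neural detector follows. The function name `standardize` and the toy arrays are illustrative assumptions.

```python
import numpy as np


def standardize(train, test, eps=1e-8):
    """Scale both arrays with the training set's per-feature mean and std."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Constant features have zero std; divide them by 1 instead.
    std = np.where(std < eps, 1.0, std)
    train_scaled = ((train - mean) / std).astype(np.float32)
    test_scaled = ((test - mean) / std).astype(np.float32)
    return train_scaled, test_scaled


# Toy data for illustration: one large-scale feature, one small, one constant.
rng = np.random.default_rng(0)
train = np.column_stack([
    rng.normal(1e6, 1e5, size=200),
    rng.normal(0.5, 0.1, size=200),
    np.full(200, 3.0),
])
test = np.column_stack([
    rng.normal(1e6, 1e5, size=50),
    rng.normal(0.5, 0.1, size=50),
    np.full(50, 3.0),
])

train_scaled, test_scaled = standardize(train, test)
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

Using only training-set statistics keeps information from the test set out of the preprocessing step. Whether scaling changes the reported metrics on this dataset would need to be checked by rerunning the experiment.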


In [32]:
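# Print the results.
# Note: iForest and LOF were scored on binary predict() output, DeepSVDD on
# continuous decision_function() scores, so the columns are not directly comparable.
results_df = pd.DataFrame(results)
print(results_df)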

        iForest       lof  DeepSVDD
AUROC  0.562695  0.508903  0.774321
AUPRC  0.211161  0.179957  0.392863
