
Background

Ippoz edited this page Jun 18, 2020 · 3 revisions

Purpose of the Tool

The tool is a simple GUI application that lets non-experts and practitioners from other domains use machine learning algorithms and apply them to anomaly detection. The tool itself does not introduce novel technologies, algorithms, or techniques; instead, it adopts state-of-the-art findings in an orchestrated and simplified fashion, giving the user an extensive pool of unsupervised algorithms that can be easily compared on different datasets.

In a nutshell, RELOAD is:

  • Easy to use, since it embeds known techniques while hiding many implementation details and variants from the end user, who is asked to select just a few inputs. Consequently, it can also serve as a teaching aid in bachelor's or master's degree courses.
  • Open source, since code is freely available on online public repositories.
  • Lightweight and portable, provided that Java 8+ (either Oracle or OpenJDK) is installed on the target machine.
  • Easily extensible with new algorithms, thanks to built-in interfaces for embedding algorithms and other techniques (e.g., feature selection strategies), depending on the needs of the user.
  • Shaped for anomaly detection, with full support, almost transparent to the user, for unsupervised algorithms. These show potential in identifying unknowns, which is one of the key aspects of anomaly detection.

In its current state, the tool can efficiently process static data sources, i.e., CSV or ARFF datasets that were already created for offline analyses. Support for data streams, needed for run-time anomaly detection, is currently under investigation.

Advantages of Adopting RELOAD

The relevance of tools for evaluating and comparing anomaly detection algorithms is intuitive: it is generally difficult to perform an extensive experimental campaign without supporting frameworks that automate the execution of experiments and the analysis of data.

Frameworks such as ELKI, WEKA, and RapidMiner, portals such as OpenML, and libraries such as Pandas/Scikit were created to let the user apply algorithms to datasets. ELKI and WEKA provide Java-based executables with built-in algorithms, including unsupervised ones, while Pandas/Scikit are Python-based. These tools are very powerful, but they do not focus on unsupervised algorithms and are often not well suited to beginners or entry-level data scientists. RELOAD instead offers the following, in a form that beginners can also use easily:

  • A wide choice of unsupervised algorithms, which are often acknowledged as the most suitable way to identify unknown threats such as zero-day attacks. Implementations of unsupervised algorithms are scattered across the above-mentioned frameworks, so an algorithm available in one of them may be missing from another.
  • Integration of different feature selection techniques, including the sequential execution of a pool of such techniques. This makes it possible to define a subset of relevant indicators to monitor, and to evaluate voting strategies that decide on anomalies based on the scores from those indicators. This is important to maximize detection efficacy and minimize resource consumption.
  • Integrated support for meta-learning techniques such as Bagging, Boosting, Stacking, Delegating, Cascading, and Voting.
  • Support for sliding window algorithms.
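
To make the ideas of voting over per-indicator scores and of sliding windows more concrete, the following is a minimal, self-contained sketch in plain Java. It is not RELOAD's implementation and does not use RELOAD's API: the class name `SlidingWindowVotingSketch`, its methods, and the parameters `windowSize`, `threshold`, and `minVotes` are all hypothetical, and a simple z-score check stands in for the unsupervised algorithms the tool actually provides. Each indicator (column) is compared against the mean and standard deviation of its own preceding window, and a data point is reported as anomalous only if enough indicators vote for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from RELOAD.
class SlidingWindowVotingSketch {

    /** Returns true if value lies more than threshold standard deviations from the window mean. */
    static boolean isAnomalous(double[] window, double value, double threshold) {
        double mean = 0.0;
        for (double w : window) mean += w;
        mean /= window.length;

        double variance = 0.0;
        for (double w : window) variance += (w - mean) * (w - mean);
        double std = Math.sqrt(variance / window.length);

        // A constant window has zero spread: any deviation from it is flagged.
        if (std == 0.0) return value != mean;
        return Math.abs(value - mean) / std > threshold;
    }

    /**
     * data[t][f] is the value of indicator f at time t.
     * Returns the indices of rows flagged by at least minVotes indicators,
     * each indicator being judged against its own previous windowSize values.
     */
    static List<Integer> detect(double[][] data, int windowSize, double threshold, int minVotes) {
        List<Integer> anomalies = new ArrayList<>();
        for (int t = windowSize; t < data.length; t++) {
            int votes = 0;
            for (int f = 0; f < data[t].length; f++) {
                // Collect the sliding window of past values for indicator f.
                double[] window = new double[windowSize];
                for (int k = 0; k < windowSize; k++) {
                    window[k] = data[t - windowSize + k][f];
                }
                if (isAnomalous(window, data[t][f], threshold)) votes++;
            }
            if (votes >= minVotes) anomalies.add(t);
        }
        return anomalies;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 10.0}, {1.1, 10.2}, {0.9, 9.8}, {1.0, 10.1},
            {5.0, 30.0},  // both indicators jump at index 4
            {1.0, 10.0}
        };
        // Window of 4 points, 3 standard deviations, both indicators must agree.
        System.out.println(detect(data, 4, 3.0, 2)); // prints [4]
    }
}
```

Requiring agreement between indicators (`minVotes`) trades sensitivity for fewer false alarms; lowering it to 1 would flag a point as soon as any single indicator deviates.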