unipr-xAI-lab/asp-malware-explanation

Answer Set Programming for Feature-Based Explanation of Malware Prediction

This project combines machine learning and Answer Set Programming (ASP) to provide interpretable explanations for malware classification, based on feature manipulations derived from XGBoost models trained on the EMBER dataset.

Project Structure

.
├── dataset/                # EMBER dataset directory (download required)
├── export/                 # Output directory for generated samples and solutions
├── model/                  # Directory for saved models
│
├── lib/
│   ├── asp/                # XGBoost to ASP conversion logic
│   ├── dataset/            # EMBER dataset preprocessor
│   └── utils/              # Utility functions
│
├── metrics/
│   ├── booster.py          # Evaluates the trained booster (XGBoost model)
│   └── generation.py       # Evaluates the sample generation module
│
├── config.py               # Configuration file for directories and parameters
│
├── narrow_bounds.py        # ASP-based narrow bound solver
├── narrow_bounds_plot.py   # Visualization for narrow bounds
│
├── expand_bounds.py        # ASP-based expanded bound solver
├── expand_bounds_plot.py   # Visualization for expanded bounds
│
├── sample_generation.py    # Generate sample with desired malware probability
├── rule_extraction.py      # Train XGBoost, extract rules, and convert to ASP
│
├── LICENSE
├── requirements.txt
└── README.md

Requirements

  • Python 3.12
  • Install dependencies:
pip install -r requirements.txt
pip install git+https://github.com/blkdmr/ember.git

Dataset Setup

Download the EMBER 2018 dataset from:

https://ember.elastic.co/ember_dataset_2018_2.tar.bz2

Place the archive in the dataset/ folder and extract it.
If you are using custom folders, update the paths in config.py.


Usage

1. Train the Model & Extract Rules

python rule_extraction.py
  • Trains an XGBoost model
  • Dumps the model
  • Extracts decision rules
  • Converts them to an ASP program

The first time you run this script, it will initialize the EMBER dataset.
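To give an idea of what the rule extraction and ASP conversion steps involve, here is a minimal sketch that turns one tree into ASP rules, one per root-to-leaf path. It assumes the tree is given in the nested dictionary layout produced by XGBoost's JSON dump (`get_dump(dump_format="json")`). The predicate names `leaf/3`, `lt/2`, and `ge/2`, the function `tree_to_rules`, and the scaling factor are hypothetical and do not necessarily match the encoding in lib/asp/. Values are scaled to integers because ASP solvers such as clingo do not support floating-point terms.

```python
def tree_to_rules(tree, tree_id, scale=1000):
    """Return one ASP rule per leaf of an XGBoost tree given in JSON-dump layout."""
    rules = []

    def walk(node, conditions):
        if "leaf" in node:
            # Leaf weights are floats; scale and round them to integers.
            weight = round(node["leaf"] * scale)
            head = f"leaf({tree_id},{node['nodeid']},{weight})"
            body = ", ".join(conditions)
            rules.append(f"{head} :- {body}." if body else f"{head}.")
            return
        feature = node["split"]
        threshold = round(node["split_condition"] * scale)
        children = {child["nodeid"]: child for child in node["children"]}
        # XGBoost follows the "yes" branch when feature < split_condition.
        walk(children[node["yes"]], conditions + [f"lt({feature},{threshold})"])
        walk(children[node["no"]], conditions + [f"ge({feature},{threshold})"])

    walk(tree, [])
    return rules


# A hand-written one-split tree, used only for illustration.
example_tree = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [{"nodeid": 1, "leaf": -1.0}, {"nodeid": 2, "leaf": 2.0}],
}
for rule in tree_to_rules(example_tree, tree_id=0):
    print(rule)
```

The sketch ignores the "missing" branch for brevity; a complete conversion would also need to decide how missing feature values are represented.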


2. Find Narrow Bounds

python narrow_bounds.py
  • Finds minimal feature combinations to generate a sample with a target malware probability
  • Saves the solution in the export/ directory

To visualize:

python narrow_bounds_plot.py
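A note on what a "target malware probability" means for the solver: if the model is trained with XGBoost's logistic objective (an assumption here, since the README does not state the objective), the predicted probability is the sigmoid of the summed leaf values. A target probability therefore corresponds to a target sum of leaf values, which can be scaled to an integer for use in ASP. The sketch below shows this conversion; the function names and the scale of 1000 are illustrative only.

```python
import math


def sigmoid(margin):
    """Map a raw margin (sum of leaf values) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def logit(probability):
    """Inverse of sigmoid: the margin that yields the given probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(probability / (1.0 - probability))


def scaled_target(probability, scale=1000):
    """Integer margin target for a solver that only handles integers."""
    return round(logit(probability) * scale)


print(scaled_target(0.5))  # a probability of 0.5 corresponds to margin 0
print(scaled_target(0.9))
```

Because of rounding, a solution that hits the integer target exactly may still differ slightly from the requested probability once converted back.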

3. Generate Expanded Bounds

First, create a sample with a specific malware probability p:

python sample_generation.py

Then, expand bounds for a target probability q:

python expand_bounds.py
  • Alters the sample to achieve the new malware probability q
  • Saves the result in the export/ directory

To visualize:

python expand_bounds_plot.py
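One way to check that an altered sample actually reaches the probability q is to route it through the dumped trees and apply the sigmoid to the summed leaf values. The sketch below does this for trees in XGBoost's JSON-dump layout, assuming a logistic objective and a base margin of 0 (both assumptions). The function names and the sample format (a dictionary from feature name to value) are hypothetical and are not taken from this repository.

```python
import math


def leaf_value(tree, sample):
    """Follow the sample down one tree and return the value of the leaf it reaches."""
    node = tree
    while "leaf" not in node:
        value = sample.get(node["split"])
        if value is None:
            next_id = node["missing"]      # feature absent: default direction
        elif value < node["split_condition"]:
            next_id = node["yes"]
        else:
            next_id = node["no"]
        node = next(c for c in node["children"] if c["nodeid"] == next_id)
    return node["leaf"]


def malware_probability(trees, sample, base_margin=0.0):
    """Sigmoid of the base margin plus the sum of the reached leaf values."""
    margin = base_margin + sum(leaf_value(tree, sample) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# A hand-written one-split tree, used only for illustration.
demo_tree = {
    "nodeid": 0, "split": "f0", "split_condition": 0.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [{"nodeid": 1, "leaf": -1.0}, {"nodeid": 2, "leaf": 2.0}],
}
print(malware_probability([demo_tree], {"f0": 0.2}))
print(malware_probability([demo_tree], {"f0": 0.7}))
```

XGBoost compares values in 32-bit floating point, so results for samples lying very close to a split threshold may differ from the library's own prediction.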

4. Metrics

Booster Module

python metrics/booster.py
  • Evaluates the trained booster (XGBoost model)
  • Outputs performance metrics and checks internal booster statistics
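The README does not list which performance metrics the script reports. As an illustration of the kind of evaluation involved, the sketch below computes accuracy, true positive rate, and false positive rate from labels and predicted probabilities. The choice of metrics, the 0.5 threshold, and the function name are assumptions and may differ from what metrics/booster.py does.

```python
def binary_metrics(labels, probabilities, threshold=0.5):
    """Accuracy, TPR, and FPR for binary labels (1 = malware, 0 = benign)."""
    tp = fp = tn = fn = 0
    for label, probability in zip(labels, probabilities):
        if label not in (0, 1):
            continue  # skip samples without a binary label
        predicted_malware = probability >= threshold
        if label == 1 and predicted_malware:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_malware:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.7]))
```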

Sample Generation Module

python metrics/generation.py
  • Evaluates the quality of sample generation
  • Outputs statistics related to malware probability manipulation and feature adjustments

License

This project is licensed under the MIT License. See LICENSE for details.
