forked from SubDisc/SubDisc

This repository contains code and documents about the renewal of the Cortana subgroup discovery software application.

SubDisc: Subgroup Discovery

SubDisc is a Data Mining tool for discovering local patterns in data. SubDisc features a generic Subgroup Discovery algorithm that can be configured in many ways to implement various forms of local pattern discovery. The tool handles a range of data types, including nominal, numeric and binary, for both the input attributes and the target attributes.

A unique feature of SubDisc is its ability to deal with a range of Subgroup Discovery (SD) settings, determined by the type and number of target attributes. Where regular SD algorithms consider only a single target attribute, nominal or sometimes numeric, SubDisc is able to deal with targets consisting of multiple attributes, in a setting called Exceptional Model Mining.
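To make the single-target setting concrete, here is a minimal sketch of how a candidate subgroup can be scored against a binary target. It uses Weighted Relative Accuracy (WRAcc), a quality measure that is common in the Subgroup Discovery literature. This is not SubDisc's own code or API: the class `WraccSketch`, the `Row` type, the `toyData` helper and the toy table are all hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: none of these names come from SubDisc itself.
public class WraccSketch {

    /** One row of a toy table: a numeric input attribute and a binary target. */
    static final class Row {
        final double age;
        final boolean target;

        Row(double age, boolean target) {
            this.age = age;
            this.target = target;
        }
    }

    /**
     * Weighted Relative Accuracy of the subgroup selected by {@code condition}:
     * (n_S / N) * (p_S - p_0), where n_S is the subgroup size, N the data size,
     * p_S the fraction of positives inside the subgroup and p_0 the fraction
     * of positives in the whole data set.
     */
    static double wracc(List<Row> data, Predicate<Row> condition) {
        int total = data.size();
        int covered = 0;
        int coveredPositives = 0;
        int totalPositives = 0;
        for (Row row : data) {
            if (row.target) {
                totalPositives++;
            }
            if (condition.test(row)) {
                covered++;
                if (row.target) {
                    coveredPositives++;
                }
            }
        }
        if (total == 0 || covered == 0) {
            return 0.0; // an empty subgroup carries no information
        }
        double coverage = (double) covered / total;
        double subgroupRate = (double) coveredPositives / covered;
        double overallRate = (double) totalPositives / total;
        return coverage * (subgroupRate - overallRate);
    }

    /** Hypothetical toy data: 8 rows, 4 of which are positive. */
    static List<Row> toyData() {
        return List.of(
            new Row(25, false), new Row(32, false), new Row(41, true), new Row(47, false),
            new Row(52, true), new Row(58, true), new Row(63, true), new Row(70, false));
    }

    public static void main(String[] args) {
        // Subgroup "age >= 50" covers 4 of 8 rows, 3 of them positive:
        // 0.5 * (0.75 - 0.5) = 0.125
        System.out.println(wracc(toyData(), row -> row.age >= 50)); // prints 0.125
    }
}
```

A subgroup scores high when it is both large and has a target distribution that deviates from that of the whole data set. A subgroup covering all rows scores 0.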

SubDisc was previously developed under the name Cortana.

Features

  • Generic parameterized Subgroup Discovery algorithm.
  • Multiple data types supported.
  • Implemented in Java, so it runs on all major platforms, including Windows, Linux and Mac OS.
  • Works on propositional (tabular) data from flat files, .TXT or .ARFF.
  • Includes Exceptional Model Mining settings.
  • Statistical validation of mining results.
  • Graphical presentation of results, such as ROC curves, scatter plots, and exceptional models.
  • Additional bioinformatics module for literature-based gene set enrichment (see bioinformatics below).
  • Free binary version and open-source access.
  • Wrapper available for R (https://github.com/SubDisc/rSubDisc) and Python (soon)

The code is compatible with Java 15.

To use

  1. Either download the .jar file of the latest release (https://github.com/SubDisc/SubDisc/releases/) or build it yourself (below).
  2. Double-click on the .jar file or start it from the command line (e.g., java -jar target/subdisc-gui.jar).

The interface should appear, and you are ready to open a data file and discover subgroups!

How to build

  1. Clone the repository: git clone https://github.com/SubDisc/SubDisc.git
  2. Use Maven to assemble the .jar file: mvn package
  3. The .jar file is created in ./target and named something like subdisc-gui-2.1094.jar.

Scientific Publications

Technical details concerning the algorithms behind Cortana can be found in various scientific publications:

Contributors

The following people have contributed in various ways to the development of SubDisc/Cortana:
