discretizer4j

This project provides a Java implementation of several discretization algorithms (aka binning).

Discretization is often a useful step when working with numerical data, in order to cope with overfitting in machine learning models or with overly specific explanations from XAI algorithms such as Anchors.

We concentrate on univariate algorithms, both supervised and unsupervised, to keep things simple and to stay clear of decision tree algorithms. We chose Java to achieve reasonable performance and to integrate easily with AnchorsJ (and because we did not find any other suitable open-source Java package).

Current implementations:

Unsupervised:
- Equal Frequency in PercentileMedianDiscretizer
- Equal Size in EqualSizeDiscretizer
- Proportional k-Interval Discretizer in EqualSizeDiscretizer
- Manual Discretization in ManualDiscretizer
- Random Discretization in RandomDiscretizer

Supervised:
- FUSINTER Discretizer in FUSINTERDiscretizer
- Minimum Description Length Principle Discretizer in MDLPDiscretizer
- Ameva Discretizer in AmevaDiscretizer
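To illustrate the simplest unsupervised technique in the list, here is a minimal standalone sketch of equal-width ("equal size") binning. It splits the observed range [min, max] into k intervals of identical width and returns the inner cut points. The class and method names are hypothetical. This is not the code of EqualSizeDiscretizer, whose exact parameters and boundary handling may differ.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of equal-width binning: the range [min, max]
 * is split into k intervals of identical width.
 * Not the library's EqualSizeDiscretizer.
 */
public class EqualWidthSketch {

    /** Returns the k - 1 inner cut points for k equal-width bins. */
    static double[] cutPoints(double[] values, int k) {
        if (values.length == 0 || k < 1) {
            throw new IllegalArgumentException("need values and k >= 1");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / k;
        double[] cuts = new double[k - 1];
        for (int i = 1; i < k; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] values = {1, 4, 10, 13, 16, 25, 40};
        // Range 1..40 split into 3 bins of width 13
        System.out.println(Arrays.toString(cutPoints(values, 3)));
        // [14.0, 27.0]
    }
}
```

Equal-frequency binning would instead place the cut points at quantiles of the sorted values, so that each bin holds roughly the same number of samples.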

Getting Started

Prerequisites and Installation

In order to use the core project, no installation other than Java (version 8+) is required. The intended way of using the algorithms is as a Maven dependency. Our Maven coordinates are as follows:

  <dependency>
    <groupId>de.viadee</groupId>
    <artifactId>discretizer4j</artifactId>
    <version>1.0.0</version>    
  </dependency>

There are no transitive dependencies.

Using the Algorithm

To discretize a continuous feature, create a discretizer (any class extending AbstractDiscretizer) and fit it to the data, for example:

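// "Discretizer" stands for a concrete class extending AbstractDiscretizer, e.g. EqualSizeDiscretizer;
// constructor arguments depend on the chosen algorithm
Discretizer discretizer = new Discretizer();
discretizer.fit(values, labels);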
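The labels passed to fit are presumably only used by the supervised algorithms. As an assumption-laden illustration of what such an algorithm computes during fitting, here is a standalone sketch of the core step of entropy-based discretization as used by MDLP. Among the midpoints between adjacent distinct sorted values, it picks the cut that minimizes the weighted class entropy of the two resulting intervals. The class and method names are hypothetical. MDLP's recursion and its stopping criterion are deliberately omitted, so this is not the library's MDLPDiscretizer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: best single entropy-based cut point.
 * MDLP's recursive splitting and stopping criterion are omitted.
 */
public class EntropySplitSketch {

    /** Shannon entropy (bits) of the labels at positions idx[from..to). */
    static double entropy(double[] labels, Integer[] idx, int from, int to) {
        Map<Double, Integer> counts = new HashMap<>();
        for (int i = from; i < to; i++) {
            counts.merge(labels[idx[i]], 1, Integer::sum);
        }
        double n = to - from;
        double h = 0.0;
        for (int c : counts.values()) {
            double p = c / n;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Returns the best cut point, or NaN if all values are identical. */
    static double bestCut(double[] values, double[] labels) {
        int n = values.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Sort indices by feature value so that labels stay aligned
        Arrays.sort(idx, (a, b) -> Double.compare(values[a], values[b]));

        double bestScore = Double.POSITIVE_INFINITY;
        double bestCut = Double.NaN;
        for (int i = 1; i < n; i++) {
            double left = values[idx[i - 1]];
            double right = values[idx[i]];
            if (left == right) continue; // only cut between distinct values
            // Class entropy of both sides, weighted by their sizes
            double score = (i * entropy(labels, idx, 0, i)
                    + (n - i) * entropy(labels, idx, i, n)) / n;
            if (score < bestScore) {
                bestScore = score;
                bestCut = (left + right) / 2.0;
            }
        }
        return bestCut;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        double[] labels = {0, 0, 0, 1, 1, 1};
        System.out.println(bestCut(values, labels)); // 6.5
    }
}
```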

The fitted discretizer can then be used in two ways: getTransitions() returns all DiscretizationTransitions found during fitting, and apply() maps values to their discretized labels.

discretizer.getTransitions();
// returns:
// DiscretizationTransition From ]1, 14.5) to class 0.0
// DiscretizationTransition From [14.5, 19.5) to class 1.0
// DiscretizationTransition From [19.5, 22.5) to class 2.0
// DiscretizationTransition From [22.5, 36.5) to class 3.0
// DiscretizationTransition From [36.5, 40[ to class 4.0

discretizer.apply(new Double[]{1.5, 17.0, 10.0});
// returns:
// Double[0.0, 1.0, 0.0]
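Conceptually, applying the discretizer means finding, for each value, the interval that contains it and returning that interval's label. The following standalone sketch reproduces the example above with cut points copied from the getTransitions() output. It treats the intervals as half-open [lower, upper) and finds the label with an upper-bound binary search. Everything here is hypothetical and not the library's implementation. In particular, values outside the fitted range simply fall into the outermost bins in this sketch, and this README does not document how the library treats them.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of mapping values to labels via sorted cut points.
 * Not the library's apply() implementation.
 */
public class ApplySketch {

    // Inner interval boundaries taken from the example transitions
    static final double[] CUTS = {14.5, 19.5, 22.5, 36.5};

    /** Label = number of cut points <= value (upper-bound binary search). */
    static double label(double value) {
        int lo = 0, hi = CUTS.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (CUTS[mid] <= value) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static Double[] apply(Double[] values) {
        Double[] result = new Double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = label(values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(apply(new Double[]{1.5, 17.0, 10.0})));
        // [0.0, 1.0, 0.0]
    }
}
```

Binary search keeps each lookup at O(log k) for k bins, which matters little for five bins but scales to fine-grained discretizations.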

Fitting creates DiscretizationTransitions. Each one consists of a discretizedLabel (Double) and a discretizedOrigin. The origin is either a unique value, if the UniqueValueDiscretizer was used, or a pair of minValue and maxValue that define the interval limits of the transition.
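The two kinds of origin described above can be modeled as a small type hierarchy. The sketch below is hypothetical. Only the field names discretizedLabel, discretizedOrigin, minValue and maxValue come from this README, and the classes are illustrative rather than the library's. It treats every interval as half-open [minValue, maxValue), which matches the inner intervals in the example output. The outermost bounds there are printed differently, and this sketch does not reproduce that. It avoids records to stay compatible with Java 8.

```java
/**
 * Hypothetical model of a transition: a label plus an origin that is
 * either a single unique value or an interval [minValue, maxValue).
 * Names are illustrative, not the library's classes.
 */
public class TransitionSketch {

    interface Origin {
        boolean covers(double value);
    }

    /** Origin as produced by a unique-value discretization. */
    static final class UniqueValueOrigin implements Origin {
        final double uniqueValue;
        UniqueValueOrigin(double uniqueValue) { this.uniqueValue = uniqueValue; }
        public boolean covers(double value) { return value == uniqueValue; }
    }

    /** Origin given by interval limits: lower inclusive, upper exclusive. */
    static final class IntervalOrigin implements Origin {
        final double minValue;
        final double maxValue;
        IntervalOrigin(double minValue, double maxValue) {
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
        public boolean covers(double value) {
            return value >= minValue && value < maxValue;
        }
    }

    /** A label together with the origin it was derived from. */
    static final class Transition {
        final Origin discretizedOrigin;
        final double discretizedLabel;
        Transition(Origin origin, double label) {
            this.discretizedOrigin = origin;
            this.discretizedLabel = label;
        }
    }

    public static void main(String[] args) {
        Transition t = new Transition(new IntervalOrigin(14.5, 19.5), 1.0);
        System.out.println(t.discretizedOrigin.covers(17.0)); // true
        System.out.println(t.discretizedOrigin.covers(19.5)); // false
    }
}
```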

Tutorials and Examples

Small examples for all implemented discretizers can be found in the unit tests.

To see these discretizers in a more complex project, please refer to the XAI Examples, where discretization is used in the context of explainable artificial intelligence.

Collaboration

The project is operated and further developed by the viadee Consulting AG in Münster, Westphalia. Results from theses at the WWU Münster and the FH Münster have been incorporated. Contact person is Dr. Frank Köhne from viadee.

Implementation of additional discretizers is planned.
Community contributions to the project are welcome: please open GitHub issues with suggestions (or pull requests), which we will then work on as a team.

Authors

Marvin Gronhorst
Tobias Goerke
Colin Juers
Dr. Frank Köhne

License

BSD 3-Clause License

Acknowledgments

Garcia et al. for their extensive research on discretization techniques.