Boosted Lazy Associative Classifier of K rules (vauxgomes/black)

Boost


Description

This repository contains implementations of LAC, Adaboost, a version of Conf-rated Adaboost, and the SLIPPER algorithm. To mine the association rules, one can use d-peeler, multidupehack, or lcm without having to adapt the code; adapting it to another miner is not very complicated either.

Lazy Associative Classifier

LAC (Lazy Associative Classification) is a rule-based, demand-driven, lazy machine learning algorithm. For each test instance, the algorithm projects the training data onto the region where the test instance lies. As a result, the algorithm decomposes the problem of fitting a single function that explains the whole data into many smaller problems. Indeed, for a given test set, not all regions of the data may ever be explored by the algorithm. LAC predicts the class of a test instance by averaging the confidence values of the induced rules and taking a majority vote among the rules' classes.
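The projection-and-vote procedure can be sketched as follows. The function name, data layout, and the restriction to single-feature rules are illustrative assumptions for brevity, not the repository's actual code:

```python
from collections import defaultdict

def lac_predict(train, test_features, min_support=1):
    """Illustrative sketch of lazy associative classification.

    `train` is a list of (feature_set, label) pairs; `test_features`
    is the set of feature ids of the test instance.
    """
    # Project the training data onto the test instance: keep only
    # the features that the test instance actually contains.
    projected = [(feats & test_features, label) for feats, label in train]

    # Count, for every single-feature antecedent, how often it
    # co-occurs with each class (rules of size 1, for brevity).
    counts = defaultdict(lambda: defaultdict(int))
    for feats, label in projected:
        for f in feats:
            counts[f][label] += 1

    # Each rule votes for its class, weighted by its confidence:
    # support(feature, class) / support(feature).
    votes = defaultdict(float)
    for f, by_class in counts.items():
        total = sum(by_class.values())
        if total < min_support:
            continue
        for label, n in by_class.items():
            votes[label] += n / total

    # Predict the class with the largest accumulated confidence.
    return max(votes, key=votes.get) if votes else None

train = [({1, 2}, 'a'), ({1, 3}, 'a'), ({2, 3}, 'b')]
print(lac_predict(train, {1, 2}))  # 'a'
```

Note how the rules are induced only after the test instance is known, which is what makes the approach lazy and demand-driven.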

Adaptive Boosting

Boosting is a method for improving the accuracy of machine learning algorithms by combining classifiers and assigning them voting influence values (or simply, weights). Essentially, boosting builds an additive model by iteratively combining many classifiers, so-called weak hypotheses, all generated by a base learner.
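A minimal sketch of the discrete Adaboost loop described above; the function names and the toy decision stumps are illustrative, not the repository's implementation:

```python
import math

def adaboost(examples, labels, weak_learners, rounds=10):
    """Minimal discrete Adaboost sketch; labels must be in {-1, +1}."""
    n = len(examples)
    w = [1.0 / n] * n                        # uniform initial weights
    ensemble = []                            # (alpha, hypothesis) pairs
    for _ in range(rounds):
        # Pick the weak hypothesis with the smallest weighted error.
        def err(h):
            return sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        h = min(weak_learners, key=err)
        e = min(max(err(h), 1e-10), 1.0)     # clamp to avoid log(0)
        if e >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - e) / e)  # voting influence (weight)
        ensemble.append((alpha, h))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * y * h(x))
             for wi, x, y in zip(w, examples, labels)]
        z = sum(w)
        w = [wi / z for wi in w]
    def predict(x):
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
    return predict

# Toy usage: two decision stumps on the boolean OR problem.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, 1]
stumps = [lambda p: 1 if p[0] == 1 else -1,
          lambda p: 1 if p[1] == 1 else -1]
predict = adaboost(points, labels, stumps, rounds=5)
```

Neither stump alone classifies all four points, but re-weighting forces later rounds to focus on the examples the earlier stumps got wrong.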

Conf-rated Adaboost

The Conf-rated Adaboost algorithm is an adaptation of the original discrete Adaboost that allows classifiers to express a degree of certainty about their predictions.
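In the confidence-rated setting, each weak hypothesis outputs a real value in [-1, 1] whose sign is the prediction and whose magnitude is its certainty; the hypothesis's voting weight then depends on the weighted average margin r. A sketch of that weight computation (the function name is illustrative):

```python
import math

def conf_rated_alpha(weights, margins):
    """Voting weight of a confidence-rated weak hypothesis.

    `margins[i]` is y_i * h(x_i), with h(x) in [-1, 1]: positive when
    the hypothesis is right, and larger in magnitude when it is more
    certain. r is the weighted average margin over the training set.
    """
    r = sum(w * m for w, m in zip(weights, margins))
    return 0.5 * math.log((1 + r) / (1 - r))

# A hypothesis that is right with high certainty gets a large weight:
print(round(conf_rated_alpha([0.5, 0.5], [0.9, 0.7]), 3))  # 1.099
```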

SLIPPER

SLIPPER is a rule-based algorithm that uses Adaboost's method to build sets of rules.

Scripts

Module main.py

Arguments

  • -s Training set files (required)
  • -t Testing set file (required)
  • -i Itemsets file (required)
  • -b Maximum number of rounds
  • -z Original sizes of each class
  • -w Pre-set weights
  • -Z ZERO Adaboost
  • -A Associative Classifier
  • -D Discrete Adaboost
  • -C Confidence-rated Adaboost
  • -S SLIPPER Classifier
  • -free Use free itemsets
  • -rmode Associative Classifier
  • -seed Seed for the random objects

Settings

There is a file called settings.py in the utils directory. Within that file, a few variables can be set:

  • RANDOM_SEED Seed for controlling the random object
  • MIN_ROUNDS Minimum number of rounds for the Adaboost algorithms
  • MAX_ROUNDS Maximum number of rounds for the Adaboost algorithms
  • GAMMA Gamma value for the Discrete Adaboost algorithm (see Adaboost)
  • kICV Number of internal cross validations of the SLIPPER algorithm
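For illustration, utils/settings.py might look as follows. The variable names come from the list above, but the values here are placeholders, not the project's actual defaults:

```python
# Illustrative contents of utils/settings.py -- the values below are
# placeholders, not the repository's defaults.
RANDOM_SEED = 123   # seed for controlling the random object
MIN_ROUNDS = 1      # minimum number of Adaboost rounds
MAX_ROUNDS = 100    # maximum number of Adaboost rounds
GAMMA = 0.1         # gamma value for Discrete Adaboost
kICV = 5            # internal cross validations in SLIPPER

print(MIN_ROUNDS, MAX_ROUNDS)
```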

Usage

Example 1
# Classes: TMP.class.0 TMP.class.1
# Itemsets: TMP.itemsets
# Test: TMP.testset
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets

Example 2

# Calling Discrete Adaboost routine
$ main.py -s TMP.class.* -t TMP.testset -i TMP.itemsets -D

Script runner

runner is used to mine rules for the given train and test sets and, after that, to call the booster module.

Arguments:

  • -h Displays help
  • -s Training files
  • -t Testing files
  • -z MINER: Minimum support size
  • -m MINER: Use Multidupehack to mine the itemsets (default: D-peeler)
  • -l MINER: Use LCM to mine the itemsets (default: D-peeler)
  • -e BOOSTER: Activates eager mode
  • -Z BOOSTER: Deactivates the ZERO classifier
  • -A BOOSTER: Deactivates the Associative Classifier
  • -D BOOSTER: Activates Discrete Adaboost
  • -C BOOSTER: Activates Conf-rated Adaboost
  • -S BOOSTER: Activates SLIPPER Boost
  • -o BOOSTER: Activates use of the original train size
  • -j BOOSTER: Activates use of Jaccard's index
  • -b BOOSTER: Maximum number of rounds
  • -f BOOSTER: Uses only free itemsets

Note: This code works only with LUCS-KDD files.

Example 1

$ ./runner.sh -s train -t test

Example 2

# Calling Discrete Adaboost in the eager mode using multidupehack
$ ./runner.sh -s train -t test -emD

Note: It is advisable to always use option -f together with option -m.

Script battery

This script runs a battery of datasets using the runner script. See the array variable in the script.

  • -h Displays help
  • -a Use both multi-class and binary-class datasets
  • -x File extension name*
  • -m MINER: miner options, as in the runner script*
  • -p Path for result outputs
  • -l Progress log file (default: .batt)
  • -n Show notifications (default: false)

Example 1

# Calling a battery using options C and D of the main module
$ ./battery.sh -x ext -m "-CD"

Input format

The input file is formed of a series of instances, one per line, in the following format:

<int_features> <class>

The LUCS-KDD format fits very well!
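A line in this format can be parsed with a couple of lines of Python. This helper is an illustrative sketch, not the repository's own parser:

```python
def parse_instance(line):
    """Parse one LUCS-KDD-style line: integer features, then the class."""
    *features, label = line.split()   # last token is the class
    return [int(f) for f in features], int(label)

print(parse_instance("3 7 12 1"))  # ([3, 7, 12], 1)
```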

Output format

The output is formed of a header in the format:

# Mode: <Lazy/Eager> 
# Miner: <Multidupehack/D-peeler/Lcm>
# Train: <train file>
# Test: <test file>
# 
# Date: Wed Nov  2 01:06:52 BRST 2016
# Host: ubt13z

followed by an empty line and a set of whitespace-separated lines representing the predictions of the algorithms. Each line has the following format:

<correct_class> ~<alg1_name> <pred1_alg1> ... <predN_alg1> ... ~<algM_name> <pred1_algM> ... <predN_algM>
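A prediction line of this shape can be split back into the true class and each algorithm's predictions; again, this parser is an illustrative sketch rather than part of the project:

```python
def parse_prediction_line(line):
    """Split an output line into (correct_class, {algorithm: predictions})."""
    tokens = line.split()
    correct = tokens[0]            # first token is the true class
    preds = {}
    current = None
    for tok in tokens[1:]:
        if tok.startswith('~'):    # '~name' starts a new algorithm's block
            current = tok[1:]
            preds[current] = []
        else:                      # everything else is a prediction
            preds[current].append(tok)
    return correct, preds

print(parse_prediction_line("1 ~LAC 1 0 ~SLIPPER 1 1"))
# ('1', {'LAC': ['1', '0'], 'SLIPPER': ['1', '1']})
```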