# Rule Generator (Optimisation algorithm) Example

The Rule Generator (Optimisation algorithm) is used to create rules based on a labelled dataset. It generates rules by optimising the thresholds of single features, then combining these one-condition rules with AND conditions to create more complex rules.
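To make the idea concrete, here is a minimal, self-contained sketch of optimisation-based rule generation in plain pandas/NumPy. It is not Iguanas' implementation of `RuleGeneratorOpt`. It assumes numeric features only, searches an evenly spaced grid of `n_points` thresholds per feature (an assumption about how candidate thresholds are chosen), scores each candidate with the F1 score, and then ANDs pairs of the resulting one-condition rules. The helpers `f1`, `best_one_condition_rule` and `generate_rules` are hypothetical.

```python
import functools
import itertools
import operator

import numpy as np
import pandas as pd


def f1(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    """F1 score for binary 0/1 predictions (0.0 when undefined)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_one_condition_rule(x: pd.Series, y: pd.Series, n_points: int = 10):
    """Search an evenly spaced threshold grid for the best >= or <= condition."""
    best = (None, None, -1.0)  # (operator, threshold, score)
    for threshold in np.linspace(x.min(), x.max(), n_points):
        for op, mask in ((">=", x >= threshold), ("<=", x <= threshold)):
            score = f1(mask.to_numpy().astype(int), y.to_numpy())
            if score > best[2]:
                best = (op, threshold, score)
    return best


def generate_rules(X: pd.DataFrame, y: pd.Series, n_points: int = 10,
                   n_conditions: int = 2) -> pd.Series:
    """Return the F1 score of every generated rule, best first (sketch only)."""
    # Step 1: optimise one single-condition rule per (numeric) feature
    one_cond = {}
    for col in X.columns:
        op, thr, _ = best_one_condition_rule(X[col], y, n_points)
        mask = X[col] >= thr if op == ">=" else X[col] <= thr
        one_cond[f"(X['{col}']{op}{thr:g})"] = mask
    # Step 2: AND together combinations of the one-condition rules
    rules = dict(one_cond)
    for combo in itertools.combinations(one_cond, n_conditions):
        rules["&".join(combo)] = functools.reduce(
            operator.and_, (one_cond[name] for name in combo)
        )
    scores = {name: f1(mask.to_numpy().astype(int), y.to_numpy())
              for name, mask in rules.items()}
    return pd.Series(scores).sort_values(ascending=False)
```

The real class exposes further controls (for example `n_total_conditions`, `num_rules_keep`, `ratio_window` and handling of one-hot encoded features), which this sketch deliberately leaves out.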

## Requirements

To run, you'll need the following:

* A labelled, processed dataset (nulls imputed, categorical features encoded).
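If your raw data doesn't yet meet these requirements, the sketch below shows one simple way to get there with pandas: numeric nulls are filled with a sentinel value and categorical columns are one-hot encoded with `pd.get_dummies`. The helper `preprocess`, the sentinel (`-1.0`) and the `"missing"` category are illustrative choices, not Iguanas defaults; pick an imputation strategy that suits your data.

```python
import pandas as pd


def preprocess(X: pd.DataFrame, fill_value: float = -1.0) -> pd.DataFrame:
    """Impute nulls and one-hot encode non-numeric columns (a simple sketch)."""
    X = X.copy()
    cat_cols = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    num_cols = [c for c in X.columns if c not in cat_cols]
    if num_cols:
        # Impute numeric nulls with a constant sentinel value
        X[num_cols] = X[num_cols].fillna(fill_value)
    if cat_cols:
        # Treat missing categories as their own level, then one-hot encode
        X[cat_cols] = X[cat_cols].astype(object).fillna("missing")
        X = pd.get_dummies(X, columns=cat_cols, dtype=int)
    return X
```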

----

## Import packages

```python
from iguanas.rule_generation import RuleGeneratorOpt
from iguanas.metrics.classification import FScore

import pandas as pd
```

## Read in data

Let's read in some labelled, processed dummy data.

```python
X_train = pd.read_csv(
    'dummy_data/X_train.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
X_test = pd.read_csv(
    'dummy_data/X_test.csv',
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()
```
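Because `X` and `y` are passed to `fit` separately, it's worth confirming that they line up row for row (here both are indexed on `eid`, and `.squeeze()` turns the single-column `y` dataframe into a Series) and that `X` meets the requirements listed at the top. A minimal check, using a hypothetical helper `check_inputs`:

```python
import pandas as pd


def check_inputs(X: pd.DataFrame, y: pd.Series) -> None:
    """Raise a ValueError if X and y don't look ready for rule generation."""
    if not X.index.equals(y.index):
        raise ValueError("X and y must share the same index, in the same order")
    if X.isna().any().any():
        raise ValueError("X contains nulls; impute them first")
    non_numeric = [c for c in X.columns if not pd.api.types.is_numeric_dtype(X[c])]
    if non_numeric:
        raise ValueError(f"Encode categorical features first: {non_numeric}")
    if not set(y.unique()) <= {0, 1}:
        raise ValueError("y must be a binary (0/1) target")
```

For this example the calls would be `check_inputs(X_train, y_train)` and `check_inputs(X_test, y_test)`.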

----

## Generate rules

### Set up class parameters

Now we can set our class parameters for the Rule Generator. Here we're using the F1 score as the optimisation function (you can choose a different function from the `metrics.classification` module or create your own).

**Note that if you're using the F-score, precision or recall as the optimisation function, use the *FScore*, *Precision* or *Recall* classes in the *metrics.classification* module rather than the equivalent functions in scikit-learn's *metrics* module, as the former are ~100 times faster on larger datasets.**
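One plausible way a dedicated metric implementation can be this much faster is by scoring many candidate rules in a single vectorised pass instead of calling a scalar metric once per rule. This example doesn't show how Iguanas' `FScore` works internally; the sketch below only illustrates the vectorised idea with NumPy, computing the F-beta score for every column of a binary rule matrix at once. `fbeta_per_rule` is a hypothetical helper; if you write your own optimisation function, check the class docstring for the exact signature the `metric` parameter expects.

```python
import numpy as np


def fbeta_per_rule(rule_matrix: np.ndarray, y_true: np.ndarray,
                   beta: float = 1.0) -> np.ndarray:
    """F-beta score for every column of a binary (n_samples, n_rules) matrix."""
    y = y_true.reshape(-1, 1)                  # column vector, broadcast across rules
    tp = (rule_matrix * y).sum(axis=0)         # rule fires and label is 1
    fp = (rule_matrix * (1 - y)).sum(axis=0)   # rule fires and label is 0
    fn = ((1 - rule_matrix) * y).sum(axis=0)   # rule misses a positive
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Rules with an undefined score (no TP, FP or FN) get 0.0
    return np.divide((1 + b2) * tp, denom,
                     out=np.zeros(tp.shape, dtype=float), where=denom > 0)
```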

**Please see the class docstring for more information on each parameter.**

```python
fs = FScore(beta=1)
```

```python
params = {
    'metric': fs.fit,
    'n_total_conditions': 4,
    'num_rules_keep': 50,
    'n_points': 10,
    'ratio_window': 2,
    'remove_corr_rules': False,
    'target_feat_corr_types': 'Infer',
    'verbose': 1
}
```

### Instantiate class and run fit method

Once the parameters have been set, we can instantiate the class and run the `fit` method to generate rules.

```python
rg = RuleGeneratorOpt(**params)
```

```python
X_rules = rg.fit(
    X=X_train,
    y=y_train,
    sample_weight=None
)
```

```
--- Generating one condition rules for numeric features ---
100%|██████████| 24/24 [00:00<00:00, 252.24it/s]
--- Generating one condition rules for OHE categorical features ---
100%|██████████| 8/8 [00:00<00:00, 88768.34it/s]
--- Generating pairwise rules ---
100%|██████████
```

### Outputs

The `fit` method returns a dataframe containing the binary columns of the generated rules as applied to the training dataset (see the `Attributes` section in the class docstring for a description of each attribute generated):

```python
X_rules.head()
```

| eid | RGO_Rule_20211217_0 | RGO_Rule_20211217_1 | RGO_Rule_20211217_2 | RGO_Rule_20211217_3 | RGO_Rule_20211217_5 | RGO_Rule_20211217_123 | RGO_Rule_20211217_66 | RGO_Rule_20211217_67 | RGO_Rule_20211217_95 | RGO_Rule_20211217_96 | ... | RGO_Rule_20211217_264 | RGO_Rule_20211217_265 | RGO_Rule_20211217_224 | RGO_Rule_20211217_353 | RGO_Rule_20211217_221 | RGO_Rule_20211217_220 | RGO_Rule_20211217_219 | RGO_Rule_20211217_199 | RGO_Rule_20211217_196 | RGO_Rule_20211217_195 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 867-8837095-9305559 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 974-5306287-3527394 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 584-0112844-9158928 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 956-4190732-7014837 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 349-7005645-8862067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
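A quick way to sanity-check the generated rules is to score each binary column against the training labels. The sketch below uses plain pandas to compute precision, recall and the share of rows each rule flags; `summarise_rules` is a hypothetical helper, and its output column names are illustrative rather than Iguanas conventions. For this example the call would be `summarise_rules(X_rules, y_train)`.

```python
import pandas as pd


def summarise_rules(X_rules: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Precision, recall and coverage of each binary rule column."""
    y = y.loc[X_rules.index]                 # align labels to the rule rows
    tp = X_rules.mul(y, axis=0).sum()        # flagged rows that are positive
    n_flagged = X_rules.sum()                # rows each rule flags
    summary = pd.DataFrame({
        # Rules that flag nothing get NaN here, replaced with 0.0 below
        "Precision": tp / n_flagged.where(n_flagged > 0),
        "Recall": tp / y.sum(),
        "Coverage": n_flagged / len(X_rules),
    })
    return summary.fillna(0.0).sort_values("Precision", ascending=False)
```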


----

## Apply rules to a separate dataset

Use the `transform` method to apply the generated rules to a separate dataset.

```python
X_rules_test = rg.transform(X=X_test)
```
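Conceptually, applying rules to new data means re-evaluating each rule's conditions on that data. As a sketch, assuming each rule is held as a pandas expression string over a dataframe named `X` (for example `"(X['amount']>=100)&(X['is_new']==1)"`, with made-up feature names), it could look like the following. `apply_rules` and the `rule_strings` mapping are hypothetical; see the `Attributes` section of the class docstring for how the fitted rules are actually stored.

```python
import pandas as pd


def apply_rules(X: pd.DataFrame, rule_strings: dict) -> pd.DataFrame:
    """Evaluate each rule expression on X and return one 0/1 column per rule."""
    # eval runs arbitrary code: only use it on rule strings you trust
    return pd.DataFrame(
        {name: eval(expr, {"X": X}).astype(int) for name, expr in rule_strings.items()},
        index=X.index,
    )
```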

### Outputs

The `transform` method returns a dataframe giving the binary columns of the rules as applied to the given dataset:

```python
X_rules_test.head()
```

| eid | RGO_Rule_20211217_0 | RGO_Rule_20211217_1 | RGO_Rule_20211217_2 | RGO_Rule_20211217_3 | RGO_Rule_20211217_5 | RGO_Rule_20211217_123 | RGO_Rule_20211217_66 | RGO_Rule_20211217_67 | RGO_Rule_20211217_95 | RGO_Rule_20211217_96 | ... | RGO_Rule_20211217_264 | RGO_Rule_20211217_265 | RGO_Rule_20211217_224 | RGO_Rule_20211217_353 | RGO_Rule_20211217_221 | RGO_Rule_20211217_220 | RGO_Rule_20211217_219 | RGO_Rule_20211217_199 | RGO_Rule_20211217_196 | RGO_Rule_20211217_195 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 975-8351797-7122581 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 785-6259585-7858053 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 057-4039373-1790681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 095-5263240-3834186 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 980-3802574-0009480 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
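Rules optimised on the training data can perform worse on unseen data, so a useful follow-up is to compare each rule's precision on the training and test sets. This is a minimal sketch with a hypothetical helper, `precision_drift`; it assumes both rule dataframes have the same rule columns (as they do in the outputs above) and that each label Series shares its dataframe's index. For this example the call would be `precision_drift(X_rules, y_train, X_rules_test, y_test)`.

```python
import pandas as pd


def precision_drift(X_rules_train: pd.DataFrame, y_train: pd.Series,
                    X_rules_test: pd.DataFrame, y_test: pd.Series) -> pd.DataFrame:
    """Compare each rule's precision on the training and test sets."""
    def precision(X_rules: pd.DataFrame, y: pd.Series) -> pd.Series:
        flagged = X_rules.sum()
        tp = X_rules.mul(y.loc[X_rules.index], axis=0).sum()
        return tp / flagged.where(flagged > 0)   # NaN if a rule never fires

    out = pd.DataFrame({
        "TrainPrecision": precision(X_rules_train, y_train),
        "TestPrecision": precision(X_rules_test, y_test),
    })
    # Largest drops first: these rules may be overfitting the training data
    out["Drop"] = out["TrainPrecision"] - out["TestPrecision"]
    return out.sort_values("Drop", ascending=False)
```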


----