# Simple Filter Example

The `SimpleFilter` class filters low-performing rules out of a rule set, based on a performance metric and a threshold.

## Requirements

To run, you'll need the following:

* A rule set (specifically the binary columns of the rules as applied to a dataset).
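
If your rules are not yet stored as binary columns, the sketch below shows one way to produce them with plain pandas. The dataset, its column names (`num_orders`, `account_age_days`) and the rule conditions are all hypothetical, chosen only to illustrate the expected format: one 0/1 column per rule, indexed like the original data.

```python
import pandas as pd

# Hypothetical raw dataset; column names and values are illustrative only.
X = pd.DataFrame(
    {
        "num_orders": [1, 7, 3, 12],
        "account_age_days": [400, 2, 3, 100],
    },
    index=pd.Index([0, 1, 2, 3], name="eid"),
)

# Each rule is a boolean condition on the data. Casting to int gives a
# binary column where 1 means the rule triggers for that record.
X_rules = pd.DataFrame(
    {
        "Rule1": (X["num_orders"] > 5).astype(int),
        "Rule2": (X["account_age_days"] < 7).astype(int),
        "Rule3": ((X["num_orders"] > 5) & (X["account_age_days"] < 7)).astype(int),
    }
)
```

The resulting dataframe keeps the `eid` index of the raw data, so it lines up with the target column.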

----

## Import packages

In [1]:
from iguanas.rule_selection import SimpleFilter
from iguanas.metrics.classification import FScore

import pandas as pd

## Read in data

Let's read in some dummy rules (stored as binary columns) and the target column.

In [2]:
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv', 
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv', 
    index_col='eid'
).squeeze()
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv', 
    index_col='eid'
)
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()

In [3]:
X_rules_train.columns.tolist()

['Rule1', 'Rule2', 'Rule3', 'Rule4', 'Rule5']

----

## Filter rules based on performance metrics

### Set up class parameters

Now we can set the parameters for the `SimpleFilter` class. You need to provide the metric to filter by, the threshold value and the comparison operator. Here, we'll filter out rules with an F1 score < 0.46; in other words, we keep rules whose F1 score is >= 0.46, which is why the operator below is `'>='`. To filter on F1 score, we'll use the `FScore` class from the `metrics` module.

**Please see the class docstring for more information on each parameter.**

In [4]:
f1 = FScore(beta=1)

In [5]:
params = {
    'threshold': 0.46,
    'operator': '>=',
    'metric': f1.fit
}
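
To make the filtering criterion concrete, here is a minimal sketch of how an F-beta score can be computed for a single binary rule column. It uses the standard definition, F-beta = (1 + beta²) · precision · recall / (beta² · precision + recall), which reduces to the harmonic mean of precision and recall when `beta=1`. The function `f_beta` and the toy data are hypothetical and are not the `FScore` implementation.

```python
def f_beta(y_pred, y_true, beta=1.0):
    """F-beta score of a binary rule column against a binary target."""
    pairs = list(zip(y_pred, y_true))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # rule fires, positive
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # rule fires, negative
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # rule misses a positive
    if tp == 0:
        # No true positives: precision or recall is zero, so the score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example: the rule fires on three records, two of which are positives.
rule = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
score = f_beta(rule, target)

# With the parameters above, this rule would pass a '>= 0.46' filter.
passes = score >= 0.46
```

Here precision and recall are both 2/3, so the F1 score is also 2/3 and the rule would be kept.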

### Instantiate class and run fit method

Once the parameters have been set, we can run the `fit` method to calculate which rules should be kept.

In [6]:
fr = SimpleFilter(**params)
fr.fit(
    X_rules=X_rules_train, 
    y=y_train
)

### Outputs

The `fit` method does not return anything; it stores its results as attributes (see the `Attributes` section in the class docstring for a description of each one). For example, `rules_to_keep` lists the rules that passed the filter:

In [7]:
fr.rules_to_keep

['Rule1', 'Rule2', 'Rule3']
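
Conceptually, this kind of filter scores each rule column with the metric and keeps the columns whose score passes the comparison. The sketch below illustrates that logic under stated assumptions: the helper `select_rules`, the operator mapping, the toy `precision` metric and its `(y_pred, y_true)` argument order are all hypothetical, and this is not the library's implementation.

```python
import operator

import pandas as pd

# Assumed mapping from operator strings to comparison functions.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def select_rules(X_rules, y, metric, threshold, op):
    """Return the names of rule columns whose metric value passes the comparison."""
    compare = OPERATORS[op]
    return [
        name
        for name in X_rules.columns
        if compare(metric(X_rules[name], y), threshold)
    ]


def precision(y_pred, y_true):
    """Toy metric: share of triggered records that are positives."""
    triggered = y_pred == 1
    return float(y_true[triggered].mean()) if triggered.any() else 0.0


X_toy = pd.DataFrame(
    {"RuleA": [1, 1, 0, 0], "RuleB": [1, 0, 1, 1], "RuleC": [0, 0, 0, 0]}
)
y_toy = pd.Series([1, 1, 0, 0])

kept = select_rules(X_toy, y_toy, precision, threshold=0.5, op=">=")
```

In this toy data only `RuleA` reaches a precision of 0.5 or more, so it is the only rule kept.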

----

## Drop filtered rules from another dataset

Use the `transform` method to drop the filtered rules from a given dataset.

In [8]:
X_rules_test_filtered = fr.transform(X_rules=X_rules_test)

### Outputs

The `transform` method returns a dataframe with the filtered rules dropped:

In [9]:
X_rules_test_filtered.head()

| eid | Rule1 | Rule2 | Rule3 |
|-----|-------|-------|-------|
| 0   | 0     | 0     | 0     |
| 1   | 0     | 0     | 0     |
| 2   | 0     | 0     | 0     |
| 3   | 0     | 0     | 0     |
| 4   | 0     | 0     | 0     |
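
In effect, dropping the filtered rules amounts to selecting the retained columns from the new dataset. A minimal sketch in plain pandas follows; the data is made up, and the up-front check for missing columns is an added precaution for illustration, not documented behaviour of `transform`.

```python
import pandas as pd

# Rules retained on the training set (as in the output of `rules_to_keep` above).
rules_to_keep = ["Rule1", "Rule2", "Rule3"]

# Hypothetical new dataset with all five rule columns.
X_new = pd.DataFrame(
    {
        "Rule1": [0, 1],
        "Rule2": [1, 0],
        "Rule3": [0, 0],
        "Rule4": [1, 1],
        "Rule5": [0, 1],
    },
    index=pd.Index([0, 1], name="eid"),
)

# Fail early if the new dataset lacks a rule that was selected during fitting.
missing = [rule for rule in rules_to_keep if rule not in X_new.columns]
if missing:
    raise KeyError(f"Rules missing from dataset: {missing}")

# Column selection drops every rule that was filtered out.
X_new_filtered = X_new[rules_to_keep]
```

Because the selection is driven by the fitted rule names, the test set is filtered with the same rules as the training set, without using the test labels.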


----

## Calculate filtered rules and drop them from a dataset (in one step)

You can also use the `fit_transform` method to calculate the filtered rules and drop them from the training set.

In [10]:
X_rules_train_filtered = fr.fit_transform(
    X_rules=X_rules_train, 
    y=y_train
)

### Outputs

The `fit_transform` method sets the same attributes as `fit` and returns a dataframe with the filtered rules dropped:

In [11]:
fr.rules_to_keep

['Rule1', 'Rule2', 'Rule3']

In [12]:
X_rules_train_filtered.head()

| eid | Rule1 | Rule2 | Rule3 |
|-----|-------|-------|-------|
| 0   | 0     | 0     | 0     |
| 1   | 0     | 0     | 0     |
| 2   | 0     | 0     | 0     |
| 3   | 0     | 0     | 0     |
| 4   | 0     | 0     | 0     |
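
The one-step call follows the usual fit/transform convention: it is equivalent to fitting on a dataset and then transforming that same dataset. The sketch below demonstrates the equivalence with a hypothetical stand-in class, `ToyFilter`, and a toy `accuracy` metric; neither is part of the library, and the fixed `>=` comparison is a simplification.

```python
import pandas as pd


class ToyFilter:
    """Hypothetical stand-in for a filter with fit/transform/fit_transform."""

    def __init__(self, metric, threshold):
        self.metric = metric
        self.threshold = threshold

    def fit(self, X_rules, y):
        # Keep the rules whose metric value reaches the threshold.
        self.rules_to_keep = [
            col
            for col in X_rules.columns
            if self.metric(X_rules[col], y) >= self.threshold
        ]

    def transform(self, X_rules):
        return X_rules[self.rules_to_keep]

    def fit_transform(self, X_rules, y):
        # One step: fit on the data, then transform that same data.
        self.fit(X_rules, y)
        return self.transform(X_rules)


def accuracy(y_pred, y_true):
    """Toy metric: share of records where the rule matches the target."""
    return float((y_pred == y_true).mean())


X_toy = pd.DataFrame({"RuleA": [1, 1, 0, 0], "RuleB": [0, 0, 1, 1]})
y_toy = pd.Series([1, 1, 0, 1])

one_step = ToyFilter(accuracy, 0.7).fit_transform(X_toy, y_toy)

two_step_filter = ToyFilter(accuracy, 0.7)
two_step_filter.fit(X_toy, y_toy)
two_step = two_step_filter.transform(X_toy)
```

Both routes keep only `RuleA` here and return identical dataframes.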


----