# Correlated Filter Example

The `CorrelatedFilter` class removes correlated rules from a rule set, keeping only those rules which are uncorrelated with each other.

## Requirements

To run, you'll need the following:

* A rule set (specifically the binary columns of the rules as applied to a dataset).
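If this representation is new to you, the sketch below shows one way such a binary rule set could be built with pandas. The dataset, the feature names (`amount`, `num_items`) and the rule conditions are all hypothetical. In practice the rules would come from your own rule generation process.

```python
import pandas as pd

# Hypothetical raw dataset; column names and values are made up for illustration
X = pd.DataFrame(
    {"amount": [10, 250, 40, 900, 5], "num_items": [1, 3, 4, 7, 2]},
    index=pd.Index([0, 1, 2, 3, 4], name="eid"),
)

# Hypothetical rule conditions; each one maps every row to True/False
rules = {
    "Rule1": X["amount"] > 100,
    "Rule2": X["num_items"] >= 3,
    "Rule3": (X["amount"] > 100) & (X["num_items"] >= 3),
}

# Apply every rule and store the result as 0/1 integers, one column per rule
X_rules = pd.DataFrame(rules).astype(int)
print(X_rules)
```

Here `Rule1` and `Rule3` flag exactly the same rows, so they are perfectly correlated. This is the kind of redundancy `CorrelatedFilter` is meant to remove.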

----

## Import packages

```python
from iguanas.rule_selection import CorrelatedFilter
from iguanas.correlation_reduction import AgglomerativeClusteringReducer
from iguanas.metrics.pairwise import JaccardSimilarity
from iguanas.metrics.classification import FScore

import pandas as pd
```

## Read in data

Let's read in some dummy rules (stored as binary columns) and the corresponding target (`y`), for both the training and test sets:

```python
X_rules_train = pd.read_csv(
    'dummy_data/X_rules_train.csv',
    index_col='eid'
)
X_rules_test = pd.read_csv(
    'dummy_data/X_rules_test.csv',
    index_col='eid'
)
y_train = pd.read_csv(
    'dummy_data/y_train.csv',
    index_col='eid'
).squeeze()
y_test = pd.read_csv(
    'dummy_data/y_test.csv',
    index_col='eid'
).squeeze()
```
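The `.squeeze()` calls turn each one-column target `DataFrame` into a pandas `Series`. The self-contained sketch below illustrates this with made-up CSV text (the `label` column name is hypothetical) and adds a simple check that the rules and the target share the same `eid` index, which is worth verifying before fitting.

```python
import io

import pandas as pd

# Hypothetical CSV contents mimicking the rules file and a one-column target file
X_csv = "eid,Rule1,Rule2\n0,0,1\n1,1,1\n2,0,0\n"
y_csv = "eid,label\n0,0\n1,1\n2,0\n"

X_rules = pd.read_csv(io.StringIO(X_csv), index_col="eid")
y_frame = pd.read_csv(io.StringIO(y_csv), index_col="eid")

# A single-column DataFrame squeezes down to a Series
y = y_frame.squeeze()
print(type(y_frame).__name__, type(y).__name__)

# The rules and the target should describe the same records, in the same order
print(X_rules.index.equals(y.index))
```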

----

## Calculate uncorrelated rules

First, we need to instantiate the class that will perform the correlation reduction. See the `correlation_reduction` module for more information. We'll be using the `AgglomerativeClusteringReducer` class from that module.

To instantiate the `AgglomerativeClusteringReducer` class, we need to first choose our similarity function. See the `metrics.pairwise` module for more information. In this example, we'll use the Jaccard similarity:

```python
js = JaccardSimilarity()
```
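For intuition, the Jaccard similarity of two binary rule columns is the number of rows where both rules fire divided by the number of rows where at least one of them fires. The sketch below computes the full pairwise matrix with NumPy. It is an illustrative re-implementation, not the code behind `JaccardSimilarity`, and treating two rules that never fire as having similarity 0 is our own assumption.

```python
import numpy as np
import pandas as pd


def jaccard_matrix(X_rules: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Jaccard similarity between binary rule columns (illustrative sketch)."""
    A = X_rules.to_numpy(dtype=float)
    intersection = A.T @ A  # rows where both rules fire
    counts = A.sum(axis=0)  # rows where each rule fires
    union = counts[:, None] + counts[None, :] - intersection
    # Assumption: pairs where neither rule ever fires get similarity 0 instead of NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(union > 0, intersection / union, 0.0)
    return pd.DataFrame(sim, index=X_rules.columns, columns=X_rules.columns)


X_rules = pd.DataFrame({
    "Rule1": [0, 1, 0, 1, 0],
    "Rule2": [0, 1, 1, 1, 0],
    "Rule3": [0, 1, 0, 1, 0],
})
print(jaccard_matrix(X_rules).round(3))
```

`Rule1` and `Rule3` fire on identical rows, so their similarity is 1, while `Rule2` overlaps each of them on two of three rows.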

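Now we can instantiate the `AgglomerativeClusteringReducer` class with the necessary parameters. As well as the similarity function, it takes a metric for assessing the performance of each rule; here we'll use the F1 score (`FScore` with a beta of 1). See the class docstring for more information regarding these parameters: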

```python
f1 = FScore(1)

params = {
    'threshold': 0.1,
    'strategy': 'bottom_up',
    'similarity_function': js.fit,
    'metric': f1.fit
}

agg_clust = AgglomerativeClusteringReducer(**params)
```
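As a reminder of what the metric measures, the sketch below computes the F-beta score of a single binary rule against a binary target, where beta = 1 gives the F1 score. Reading the `1` in `FScore(1)` as beta is an assumption based on the variable name `f1`. The function name `f_beta` and the choice to return 0 when there are no true positives are hypothetical, not taken from `iguanas`.

```python
import numpy as np


def f_beta(y_true, y_pred, beta: float = 1.0) -> float:
    """F-beta score of one binary rule against a binary target (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)   # rule fires on a positive
    fp = np.sum(~y_true & y_pred)  # rule fires on a negative
    fn = np.sum(y_true & ~y_pred)  # rule misses a positive
    # Assumption: report 0 when there are no true positives (score undefined or zero)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return float((1 + b2) * precision * recall / (b2 * precision + recall))


y = [0, 1, 0, 1, 1]
rule = [0, 1, 1, 1, 0]
print(f_beta(y, rule))
```

With two true positives, one false positive and one false negative, precision and recall are both 2/3, so the F1 score is also 2/3.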

Finally, we can instantiate the `CorrelatedFilter` class and run the `fit` method:

```python
fcr = CorrelatedFilter(correlation_reduction_class=agg_clust)

fcr.fit(
    X_rules=X_rules_train,
    y=y_train
)
```

### Outputs

The `fit` method does not return anything. See the `Attributes` section in the class docstring for a description of each attribute generated:

```python
fcr.rules_to_keep
```

```
['Rule1', 'Rule2', 'Rule3', 'Rule5']
```
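`AgglomerativeClusteringReducer` groups similar rules by clustering; see its docstring for the actual algorithm. To build intuition for what `fit` achieves, the sketch below swaps in a much simpler greedy filter, which is not the library's method: rank rules by F1 score, then keep each rule only if its Jaccard similarity to every already-kept rule is at most a hypothetical `max_similarity` cut-off. The helper names `jaccard`, `f1` and `greedy_uncorrelated`, and the toy data, are illustrative.

```python
import numpy as np
import pandas as pd


def jaccard(a: np.ndarray, b: np.ndarray) -> float:
    """Jaccard similarity of two boolean rule columns (0 if neither fires)."""
    union = np.sum(a | b)
    return float(np.sum(a & b) / union) if union else 0.0


def f1(y: np.ndarray, pred: np.ndarray) -> float:
    """F1 score via 2TP / (2TP + FP + FN); 0 if undefined."""
    tp = np.sum(y & pred)
    denom = 2 * tp + np.sum(~y & pred) + np.sum(y & ~pred)
    return float(2 * tp / denom) if denom else 0.0


def greedy_uncorrelated(X_rules: pd.DataFrame, y, max_similarity: float = 0.9) -> list:
    """Hypothetical greedy filter: keep high-F1 rules that are not too similar."""
    X = X_rules.to_numpy(dtype=bool)
    y_arr = np.asarray(y, dtype=bool)
    scores = np.array([f1(y_arr, X[:, j]) for j in range(X.shape[1])])
    # Best-scoring rules first; the stable sort keeps column order on ties
    order = np.argsort(-scores, kind="stable")
    kept = []
    for j in order:
        if all(jaccard(X[:, j], X[:, k]) <= max_similarity for k in kept):
            kept.append(j)
    # Return the kept rules in their original column order
    return [X_rules.columns[j] for j in sorted(kept)]


X_rules = pd.DataFrame({
    "Rule1": [0, 1, 0, 1, 0],
    "Rule2": [0, 1, 1, 1, 0],
    "Rule3": [0, 1, 0, 1, 0],
    "Rule4": [1, 0, 0, 0, 1],
})
y = pd.Series([0, 1, 0, 1, 1])
print(greedy_uncorrelated(X_rules, y))
```

With the toy data, `Rule3` duplicates the equally ranked but earlier `Rule1` and is dropped, while the less similar `Rule2` and `Rule4` are kept. Lowering `max_similarity` makes the filter stricter.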

----

## Drop rules which are correlated

Use the `transform` method to drop the correlated rules from a given dataset:

```python
X_rules_test_uncorr = fcr.transform(X_rules=X_rules_test)
```

### Outputs

The `transform` method returns a dataframe with the correlated rules dropped:

```python
X_rules_test_uncorr.head()
```

| eid | Rule1 | Rule2 | Rule3 | Rule5 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |
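Judging by the output above, `transform` keeps exactly the columns listed in `rules_to_keep`. Assuming that is all it does, the equivalent pandas operation is plain column selection, sketched below on a small made-up rule set (the `Rule4` column and all values are hypothetical).

```python
import pandas as pd

# Toy rule set reusing the example's column names; all values are made up
X_rules_test = pd.DataFrame(
    {
        "Rule1": [0, 1, 0],
        "Rule2": [1, 1, 0],
        "Rule3": [0, 0, 1],
        "Rule4": [0, 1, 0],
        "Rule5": [1, 0, 0],
    },
    index=pd.Index([0, 1, 2], name="eid"),
)
rules_to_keep = ["Rule1", "Rule2", "Rule3", "Rule5"]

# Select only the uncorrelated rules; column order follows rules_to_keep
X_rules_test_uncorr = X_rules_test[rules_to_keep]
print(X_rules_test_uncorr.columns.tolist())
```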


----

## Calculate correlated rules and drop them from a dataset (in one step)

You can also use the `fit_transform` method to calculate the correlated rules and drop them from a dataset in a single step:

```python
agg_clust = AgglomerativeClusteringReducer(**params)

fcr = CorrelatedFilter(correlation_reduction_class=agg_clust)

X_rules_train_uncorr = fcr.fit_transform(
    X_rules=X_rules_train,
    y=y_train
)
```

### Outputs

The `fit_transform` method returns a dataframe with the correlated rules dropped. See the `Attributes` section in the class docstring for a description of each attribute generated:

```python
fcr.rules_to_keep
```

```
['Rule1', 'Rule2', 'Rule3', 'Rule5']
```

```python
X_rules_train_uncorr.head()
```

| eid | Rule1 | Rule2 | Rule3 | Rule5 |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |
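In scikit-learn-style APIs, `fit_transform` is conventionally equivalent to calling `fit` followed by `transform` on the same data, which is consistent with `rules_to_keep` matching the earlier run. The sketch below illustrates that pattern with a hypothetical `DuplicateRuleFilter` class that only drops rules identical to an earlier rule (perfect correlation). It is a stand-in for illustration, not how `CorrelatedFilter` works internally.

```python
import pandas as pd


class DuplicateRuleFilter:
    """Hypothetical filter that drops rules identical to an earlier rule."""

    def fit(self, X_rules: pd.DataFrame, y=None) -> None:
        # Transpose so each rule becomes a row, then flag repeated rows
        is_dup = X_rules.T.duplicated(keep="first")
        self.rules_to_keep = X_rules.columns[~is_dup.to_numpy()].tolist()

    def transform(self, X_rules: pd.DataFrame) -> pd.DataFrame:
        # Keep only the rules selected during fit
        return X_rules[self.rules_to_keep]

    def fit_transform(self, X_rules: pd.DataFrame, y=None) -> pd.DataFrame:
        # Equivalent to calling fit followed by transform on the same data
        self.fit(X_rules, y)
        return self.transform(X_rules)


X = pd.DataFrame({
    "Rule1": [0, 1, 0, 1],
    "Rule2": [0, 1, 1, 1],
    "Rule3": [0, 1, 0, 1],
})
filt = DuplicateRuleFilter()
out = filt.fit_transform(X)
print(filt.rules_to_keep)
```

`Rule3` repeats `Rule1` exactly, so only `Rule1` and `Rule2` are kept.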


----