# Log Filters

The class `ProcessMiningTasks.LogFiltering.BasicFilters.BasicFilters` provides the following methods to filter an event log according to some input requirements:

1. `filter_time_range_contained` filters a log on a time interval.
2. `filter_case_performance` keeps the cases whose performance, i.e., the duration of the case, lies between a minimum and a maximum value.
3. `filter_start_activities` filters the traces that start with one of the specified start activities.
4. `filter_end_activities` filters the traces that end with one of the specified end activities.
5. `filter_variants_top_k` retains the top-k variants of the log.
6. `filter_variants` filters a log by a specified set of variants.
7. `filter_event_attribute_values` filters a log by the values of an event attribute.

We first import the class and load the input `xes` log.

In [None]:
import sys
import os
import pathlib

SCRIPT_DIR = pathlib.Path("../../../", "src").resolve()
sys.path.append(os.path.dirname(SCRIPT_DIR))

from src.Declare4Py.ProcessMiningTasks.LogFiltering.BasicFilters import BasicFilters
from src.Declare4Py.D4PyEventLog import D4PyEventLog

log_path = os.path.join("../../../", "tests", "test_logs","Sepsis Cases.xes.gz")
event_log = D4PyEventLog()
event_log.parse_xes_log(log_path)

Then, a `BasicFilters` object is built from the `event_log` object.

In [None]:
log_filters = BasicFilters(event_log)

## `filter_time_range_contained`

The method `filter_time_range_contained` filters a log on a time interval. It takes as input:
- `start_date`: a string in the format `2013-01-01 00:00:00`;
- `end_date`: a string in the format `2013-01-01 00:00:00`;
- `mode`: the filtering modality, one of `events`, `traces_contained`, `traces_intersecting`. With `events`, any event that falls within the time frame is retained; with `traces_contained`, any trace completely contained in the time frame is retained; with `traces_intersecting`, any trace intersecting the time frame is retained.
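
The difference between the three modes can be made concrete with a small sketch that does not use Declare4Py. The toy log, the helper `filter_time_range`, and the inclusive interval boundaries are assumptions made for illustration; this is not the library's implementation.

```python
from datetime import datetime

# Toy log (hypothetical): case id -> ordered list of (activity, timestamp)
toy_log = {
    "c1": [("A", datetime(2014, 1, 1)), ("B", datetime(2014, 6, 1))],   # fully inside
    "c2": [("A", datetime(2012, 12, 1)), ("B", datetime(2013, 2, 1))],  # starts before, ends inside
    "c3": [("A", datetime(2011, 1, 1)), ("B", datetime(2011, 3, 1))],   # fully outside
}
start = datetime(2013, 1, 1, 0, 0, 0)
end = datetime(2015, 12, 31, 23, 59, 59)


def in_range(ts):
    # Assumption: both boundaries are inclusive
    return start <= ts <= end


def filter_time_range(log, mode):
    if mode == "events":
        # Keep single events inside the time frame; drop cases left empty
        trimmed = {cid: [e for e in evs if in_range(e[1])] for cid, evs in log.items()}
        return {cid: evs for cid, evs in trimmed.items() if evs}
    if mode == "traces_contained":
        # Keep traces whose first and last events both lie inside the time frame
        return {cid: evs for cid, evs in log.items()
                if in_range(evs[0][1]) and in_range(evs[-1][1])}
    if mode == "traces_intersecting":
        # Keep traces whose time span overlaps the time frame
        return {cid: evs for cid, evs in log.items()
                if evs[0][1] <= end and evs[-1][1] >= start}
    raise ValueError(f"Unknown mode: {mode}")


for mode in ("events", "traces_contained", "traces_intersecting"):
    print(mode, sorted(filter_time_range(toy_log, mode)))
```

In this sketch, `c2` is dropped by `traces_contained`, kept whole by `traces_intersecting`, and trimmed to its second event by `events`.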

In [None]:
filtered_log = log_filters.filter_time_range_contained("2013-01-01 00:00:00", "2015-12-31 23:59:59", mode='traces_contained')
print(f"Filtered log for time range:\n{filtered_log}")
print("--------------------------------------")

filtered_log = log_filters.filter_time_range_contained("2013-01-01 00:00:00", "2015-12-31 23:59:59", mode='traces_intersecting')
print(f"Filtered log for time range:\n{filtered_log}")
print("--------------------------------------")

filtered_log = log_filters.filter_time_range_contained("2013-01-01 00:00:00", "2015-12-31 23:59:59", mode='events')
print(f"Filtered log for time range:\n{filtered_log}")

## `filter_case_performance`

The method `filter_case_performance` filters the log keeping the cases having a duration (the timestamp of the last event minus the timestamp of the first event) included between `min_performance` and `max_performance`. It takes as input:
- `min_performance`: a floating point value that represents the minimum value of the range;
- `max_performance`: a floating point value that represents the maximum value of the range.
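
As a library-independent illustration of the duration criterion, the sketch below computes the case duration as the last timestamp minus the first one and keeps the cases that fall in the range. The toy log, the helper names, the inclusive bounds, and the choice of seconds as the unit are assumptions made for this example.

```python
from datetime import datetime

# Toy log (hypothetical): case id -> ordered list of event timestamps
toy_log = {
    "c1": [datetime(2014, 1, 1, 8, 0), datetime(2014, 1, 1, 20, 0)],  # 12 hours
    "c2": [datetime(2014, 1, 1, 8, 0), datetime(2014, 1, 4, 8, 0)],   # 3 days
    "c3": [datetime(2014, 1, 1, 8, 0), datetime(2014, 2, 1, 8, 0)],   # 31 days
}


def case_duration_seconds(timestamps):
    # Duration = timestamp of the last event minus timestamp of the first event
    return (timestamps[-1] - timestamps[0]).total_seconds()


def filter_by_duration(log, min_performance, max_performance):
    # Assumption: both bounds are inclusive and expressed in seconds
    return {cid: ts for cid, ts in log.items()
            if min_performance <= case_duration_seconds(ts) <= max_performance}


# Keep the cases lasting between 1 day (86400 s) and 10 days (864000 s)
kept = filter_by_duration(toy_log, 86400, 864000)
print(sorted(kept))
```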

In [None]:
filtered_log = log_filters.filter_case_performance(86400, 864000)
print(f"Filtered on case performance:\n{filtered_log}")

## `filter_start_activities`

The method `filter_start_activities` filters the traces that start with one of the specified start activities. It takes as input:
- `activities`: the collection of start activities, given either as a set or as a list;
- `retain`: a boolean value. If `True`, the traces starting with one of the given activities are retained; if `False`, they are dropped. The default value is `True`.
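
The effect of `retain` can be sketched on a toy log without using Declare4Py. The helper `filter_boundary_activity` and its `index` parameter are hypothetical names introduced for this example: `index=0` looks at the first activity of each trace, and `index=-1` gives the analogous check on the last activity.

```python
# Toy log (hypothetical): case id -> ordered list of activity names
toy_log = {
    "c1": ["ER Registration", "ER Triage", "Release A"],
    "c2": ["CRP", "Leucocytes", "Release B"],
    "c3": ["IV Liquid", "CRP", "Release A"],
}


def filter_boundary_activity(log, activities, index=0, retain=True):
    activities = set(activities)  # accept both sets and lists
    # A trace matches when its activity at `index` is one of the given ones;
    # retain=True keeps the matching traces, retain=False keeps the others
    return {cid: trace for cid, trace in log.items()
            if (trace[index] in activities) == retain}


start_activities = ["ER Registration", "CRP"]
print(sorted(filter_boundary_activity(toy_log, start_activities)))                # matching traces
print(sorted(filter_boundary_activity(toy_log, start_activities, retain=False)))  # the complement
```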

In [None]:
start_activities = ["ER Registration", "CRP"]
filtered_log = log_filters.filter_start_activities(start_activities)
print(f"First event of the filtered log with {start_activities} as start activities:\n")
for case in filtered_log:
    print(case[0])

## `filter_end_activities`

The method `filter_end_activities` filters the traces that end with one of the specified end activities. It takes as input:
- `activities`: the collection of end activities, given either as a set or as a list;
- `retain`: a boolean value. If `True`, the traces ending with one of the given activities are retained; if `False`, they are dropped. The default value is `True`.

In [None]:
end_activities = ["Release A", "Release C"]
filtered_log = log_filters.filter_end_activities(end_activities)
print(f"Last event of the filtered log with {end_activities} as end activities:\n")
for case in filtered_log:
    print(case[-1])

## `filter_variants_top_k`

The method `filter_variants_top_k` retains the top-k variants of the log. It takes as input:
- `k`: the number of variants to keep.
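
To illustrate what "top-k variants" means, the following sketch counts how many cases follow each variant (the ordered sequence of activities of a trace) and keeps the cases that follow one of the `k` most frequent ones. The toy log and the helper `top_k_variants` are assumptions for this example and do not reflect how Declare4Py implements the filter.

```python
from collections import Counter

# Toy log (hypothetical): case id -> variant, i.e., tuple of activity names
toy_log = {
    "c1": ("A", "B", "C"),
    "c2": ("A", "B", "C"),
    "c3": ("A", "C"),
    "c4": ("A", "C"),
    "c5": ("A", "B", "B", "C"),
    "c6": ("A", "B", "C"),
}


def top_k_variants(log, k):
    # Count the number of cases per variant
    counts = Counter(log.values())
    # Select the k most frequent variants
    top = {variant for variant, _ in counts.most_common(k)}
    # Keep only the cases following one of the selected variants
    return {cid: variant for cid, variant in log.items() if variant in top}


print(sorted(top_k_variants(toy_log, 2)))
```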

In [None]:
k = 2
filtered_variants = log_filters.filter_variants_top_k(k)
print(f"Filtered log on cases following one of the {k} most frequent variants:\n{filtered_variants}")

## `filter_variants`

The method `filter_variants` filters a log by a specified set of variants. It takes as input:
- `variants`: the collection of variants by which we want to filter, given either as a set or as a list;
- `retain`: a boolean value. If `True`, the traces following one of the given variants are retained; if `False`, they are dropped. The default value is `True`.

In [None]:
filtered_variants = log_filters.filter_variants([("ER Registration", "Leucocytes", "CRP", "LacticAcid", "ER Triage", "ER Sepsis Triage", "IV Liquid", "IV Antibiotics", "Admission NC", "CRP,Leucocytes", "Leucocytes", "CRP", "Leucocytes", "CRP", "CRP", "Leucocytes", "Leucocytes", "CRP", "CRP", "Leucocytes", "Release A")])
print(f"Filtered variants on given collection:\n{filtered_variants}")

## `filter_event_attribute_values`

The method `filter_event_attribute_values` filters an event log by the values of some event attribute. It takes as input:
- `attribute_key`: the attribute to filter on;
- `values`: the admitted (or forbidden) values, given either as a set or as a list;
- `level`: how the filter is applied. With `case` (the default), the filter works on whole cases, selecting those in which at least one event has one of the given values; with `event`, the filter works on single events, possibly trimming the cases;
- `retain`: a boolean value that specifies whether the given values should be kept or removed. The default value is `True`.
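
The interplay between `level` and `retain` can be sketched on a toy log. The helper `filter_attribute_values` is hypothetical, and dropping the cases left without events at the `event` level is an assumption of this sketch, not a documented behaviour of Declare4Py.

```python
# Toy log (hypothetical): case id -> ordered list of events (attribute dicts)
toy_log = {
    "c1": [{"concept:name": "ER Registration", "org:group": "A"},
           {"concept:name": "CRP", "org:group": "B"}],
    "c2": [{"concept:name": "ER Registration", "org:group": "A"},
           {"concept:name": "Release A", "org:group": "E"}],
    "c3": [{"concept:name": "Leucocytes", "org:group": "C"}],
}


def filter_attribute_values(log, key, values, level="case", retain=True):
    values = set(values)  # accept both sets and lists
    if level == "case":
        # A case matches when at least one of its events has a given value
        return {cid: evs for cid, evs in log.items()
                if any(e.get(key) in values for e in evs) == retain}
    if level == "event":
        # Filter single events, which may shorten the cases
        trimmed = {cid: [e for e in evs if (e.get(key) in values) == retain]
                   for cid, evs in log.items()}
        # Assumption: cases left without events are dropped
        return {cid: evs for cid, evs in trimmed.items() if evs}
    raise ValueError(f"Unknown level: {level}")


case_level = filter_attribute_values(toy_log, "org:group", ["A", "B"], level="case")
event_level = filter_attribute_values(toy_log, "org:group", ["A", "B"], level="event")
print({cid: len(evs) for cid, evs in case_level.items()})
print({cid: len(evs) for cid, evs in event_level.items()})
```

At the `case` level, `c2` keeps both of its events because one of them matches; at the `event` level, it is trimmed to the single matching event.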

In [None]:
# This filter keeps the cases in which at least one event has the attribute 'org:group' (i.e., the organizational group of the resource) equal to 'A' or 'B'
filtered_log = log_filters.filter_event_attribute_values('org:group', ['A', 'B'], level="case", retain=True)
print(f"Cases where org:group is A or B:\n{filtered_log}")

In [None]:
# This filter drops the cases in which at least one event has the attribute 'Age' equal to 85
filtered_log = log_filters.filter_event_attribute_values('Age', [85], level="case", retain=False)
print(f"Cases where age is not 85:\n{filtered_log}")