# Retrieve Simple Log Information with Declare4Py

This tutorial goes through the steps needed to perform a simple analysis of event logs with the Declare4Py library.

## Instantiation and Simple Utility Functions

This tutorial relies on the `src.declare4py.d4py_event_log.D4PyEventLog` class, which contains all the methods for the analysis. We therefore import `D4PyEventLog` from the `src.declare4py.d4py_event_log` module, together with the Python standard-library modules `sys`, `os` and `pathlib`, which are used to make the repository's `src` package importable from the notebook. We then set the path of the log and instantiate a `D4PyEventLog` object.

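```python
import sys
import os
import pathlib

# Add the repository root (the parent folder of ../src) to the import path
SCRIPT_DIR = pathlib.Path("..", "src").resolve()
sys.path.append(os.path.dirname(SCRIPT_DIR))

from src.declare4py.d4py_event_log import D4PyEventLog

# Path of the Sepsis Cases event log stored in the tests folder
log_path = os.path.join("..", "tests", "Sepsis Cases.xes.gz")

event_log = D4PyEventLog()
```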

The next step is to parse the log with the `parse_xes_log` method. Logs can be passed in either `.xes` or `.xes.gz` format. At the moment, Declare4Py relies on the `.xes` parser of PM4Py, although this may change in the future.
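Both formats carry the same content: a `.xes.gz` file is simply a gzip-compressed `.xes` (XML) file. The sketch below illustrates this with the Python standard library only. The toy log and the `count_traces` helper are hypothetical, and this is not how PM4Py's parser works internally.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical toy XES log with two traces (not taken from the Sepsis log).
TOY_XES = """<?xml version="1.0" encoding="UTF-8"?>
<log xes.version="1.0" xmlns="http://www.xes-standard.org/">
  <trace>
    <string key="concept:name" value="A"/>
    <event><string key="concept:name" value="ER Registration"/></event>
    <event><string key="concept:name" value="ER Triage"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="B"/>
    <event><string key="concept:name" value="ER Registration"/></event>
  </trace>
</log>
"""


def count_traces(path):
    """Count the <trace> elements of a .xes or .xes.gz file (hypothetical helper)."""
    # A .xes.gz file is the same XML document, only gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        root = ET.parse(fh).getroot()
    # Compare local tag names so the check works with or without the XES namespace.
    return sum(1 for child in root if child.tag.rsplit("}", 1)[-1] == "trace")


with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "toy.xes")
    gz_path = os.path.join(tmp, "toy.xes.gz")
    data = TOY_XES.encode("utf-8")
    with open(plain_path, "wb") as fh:
        fh.write(data)
    with gzip.open(gz_path, "wb") as fh:
        fh.write(data)
    n_plain = count_traces(plain_path)
    n_gz = count_traces(gz_path)

print(n_plain, n_gz)  # 2 2
```

Both files yield the same two traces, which is why the parser can accept either format.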

```python
# Parse the .xes.gz log into the D4PyEventLog object
event_log.parse_xes_log(log_path)
```

```text
parsing log, completed traces ::   0%|          | 0/1050 [00:00<?, ?it/s]
```

The `event_log` object holds the parsed log, its length, the frequent itemsets and a binary encoding of the log. The last two attributes are explained in later paragraphs.

Once the log has been successfully parsed, we can get the log itself, its length and the ids of the cases.

```python
# Print the parsed log
print("This is the log:")
print(event_log.get_log())
print("--------------------------------------")

# Print the number of cases in the log
print("Number of cases:")
print(event_log.get_length())
print("--------------------------------------")

# Print the ids of the cases
print("Cases ids:")
print(event_log.get_trace_keys())
print("--------------------------------------")
```

```text
This is the log:
[{'attributes': {'concept:name': 'A'}, 'events': [{'InfectionSuspected': True, 'org:group': 'A', 'DiagnosticBlood': True, 'DisfuncOrg': True, 'SIRSCritTachypnea': True, 'Hypotensie': True, 'SIRSCritHeartRate': True, 'Infusion': True, 'DiagnosticArtAstrup': True, 'concept:name': 'ER Registration', 'Age': 85, 'DiagnosticIC': True, 'DiagnosticSputum': False, 'DiagnosticLiquor': False, 'DiagnosticOther': False, 'SIRSCriteria2OrMore': True, 'DiagnosticXthorax': True, 'SIRSCritTemperature': True, 'time:timestamp': datetime.datetime(2014, 10, 22, 11, 15, 41, tzinfo=datetime.timezone(datetime.timedelta(seconds=7200))), 'DiagnosticUrinaryCulture': True, 'SIRSCritLeucos': False, 'Oligurie': False, 'DiagnosticLacticAcid': True, 'lifecycle:transition': 'complete', 'Diagnose': 'A', 'Hypoxie': False, 'DiagnosticUrinarySediment': True, 'DiagnosticECG': True}, '..', {'org:group': 'E', 'lifecycle:transition': 'complete', 'concept:name': 'Release A', 'time:timestamp': datetime.datetime(
```
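The printout shows that each trace carries its case-level attributes (here `concept:name`, the case ID) and a list of events, each event being a mapping from attribute names to values. Assuming that shape, the quantities printed above can be mimicked on a small hand-made log in plain Python. The toy data and variable names are hypothetical, and the object returned by `get_log()` is most likely PM4Py's own log type rather than a plain list.

```python
# Hypothetical toy log that mimics the shape of the printout above:
# a list of traces, each with case-level "attributes" and a list of "events".
toy_log = [
    {"attributes": {"concept:name": "A"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "A"},
         {"concept:name": "ER Triage", "org:group": "C"},
         {"concept:name": "Release A", "org:group": "E"},
     ]},
    {"attributes": {"concept:name": "B"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "L"},
         {"concept:name": "ER Triage", "org:group": "L"},
     ]},
]

# Plain-Python analogues of the number of cases and the case IDs
# (an assumption for illustration, not Declare4Py's implementation).
num_cases = len(toy_log)
case_ids = [trace["attributes"]["concept:name"] for trace in toy_log]
events_per_case = {trace["attributes"]["concept:name"]: len(trace["events"])
                   for trace in toy_log}

print(num_cases)        # 2
print(case_ids)         # ['A', 'B']
print(events_per_case)  # {'A': 3, 'B': 2}
```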

A useful utility method is `get_log_alphabet_attribute`, which returns all the distinct values that an attribute takes in the log.

```python
# Print the set of resources that are in the log
print("Resources names:")
print(event_log.get_log_alphabet_attribute('org:group'))
print("--------------------------------------")

# Print the set of activities that are in the log
print("Activity names:")
print(event_log.get_log_alphabet_attribute('concept:name'))
print("--------------------------------------")
```

```text
Resources names:
['L', 'O', 'D', 'R', 'K', 'F', 'T', 'B', 'G', 'N', 'X', 'Y', 'M', 'U', 'Q', 'W', 'E', 'J', 'H', 'S', 'V', 'A', 'C', '?', 'I', 'P']
--------------------------------------
Activity names:
['Return ER', 'LacticAcid', 'ER Sepsis Triage', 'Admission IC', 'IV Antibiotics', 'Release D', 'Release E', 'Release B', 'IV Liquid', 'Release A', 'Admission NC', 'Release C', 'ER Registration', 'ER Triage', 'CRP', 'Leucocytes']
--------------------------------------
```
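Conceptually, the alphabet of an attribute is the set of distinct values it takes across all events of all traces. Below is a minimal sketch over a hand-made log, assuming the trace/event shape seen in the printout. `log_alphabet` is a hypothetical name, and how Declare4Py treats events that lack the attribute is an open assumption here (the sketch simply skips them).

```python
# Hypothetical two-case toy log with the same shape as the parsed log.
toy_log = [
    {"attributes": {"concept:name": "A"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "A"},
         {"concept:name": "ER Triage", "org:group": "C"},
     ]},
    {"attributes": {"concept:name": "B"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "L"},
         {"concept:name": "CRP"},  # this event has no org:group attribute
     ]},
]


def log_alphabet(log, attribute):
    """Return the distinct values of `attribute` over all events (hypothetical helper)."""
    values = set()
    for trace in log:
        for event in trace["events"]:
            if attribute in event:  # skip events that lack the attribute
                values.add(event[attribute])
    # Sorted only to make the output reproducible; the notebook output above is unordered.
    return sorted(values)


print(log_alphabet(toy_log, "concept:name"))  # ['CRP', 'ER Registration', 'ER Triage']
print(log_alphabet(toy_log, "org:group"))     # ['A', 'C', 'L']
```

Collecting values in a set is what makes each value appear once; the unordered lists printed by the notebook are consistent with a set being converted to a list.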


A log is a complex data structure that can be explored along several dimensions. The method `attribute_log_projection` projects each case of the log onto the given attribute. The resulting projection is a list (the log) of lists (the individual cases), each holding the value of that attribute for the events of the case.

```python
# Activity projection
for idx, trace in enumerate(event_log.attribute_log_projection("concept:name")):
    print(f"{idx}- {trace}")
print("--------------------------------------")

# Resource projection
for idx, trace in enumerate(event_log.attribute_log_projection("org:group")):
    print(f"{idx}- {trace}")
print("--------------------------------------")
```
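A projection can be pictured as replacing every event of a trace with the value of a single attribute while keeping the event order. The sketch below does this on a hand-made log; `attribute_projection` and its `missing` placeholder are hypothetical, and whether Declare4Py fills, drops or fails on events lacking the attribute is not stated here.

```python
# Hypothetical toy log with the trace/event shape of the parsed log.
toy_log = [
    {"attributes": {"concept:name": "A"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "A"},
         {"concept:name": "ER Triage", "org:group": "C"},
         {"concept:name": "Release A", "org:group": "E"},
     ]},
    {"attributes": {"concept:name": "B"},
     "events": [
         {"concept:name": "ER Registration", "org:group": "L"},
         {"concept:name": "CRP"},  # no org:group on this event
     ]},
]


def attribute_projection(log, attribute, missing=None):
    """Project each trace onto `attribute`, one value per event, in event order (hypothetical)."""
    return [[event.get(attribute, missing) for event in trace["events"]] for trace in log]


for idx, trace in enumerate(attribute_projection(toy_log, "concept:name")):
    print(f"{idx}- {trace}")
# 0- ['ER Registration', 'ER Triage', 'Release A']
# 1- ['ER Registration', 'CRP']

for idx, trace in enumerate(attribute_projection(toy_log, "org:group")):
    print(f"{idx}- {trace}")
# 0- ['A', 'C', 'E']
# 1- ['L', None]
```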

## Frequent Itemsets

`D4PyEventLog` also supports computing the frequent itemsets of activities or resources in the log. The method `compute_frequent_itemsets` takes the following arguments:

- `min_support`: the minimum support an itemset must reach to be reported;
- `case_id_col`: the name of the case ID attribute;
- `categorical_attributes`: a list with the names of the attributes over which the itemsets are discovered;
- `algorithm`: the algorithm used for the computation, either `fpgrowth` or `apriori`;
- `len_itemset`: the maximum length of the itemsets (default `None`).

```python
frequent_itemsets = event_log.compute_frequent_itemsets(
    min_support=0.8,
    case_id_col='case:concept:name',
    categorical_attributes=['concept:name'],
    algorithm='fpgrowth',
    len_itemset=3,
)
frequent_itemsets
```

| | support | itemsets | length |
|---|---|---|---|
| 0 | 1.0 | (concept:name_ER Triage) | 1 |
| 1 | 1.0 | (concept:name_ER Registration) | 1 |
| 2 | 0.999048 | (concept:name_ER Sepsis Triage) | 1 |
| 3 | 0.96381 | (concept:name_Leucocytes) | 1 |
| 4 | 0.959048 | (concept:name_CRP) | 1 |
| 5 | 0.819048 | (concept:name_LacticAcid) | 1 |
| 6 | 1.0 | (concept:name_ER Triage, concept:name_ER Regis... | 2 |
| 7 | 0.999048 | (concept:name_ER Registration, concept:name_ER... | 2 |
| 8 | 0.999048 | (concept:name_ER Triage, concept:name_ER Sepsi... | 2 |
| 9 | 0.999048 | (concept:name_ER Triage, concept:name_ER Regis... | 3 |
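The item names in the table (`concept:name_<activity>`) suggest that each case is encoded as a set of `<attribute>_<value>` items, and support is the fraction of cases containing every item of an itemset. Assuming the log has 1050 cases, as the parsing progress bar reports, a support of 0.999048 corresponds to 1049 of 1050 cases. The sketch below computes supports by brute-force enumeration on a hypothetical toy log; it is neither FP-Growth nor Apriori (those prune the search space but, for the same `min_support` and maximum length, return the same frequent itemsets), and the helper names are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy log: four cases, activity names only.
toy_log = [
    {"events": [{"concept:name": a} for a in ["ER Registration", "ER Triage", "CRP"]]},
    {"events": [{"concept:name": a} for a in ["ER Registration", "ER Triage"]]},
    {"events": [{"concept:name": a} for a in ["ER Registration", "ER Triage", "CRP", "Leucocytes"]]},
    {"events": [{"concept:name": a} for a in ["ER Registration", "Leucocytes"]]},
]


def case_items(log, attributes):
    """Encode each case as the set of '<attribute>_<value>' items it contains."""
    return [
        {f"{attr}_{event[attr]}"
         for event in trace["events"] for attr in attributes if attr in event}
        for trace in log
    ]


def frequent_itemsets_bruteforce(log, attributes, min_support, max_len):
    """Enumerate every itemset up to max_len and keep those with support >= min_support."""
    baskets = case_items(log, attributes)
    items = sorted(set().union(*baskets))
    result = []
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            # Support = fraction of cases whose item set contains the whole itemset.
            support = sum(1 for basket in baskets if set(itemset) <= basket) / len(baskets)
            if support >= min_support:
                result.append((frozenset(itemset), support, k))
    return result


for itemset, support, length in frequent_itemsets_bruteforce(
        toy_log, ["concept:name"], min_support=0.5, max_len=2):
    print(f"{support:.2f}  {sorted(itemset)}  length={length}")
```

On this toy log, eight itemsets reach `min_support=0.5`; for example, {`ER Registration`, `ER Triage`} has support 0.75 (3 of 4 cases), while {`CRP`, `Leucocytes`} appears in only one case and is filtered out.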
