# Synthetic Log Generation from DECLARE Models

DECLARE4Py generates synthetic logs from DECLARE models using an approach based on Answer Set Programming (ASP) and the Clingo solver. More details can be found in Chiariello, F., Maggi, F. M., & Patrizi, F. (2022, June). ASP-Based Declarative Process Mining. In *Proceedings of the AAAI Conference on Artificial Intelligence* (Vol. 36, No. 5, pp. 5539-5547).

As a first step, it is necessary to import a `.decl` file containing the DECLARE constraints.

In [1]:
import os
from Declare4Py.ProcessModels.DeclareModel import DeclareModel
from Declare4Py.ProcessMiningTasks.ASPLogGeneration.asp_generator import AspGenerator

model_name = 'sepsis'
model: DeclareModel = DeclareModel().parse_from_file(os.path.join("../../../", "tests", "test_models", f"{model_name}.decl"))

Then, some general settings are needed: the number of cases to generate and the minimum and maximum number of events for each case.

In [2]:
# Number of cases that have to be generated
num_of_cases = 50

# Minimum and maximum number of events a case can contain
(num_min_events, num_max_events) = (6, 10)

The `AspGenerator` class has to be instantiated with the DECLARE model and the settings above. Then, the `run` method generates the cases and the `to_xes` method saves them in a `.xes` event log. The two cells below call `to_csv` instead, which writes the log to a `.csv` file.

In [3]:
%%time
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)
asp_gen.run("matteo")
asp_gen.to_csv(f'matteo_{model_name}.csv')

DEBUG:ASP generator:Distribution for traces uniform
DEBUG:ASP generator:traces: 50, events can have a trace min(6) max(10)
INFO:ASP generator:Computing distribution
DEBUG:Distributor:Distribution() uniform min_mu: 6 max_sigma: 10 num_traces: 50 custom_prob: None
DEBUG:Distributor:Uniform() probabilities: [Fraction(1, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]
DEBUG:Distributor:Custom_dist() min_mu:6 max_sigma:10 num_traces:50
DEBUG:Distributor:Probabilities sum 1
DEBUG:Distributor:Distribution result: [10  9  7  8  7  7  7 10  9  6 10 10  8  9  6  7  9 10 10  8  7 10  7  8
 10  6  9 10  6  8 10  8  8  6  6  6 10 10 10  9  7  8  9  7 10  7 10  9
  8  8]
INFO:ASP generator:Distribution result Counter({10: 15, 7: 10, 8: 10, 9: 8, 6: 7})
DEBUG:ASP generator:Using custom traces length
INFO:ASP generator:Computing distribution
DEBUG:Distributor:Distribution() uniform min_mu: 6 max_sigma: 10 num_traces: 50 custom_prob: None
DEBUG:Distributor:Uniform() probabilities: [

CPU times: total: 25.4 s
Wall time: 5.79 s


In [4]:
%%time
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)
asp_gen.run("manpreet")
asp_gen.to_csv(f'manpreet_{model_name}.csv')

DEBUG:ASP generator:Distribution for traces uniform
DEBUG:ASP generator:traces: 50, events can have a trace min(6) max(10)
INFO:ASP generator:Computing distribution
DEBUG:Distributor:Distribution() uniform min_mu: 6 max_sigma: 10 num_traces: 50 custom_prob: None
DEBUG:Distributor:Uniform() probabilities: [Fraction(1, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5), Fraction(1, 5)]
DEBUG:Distributor:Custom_dist() min_mu:6 max_sigma:10 num_traces:50
DEBUG:Distributor:Probabilities sum 1
DEBUG:Distributor:Distribution result: [ 8  7  8  8  7 10 10 10  6  8  9  6  8  7  9  6  9  7  7 10  8  6  7  6
  8  9  6  7 10  6  8  9  6  7  7  6  8 10  9  9 10 10  7  7  8  8 10  8
 10  9]
INFO:ASP generator:Distribution result Counter({8: 12, 7: 11, 10: 10, 6: 9, 9: 8})
DEBUG:ASP generator:Using custom traces length
INFO:ASP generator:Computing distribution
DEBUG:Distributor:Distribution() uniform min_mu: 6 max_sigma: 10 num_traces: 50 custom_prob: None
DEBUG:Distributor:Uniform() probabilities: [

CPU times: total: 26.2 s
Wall time: 26.2 s


Logs can be generated for different **purposes**, according to the needs of Process Mining algorithms. DECLARE4Py supports four such purposes, which can be set with the following methods of the `AspGenerator` class.


## 1. Setting up the Length Distribution of the Cases

Users can specify a probability distribution over the lengths of the generated traces. The `set_distribution` method takes a `distributor_type` parameter. Setting it to `uniform` selects a uniform distribution over `[num_min_events, num_max_events]`.

In [None]:
asp_gen.set_distribution(distributor_type="uniform")

asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
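
As a rough illustration of what a uniform length distribution means here, the sketch below assigns equal probability to every length in `[num_min_events, num_max_events]` and samples one length per case. It uses only the Python standard library. The helper names `uniform_length_probabilities` and `sample_lengths` are hypothetical and are not part of DECLARE4Py, and the library's own sampling may work differently.

```python
import random
from collections import Counter
from fractions import Fraction


def uniform_length_probabilities(min_events, max_events):
    """Return {length: probability} with equal mass on every admissible length."""
    lengths = range(min_events, max_events + 1)
    p = Fraction(1, len(lengths))
    return {length: p for length in lengths}


def sample_lengths(probabilities, num_cases, seed=None):
    """Draw one trace length per case according to the given probabilities."""
    rng = random.Random(seed)
    lengths = list(probabilities)
    weights = [float(p) for p in probabilities.values()]
    return rng.choices(lengths, weights=weights, k=num_cases)


probs = uniform_length_probabilities(6, 10)
lengths = sample_lengths(probs, num_cases=50, seed=0)
print(probs[6])          # 1/5
print(Counter(lengths))  # how many cases received each length
```

With the settings used in this notebook (lengths 6 to 10), each of the five lengths receives probability 1/5, which matches the `Fraction(1, 5)` values in the debug output shown earlier.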

A `gaussian` distribution requires a location (the mean) and a scale (the variance).

In [None]:
asp_gen.set_distribution(distributor_type="gaussian", loc=5, scale=2)
asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
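
To give an intuition for how a Gaussian shape could translate into probabilities over integer trace lengths, the sketch below evaluates a normal density at each admissible length and renormalises. This is an assumption-laden illustration, not the library's implementation: the helper `gaussian_length_probabilities` is hypothetical, and `scale` is read as the variance, following the description above. If the library treats `scale` as a standard deviation, the exponent would change accordingly.

```python
import math


def gaussian_length_probabilities(min_events, max_events, loc, scale):
    """Evaluate a normal density at each admissible length and renormalise.

    Assumption: `scale` is the variance, so the exponent is -(n - loc)^2 / (2 * scale).
    """
    lengths = list(range(min_events, max_events + 1))
    density = [math.exp(-((n - loc) ** 2) / (2 * scale)) for n in lengths]
    total = sum(density)
    return {n: d / total for n, d in zip(lengths, density)}


probs = gaussian_length_probabilities(6, 10, loc=5, scale=2)
for length, p in probs.items():
    print(length, round(p, 3))
```

Under these assumptions, a mean of 5 lies below the minimum length of 6, so most of the probability mass falls on the shortest admissible traces.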

A `custom` distribution requires the user to set the probability for each length in `[num_min_events, num_max_events]`.

In [None]:
# One probability per length in [num_min_events, num_max_events] = [6, 10]; the values sum to 1
asp_gen.set_distribution(distributor_type="custom",
                         custom_probabilities=[0.1, 0.1, 0.2, 0.3, 0.3])

asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
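
Since a custom distribution needs one probability per admissible length, it can help to validate the list before passing it on. The following is a minimal sketch of such a check. The helper `check_custom_probabilities` is hypothetical, and whether DECLARE4Py performs these exact checks internally is an assumption (the debug output shown earlier only reports `Probabilities sum 1`).

```python
import math


def check_custom_probabilities(min_events, max_events, probabilities):
    """Validate a custom length distribution and pair it with its lengths."""
    lengths = list(range(min_events, max_events + 1))
    if len(probabilities) != len(lengths):
        raise ValueError(
            f"expected {len(lengths)} probabilities, got {len(probabilities)}"
        )
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return dict(zip(lengths, probabilities))


mapping = check_custom_probabilities(6, 10, [0.1, 0.1, 0.2, 0.3, 0.3])
print(mapping)  # {6: 0.1, 7: 0.1, 8: 0.2, 9: 0.3, 10: 0.3}
```

A list with the wrong number of entries, for example two probabilities for the five lengths 6 to 10, raises a `ValueError` in this sketch.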

## 2. Setting up the Variants

Users can generate variants by setting the number of repetitions of the workflow of each case. This is done with the `set_number_of_repetition_per_trace` method.

In [None]:
asp_gen.set_number_of_repetition_per_trace(3)

asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
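
A variant is commonly understood as a distinct sequence of activities shared by one or more cases. To inspect how many cases follow each variant in a generated log, one could group events by case and count identical activity sequences, as in the sketch below. The toy log and the helper `count_variants` are hypothetical. A real log would first have to be loaded from the exported file.

```python
from collections import Counter, defaultdict


def count_variants(events):
    """Group events by case (keeping order) and count identical activity sequences."""
    traces = defaultdict(list)
    for case_id, activity in events:
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())


# Hypothetical toy log: three cases share one workflow, a fourth differs.
toy_log = [
    ("c1", "ER Registration"), ("c1", "ER Triage"), ("c1", "CRP"),
    ("c2", "ER Registration"), ("c2", "ER Triage"), ("c2", "CRP"),
    ("c3", "ER Registration"), ("c3", "ER Triage"), ("c3", "CRP"),
    ("c4", "ER Registration"), ("c4", "CRP"),
]

variants = count_variants(toy_log)
for sequence, frequency in variants.most_common():
    print(frequency, " -> ".join(sequence))
```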

## 3. Setting up Positive and Negative Traces

Users can specify some constraints to be violated in the synthetic cases to obtain labelled logs for binary classification, e.g., for deviance mining algorithms. The method `set_constraints_to_violate` takes as input:

1. `tot_negative_trace`: the number of negative cases to generate;
2. `violate_all`: whether to violate *all* the specified constraints or let Clingo decide which constraints to violate;
3. `constraints_list`: the list containing the subset of DECLARE constraints (specified as strings of text) to be violated.

In [None]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

asp_gen.set_constraints_to_violate(tot_negative_trace=10, violate_all=True, constraints_list=[
    "Init[ER Registration] | |",
    "Chain Response[ER Registration, ER Triage] |A.org:group is J |T.org:group is A |"])
asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
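
To make the notion of a violated constraint concrete, the sketch below checks the two templates used above on plain activity sequences, using the standard DECLARE semantics: `Init[a]` requires the trace to start with `a`, and `Chain Response[a, b]` requires every `a` to be immediately followed by `b`. It deliberately ignores the data conditions on `org:group`. All function names are hypothetical, and reading `violate_all=False` as "at least one constraint violated" is an assumption about the intended behaviour, not something taken from the library.

```python
def satisfies_init(trace, activity):
    """Init[a]: the first event of the trace is `a`."""
    return bool(trace) and trace[0] == activity


def satisfies_chain_response(trace, activation, target):
    """Chain Response[a, b]: every `a` is immediately followed by `b`."""
    for i, current in enumerate(trace):
        if current == activation:
            if i + 1 >= len(trace) or trace[i + 1] != target:
                return False
    return True


def is_negative(trace, violate_all=True):
    """Label a trace as negative based on which constraints it violates."""
    violated = [
        not satisfies_init(trace, "ER Registration"),
        not satisfies_chain_response(trace, "ER Registration", "ER Triage"),
    ]
    return all(violated) if violate_all else any(violated)


positive = ["ER Registration", "ER Triage", "CRP"]
negative = ["CRP", "ER Registration", "Release B"]

print(is_negative(positive))  # False
print(is_negative(negative))  # True
```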

In addition, instead of giving the explicit text of the DECLARE constraint, an index can be used in the `set_constraints_to_violate_by_template_index` method.

In [None]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

# List the constraints of the model together with their indices
for idx, constr_text in enumerate(model.serialized_constraints):
    print(f"{idx} - {constr_text}")

asp_gen.set_constraints_to_violate_by_template_index(tot_negative_trace=10, violate_all=True,
                                                     constraints_idx_list=[0, 3])
asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')

## 4. Setting up Rules for the Activation Conditions

Users can specify the number of activations of a DECLARE constraint in the synthetic cases. This can be done with the `set_activation_conditions` method by specifying an interval of activations for specific DECLARE constraints in the loaded model.

In [None]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

asp_gen.set_activation_conditions({
    'Response[CRP, Release B] |A.org:group is J |T.org:group is A |':
        [2, 3]  # the constraint should be activated between 2 and 3 times
})

asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')
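
In DECLARE terms, a constraint such as `Response[CRP, Release B]` is activated each time its activation activity (`CRP`) occurs with the activation condition (`A.org:group is J`) satisfied. The sketch below counts such activations in a single trace and checks them against the `[2, 3]` interval. The trace, the dictionary-based event representation, and the helpers `count_activations` and `within_interval` are hypothetical and serve only as a way to sanity-check a generated log by hand.

```python
def count_activations(trace, activity, condition):
    """Count events that execute `activity` and satisfy the activation condition."""
    return sum(
        1 for event in trace
        if event["concept:name"] == activity and condition(event)
    )


def within_interval(count, interval):
    """Check that `count` lies in the closed interval [low, high]."""
    low, high = interval
    return low <= count <= high


# Hypothetical trace; attribute names follow the XES naming convention.
trace = [
    {"concept:name": "ER Registration", "org:group": "A"},
    {"concept:name": "CRP", "org:group": "J"},
    {"concept:name": "CRP", "org:group": "B"},  # CRP, but the condition is not met
    {"concept:name": "CRP", "org:group": "J"},
    {"concept:name": "Release B", "org:group": "A"},
]

activations = count_activations(trace, "CRP", lambda e: e["org:group"] == "J")
print(activations)                           # 2
print(within_interval(activations, [2, 3]))  # True
```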

In addition, instead of giving the explicit text of the DECLARE constraints, an index can be used in the `set_activation_conditions_by_template_index` method.

In [None]:
asp_gen: AspGenerator = AspGenerator(model, num_of_cases, num_min_events, num_max_events)

# List the constraints of the model together with their indices
for idx, constr_text in enumerate(model.serialized_constraints):
    print(f"{idx} - {constr_text}")

asp_gen.set_activation_conditions_by_template_index({3: [2, 3]})
asp_gen.run()
asp_gen.to_xes(f'{model_name}.xes')