# Process causality project

In [1]:
from environment import *

case_id = 'case:concept:name'
activity_id = 'concept:name'

## Simulation
In order to have two versions of a process, we decided to create our own processes. For this purpose, we created two bpmn models representing a basic version of an order-to-cash process and a changed version. For a better understanding of the process, we created a so-called set of rules representing the activities.

In [2]:
unchanged_basic_ruleset = [
    "Check stock availability",
    "Check raw materials availability",
    (
        [
            "Request raw materials from Supplier 1",
            "Obtain raw materials from Supplier 1"
        ],
        [
            "Request raw materials from Supplier 2",
            "Obtain raw materials from Supplier 2"
        ]
    ),
    "Manufacture product",
    "Retrieve product from warehouse",
    "Confirm order",
    (
        [
            "Get shipping address",
            "Ship product"
        ],
        [
            "Emit invoice",
            "Receive Payment"
        ]
    ),
    "Archive order"
]
changed_basic_ruleset = [
    "Check stock availability",
    "Check raw materials availability",
    "Notify unavailability to customer",
    (
        "Request raw materials from Supplier 1",
        "Request raw materials from Supplier 2"
    ),
    (
        "Obtain raw materials from Supplier 1",
        "Obtain raw materials from Supplier 2"
    ),
    "Manufacture product",
    "Retrieve product from warehouse",
    "Confirm order",
    "Get shipping address",
    (
        "Ship product",
        [
            "Emit invoice",
            "Receive Payment"
        ]
    ),
    "Archive order"
]
print(unchanged_basic_ruleset)
print(changed_basic_ruleset)

['Check stock availability', 'Check raw materials availability', (['Request raw materials from Supplier 1', 'Obtain raw materials from Supplier 1'], ['Request raw materials from Supplier 2', 'Obtain raw materials from Supplier 2']), 'Manufacture product', 'Retrieve product from warehouse', 'Confirm order', (['Get shipping address', 'Ship product'], ['Emit invoice', 'Receive Payment']), 'Archive order']
['Check stock availability', 'Check raw materials availability', 'Notify unavailability to customer', ('Request raw materials from Supplier 1', 'Request raw materials from Supplier 2'), ('Obtain raw materials from Supplier 1', 'Obtain raw materials from Supplier 2'), 'Manufacture product', 'Retrieve product from warehouse', 'Confirm order', 'Get shipping address', ('Ship product', ['Emit invoice', 'Receive Payment']), 'Archive order']


These activities define two similar but different processes. For experimentation, we can now load the bpmn's and simulate some event logs.

In [3]:
from source.misc import read_bpmn
from source.simulation import basic_bpmn_petri_net

unchanged_bpmn = read_bpmn(BPMN_DIR_PATH,'Order-to-Cash-Model-1.bpmn')
changed_bpmn = read_bpmn(BPMN_DIR_PATH,'Order-to-Cash-Model-2.bpmn')

unchanged_eventlog = basic_bpmn_petri_net(unchanged_bpmn)
changed_eventlog = basic_bpmn_petri_net(changed_bpmn)

In [4]:
unchanged_eventlog[unchanged_eventlog[case_id].isin(['C0000','C0001','C0002'])].style

Unnamed: 0,concept:name,time:timestamp,case:concept:name
0,Check stock availability,1970-04-26 19:46:40,C0000
1,Check raw materials availability,1970-04-26 19:46:41,C0000
2,Request raw materials from Supplier 2,1970-04-26 19:46:42,C0000
3,Obtain raw materials from Supplier 2,1970-04-26 19:46:43,C0000
4,Request raw materials from Supplier 1,1970-04-26 19:46:44,C0000
5,Obtain raw materials from Supplier 1,1970-04-26 19:46:45,C0000
6,Manufacture product,1970-04-26 19:46:46,C0000
7,Confirm order,1970-04-26 19:46:47,C0000
8,Emit invoice,1970-04-26 19:46:48,C0000
9,Get shipping address,1970-04-26 19:46:49,C0000


In [5]:
changed_eventlog[changed_eventlog[case_id].isin(['C0000','C0001','C0002'])].style

Unnamed: 0,concept:name,time:timestamp,case:concept:name
0,Check stock availability,1970-04-26 19:46:40,C0000
1,Check raw materials availability,1970-04-26 19:46:41,C0000
2,Request raw materials from Supplier 2,1970-04-26 19:46:42,C0000
3,Request raw materials from Supplier 1,1970-04-26 19:46:43,C0000
4,Obtain raw materials from Supplier 2,1970-04-26 19:46:44,C0000
5,Obtain raw materials from Supplier 1,1970-04-26 19:46:45,C0000
6,Get shipping address,1970-04-26 19:46:46,C0000
7,Manufacture product,1970-04-26 19:46:47,C0000
8,Confirm order,1970-04-26 19:46:48,C0000
9,Emit invoice,1970-04-26 19:46:49,C0000


Despite having courios timestamps, both processes are simulated according to the bpmn. If we now apply some scenario data for the processes, we can get a more realistic version. But let's look at the scenarios first.

In [6]:
from source.misc import get_scenario

unchanged_scenario = get_scenario(SIMULATION_DATA_DIR_PATH, 'Order-to-Cash_unchanged.csv')
changed_scenario = get_scenario(SIMULATION_DATA_DIR_PATH, 'Order-to-Cash_changed.csv')

In [7]:
print('unchanged_scenario')
for kpi in unchanged_scenario:
    print('   kpi      :',kpi)
    print('   apply_to :', unchanged_scenario[kpi]['apply_to'])
    for activity, function in unchanged_scenario[kpi]['functions'].items():
        print('      ', activity+' '*(38-len(activity)), ':', function)
print('changed_scenario')
for kpi in changed_scenario:
    print('   kpi      :',kpi)
    print('   apply_to :', changed_scenario[kpi]['apply_to'])
    for activity, function in changed_scenario[kpi]['functions'].items():
        print('      ', activity+' '*(38-len(activity)), ':', function)

unchanged_scenario
   kpi      : time
   apply_to : None
       Check stock availability               : <function get_scenario.<locals>.<lambda> at 0x00000167287A1310>
       Check raw materials availability       : <function get_scenario.<locals>.<lambda> at 0x0000016728A333A0>
       Request raw materials from Supplier 1  : <function get_scenario.<locals>.<lambda> at 0x0000016728A338B0>
       Request raw materials from Supplier 2  : <function get_scenario.<locals>.<lambda> at 0x0000016728A339D0>
       Obtain raw materials from Supplier 1   : <function get_scenario.<locals>.<lambda> at 0x0000016728A33AF0>
       Obtain raw materials from Supplier 2   : <function get_scenario.<locals>.<lambda> at 0x0000016728A33C10>
       Manufacture product                    : <function get_scenario.<locals>.<lambda> at 0x0000016728A33D30>
       Retrieve product from warehouse        : <function get_scenario.<locals>.<lambda> at 0x0000016728A33E50>
       Confirm order                          :

It is hard to see, but all activities have been assigned functions to simulate the behavior in a process flow. If we now apply these methods, we get a more realistic event log.

In [8]:
from source.operation import apply_scenario

unchanged_eventlog = apply_scenario(unchanged_eventlog, unchanged_scenario, activity_id)
changed_eventlog = apply_scenario(changed_eventlog, changed_scenario, activity_id)

In [9]:
unchanged_eventlog[unchanged_eventlog[case_id].isin(['C0000','C0001','C0002'])].style

Unnamed: 0,concept:name,time:timestamp,case:concept:name,time,cost
0,Check stock availability,1970-04-26 19:46:40,C0000,0.016667,1.833333
1,Check raw materials availability,1970-04-26 19:46:41,C0000,0.016667,1.833333
2,Request raw materials from Supplier 2,1970-04-26 19:46:42,C0000,0.016909,1.84544
3,Obtain raw materials from Supplier 2,1970-04-26 19:46:43,C0000,0.018227,1.911326
4,Request raw materials from Supplier 1,1970-04-26 19:46:44,C0000,0.015704,1.785224
5,Obtain raw materials from Supplier 1,1970-04-26 19:46:45,C0000,0.014321,1.716047
6,Manufacture product,1970-04-26 19:46:46,C0000,0.018878,1.943888
7,Confirm order,1970-04-26 19:46:47,C0000,0.016667,1.833333
8,Emit invoice,1970-04-26 19:46:48,C0000,0.016667,1.833333
9,Get shipping address,1970-04-26 19:46:49,C0000,0.016667,1.833333


In [10]:
changed_eventlog[changed_eventlog[case_id].isin(['C0000','C0001','C0002'])].style

Unnamed: 0,concept:name,time:timestamp,case:concept:name,time,cost
0,Check stock availability,1970-04-26 19:46:40,C0000,0.016667,1.833333
1,Check raw materials availability,1970-04-26 19:46:41,C0000,0.016667,1.833333
2,Request raw materials from Supplier 2,1970-04-26 19:46:42,C0000,0.016667,1.833333
3,Request raw materials from Supplier 1,1970-04-26 19:46:43,C0000,0.016667,1.833333
4,Obtain raw materials from Supplier 2,1970-04-26 19:46:44,C0000,0.016096,1.804814
5,Obtain raw materials from Supplier 1,1970-04-26 19:46:45,C0000,0.012863,1.643144
6,Get shipping address,1970-04-26 19:46:46,C0000,0.016667,1.833333
7,Manufacture product,1970-04-26 19:46:47,C0000,0.017522,1.876102
8,Confirm order,1970-04-26 19:46:48,C0000,0.016667,1.833333
9,Emit invoice,1970-04-26 19:46:49,C0000,0.016667,1.833333


## Data Transformation
Now, to get a view more suitable for machine learning, we can convert the event logs into case tables.

In [11]:
from source.operation import to_case_table

unchanged_case_table = to_case_table(unchanged_eventlog, case_id, activity_id, fillna=0, aggregate={'cost':'sum','time':'sum'})
changed_case_table = to_case_table(changed_eventlog, case_id, activity_id, fillna=0, aggregate={'cost':'sum','time':'sum'})

In [12]:
unchanged_case_table[:10].transpose().style

case:concept:name,C0000,C0001,C0002,C0003,C0004,C0005,C0006,C0007,C0008,C0009
cost Archive order,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Check raw materials availability,1.833333,1.833333,0.0,1.833333,0.0,0.0,0.0,0.0,1.833333,0.0
cost Check stock availability,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Confirm order,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Emit invoice,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Get shipping address,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Manufacture product,1.943888,1.657222,0.0,1.652875,0.0,0.0,0.0,0.0,1.862671,0.0
cost Obtain raw materials from Supplier 1,1.716047,1.853751,0.0,1.802199,0.0,0.0,0.0,0.0,1.815816,0.0
cost Obtain raw materials from Supplier 2,1.911326,1.921692,0.0,1.8518,0.0,0.0,0.0,0.0,1.779773,0.0
cost Receive Payment,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333


In [13]:
changed_case_table[:10].transpose().style

case:concept:name,C0000,C0001,C0002,C0003,C0004,C0005,C0006,C0007,C0008,C0009
cost Archive order,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,0.0,0.0
cost Check raw materials availability,1.833333,1.833333,1.833333,0.0,0.0,1.833333,0.0,0.0,0.0,0.0
cost Check stock availability,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Confirm order,1.833333,1.833333,0.0,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333
cost Emit invoice,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,0.0,0.0
cost Get shipping address,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,0.0,0.0
cost Manufacture product,1.876102,1.798202,0.0,0.0,0.0,1.867339,0.0,0.0,0.0,0.0
cost Notify unavailability to customer,0.0,0.0,1.833333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
cost Obtain raw materials from Supplier 1,1.643144,1.904371,0.0,0.0,0.0,1.842089,0.0,0.0,0.0,0.0
cost Obtain raw materials from Supplier 2,1.804814,1.860429,0.0,0.0,0.0,1.809934,0.0,0.0,0.0,0.0


Finally, we can apply the defined rules and calculate the result. In this case, the times are in hours and the costs are in euros.

In [14]:
from source.operation import calculate_outcome

unchanged_ruleset = {'time':unchanged_basic_ruleset,'cost':None}
changed_ruleset = {'time':changed_basic_ruleset,'cost':None}

unchanged_case_table = calculate_outcome(unchanged_case_table, unchanged_ruleset)
changed_case_table = calculate_outcome(changed_case_table, changed_ruleset)

In [15]:
unchanged_case_table[[case_id,'time','cost']][:10].style

Unnamed: 0,case:concept:name,time,cost
0,C0000,0.154013,23.798097
1,C0001,0.150244,23.827291
2,C0002,0.098146,14.464162
3,C0003,0.149788,23.798626
4,C0004,0.100746,14.703951
5,C0005,0.101362,14.734785
6,C0006,0.097326,14.475395
7,C0007,0.10227,14.780168
8,C0008,0.152653,23.797186
9,C0009,0.100451,14.689237


In [16]:
changed_case_table[[case_id,'time','cost']][:10].style

Unnamed: 0,case:concept:name,time,cost
0,C0000,0.166952,23.669063
1,C0001,0.167385,23.816485
2,C0002,0.05,5.5
3,C0003,0.048289,5.414439
4,C0004,0.049368,5.468387
5,C0005,0.167522,23.812311
6,C0006,0.051095,5.554733
7,C0007,0.04929,5.464524
8,C0008,0.050398,5.519896
9,C0009,0.050784,5.539204


In [17]:
unchanged_case_table.to_csv(CASE_TABLE_DIR_PATH/'unchanged.csv', index=False)
changed_case_table.to_csv(CASE_TABLE_DIR_PATH/'changed.csv', index=False)

## Machine Learning
However, in order to be able to implement our idea, preparation is still required. Since machine learning is involved in the end, it is necessary to take a closer look at the data and process it further if necessary.
### Preprocessing

In [18]:
unchanged_case_table.describe().style

Unnamed: 0,cost Archive order,cost Check raw materials availability,cost Check stock availability,cost Confirm order,cost Emit invoice,cost Get shipping address,cost Manufacture product,cost Obtain raw materials from Supplier 1,cost Obtain raw materials from Supplier 2,cost Receive Payment,cost Request raw materials from Supplier 1,cost Request raw materials from Supplier 2,cost Retrieve product from warehouse,cost Ship product,time Archive order,time Check raw materials availability,time Check stock availability,time Confirm order,time Emit invoice,time Get shipping address,time Manufacture product,time Obtain raw materials from Supplier 1,time Obtain raw materials from Supplier 2,time Receive Payment,time Request raw materials from Supplier 1,time Request raw materials from Supplier 2,time Retrieve product from warehouse,time Ship product,Num of Archive order,Num of Check raw materials availability,Num of Check stock availability,Num of Confirm order,Num of Emit invoice,Num of Get shipping address,Num of Manufacture product,Num of Obtain raw materials from Supplier 1,Num of Obtain raw materials from Supplier 2,Num of Receive Payment,Num of Request raw materials from Supplier 1,Num of Request raw materials from Supplier 2,Num of Retrieve product from warehouse,Num of Ship product,time,cost
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,1.833333,0.902,1.833333,1.833333,1.833333,1.833333,0.902457,0.900522,0.901537,1.833333,0.900042,0.901329,0.930511,1.835008,0.016667,0.0082,0.016667,0.016667,0.016667,0.016667,0.008209,0.00817,0.008191,0.016667,0.008161,0.008187,0.00845,0.0167,1.0,0.492,1.0,1.0,1.0,1.0,0.492,0.492,0.492,1.0,0.492,0.492,0.508,1.0,0.125864,19.173406
std,0.0,0.917008,0.0,0.0,0.0,0.0,0.919515,0.91735,0.91848,0.0,0.91672,0.918151,0.918262,0.081736,0.0,0.008336,0.0,0.0,0.0,0.0,0.008435,0.008387,0.008412,0.0,0.008371,0.008403,0.008411,0.001635,0.0,0.500186,0.0,0.0,0.0,0.0,0.500186,0.500186,0.500186,0.0,0.500186,0.500186,0.500186,0.0,0.025768,4.586142
min,1.833333,0.0,1.833333,1.833333,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,0.0,1.591948,0.016667,0.0,0.016667,0.016667,0.016667,0.016667,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.011839,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.094962,14.306407
25%,1.833333,0.0,1.833333,1.833333,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,0.0,1.779418,0.016667,0.0,0.016667,0.016667,0.016667,0.016667,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.015588,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.100537,14.658897
50%,1.833333,0.0,1.833333,1.833333,1.833333,1.833333,0.0,0.0,0.0,1.833333,0.0,0.0,1.653141,1.833271,0.016667,0.0,0.016667,0.016667,0.016667,0.016667,0.0,0.0,0.0,0.016667,0.0,0.0,0.013063,0.016665,1.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.104821,14.902093
75%,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.831841,1.829084,1.831067,1.833333,1.821965,1.832072,1.834246,1.892096,0.016667,0.016667,0.016667,0.016667,0.016667,0.016667,0.016637,0.016582,0.016621,0.016667,0.016439,0.016641,0.016685,0.017842,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.151673,23.821435
max,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,2.13378,2.102198,2.151632,1.833333,2.072065,2.061039,2.079386,2.163878,0.016667,0.016667,0.016667,0.016667,0.016667,0.016667,0.022676,0.022044,0.023033,0.016667,0.021441,0.021221,0.021588,0.023278,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.160968,24.531701


In [19]:
changed_case_table.describe().style

Unnamed: 0,cost Archive order,cost Check raw materials availability,cost Check stock availability,cost Confirm order,cost Emit invoice,cost Get shipping address,cost Manufacture product,cost Notify unavailability to customer,cost Obtain raw materials from Supplier 1,cost Obtain raw materials from Supplier 2,cost Receive Payment,cost Request raw materials from Supplier 1,cost Request raw materials from Supplier 2,cost Retrieve product from warehouse,cost Ship product,time Archive order,time Check raw materials availability,time Check stock availability,time Confirm order,time Emit invoice,time Get shipping address,time Manufacture product,time Notify unavailability to customer,time Obtain raw materials from Supplier 1,time Obtain raw materials from Supplier 2,time Receive Payment,time Request raw materials from Supplier 1,time Request raw materials from Supplier 2,time Retrieve product from warehouse,time Ship product,Num of Archive order,Num of Check raw materials availability,Num of Check stock availability,Num of Confirm order,Num of Emit invoice,Num of Get shipping address,Num of Manufacture product,Num of Notify unavailability to customer,Num of Obtain raw materials from Supplier 1,Num of Obtain raw materials from Supplier 2,Num of Receive Payment,Num of Request raw materials from Supplier 1,Num of Request raw materials from Supplier 2,Num of Retrieve product from warehouse,Num of Ship product,time,cost
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.487667,0.968,1.833333,1.353,0.487667,0.487667,0.487402,0.480333,0.488807,0.483912,0.487667,0.487667,0.487667,0.867233,0.487818,0.004433,0.0088,0.016667,0.0123,0.004433,0.004433,0.004428,0.004367,0.004456,0.004358,0.004433,0.004433,0.004433,0.007905,0.004436,0.266,0.528,1.0,0.738,0.266,0.266,0.266,0.262,0.266,0.266,0.266,0.266,0.266,0.472,0.266,0.081295,10.375838
std,0.81049,0.915686,0.0,0.806562,0.81049,0.81049,0.81136,0.806562,0.813491,0.805319,0.81049,0.81049,0.81049,0.919518,0.811898,0.007368,0.008324,0.0,0.007332,0.007368,0.007368,0.007417,0.007332,0.007454,0.007291,0.007368,0.007368,0.007368,0.008444,0.007424,0.442085,0.499465,0.0,0.439943,0.442085,0.442085,0.442085,0.439943,0.442085,0.442085,0.442085,0.442085,0.442085,0.499465,0.442085,0.051951,8.099945
min,0.0,0.0,1.833333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045324,5.266222
25%,0.0,0.0,1.833333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,5.5
50%,0.0,1.833333,1.833333,1.833333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05021,5.510504
75%,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,1.696533,1.833333,1.711955,1.700624,1.833333,1.833333,1.833333,1.833798,1.709565,0.016667,0.016667,0.016667,0.016667,0.016667,0.016667,0.013931,0.016667,0.014239,0.014012,0.016667,0.016667,0.016667,0.016676,0.014191,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.164202,23.542439
max,1.833333,1.833333,1.833333,1.833333,1.833333,1.833333,2.106876,1.833333,2.043705,2.071092,1.833333,1.833333,1.833333,2.072692,2.055282,0.016667,0.016667,0.016667,0.016667,0.016667,0.016667,0.022138,0.016667,0.020874,0.021422,0.016667,0.016667,0.016667,0.021454,0.021106,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.173793,24.300581


As can be seen, there are features on both sides that have no standard deviation or have a standard deviation close to zero due to the way they are represented. Furthermore, it is known that some characteristics can carry the same information due to the way they are represented. This is the case if a process step is always performed the same number of times and at the same times and costs (e.g., automatic invoice dispatch). Therefore, it must be checked whether there are features that carry identical information on an aligned scale.

In [20]:
from source.features import prepare_features

prepared_unchanged_case_table, prepared_changed_case_table = prepare_features(unchanged_case_table, changed_case_table)

In [21]:
prepared_unchanged_case_table.describe().style

Unnamed: 0,cost Manufacture product,cost Obtain raw materials from Supplier 1,cost Obtain raw materials from Supplier 2,cost Request raw materials from Supplier 1,cost Request raw materials from Supplier 2,cost Retrieve product from warehouse,cost Ship product,Num of Check raw materials availability,Num of Manufacture product,Num of Obtain raw materials from Supplier 1,Num of Obtain raw materials from Supplier 2,Num of Request raw materials from Supplier 1,Num of Request raw materials from Supplier 2,Num of Retrieve product from warehouse,time,cost
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.902457,0.900522,0.901537,0.900042,0.901329,0.930511,1.835008,0.492,0.492,0.492,0.492,0.492,0.492,0.508,0.125864,19.173406
std,0.919515,0.91735,0.91848,0.91672,0.918151,0.918262,0.081736,0.500186,0.500186,0.500186,0.500186,0.500186,0.500186,0.500186,0.025768,4.586142
min,0.0,0.0,0.0,0.0,0.0,0.0,1.591948,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.094962,14.306407
25%,0.0,0.0,0.0,0.0,0.0,0.0,1.779418,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.100537,14.658897
50%,0.0,0.0,0.0,0.0,0.0,1.653141,1.833271,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.104821,14.902093
75%,1.831841,1.829084,1.831067,1.821965,1.832072,1.834246,1.892096,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.151673,23.821435
max,2.13378,2.102198,2.151632,2.072065,2.061039,2.079386,2.163878,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.160968,24.531701


In [22]:
prepared_changed_case_table.describe().style

Unnamed: 0,cost Emit invoice,cost Get shipping address,cost Manufacture product,cost Obtain raw materials from Supplier 1,cost Obtain raw materials from Supplier 2,cost Receive Payment,cost Request raw materials from Supplier 1,cost Request raw materials from Supplier 2,cost Retrieve product from warehouse,cost Ship product,Num of Archive order,Num of Check raw materials availability,Num of Confirm order,Num of Emit invoice,Num of Get shipping address,Num of Manufacture product,Num of Notify unavailability to customer,Num of Obtain raw materials from Supplier 1,Num of Obtain raw materials from Supplier 2,Num of Receive Payment,Num of Request raw materials from Supplier 1,Num of Request raw materials from Supplier 2,Num of Retrieve product from warehouse,Num of Ship product,time,cost
count,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,0.487667,0.487667,0.487402,0.488807,0.483912,0.487667,0.487667,0.487667,0.867233,0.487818,0.266,0.528,0.738,0.266,0.266,0.266,0.262,0.266,0.266,0.266,0.266,0.266,0.472,0.266,0.081295,10.375838
std,0.81049,0.81049,0.81136,0.813491,0.805319,0.81049,0.81049,0.81049,0.919518,0.811898,0.442085,0.499465,0.439943,0.442085,0.442085,0.442085,0.439943,0.442085,0.442085,0.442085,0.442085,0.442085,0.499465,0.442085,0.051951,8.099945
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045324,5.266222
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,5.5
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05021,5.510504
75%,1.833333,1.833333,1.696533,1.711955,1.700624,1.833333,1.833333,1.833333,1.833798,1.709565,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.164202,23.542439
max,1.833333,1.833333,2.106876,2.043705,2.071092,1.833333,1.833333,1.833333,2.072692,2.055282,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.173793,24.300581


### Validation
The next step is to divide the characteristics into generic and modified characteristics. The generic features describe the information that is absolutely necessary to represent the process as a model.

In [23]:
generic_features = prepared_unchanged_case_table.drop(columns=['case:concept:name','time','cost']).columns
for feature in generic_features:
    print(feature)

cost Manufacture product
cost Obtain raw materials from Supplier 1
cost Obtain raw materials from Supplier 2
cost Request raw materials from Supplier 1
cost Request raw materials from Supplier 2
cost Retrieve product from warehouse
cost Ship product
Num of Check raw materials availability
Num of Manufacture product
Num of Obtain raw materials from Supplier 1
Num of Obtain raw materials from Supplier 2
Num of Request raw materials from Supplier 1
Num of Request raw materials from Supplier 2
Num of Retrieve product from warehouse


The next step is to find out, which of these characteristics best describe the process. For this we need to choose a model. In this case, we use a regression model that is as simple as possible. Since the calculation of the KPIs is about linear functions, the use of a linear regression is obvious. In addition, all Sklearn compliant estimators are supported. The score is given as a negative mean square error. This means that greater is better or closer to zero is better.

In [24]:
from source.causality import feature_tracing
from sklearn.linear_model import LinearRegression

unchanged_time_feature_table = feature_tracing(LinearRegression(), prepared_unchanged_case_table, generic_features, 'time').sort_values('score', ascending=False)
unchanged_cost_feature_table = feature_tracing(LinearRegression(), prepared_unchanged_case_table, generic_features, 'cost').sort_values('score', ascending=False)

In [25]:
print('feature table for time:')
unchanged_time_feature_table.style.format({'score': '{:.64f}'})

feature table for time:


Unnamed: 0,features,dim,score
84,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Check raw materials availability']",8,-6.936893588e-07
89,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Request raw materials from Supplier 2']",8,-6.936893588e-07
88,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Request raw materials from Supplier 1']",8,-6.936893588e-07
87,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Obtain raw materials from Supplier 2']",8,-6.936893588e-07
86,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Obtain raw materials from Supplier 1']",8,-6.936893588e-07
85,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Manufacture product']",8,-6.936893588e-07
83,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse']",7,-6.936893588e-07
82,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Request raw materials from Supplier 2']",7,-6.936893588e-07
77,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Check raw materials availability']",7,-6.936893588e-07
78,"['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product']",7,-6.936893588e-07


In [26]:
print('feature table for cost:')
unchanged_cost_feature_table.style.format({'score': '{:.64f}'})

feature table for cost:


Unnamed: 0,features,dim,score
97,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Request raw materials from Supplier 2']",10,-0.0
96,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Request raw materials from Supplier 1']",10,-0.0
95,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Obtain raw materials from Supplier 2']",10,-0.0
98,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Retrieve product from warehouse']",10,-0.0
100,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Obtain raw materials from Supplier 2', 'Num of Request raw materials from Supplier 2']",11,-0.0
99,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Obtain raw materials from Supplier 2', 'Num of Request raw materials from Supplier 1']",11,-0.0
91,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 2']",9,-0.0
92,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Request raw materials from Supplier 1']",9,-0.0
90,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1']",9,-0.0
93,"['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Request raw materials from Supplier 2']",9,-0.0


Using these tables, we can now determine the really important features.

In [27]:
time_features = unchanged_time_feature_table.iloc[0]['features']
print('time features:')
print(time_features)
cost_features = unchanged_cost_feature_table.iloc[0]['features']
print('cost features:')
print(cost_features)

time features:
['cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Retrieve product from warehouse', 'Num of Check raw materials availability']
cost features:
['Num of Check raw materials availability', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Retrieve product from warehouse', 'cost Manufacture product', 'cost Request raw materials from Supplier 2', 'cost Obtain raw materials from Supplier 2', 'cost Request raw materials from Supplier 1', 'Num of Manufacture product', 'Num of Obtain raw materials from Supplier 1', 'Num of Request raw materials from Supplier 2']


### Causality Checking
Now we have made all the preparations to start the actual causality check. For this we use the generic features, as well as the associated data. The only thing left to do is to choose a model. We have already decided to use linear regression at the beginning. Therefore it is obvious to use one here as well.<br>
First, we determine the difference between what the unchanged process would output as a result under the same circumstances and what the changed process actually has as a result.

In [28]:
from source.causality import calculate_difference, UNCHANGED_PREDICTION, DIFFERENCE

time_difference = calculate_difference(LinearRegression(), prepared_unchanged_case_table, prepared_changed_case_table, 'time', time_features)
CHANGE = 'change relative'
time_difference[CHANGE] = time_difference[UNCHANGED_PREDICTION]/time_difference['time']
cost_difference = calculate_difference(LinearRegression(), prepared_unchanged_case_table, prepared_changed_case_table, 'cost', cost_features)
cost_difference[CHANGE] = cost_difference[UNCHANGED_PREDICTION]/cost_difference['cost']

In [29]:
print('time difference:')
time_difference[['time',UNCHANGED_PREDICTION,DIFFERENCE,CHANGE]][:10].style

time difference:


Unnamed: 0,time,unchanged prediction,difference,change relative
0,0.166952,23.798097,-0.129034,1.005452
1,0.167385,23.827291,-0.010806,1.000454
2,0.05,14.464162,-8.964162,2.629848
3,0.048289,23.798626,-18.384188,4.3954
4,0.049368,14.703951,-9.235565,2.688901
5,0.167522,14.734785,9.077526,0.618789
6,0.051095,14.475395,-8.920662,2.605957
7,0.04929,14.780168,-9.315644,2.704749
8,0.050398,23.797186,-18.27729,4.311165
9,0.050784,14.689237,-9.150033,2.651868


In [30]:
print('cost difference:')
cost_difference[['cost',UNCHANGED_PREDICTION,DIFFERENCE,CHANGE]][:10].style

cost difference:


Unnamed: 0,cost,unchanged prediction,difference,change relative
0,23.669063,23.798097,-0.129034,1.005452
1,23.816485,23.827291,-0.010806,1.000454
2,5.5,14.464162,-8.964162,2.629848
3,5.414439,23.798626,-18.384188,4.3954
4,5.468387,14.703951,-9.235565,2.688901
5,23.812311,14.734785,9.077526,0.618789
6,5.554733,14.475395,-8.920662,2.605957
7,5.464524,14.780168,-9.315644,2.704749
8,5.519896,23.797186,-18.27729,4.311165
9,5.539204,14.689237,-9.150033,2.651868


In the next step we have to try to explain the difference. For this we can use the same function that we used to examine the generic features. This time, however, we take the features of the changed process. The result can be interpreted in such a way that the better the combination of features can explain the difference, the more likely we can speak of a causality from the changes in the features and the changes in the result.<br>
For the score, the larger it is, the better the difference can be explained. By default, it is the negative mean squared error. That is, the closer the value tends to zero (becomes larger, with the absolute value becoming smaller), the more accurately the difference could be explained. However, the methods support all measurement variants implemented by Sklearn.

In [31]:
time_difference_features = time_difference.drop(columns=['case:concept:name','time','cost',UNCHANGED_PREDICTION,DIFFERENCE,CHANGE]).columns.tolist()
time_explanation = feature_tracing(LinearRegression(), time_difference, time_difference_features, 'time').sort_values('score', ascending=False)
cost_difference_features = cost_difference.drop(columns=['case:concept:name','time','cost',UNCHANGED_PREDICTION,DIFFERENCE,CHANGE]).columns.tolist()
cost_explanation = feature_tracing(LinearRegression(), cost_difference, cost_difference_features, 'cost').sort_values('score', ascending=False)

In [32]:
print('time explanation:')
time_explanation.style.format({'score': '{:.64f}'})

time explanation:


Unnamed: 0,features,dim,score
117,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order']",5,-1.415872063e-07
139,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'Num of Notify unavailability to customer']",6,-1.415872063e-07
121,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Notify unavailability to customer']",5,-1.415872063e-07
127,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse']",5,-1.415872063e-07
116,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Check raw materials availability']",5,-1.415872063e-07
134,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'cost Ship product']",6,-1.419464762e-07
129,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'cost Emit invoice']",6,-1.419938618e-07
130,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'cost Get shipping address']",6,-1.419938618e-07
131,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'cost Receive Payment']",6,-1.419938618e-07
132,"['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order', 'cost Request raw materials from Supplier 1']",6,-1.419938618e-07


In [33]:
print('cost explanation:')
cost_explanation.style.format({'score': '{:.64f}'})

cost explanation:


Unnamed: 0,features,dim,score
186,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Get shipping address']",9,-0.0
187,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Manufacture product']",9,-0.0
189,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Obtain raw materials from Supplier 1']",9,-0.0
190,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Obtain raw materials from Supplier 2']",9,-0.0
191,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Receive Payment']",9,-0.0
192,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Request raw materials from Supplier 1']",9,-0.0
193,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Request raw materials from Supplier 2']",9,-0.0
194,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Ship product']",9,-0.0
182,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'cost Request raw materials from Supplier 1']",9,-0.0
183,"['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'cost Request raw materials from Supplier 2']",9,-0.0


In [34]:
time_explanation.to_csv(CAUSALITY_FEATURE_TABLES_PATH/'time_explanation.csv', index=False)
cost_explanation.to_csv(CAUSALITY_FEATURE_TABLES_PATH/'cost_explanation.csv', index=False)

The last thing we can look at now is which combination of features provides the best explanation.

In [35]:
print('best time explanation:')
best_time_explanation = time_explanation.sort_values('score', ascending=False)
print('features:', best_time_explanation.iloc[0,0])
print('score:', best_time_explanation.iloc[0,2])
print('best cost explanation:')
best_cost_explanation = cost_explanation.sort_values('score', ascending=False)
print('features:', best_cost_explanation.iloc[0,0])
print('score:', best_cost_explanation.iloc[0,2])

best time explanation:
features: ['Num of Archive order', 'cost Manufacture product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Confirm order']
score: -1.415872063682274e-07
best cost explanation:
features: ['Num of Archive order', 'cost Manufacture product', 'cost Ship product', 'cost Obtain raw materials from Supplier 1', 'cost Obtain raw materials from Supplier 2', 'cost Retrieve product from warehouse', 'Num of Retrieve product from warehouse', 'cost Emit invoice', 'Num of Emit invoice', 'Num of Get shipping address']
score: -1.622292451587011e-29
