
H_1 is not reduced, unnecessary rule candidate generation/checking #45

Closed
3 tasks
selting opened this issue Feb 1, 2022 · 3 comments

Comments


selting commented Feb 1, 2022

Ok, let's do this the proper way. My PR #44 was not working because I prematurely tried to fix something that I did not fully understand.

Description

When generating rule candidates with 1-item consequents (H_1), no pruning is applied. This leads to unnecessary confidence computations for candidates that are already known to fall below the confidence threshold.

Minimum Working Example

Following the docstring of generate_rules_apriori, the example below generates unnecessary candidates:

from efficient_apriori.itemsets import itemsets_from_transactions
from efficient_apriori.rules import generate_rules_apriori

transactions = [
    ('a', 'b', 'c', 'd'),
    ('a', 'b', 'c'),
]
n = len(transactions)  # number of transactions
min_sup = 0.5
min_conf = 1.0

itemsets, _ = itemsets_from_transactions(transactions, min_sup, max_length=4, verbosity=1)
rules = list(generate_rules_apriori(itemsets, min_conf, n, verbosity=1))

When deriving rules of size 4, all of these candidates are being checked:

1) 	{b, c, d} -> {a} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
2) 	{a, c, d} -> {b} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
3) 	{a, b, d} -> {c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
4) 	{a, b, c} -> {d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
5) 	{c, d} -> {a, b} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
6) 	{b, d} -> {a, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
7) 	{b, c} -> {a, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
8) 	{a, d} -> {b, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
9) 	{a, c} -> {b, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
10)	{a, b} -> {c, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
11)	{d} -> {a, b, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)

In 10) we check {a, b} -> {c, d} even though 4) {a, b, c} -> {d} (conf: 0.500) is already below the confidence threshold of 1.0. Moving an item from the antecedent to the consequent can never increase confidence, so once {d} fails as a 1-item consequent, every consequent containing d can be skipped. The docstring itself says that checking 10) is unnecessary due to confidence-based pruning.

(In my fork, I added a few print statements to easily track what's being done for this very example in h_1_mwe.py)
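The pruning the issue asks for can be sketched as a small standalone function. This is a hypothetical helper, not the library's actual implementation: prune_h1 and the support dict are assumptions for illustration, with support fractions taken from the two-transaction example above.

```python
def prune_h1(itemset, support, min_conf):
    """Return the 1-item consequents of `itemset` whose rule meets min_conf.

    `support` maps frozensets of items to support fractions. A consequent
    {i} survives only if conf(itemset \\ {i} -> {i}) >= min_conf.
    """
    h1 = []
    for item in itemset:
        antecedent = frozenset(itemset) - {item}
        confidence = support[frozenset(itemset)] / support[antecedent]
        if confidence >= min_conf:
            h1.append(frozenset([item]))
    return h1

# Support fractions from the MWE (n = 2 transactions)
support = {
    frozenset("abcd"): 0.5,
    frozenset("abc"): 1.0,  # appears in both transactions
    frozenset("abd"): 0.5,
    frozenset("acd"): 0.5,
    frozenset("bcd"): 0.5,
}

# {d} is pruned: conf({a,b,c} -> {d}) = 0.5/1.0 = 0.5 < 1.0
print(prune_h1("abcd", support, min_conf=1.0))
```

With H_1 reduced this way, the subsequent candidate generation over surviving consequents never builds {c, d}, so candidate 10) above would not be checked at all.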

Tasks

  • confirm that this behavior is not intended
  • find a way to prune H_1 properly
  • test whether it brings any speed improvement

tommyod commented Feb 1, 2022

Great job writing this clear issue, @selting! Thanks a lot! Do you have time to investigate further and propose a solution? If not, I will try to take a look at it when I have the time, which might not be any time soon.


selting commented Feb 2, 2022

If I find the time, I will happily give it a try. But similarly to you, this might take a while. The issue is just performance-related, so I guess it's not crucial to fix it immediately.

tommyod pushed a commit that referenced this issue Jan 14, 2023
* Reducing H_1 (#45)

* Avoiding propagation of rhs items in H_1 with confidence < min_confidence

* running black on efficient_apriori/rules.py

tommyod commented Jan 14, 2023

Closed in #53 by @hprshayan
