
H_1 is not reduced, unnecessary rule candidate generation/checking #45

Closed
3 tasks
selting opened this issue Feb 1, 2022 · 3 comments

Comments


selting commented Feb 1, 2022

Ok, let's do this the proper way. My PR #44 was not working because I prematurely tried to fix something that I did not fully understand.

Description

When generating rule candidates with 1-item consequents (H_1), no pruning is applied. This leads to unnecessary confidence computations for candidates that are already known to fall below the confidence threshold.

Minimum Working Example

Following the docstring of generate_rules_apriori, the example below generates unnecessary candidates:

from efficient_apriori.itemsets import itemsets_from_transactions
from efficient_apriori.rules import generate_rules_apriori

transactions = [
    ('a', 'b', 'c', 'd'),
    ('a', 'b', 'c'),
]
n = len(transactions)  # number of transactions
min_sup = 0.5
min_conf = 1.0

itemsets, _ = itemsets_from_transactions(transactions, min_sup, max_length=4, verbosity=1)
rules = list(generate_rules_apriori(itemsets, min_conf, n, verbosity=1))

When deriving rules of size 4, all of these candidates are being checked:

1) 	{b, c, d} -> {a} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
2) 	{a, c, d} -> {b} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
3) 	{a, b, d} -> {c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
4) 	{a, b, c} -> {d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
5) 	{c, d} -> {a, b} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
6) 	{b, d} -> {a, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
7) 	{b, c} -> {a, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
8) 	{a, d} -> {b, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)
9) 	{a, c} -> {b, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
10)	{a, b} -> {c, d} (conf: 0.500, supp: 0.500, lift: 1.000, conv: 1.000)
11)	{d} -> {a, b, c} (conf: 1.000, supp: 0.500, lift: 1.000, conv: 0.000)

In 10) we check {a, b} -> {c, d} even though 4) {a, b, c} -> {d} (conf: 0.500) is already below the confidence threshold of 1.0. Moving an item from the antecedent to the consequent can never increase confidence, so once {d} fails as a 1-item consequent, every consequent containing d can be skipped. The docstring itself says that checking 10) is unnecessary due to confidence-based pruning.

(In my fork, I added a few print statements to easily track what's being done for this very example in h_1_mwe.py)
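The pruning the issue asks for can be sketched as a small standalone function. This is a hypothetical helper, not the library's actual implementation: prune_h1 and the support dict are assumptions for illustration, with support fractions taken from the two-transaction example above.

```python
def prune_h1(itemset, support, min_conf):
    """Return the 1-item consequents of `itemset` whose rule meets min_conf.

    `support` maps frozensets of items to support fractions. A consequent
    {i} survives only if conf(itemset \\ {i} -> {i}) >= min_conf.
    """
    h1 = []
    for item in itemset:
        antecedent = frozenset(itemset) - {item}
        confidence = support[frozenset(itemset)] / support[antecedent]
        if confidence >= min_conf:
            h1.append(frozenset([item]))
    return h1

# Support fractions from the MWE (n = 2 transactions)
support = {
    frozenset("abcd"): 0.5,
    frozenset("abc"): 1.0,  # appears in both transactions
    frozenset("abd"): 0.5,
    frozenset("acd"): 0.5,
    frozenset("bcd"): 0.5,
}

# {d} is pruned: conf({a,b,c} -> {d}) = 0.5/1.0 = 0.5 < 1.0
print(prune_h1("abcd", support, min_conf=1.0))
```

With H_1 reduced this way, the subsequent candidate generation over surviving consequents never builds {c, d}, so candidate 10) above would not be checked at all.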

Tasks

  • confirm that this behavior is not intended
  • find a way to prune H_1 properly
  • test whether it brings any speed improvement

tommyod commented Feb 1, 2022

Great job writing this clear issue, @selting! Thanks a lot! Do you have time to investigate further and propose a solution? If not, I will try to take a look at it when I have the time, which might not be any time soon.


selting commented Feb 2, 2022

If I find the time, I will happily give it a try. But similarly to you, this might take a while. The issue is just performance-related, so I guess it's not crucial to fix it immediately.

tommyod pushed a commit that referenced this issue Jan 14, 2023
* Reducing H_1 (#45)

* Avoiding propagation of rhs items in H_1 with confidence < min_confidence

* running black on efficient_apriori/rules.py

tommyod commented Jan 14, 2023

Closed in #53 by @hprshayan
