Boolean Index query result is incorrect when inverted+resultset #28
The query you gave says … The logic has four different cases; the first two share a code path:
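Assuming the first two cases are the query for the indexed value with a resultset (1a) and without one (1b), they can share one path because BTrees' `intersection()` returns the other argument when one of them is `None`. The sketch below uses plain Python sets as a stand-in for BTrees; `query_indexed_value` is a hypothetical name, not the ZCatalog API.

```python
from typing import Optional, Set


def intersection(a: Optional[Set[int]], b: Optional[Set[int]]) -> Optional[Set[int]]:
    """Stand-in for BTrees' intersection(): a None argument means
    'no restriction', so the other argument is returned unchanged."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b


def query_indexed_value(index: Set[int], resultset: Optional[Set[int]] = None) -> Optional[Set[int]]:
    # Case 1a (resultset given) and case 1b (resultset is None) share this
    # single line: with no resultset, the whole index is the answer.
    return intersection(index, resultset)


index = {1, 2, 3}  # hypothetical docids stored for the indexed value
print(sorted(query_indexed_value(index)))             # [1, 2, 3]
print(sorted(query_indexed_value(index, {2, 3, 9})))  # [2, 3]
```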
I've added some more tests to ensure this really works as it should, and I updated some of the comments on the index. They were outdated and still referred to the early days when the …
Thanks for this work! We can agree we're focused on 2a: a resultset combined with a query for the not-indexed value.
Exactly. I think the issue is aggravated when the resultset contains document ids that have no entry in the boolean index at all, neither True nor False. All the tests assume that every object is indexed in the boolean index, and I'm not sure that assumption holds. AFAIK (and I could be wrong), when catalog.reindexObject(obj) is called on an object that has no field corresponding to the boolean index, the index is never touched. Please confirm this. The test case I'm thinking of looks like this:
So the question is: what should we do with the objects in the resultset that aren't in self._unindex? Path 1a says remove them. For path 2a, doing a difference removes only those objects that have an entry in the boolean index and leaves the other objects in the result, whereas doing an intersection on the inversion removes the objects that don't have an entry in self._unindex. I argue that "remove them" is the correct answer for path 2a as well:

Additionally, I believe this test will pass, which exercises 1a with additional unindexed objects:
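To make the difference between the two options concrete, here is a small model with plain Python sets. The data is hypothetical: the index is assumed to store docids whose value is True, and docid 5 stands for an object that was never entered into the boolean index.

```python
# Hypothetical catalog state: the boolean index stores docids whose value is True.
unindex = {1: True, 2: True, 3: False, 4: False}   # docid -> indexed value
index = {docid for docid, value in unindex.items() if value}  # {1, 2}

# Result of an earlier query (e.g. a path query). Docid 5 never got an
# entry in the boolean index, because its object lacks the attribute.
resultset = {1, 3, 5}

# Case 2a: query for False, the value NOT stored in `index`.
difference_only = resultset - index         # {3, 5}: docid 5 leaks through
inverted = set(unindex) - index             # {3, 4}: docids known to be False
via_inversion = inverted & resultset        # {3}: docid 5 is dropped

# Case 1a: query for True, the stored value; docid 5 is dropped here too.
indexed_value_query = index & resultset     # {1}

print(sorted(difference_only), sorted(via_inversion), sorted(indexed_value_query))
```

Only the difference-only variant keeps docid 5, which is why 1a and 2a disagree about unindexed objects.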
Other solutions include applying boolean indexes before all other indexes in catalog.search, or making sure that all objects in the catalog are included in the _unindex of boolean indexes. Honestly, I'm not qualified to say those are not better options, but I am very hesitant to say that they are. Edit: updated tests, swapping False and True. My logic was backwards.
Ah, yes. The issue is indeed about documents without entries in the specific index. In the other places where we use the resultset (like the Field/UnIndex) we use the equivalent of …

I've created PR #29 to fix the issue. I've used a slightly different approach, by using an intersection with the unindex (…)

The passed-in resultset is in-memory and usually small. The index, holding document ids for the less common value, is also a small treeset. The difference call on those two is reasonably fast and doesn't require loading a lot of objects (buckets) from the database. An intersection between such a small result and the …

On the other hand …
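The cost argument above can be sketched as follows, assuming the fix first takes the difference of the two small sets and only then filters against the large set of indexed docids. Whether PR #29 uses exactly this ordering is an assumption; both function names are hypothetical and plain Python sets stand in for BTrees treesets. The randomized check confirms the cheaper ordering gives the same answer as inverting the index first.

```python
import random
from typing import Set


def query_not_indexed_value(resultset: Set[int], index: Set[int], unindex_keys: Set[int]) -> Set[int]:
    """Hypothetical helper: combine the two small sets first, then use the
    large set of all indexed docids only to filter the few survivors."""
    candidates = resultset - index  # small minus small: cheap
    # One membership check per remaining candidate; the large set is never
    # iterated as a whole.
    return {docid for docid in candidates if docid in unindex_keys}


def query_by_inversion(resultset: Set[int], index: Set[int], unindex_keys: Set[int]) -> Set[int]:
    """Same answer, but builds 'every indexed docid not in index' first,
    which walks the whole (large) set of indexed docids."""
    return (unindex_keys - index) & resultset


# Quick randomized check that both orderings agree.
rng = random.Random(0)
for _ in range(1000):
    unindex_keys = set(rng.sample(range(200), 150))
    index = set(rng.sample(sorted(unindex_keys), 20))  # the less common value
    resultset = set(rng.sample(range(220), 10))        # may include unindexed docids
    assert query_not_indexed_value(resultset, index, unindex_keys) == \
        query_by_inversion(resultset, index, unindex_keys)
print("orderings agree")
```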
The optimization you present makes sense. Would you like me to do anything? I mean, if one complains, one should be willing to help fix it.
Can you verify that this change actually fixes the problem you were having? I've backported the change to all older branches, but would like some confirmation before I make all those releases.
Sure. No problem.
For now, I have only run this fix locally, and it looks good. Thank you for fixing the bug.
We are now in the process of updating our Plone 4 site to get Products.ZCatalog 2.13.28 directly from GitHub. I am confident this will fix our issue, but I won't close this ticket until I get independent confirmation.
I've gone ahead and cut new releases with this fix for all four branches. There are now 2.13.28, 3.0.3, 3.2.1 and 4.0.1 releases up on PyPI. @esteele You might want to grab these new releases and update Plone to include the right maintenance release for each supported Plone version. This issue covers quite a nasty bug, which makes the catalog return wrong results in some hard-to-track cases.
Many thanks to @flipmcf for tracking down and reporting this ugly bug! |
@tseaver Thanks for the shout-out. Our Navigation Items Adapter is much simpler now.
This might not be a bug, but rather a lack of understanding on my part.
I do know, however, that one of our queries is misbehaving:
The indexed value is "False", so it's an inverted query.
What is returned:
difference(resultset, index)
And I read that as:
"Subtract the indexed values from the result set (all objects satisfying the path query) and return it"
Which is not correct.
Shouldn't this be:
"Invert the index"
difference(self._unindex, index)
And then intersect that with the result set:
intersection(<that>, resultset)
Also, we have an inverted index and have not modified it, so eventually _invert_index() will run and the error will go away, because a different line of code in _apply_index() (or query_index()) will then run. However, I'm not convinced this happens ONLY on inverted indexes; queries for the unindexed value could suffer from the same error.
This code is more than 7 years old, and I can't possibly be the first to discover this error, so I'm really hesitant to raise the "Bug Flag".
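The suspicion that inversion is not what matters can be checked with a toy model. `BooleanIndexModel` below is hypothetical, not the ZCatalog class: it stores the docids for one value (as the thread describes, normally the less common one), and `invert()` only mimics the idea of `_invert_index()` swapping which value is stored. The `fixed` flag switches between the plain difference and the variant that also drops docids with no entry. With `fixed=False`, the unindexed docid leaks into queries for the not-stored value whichever value is stored.

```python
from typing import Dict, Set


class BooleanIndexModel:
    """Toy model: `index` holds the docids whose value equals
    `indexed_value`; `unindex` maps every indexed docid to its value."""

    def __init__(self, indexed_value: bool = True) -> None:
        self.indexed_value = indexed_value
        self.index: Set[int] = set()
        self.unindex: Dict[int, bool] = {}

    def index_object(self, docid: int, value: bool) -> None:
        self.unindex[docid] = bool(value)
        if bool(value) is self.indexed_value:
            self.index.add(docid)
        else:
            self.index.discard(docid)

    def invert(self) -> None:
        # Store the other value instead (the idea behind _invert_index()).
        self.indexed_value = not self.indexed_value
        self.index = {d for d, v in self.unindex.items() if v is self.indexed_value}

    def query(self, value: bool, resultset: Set[int], fixed: bool = True) -> Set[int]:
        if bool(value) is self.indexed_value:
            return self.index & resultset  # stored value: intersection
        remaining = resultset - self.index  # not-stored value: difference
        if fixed:
            # Also drop docids that have no entry in the index at all.
            remaining = {d for d in remaining if d in self.unindex}
        return remaining


model = BooleanIndexModel(indexed_value=True)
model.index_object(1, True)
model.index_object(2, False)
resultset = {1, 2, 99}  # docid 99 has no entry in the index
print(sorted(model.query(False, resultset, fixed=False)))  # [2, 99]
model.invert()  # now False is the stored value
print(sorted(model.query(True, resultset, fixed=False)))   # [1, 99]
print(sorted(model.query(True, resultset)))                # [1]
```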