Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle None value in field index #12

Merged
merged 9 commits into from
Apr 21, 2017
Merged

handle None value in field index #12

merged 9 commits into from
Apr 21, 2017

Conversation

fgregg
Copy link
Contributor

@fgregg fgregg commented Apr 21, 2017

This should fix a test that depended upon None not being an acceptable key in Btree, which is no longer valid for current versions of BTree.

@jamadden
Copy link
Member

jamadden commented Apr 21, 2017

Interestingly, when the now-failing test was first added back in 9c0933d, the if value is not None line was maintained (indeed, there was one more 'is not None' check at the beginning of the method). (That line was added in d1f299a, but without tests, presumably to workaround this BTrees 4 issue, on the same day.) It was with a subsequent commit 43e8fb8 also on that same day that the if check was removed in favor of catching the TypeError.

Given that the original behaviour under BTrees 3 was to allow None keys, and that the restoration of that behaviour recently was for the benefit of the Plone community (who, IIUC, do want None keys in Products.ZIndex(?)), I'm wondering if the right thing to do might be to go back to the code pre- 9c0933d? That is, remove both the 'if' and 'try' statements (and alter the tests)? That would allow None keys, while not silently hiding TypeError for other invalid keys.

@jamadden
Copy link
Member

jamadden commented Apr 21, 2017

The OS X 'homebrew 3' build failed. It has switched to Python 3.6, and apparently we don't support Python 3.6 yet (the linux build is pinned to 3.5 max). This has been dealt with before in BTrees and others and probably needs to change here too (although probably in a separate PR).

@fgregg
Copy link
Contributor Author

fgregg commented Apr 21, 2017

@jamadden this follows your suggestion.

@@ -84,8 +76,9 @@ def index_doc(self, docid, value):
def unindex_doc(self, docid):
"""See interface IInjection"""
rev_index = self._rev_index
value = rev_index.get(docid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a marker value, to avoid doing both a __contains__ and then a __getitem__ call?

value = rev_index.get(docid, _MARKER)
if value is _MARKER:
   return # not in index
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with python interning rules to know how to do this in a completely safe way. Is there a part of the python spec that discusses this or is it something that can be different from implementation to implementation.

Copy link
Member

@jamadden jamadden Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, there's nothing that's actually in the spec about this specifically.

  • Literals can be automatically interned by all implementations.
  • PyPy can intern numbers automatically (well, it fakes that)
  • Likewise, PyPy fake-interns tuples and frozensets (exactly which ones has changed recently)
  • Strings can be interned manually; PyPy can intern strings automatically (which ones depends on the version)
  • CPython uses freelists for things like lists and tuples (and dicts?), so in theory it's possible to observe two such "different" objects having the same ID, so long as you don't keep a reference to them.
>>> id([])
4527437080
>>> id([])
4527437080
>>> id([1])
4527437080
>>> id([1, 2])
4527437080

(PyPy does exactly the opposite of this, BTW:

>>> id([])
4562493104L
>>> id([])
4562493128L
>>> id([1])
4562493152L

)

I think that "keeping a reference" thing is generally enough to be safe, though, so long as you avoid things that can be written literally like strings and numbers (typically one sees object). You can keep the _MARKER at global module scope or if you want to be extra sure you can cons one up for every function invocation:

marker = object()
value = rev_index.get(docid, marker)
if value is marker:
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create the module-global _MARKER and use it as the default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do switch back to using rev_index.get(docid, _MARKER).

@jamadden
Copy link
Member

jamadden commented Apr 21, 2017

It LPGTM and it makes sense to me, but then again it was my suggestion 😄 Plus I don't actually have any commits in this repo. So I don't feel like I should be the sole approver myself. Maybe someone with the most commits (i.e., @tseaver )? Also, @mgedmin brought it up in #9

@@ -84,8 +76,9 @@ def index_doc(self, docid, value):
def unindex_doc(self, docid):
"""See interface IInjection"""
rev_index = self._rev_index
value = rev_index.get(docid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do switch back to using rev_index.get(docid, _MARKER).

@jamadden
Copy link
Member

jamadden commented Apr 21, 2017

The fact that this requires BTrees > 4.4.1, plus the fact that the try/except that hid TypeError for really incomparable keys is gone probably deserves a CHANGES mention.

@fgregg
Copy link
Contributor Author

fgregg commented Apr 21, 2017

@tseaver @jamadden I think this is good to go.

@jamadden
Copy link
Member

It still LGTM, and I believe the comment @tseaver made has been addressed. I think you can hit the merge button 😄 (Then I'll take a look at #8 since the builds are passing again.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants