handle None value in field index #12

Merged
merged 9 commits into from
Apr 21, 2017
5 changes: 2 additions & 3 deletions .travis.yml
@@ -16,18 +16,17 @@ matrix:
  python: pypy3
  - os: osx
  language: generic
- env: TERRYFY_PYTHON='homebrew 2'
+ env: TERRYFY_PYTHON='macpython 2.7'
  - os: osx
  language: generic
  env: TERRYFY_PYTHON='macpython 3.4'
  - os: osx
  language: generic
- env: TERRYFY_PYTHON='homebrew 3'
+ env: TERRYFY_PYTHON='macpython 3.5'
  before_install:
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then git clone https://github.com/MacPython/terryfy; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then source terryfy/travis_tools.sh; fi
  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then get_python_environment $TERRYFY_PYTHON venv; fi
- - if [[ "$TERRYFY_PYTHON" == "homebrew 3" ]]; then alias pip=`which pip3` ; fi
  install:
  - pip install -e .
  script:
29 changes: 11 additions & 18 deletions src/zope/index/field/index.py
@@ -55,25 +55,17 @@ def index_doc(self, docid, value):
"""See interface IInjection"""
rev_index = self._rev_index
if docid in rev_index:
try:
if docid in self._fwd_index.get(value, ()):
# no need to index the doc, its already up to date
return
except TypeError:
if docid in self._fwd_index.get(value, ()):
# no need to index the doc, its already up to date
return
# unindex doc if present
self.unindex_doc(docid)

try:
# Insert into forward index.
set = self._fwd_index.get(value)
if set is None:
set = self.family.IF.TreeSet()
self._fwd_index[value] = set
set.insert(docid)
except TypeError:
# TypeError is caused by improper keys on the latest version of BTree
pass
# Insert into forward index.
set = self._fwd_index.get(value)
if set is None:
set = self.family.IF.TreeSet()
self._fwd_index[value] = set
set.insert(docid)

# increment doc count
self._num_docs.change(1)
@@ -84,8 +76,9 @@ def index_doc(self, docid, value):
def unindex_doc(self, docid):
"""See interface IInjection"""
rev_index = self._rev_index
value = rev_index.get(docid)
Member

Maybe use a marker value, to avoid doing both a __contains__ and then a __getitem__ call?

value = rev_index.get(docid, _MARKER)
if value is _MARKER:
   return # not in index
...

Contributor Author

I'm not familiar enough with Python's interning rules to know how to do this in a completely safe way. Is there a part of the Python spec that discusses this, or is it something that can differ from implementation to implementation?

Member

@jamadden commented on Apr 21, 2017

As far as I know, there's nothing that's actually in the spec about this specifically.

  • Literals can be automatically interned by all implementations.
  • PyPy can intern numbers automatically (well, it fakes that)
  • Likewise, PyPy fake-interns tuples and frozensets (exactly which ones has changed recently)
  • Strings can be interned manually; PyPy can intern strings automatically (which ones depends on the version)
  • CPython uses freelists for things like lists and tuples (and dicts?), so in theory it's possible to observe two such "different" objects having the same ID, so long as you don't keep a reference to them.
>>> id([])
4527437080
>>> id([])
4527437080
>>> id([1])
4527437080
>>> id([1, 2])
4527437080

(PyPy does exactly the opposite of this, BTW:

>>> id([])
4562493104L
>>> id([])
4562493128L
>>> id([1])
4562493152L

)

I think that "keeping a reference" thing is generally enough to be safe, though, so long as you avoid values that can be written as literals, like strings and numbers (typically one sees a plain object() instance). You can keep the _MARKER at global module scope or, if you want to be extra sure, you can cons one up for every function invocation:

marker = object()
value = rev_index.get(docid, marker)
if value is marker:
...
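
A minimal, self-contained sketch of the marker idiom discussed above. It assumes a plain dict in place of the BTree-backed rev_index, and the function name unindex is hypothetical. The only language guarantee it relies on is that two objects alive at the same time never share an id().

```python
# Module-level sentinel: it stays referenced for the life of the module,
# so no other live object can ever be identical to it.
_MARKER = object()


def unindex(rev_index, docid):
    """Remove docid from rev_index; return True if it was present."""
    value = rev_index.get(docid, _MARKER)  # one lookup instead of `in` + `[]`
    if value is _MARKER:
        return False  # not in index
    del rev_index[docid]
    return True


# Objects that are alive at the same time always have distinct identities.
first, second = [], []
assert first is not second and id(first) != id(second)

# A stored value of None is still "present"; only the sentinel means "missing".
rev_index = {1: None, 2: 5}  # plain dict standing in for the real mapping
print(unindex(rev_index, 1))   # True
print(unindex(rev_index, 99))  # False
```

Because the check uses `is` against an object that cannot be written as a literal, interning of strings, numbers or tuples cannot produce a false match.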

Member

I would create the module-global _MARKER and use it as the default.

Member

Please do switch back to using rev_index.get(docid, _MARKER).

if value is None:
if docid in rev_index:
value = rev_index[docid]
else:
return # not in index

del rev_index[docid]
4 changes: 2 additions & 2 deletions src/zope/index/field/tests.py
@@ -320,10 +320,10 @@ def test_insert_none_value_to_update_does_not_raise_typeerror(self):
  index.index_doc(1, 5)
  index.index_doc(1, None)

- def test_insert_none_value_does_not_insert_into_forward_index(self):
+ def test_insert_none_value_does_insert_into_forward_index(self):
  index = self._makeOne()
  index.index_doc(1, None)
- self.assertEquals(len(index._fwd_index), 0)
+ self.assertEquals(len(index._fwd_index), 1)
  self.assertEquals(len(index._rev_index), 1)

  def test_suite():
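
To illustrate the behaviour the updated test expects, here is a toy sketch, not the zope.index implementation. ToyFieldIndex is a hypothetical class that uses plain dicts and sets where the real index uses BTrees and TreeSets, so it does not reproduce any BTree key restrictions and it omits the document counter.

```python
_MARKER = object()  # sentinel meaning "docid not present"


class ToyFieldIndex:
    """Dict-based stand-in for a field index (illustration only)."""

    def __init__(self):
        self._fwd_index = {}  # value -> set of docids
        self._rev_index = {}  # docid -> value

    def index_doc(self, docid, value):
        if docid in self._rev_index:
            if docid in self._fwd_index.get(value, ()):
                return  # already indexed under this value
            self.unindex_doc(docid)  # drop the stale entry first
        # None is treated like any other value in the forward index.
        self._fwd_index.setdefault(value, set()).add(docid)
        self._rev_index[docid] = value

    def unindex_doc(self, docid):
        value = self._rev_index.get(docid, _MARKER)
        if value is _MARKER:
            return  # not in index
        del self._rev_index[docid]
        docids = self._fwd_index[value]
        docids.discard(docid)
        if not docids:
            del self._fwd_index[value]  # remove empty buckets


index = ToyFieldIndex()
index.index_doc(1, 5)
index.index_doc(1, None)  # re-index the same document under None
print(len(index._fwd_index), len(index._rev_index))  # 1 1
```

The marker in unindex_doc matters here because a document indexed under None must still be found and removed when it is re-indexed or unindexed.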