This repository has been archived by the owner on May 13, 2020. It is now read-only.

Try to handle the case where a wid has no wordinfo.
This case can arise when the last occurrence of a word is removed, or
when a lexicon is shared across multiple indexes.

XXX Not sure this code is correct, but it might be and the tests pass.
If it's wrong, we need more tests.
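
A minimal sketch of how a wid can end up without wordinfo, for anyone writing the missing tests. Everything here is a hypothetical stand-in: plain dicts replace the real lexicon and `_wordinfo` mappings, and `get_wid`, `index_a` and `index_b` are illustrative names that do not appear in CosineIndex.py.

```python
# Hypothetical stand-ins: plain dicts instead of the real lexicon and index.
lexicon = {}    # word -> wid, shared by both indexes


def get_wid(word):
    """Return the wid for word, assigning a new one on first sight."""
    # wids start at 1 here; the diff treats wid 0 as a special case.
    return lexicon.setdefault(word, len(lexicon) + 1)


index_a = {}    # wid -> {docid: weight}, plays the role of _wordinfo
index_b = {}    # a second index sharing the same lexicon

# Case 1: shared lexicon. Only index_a has a document containing "zope".
wid = get_wid("zope")
index_a.setdefault(wid, {})[1] = 0.5
missing_in_b = index_b.get(wid)     # the lexicon knows the word, index_b does not

# Case 2: the last occurrence of the word is removed from index_a.
del index_a[wid][1]
if not index_a[wid]:
    del index_a[wid]                # the wid stays in the lexicon regardless
missing_in_a = index_a.get(wid)
```

In both cases the lexicon still returns a valid wid while the lookup yields `None`, which is the situation the `.get()` calls in this commit guard against.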
Jeremy Hylton committed May 16, 2002
1 parent f94baf2 commit c02fcee
Showing 1 changed file with 9 additions and 2 deletions.
11 changes: 9 additions & 2 deletions CosineIndex.py
@@ -147,7 +147,11 @@ def _search_wids(self, wids):
         L = []
         DictType = type({})
         for wid in wids:
-            d2w = self._wordinfo[wid] # maps docid to w(docid, wid)
+            d2w = self._wordinfo.get(wid) # maps docid to w(docid, wid)
+            if d2w is None:
+                # Need a test case to cover this
+                L.append((IIBucket(), scaled_int(1)))
+                continue
             idf = query_term_weight(len(d2w), N) # this is an unscaled float
             #print "idf = %.3f" % idf
             if isinstance(d2w, DictType):
@@ -165,7 +169,10 @@ def query_weight(self, terms):
         for wid in wids:
             if wid == 0:
                 continue
-            wt = math.log(1.0 + N / len(self._wordinfo[wid]))
+            map = self._wordinfo.get(wid)
+            if map is None:
+                continue
+            wt = math.log(1.0 + N / len(map))
             sum += wt ** 2.0
         return scaled_int(math.sqrt(sum))
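
The guarded loop in `query_weight` can be tried in isolation with the sketch below. It is not the method from CosineIndex.py: it is a free function over a plain dict, the parameter names `wordinfo`, `wids` and `n_docs` are assumptions, and the final `scaled_int` step is left out because its definition is not part of this diff.

```python
import math


def query_weight(wordinfo, wids, n_docs):
    """Unscaled query weight that skips wids with no wordinfo entry."""
    total = 0.0
    for wid in wids:
        if wid == 0:
            # Skipped exactly as in the diff.
            continue
        docmap = wordinfo.get(wid)
        if docmap is None:
            # A wid with no entry in this index contributes nothing.
            continue
        wt = math.log(1.0 + n_docs / len(docmap))
        total += wt ** 2.0
    return math.sqrt(total)


# wid 1 occurs in two of four documents; wid 2 has no wordinfo entry.
wordinfo = {1: {10: 0.5, 11: 0.25}}
with_missing = query_weight(wordinfo, [1, 2, 0], 4.0)
without_missing = query_weight(wordinfo, [1], 4.0)
```

With the guard in place a missing wid leaves the query weight unchanged, where the old `self._wordinfo[wid]` lookup would have raised `KeyError`.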
