WidCode problems after Python 3 migration #83

Open
icemac opened this issue Oct 30, 2019 · 3 comments

@icemac icemac commented Oct 30, 2019

I migrated a ZODB for a customer using zodbupdate to Python 3. Now I get the following error when searching for a non-ASCII character in a ZCTextIndex.

...
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCatalog/ZCatalog.py", line 611, in searchResults
    return self._catalog.searchResults(query, **kw)
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCatalog/Catalog.py", line 1091, in searchResults
    return self.search(query, sort_indexes, reverse, sort_limit, _merge)
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCatalog/Catalog.py", line 634, in search
    rs = self._search_index(cr, index_id, query, rs)
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCatalog/Catalog.py", line 564, in _search_index
    index_rs = index.query_index(index_query, rs)
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCTextIndex/ZCTextIndex.py", line 210, in query_index
    results = tree.executeQuery(self.index)
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCTextIndex/ParseTree.py", line 132, in executeQuery
    return index.search_phrase(self.getValue())
  File ".../Products.ZCatalog-5.0.1-py3.7.egg/Products/ZCTextIndex/BaseIndex.py", line 218, in search_phrase
    if docwords.find(code) >= 0:
TypeError: argument should be integer or bytes-like object, not 'str'

query = {'SearchableMetaData': 'Thüringen-Kliniken', ...}
docwords = b'\x92k+$\xfeQO\'\xfeQ`\x06\xfeQP%\xfeR\x05\x0f\xfeQd1\xfeQOL\xfeQ]\x01\xfeQ]\x02\xdfff&\xfeQR\x0b\xfeQYd\xb5\n\x1a"\xfeTq7\xfeQO\'\xfeQ\\g\xfeQo\x7f\xfeQNt\xfeQU\x1d\xda\'VJ\xa9a\x7f%\xfeQPV\xa50VS\xfeQ]Q'
code = 'þQR\x0bþQ`\x06'
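The mismatch can be reproduced outside Zope. This is a minimal sketch, assuming only the values from the debug output above; `docwords` is a short excerpt of the real value, and `raised` and `code_as_bytes` are names introduced here for illustration.

```python
# Excerpt of the docwords value shown above, plus the search code as reported.
docwords = b"\xfeQR\x0b\xfeQYd"
code = "þQR\x0bþQ`\x06"

try:
    docwords.find(code)  # bytes.find() does not accept a str needle
    raised = False
except TypeError:
    raised = True

# Latin-1 maps code points 0-255 to the identical byte values, so the str
# code corresponds to exactly one bytes value (and the reverse also holds).
code_as_bytes = code.encode("latin-1")
print(raised, code_as_bytes)
```

Either side could be converted to make the comparison work; which one is correct depends on the type that the index is supposed to store.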

self._docwords contains a mixture of bytes and str objects. The str values are all empty strings.
Re-indexing the index did not help to solve the problem.
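One way to check for such a mixture is to count the value types. This is only a sketch: `value_types` is a hypothetical helper, and the plain dict `sample` stands in for the real `_docwords` mapping, which is assumed to offer a `values()` method.

```python
from collections import Counter


def value_types(docwords):
    """Count how many values of each type the mapping holds."""
    return Counter(type(value).__name__ for value in docwords.values())


# Hypothetical stand-in for index._docwords: doc id -> encoded word ids.
sample = {1: b"\xfeQR\x0b", 2: "", 3: b"\x92k+$", 4: ""}
counts = value_types(sample)
print(counts)
```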

What is the desired datatype for the docwords, str or bytes? According to WidCode.encode() it seems to be str.

@icemac icemac added the bug label Oct 30, 2019
@icemac icemac self-assigned this Oct 30, 2019

@icemac icemac commented Oct 30, 2019

@davisagli You worked on WidCode the last time it was changed. Do you have any idea?

My current plan is to iterate over _docwords.values() and convert each value to str using value.decode('latin1'). Does this seem reasonable?
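A rough sketch of that plan, assuming `_docwords` behaves like a mapping from document id to encoded word ids. `decode_docwords` is a hypothetical helper and the dict `sample` is a stand-in for the real mapping; this is not the fix that was applied to the customer database.

```python
def decode_docwords(docwords, encoding="latin-1"):
    """Convert every bytes value in the mapping to str, in place.

    Returns the number of values that were converted.
    """
    converted = 0
    # Take a snapshot of the items so the mapping is not modified
    # while it is being iterated.
    for docid, value in list(docwords.items()):
        if isinstance(value, bytes):
            docwords[docid] = value.decode(encoding)
            converted += 1
    return converted


# Hypothetical stand-in for index._docwords.
sample = {1: b"\xfeQR\x0b", 2: "", 3: b"\x92k+$"}
converted = decode_docwords(sample)
print(converted, sample)
```

Latin-1 is a safe choice for this kind of conversion because every byte value decodes to the code point with the same number, so no byte is lost or rejected. On a real ZODB the change would presumably also have to be committed, for example with `transaction.commit()`.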

@icemac icemac commented Oct 30, 2019

My suggestion in the previous comment at least solved the problem.

@d-maurer d-maurer commented Oct 31, 2019
