-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation Enhancement: Custom comparators are required to accept all bytestrings #117
Comments
thanks for the detailed report. i understand your issue. the "next value" for byte strings is plyvel, not leveldb, and used only for the prefix= iterator. a pluggable function could be one option indeed. another one is to not use a custom comparator at all, since those come at write a performance cost due to the many crossings between python and c++, including outside the python program's control since leveldb compacts in a background thread. (hence the fatal error + abort, there is no python stack to raise from in all cases). perhaps another key design can achieve the same? what do your exactly need? (also i'm travelling without a computer currently so didn't reproduce your samples) |
Thanks for the quick reply. Concerning the key design I'm using pack and unpack because I need to be able to use arbitrary-length tuples of data (strings, integers, floats and bools) and I'm using the lexicographic ordering for tuples to build indexes (I'm trying to build a more complex database on top of LevelDB). For example, if our records are posts (with fields author, year, title, content) coded as dictionaries we can insert them under the tuple-key ("posts", post_uuid). I'm currently thinking of a way to encode keys to preserve their lexicographic ordering and I think I will take that route at the end, but I still think that being able to pass a pluggable function for next value would be valuable in other cases in which not all bytestrings are valid key encodings. If you agree on the solution, I can also send a Pull Request that adds a pluggable function for the next value, but I will have to learn a bit more cython before :) |
Hi! First of all I thank you for the great project.
Summary
I was trying to use plyvel with custom comparators to build another project on top of LevelDB.
I found that custom comparators are passed "all" bytestrings when used in a prefix iterator, and this can give problems if they expect a custom format for keys.
I think it is interesting to report to the project and some mention of it should be added in the documentation.
Detailed Description
To give more structure to the keys I wanted to have them as tuples, that are ordered lexicographically in python. In this way I could have a multi-level index thanks to the custom ordering.
I'm using msgpack to pack the tuples into bytestrings and the bytestrings back to tuples inside of the custom comparator to unpack them.
If I use a prefix iterator (in certain cases) on a database with such comparator I get the traceback:
It took me quite a bit to get what the problem is, but I finally found it.
Minimal Code Sample
My virtualenv (
python 3.7.7
):The code responsible for the error, with debugging print statements inserted:
What it prints when executed:
Notice that the hash of the bytestring ends in
\xff
. It is then clearly visible in the sequence of compare how the second member is the "end of the range" in the last line, which is however invalid msgpack format, so that the comparator function crashes.Questions
I would like to ask a few questions, about the possible fixes that come to my mind about the issue (my usecase in particular):
1. The ideal approach
I understand the need for the iterator to compare the first string outside of the defined range to check if other elements remain to be iterated, even if it was not obvious at first and I think it should be at least mentioned in the documentation.
That said, I think that a good approach would be to let the user define also a custom
next
function which returns the first key after the end of the prefix-range so that one is allowed to use only a subset of keys and can then use custom encodings in them.I understand that there may be issues in the implementation of LevelDB, since this is only a wrapper.
So the question: Do you think the above is feasible?
2. The practical approach
I would like to be assured that this is the only case in which a key outside my range is presented to comparison (of course assuming that only valid keys are inserted into the database).
If it is so, then I would simply capture the exception and know that this is just a key following a bytestring ending in
\xff
, recover the real next tuple and compare this with the other key.The question here is: Is this the only case in which I can be given a key outside my range?
Suggested changes in documentation
I think that mentioning that when calling iterators, keys outside of the user inserted coding can occur in comparison.
In particular it would be great to add something to the Custom Comparator Documentation, when the actual occurrences of this behaviour have been clarified.
Thank you in advance for your time.
The text was updated successfully, but these errors were encountered: