Reverse iteration behavior #4

Closed
dw opened this Issue Mar 6, 2013 · 14 comments

Comments

Projects
None yet
2 participants

dw commented Mar 6, 2013

Hi there,

First of all thanks for creating Plyvel, it's been incredibly useful in my experimentation with LevelDB.

I'm currently facing an issue where I can't seem to reverse iterate reliably from the end of the store, without knowing the key I requested was past the end. and would like your thoughts.

The options are either to invoke iterator(reverse=True, include_stop=True, stop=k) , or .iterator(reverse=True) then .seek(k).

In the former case, this succeeds if the key exists (and yields the key correctly), otherwise if the key is greater than any in the store, it fails by yielding nothing.

With the second approach, I can reliably iterate from the end of the store when providing a nonexistent key, except if the key I provide actually exists, then it is skipped during the first .next() call.

My workaround for the second approach is to append a single '\x00' to the key if iterating in reverse, which seems to produce my desired result.

My issue is whether the skipping behaviour in the latter case is sensible, and if not, whether my workaround is actually exploiting a bug that might be fixed later. :)

In any case the first approach seems totally wrong. If .Seek() results in an invalid iterator, then I think calling SeekToEnd() should occur.

Thanks again for an excellent library, I use it all the time!

Owner

wbolster commented Mar 6, 2013

You might have found a bug. Please follow along with me while I investigate this using the following test database:

>>> import plyvel
>>> db = plyvel.DB('/tmp/testdb', create_if_missing=True)
>>> db.put(b'b', b'')
>>> db.put(b'c', b'')
>>> db.put(b'd', b'')

Iteration without start/stop works fine:

>>> list(db.iterator(include_value=False))
['b', 'c', 'd']
>>> list(db.iterator(include_value=False, reverse=True))
['d', 'c', 'b']
>>> list(db.iterator(include_value=False, start=b'a'))
['b', 'c', 'd']

Let's try forward iteration. First with start key equal to the first key in the db, then with start key that sorts before the first key in the db:

>>> list(db.iterator(include_value=False, start=b'b'))
['b', 'c', 'd']
>>> list(db.iterator(include_value=False, start=b'a'))
['b', 'c', 'd']

Same results, which seems correct.

Now let's try the same with reverse iterations, first with stop key equal to the last key in the db, then with stop key that sorts after the last key in the db:

>>> list(db.iterator(include_value=False, reverse=True, stop=b'd'))
['c', 'b']
>>> list(db.iterator(include_value=False, reverse=True, stop=b'e'))
[]

Different answers, while I expected the same answer!

By default the start key is inclusive and the stop key is exclusive, hence we only see two items here, not three as with the forward iteration. It doesn't seem to work with stop key included either:

>>> list(db.iterator(include_value=False, reverse=True, stop=b'd', include_stop=True))
['d', 'c', 'b']
>>> list(db.iterator(include_value=False, reverse=True, stop=b'e', include_stop=True))
[]

Again, different answers while I expected the same answer.

Yes, this looks like a bug indeed.

@wbolster wbolster added a commit that referenced this issue Mar 6, 2013

@wbolster wbolster Fix iterators using non-existent start/stop keys
Fixes issue #4.
23ae79a
Owner

wbolster commented Mar 6, 2013

Hi, could you please test my recent fixes? I'd love to hear your feedback.

dw commented Mar 7, 2013

Woah thanks, that was very fast. :)

The new version fixes the problem with seeking past the end, although I still seem to need a workaround to reverse-iterate from a fixed key:

>>> db = plyvel.DB('test2.ldb', create_if_missing=1)
>>> db.put('a', '')
>>> db.put('b', '')
>>> it = db.iterator(reverse=True)
>>> it.seek('b')
>>> list(it)
[('a', '')]

Perhaps I am doing something wrong, but I can't seem to find a way to implement a generic "iterate backwards from the given key" function, without first mutating the key, using a function like your bytes_increment(). Either I must know the key is the last in the set (and call .seek_to_stop() instead), or I must mutate it.

>>> it.seek('c') # result of bytes_increment() on 'b'
>>> list(it)
[('b', ''), ('a', '')] # desired result

If I am not mistaken, the underlying iterator API exposes the correct (it.Key(), it.Value()) immediately after calling .Seek('b'). So the question is how to access that tuple through the Python interface.

dw commented Mar 7, 2013

Hmm nopes, there's still something funky happening:

>>> db.put('a', '')
>>> db.put('b', '')
>>> db.put('d', '')
>>> list(db.iterator(include_value=False, stop='c', reverse=True, include_stop=True))
['d', 'b', 'a']  # d :(

dw commented Mar 7, 2013

Hi there,

In an hour or so I will send a patch that cures this issue. Currently just fixing some test breakage I've caused

Owner

wbolster commented Mar 7, 2013

Hmmm. I used a comparator call for the forward iteration case in my recent changes. The backward iteration case was also patched, but does not call the comparator. Maybe it should?

dw commented Mar 7, 2013

Hello again,

I have basically spent the entire day trying to come up with a bug free version of the iterator, but all attempts end up in a mess.

At one stage I had it down to a single test failing, and 100 lines of code removed. There is a fundamental problem with how it is designed.. No code should need so many if statements and state variables. :(

I am providing a patch below, however it is also a mess. Although it passes all tests, it has broken behaviour, such as with .seek(). There are a few problems here that are confusing me so much I stopped work:

  • The underlying API gives the guarantee that Seek(target) results in "target<=key() || !Valid()", although Plyvel documentation claims something like "target>=key()". I'm not sure why that is.
  • seek('') on a reverse iterator when there is no '' key should result in StopIteration for the following next(). Snapping back to the smallest key in the store is (I think) non-intuitive.
  • For the keys ['1', '2', '3', '4'], seek('2.5') on a reverse iterator will latch '3'. I'm not sure if this makes sense: does "reverse=True" signify "next() and prev() are swapped", or does it signify "the underlying sequence and comparator are reversed"?

Confused :)

http://pastie.org/6416838

Owner

wbolster commented Mar 9, 2013

I agree with your finding that the iterator code is really hairy. :) This issue needs some more love to be resolved, unfortunately…

As I understand it, seeking results in target >= key() || !Valid() (that is >=, not <=, as in your comment). From LevelDB's iterator.h:

// Position at the first key in the source that at or past target
// The iterator is Valid() after this call iff the source contains
// an entry that comes at or past target.
virtual void Seek(const Slice& target) = 0;

Note the ‘at or past target’ here.

Owner

wbolster commented Mar 9, 2013

Okay, I have a fix that solves the problem you reported after my previous changes. Your example seems to work as expected now:

>>> import plyvel
>>> db = plyvel.DB('/tmp/testdb', create_if_missing=True)
>>> db.put('a', '')
>>> db.put('b', '')
>>> db.put('d', '')

Using c as the stop key:

>>> list(db.iterator(include_value=False, stop='c', reverse=True, include_stop=True))
['b', 'a']
>>> list(db.iterator(include_value=False, stop='c', reverse=True, include_stop=False))
['b', 'a']

Using d as the stop key:

>>> list(db.iterator(include_value=False, stop='d', reverse=True, include_stop=True))
['d', 'b', 'a']
>>> list(db.iterator(include_value=False, stop='d', reverse=True, include_stop=False))
['b', 'a']

@wbolster wbolster added a commit that referenced this issue Mar 9, 2013

@wbolster wbolster Make sure .real_prev() respects iterator boundaries
See issue #4.
54d790a
Owner

wbolster commented Mar 9, 2013

The seeking issue you reported earlier (third bullet in your comment) seems to be solved as well now:

>>> import plyvel
>>> db = plyvel.DB('/tmp/testdb', create_if_missing=True)
>>> db.put('1', '')
>>> db.put('2', '')
>>> db.put('3', '')
>>> db.put('4', '')
>>> list(db)
[('1', ''), ('2', ''), ('3', ''), ('4', '')]
>>> it = db.iterator()
>>> it.seek('2.5')
>>> list(it)
[('3', ''), ('4', '')]
>>> it = db.iterator(reverse=True)
>>> it.seek('2.5')
>>> list(it)
[('2', ''), ('1', '')]

dw commented Mar 13, 2013

So sorry for the delay, I've been working on another part of my project. :P~

Should have this re-tested by sometime tomorrow

On 9 March 2013 23:27, Wouter Bolsterlee notifications@github.com wrote:

The seeking issue your reported earlier (third bullet in your comment)
seems to be solved as well now:

import plyvel>>> db = plyvel.DB('/tmp/testdb', create_if_missing=True)
db.put('1', '')>>> db.put('2', '')>>> db.put('3', '')>>> db.put('4', '')>>> list(db)[('1', ''), ('2', ''), ('3', ''), ('4', '')]>>> it = db.iterator()>>> it.seek('2.5')>>> list(it)[('3', ''), ('4', '')]>>> it = db.iterator(reverse=True)>>> it.seek('2.5')>>> list(it)[('2', ''), ('1', '')]


Reply to this email directly or view it on GitHubhttps://github.com/wbolster/plyvel/issues/4#issuecomment-14671042
.

dw commented Mar 15, 2013

I can joyously confirm this is all working now. :) The bug is resolved from my perspective.

I have one further problem that is causing me difficulty, which is implementing the interface described here: http://5.39.91.176/centidb/#engine-interface .

It turns out for my design, it is not sufficient to stop at some known highest key or prefix, there are occasions where I need to reverse iterate from some key, or the very next highest in the store.

The reason for this is due to how batch compression works in this library. Given keys (a, b, c, d), if they are batch compressed, a single key (d+a+b+c) will exist, such that a single ">=" lookup for any of those keys will immediately return the correct record.

My problem is that when iterating in reverse starting from "c", I need to seek to "c or higher", then "walk backwards, starting with the seeked key, whatever it was". It seems there is no neat combination of iterator options that will allow this. Given keys "a b d", calling "db.iterator(reverse=True)" followed by "seek(c)" will only yield "b a", yet c's value is really stored as part of the batch "d".

The only approach I can find that allows this query to succeed is "db.iterator()", "seek(c)", "next(it)", compare returned key, is it outside collection's range? No - is it a batch? Yes - ok, handle batch. Now call "prev()" to skip the result we just seen, and then finally use "iter(it.prev, None)" to build a new iterator that moves in the correct direction.

Obviously that last approach is horrid :) But it seems a limit of the current API?

Thanks for all your help so far!

David

dw commented Mar 15, 2013

Just a note to say I edited the ticket text, so if you're reading this by email, I fixed some mistakes in the text.

wbolster closed this Mar 15, 2013

wbolster was assigned Jun 17, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment