Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk remove items from TreeIndex: remove_range #120

Closed
novacrazy opened this issue Jan 4, 2024 · 9 comments
Closed

Bulk remove items from TreeIndex: remove_range #120

novacrazy opened this issue Jan 4, 2024 · 9 comments
Assignees
Labels
enhancement New feature or request

Comments

@novacrazy
Copy link
Contributor

Would it be possible to add an efficient remove_range or drain-like API on TreeIndex?

I'd like to use a TreeIndex<u64, Arc<[u8]>> as a queue where many readers keep a pointer into it by index, so they can keep up at their own pace, and periodically clear old entries like .remove_range(..least_recent_index). The alternative would be just repeated calls to .remove with a decrementing counter until it returns false.

@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 5, 2024

Yes, that kind of API would be helpful, however, I wouldn't expect a huge performance gain, since there can always be other threads inserting values in that range, and thus disallowing me to implement something similar to split_off.
-> Still, I can get rid of one tree traversal for each removal (comparing to for e in range { if (cond) { tree.remove(e.0); } }, so the gain would be at around ~30%, I'd say.

I'll implement something next week.

@novacrazy
Copy link
Contributor Author

For my use case at least the key is always atomically incrementing, I keep an Arc<AtomicU64> around for that, so it never would re-insert below the range. Perhaps some sort of unsafe fn split_off or drain that assumes this? If not, I understand, this is a hefty ask.

Thank you again for your continued development and upkeep of scc. It's become a staple library for me.

@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 5, 2024

In my opinion, implementing a new data structure for the purpose is optimal since the ever-increasing key assumption will enable a lot of optimization; I think, a ring-buffer-like data structure will be better than a tree (didn't think about it seriously yet).

On the other hand, implementing fn split_off seems to be a viable option, and maybe it doesn't have to be unsafe, because fn insert can check whether the inserted key has just been split off; though correctly implementing it will be super hard..

Anyway, I'll firstly focus on optimizing the for e in range { if (cond) { tree.remove(e.0); } } case, and then think about any other possible long-term solutions (including a new data structure and split_off) afterwards.

@wvwwvwwv wvwwvwwv self-assigned this Jan 5, 2024
@wvwwvwwv wvwwvwwv added the enhancement New feature or request label Jan 5, 2024
@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 5, 2024

  1. Implement fn remove for Iter and Range.
    => It's relatively trivial; ~ 14 Jan.
  2. Implement fn remove_range<R: RangeBounds<K>>(&self, range: R) for bulk removal.
    => Slightly complicated, but it will be doable! Though I need to figure out possible outcomes when there are other threads inserting keys in the range, and document them.
    => ~ mid Feb?

@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 6, 2024

OK, I tried to implement remove for Iter and Range, but it turns out to be much more complicated than expected.

On the other hand, O(1) bulk removal seems to be quite doable in a couple of weeks.

So, in this issue, as originally suggested, I'll directly implement remove_range.

@wvwwvwwv wvwwvwwv changed the title Bulk remove items from TreeIndex using a Range? Bulk remove items from TreeIndex using a Range Jan 6, 2024
@wvwwvwwv wvwwvwwv changed the title Bulk remove items from TreeIndex using a Range Bulk remove items from TreeIndex: remove_range Jan 6, 2024
@novacrazy
Copy link
Contributor Author

O(1) bulk removal would be more than optimal. Also absolutely no rush on this, and thank you again!

@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 6, 2024

You're welcome! Just note that, O(1) will be a best-case scenario, and most of the time, removing entries close to start and end bounds will require tree traversal.

@wvwwvwwv
Copy link
Owner

wvwwvwwv commented Jan 7, 2024

FYI: uploaded SCC 2.0.9 which contains an unoptimized version of remove_range. To be optimized later.

wvwwvwwv added a commit that referenced this issue Jan 13, 2024
wvwwvwwv added a commit that referenced this issue Jan 13, 2024
wvwwvwwv added a commit that referenced this issue Jan 13, 2024
wvwwvwwv added a commit that referenced this issue Jan 23, 2024
Still, several issues with remove_range.
wvwwvwwv added a commit that referenced this issue Jan 26, 2024
wvwwvwwv added a commit that referenced this issue Jan 31, 2024
Sub-trees are directly removed if they are definitely contained in the
specified range.
@wvwwvwwv
Copy link
Owner

FYI: uploaded 2.0.14 that contains the proposed optimized version of remove_range.
-> Time complexity is O(depth = logN), independent of the range.
-> Not 100% sure about correctness of the method. If the method doesn't seem function correctly, please use for (k, _) in tree.range(a..b, guard) { tree.remove(k); } instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants