-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient batch construction/insert for rle_str #5
Comments
Thanks for the suggestion Adam! In bitvectors batch inserts would take just O(log n + k) time, where k is the number of bits inserted. Tree balancing is not a problem because we would need to rebalance the tree (log n time) only every log n inserted bits (i.e. leaf size), so this cost would be amortized over the k insertions. In wavelet-tree strings this would be multiplied by log sigma. In rle_str this could probably also be done efficiently, but it's more tricky because we need to do multiple batch inserts on all the gap-encoded bitvectors corresponding to characters in the inserted vector.. I'm not sure about the complexity of doing this. I'll think about it. While writing the library I was thinking about doing this, but since my main goal was to implement dynamic RLE FM-indexes I abandoned the idea (because in this case inserts are done at random positions and not in batches). I'll add it to the TODO list. |
@nicolaprezza are you still working on RLE FM-indexes? That sounds really cool! |
Thanks Erik! yes, the RLE FM-index in this library is fully operative and I have already used it to implement a couple of algorithms to compute LZ77 in compressed space. If you are interested in RLE indexes, take a look also to my github.com/nicolaprezza/lz-rlbwt repository: this one is a RLE FM-index combined with LZ77 to achieve compression also of the suffix-array sampling. |
It would be useful (I think) to have a way to make a
rle_str
from avector<int>
, or to insert a wholevector<int>
into anrle_str
at some position.Would breaking those operations out as batch operations make them any more efficient than just inserting each value one at a time? I don't know much about the internals, but I suspect a workload of inserts all at the very end, or all one after the other, might produce tree balancing problems.
The text was updated successfully, but these errors were encountered: