Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient batch construction/insert for rle_str #5

Open
adamnovak opened this issue Mar 23, 2016 · 3 comments
Open

Efficient batch construction/insert for rle_str #5

adamnovak opened this issue Mar 23, 2016 · 3 comments

Comments

@adamnovak
Copy link
Contributor

It would be useful (I think) to have a way to make a rle_str from a vector<int>, or to insert a whole vector<int> into an rle_str at some position.

Would breaking those operations out as batch operations make them any more efficient than just inserting each value one at a time? I don't know much about the internals, but I suspect a workload of inserts all at the very end, or all one after the other, might produce tree balancing problems.

@nicolaprezza
Copy link
Member

Thanks for the suggestion Adam!

In bitvectors batch inserts would take just O(log n + k) time, where k is the number of bits inserted. Tree balancing is not a problem because we would need to rebalance the tree (log n time) only every log n inserted bits (i.e. leaf size), so this cost would be amortized over the k insertions. In wavelet-tree strings this would be multiplied by log sigma. In rle_str this could probably also be done efficiently, but it's more tricky because we need to do multiple batch inserts on all the gap-encoded bitvectors corresponding to characters in the inserted vector.. I'm not sure about the complexity of doing this. I'll think about it.

While writing the library I was thinking about doing this, but since my main goal was to implement dynamic RLE FM-indexes I abandoned the idea (because in this case inserts are done at random positions and not in batches). I'll add it to the TODO list.

@ekg
Copy link
Contributor

ekg commented Mar 26, 2016

@nicolaprezza are you still working on RLE FM-indexes? That sounds really cool!

@nicolaprezza
Copy link
Member

Thanks Erik! yes, the RLE FM-index in this library is fully operative and I have already used it to implement a couple of algorithms to compute LZ77 in compressed space. If you are interested in RLE indexes, take a look also to my github.com/nicolaprezza/lz-rlbwt repository: this one is a RLE FM-index combined with LZ77 to achieve compression also of the suffix-array sampling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants