Efficient batch construction/insert for rle_str #5

adamnovak · 2016-03-23T19:55:13Z

It would be useful (I think) to have a way to make a rle_str from a vector<int>, or to insert a whole vector<int> into an rle_str at some position.

Would breaking those operations out as batch operations make them any more efficient than just inserting each value one at a time? I don't know much about the internals, but I suspect a workload of inserts all at the very end, or all one after the other, might produce tree balancing problems.

The text was updated successfully, but these errors were encountered:

nicolaprezza · 2016-03-24T11:38:04Z

Thanks for the suggestion Adam!

In bitvectors batch inserts would take just O(log n + k) time, where k is the number of bits inserted. Tree balancing is not a problem because we would need to rebalance the tree (log n time) only every log n inserted bits (i.e. leaf size), so this cost would be amortized over the k insertions. In wavelet-tree strings this would be multiplied by log sigma. In rle_str this could probably also be done efficiently, but it's more tricky because we need to do multiple batch inserts on all the gap-encoded bitvectors corresponding to characters in the inserted vector.. I'm not sure about the complexity of doing this. I'll think about it.

While writing the library I was thinking about doing this, but since my main goal was to implement dynamic RLE FM-indexes I abandoned the idea (because in this case inserts are done at random positions and not in batches). I'll add it to the TODO list.

ekg · 2016-03-26T11:07:14Z

@nicolaprezza are you still working on RLE FM-indexes? That sounds really cool!

nicolaprezza · 2016-03-26T14:04:00Z

Thanks Erik! yes, the RLE FM-index in this library is fully operative and I have already used it to implement a couple of algorithms to compute LZ77 in compressed space. If you are interested in RLE indexes, take a look also to my github.com/nicolaprezza/lz-rlbwt repository: this one is a RLE FM-index combined with LZ77 to achieve compression also of the suffix-array sampling.

chrisbarber mentioned this issue Apr 29, 2019

Batch construction of wavelet trees #13

Closed

ekg mentioned this issue Jul 18, 2020

abort: packed_vector.hpp:465: void dyn::packed_vector::remove(uint64_t): Assertion `(size_ / int_per_word_ == words.size() #24

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient batch construction/insert for rle_str #5

Efficient batch construction/insert for rle_str #5

adamnovak commented Mar 23, 2016

nicolaprezza commented Mar 24, 2016

ekg commented Mar 26, 2016

nicolaprezza commented Mar 26, 2016

Efficient batch construction/insert for rle_str #5

Efficient batch construction/insert for rle_str #5

Comments

adamnovak commented Mar 23, 2016

nicolaprezza commented Mar 24, 2016

ekg commented Mar 26, 2016

nicolaprezza commented Mar 26, 2016