
Add `rsdic` combined rank and select data structure #10

Open: wants to merge 33 commits into base: master

Conversation

sujayakar commented Oct 31, 2019

I've ported @hillbig's implementation [1] of Navarro and Providel's data structure [2] for simultaneously accelerating rank and select queries over a bit vector. This structure is great for select-heavy workloads, like wavelet matrices [3], which currently binary search over ranks. The shape of the implementation is largely the same, with some small changes and Rust idioms.
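For anyone new to these structures, the query semantics can be pinned down with a naive reference. This is just a sketch: `rank1` and `select1` here are standalone helpers, not this crate's API, and I'm assuming the exclusive-rank convention where `rank1(i)` counts set bits strictly before position `i`.

```rust
/// rank1(i): number of set bits strictly before position i
/// (exclusive convention; the inclusive variant would use bits[..=i]).
fn rank1(bits: &[bool], i: usize) -> usize {
    bits[..i].iter().filter(|&&b| b).count()
}

/// select1(k): position of the k-th set bit (0-indexed), if it exists.
fn select1(bits: &[bool], k: usize) -> Option<usize> {
    bits.iter()
        .enumerate()
        .filter(|&(_, &b)| b)
        .map(|(i, _)| i)
        .nth(k)
}

fn main() {
    let bits = [true, false, true, true, false];
    println!("rank1(3) = {}", rank1(&bits, 3)); // 2 set bits before index 3
    println!("select1(2) = {:?}", select1(&bits, 2)); // third set bit sits at index 3
}
```

The point of the data structure is to answer both queries in near-constant time instead of these linear scans.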

Testing is done entirely through quickcheck, comparing rank and select results on arbitrary inputs against a naive implementation.
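As a dependency-free sketch of the kind of property those quickcheck tests encode (names here are hypothetical; the real tests compare the ported structure against the naive one over random inputs):

```rust
// Naive rank used as the ground truth in the property below.
fn naive_rank1(bits: &[bool], i: usize) -> usize {
    bits[..i].iter().filter(|&&b| b).count()
}

/// Property: rank increases by exactly 1 at each set bit and by 0 otherwise.
fn rank_property_holds(bits: &[bool]) -> bool {
    (0..bits.len()).all(|i| {
        let step = naive_rank1(bits, i + 1) - naive_rank1(bits, i);
        step == usize::from(bits[i])
    })
}

fn main() {
    // Exhaustively check every bit vector of length 8 in place of
    // quickcheck's randomly generated inputs.
    for n in 0u16..256 {
        let bits: Vec<bool> = (0..8).map(|b| (n >> b) & 1 == 1).collect();
        assert!(rank_property_holds(&bits));
    }
    println!("property holds for all length-8 inputs");
}
```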

I've also added some basic benchmarks, and the results are already pretty good. This benchmark generates a random 1M-bit vector and queries the rank at 1000 different indices.

```
rsdic::rank             time:   [16.738 us 16.957 us 17.196 us]
jacobson::rank          time:   [21.522 us 21.721 us 21.958 us]
rank9::rank             time:   [7.7177 us 7.7851 us 7.8541 us]
rsdic::select0          time:   [31.906 us 32.131 us 32.372 us]
rsdic::select1          time:   [34.378 us 34.956 us 35.641 us]
rank9::binsearch::select0
                        time:   [296.19 us 302.05 us 307.71 us]
rank9::binsearch::select1
                        time:   [291.32 us 298.60 us 305.48 us]
```

So, this data structure is faster than Jacobson for rank but slower than rank9, and its select is much faster than binary search on this benchmark.

Then, I added SIMD acceleration for the rank computation, and it's closer to rank9 now. It's behind an experimental feature flag since that ecosystem isn't fully settled yet. Here are the benchmarks, rerun with the feature enabled and RUSTFLAGS="-C target-cpu=native" on my 2018 MBP.

```
rsdic::rank             time:   [8.6908 us 8.7916 us 8.9432 us]
jacobson::rank          time:   [18.180 us 18.654 us 19.135 us]
rank9::rank             time:   [7.3820 us 7.5698 us 7.7660 us]
rsdic::select0          time:   [43.371 us 44.225 us 45.118 us]
rsdic::select1          time:   [35.664 us 36.867 us 38.243 us]
rank9::binsearch::select0
                        time:   [263.92 us 268.39 us 273.62 us]
rank9::binsearch::select1
                        time:   [257.19 us 263.10 us 269.85 us]
```

Not all of the speedup comes from changing target-cpu: here's the same benchmark, still with target-cpu=native but without the SIMD acceleration feature.

```
rsdic::rank             time:   [13.612 us 13.684 us 13.769 us]
```

[1] https://github.com/hillbig/rsdic
[2] https://users.dcc.uchile.cl/~gnavarro/ps/sea12.1.pdf
[3] https://github.com/sekineh/wavelet-matrix-rs

sujayakar added 27 commits Oct 25, 2019
benchmark results:
```
rsdic::rank             time:   [16.738 us 16.957 us 17.196 us]
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild

jacobson::rank          time:   [21.522 us 21.721 us 21.958 us]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

rank9::rank             time:   [7.7177 us 7.7851 us 7.8541 us]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

rsdic::select0          time:   [31.906 us 32.131 us 32.372 us]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

rsdic::select1          time:   [34.378 us 34.956 us 35.641 us]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

rank9::binsearch::select0
                        time:   [296.19 us 302.05 us 307.71 us]
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

rank9::binsearch::select1
                        time:   [291.32 us 298.60 us 305.48 us]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
```
The nice coincidence here is that we represent the small block classes as u8s
in a dense buffer and that the start of a large block is always divisible by
1024.  So, we can directly load small block classes as a u8x16 vector and then
loop through a large block in a single go.

There's a few tricks described in the comments to turn indexing into the
`ENUM_CODE_LENGTH` table into simple vector operations plus a shuffle.  Check
them out!
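As a scalar stand-in for the idea (assuming 1024-bit large blocks and 64-bit small blocks, so each large block owns exactly 16 one-byte classes; the SIMD version loads those 16 bytes as a single u8x16 and reduces them in one shot):

```rust
// Assumption: 1024 bits per large block / 64 bits per small block
// = 16 small-block classes per large block, exactly one u8x16 lane set.
const SMALL_PER_LARGE: usize = 16;

/// Sum the popcount classes of one large block's small blocks.
/// Each class is the number of set bits in its 64-bit small block,
/// so this is the rank contribution of the whole large block.
fn large_block_rank(classes: &[u8], large_block: usize) -> u32 {
    let start = large_block * SMALL_PER_LARGE;
    classes[start..start + SMALL_PER_LARGE]
        .iter()
        .map(|&c| u32::from(c))
        .sum()
}

fn main() {
    // Two large blocks' worth of classes, every small block fully set.
    let classes = [64u8; 32];
    println!("{}", large_block_rank(&classes, 0)); // 16 * 64 = 1024
}
```

Because the 16 classes are contiguous and aligned to the large-block boundary, the scalar loop above maps directly onto one vector load plus a horizontal sum.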

Benchmarks after turning on the `simd_acceleration` feature:
```
$ RUSTFLAGS="-C target-cpu=native" cargo bench --features simd_acceleration -- '::rank'

rsdic::rank             time:   [7.9304 ns 7.9607 ns 7.9929 ns]
                        change: [-22.005% -21.462% -20.881%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

jacobson::rank          time:   [13.985 ns 14.057 ns 14.141 ns]
                        change: [-1.9807% -0.6375% +0.6820%] (p = 0.37 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

rank9::rank             time:   [5.8981 ns 5.9204 ns 5.9443 ns]
                        change: [-3.1620% -1.8780% -0.5792%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
```
So, we're not quite at rank9 yet but we're quite close.
- Change raw selection in enum_code to loop over the bits (skipping over zeros with clz) since broadword is really slow
- Improve QC tests to randomize blocks and have lengths that aren't multiples of 64
- Fix bug in enum_code::select0 tests
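The in-word selection loop mentioned in the first commit above can be sketched like this (a sketch only: I use `trailing_zeros` with clear-lowest-bit, one common formulation; the actual code's bit order and choice of intrinsic may differ):

```rust
/// Position of the k-th (0-indexed) set bit in `word`, if present.
/// Skips runs of zeros with a count-trailing-zeros instruction instead
/// of testing every bit.
fn select1_in_word(mut word: u64, mut k: u32) -> Option<u32> {
    while word != 0 {
        let pos = word.trailing_zeros();
        if k == 0 {
            return Some(pos);
        }
        k -= 1;
        word &= word - 1; // clear the lowest set bit
    }
    None
}

fn main() {
    let word = 0b1011_0100u64; // set bits at positions 2, 4, 5, 7
    println!("{:?}", select1_in_word(word, 2)); // Some(5)
}
```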
tov (Owner) commented Nov 5, 2019

Cool stuff! I saw you just made a change. Is this ready to merge?

sujayakar (Author) commented Nov 5, 2019

Yep, I think so! I have some local changes to vectorize select and add a builder to speed up construction, but I can submit subsequent PRs for those :)

sujayakar (Author) commented Nov 19, 2019

@tov, ping on this? Let me know if there's anything I can help with.

tov (Owner) commented Nov 20, 2019

sujayakar (Author) commented Dec 23, 2019

hey @tov, I'm going to hack on this a bit over the holidays anyways, so I'll put it up in a separate repo and we can chat about merging later if you'd like.

sujayakar closed this Dec 23, 2019
tov (Owner) commented Dec 23, 2019

Okay, please keep me posted. Over the holidays is when I have time as well…

sujayakar (Author) commented Dec 24, 2019

Cool, I've put up the library at https://github.com/sujayakar/rsdict and fixed most of the issues I wanted to (docstrings, from_blocks constructor, runtime CPU feature detection), so take a look when you have time. Shall I resubmit a PR?

tov (Owner) commented Dec 29, 2019

Yes, please do! Sorry again for the delay.

sujayakar reopened this Dec 30, 2019
@@ -19,7 +19,7 @@ It’s [on crates.io](https://crates.io/crates/succinct), so you can add

```diff
 [dependencies]
-succinct = "0.5.2"
+succinct = "0.5.4"
```

sujayakar (Author) commented Dec 30, 2019
@tov, I've also bumped the version here for the new RsDict export

sujayakar (Author) commented Dec 30, 2019

hmm, looks like packed_simd only supports nightly Rust. @tov, would you rather we put SIMD support behind a feature gate or only support nightly for this crate?

Edit: I just added the feature gate :)

tov (Owner) commented Feb 20, 2020

+1 to the feature gate.

```rust
    }
}

// TODO: Generate this using `const fn` when it stabilizes.
```

tov (Owner) commented Feb 20, 2020
Should we just generate this some other way? A macro? A script that spits out text?


```toml
[[bin]]
name = "rsdict_fuzz"
path = "src/rsdic/fuzz.rs"
```

tov (Owner) commented Feb 20, 2020
This seems like a test, not a `[[bin]]`.

sujayakar (Author) commented Feb 20, 2020
ah, yeah, I made it a binary since it runs forever, and we wouldn't want that running under `cargo test`. I could make it run for some bounded amount of time and change it to a test, or leave it as is. what do you think?

tov (Owner) commented Feb 20, 2020
If I recall correctly, there's a way to mark a test so that it isn't run by default. If that doesn't work, maybe it could be an example?

(BTW, you’ve done really impressive work here. I should have said that first!)

Sujay Jayakar added 4 commits Feb 20, 2020
 - Shrink the table by 1) not storing zeros and 2) exploiting each row's symmetry
…ets the job done)
sujayakar (Author) commented Feb 26, 2020

@tov, this should be ready for another review!

ucyo commented Mar 30, 2020

@sujayakar There seems to be a change in behaviour with RsDict. The `rank1(pos)` operation no longer considers the bit at `pos`. Minimal example:

    let keys = vec![0, 2, 4, 5];
    let m = 5; // assumed value: the largest key
    let mut bv: BitVector<u64> = BitVector::with_fill(m as u64 + 1, false);
    for k in keys {
        bv.set_bit(k as u64, true);
    }
    let mut rs = RsDict::new();
    for bit in bv.iter() {
        rs.push(bit);
    }
    let jj = Rank9::new(bv);
    println!("bv {} rs {}", jj.rank1(0), rs.rank1(0));  // jj returns 1, while rs returns 0
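For reference, here are the two rank conventions that would explain that output (a standalone sketch over a bool slice, not either crate's code; inclusive rank counts the bit at `pos` itself, exclusive rank does not):

```rust
/// Exclusive rank: count set bits strictly before `pos`.
fn rank1_exclusive(bits: &[bool], pos: usize) -> usize {
    bits[..pos].iter().filter(|&&b| b).count()
}

/// Inclusive rank: count set bits up to and including `pos`.
fn rank1_inclusive(bits: &[bool], pos: usize) -> usize {
    bits[..=pos].iter().filter(|&&b| b).count()
}

fn main() {
    let bits = [true, false, true]; // bits set at positions 0 and 2
    // At pos = 0 the two conventions disagree, just like Rank9 vs RsDict above.
    println!("{} {}", rank1_inclusive(&bits, 0), rank1_exclusive(&bits, 0)); // 1 0
}
```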
