Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented iterator type for BitVec #1

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

ISibboI
Copy link

@ISibboI ISibboI commented Jan 9, 2019

I noticed the BitVec in the bv crate is missing an option to iterate over the bits. The succinct crate has this option. I think the bv crate would benefit from this.

@coveralls
Copy link

coveralls commented Jan 9, 2019

Pull Request Test Coverage Report for Build 225

  • 38 of 56 (67.86%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 76.415%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/bit_vec/mod.rs 0 2 0.0%
src/bits_iter.rs 38 54 70.37%
Totals Coverage Status
Change from base Build 217: -0.3%
Covered Lines: 1377
Relevant Lines: 1802

💛 - Coveralls

@tov
Copy link
Owner

tov commented Jan 15, 2019

Thanks for the PR! I have two thoughts/questions:

  • Is an iterator over bits actually useful? I mean, I doubt it's needed often, and it's easy enough to do this:

     wants_an_iterator((.. bv.bit_len()).map(|i| bv.get_bit(i)))
  • If it is useful, I think a better design would be to have a generic BitsIter<T> type that works for any T: Bits.

What do you think?

@ISibboI
Copy link
Author

ISibboI commented Jan 22, 2019

Thanks for the reply!

  • For the usefulness: I think every collection-like data type should implement iterators over its elements. It seems more rusty to me to be able to do .iter() on a collection instead of mimicing the behavior manually.
    And well, I actually needed it. I zipped the iterator with another one to perform conditional operations on each element of the other one.
    One could also replace my struct implementation with your one-liner, and let the .iter() method return an impl Iterator<Item = bool>. The actual type is then inferred at compile time, I think.

  • For the BitsIter<T>: That probably makes sense. I did not think about this.

@tov
Copy link
Owner

tov commented Jan 23, 2019

Okay, if you needed it then it must be at least a bit useful, and I don't think we'd be increasing API complexity significantly anyway. Do you want to implement BitsIter<T>?

@ISibboI
Copy link
Author

ISibboI commented Jan 24, 2019

Yeah I can do that. Should have time for that latest next week.

@ISibboI
Copy link
Author

ISibboI commented Jan 26, 2019

I changed things around and now we have a BitsIter<T>
It's almost perfect, but rust doesn't allow to add the .iter() method to Bits. So it is now in BitVec. Which is fine I guess.
The reason for this is the auto trait implementation for Box<Bits> I think. I get the error that traits with methods that refer to Self can't be made into an object, but I need to refer to Self to be able to return a BitsIter<Self>. Using impl Iterator also doesn't work since it's in a trait.

Also, I can't check for Rust 1.20.0 compliance, as when compiling with 1.20.0 I get errors for lazy_static v1.2.0, which does not seem to be Rust 1.20.0 compatible.

But I feel like we have a good implementation for iterating bits now. What do you think?

Edit: Okay, I just realized that you require 1.24.0, not 1.20.0. Then the impl Trait in return is experimental. It would be fine to just use the actual type here I think.

 * Changed return impl Iterator to return BitsIter
Copy link
Owner

@tov tov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks! This looks very good to me, though there are a few things that I think aren’t quite there yet.

// Invariant: pos <= bits.block_len()

impl<T: Bits> BitsIter<T> {
/// Creates a new block iterator from a `Bits` instance.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems wrong.

}
}

fn cmp_bits_iter<T, U>(iter1: &BitsIter<T>, iter2: &BitsIter<U>) -> Ordering
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is the wrong approach, but I don’t think it’s your fault—I was wrong to write cmp_block_iter, which this appears to be derived from, in the first place.

I should have just called Iterator::cmp, and so (probably) should you. The standard library implements lexicographical order for iterators correctly, whereas my implementation of cmp_block_iter is inconsistent with that, because it treats length (and not even the iterator’s length—the bit length!) as sort of a 0th key before it starts comparing elements. I don’t know if I thought that was a feature or an optimization, or what, but now I think it’s a bug in my code.

That said, there is possibly an optimization opportunity here, which is that you could compare BitsIters block-wise. This assumes the bit order is consistent with comparing whole blocks—and I think that it is regardless of host byte order—but I’m not totally sure. And then it's a bit tricky, because:

  1. If you aren’t starting block-aligned then you have to either go bit-wise until you hit a block boundary, or (probably better) mask out the first block for only the remaining bits you care about.

  2. If the last block is partial then you need to compare in a way that’s consistent with bitwise iteration. For example, if both Bitses last blocks are all 0 bits, but one is longer than the other, then bits-wise you'll notice the difference, but just comparing blocks you won't.

I’m not saying you should do this, but if it looks like a fun project, then there you go. If you do want to do it, you should replace this code here with a call to Iterator::cmp for now, and then I’ll accept this PR. And then if you can replace the library call with an equivalent but faster version (benchmarks?), we’ll make that a separate PR.


impl<T, U> PartialEq<BitsIter<U>> for BitsIter<T>
where T: Bits,
U: Bits<Block = T::Block> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s worth noting that requiring T and U to have the same Block type is useful if we want to do a block-wise optimization (as I discussed above), but otherwise there’s no reason to do so, right? Because we can just iterate over the bits.

@@ -126,7 +126,8 @@ mod bit_vec;
pub use self::bit_vec::BitVec;

mod array_n_impls;
mod iter;
mod block_iter;
mod bits_iter;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this module exposed anywhere, or is the only way to access BitsIter via the BitVec::iter() method? Yeah, it looks like when I build the documentation, there is no link from the result type of that method to any docs for the BitsIter struct, since it isn’t public.

And this makes me note that neither is BlockIter public. The reason, I think, is that I found it useful internally (?) but didn’t want to commit to it as a public API yet.

Anyway, I think that for this to make sense, BitsIter needs to be public, à la BitSlice, BitSliceMut, BitVec, etc., above.

@tov
Copy link
Owner

tov commented Jan 26, 2019

I switched from 1.20.0 to 1.24.0 because it depended something that depended on lazy_static, and it didn't seem worth fighting that.

As for the issue of returning BitsIter<Self> are you saying that it conflicts with impls like this? That's an issue I don't understand very well, but I seem to recall that adding where Self: Sized on the method might help. Or does that just push the problem some place else?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants