Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl FromIterator for BytesMut exposes implementation details #723

Open
aatifsyed opened this issue Jul 16, 2024 · 7 comments
Open

impl FromIterator for BytesMut exposes implementation details #723

aatifsyed opened this issue Jul 16, 2024 · 7 comments

Comments

@aatifsyed
Copy link

aatifsyed commented Jul 16, 2024

BytesMut cares about keeping allocation implementation details private, which means no From<Vec<u8>> and no Into<Vec<u8>>:

bytes/src/bytes_mut.rs

Lines 825 to 830 in 4e2c9c0

// For now, use a `Vec` to manage the memory for us, but we may want to
// change that in the future to some alternate allocator strategy.
//
// Thus, we don't expose an easy way to construct from a `Vec` since an
// internal change could make a simple pattern (`BytesMut::from(vec)`)
// suddenly a lot more expensive.

However, this information is leaked in the FromIterator impl:

bytes/src/bytes_mut.rs

Lines 1409 to 1413 in f8c7b57

impl FromIterator<u8> for BytesMut {
fn from_iter<T: IntoIterator<Item = u8>>(into_iter: T) -> Self {
BytesMut::from_vec(Vec::from_iter(into_iter))
}
}

which go through through this specialization:
https://github.com/rust-lang/rust/blob/6c6b3027ef62e911142cfc55589baef4e9f38ec8/library/alloc/src/vec/spec_from_iter.rs#L37-L64

So collection is basically a From<Vec<u8>> with no overhead, which may not be intended by the authors.

use bytes::BytesMut; // 1.6.1

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc; // 0.3.3

fn main() {
    let _guard = dhat::Profiler::new_heap();

    let src = vec![1; 1024];
    let mut_from_vec = measure(|| BytesMut::from_iter(src));
    assert_eq!(mut_from_vec, 0); // doesn't allocate!

    let mut_from_array = measure(|| BytesMut::from_iter([1; 1024]));
    assert_eq!(mut_from_array, 1024);
}

fn measure<T>(f: impl FnOnce() -> T) -> u64 {
    let before = dhat::HeapStats::get();
    f();
    let after = dhat::HeapStats::get();
    after.total_bytes - before.total_bytes
}

OTOH, this probably doesn't count as "an easy way", so maybe it's fine to leave it exposed

@aatifsyed
Copy link
Author

aatifsyed commented Jul 16, 2024

The fix would be to turn off specialization like this:

let src = vec![1; 1024];
let from_vec = measure(|| BytesMut::from_iter(NoSpecialize(src)));
assert_eq!(from_vec, 1024); // this used to be 0

struct NoSpecialize<T>(T);

impl Iterator for NoSpecialize<vec::IntoIter<u8>> {
    type Item = u8;
    fn next(&mut self) -> Option<Self::Item> {
        self.0.next()
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        self.0.size_hint()
    }
}

impl IntoIterator for NoSpecialize<Vec<u8>> {
    type Item = u8;
    type IntoIter = NoSpecialize<vec::IntoIter<u8>>;
    fn into_iter(self) -> Self::IntoIter {
        NoSpecialize(self.0.into_iter())
    }
}

@Darksonn
Copy link
Contributor

Darksonn commented Jul 16, 2024

I believe the section you're quoting is only talking about BytesMut. We already have impl From<Vec<u8>> for Bytes.

@aatifsyed aatifsyed changed the title FromIterator impls expose implementation details impl FromIterator for BytesMut exposes implementation details Jul 16, 2024
@aatifsyed
Copy link
Author

I believe the section you're quoting is only talking about BytesMut. We already have impl From<Vec> for Bytes.

Ah, my mistake! I've updated the issue and comments as appropriate :)

@aatifsyed
Copy link
Author

If you agree that this is a bug, I'll happily open a PR

@seanmonstar
Copy link
Member

What's the bug? That the compiler is able to make the copying faster? That seems like a thing we want to keep.

It doesn't actually expose in the type system the internals.

@aatifsyed
Copy link
Author

Perhaps "bug" is a bit strong.
BytesMut goes to moderate lengths to not allow cheap conversion from a Vec<u8> so that users cannot rely on it.
But because of the interaction of the above traits, a wily user can get a "free" conversion to BytesMut.

If the allocation strategy changes in future (which the comment in from_vec seems to want to leave open), the "free" conversion becomes not free, which violates the comment.

@jorgehermo9
Copy link

jorgehermo9 commented Aug 1, 2024

I don't think people expect a from_iter to be free.

They could think it from something like a Vec<u8>, as it happens with Bytes::from(vec)

You see it like its free in the benchmark most likely because the compiler optimizes that a lot, since it is a hardcoded Vec.

Try with something like this (note the std::hint::black_box):

use bytes::BytesMut; // 1.6.1

#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc; // 0.3.3

fn main() {
    let _guard = dhat::Profiler::new_heap();

    let src = vec![1; 1024];
    let mut_from_vec = std::hint::black_box(measure(|| BytesMut::from_iter(src)));
    assert_eq!(mut_from_vec, 0); // doesn't allocate!

    let mut_from_array = std::hint::black_box(measure(|| BytesMut::from_iter([1; 1024])));
    assert_eq!(mut_from_array, 1024);
}

fn measure<T>(f: impl FnOnce() -> T) -> u64 {
    let before = dhat::HeapStats::get();
    f();
    let after = dhat::HeapStats::get();
    after.total_bytes - before.total_bytes
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants