Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BytesMut::freeze is not always zero-cost as documented #587

Closed
bk2204 opened this issue Jan 27, 2023 · 5 comments · Fixed by #592
Closed

BytesMut::freeze is not always zero-cost as documented #587

bk2204 opened this issue Jan 27, 2023 · 5 comments · Fixed by #592

Comments

@bk2204
Copy link
Contributor

bk2204 commented Jan 27, 2023

When converting a BytesMut into Bytes via freeze, the operation can cause a re-allocation to occur because we call into_boxed_slice. This is unacceptable in some hot codepaths.

The documentation says that “[t]he conversion is zero cost.” I don't believe that an allocation can be considered zero cost.

Example code:

use bytes::BytesMut;

fn main() {
    let mut b = BytesMut::with_capacity(10);
    b.extend(b"abcdefg");
    let b = b.freeze();
    println!("{:?}", b);
}

If you compile this program and set a breakpoint on realloc, you'll see this (after the startup code):

(gdb) bt
#0  __GI___libc_realloc (oldmem=0x5555555a5ba0, bytes=7) at ./malloc/malloc.c:3394
#1  0x000055555555da2b in alloc::vec::Vec<T,A>::into_boxed_slice ()
#2  0x000055555555cc00 in <bytes::bytes::Bytes as core::convert::From<alloc::vec::Vec<u8>>>::from ()
#3  0x000055555555c762 in bytes_test::main ()
#4  0x000055555555cb03 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#5  0x000055555555cad9 in _ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17hcadd4fe5b7b3f353E.llvm.6218250527406152195 ()
#6  0x00005555555703fb in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> ()
    at library/core/src/ops/function.rs:286
#7  std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:483
#8  std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:447
#9  std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:137
#10 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148
#11 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:483
#12 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:447
#13 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:137
#14 std::rt::lang_start_internal () at library/std/src/rt.rs:148
#15 0x000055555555c915 in main ()

I believe in this case that's because the variant of the BytesMut is KIND_VEC and we're then recreating the Vec and doing an into, which calls into_boxed_slice. This is also the case if one substitutes BytesMut in the code for Vec, but in that case, there's no guarantee of that being zero-cost.

I'm not sure what the best way forward here is, but it would be great if someone could propose an idea where this really is zero cost. I have a server where I need to read some data into a buffer and then transfer it using a Bytes via Hyper, and since this is a very hot codepath, I really want to avoid a reallocation, since the reallocation shows up in my performance graphs.

@Darksonn
Copy link
Contributor

I don't think the realloc is necessary in the Vec<u8> -> Bytes conversion, since a Bytes can be subsliced.

@seanmonstar
Copy link
Member

The details aren't fresh in my mind, but I think the reason it does into_boxed_slice is because there was no room to store the length and capacity.

@bk2204
Copy link
Contributor Author

bk2204 commented Jan 28, 2023

I think this can be pretty easily implemented by turning the Vec<u8> into a shared variant. If folks want a patch, I'm happy to send one, and I've even added a few tests.

@seanmonstar
Copy link
Member

Making it shared does require a new allocation too. Would it be worth checking if the length and capacity are the same, and so into_boxed_slice won't need to shrink?

@bk2204
Copy link
Contributor Author

bk2204 commented Jan 28, 2023

That's true, but it requires a much smaller allocation of a handful of bytes, which I think will be acceptable here. Reallocating my buffer of 64 KiB requires substantially more space and a much larger memcpy as well.

I can certainly see if the length and capacity are the same and use into_boxed_slice. However, in my case, it's definitely not.

bk2204 added a commit to bk2204/bytes that referenced this issue Jan 30, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
bk2204 added a commit to bk2204/bytes that referenced this issue Jan 30, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
bk2204 added a commit to bk2204/bytes that referenced this issue Jan 30, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
bk2204 added a commit to bk2204/bytes that referenced this issue Jan 30, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
bk2204 added a commit to bk2204/bytes that referenced this issue Jan 31, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
bk2204 added a commit to bk2204/bytes that referenced this issue Jan 31, 2023
When we freeze a BytesMut, we turn it into a Vec, and then convert that
to a Bytes.  Currently, this happen using Vec::into_boxed_slice, which
reallocates to a slice of the same length as the Vev if the length and
the capacity are not equal.  This can pose a performance problem if the
Vec is large or if this happens many times in a loop.

Instead, let's compare the length and capacity, and if they're the same,
continue to handle this using into_boxed_slice.  Otherwise, since we
have a type of vtable which can handle a separate capacity, the shared
vtable, let's turn our Vec into that kind of Bytes.  While this does not
avoid allocation altogether, it performs a fixed size allocation and
avoids any need to memcpy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants