
Optimize Bytes::copy_from_slice #365

Open
stepancheg wants to merge 2 commits into tokio-rs:master from stepancheg:copy-from-slice

@stepancheg
Contributor

Create a new SharedInline Bytes representation, which is:

use std::sync::atomic::AtomicUsize;

struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap], stored inline right after the header
}

The advantage of this representation is that cloning Bytes does not
need an extra allocation, which makes such cloning much faster.

The drawback is slightly slower destruction of such objects, due to
the atomic decrement.
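
To make the layout concrete, here is a minimal sketch of the idea on stable Rust: one allocation holding the header followed by the bytes, an atomic increment on clone, and an atomic decrement on drop. It is not the code from this PR. `InlineBuf`, `layout_for`, `as_slice` and `ref_count` are hypothetical names used only for illustration, and the handle is deliberately left `!Send`/`!Sync`.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;
use std::slice;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Header at the start of the single allocation; `cap` bytes follow it.
#[repr(C)]
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
}

/// Layout of the header plus `cap` trailing bytes, and the offset of the bytes.
fn layout_for(cap: usize) -> (Layout, usize) {
    let header = Layout::new::<SharedInline>();
    let data = Layout::array::<u8>(cap).expect("capacity overflow");
    let (layout, offset) = header.extend(data).expect("capacity overflow");
    (layout.pad_to_align(), offset)
}

/// Hypothetical handle (not the real `Bytes`): one thin pointer to the header.
struct InlineBuf {
    ptr: *mut SharedInline,
}

impl InlineBuf {
    /// Copies `src` into a fresh allocation: header and data in one block.
    fn copy_from_slice(src: &[u8]) -> InlineBuf {
        let (layout, offset) = layout_for(src.len());
        unsafe {
            let raw = alloc(layout);
            if raw.is_null() {
                handle_alloc_error(layout);
            }
            let header = raw as *mut SharedInline;
            ptr::write(
                header,
                SharedInline { ref_cnt: AtomicUsize::new(1), cap: src.len() },
            );
            ptr::copy_nonoverlapping(src.as_ptr(), raw.add(offset), src.len());
            InlineBuf { ptr: header }
        }
    }

    /// Views the bytes stored right after the header.
    fn as_slice(&self) -> &[u8] {
        unsafe {
            let cap = (*self.ptr).cap;
            let (_, offset) = layout_for(cap);
            slice::from_raw_parts((self.ptr as *const u8).add(offset), cap)
        }
    }

    /// Current reference count (for illustration only).
    fn ref_count(&self) -> usize {
        unsafe { (*self.ptr).ref_cnt.load(Ordering::Acquire) }
    }
}

impl Clone for InlineBuf {
    /// Cloning only bumps the counter; nothing is allocated.
    fn clone(&self) -> InlineBuf {
        unsafe { (*self.ptr).ref_cnt.fetch_add(1, Ordering::Relaxed) };
        InlineBuf { ptr: self.ptr }
    }
}

impl Drop for InlineBuf {
    /// Every drop pays for an atomic decrement; the last one frees the block.
    fn drop(&mut self) {
        unsafe {
            if (*self.ptr).ref_cnt.fetch_sub(1, Ordering::Release) != 1 {
                return;
            }
            fence(Ordering::Acquire);
            let (layout, _) = layout_for((*self.ptr).cap);
            dealloc(self.ptr as *mut u8, layout);
        }
    }
}
```

In this sketch, `InlineBuf::copy_from_slice(b"abcdef").clone()` performs a single allocation, and both handles point at the same bytes.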

The bench:

#[bench]
fn copy_from_slice_and_clone(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef").clone()
    });
}

becomes two times faster (210ns/iter vs 110ns/iter).

While non-shared bench:

#[bench]
fn copy_from_slice(b: &mut Bencher) {
    b.iter(|| {
        Bytes::copy_from_slice(b"abcdef")
    });
}

becomes a little slower (85ns/iter vs 90ns/iter).

Member

@seanmonstar left a comment


I didn't understand at first, but now I get it:

  • When built with an existing Vec, there are 2 allocations: the Shared, which is a skinny Arc, and then the Vec inside it.
  • When we'd need to allocate a Vec ourselves, we could just allocate once, and smash the Arc and Vec into 1.

struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
    // data: [u8; cap]
Member


Can this just be an actual DST? Admittedly I haven't tried to use that part of Rust much...

Contributor Author


@seanmonstar it can't, because *mut SharedInline must be one word in size (to fit into an AtomicPtr), and a pointer to a DST is two words.
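
The size difference can be checked with a small sketch. `SharedInlineDst` and `pointer_sizes` are hypothetical names introduced here only for comparison; they are not part of the PR.

```rust
use std::mem::size_of;
use std::sync::atomic::{AtomicPtr, AtomicUsize};

// Sized header, as in the PR description; the bytes would follow it in memory.
#[allow(dead_code)]
struct SharedInline {
    ref_cnt: AtomicUsize,
    cap: usize,
}

// Hypothetical DST variant: the bytes are a trailing unsized field, so the
// length has to travel with every pointer to this type.
#[allow(dead_code)]
struct SharedInlineDst {
    ref_cnt: AtomicUsize,
    data: [u8],
}

/// Returns the sizes of a thin pointer, a fat pointer, and an `AtomicPtr`.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*mut SharedInline>(),
        size_of::<*mut SharedInlineDst>(),
        // `AtomicPtr<T>` requires `T: Sized`, so `AtomicPtr<SharedInlineDst>`
        // would not compile.
        size_of::<AtomicPtr<SharedInline>>(),
    )
}
```

On current Rust targets the first and third values are one machine word and the second is two, which is why the capacity is kept in the header rather than in the pointer.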
