feat: add #[repr(u64)] to Repr to optimize clone cost #6

PureWhiteWu · 2023-12-23T10:54:14Z

Motivation

Previously, I noticed that cloning a FastStr is really expensive. For example, cloning an empty FastStr costs about 40ns on amd64, compared to about 4ns for a normal String.

Solution

After some investigation, I found that the Repr::Inline variant has a really large effect on performance. After adding #[repr(u64)] to Repr, clone performance improves by about 9x. The root cause is still not clear.
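To make the layout change concrete, here is a minimal sketch comparing a one-byte discriminant with the `#[repr(u64)]` one. The variants are a simplified, hypothetical stand-in for the crate's real `Repr` (which is not shown in this thread), `INLINE_CAP = 30` is taken from the review discussion, and the "before" enum is modelled with an explicit `#[repr(u8)]` as an assumption. It only illustrates size and alignment; it does not explain the timing difference.

```rust
use std::mem::{align_of, size_of};

// Assumed inline capacity, taken from the review discussion.
const INLINE_CAP: usize = 30;

// Hypothetical simplified model with a one-byte discriminant.
#[allow(dead_code)]
#[repr(u8)]
enum ReprU8 {
    Empty,
    Static(&'static str),
    Inline { len: u8, buf: [u8; INLINE_CAP] },
}

// Same variants, but the discriminant is widened to eight bytes.
#[allow(dead_code)]
#[repr(u64)]
enum ReprU64 {
    Empty,
    Static(&'static str),
    Inline { len: u8, buf: [u8; INLINE_CAP] },
}

fn main() {
    println!(
        "repr(u8):  size = {}, align = {}",
        size_of::<ReprU8>(),
        align_of::<ReprU8>()
    );
    println!(
        "repr(u64): size = {}, align = {}",
        size_of::<ReprU64>(),
        align_of::<ReprU64>()
    );
}
```

On a typical 64-bit target the wider discriminant should push the inline payload from offset 1 to offset 8, at the cost of a larger enum. Whether that offset is what drives the clone cost is exactly the open question here.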

0xd34d10cc · 2023-12-23T13:21:21Z

src/lib.rs

@@ -496,7 +491,7 @@ impl From<Cow<'static, str>> for FastStr {
    }
 }

-const INLINE_CAP: usize = 38;
+const INLINE_CAP: usize = 24;


If you make Inline a separate struct (keep len: u8) and just slap #[repr(align(8))] on it, you can have INLINE_CAP = 30 with the same performance.

Alternatively, try reverting all changes and just adding #[repr(u64)] to the Repr enum with INLINE_CAP = 30. This will probably have the same effect, as it changes the discriminant type from u8 to u64.
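The first suggestion (a separate `Inline` struct with `#[repr(align(8))]`) could look roughly like the sketch below. The field names follow the comment; the `new` and `as_str` helpers are hypothetical and exist only to make the example self-contained, so this is not the crate's actual code.

```rust
use std::mem::{align_of, size_of};

// Capacity proposed in the review comment.
const INLINE_CAP: usize = 30;

// Standalone inline buffer, forced to 8-byte alignment as suggested.
#[repr(align(8))]
#[derive(Clone, Copy)]
struct Inline {
    len: u8,
    buf: [u8; INLINE_CAP],
}

impl Inline {
    // Hypothetical helper: copy `s` inline, or return None if it does not fit.
    fn new(s: &str) -> Option<Self> {
        if s.len() > INLINE_CAP {
            return None;
        }
        let mut buf = [0u8; INLINE_CAP];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(Inline { len: s.len() as u8, buf })
    }

    // Hypothetical helper: view the stored bytes as a string slice.
    fn as_str(&self) -> &str {
        // The bytes were copied from a complete &str in `new`, so they are valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len as usize]).expect("inline bytes are valid UTF-8")
    }
}

fn main() {
    let s = Inline::new("hello").expect("fits inline");
    println!("{}", s.as_str());
    println!(
        "Inline: size = {}, align = {}",
        size_of::<Inline>(),
        align_of::<Inline>()
    );
}
```

With one length byte and 30 payload bytes, the struct needs 31 bytes, which the 8-byte alignment rounds up to 32.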

PureWhiteWu · Dec 23, 2023

Hi, thank you very much!
This works! I will change this PR to use this method!

PureWhiteWu · Dec 24, 2023

Hi, I've tested again on amd64 and found that with this method, cloning Empty costs 8ns instead of 4ns. This method works fine on aarch64, but it seems that it doesn't on amd64.
I've fixed this in 342bdc9.
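The benchmark harness behind these numbers is not shown in this thread. As a rough way to reproduce per-clone figures on a given architecture, a standard-library-only sketch might look like this. `avg_clone_cost` is a hypothetical helper, and a dedicated benchmarking crate would give more reliable results.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical helper: average wall-clock cost of one `clone` over `iters` runs.
fn avg_clone_cost<T: Clone>(value: &T, iters: u32) -> Duration {
    assert!(iters > 0, "iters must be non-zero");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from removing the clone entirely.
        black_box(T::clone(black_box(value)));
    }
    start.elapsed() / iters
}

fn main() {
    let iters = 1_000_000;
    let empty = String::new();
    // Build with --release; debug builds will not give representative numbers.
    println!("String clone: {:?} per clone", avg_clone_cost(&empty, iters));
}
```

Swapping the `String` for the type under test and running the same binary on amd64 and aarch64 gives a first-order comparison of the kind reported above.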

feat: manually add padding to optimize clone cost

eefc461

PureWhiteWu self-assigned this Dec 23, 2023

PureWhiteWu mentioned this pull request Dec 23, 2023

Enum field align cause performance degradation about 10x rust-lang/rust#119247

Open

yukiiiteru reviewed Dec 23, 2023

PureWhiteWu added 2 commits December 23, 2023 20:20

use usize for len instead of add padding

2a9a245

optimize code

4b66108

PureWhiteWu changed the title ~~feat: manually add padding to optimize clone cost~~ feat: change type of len to usize to optimize clone cost Dec 23, 2023

PureWhiteWu added 2 commits December 23, 2023 20:31

fix clippy

de5d4a9

fix lint

9b5fad5

0xd34d10cc reviewed Dec 23, 2023

use repr(u64)

6f76879

PureWhiteWu changed the title ~~feat: change type of len to usize to optimize clone cost~~ feat: add #[repr(u64)] to Repr to optimize clone cost Dec 23, 2023

PureWhiteWu merged commit 1e0a9b2 into main Dec 23, 2023
10 checks passed

PureWhiteWu deleted the feat/optimize_clone branch December 23, 2023 16:31
