Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add #[repr(u64)] to Repr to optimize clone cost #6

Merged
merged 6 commits into from
Dec 23, 2023

Conversation

PureWhiteWu
Copy link
Contributor

@PureWhiteWu PureWhiteWu commented Dec 23, 2023

Motivation

Previously, I noticed the clone cost of FastStr is really high, for example, an empty FastStr clone costs about 40ns on amd64 compared to about 4ns of a normal String.

Solution

After some time of investigation, I found that this is because the Repr::Inline part has really great affect on the performance. And after adding #[repr(u64)] to Repr, the performance boosts about 9x. But the root cause is still not clear.

src/lib.rs Outdated Show resolved Hide resolved
@PureWhiteWu PureWhiteWu changed the title feat: manually add padding to optimize clone cost feat: change type of len to usize to optimize clone cost Dec 23, 2023
src/lib.rs Outdated
@@ -496,7 +491,7 @@ impl From<Cow<'static, str>> for FastStr {
}
}

const INLINE_CAP: usize = 38;
const INLINE_CAP: usize = 24;
Copy link

@0xd34d10cc 0xd34d10cc Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make Inline a separate struct (keep len: u8) and just slap #[repr(align(8))] on it you can have INLINE_CAP = 30 with same performance.

Copy link

@0xd34d10cc 0xd34d10cc Dec 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively try reverting all changes and just adding #[repr(u64)] to Repr enum with INLINE_CAP = 30. This probably will have same effect, as it will change discriminant type from u8 to u64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you very much!
This works! I will change this pr to use this method!

Copy link
Contributor Author

@PureWhiteWu PureWhiteWu Dec 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've tested again on amd64 and found that this method leads to the clone of Empty costs 8ns instead of 4ns. This method works fine on aarch64, but seems that it doesn't on amd64.
I've fixed this in 342bdc9

@PureWhiteWu PureWhiteWu changed the title feat: change type of len to usize to optimize clone cost feat: add #[repr(u64)] to Repr to optimize clone cost Dec 23, 2023
@PureWhiteWu PureWhiteWu merged commit 1e0a9b2 into main Dec 23, 2023
10 checks passed
@PureWhiteWu PureWhiteWu deleted the feat/optimize_clone branch December 23, 2023 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants