-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add #[repr(u64)] to Repr to optimize clone cost #6
Conversation
src/lib.rs
Outdated
@@ -496,7 +491,7 @@ impl From<Cow<'static, str>> for FastStr { | |||
} | |||
} | |||
|
|||
const INLINE_CAP: usize = 38; | |||
const INLINE_CAP: usize = 24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make Inline
a separate struct (keep len: u8) and just slap #[repr(align(8))]
on it you can have INLINE_CAP = 30
with same performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively try reverting all changes and just adding #[repr(u64)]
to Repr
enum with INLINE_CAP = 30
. This probably will have same effect, as it will change discriminant type from u8
to u64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thank you very much!
This works! I will change this pr to use this method!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I've tested again on amd64 and found that this method leads to the clone of Empty
costs 8ns instead of 4ns. This method works fine on aarch64, but seems that it doesn't on amd64.
I've fixed this in 342bdc9
Motivation
Previously, I noticed the clone cost of
FastStr
is really high, for example, an emptyFastStr
clone costs about 40ns on amd64 compared to about 4ns of a normalString
.Solution
After some time of investigation, I found that this is because the
Repr::Inline
part has really great affect on the performance. And after adding #[repr(u64)] toRepr
, the performance boosts about 9x. But the root cause is still not clear.