Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Writeable::write_cmp_bytes and use it in DataLocale and Locale #4402

Merged
merged 5 commits into from
Dec 12, 2023

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 2, 2023

I realized today that we can do the allocation-free comparisons much more easily than we had been. This new comparison impl is easier to reason about, less code, and faster.

data_locale/strict_cmp  time:   [2.4656 µs 2.4698 µs 2.4742 µs]
                        change: [-14.891% -14.703% -14.533%] (p = 0.00 < 0.05)
                        Performance has improved.

langid/compare/strict_cmp/langid
                        time:   [220.81 ns 221.10 ns 221.42 ns]
                        change: [-5.5894% -5.2507% -4.7995%] (p = 0.00 < 0.05)
                        Performance has improved.

locale/compare/strict_cmp/locale
                        time:   [331.43 ns 331.65 ns 331.86 ns]
                        change: [+4.5129% +4.7702% +5.0030%] (p = 0.00 < 0.05)
                        Performance has regressed.

(not sure about the Locale comparison bench; since it makes the others faster, I'm inclined to ignore that or eat the 5%)

Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever

if self.result != Ordering::Equal {
return Ok(());
}
let cmp_len = core::cmp::min(other.len(), self.string.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if other is longer than self we should be capping out, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If other is longer, then the only effect is that on the line below the remainder becomes empty and we compare the whole of other to the whole of self.string.

@zbraniecki
Copy link
Member

The slowdown on locale strict_cmp is reproducable and seems real to me.

I traced it down to

if other.get_ext() > 't' && !wrote_tu {
// Since 't' and 'u' are next to each other in alphabetical
// order, write both now.
self.transform.for_each_subtag_str(f)?;
self.unicode.for_each_subtag_str(f)?;
wrote_tu = true;
- it seems that in our benchmark the other is empty, but this closure affects perf of that test by 10%.

I was able to inject panic!() and have the benchmark complete, and I was able to comment out the closure and get 10% perf win on this branch.

I haven't looked into asm, so not sure what causes that.

@sffc sffc requested a review from zbraniecki December 4, 2023 22:06
@sffc
Copy link
Member Author

sffc commented Dec 4, 2023

Thanks @zbraniecki for the investigation.

Are you okay merging this with the known 5% impact on Locale::strict_cmp given that this PR shows a 15% win on DataLocale::strict_cmp and 5% on LanguageIdentifier::strict_cmp, besides the benefits to code maintainability? (Note that there are another couple hundred lines of code I can delete in 2.0 after we delete the deprecated API that should have never been made public.)

@zbraniecki
Copy link
Member

Are you okay merging this with the known 5% impact on Locale::strict_cmp

Yes. I'm mildly curious as to why is it regressing, but I'm comfortable with this tradeoff.

@sffc sffc merged commit c33f44c into unicode-org:main Dec 12, 2023
29 checks passed
@sffc sffc deleted the datalocale-bench branch December 12, 2023 02:45
sffc added a commit that referenced this pull request Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants