Add Writeable::write_cmp_bytes and use it in DataLocale and Locale #4402

sffc · 2023-12-02T05:08:55Z

I realized today that we can do the allocation-free comparisons much more easily than we had been. This new comparison impl is easier to reason about, less code, and faster.

data_locale/strict_cmp  time:   [2.4656 µs 2.4698 µs 2.4742 µs]
                        change: [-14.891% -14.703% -14.533%] (p = 0.00 < 0.05)
                        Performance has improved.

langid/compare/strict_cmp/langid
                        time:   [220.81 ns 221.10 ns 221.42 ns]
                        change: [-5.5894% -5.2507% -4.7995%] (p = 0.00 < 0.05)
                        Performance has improved.

locale/compare/strict_cmp/locale
                        time:   [331.43 ns 331.65 ns 331.86 ns]
                        change: [+4.5129% +4.7702% +5.0030%] (p = 0.00 < 0.05)
                        Performance has regressed.

(not sure about the Locale comparison bench; since it makes the others faster, I'm inclined to ignore that or eat the 5%)

Manishearth

clever

Manishearth · 2023-12-02T17:47:23Z

utils/writeable/src/cmp.rs

+        if self.result != Ordering::Equal {
+            return Ok(());
+        }
+        let cmp_len = core::cmp::min(other.len(), self.string.len());


if other is longer than self we should be capping out, right?

If other is longer, then the only effect is that on the line below the remainder becomes empty and we compare the whole of other to the whole of self.string.
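To make the exchange above concrete, here is a minimal sketch of the allocation-free comparison, built around the four lines quoted from the diff. The names `ByteComparator`, `finish`, and `display_cmp_bytes` are hypothetical, and the sketch uses `core::fmt::Display` in place of `Writeable`; it is an illustration under those assumptions, not the code in `utils/writeable/src/cmp.rs`.

```rust
use core::cmp::Ordering;
use core::fmt::{self, Write};

/// Hypothetical sink (name assumed): compares everything written to it
/// against `string`, consuming `string` chunk by chunk without allocating.
struct ByteComparator<'a> {
    /// The not-yet-compared tail of the reference bytes.
    string: &'a [u8],
    /// Ordering of the reference bytes relative to what has been written.
    result: Ordering,
}

impl<'a> ByteComparator<'a> {
    fn new(string: &'a [u8]) -> Self {
        Self { string, result: Ordering::Equal }
    }

    /// Final ordering of the reference bytes relative to the written output.
    fn finish(self) -> Ordering {
        if self.result == Ordering::Equal && !self.string.is_empty() {
            // The written output was a strict prefix of the reference bytes.
            Ordering::Greater
        } else {
            self.result
        }
    }
}

impl fmt::Write for ByteComparator<'_> {
    fn write_str(&mut self, other: &str) -> fmt::Result {
        // Once a difference has been found, later chunks cannot change it.
        if self.result != Ordering::Equal {
            return Ok(());
        }
        // If `other` is longer than what is left, `head` is the whole
        // remaining slice and `remainder` is empty; slice comparison then
        // orders the shorter `head` before the longer `other`.
        let cmp_len = core::cmp::min(other.len(), self.string.len());
        let (head, remainder) = self.string.split_at(cmp_len);
        self.string = remainder;
        self.result = head.cmp(other.as_bytes());
        Ok(())
    }
}

/// Hypothetical helper: orders the `Display` output of `value` relative to
/// `bytes`, without building an intermediate `String`.
fn display_cmp_bytes(value: &impl fmt::Display, bytes: &[u8]) -> Ordering {
    let mut sink = ByteComparator::new(bytes);
    // `write_str` above never fails, so the result can be ignored.
    let _ = write!(sink, "{}", value);
    // The sink tracks bytes-vs-written; flip it to get written-vs-bytes.
    sink.finish().reverse()
}
```

With this sketch, `display_cmp_bytes(&"en", b"en-US")` yields `Ordering::Less` and `display_cmp_bytes(&"en-US", b"en")` yields `Ordering::Greater`, which is the "other is longer" case asked about in the review.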

zbraniecki · 2023-12-04T21:59:42Z

The slowdown on locale strict_cmp is reproducible and seems real to me.

I traced it down to

icu4x/components/locid/src/extensions/mod.rs

Lines 266 to 271 in aadc9be

    
           if other.get_ext() > 't' && !wrote_tu {
               // Since 't' and 'u' are next to each other in alphabetical
               // order, write both now.
               self.transform.for_each_subtag_str(f)?;
               self.unicode.for_each_subtag_str(f)?;
               wrote_tu = true;

It seems that in our benchmark the other is empty, yet this closure still affects the perf of that test by 10%.

I was able to inject panic!() there and still have the benchmark complete, which suggests the code is never reached in the benchmark, and commenting out the closure gave a 10% perf win on this branch.

I haven't looked into asm, so not sure what causes that.

sffc · 2023-12-04T22:11:13Z

Thanks @zbraniecki for the investigation.

Are you okay merging this with the known 5% impact on Locale::strict_cmp given that this PR shows a 15% win on DataLocale::strict_cmp and 5% on LanguageIdentifier::strict_cmp, besides the benefits to code maintainability? (Note that there are another couple hundred lines of code I can delete in 2.0 after we delete the deprecated API that should have never been made public.)

zbraniecki · 2023-12-05T03:15:59Z

Are you okay merging this with the known 5% impact on Locale::strict_cmp

Yes. I'm mildly curious as to why it is regressing, but I'm comfortable with this tradeoff.


sffc added 5 commits December 1, 2023 20:54

Add benches for DataLocale

48aafab

Add write_cmp function

b5033d5

Implement the comparison with bytes instead of strings

9822dbe

Use the new comparison code in DataLocale::strict_cmp

8f1571e

Use the new comparison code in icu_locid and deprecate the old stuff

c67e585

sffc requested review from robertbastian, Manishearth, zbraniecki, nciric and a team as code owners December 2, 2023 05:08

sffc removed request for a team, zbraniecki and nciric December 2, 2023 05:09

Manishearth reviewed Dec 2, 2023


sffc requested a review from zbraniecki December 4, 2023 22:06

Manishearth approved these changes Dec 8, 2023


sffc merged commit c33f44c into unicode-org:main Dec 12, 2023
29 checks passed

sffc deleted the datalocale-bench branch December 12, 2023 02:45

This was referenced Apr 10, 2024

Should we have a fallible Writeable? #4741

Closed

Rename write_cmp_bytes to writeable_cmp_bytes #4795

Merged

sffc added a commit that referenced this pull request Apr 18, 2024

Rename write_cmp_bytes to writeable_cmp_bytes (#4795)

c7f2424

See #4402, #4741, #4787
