Improve performance of VarZeroVec::deserialize and add provider benches #2603

sffc · 2022-09-21T01:06:18Z

After @zbraniecki noticed a performance issue with StaticDataProvider's constructor, I looked into the code with @Manishearth and found some low-hanging fruit to optimize. I managed to get a 63% improvement, as shown below with the new benches run across the second commit in this PR:

Benchmarking provider/construct/static: Collecting 100 samples in estimated 5.0030 s (5.2M iterati                                                                                                  provider/construct/static                        
                        time:   [951.26 ns 952.42 ns 953.85 ns]
                        change: [-63.724% -63.601% -63.474%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking provider/construct/blob: Collecting 100 samples in estimated 5.4580 s (25k iterations                                                                                                  provider/construct/blob time:   [214.57 us 214.85 us 215.23 us]
                        change: [-0.7276% -0.5377% -0.3567%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking provider/construct/baked: Collecting 100 samples in estimated 5.0000 s (18446744074B                                                                                                   provider/construct/baked                        
                        time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-45.396% -1.3066% +75.734%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

provider/load/static    time:   [117.46 ns 117.92 ns 118.47 ns]                                 
                        change: [-0.8186% +1.4342% +4.4178%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

provider/load/blob      time:   [114.53 ns 114.67 ns 114.83 ns]                               
                        change: [-3.5102% -3.2867% -3.0490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

provider/load/baked     time:   [39.669 ns 39.951 ns 40.354 ns]                                 
                        change: [+7.0914% +7.7782% +8.7248%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

sffc · 2022-09-21T02:05:06Z

I included an extra commit, 508372c, with the following characteristics:

Benchmarking provider/construct/static: Collecting 100 samples in estimated 5.0000 s (7.0M iterati                                                                                                  provider/construct/static                        
                        time:   [707.11 ns 708.16 ns 709.41 ns]
                        change: [-25.849% -25.592% -25.343%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking provider/construct/blob: Collecting 100 samples in estimated 5.4600 s (25k iterations                                                                                                  provider/construct/blob time:   [213.82 us 214.17 us 214.68 us]
                        change: [-0.5128% -0.3020% -0.0509%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

Benchmarking provider/construct/baked: Collecting 100 samples in estimated 5.0000 s (18446744074B                                                                                                   provider/construct/baked                        
                        time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-45.589% +0.0750% +89.388%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

provider/load/static    time:   [133.00 ns 133.25 ns 133.56 ns]                                 
                        change: [+6.5930% +9.8142% +12.347%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

provider/load/blob      time:   [132.53 ns 132.71 ns 132.92 ns]                               
                        change: [+15.493% +15.828% +16.180%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

provider/load/baked     time:   [36.813 ns 37.003 ns 37.284 ns]                                 
                        change: [-8.6583% -7.8680% -7.1866%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

An extra 25% off the constructor at the cost of around 10% on the load function. It also saves a small amount of space in the postcard file. I am neutral on whether to include this additional change.

sffc · 2022-09-21T06:29:48Z

provider/blob/src/blob_schema.rs

+    pub indices: &'data FlexZeroSlice,
+    /// Buffer slice
+    #[serde(borrow)]
+    pub buffer: &'data [u8],


Note: I can probably get a further boost by using serde_bytes::Bytes or a similar type (likely an internal one) at this spot, so that we use the deserialize_bytes code path instead of deserialize_seq.

sffc · 2022-09-21T06:30:19Z

provider/blob/src/blob_schema.rs

    #[serde(borrow)]
-    pub buffers: &'data VarZeroSlice<[u8], Index32>,
+    pub indices: &'data FlexZeroSlice,


Note: I could instead write this to have the exact same binary representation as the VarZeroSlice, which would allow us to make the change later and go back and forth.

If there is interest, this could be UnvalidatedVarZeroVec. It would support only one type, byte slices, and the indices would be lazily validated.

I like the current model in this PR just fine

robertbastian · 2022-09-21T13:44:02Z

The additional change increases maintenance complexity for the blob schema. I'd rather keep complicated ULE stuff contained in the zerovec crate.

What's the core of that change, using a FlexZeroVec for the indices instead of a &[u8]? Why can this not be done in VZV?

Manishearth · 2022-09-21T14:52:40Z

provider/blob/src/blob_schema.rs

    #[serde(borrow)]
-    pub buffers: &'data VarZeroSlice<[u8], Index32>,
+    pub indices: &'data FlexZeroSlice,


I like the current model in this PR just fine

sffc · 2022-09-21T15:32:56Z

The additional change increases maintenance complexity for the blob schema. I'd rather keep complicated ULE stuff contained in the zerovec crate.

What's the core of that change, using a FlexZeroVec for the indices instead of a &[u8]? Why can this not be done in VZV?

See my comment threads in the code. The core of 508372c is delaying the validation of the indices array; VZV requires that it be sorted, but it doesn't really need to be since we're not validating the byte slices; we just hand them off to Postcard to deal with.

sffc · 2022-09-21T15:35:05Z

I dropped 508372c to get the uncontroversial change checked in, and I'll open another PR with the schema change to discuss further.

sffc added 2 commits September 20, 2022 17:59

Add benches for provider constructors and load methods

03a1942

Improve performance of VarZeroVec deserialize

d35d97f

sffc requested review from zbraniecki and Manishearth September 21, 2022 01:06

sffc requested a review from a team as a code owner September 21, 2022 01:06

sffc added 2 commits September 20, 2022 18:53

clippy

b656805

Fix

2e76382

sffc commented Sep 21, 2022

View reviewed changes

Manishearth previously approved these changes Sep 21, 2022

View reviewed changes

sffc dismissed Manishearth’s stale review via 2e76382 September 21, 2022 15:34

sffc force-pushed the provider-perf branch from 508372c to 2e76382 Compare September 21, 2022 15:34

robertbastian approved these changes Sep 21, 2022

View reviewed changes

zbraniecki approved these changes Sep 21, 2022

View reviewed changes

sffc merged commit 3d80f0d into unicode-org:main Sep 21, 2022

sffc deleted the provider-perf branch September 21, 2022 21:22

sffc mentioned this pull request Sep 24, 2022

Further optimize BlobDataProvider constructor #2625

Open

sffc mentioned this pull request Dec 22, 2023

Add explicit ZeroVec and VarZeroVec iterator types #4487

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of VarZeroVec::deserialize and add provider benches #2603

Improve performance of VarZeroVec::deserialize and add provider benches #2603

sffc commented Sep 21, 2022 •

edited

sffc commented Sep 21, 2022

sffc Sep 21, 2022

sffc Sep 21, 2022

sffc Sep 21, 2022

Manishearth Sep 21, 2022

robertbastian commented Sep 21, 2022

Manishearth Sep 21, 2022

sffc commented Sep 21, 2022

sffc commented Sep 21, 2022

Improve performance of VarZeroVec::deserialize and add provider benches #2603

Improve performance of VarZeroVec::deserialize and add provider benches #2603

Conversation

sffc commented Sep 21, 2022 • edited

sffc commented Sep 21, 2022

sffc Sep 21, 2022

Choose a reason for hiding this comment

sffc Sep 21, 2022

Choose a reason for hiding this comment

sffc Sep 21, 2022

Choose a reason for hiding this comment

Manishearth Sep 21, 2022

Choose a reason for hiding this comment

robertbastian commented Sep 21, 2022

Manishearth Sep 21, 2022

Choose a reason for hiding this comment

sffc commented Sep 21, 2022

sffc commented Sep 21, 2022

sffc commented Sep 21, 2022 •

edited