Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of VarZeroVec::deserialize and add provider benches #2603

Merged
merged 4 commits into from Sep 21, 2022

Conversation

sffc
Copy link
Member

@sffc sffc commented Sep 21, 2022

Fixes #2600

After @zbraniecki noticed a performance issue with StaticDataProvider's constructor, I looked into the code with @Manishearth and found some low-hanging fruit to optimize. I managed to get a 63% improvement, as shown below with the new benches run across the second commit in this PR:

Benchmarking provider/construct/static: Collecting 100 samples in estimated 5.0030 s (5.2M iterati                                                                                                  provider/construct/static                        
                        time:   [951.26 ns 952.42 ns 953.85 ns]
                        change: [-63.724% -63.601% -63.474%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking provider/construct/blob: Collecting 100 samples in estimated 5.4580 s (25k iterations                                                                                                  provider/construct/blob time:   [214.57 us 214.85 us 215.23 us]
                        change: [-0.7276% -0.5377% -0.3567%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking provider/construct/baked: Collecting 100 samples in estimated 5.0000 s (18446744074B                                                                                                   provider/construct/baked                        
                        time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-45.396% -1.3066% +75.734%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

provider/load/static    time:   [117.46 ns 117.92 ns 118.47 ns]                                 
                        change: [-0.8186% +1.4342% +4.4178%] (p = 0.35 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

provider/load/blob      time:   [114.53 ns 114.67 ns 114.83 ns]                               
                        change: [-3.5102% -3.2867% -3.0490%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

provider/load/baked     time:   [39.669 ns 39.951 ns 40.354 ns]                                 
                        change: [+7.0914% +7.7782% +8.7248%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

@sffc sffc requested a review from a team as a code owner September 21, 2022 01:06
@sffc
Copy link
Member Author

sffc commented Sep 21, 2022

I included an extra commit, 508372c, with the following characteristics:

Benchmarking provider/construct/static: Collecting 100 samples in estimated 5.0000 s (7.0M iterati                                                                                                  provider/construct/static                        
                        time:   [707.11 ns 708.16 ns 709.41 ns]
                        change: [-25.849% -25.592% -25.343%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking provider/construct/blob: Collecting 100 samples in estimated 5.4600 s (25k iterations                                                                                                  provider/construct/blob time:   [213.82 us 214.17 us 214.68 us]
                        change: [-0.5128% -0.3020% -0.0509%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  1 (1.00%) high severe

Benchmarking provider/construct/baked: Collecting 100 samples in estimated 5.0000 s (18446744074B                                                                                                   provider/construct/baked                        
                        time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                        change: [-45.589% +0.0750% +89.388%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

provider/load/static    time:   [133.00 ns 133.25 ns 133.56 ns]                                 
                        change: [+6.5930% +9.8142% +12.347%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

provider/load/blob      time:   [132.53 ns 132.71 ns 132.92 ns]                               
                        change: [+15.493% +15.828% +16.180%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

provider/load/baked     time:   [36.813 ns 37.003 ns 37.284 ns]                                 
                        change: [-8.6583% -7.8680% -7.1866%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

An extra 25% off the constructor at the cost of around 10% on the load function. It also saves a small amount of space in the postcard file. I am neutral on whether to include this additional change.

pub indices: &'data FlexZeroSlice,
/// Buffer slice
#[serde(borrow)]
pub buffer: &'data [u8],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I can probably get a further boost by using serde_bytes::Bytes or a similar type (likely an internal one) at this spot, so that we use the deserialize_bytes code path instead of deserialize_seq.

#[serde(borrow)]
pub buffers: &'data VarZeroSlice<[u8], Index32>,
pub indices: &'data FlexZeroSlice,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I could instead write this to have the exact same binary representation as the VarZeroSlice, which would allow us to make the change later and go back and forth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is interest, this could be UnvalidatedVarZeroVec. It would support only one type, byte slices, and the indices would be lazily validated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the current model in this PR just fine

@robertbastian
Copy link
Member

The additional change increases maintenance complexity for the blob schema. I'd rather keep complicated ULE stuff contained in the zerovec crate.

What's the core of that change, using a FlexZeroVec for the indices instead of a &[u8]? Why can this not be done in VZV?

Manishearth
Manishearth previously approved these changes Sep 21, 2022
#[serde(borrow)]
pub buffers: &'data VarZeroSlice<[u8], Index32>,
pub indices: &'data FlexZeroSlice,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the current model in this PR just fine

@sffc
Copy link
Member Author

sffc commented Sep 21, 2022

The additional change increases maintenance complexity for the blob schema. I'd rather keep complicated ULE stuff contained in the zerovec crate.

What's the core of that change, using a FlexZeroVec for the indices instead of a &[u8]? Why can this not be done in VZV?

See my comment threads in the code. The core of 508372c is delaying the validation of the indices array; VZV requires that it be sorted, but it doesn't really need to be since we're not validating the byte slices; we just hand them off to Postcard to deal with.

@sffc
Copy link
Member Author

sffc commented Sep 21, 2022

I dropped 508372c to get the uncontroversial change checked in, and I'll open another PR with the schema change to discuss further.

@sffc sffc merged commit 3d80f0d into unicode-org:main Sep 21, 2022
@sffc sffc deleted the provider-perf branch September 21, 2022 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add benches for data provider constructors
4 participants