
Speed up resource lookup and fallback at runtime #2699

Open
sffc opened this issue Oct 1, 2022 · 10 comments
Assignees
Labels
A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters S-large Size: A few weeks (larger feature, major refactoring) T-enhancement Type: Nice-to-have but not required

Comments

@sffc
Member

sffc commented Oct 1, 2022

#2683, #834, and #2686 are largely centered on optimizations performed at datagen time.

We may want to consider opportunities for runtime improvements as well. Ideas include:

Use hash maps: The lowest-hanging fruit here would be to leverage ZeroHashMap to make resource lookups more constant-time, especially if you're in a situation where you ship hundreds of locales in a single data file. We should at least do this in BakedDataProvider since we can more easily change the data model post 1.0 and if people are building hundreds of locales, they probably should be using BakedDataProvider anyway.
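To illustrate the trade-off, here is a minimal sketch using std's `HashMap` as a stand-in for `ZeroHashMap` (which is zero-copy and serializable, unlike this toy). `LocaleTable` and its layout are hypothetical, not the ICU4X data model:

```rust
use std::collections::HashMap;

/// Hypothetical lookup table mapping locale strings to payload indices.
struct LocaleTable {
    sorted: Vec<(&'static str, usize)>,   // current binary-search layout
    hashed: HashMap<&'static str, usize>, // proposed hash layout
}

impl LocaleTable {
    fn new(locales: &[&'static str]) -> Self {
        let mut sorted: Vec<_> =
            locales.iter().enumerate().map(|(i, &l)| (l, i)).collect();
        sorted.sort_by_key(|&(l, _)| l);
        let hashed = locales.iter().enumerate().map(|(i, &l)| (l, i)).collect();
        Self { sorted, hashed }
    }

    /// O(log n) comparisons, comparable to a sorted-map layout.
    fn get_binary(&self, locale: &str) -> Option<usize> {
        self.sorted
            .binary_search_by_key(&locale, |&(l, _)| l)
            .ok()
            .map(|pos| self.sorted[pos].1)
    }

    /// Amortized O(1), the behavior a ZeroHashMap-backed lookup would give.
    fn get_hashed(&self, locale: &str) -> Option<usize> {
        self.hashed.get(locale).copied()
    }
}
```

With hundreds of locales, the hash lookup does constant work per request while the binary search pays roughly 8–10 string comparisons.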

Micro-optimizations in the fallback algorithm: For example, we can probably try the input DataLocale directly in the resource file and return fast if there is an exact match before performing the normalization and fallback steps.
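A minimal sketch of that fast path. The lowercasing "normalization" and the truncate-at-`'-'` fallback chain are toy stand-ins for the real ICU4X algorithms, and `lookup` plus the `HashMap` layout are hypothetical:

```rust
use std::collections::HashMap;

/// Try the requested locale verbatim and return early on an exact match,
/// before paying for normalization and fallback.
fn lookup(data: &HashMap<String, u32>, requested: &str) -> Option<u32> {
    // Fast path: exact match on the unnormalized input.
    if let Some(&v) = data.get(requested) {
        return Some(v);
    }
    // Slow path: normalize, then walk the fallback chain.
    let mut locale = requested.to_ascii_lowercase(); // toy "normalization"
    loop {
        if let Some(&v) = data.get(&locale) {
            return Some(v);
        }
        match locale.rfind('-') {
            Some(i) => locale.truncate(i), // drop the last subtag
            None => return None,
        }
    }
}
```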

Micro-optimizations in cmp_bytes (aka strict_cmp): Another bottleneck in resource lookup is the cmp_bytes function in DataLocale. There may be opportunities to improve the performance of this operation.

I'll also point out that one might naively assume that hitting the data provider multiple times per constructor (e.g., loading date symbols, time symbols, and number symbols independently) hurts performance. However, previous evidence shows that smaller keys save on deserialization/validation cost, and that smaller keys typically produce smaller data files due to more deduplication opportunities (though whenever a key split is proposed, its size impact should be measured).

@sffc sffc added T-enhancement Type: Nice-to-have but not required A-performance Area: Performance (CPU, Memory) C-data-infra Component: provider, datagen, fallback, adapters S-large Size: A few weeks (larger feature, major refactoring) labels Oct 1, 2022
@sffc sffc added the discuss Discuss at a future ICU4X-SC meeting label Oct 17, 2022
@sffc sffc added needs-approval One or more stakeholders need to approve proposal and removed discuss Discuss at a future ICU4X-SC meeting labels Dec 16, 2022
@sffc
Member Author

sffc commented Dec 16, 2022

@Manishearth @robertbastian Can you weigh in and suggest alternatives?

@robertbastian
Member

> Use hash maps: The lowest-hanging fruit here would be to leverage ZeroHashMap to make resource lookups more constant-time, especially if you're in a situation where you ship hundreds of locales in a single data file. We should at least do this in BakedDataProvider since we can more easily change the data model post 1.0 and if people are building hundreds of locales, they probably should be using BakedDataProvider anyway.

Let's do it.

> Micro-optimizations in the fallback algorithm: For example, we can probably try the input DataLocale directly in the resource file and return fast if there is an exact match before performing the normalization and fallback steps.

This makes sense and I'd have expected this to happen anyway.

> Micro-optimizations in cmp_bytes (aka strict_cmp): Another bottleneck in resource lookup is the cmp_bytes function in DataLocale. There may be opportunities to improve the performance of this operation.

Do we actually have measurements confirming this as a bottleneck?


One other improvement I can think of: many keys will have identical locale sets. Our providers could cache the last requested locale and index, and if the next key uses the same locales (as determined at datagen time), we already know its index. This would only help in non-fallback lookup though.

For fallback we could cache the last key+locale+index, and if the next request is for the same key, we can start the search where we expected the last locale to be. As it's fallback, we usually look for a prefix of the previous key, so that should be in the same location.
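A minimal sketch of that cache idea. All names here are hypothetical, and a linear scan stands in for whatever search the real provider uses:

```rust
/// Remember the last (key, locale, index) hit and start the next search
/// there when the same key is requested again.
struct FallbackCache {
    last: Option<(u32, String, usize)>, // (key id, locale, index)
}

impl FallbackCache {
    fn new() -> Self {
        Self { last: None }
    }

    /// Starting index for a search over the key's locale list.
    fn start_index(&self, key: u32) -> usize {
        match &self.last {
            Some((k, _, idx)) if *k == key => *idx,
            _ => 0,
        }
    }

    fn record_hit(&mut self, key: u32, locale: &str, index: usize) {
        self.last = Some((key, locale.to_owned(), index));
    }
}

/// Search near the cached position first: fallback usually requests a
/// prefix of the previous locale, which sorts at or just before it.
fn find(locales: &[&str], cache: &FallbackCache, key: u32, wanted: &str) -> Option<usize> {
    if locales.is_empty() {
        return None;
    }
    let start = cache.start_index(key).min(locales.len() - 1);
    (0..=start)
        .rev()
        .chain(start + 1..locales.len())
        .find(|&i| locales[i] == wanted)
}
```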

@sffc
Member Author

sffc commented Dec 16, 2022

Discussion:

  • We can carry info forward between requests using DataRequestMetadata
  • We could borrow a locale, etc., in that metadata.

@sffc sffc self-assigned this Dec 16, 2022
@sffc sffc removed the needs-approval One or more stakeholders need to approve proposal label Dec 16, 2022
@sffc sffc added this to the ICU4X 1.2 milestone Dec 16, 2022
@sffc
Member Author

sffc commented Oct 24, 2023

In #4207 I'm implementing a ZeroTrie variant which, if the previous benchmarks hold, is faster and smaller than the current ZeroMap2d variant. ZeroHashMap is likely still the fastest option, but it would incur some data size cost. So I'm keeping this issue open but lowering the priority.

sffc added a commit that referenced this issue Oct 25, 2023
Related: #4216, #4207, #2699

See the numbers in locale_aux_test.rs
@sffc
Member Author

sffc commented Dec 23, 2023

Raising the priority again because the ZeroTrie optimization was only applied to blob provider. We should still explore a solution for baked provider.

This performance impact manifests in NeoDateTimeFormatter, which hits the provider more times than the previous iteration of DateTimeFormatter did.

@sffc
Member Author

sffc commented Apr 19, 2024

An issue I'm encountering now is that in datetime format, the ZeroTrie lookup tables for skeleta are fairly large. If we were able to share strings between keys (calendars), they could be made smaller. For example, with one level of indirection, we could store a table of locale strings, and then the lookup table would have a u32 index into the locale table. This would probably amortize to be smaller than ZeroTrie. Not sure about impact on lookup speed.
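A minimal sketch of that indirection: locale strings stored once in a shared table, with per-key lookup tables holding u32 indices into it. The structs and linear scans are illustrative stand-ins, not the ICU4X layout:

```rust
/// Locale strings stored once, shared across keys/calendars.
struct SharedLocaleTable {
    locales: Vec<String>,
}

/// Per-key table: each entry holds a u32 index into the shared locale
/// table instead of repeating the locale string.
struct KeyLookup {
    entries: Vec<(u32, &'static str)>, // (locale index, payload)
}

impl SharedLocaleTable {
    fn index_of(&self, locale: &str) -> Option<u32> {
        self.locales.iter().position(|l| l == locale).map(|i| i as u32)
    }
}

impl KeyLookup {
    fn get(&self, table: &SharedLocaleTable, locale: &str) -> Option<&'static str> {
        let idx = table.index_of(locale)?;
        self.entries.iter().find(|&&(i, _)| i == idx).map(|&(_, p)| p)
    }
}
```

The per-request cost splits into one shared-table resolution plus a cheap integer lookup per key, which is where the amortization would come from.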

@Manishearth
Member

That would be cool. I've wanted to amortize space between aux keys by storing them at a different layer in the blob.

@sffc
Member Author

sffc commented May 4, 2024

Just an idea for how to do this in Blob Provider:

  1. Have a blob-local global ZeroTrie for language identifiers and another blob-local global ZeroTrie for aux keys
  2. The key-specific tries are indexed by a two-character UTF-8 string: one character encoding the index of the langid, one encoding the index of the aux key

I think this should amortize the cost of the aux-key-heavy data keys quite nicely.
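A minimal sketch of step 2: given the positions of a langid and an aux key in their blob-global tables, build the two-character string used to index the key-specific trie. The offset-by-1 (to avoid NUL) and the use of `char::from_u32` are illustrative choices, not the actual ZeroTrie encoding:

```rust
/// Encode (langid index, aux key index) as a two-character string.
/// Returns None if an index lands outside the valid scalar-value range
/// (e.g., in the surrogate range), which a real encoding would avoid by
/// construction.
fn encode_key(langid_index: u32, aux_index: u32) -> Option<String> {
    let l = char::from_u32(langid_index + 1)?;
    let a = char::from_u32(aux_index + 1)?;
    Some(format!("{l}{a}"))
}
```

Since each index compresses to one scalar (1–4 UTF-8 bytes), the per-key trie keys stay short regardless of how long the underlying langid or aux key strings are.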

@robertbastian
Member

Is (2.) using ZeroTrie as a ZeroMap2d<char, char, usize>?

@sffc
Member Author

sffc commented May 10, 2024

I did some experimentation. For an export of all date skeleton keys over all locales and maximal deduplication, here's what I get:

|                                       | Blob      | Blob2     | #2699 (comment) |
|---------------------------------------|-----------|-----------|-----------------|
| Postcard Size                         | 567405 B  | 218376 B  | 196215 B        |
| Load for sr-Latn-x-ym0d (no fallback) | 247.58 ns | 175.20 ns | 203.23 ns       |
| Load for sr-ME-x-ym0d (with fallback) | 719.68 ns | 421.64 ns | 451.99 ns       |

These figures aren't compelling enough for me to continue work on this blob layout. I archived my code here:

https://github.com/sffc/omnicu/tree/archive/blob3
