-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make VarZeroVec format configurable #2306
Conversation
datagen-full fails since the ZM2D gets too big, looks like I'm going to have to add the ZeroMap stuff after all. I'll do it tomorrow. |
Which key is overflowing? |
Is it just the blob provider that's overflowing? Too many locales? If the proper fix is too tricky, this is the only place where we have totally private structures, so you could make a one-off VarULE type that's transparent over [u8] and has a bigger ZeroMapKV container. |
Yeah.
Nah I'll do the proper fix. It's a bit complicated but I don't anticipate problems. |
My bigger worry is that there's some locale that uses supplementary code points and long strings and just has really big data that overflows the time zone names struct. If that happens, we're in a bit of a pickle with the path we've chosen. |
We can use Index32 in those places if we're worried, right? |
The places where we may sometimes need to use Index32 are exactly the same places where using Index16 has the biggest impact. The design of FlexZeroVec is that the 95% case gets smaller data and the 5% case can still grow. However, hopefully this isn't actually an issue. In such cases, I may advocate to instead just break the data struct into smaller pieces (perhaps a different map for each continent or something like that). |
Yeah, that's a good call. I think that's also an acceptable place to introduce a new data struct version with Flex. That works well as an "if we're unlucky" path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Don't have any changes to suggest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I had already approved)
This finishes the 1.0 polish work necessary for #1410, switching us over to Index16 by default with Index32 being used for BlobDataProvider.
Stuff that should be done soon but is not necessary for ICU4X 1.0:
Stuff to do eventually: