Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make VarZeroVec format configurable #2306

Merged
merged 29 commits into from
Aug 3, 2022

Conversation

Manishearth
Copy link
Member

This finishes the 1.0 polish work necessary for #1410, switching us over to Index16 by default with Index32 being used for BlobDataProvider.

Stuff that should be done soon but is not necessary for ICU4X 1.0:

  • ZeroMap support (we didn't need this, and it can be added incrementally)
  • Format strategy support in custom derive
  • Actual nice errors for VZV mutation operations
  • Perhaps clean up the code to not cast to usize as often (and instead cast to the relevant index type)

Stuff to do eventually:

  • Introduce an "Auto" strategy (will need a rewrite of the code)

@Manishearth Manishearth requested review from sffc, a team and zbraniecki as code owners August 2, 2022 03:40
@Manishearth
Copy link
Member Author

datagen-full fails since the ZM2D gets too big, looks like I'm going to have to add the ZeroMap stuff after all. I'll do it tomorrow.

@sffc
Copy link
Member

sffc commented Aug 2, 2022

datagen-full fails since the ZM2D gets too big, looks like I'm going to have to add the ZeroMap stuff after all. I'll do it tomorrow.

Which key is overflowing?

@sffc
Copy link
Member

sffc commented Aug 2, 2022

Is it just the blob provider that's overflowing? Too many locales?

If the proper fix is too tricky, this is the only place where we have totally private structures, so you could make a one-off VarULE type that's transparent over [u8] and has a bigger ZeroMapKV container.

@Manishearth
Copy link
Member Author

Is it just the blob provider that's overflowing? Too many locales?

Yeah.

If the proper fix is too tricky, this is the only place where we have totally private structures, so you could make a one-off VarULE type that's transparent over [u8] and has a bigger ZeroMapKV container.

Nah I'll do the proper fix. It's a bit complicated but I don't anticipate problems.

@sffc
Copy link
Member

sffc commented Aug 2, 2022

My bigger worry is that there's some locale that uses supplementary code points and long strings and just has really big data that overflows the time zone names struct. If that happens, we're in a bit of a pickle with the path we've chosen.

@Manishearth
Copy link
Member Author

We can use Index32 in those places if we're worried, right?

@sffc
Copy link
Member

sffc commented Aug 2, 2022

We can use Index32 in those places if we're worried, right?

The places where we may sometimes need to use Index32 are exactly the same places where using Index16 has the biggest impact. The design of FlexZeroVec is that the 95% case gets smaller data and the 5% case can still grow.

However, hopefully this isn't actually an issue. In such cases, I may advocate to instead just break the data struct into smaller pieces (perhaps a different map for each continent or something like that).

@Manishearth
Copy link
Member Author

Manishearth commented Aug 2, 2022

Yeah, that's a good call.

I think that's also an acceptable place to introduce a new data struct version with Flex. That works well as an "if we're unlucky" path.

sffc
sffc previously approved these changes Aug 2, 2022
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Don't have any changes to suggest

@Manishearth Manishearth requested a review from sffc August 2, 2022 23:35
@Manishearth Manishearth merged commit fcd45e7 into unicode-org:main Aug 3, 2022
@Manishearth Manishearth deleted the vzv-configurable branch August 3, 2022 01:41
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I had already approved)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants