Removing auto-derived Ord impl for Locale/LangId #2142

Merged: 7 commits merged on Jul 15, 2022

Member

@snktd snktd commented Jun 30, 2022

  • I had to provide a custom Ord implementation for LanguageIdentifier and ResourceOptions, as these are used widely across the codebase. For example, in many places LangId is used inside a LiteMap, which causes errors if Ord is not implemented for LangId.

Also, I am not entirely sure if this is the correct way to implement Ord. Let me know if I missed anything!

Fixes #1215

@@ -351,6 +351,18 @@ impl FromStr for LanguageIdentifier {
}
}

impl Ord for LanguageIdentifier {
Member

Issue: we can't agree on what the behavior of impl Ord for LanguageIdentifier should be, so we don't want any impl of it. What things in ICU4X require it?

Member Author

This is the list of errors I run into if there is no Ord impl for LanguageIdentifier. It is probably not complete, and most if not all of these errors are related to LiteMap.

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/calendar/japanese.rs#L60

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/decimal/mod.rs#L82

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/list/mod.rs#L42

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/locale_canonicalizer/aliases.rs#L79

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/plurals/mod.rs#L61

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/ca.rs#L165

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/likely_subtags.rs#L16

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/list_patterns.rs#L52

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/numbers.rs#L110

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/parent_locales.rs#L16

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/cldr_serde/time_zones/time_zone_names.rs#L164

https://github.com/unicode-org/icu4x/blob/main/provider/datagen/src/transform/cldr/datetime/mod.rs#L89

@@ -383,6 +383,18 @@ impl fmt::Display for ResourceOptions {
}
}

impl Ord for ResourceOptions {
fn cmp(&self, other: &Self) -> Ordering {
self.strict_cmp(other.to_string().as_bytes())
Member

Issue: we should not call to_string(), which is an expensive function, in cmp, which should be efficient since it may be called many times.

Where do we need impl Ord for ResourceOptions? Maybe we can change the call sites instead.
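
To illustrate the cost concern, here is a minimal sketch that contrasts a comparison going through a freshly built string with one that walks the fields directly. `ToyLocale`, `cmp_via_string`, and `cmp_by_fields` are hypothetical names invented for this example; they are not the real `ResourceOptions` API, and the field-wise order shown is just one possible order, not an agreed-upon one.

```rust
use std::cmp::Ordering;

// Hypothetical stand-in for a locale-like type; NOT the real ResourceOptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyLocale {
    pub language: String,
    pub region: Option<String>,
}

impl ToyLocale {
    /// Renders the value as "language" or "language-region" (allocates a String).
    pub fn render(&self) -> String {
        match &self.region {
            Some(region) => format!("{}-{}", self.language, region),
            None => self.language.clone(),
        }
    }

    /// Allocating comparison: builds two temporary strings on every call.
    pub fn cmp_via_string(&self, other: &Self) -> Ordering {
        self.render().cmp(&other.render())
    }

    /// Allocation-free comparison: compares language first, then region.
    pub fn cmp_by_fields(&self, other: &Self) -> Ordering {
        self.language
            .cmp(&other.language)
            .then_with(|| self.region.cmp(&other.region))
    }
}
```

A sort over n elements calls the comparator on the order of n log n times, so the allocating variant pays for temporary strings on each of those calls.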

Member Author

Maybe we can switch to HashMap as you commented below. This is where I am getting errors for ResourceOptions if Ord impl is not provided.

https://github.com/unicode-org/icu4x/blob/main/provider/core/src/hello_world.rs#L110

https://github.com/unicode-org/icu4x/blob/main/provider/core/src/hello_world.rs#L127

Probably because HelloWorldProvider uses LiteMap: https://github.com/unicode-org/icu4x/blob/main/provider/core/src/hello_world.rs#L127

Member

HelloWorldProvider should just use HashMap, it doesn't matter

Member

Oh, it can't, for reasons Rob mentioned

We could instead just use a manual LiteMap, a Vec of tuples that we binary search
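
A rough sketch of that idea, assuming a key type that deliberately has no `Ord` impl: the vector is sorted once with an explicit comparator, and lookups use `binary_search_by` with the same comparator. `ToyKey`, `compare_keys`, and `SortedVecMap` are hypothetical names for illustration only.

```rust
use std::cmp::Ordering;

// Hypothetical key type; note that it does NOT implement Ord.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyKey(pub &'static str);

// The one place where an ordering is chosen; callers never rely on Ord.
fn compare_keys(a: &ToyKey, b: &ToyKey) -> Ordering {
    a.0.cmp(b.0)
}

/// A "manual LiteMap": a Vec of (key, value) tuples kept sorted by key.
pub struct SortedVecMap {
    entries: Vec<(ToyKey, &'static str)>,
}

impl SortedVecMap {
    /// Sorts the entries once so that lookups can binary search.
    pub fn from_unsorted(mut entries: Vec<(ToyKey, &'static str)>) -> Self {
        entries.sort_by(|a, b| compare_keys(&a.0, &b.0));
        Self { entries }
    }

    /// Binary search using the same comparator that sorted the entries.
    pub fn get(&self, key: &ToyKey) -> Option<&'static str> {
        self.entries
            .binary_search_by(|(k, _)| compare_keys(k, key))
            .ok()
            .map(|idx| self.entries[idx].1)
    }
}
```

The sort and the search must use the same comparator; otherwise `binary_search_by` may miss keys that are present.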

Member

I moved all of datagen to LiteMap a while ago in part because LiteMap has a bunch of utility functions that are useful for us that don't exist in HashMap.

This is a place where LocaleStr would be a good type to have.

I'm not quite sure what to recommend to unblock @snktd.

Member

I don't think any of those utility functions you mentioned are still in use. See #2152, which removes LiteMap deps and unblocks this.

@Manishearth
Member

For example, in many places LangId is used inside a LiteMap, which causes errors if Ord is not implemented for LangId.

@sffc I wonder if we should be using a regular hashmap instead here?

@sffc
Member

sffc commented Jun 30, 2022

It would be useful to see a list of the places and decide what to do there:

  1. Switch from LiteMap to HashMap?
  2. Use a different type (like a string) as the key?
  3. Something else?
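
As a sketch of option 2, the snippet below keys an ordered map by the rendered string instead of by the identifier type, so the identifier itself needs only `Display`. `ToyLangId` and `index_by_string` are hypothetical names for this example, and `BTreeMap` stands in for whichever sorted map the call site actually uses.

```rust
use std::collections::BTreeMap;
use std::fmt;

// Hypothetical identifier type with Display but no Ord.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyLangId {
    pub language: &'static str,
    pub region: Option<&'static str>,
}

impl fmt::Display for ToyLangId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.language)?;
        if let Some(region) = self.region {
            write!(f, "-{}", region)?;
        }
        Ok(())
    }
}

/// Builds a map keyed by the string form, so ordering comes from String.
pub fn index_by_string<V>(items: Vec<(ToyLangId, V)>) -> BTreeMap<String, V> {
    items
        .into_iter()
        .map(|(id, value)| (id.to_string(), value))
        .collect()
}
```

The string conversion happens once per entry at insertion time rather than inside a comparator.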

@Manishearth
Member

actually yeah a string might be fine

@snktd
Member Author

snktd commented Jun 30, 2022

it would be useful to see a list of the places and decide what to do there:

Added the list of code-paths that throw errors if Ord impl is removed.

@robertbastian
Member

All occurrences of LiteMap<LanguageIdentifier, ...> are in datagen::transform::cldr::cldr_serde. I think we should just use HashMaps.

@robertbastian
Member

robertbastian commented Jul 1, 2022

HashMap won't work in #[no_std] as it requires entropy. So for HelloWorld we'll need another solution. But we can just linearly scan through the list, it's a dummy key where it doesn't really matter.
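
A minimal sketch of the linear-scan idea: it needs only `PartialEq` on the key and works on a plain slice, so it does not depend on `HashMap` or on `Ord`. `ToyOptions` and `find_linear` are hypothetical names, not the actual HelloWorld provider code.

```rust
// Hypothetical key type: equality only, no Ord and no Hash.
#[derive(Debug, PartialEq, Eq)]
pub struct ToyOptions(pub &'static str);

/// Scans the slice front to back and returns the first matching value.
pub fn find_linear<'a>(
    entries: &'a [(ToyOptions, &'a str)],
    wanted: &ToyOptions,
) -> Option<&'a str> {
    entries
        .iter()
        .find(|(key, _)| key == wanted)
        .map(|(_, value)| *value)
}
```

For a short, fixed list of dummy entries the O(n) lookup cost is unlikely to matter.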

@robertbastian
Member

#2152 should have unblocked this

@snktd
Member Author

snktd commented Jul 11, 2022

#2152 should have unblocked this

I fetched and updated. There are still a few more issues with removing the Ord impl.

  1. It looks like we are comparing LangIds here: https://github.com/snktd/icu4x/blob/remove-ord-impl/provider/datagen/src/transform/cldr/locale_canonicalizer/aliases.rs#L79, which produces an error if Ord isn't implemented for LangId. Maybe we can do a string comparison here?

  2. I still can't remove the Ord impl from ResourceOptions, because a Vec containing ResourceOptions (Vec<(ResourceKey, ResourceOptions, String), Global>) is sorted here: https://github.com/snktd/icu4x/blob/remove-ord-impl/provider/fs/src/export/fs_exporter.rs#L144. I'm not sure what the best option is.
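
One possible way to sort such a vector without an `Ord` impl on the options type is to sort by a derived key. The sketch below uses the standard `sort_by_cached_key`, which computes each key once per element instead of on every comparison. `ToyOptions` and `sort_entries` are hypothetical, and a `String` stands in for `ResourceKey`; this is not necessarily how the exporter ended up being changed.

```rust
use std::fmt;

// Hypothetical options type: Display only, no Ord.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyOptions(pub &'static str);

impl fmt::Display for ToyOptions {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.0)
    }
}

/// Sorts (key, options, payload) tuples by key, then by the options' string form.
/// The (String, String) sort key is built once per element and cached.
pub fn sort_entries(entries: &mut [(String, ToyOptions, String)]) {
    entries.sort_by_cached_key(|(key, options, _payload)| (key.clone(), options.to_string()));
}
```

Unlike sorting the tuples directly, this ignores the payload when ordering; the sort is stable, so entries with equal keys keep their relative order.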

@sffc
Member

sffc commented Jul 12, 2022

  1. Yes, I think it is actually more correct to switch it to a string comparison anyway, given that the comment says: "Order the set of rules ... alphabetically by field"
  2. This is being fixed by "Custom eq for ResourceKey" (#2163).

@robertbastian
Member

I don't think that's correct. We are following this algorithm, and assuming our current code is correct, changing to string comparison is not, because it changes the location of und-* ids.
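
To make the concern concrete, here is a toy illustration of how a field-wise order and a string order can disagree about where und-* ids land. It assumes, purely for the sake of the example, that an undetermined language is modelled as `None` and therefore sorts first under a derived `Ord`; the real `LanguageIdentifier` representation may differ. `ToyLangId`, `sorted_fieldwise`, and `sorted_by_string` are hypothetical names.

```rust
use std::fmt;

// Hypothetical model: `None` stands for the undetermined language "und".
// This representation is an assumption, not the real LanguageIdentifier layout.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ToyLangId {
    pub language: Option<&'static str>,
    pub region: Option<&'static str>,
}

impl fmt::Display for ToyLangId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.language.unwrap_or("und"))?;
        if let Some(region) = self.region {
            write!(f, "-{}", region)?;
        }
        Ok(())
    }
}

/// Sorts with the derived, field-wise Ord, then renders each id.
pub fn sorted_fieldwise(mut ids: Vec<ToyLangId>) -> Vec<String> {
    ids.sort();
    ids.iter().map(|id| id.to_string()).collect()
}

/// Renders each id first, then sorts the resulting strings.
pub fn sorted_by_string(ids: Vec<ToyLangId>) -> Vec<String> {
    let mut rendered: Vec<String> = ids.iter().map(|id| id.to_string()).collect();
    rendered.sort();
    rendered
}
```

Under this assumption, the ids und-US, en, and zh sort as und-US, en, zh field-wise but as en, und-US, zh by string, so any algorithm that depends on the position of und-* entries would see different input.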

-#[derive(Default, Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
+#[derive(Default, Debug, PartialEq, Eq)]
Member

I think I see what's going on. You should not remove the Clone here.

Member Author

Ah right! Thanks for catching! Updated.

@sffc sffc merged commit 1a4233e into unicode-org:main Jul 15, 2022
Development

Successfully merging this pull request may close these issues.

Remove the Ord impl for LanguageIdentifier / Locale
4 participants