Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft for implementing Local Displayname Algorithm #3587

Merged
merged 14 commits into from
Aug 23, 2023
18 changes: 9 additions & 9 deletions experimental/displaynames/src/displaynames.rs
Original file line number Diff line number Diff line change
Expand Up @@ -406,23 +406,24 @@ impl LocaleDisplayNamesFormatter {

/// Returns the display name of a locale.
/// This implementation is based on the algorithm described in
/// https://www.unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm
/// `<https://www.unicode.org/reports/tr35/tr35-general.html#locale_display_name_algorithm>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you don't want tick marks around this; just the <> is sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

///
// TODO: Make this return a writeable instead of using alloc
pub fn of<'a, 'b: 'a, 'c: 'a>(&'b self, locale: &'c Locale) -> Cow<'a, str> {
let longest_matching_identifier = self.find_longest_matching_subtag(&locale);
let longest_matching_identifier = self.find_longest_matching_subtag(locale);

// Step - 1: Construct a locale display name string (LDN).
// Find the displayname for the longest_matching_subtag which was derived above.
let ldn = self.get_locale_display_name(&locale, &longest_matching_identifier);
let ldn = self.get_locale_display_name(locale, &longest_matching_identifier);

// Step - 2: Construct a vector of longest qualifying substrings (LQS).
// Find the displayname for the remaining locale if exists.
let lqs = self.get_longest_qualifying_substrings(&locale, &longest_matching_identifier);
let lqs = self.get_longest_qualifying_substrings(locale, &longest_matching_identifier);

// Step - 3: Return the displayname based on the size of LQS.
let mut result = Cow::Borrowed(ldn);
if lqs.len() > 0 {
#[allow(clippy::indexing_slicing)] // indexes in range
if !lqs.is_empty() {
Comment on lines +425 to +426
Copy link
Member

@sffc sffc Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (optional): you can use split_first to avoid the need to unwrap. split_first returns an Option<&T> with the first and an Option<&[T]> with the remainder.

let mut output = String::with_capacity(
result.len() + " (".len() + lqs.iter().map(|s| ", ".len() + s.len()).sum::<usize>()
- ", ".len()
Expand All @@ -435,7 +436,7 @@ impl LocaleDisplayNamesFormatter {
output.push_str(", ");
output.push_str(lqs);
}
output.push_str(")");
output.push(')');
result = Cow::Owned(output);
}
result
Expand Down Expand Up @@ -473,7 +474,7 @@ impl LocaleDisplayNamesFormatter {
}
}
}
return (locale.id.language, None, None).into();
(locale.id.language, None, None).into()
}

fn get_locale_display_name<'a>(
Expand Down Expand Up @@ -614,7 +615,6 @@ impl LocaleDisplayNamesFormatter {
.unwrap_or(variant_key.as_str()),
);
}

return lqs;
lqs
}
}
4 changes: 4 additions & 0 deletions experimental/displaynames/tests/tests.rs
robertbastian marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ fn test_concatenate() {
input_1: &Locale::from_str("es-419-fonipa").unwrap(),
expected: "Latin American Spanish (IPA Phonetics)",
},
TestCase {
input_1: &Locale::from_str("es-Latn-419").unwrap(),
expected: "Spanish (Latin, Latin America)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be Latin American Spanish (Latin) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But "es-Latn" doesn't exist in locale data? Don't we consider just the sequential subtags to get the longest matching string for locale display name? For example the test for "es-Cyrl-MX" returns "Spanish (Cyrillic, Mexico)" and not the "Mexican Spanish (Cyrillic)".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think you're right: the CLDR test data does the same thing

https://github.com/unicode-org/cldr/blob/main/common/testData/localeIdentifiers/localeDisplayName.txt#L15

this is different than my intuitive sense but if it's what's in the spec then that's what we should implement!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locale data contains

"es-419": "Latin American Spanish",

so the result should be "Latin American Spanish (Latin)". The condition that stops this from happening is in line 463: LR is only tried if the locales has no script. I don't think that's following the spec, because es-419, not es should be the longest matching subtag

Match the L subtags against the type values in the elements. Pick the element with the most subtags matching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think non-sequential subtags can be a valid match, because the spec also says

If there is more than one such element, pick the one that has subtypes matching earlier.

This kind of implies that there can be multiple matches that aren't prefixes of one another, so subtags can be skipped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think the inconsistency with the CLDR test data is that it is running in non-dialect mode. In dialect mode, it's pretty clear that "Latin American Spanish (Latin)" is the correct output. This can be seen by the fact that the CLDR test data for es-419 also says "Spanish (Latin America)" instead of "Latin American Spanish".

So, yes, please fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(happy doing this in a follow-up in #3913)

},
TestCase {
input_1: &locale!("xx-YY"),
expected: "xx (YY)",
Expand Down
Loading