
Bug with type: 'disjunction' in Spanish? #45

Closed
mathiasbynens opened this issue May 8, 2019 · 21 comments

Comments

@mathiasbynens
Member

@mathiasbynens mathiasbynens commented May 8, 2019

https://twitter.com/_k_1_k_/status/1126022957904596995

Is this accurate? @caridy @littledan

@gsathya
Member

@gsathya gsathya commented May 8, 2019

@littledan
Member

@littledan littledan commented May 8, 2019

I'm not sure how I didn't realize this sooner, but our schema is very broken for Spanish, since it does not permit implementations to use the word in determining the conjunction. An even simpler example without disjunction:

d8> x.format(["hijos", "hijas"])
"hijos y hijas"

That should really be "hijos e hijas". I'm not sure what, if anything, we should do about this issue. @zbraniecki Aside from this phonological interaction in Spanish (which is relatively easy to deal with programmatically, except for loan words), were you saying there are sometimes other issues in other languages with grammatical gender or case and ListFormat?
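To make the limitation concrete, here is a minimal sketch of position-only list formatting. The pattern strings follow the shape of CLDR list patterns, but `esPatterns`, `fillPattern`, and `formatByPosition` are hypothetical names used for illustration, not code from any engine or from ICU.

```javascript
// Hypothetical position-only patterns, modelled on the shape of CLDR list patterns.
const esPatterns = {
  two: "{0} y {1}",
  start: "{0}, {1}",
  middle: "{0}, {1}",
  end: "{0} y {1}",
};

// Substitute {0} and {1} without interpreting "$" sequences in the items.
function fillPattern(pattern, a, b) {
  return pattern.replace(/\{([01])\}/g, (_, i) => (i === "0" ? a : b));
}

// The pattern is chosen purely from the item's position; the words are never inspected.
function formatByPosition(items, patterns) {
  if (items.length === 0) return "";
  if (items.length === 1) return items[0];
  if (items.length === 2) return fillPattern(patterns.two, items[0], items[1]);
  // Build from the right: end pattern first, then middle, then start.
  let result = fillPattern(patterns.end, items[items.length - 2], items[items.length - 1]);
  for (let i = items.length - 3; i >= 1; i--) {
    result = fillPattern(patterns.middle, items[i], result);
  }
  return fillPattern(patterns.start, items[0], result);
}

console.log(formatByPosition(["hijos", "hijas"], esPatterns)); // "hijos y hijas"
```

Because nothing in this scheme looks at the text of the last item, no amount of pattern data can produce "hijos e hijas".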

@caridy

@caridy caridy commented May 8, 2019

wow, just wow! I just realized that @littledan is correct. The implementation is actually not taking into consideration the parts at all, only the position of the parts:

image

I didn't follow many of the details of this proposal, but I should have known better since Spanish is my first language.

First question is: what kind of data do we have, aside from positioning? Then we can see what we can do with that abstract operation to fix this issue.

@srl295
Member

@srl295 srl295 commented May 9, 2019

Maltese also would have: ( I think )

  • ħobż u żejt
  • ħobż, żejt, w abjad (abjad starts with a vowel)
  • ħobż, ilma, w żejt (ilma ends with a vowel)

@mathiasbynens
Member Author

@mathiasbynens mathiasbynens commented May 13, 2019

Based on https://github.com/unicode-org/cldr/blob/2dd06669d833823e26872f249aa304bc9d9d2a90/tools/java/com/ibm/icu/text/ListFormatData.java#L18, it looks like the y in Spanish is just hardcoded, and there’s no additional data that encodes the e alternate form. Could someone who’s more familiar with CLDR data and how it’s built (@srl295, @caridy perhaps) please confirm?

@srl295
Member

@srl295 srl295 commented May 13, 2019

@mathiasbynens sorry, I think you may have found a random commit containing the proof-of-concept code.

Listformat spec is here https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns and data for Spanish (for example) in master is here https://github.com/unicode-org/cldr/blob/c4cf766c654f7fd47d037eb909f38f45af5a4329/common/main/es.xml#L8970 … so yes, there is no data for an alternate e form. While multiple types are allowed, there isn't anything in the spec which covers this case. Would you be able to file a ticket on our new tracker http://unicode-org.atlassian.net/projects/CLDR/issues ?

@mathiasbynens
Member Author

@mathiasbynens mathiasbynens commented May 13, 2019

@srl295 Thanks for the pointer. I’ve filed https://unicode-org.atlassian.net/browse/CLDR-13025.

@zbraniecki
Member

@zbraniecki zbraniecki commented Jan 24, 2020

@FrankYFTang @sffc @srl295 - what's the status of this in CLDR/ICU? are we any closer to unblocking stage 4 for this proposal?

@sffc
Contributor

@sffc sffc commented Jan 24, 2020

We discussed this at the Dec 12 and Jan 9 ECMA-402 meetings. Notes from Jan 9:

SFC: Since we now understand the scope of the type: 'disjunction' fix and it is on track, I think we can move forward with Stage 4 for this proposal.

YMD: We went through 60-70 languages. We found that Maltese and Italian might have issues, but those issues might have been fixed in language reform. So we're going to fix this in code in ICU since it only affects a small number of languages.

DE: We also need to fix this at the specification level.

Has @littledan's comment been addressed?

@littledan
Member

@littledan littledan commented Jan 24, 2020

I still don't know enough about the fix at the CLDR/ICU level to make the spec fix. Where can I learn more about it?

@sffc
Contributor

@sffc sffc commented Jan 24, 2020

@younies Can you share the data model you came up with for Spanish list formatting so that the ECMA-402 spec change can be made?

@younies
Member

@younies younies commented Jan 29, 2020

AND Rules

  • General Rule
    • A, B, C y D
  • Use e instead of y in the following cases (exceptions):
    1. Last word starts with hi/i, but not hie/hia
    2. Last word is a foreign word starting with “h” where the h is pronounced

OR Rules

  • General Rule
    • A, B, C o D
    • OR
    • A o B o C o D
  • Use u instead of o in the following cases (exceptions):
    1. Last word starts with o/ho
    2. Last word is a foreign word starting with ho- where the h stands for /x/, e.g. homeless /xóm.les/. This would lead to severely incorrect false positives (not too frequent, though).
    3. Last word starts with 8, 8…
    4. Last word starts with 11, 11.000.000, 11.000, 11.000…….
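The AND/OR rules above can be sketched as a small function that inspects only the last item. This is an illustration under stated assumptions, not the ICU fix: the names `normalizeEs`, `spanishAnd`, `spanishOr`, and `formatEs` are hypothetical, the accent stripping is an addition of this sketch, and the foreign-word cases are not handled because they would need a word list.

```javascript
// Lowercase and strip combining accents so "Índice" is treated like "indice".
// (This normalisation is an assumption of the sketch, not one of the listed rules.)
function normalizeEs(word) {
  return word.toLowerCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// "e" instead of "y" when the last word starts with hi-/i-, except hie-/hia-.
function spanishAnd(lastWord) {
  const w = normalizeEs(lastWord);
  return /^h?i/.test(w) && !/^hi[ea]/.test(w) ? "e" : "y";
}

// "u" instead of "o" when the last word starts with o-/ho-, with the digit 8,
// or is 11 optionally followed by dot-separated thousands groups (11.000, 11.000.000).
function spanishOr(lastWord) {
  const w = normalizeEs(lastWord);
  const startsWithO = /^h?o/.test(w);
  const startsWithEight = /^8/.test(w);
  const isEleven = /^11(?:\.\d{3})*(?!\d)/.test(w);
  return startsWithO || startsWithEight || isEleven ? "u" : "o";
}

// Comma-separate all items and put the chosen word before the last one.
function formatEs(items, type) {
  if (items.length === 0) return "";
  if (items.length === 1) return items[0];
  const last = items[items.length - 1];
  const word = type === "disjunction" ? spanishOr(last) : spanishAnd(last);
  return `${items.slice(0, -1).join(", ")} ${word} ${last}`;
}

console.log(formatEs(["hijos", "hijas"], "conjunction")); // "hijos e hijas"
console.log(formatEs(["siete", "ocho"], "disjunction"));  // "siete u ocho"
```

Loan words such as "homeless" would be misclassified by this sketch, which is the false-positive risk mentioned in the list.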

@littledan
Member

@littledan littledan commented Jan 29, 2020

One way that we could phrase the specification text is to say, there's a finite number of templates (instead of one), and an ILD function based on the previous and next list items which selects among the templates. Any thoughts on this design? It sounds like it'd handle the Spanish case (but maybe it's over-designed?).
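The design proposed here (a finite set of templates plus a function that picks one based on the neighbouring items) might look roughly like the sketch below. Everything in it is hypothetical: `formatWithSelector`, the `templates` shape, and the `selectEs` callback are illustration only and do not reflect the eventual specification text.

```javascript
// Hypothetical sketch: several "end" templates plus a selector that inspects
// the previous and next list items to choose among them.
function formatWithSelector(items, templates, selectEnd) {
  if (items.length === 0) return "";
  if (items.length === 1) return items[0];
  const prev = items[items.length - 2];
  const next = items[items.length - 1];
  const key = selectEnd(prev, next); // stands in for the locale-dependent choice
  const head = items.slice(0, -1).join(templates.separator);
  return templates.end[key].replace(/\{([01])\}/g, (_, i) => (i === "0" ? head : next));
}

// Two templates for Spanish conjunctions instead of one.
const esConjunction = {
  separator: ", ",
  end: { default: "{0} y {1}", beforeI: "{0} e {1}" },
};

// Selector for Spanish: only the following item matters here.
const selectEs = (_prev, next) =>
  /^h?i/i.test(next) && !/^hi[ea]/i.test(next) ? "beforeI" : "default";

console.log(formatWithSelector(["padres", "hijos", "hijas"], esConjunction, selectEs));
// "padres, hijos e hijas"
```

Passing both `prev` and `next` keeps the interface general, even though the Spanish selector ignores `prev`.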

@caridy

@caridy caridy commented Jan 29, 2020

@littledan that seems reasonable.

@younies
Member

@younies younies commented Jan 29, 2020

The problem is related to how the last word is pronounced. The rules that I have just shared cover 99% of all cases.

Our investigation of ~70 languages (the most popular ones) found that Spanish is the only language that has this problem (maybe Italian too, but that is not confirmed).

Therefore, it is better to develop a small fix in the code that deals with this problem.

@littledan
Member

@littledan littledan commented Jan 30, 2020

@younies Could you explain the situation with Italian and Maltese, where it's been raised that there may or may not be similar issues?

@younies
Member

@younies younies commented Jan 30, 2020

For Italian:

AND Rules

  • General Rule
    • Use , between all the words and e before the last one
      • e.g. A,B,C --> A, B e C
  • Exceptions
    • Use ed instead of e if the last word starts with e but not ed
      • e.g. tigers and elephants -> tigri ed elefanti

OR Rules

  • General Rule
    • Use , between all the words and o before the last one
      • e.g. A,B,C --> A, B o C
  • Exceptions
    • Use od instead of o if the last word starts with o but not od
      • e.g. roses or orchids --> rose od orchidee
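The Italian rules listed above can be sketched the same way, again looking only at the last word. The function names `italianAnd`, `italianOr`, and `formatIt` are hypothetical, and the sketch only illustrates the rules as written in this comment; it is not a claim about how any library behaves.

```javascript
// "ed" instead of "e" when the last word starts with e- but not ed-.
function italianAnd(lastWord) {
  const w = lastWord.toLowerCase();
  return w.startsWith("e") && !w.startsWith("ed") ? "ed" : "e";
}

// "od" instead of "o" when the last word starts with o- but not od-.
function italianOr(lastWord) {
  const w = lastWord.toLowerCase();
  return w.startsWith("o") && !w.startsWith("od") ? "od" : "o";
}

// Comma-separate all items and put the chosen word before the last one.
function formatIt(items, type) {
  if (items.length === 0) return "";
  if (items.length === 1) return items[0];
  const last = items[items.length - 1];
  const word = type === "disjunction" ? italianOr(last) : italianAnd(last);
  return `${items.slice(0, -1).join(", ")} ${word} ${last}`;
}

console.log(formatIt(["tigri", "elefanti"], "conjunction")); // "tigri ed elefanti"
console.log(formatIt(["rose", "orchidee"], "disjunction"));  // "rose od orchidee"
```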

For Maltese:

It is confirmed that there are no exceptions in modern Maltese.

@littledan
Member

@littledan littledan commented Jan 30, 2020

Since Spanish and Italian seem to agree: does anyone have opinions on whether the choice of conjunction should be based on just the following word, or also the preceding word?

@younies
Member

@younies younies commented Jan 30, 2020

@littledan , just the following word (i.e. the last word in the list)

@zbraniecki
Member

@zbraniecki zbraniecki commented Jun 6, 2020

I think this is now fixed with #50

@littledan - can you confirm? Can you resolve this?

@littledan
Member

@littledan littledan commented Jun 9, 2020

Yes, #50 closes this. (I am not able to close this issue, though, as I'm not a repository owner.)

8 participants