Bug with `type: 'disjunction'` in Spanish? #45

Open
mathiasbynens opened this issue May 8, 2019 · 7 comments

Comments

@gsathya

Member

commented May 8, 2019

@littledan

Member

commented May 8, 2019

I'm not sure how I didn't realize this sooner, but our schema is quite broken for Spanish, since it does not let implementations take the following word into account when choosing the conjunction. An even simpler example, without disjunction:

d8> x.format(["hijos", "hijas"])
"hijos y hijas"

That should really be "hijos e hijas". I'm not sure what, if anything, we should do about this issue. @zbraniecki Aside from this phonological interaction in Spanish (which is relatively easy to deal with programmatically, except for loan words), were you saying there are sometimes other issues in other languages with grammatical gender or case and ListFormat?
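To make the rule concrete, here is a rough sketch of the phonological check, written as a standalone function rather than anything `Intl.ListFormat` or CLDR provides. The names `spanishConjunction`, `spanishDisjunction`, and `formatSpanishList` are hypothetical, and the regular expressions are a simplified heuristic: they look only at spelling, so loan words and other edge cases are not handled.

```javascript
// Hypothetical helpers for illustration; not part of Intl.ListFormat or CLDR.

// "y" becomes "e" before a word that starts with an /i/ sound ("i", "hi"),
// except when "hi" is followed by a vowel, as in "hielo" or "hierro".
function spanishConjunction(nextWord) {
  const w = nextWord.trim().toLowerCase();
  const startsWithI = /^h?[ií]/.test(w);
  const startsWithDiphthong = /^hi[aeo]/.test(w);
  return startsWithI && !startsWithDiphthong ? "e" : "y";
}

// "o" becomes "u" before a word that starts with an /o/ sound ("o", "ho").
function spanishDisjunction(nextWord) {
  const w = nextWord.trim().toLowerCase();
  return /^h?[oó]/.test(w) ? "u" : "o";
}

// Join items the way Spanish lists are usually written: commas between
// items, and the connector (chosen from the last item) before the last one.
function formatSpanishList(items, type = "conjunction") {
  if (items.length === 0) return "";
  if (items.length === 1) return items[0];
  const last = items[items.length - 1];
  const connector =
    type === "disjunction" ? spanishDisjunction(last) : spanishConjunction(last);
  const head = items.slice(0, -1).join(", ");
  return `${head} ${connector} ${last}`;
}

console.log(formatSpanishList(["hijos", "hijas"]));
console.log(formatSpanishList(["siete", "ocho"], "disjunction"));
```

The point of the sketch is only that the connector depends on the text of the final item, which a purely positional pattern cannot express.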

@caridy

commented May 8, 2019

wow, just wow! I just realized that @littledan is correct. The implementation is actually not taking into consideration the parts at all, only the position of the parts:

[image]

I didn't follow many of the details of this proposal, but I should have known better since Spanish is my first language.

First question is: what kind of data do we have, aside from positioning? Then we can see what we can do with that abstract operation to fix this issue.

@srl295

Member

commented May 9, 2019

Maltese would also have (I think):

  • ħobż u żejt
  • ħobż, żejt, w abjad (abjad starts with a vowel)
  • ħobż, ilma, w żejt (ilma ends with a vowel)

@mathiasbynens

Member Author

commented May 13, 2019

Based on https://github.com/unicode-org/cldr/blob/2dd06669d833823e26872f249aa304bc9d9d2a90/tools/java/com/ibm/icu/text/ListFormatData.java#L18, it looks like the y in Spanish is just hardcoded, and there’s no additional data that encodes the e alternate form. Could someone who’s more familiar with CLDR data and how it’s built (@srl295, @caridy perhaps) please confirm?

@srl295

Member

commented May 13, 2019

@mathiasbynens sorry, I think you may have found a random commit containing the proof-of-concept code.

The ListFormat spec is here: https://www.unicode.org/reports/tr35/tr35-general.html#ListPatterns and the data for Spanish (for example) in master is here: https://github.com/unicode-org/cldr/blob/c4cf766c654f7fd47d037eb909f38f45af5a4329/common/main/es.xml#L8970 … so yes, there is no data for an alternate e form. While multiple types are allowed, nothing in the spec covers this case. Would you be able to file a ticket on our new tracker at http://unicode-org.atlassian.net/projects/CLDR/issues ?
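For readers following along, here is a minimal sketch of how position-based list patterns combine items, which is why the output cannot react to the words themselves. The pattern strings below are assumed to match the Spanish data for illustration, the key `two` stands in for the pattern type that CLDR calls `2`, and `applyPattern` and `formatList` are hypothetical names, not ICU or ECMA-402 APIs.

```javascript
// Illustrative pattern set; the strings are an assumption about the
// Spanish data, and "two" is a JS-friendly stand-in for CLDR's type "2".
const esPatterns = {
  two: "{0} y {1}",
  start: "{0}, {1}",
  middle: "{0}, {1}",
  end: "{0} y {1}",
};

// Substitute both placeholders in one pass so item text is never re-scanned.
function applyPattern(pattern, a, b) {
  return pattern.replace(/\{([01])\}/g, (_, i) => (i === "0" ? a : b));
}

// Build the list from the end: "end" joins the last two items, "middle"
// prepends interior items, and "start" prepends the first item.
function formatList(items, patterns) {
  const n = items.length;
  if (n === 0) return "";
  if (n === 1) return items[0];
  if (n === 2) return applyPattern(patterns.two, items[0], items[1]);
  let result = applyPattern(patterns.end, items[n - 2], items[n - 1]);
  for (let i = n - 3; i >= 1; i--) {
    result = applyPattern(patterns.middle, items[i], result);
  }
  return applyPattern(patterns.start, items[0], result);
}

console.log(formatList(["hijos", "hijas"], esPatterns));
```

Nothing in this procedure inspects the items, only their positions, so the "y" is fixed by the pattern regardless of what follows it.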

@mathiasbynens

Member Author

commented May 13, 2019

@srl295 Thanks for the pointer. I’ve filed https://unicode-org.atlassian.net/browse/CLDR-13025.
