Picking a language #42

Closed
mattgarrish opened this issue Aug 20, 2017 · 8 comments

@mattgarrish
Member

The current fallback algorithm states that if the language cannot be determined, it should be set to English.

I don't like this approach, as it's an arbitrary choice. As stated in BCP 47, if there has to be a language and one cannot be determined, use "und" (undetermined).
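A minimal sketch of the proposed fallback, assuming the candidate language values (for example from the manifest, then from the content) have already been collected in priority order. The function name `resolve_language` and the candidate ordering are hypothetical, not taken from the draft:

```python
from typing import Iterable, Optional

UNDETERMINED = "und"  # BCP 47 primary language subtag for "undetermined"


def resolve_language(candidates: Iterable[Optional[str]]) -> str:
    """Return the first non-empty candidate tag, or "und" if none is found.

    Candidates are assumed (hypothetically) to be in priority order, e.g.
    manifest first, then content. No language such as "en" is guessed
    when nothing has been declared.
    """
    for tag in candidates:
        if tag is not None and tag.strip():
            return tag.strip()
    return UNDETERMINED
```

For example, `resolve_language([None, "fr"])` gives `"fr"`, while `resolve_language([None, ""])` gives `"und"` rather than `"en"`.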

If we're picking up language codes from WP resources, it would also be good to be clear that a conforming language code is expected, not just a language designation like "English".
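One rough way to tell a conforming tag such as "en-US" from a display name such as "English" is a shape check. This is a simplified sketch, not the full RFC 5646 grammar: it rejects private-use and grandfathered tags (e.g. `x-foo`, `i-klingon`) and does not consult the IANA Language Subtag Registry. The function name is hypothetical:

```python
import re

# Simplified shape of a BCP 47 tag: a 2-3 letter primary language subtag,
# followed by optional hyphen-separated subtags of 1-8 alphanumerics.
# NOTE: an approximation only; no registry lookup is performed.
_TAG_SHAPE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(value: str) -> bool:
    """True if value has the rough shape of a BCP 47 language tag."""
    return _TAG_SHAPE.fullmatch(value) is not None
```

Under this check "en", "en-US" and "zh-Hant-TW" pass, while "English" and "en_US" do not.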

@iherman
Member

iherman commented Aug 20, 2017

My (major) mistake. I somehow thought that English is the fallback language in HTML but, luckily, this is not the case. Although the HTML spec [1] does not refer to und explicitly, it does say that:

> If neither the node nor any of the node’s ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string.

The "pragma-set" default language refers to the HTTP-equivalent setting made via a `meta` element (`<meta http-equiv="content-language" content="...">`).
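The fallback order in the quoted HTML text can be sketched as follows. `Node` is a hypothetical, minimal stand-in for a DOM node (only `lang` and the parent are modeled; `xml:lang` is ignored), and the pragma-set default and the languages reported by the higher-level protocol are passed in as plain values:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Hypothetical DOM stand-in: lang is None when the attribute is absent.
    lang: Optional[str] = None
    parent: Optional["Node"] = None


def language_of(
    node: Node,
    pragma_default: Optional[str] = None,
    protocol_languages: Sequence[str] = (),
) -> str:
    """Return the node's language following the order in the HTML excerpt."""
    # 1. Nearest inclusive ancestor with a lang attribute wins.
    #    An empty value also means "unknown" in HTML, so it is returned as-is.
    current: Optional[Node] = node
    while current is not None:
        if current.lang is not None:
            return current.lang
        current = current.parent
    # 2. Pragma-set default language (from <meta http-equiv="content-language">).
    if pragma_default:
        return pragma_default
    # 3. Higher-level protocol (e.g. HTTP), but only if it reports one language.
    if len(protocol_languages) == 1:
        return protocol_languages[0]
    # 4. Otherwise unknown: the empty string.
    return ""
```

HTML's "unknown" is the empty string; in a field that must carry a BCP 47 tag, the closest equivalent would be `und`.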

What this basically means is that, as a first step, you are absolutely right in using und and not English. I do not know whether we want to mix in HTTP; I would say that, at this point, let us avoid that. It may be worth referring to [1] in the text when describing how HTML establishes the language.

  1. https://www.w3.org/TR/html/dom.html#the-lang-and-xmllang-attributes

@mattgarrish
Member Author

> I do not know whether we want to mix in HTTP; I would say that, at this point, let us avoid that.

I've been wondering about that, too. It might be the place where an image-based work specifies the language of any text in its images. But do you look at the Content-Language header of the manifest, of the first resource, or of both?

I don't imagine it will provide anything useful for HTML/SVG publications if the author has omitted a language from the manifest and from the content.
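Whichever resource's headers end up being consulted (the manifest, the first resource, or both, which this thread leaves open), the Content-Language value itself is a comma-separated list of language tags. A minimal sketch, with hypothetical function names, of reading it under the HTML rule quoted earlier that multiple reported languages mean "unknown":

```python
from typing import List, Optional


def parse_content_language(header_value: Optional[str]) -> List[str]:
    """Split a Content-Language header value into its individual tags."""
    if not header_value:
        return []
    return [part.strip() for part in header_value.split(",") if part.strip()]


def single_language(header_value: Optional[str]) -> Optional[str]:
    """Return the tag only if exactly one language is reported.

    Multiple languages are treated as unknown (None), as is a missing
    or empty header.
    """
    tags = parse_content_language(header_value)
    return tags[0] if len(tags) == 1 else None
```

So a header of `de-CH` yields `"de-CH"`, while `en, fr` yields no usable language.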

@iherman
Member

iherman commented Aug 20, 2017

I would propose not to go there for now; leave this issue open and refer to it from the draft. At some point in time we will have to discuss the interplay between HTTP and the access to the (concrete) manifest, and this would become part of it.

@lrosenthol

lrosenthol commented Aug 20, 2017 via email

@iherman
Copy link
Member

iherman commented Aug 20, 2017

@lrosenthol sure. But I would consider any access to HTTP-related information only as a fallback anyway, used if everything else fails. Taking into account the relative difficulty of setting HTTP response headers, I would not rely on HTTP headers alone.

@HadrienGardeur

@lrosenthol

The other reason to "not go there" right now is that if we put stuff about HTTP directly into WP, then we may/will run into issues in creating PWP.

We already have an issue when creating a PWP from a WP if we don't store the address of the publication in the manifest itself.

@BigBlueHat
Member

The HTML spec only uses HTTP as an example of a "higher-level protocol" (see the excerpt in @iherman's comment). If we define a language "fallback" to that protocol level, I'd just crib that wording. 😸

@mattgarrish
Member Author

Closing this issue as my original concern was fixed in PR #51. I'll open new issues for the items that split out from it.


6 participants