Empty attribute "xml:lang" reported as invalid, even though it is valid #777

Closed
elmimmo opened this issue Jul 6, 2017 · 10 comments · Fixed by #1209
Labels
status: has PR The issue is being processed in a pull request type: bug The issue describes a bug

Comments

@elmimmo

elmimmo commented Jul 6, 2017

EpubCheck v4.0.2 (installed via Homebrew on Mac) chokes on xml:lang attributes that have an empty value, such as:

<i lang="" xml:lang="">blahblah</i>

by saying:

Error while parsing file 'value of attribute "lang" is invalid; must be an RFC 3066 language identifier'.

But the W3C article "Language tags in HTML and XML" states that:

HTML and XML also provide a means to prevent inheritance of language using the empty string, ie. xml:lang="". Essentially, this says: I do not want to associate any language with this information.
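
The inheritance rule quoted above can be illustrated with a minimal sketch in Python's standard `xml.etree.ElementTree`. The function name `effective_languages` and the sample markup are hypothetical, and this is not how EpubCheck resolves languages; it only shows how an empty `xml:lang` stops the parent's language from applying.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs; None means no language is associated."""
    declared = element.get(XML_LANG)
    if declared is None:
        language = inherited      # attribute absent: inherit from the parent
    elif declared == "":
        language = None           # empty string: explicitly stop inheritance
    else:
        language = declared       # non-empty value: override the parent
    yield element, language
    for child in element:
        yield from effective_languages(child, language)


# Hypothetical sample: the <i> element opts out of the paragraph's language.
doc = ET.fromstring(
    '<p xml:lang="en">Hello <i xml:lang="">blahblah</i> <b>world</b></p>'
)
result = {el.tag: lang for el, lang in effective_languages(doc)}
```

Here `<b>` inherits `en` from the paragraph, while `<i>` ends up with no language at all because of the empty attribute.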

@tofi86 tofi86 added the type: bug The issue describes a bug label Jul 6, 2017
@mattgarrish
Member

HTML 5.0 also says it explicitly:

Setting the attribute to the empty string indicates that the primary language is unknown.
https://www.w3.org/TR/html5/dom.html#the-lang-and-xml:lang-attributes

It's pretty rare to find, though. You can use zxx for non-linguistic content. Do you have text where you have no idea what language it is in?

@elmimmo
Author

elmimmo commented Jul 6, 2017

My case is that of a body of text sprinkled with some words in different languages, for which I applied, to all, the same pattern:

<i lang="xx" xml:lang="xx">_______</i>

One word in particular is in a fictional language, so I left the value of its lang attribute blank, as in "unknown", which, as far as I understand it, fits the particular situation the standard contemplates.

My case is irrelevant, though. I can put up with doing it differently for that lone word.

P.S.: I actually found it strange that the HTML standard went with an empty value for signifying “Unknown”, since ISO 639-2 does have und for “Undetermined”, but that is a whole different scope.

@mattgarrish
Member

Definitely a bug. I was just curious what you ran into. Thanks.

@tofi86
Collaborator

tofi86 commented Jul 7, 2017

Matt, any chance you can fix the schema and make a pull request? Otherwise I'd take this in the next week or so...

@murata2makoto
Contributor

I did it. I will create a pull request.

@tofi86 tofi86 self-assigned this Jul 7, 2017
@tofi86 tofi86 added the status: has PR The issue is being processed in a pull request label Jul 7, 2017
@tofi86 tofi86 added this to the 4.1.0 milestone Jul 7, 2017
tofi86 added a commit to Advanced-Publishing-Laboratory/epubcheck that referenced this issue Jul 10, 2017
@tofi86 tofi86 closed this as completed in 392c2f6 Jul 10, 2017
@tofi86
Collaborator

tofi86 commented Jul 10, 2017

Fixed and merged in master.

@Hellsbutt

I'm trying to upload a book to the Play Store, but I keep getting this error. Please help me.

@mattgarrish
Member

Are you getting it from HTML, SVG or the package document?

It looks like the package document schema has the same problem, as it validates against xsd:language which doesn't allow for empty strings.
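
As a rough illustration of why `xsd:language` rejects the empty string, the sketch below approximates its pattern facet from XML Schema Part 2 with a regular expression. The helper names are hypothetical, and the relaxed check (language tag or empty string) is an assumption about the general shape of a fix, not the actual change made to the EpubCheck schemas.

```python
import re

# Approximation of the xsd:language pattern facet (XML Schema Part 2).
# It requires at least one letter, so "" can never match.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_xsd_language(value: str) -> bool:
    """Return True if the whole value matches the xsd:language pattern."""
    return XSD_LANGUAGE.fullmatch(value) is not None


def is_language_or_empty(value: str) -> bool:
    """Hypothetical relaxed check: accept a language tag or the empty string."""
    return value == "" or is_xsd_language(value)


strict_results = {tag: is_xsd_language(tag) for tag in ["en", "en-US", "zxx", ""]}
relaxed_results = {tag: is_language_or_empty(tag) for tag in ["en", ""]}
```

With the strict check, only the empty string fails; the relaxed check accepts it alongside ordinary tags.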

@mattgarrish mattgarrish reopened this Dec 8, 2020
@rdeltour rdeltour modified the milestones: 4.1.0, v4.2.5 Feb 26, 2021
@rdeltour rdeltour self-assigned this Feb 26, 2021
rdeltour added a commit that referenced this issue Feb 26, 2021
An empty `xml:lang` attribute can explicitly indicate that the content
does not inherit the context language.

Fixes #777
@clapierre

clapierre commented Apr 19, 2021

I believe we no longer detect a missing xml:lang="en" attribute at all in content documents since you relaxed the xml:lang="" check.

Ace no longer reports an error when we have

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en">

I would expect that to flag an error that we also require xml:lang="en" as well as lang="en"

@mattgarrish
Member

I would expect that to flag an error that we also require xml:lang="en" as well as lang="en"

I'm not sure this is the right place for reporting this, as it's valid as far as epubcheck is concerned.

It's best practice to define both attributes because xhtml content documents may get served up as text/html, but it's not a requirement. The only requirement is that the languages match if both attributes are specified.
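
The rule described here (either attribute alone is fine, but the two must agree when both are present) could be sketched as follows. The function name is hypothetical, and the case-insensitive comparison via `str.lower()` is an assumption that approximates how HTML compares the two values; it is not EpubCheck's implementation.

```python
from typing import Optional


def lang_attributes_consistent(lang: Optional[str], xml_lang: Optional[str]) -> bool:
    """Return True unless both attributes are present and disagree.

    None stands for an absent attribute. An absent attribute is never an
    error in this sketch, matching the point that declaring both is best
    practice rather than a requirement.
    """
    if lang is None or xml_lang is None:
        return True
    # Assumption: compare case-insensitively, so "en-US" and "en-us" agree.
    return lang.lower() == xml_lang.lower()


only_lang = lang_attributes_consistent("en", None)      # lang alone
both_match = lang_attributes_consistent("en-US", "en-us")
both_differ = lang_attributes_consistent("en", "fr")
```

Under this sketch, the `<html lang="en">` example from the earlier comment passes because `xml:lang` is simply absent.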

It's technically better to specify xml:lang for xhtml content documents because it's better for xml toolchains, but I don't believe it matters for browsers or reading systems which one you choose. See, for example, https://www.w3.org/International/questions/qa-html-language-declarations

7 participants