Specify (in the model) the encoding #557

Closed
iherman opened this issue May 15, 2015 · 11 comments · Fixed by #581

Comments

@iherman
Member

iherman commented May 15, 2015

The current text in the model does not specify that the metadata and the content must be in UTF-8. It may be better to make this explicit.

(This came up via issue #551)

@JeniT

JeniT commented May 20, 2015

UTF-8 is an encoding of Unicode characters in a byte stream. The model is an abstract model. There shouldn't be any necessity to talk about encoding in an abstract model.

There is obviously a need to talk about the encoding of Unicode characters within CSV files, but that's handled in section 7.2 Encoding and by using the encoding flag when parsing.
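To illustrate the distinction drawn above, here is a minimal Python sketch (the sample string and the choice of Latin-1 as the second encoding are assumptions for illustration only). The same abstract sequence of Unicode characters has different byte representations, and the encoding only matters at the point where bytes are read or written:

```python
# The abstract value: a sequence of Unicode characters, with no encoding attached.
text = "café"

# Two different byte-stream serialisations of the same characters.
utf8_bytes = text.encode("utf-8")      # "é" becomes two bytes
latin1_bytes = text.encode("latin-1")  # "é" becomes one byte

# The byte streams differ ...
assert utf8_bytes != latin1_bytes
assert len(utf8_bytes) == 5 and len(latin1_bytes) == 4

# ... but decoding each with its own encoding yields the same abstract string.
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1") == text
```

Once the bytes are decoded, a model that operates on the resulting string has no need to know which encoding was used.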

@danbri
Contributor

danbri commented May 27, 2015

Q: What do we do about cases in which a widely used character set doesn't have a perfect 1:1 mapping of all its defined characters into Unicode characters?

Apparently Shift_JIS has this issue: https://support.microsoft.com/en-us/kb/170559

Do we discourage Shift_JIS or just lossily convert? @JeniT, in http://www.w3.org/2015/05/27-csvw-irc#T14-40-39, argues to replace, following https://encoding.spec.whatwg.org/#concept-encoding-process and https://encoding.spec.whatwg.org/#error-mode

@JeniT

JeniT commented May 27, 2015

We discussed this and resolved to specify that CSV files are parsed based on their encoding, according to the encoding spec in replacement mode, but that cell errors are added if the resulting cell string values contain a U+FFFD.

http://www.w3.org/2015/05/27-csvw-irc#T14-47-07
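The resolution above can be sketched roughly as follows. This is only an illustration under stated assumptions: `parse_csv_bytes` is a hypothetical helper, not part of any CSVW implementation, and Python's `errors="replace"` decoding stands in for the encoding spec's replacement error mode (the two are similar in spirit but not guaranteed to be identical for every encoding).

```python
import csv
import io

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def parse_csv_bytes(data: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: decode in replacement mode, then flag affected cells.

    Returns (rows, errors), where errors is a list of (row_index, column_index)
    pairs for every cell whose string value contains U+FFFD.
    """
    # Undecodable byte sequences become U+FFFD instead of raising an exception.
    text = data.decode(encoding, errors="replace")

    # newline="" lets the csv module handle line endings itself.
    rows = list(csv.reader(io.StringIO(text, newline="")))

    errors = [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if REPLACEMENT_CHAR in cell
    ]
    return rows, errors


# Bytes written as Latin-1 but parsed as UTF-8: the byte for "ü" is not valid UTF-8.
sample = "name,city\nAna,Z\u00fcrich\n".encode("latin-1")
rows, errors = parse_csv_bytes(sample, encoding="utf-8")
```

In this sketch parsing never fails outright; a mismatch between the declared and actual encoding surfaces as per-cell errors, so one bad cell does not prevent the rest of the table from being read.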

@6a6d74
Contributor

6a6d74 commented May 27, 2015

+1

@danbri
Contributor

danbri commented May 27, 2015

+1

@iherman
Member Author

iherman commented May 27, 2015

I would say +1, but there is the administrative issue of a normative reference: the WHATWG encoding spec is not, as far as I know, normatively referenceable. Can we get around that? Isn't it enough to say that we expect UTF-8, although the HTTP response may return other encodings?

@JeniT

JeniT commented May 27, 2015

This won't be a normative reference, because the only things we ever say about parsing are non-normative. We are already referencing the WHATWG encoding spec in a non-normative way.

@iherman
Member Author

iherman commented May 27, 2015

Phew :-) +1 from me, then!

Ivan



@gkellogg
Member

gkellogg commented Sep 2, 2015

@JeniT @iherman: just going through open branches, and saw that 27eaa52 on issue-557 branch was never merged. Did this get lost, or did we decide to do something different?

@JeniT

JeniT commented Sep 2, 2015

I think it got lost. Can you get it merged in please? (As above, this is non-normative so purely an editorial change.)

@gkellogg
Member

gkellogg commented Sep 2, 2015

Done.
