Specify (in the model) the encoding #557

iherman · 2015-05-15T07:47:47Z

The current text in the model does not specify that the metadata and the content must be in UTF-8. It may be better to make this explicit.

(This came up via issue #551)

JeniT · 2015-05-20T11:50:49Z

UTF-8 is an encoding of Unicode characters in a byte stream. The model is an abstract model. There shouldn't be any necessity to talk about encoding in an abstract model.

There is obviously a necessity to talk about the encoding of Unicode characters within CSV files, but that's handled within 7.2 Encoding and using the encoding flag when parsing.
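
To illustrate the distinction, here is a minimal Python sketch (the sample string is an arbitrary illustration, not taken from the spec). The same abstract sequence of Unicode characters produces different byte streams under different encodings, and only the byte level needs to know which encoding was used:

```python
# An abstract string: a sequence of Unicode characters, with no encoding attached.
text = "caf\u00e9"

# Two different byte-stream encodings of the same characters.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# The bytes differ, but decoding each with its own encoding
# recovers the same abstract string.
same_string = utf8_bytes.decode("utf-8") == latin1_bytes.decode("latin-1") == text
different_bytes = utf8_bytes != latin1_bytes
```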

danbri · 2015-05-27T14:41:19Z

Q: What do we do about cases in which a widely used character set doesn't have a perfect 1:1 mapping of all its defined characters into Unicode characters?

Apparently Shift_JIS has this issue: https://support.microsoft.com/en-us/kb/170559

Do we discourage Shift_JIS or just convert lossily? @JeniT (see http://www.w3.org/2015/05/27-csvw-irc#T14-40-39) argues for replacement, following https://encoding.spec.whatwg.org/#concept-encoding-process and https://encoding.spec.whatwg.org/#error-mode

JeniT · 2015-05-27T14:48:18Z

We discussed this and resolved to specify that CSV files are parsed based on their encoding, according to the encoding spec with the replacement error mode, but cell errors are added if the resulting cell string values contain a U+FFFD.

http://www.w3.org/2015/05/27-csvw-irc#T14-47-07
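
A rough sketch of what that resolution could look like in practice, in Python. This is only an illustration: the function name `parse_csv` and the `(row, column)` error format are made up here, and Python's `errors="replace"` codec handler stands in for the encoding spec's replacement error mode (it also substitutes U+FFFD, but it is not an implementation of the WHATWG algorithm).

```python
import csv
import io

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def parse_csv(data: bytes, encoding: str = "utf-8"):
    """Decode bytes in replacement mode, then flag cells containing U+FFFD.

    Hypothetical helper: returns (rows, errors), where errors is a list of
    1-based (row, column) positions of cells that contain U+FFFD.
    """
    # Undecodable byte sequences become U+FFFD instead of raising an exception.
    text = data.decode(encoding, errors="replace")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    errors = [
        (row_number, column_number)
        for row_number, row in enumerate(rows, start=1)
        for column_number, cell in enumerate(row, start=1)
        if REPLACEMENT_CHAR in cell
    ]
    return rows, errors


# 0xE9 is "é" in Latin-1 but is not valid UTF-8 at this position.
rows, errors = parse_csv(b"name,city\nJos\xe9,Paris\n")
clean_rows, clean_errors = parse_csv(b"a,b\n1,2\n")
```

Parsing carries on with the replacement character in place, and the affected cell is reported as an error rather than the whole file being rejected.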

6a6d74 · 2015-05-27T14:49:07Z

+1

danbri · 2015-05-27T14:49:15Z

+1

iherman · 2015-05-27T15:40:44Z

I would say +1, but there is the admin issue of a normative reference: the WHATWG encoding spec is not, as far as I know, normatively referenceable. Can we get around that? Isn't it enough to say that we expect UTF-8, although the HTTP response may return other encodings?

JeniT · 2015-05-27T16:16:34Z

This won't be a normative reference, because the only things we ever say about parsing are non-normative. We are already referencing the WHATWG encoding spec in a non-normative way.

iherman · 2015-05-27T17:32:26Z

Phew :-) +1 from me then!

Ivan

fixes #557

gkellogg · 2015-09-02T18:13:43Z

@JeniT @iherman: just going through open branches, and saw that 27eaa52 on issue-557 branch was never merged. Did this get lost, or did we decide to do something different?

JeniT · 2015-09-02T20:27:50Z

I think it got lost. Can you get it merged in please? (As above, this is non-normative so purely an editorial change.)

gkellogg · 2015-09-02T23:13:34Z

Done.

iherman added the Model Document, Metadata vocabulary document, Requires telcon discussion/decision, and For LCCR labels May 15, 2015

iherman mentioned this issue May 15, 2015

Exact nature of case-insensitive match in schema compatibility #551

Closed

JeniT removed the Requires telcon discussion/decision label Jun 3, 2015

JeniT self-assigned this Jun 3, 2015

JeniT added the Editorial label Jun 3, 2015

JeniT pushed a commit that referenced this issue Jun 3, 2015

added text re non-Unicode encodings

b5882a0

fixes #557

JeniT mentioned this issue Jun 3, 2015

added text re non-Unicode encodings #581

Merged

gkellogg closed this as completed in #581 Jun 3, 2015
