Cleaned version of 20200823_q branch. Changes the behaviour of the q … #139

acli · 2020-08-24T02:26:35Z

Proposed patch to change the behaviour of the q tag (when m17n and Unicode are configured) to use "smart" quotes if the display charset can handle them. Falls back to the old behaviour (ASCII quotes, with 6/0 and 2/7 given left/right quote semantics) if the display charset is us-ascii.

The patch also changes the behaviour of conv_entity() to convert left/right quotes and some dashes because named entities are needed for the new code for the q tag.

A few test cases are included.
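The fallback described above can be pictured as a small lookup. This is only a sketch, not the code in the patch: `quote_entity_text` and its table are hypothetical names, and the ASCII choice for the double-quote entities is an assumption (the description only spells out the single-quote pair).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: entity name, UTF-8 text, ASCII fallback. */
struct quote_entity {
    const char *name;   /* entity name without '&' and ';' */
    const char *utf8;   /* used when the display charset can show it */
    const char *ascii;  /* used when the display charset is us-ascii */
};

static const struct quote_entity quote_entities[] = {
    { "lsquo", "\xe2\x80\x98", "`"  },  /* U+2018, falls back to 6/0 */
    { "rsquo", "\xe2\x80\x99", "'"  },  /* U+2019, falls back to 2/7 */
    { "ldquo", "\xe2\x80\x9c", "\"" },  /* U+201C, fallback is an assumption */
    { "rdquo", "\xe2\x80\x9d", "\"" },  /* U+201D, fallback is an assumption */
};

/* Return the text for a quote entity, or NULL if the name is not a
 * quote entity known to this sketch. */
static const char *quote_entity_text(const char *name, int ascii_only)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(quote_entities) / sizeof(quote_entities[0]); i++) {
        if (strcmp(name, quote_entities[i].name) == 0)
            return ascii_only ? quote_entities[i].ascii
                              : quote_entities[i].utf8;
    }
    return NULL;
}
```

Calling `quote_entity_text("lsquo", 1)` yields the grave accent, while passing `0` as the second argument yields the UTF-8 bytes of the left single quotation mark.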


tats · 2020-08-29T06:36:44Z

Tests q2-q6 failed.

+           HTMLlineproc1((obuf->q_level & 1 ? "&ldquo;": "&lsquo;"), h_env);
+           HTMLlineproc1((obuf->q_level & 1 ? "&rdquo;": "&rsquo;"), h_env);

Should the condition be obuf->q_level == 0, to be compatible with Firefox and Chrome?

acli · 2020-08-29T08:13:28Z

Let me double-check why q2–q6 fail. Thanks.

If Firefox and Chrome do the equivalent of obuf->q_level == 0 then they are wrong and I’ll file a bug report with Firefox (and maybe Chrome, but I don’t use Chrome). Normal typographic practice is to alternate between single and double quotes; I’ll find some references to back me up tomorrow, it’s already 4am in my timezone.

tats · 2020-08-29T11:37:21Z

--- -	2020-08-29 19:34:48.921878094 +0900
+++ q2.expected	2020-08-29 13:54:04.808471611 +0900
@@ -1 +1 @@
-‘test’
+“test”
--- -	2020-08-29 19:34:48.974691204 +0900
+++ q5.expected	2020-08-29 13:54:04.808471611 +0900
@@ -1 +1 @@
-‘example of a “nested” quote’
+“example of a ‘nested’ quote”

I also tested this q7.html

<!doctype html>
<meta charset=utf-8>
<q>foo <q>bar <q>example of a <q>nested</q> quote</q> baz</q> qux</q>

$ ../w3m q7.html | cat
‘foo “bar ‘example of a “nested” quote’ baz” qux’

With obuf->q_level == 0:

$ ../w3m q7.html | cat
“foo ‘bar ‘example of a ‘nested’ quote’ baz’ qux”
$ ../w3m q5.html | cat
“example of a ‘nested’ quote”
$ ../w3m q2.html | cat
“test”

On Debian, firefox-esr 68.12.0esr-1 and chromium 83.0.4103.116-3
seem to behave like obuf->q_level == 0. firefox 80.0-1 seems to
use brackets (「」『』) instead of quotes (“”‘’).
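For comparison, the two nesting policies discussed in this thread can be modelled outside w3m. The sketch below is not w3m code: `render_quotes`, `wants_double`, and the `[` / `]` markers standing in for `<q>` / `</q>` are all hypothetical, and depth 0 is taken to be the outermost quotation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum q_policy {
    Q_ALTERNATE,       /* double, single, double, single, ... */
    Q_OUTERMOST_ONLY   /* double at depth 0, single everywhere below */
};

/* depth 0 is the outermost quotation. */
static int wants_double(enum q_policy policy, int depth)
{
    if (policy == Q_ALTERNATE)
        return depth % 2 == 0;
    return depth == 0;
}

/* Copy src to out, replacing '[' and ']' with opening and closing
 * quotation marks (UTF-8). Returns 0 on success, -1 if out is too
 * small or the markers are unbalanced. */
static int render_quotes(enum q_policy policy, const char *src,
                         char *out, size_t outsz)
{
    size_t used = 0;
    int depth = 0;

    assert(src != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    for (; *src != '\0'; src++) {
        char literal[2] = { *src, '\0' };
        const char *piece = literal;
        size_t len;

        if (*src == '[') {
            piece = wants_double(policy, depth) ? "\xe2\x80\x9c"
                                                : "\xe2\x80\x98";
            depth++;
        } else if (*src == ']') {
            if (depth == 0)
                return -1;
            depth--;
            piece = wants_double(policy, depth) ? "\xe2\x80\x9d"
                                                : "\xe2\x80\x99";
        }
        len = strlen(piece);
        if (used + len + 1 > outsz)   /* keep room for the terminator */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return depth == 0 ? 0 : -1;
}
```

With `Q_OUTERMOST_ONLY`, the q7 input nests as double, single, single, single, which is the second output shown above; with `Q_ALTERNATE`, the third level switches back to double quotes.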

acli · 2020-08-29T18:07:42Z

q2–q6 are failing on my end too. I somehow introduced an error without noticing it. Sorry. I’ll work on finding the error and fixing it.

Firefox is probably language-aware. I’ve thought about that too, but I wasn’t sure how I’d go about detecting language. For example, in French we should use « “ ” » and not “ ‘ ’ ”, and in British English people would expect ‘ “ ” ’ and not “ ‘ ’ ”. In Chinese and Japanese 「『』」 would be correct, not “ ‘ ’ ” (except in zh-CN, which I understand follows English rules).
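If the language were known, the choice of marks could come from a table keyed by language tag. The following is a rough sketch built only from the examples in this comment; `quote_style`, `find_quote_style`, and the longest-prefix matching rule are assumptions, not anything w3m provides, and spacing rules are out of scope.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encodings of the marks mentioned in the comment. */
#define LSQUO  "\xe2\x80\x98"  /* U+2018 */
#define RSQUO  "\xe2\x80\x99"  /* U+2019 */
#define LDQUO  "\xe2\x80\x9c"  /* U+201C */
#define RDQUO  "\xe2\x80\x9d"  /* U+201D */
#define LAQUO  "\xc2\xab"      /* U+00AB */
#define RAQUO  "\xc2\xbb"      /* U+00BB */
#define LCORN  "\xe3\x80\x8c"  /* U+300C */
#define RCORN  "\xe3\x80\x8d"  /* U+300D */
#define LWCORN "\xe3\x80\x8e"  /* U+300E */
#define RWCORN "\xe3\x80\x8f"  /* U+300F */

struct quote_style {
    const char *lang;      /* language tag prefix */
    const char *open[2];   /* [0] outer level, [1] nested level */
    const char *close[2];
};

/* English-style marks, used when nothing more specific matches. */
static const struct quote_style default_style =
    { "", { LDQUO, LSQUO }, { RDQUO, RSQUO } };

static const struct quote_style quote_styles[] = {
    { "en-GB", { LSQUO, LDQUO },  { RSQUO, RDQUO }  },
    { "fr",    { LAQUO, LDQUO },  { RAQUO, RDQUO }  },
    { "ja",    { LCORN, LWCORN }, { RCORN, RWCORN } },
    { "zh",    { LCORN, LWCORN }, { RCORN, RWCORN } },
    { "zh-CN", { LDQUO, LSQUO },  { RDQUO, RSQUO }  },
};

/* Case-insensitive: tag equals prefix, or continues with '-'. */
static int lang_matches(const char *tag, const char *prefix)
{
    while (*prefix != '\0') {
        if (tolower((unsigned char)*tag) != tolower((unsigned char)*prefix))
            return 0;
        tag++;
        prefix++;
    }
    return *tag == '\0' || *tag == '-';
}

/* Pick the entry with the longest matching prefix, else the default. */
static const struct quote_style *find_quote_style(const char *lang)
{
    const struct quote_style *best = &default_style;
    size_t best_len = 0, i;

    if (lang == NULL)
        return best;
    for (i = 0; i < sizeof(quote_styles) / sizeof(quote_styles[0]); i++) {
        size_t len = strlen(quote_styles[i].lang);
        if (len > best_len && lang_matches(lang, quote_styles[i].lang)) {
            best = &quote_styles[i];
            best_len = len;
        }
    }
    return best;
}
```

Longest-prefix matching lets `zh-CN` override the generic `zh` entry while `zh-TW` and `zh-HK` still pick up the corner brackets.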

acli · 2020-08-29T18:41:50Z

I’ve checked my reference books (Bringhurst, Chicago, Ramat, and a small reference called Detail in Typography) and I could not find a clear guideline that quotation marks should alternate. So I think if you prefer we can change the test condition, but I think alternating makes reading less confusing in case we do get three levels of q or more. What do you think?

(Bringhurst = a standard reference in North America for designers; Chicago = a standard reference in North America for English copy editors; Ramat = a standard reference in Canada for French designers and copy editors)

acli · 2020-08-29T19:53:43Z

More about language-aware quoting:

Suppose we had a test case that looked like

<body lang=ja><q>test<q>test</q>test</q></body>

(or zh-TW, or zh-HK, or zh-hant) we should get

 「test『test』test」

but if we have something like

<body lang=ja><q>test</q><p lang=fr><q>test</q><p><q>test</q></body>

we should get

「test」

« test »

「test」

(I just tested this on Firefox 80.0. They render the middle line as «test». They are wrong. In French there should always be a thin or non-breaking space after « and before » and I can cite the table on p.191 in Ramat 2012.)

I think my major sticking point is I don’t know how to get “the lang attribute of the closest enclosing block that has a lang attribute”. I also don’t think we’re looking at all tags. A while ago I was told that something like this hypothetical example would be valid HTML5 but there’d be no way for us to handle it because this basically means the set of valid HTML5 tags is infinite:

<!doctype html>
<meta charset=utf-8>
<element-i-made-up lang=en-AU>test<q>this should be enclosed in single quotes</q></element-i-made-up>
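One possible way to track “the lang attribute of the closest enclosing block that has a lang attribute” is a stack with one slot per open element. This is a hedged sketch only: `lang_stack` and its functions are hypothetical, and it assumes the parser reports every start and end tag (including implied ends and elements it does not recognise), which may not hold in w3m.

```c
#include <assert.h>
#include <stddef.h>

#define LANG_STACK_MAX 64

/* One slot per open element; NULL means "no lang attribute here". */
struct lang_stack {
    const char *lang[LANG_STACK_MAX];
    int depth;   /* keeps counting past LANG_STACK_MAX so pops balance */
};

/* Call on every start tag; lang is the attribute value or NULL. */
static void lang_push(struct lang_stack *s, const char *lang)
{
    assert(s != NULL);
    if (s->depth < LANG_STACK_MAX)
        s->lang[s->depth] = lang;
    s->depth++;
}

/* Call on every end tag (explicit or implied). */
static void lang_pop(struct lang_stack *s)
{
    assert(s != NULL);
    if (s->depth > 0)
        s->depth--;
}

/* Language of the closest enclosing element that set one, or NULL. */
static const char *lang_current(const struct lang_stack *s)
{
    int i;

    assert(s != NULL);
    i = s->depth < LANG_STACK_MAX ? s->depth : LANG_STACK_MAX;
    while (i-- > 0) {
        if (s->lang[i] != NULL)
            return s->lang[i];
    }
    return NULL;
}
```

Because the stack never looks at element names, a made-up element with a lang attribute would be handled like any other, as long as its start and end tags reach `lang_push` and `lang_pop`.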

tats · 2020-08-29T21:43:59Z

> Firefox is probably language-aware.

I see. I normally use LANG=ja_JP.UTF-8, so Firefox 80.0 used brackets.
When I set LC_ALL=C, Firefox 80.0 uses quotes. Even when LANG=ja_JP.UTF-8,
Firefox 80.0 uses quotes if <html lang="en">.

Anyway, only English-style support is acceptable to me for w3m.

The current fix looks OK. I don't have a strong opinion about three levels or more.

acli · 2020-08-29T22:03:11Z

Great. We can revisit the language issue in the future if I manage to figure out how to pull it off.

tats · 2020-08-30T00:57:14Z

Merged. Thanks for your contribution.

Somehow the wrong quotes were used. This should fix the failing tests.

b9488ff

tats merged commit 8c164d8 into tats:master Aug 30, 2020
