Cleaned version of 20200823_q branch. Changes the behaviour of the q … #139

Merged
merged 2 commits into from
Aug 30, 2020

Conversation

acli
Contributor

@acli acli commented Aug 24, 2020

Proposed patch to change the behaviour of the q tag (when m17n and Unicode are configured) to use "smart" quotes if the display charset can handle them. Falls back to old behaviour (ASCII quotes with left/right quote semantics for 6/0 and 2/6) if display charset is us-ascii.

The patch also changes the behaviour of conv_entity() to convert left/right quotes and some dashes because named entities are needed for the new code for the q tag.

A few test cases are included.

@tats
Owner

tats commented Aug 29, 2020

Tests q2-q6 failed.

+           HTMLlineproc1((obuf->q_level & 1 ? "“": "‘"), h_env);
+           HTMLlineproc1((obuf->q_level & 1 ? "”": "’"), h_env);

Should the condition be obuf->q_level == 0, to be compatible with Firefox and Chrome?

@acli
Contributor Author

acli commented Aug 29, 2020

Let me double-check why q2–q6 fail. Thanks.

If Firefox and Chrome do the equivalent of obuf->q_level == 0 then they are wrong and I’ll file a bug report with Firefox (and maybe Chrome, but I don’t use Chrome). Normal typographic practice is to alternate between single and double quotes; I’ll find some references to back me up tomorrow, it’s already 4am in my timezone.

@tats
Owner

tats commented Aug 29, 2020

--- -	2020-08-29 19:34:48.921878094 +0900
+++ q2.expected	2020-08-29 13:54:04.808471611 +0900
@@ -1 +1 @@
-‘test’
+“test”
--- -	2020-08-29 19:34:48.974691204 +0900
+++ q5.expected	2020-08-29 13:54:04.808471611 +0900
@@ -1 +1 @@
-‘example of a “nested” quote’
+“example of a ‘nested’ quote”

I also tested this q7.html

<!doctype html>
<meta charset=utf-8>
<q>foo <q>bar <q>example of a <q>nested</q> quote</q> baz</q> qux</q>

$ ../w3m q7.html | cat
‘foo “bar ‘example of a “nested” quote’ baz” qux’

With obuf->q_level == 0:

$ ../w3m q7.html | cat
“foo ‘bar ‘example of a ‘nested’ quote’ baz’ qux”
$ ../w3m q5.html | cat
“example of a ‘nested’ quote”
$ ../w3m q2.html | cat
“test”

On Debian, firefox-esr 68.12.0esr-1 and chromium 83.0.4103.116-3
seem to behave like obuf->q_level == 0. firefox 80.0-1 seems to
use brackets (「」『』) instead of quotes (“”‘’).
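
To make the two candidate conditions easier to compare, here is a minimal, self-contained sketch. It is not the w3m code: `render_q`, `use_double`, `append`, and the `q_policy` enum are hypothetical names, the depth counter is assumed to be 0 for the outermost q, and only the literal tags `<q>` and `</q>` are recognised. `Q_ALTERNATE` assumes alternation that starts with double quotes at the outermost level; `Q_OUTER_ONLY` corresponds to the `q_level == 0` test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encodings of the four quotation marks. */
#define LDQUO "\xE2\x80\x9C" /* U+201C left double quotation mark */
#define RDQUO "\xE2\x80\x9D" /* U+201D right double quotation mark */
#define LSQUO "\xE2\x80\x98" /* U+2018 left single quotation mark */
#define RSQUO "\xE2\x80\x99" /* U+2019 right single quotation mark */

/* Hypothetical policies: alternate per level, or double only outermost. */
enum q_policy { Q_ALTERNATE, Q_OUTER_ONLY };

/* Nonzero when the q at this depth (0 = outermost) gets double quotes. */
static int use_double(enum q_policy policy, int depth)
{
    assert(depth >= 0);
    return policy == Q_ALTERNATE ? (depth % 2 == 0) : (depth == 0);
}

/* Append src to the NUL-terminated dst of capacity cap; 0 on overflow. */
static int append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst), n = strlen(src);
    if (used + n + 1 > cap)
        return 0;
    memcpy(dst + used, src, n + 1);
    return 1;
}

/* Replace "<q>" and "</q>" in html with quotation marks chosen by policy.
 * Returns 1 on success, 0 on overflow or unbalanced tags. */
static int render_q(const char *html, enum q_policy policy,
                    char *out, size_t cap)
{
    int depth = 0;

    if (cap == 0)
        return 0;
    out[0] = '\0';
    while (*html != '\0') {
        if (strncmp(html, "<q>", 3) == 0) {
            if (!append(out, cap, use_double(policy, depth) ? LDQUO : LSQUO))
                return 0;
            depth++;
            html += 3;
        } else if (strncmp(html, "</q>", 4) == 0) {
            if (depth == 0)
                return 0; /* closing tag without a matching open */
            depth--;
            if (!append(out, cap, use_double(policy, depth) ? RDQUO : RSQUO))
                return 0;
            html += 4;
        } else {
            char ch[2] = { *html++, '\0' };
            if (!append(out, cap, ch))
                return 0;
        }
    }
    return depth == 0;
}
```

Fed the q element from q7.html, `Q_OUTER_ONLY` yields the same string as the `obuf->q_level == 0` run above, while `Q_ALTERNATE` yields “foo ‘bar “example of a ‘nested’ quote” baz’ qux”. The two policies only differ from the third level of nesting onward.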

@acli
Contributor Author

acli commented Aug 29, 2020

q2–q6 are failing on my end too. I somehow introduced an error without noticing it. Sorry. I’ll work on finding the error and fixing it.

Firefox is probably language-aware. I’ve thought about that too, but I wasn’t sure how I’d go about detecting language. For example, in French we should use « “ ” » and not “ ‘ ’ ” and in British English people would expect ‘ “ ” ’ and not “ ‘ ’ ”. In Chinese and Japanese 「『』」 would be correct, not “ ‘ ’ ” (except in zh-CN, which I understand follows English rules)

@acli
Contributor Author

acli commented Aug 29, 2020

I’ve checked my reference books (Bringhurst, Chicago, Ramat, and a small reference called Detail in Typography) and I could not find a clear guideline that quotation marks should alternate. So I think if you prefer we can change the test condition, but I think alternating makes reading less confusing in case we do get three levels of q or more. What do you think?

(Bringhurst = a standard reference in North America for designers; Chicago = a standard reference in North America for English copy editors; Ramat = a standard reference in Canada for French designers and copy editors)

@acli
Contributor Author

acli commented Aug 29, 2020

More about language-aware quoting:

Suppose we had a test case that looked like

<body lang=ja><q>test<q>test</q>test</q></body>

(or zh-TW, or zh-HK, or zh-hant) we should get

 「test『test』test」

but if we have something like

<body lang=ja><q>test</q><p lang=fr><q>test</q><p><q>test</q></body>

we should get

「test」

« test »

「test」

(I just tested this on Firefox 80.0. They render the middle line as «test». They are wrong. In French there should always be a thin or non-breaking space after « and before » and I can cite the table on p.191 in Ramat 2012.)
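
A language-aware version could start from a small lookup table keyed on the lang value. The sketch below is only an illustration of the conventions listed in this thread. The names `quote_style`, `styles`, `fallback`, `tag_matches`, and `quotes_for_lang` are hypothetical, mapping "British English" to the tag en-GB is an assumption, a plain non-breaking space stands in for the French spacing, and the table is far from complete.

```c
#include <ctype.h>
#include <stddef.h>

#define NBSP  "\xC2\xA0"     /* U+00A0 no-break space */
#define LAQUO "\xC2\xAB"     /* U+00AB left guillemet */
#define RAQUO "\xC2\xBB"     /* U+00BB right guillemet */
#define LDQUO "\xE2\x80\x9C" /* U+201C */
#define RDQUO "\xE2\x80\x9D" /* U+201D */
#define LSQUO "\xE2\x80\x98" /* U+2018 */
#define RSQUO "\xE2\x80\x99" /* U+2019 */
#define LCORN "\xE3\x80\x8C" /* U+300C left corner bracket */
#define RCORN "\xE3\x80\x8D" /* U+300D right corner bracket */
#define LWCRN "\xE3\x80\x8E" /* U+300E left white corner bracket */
#define RWCRN "\xE3\x80\x8F" /* U+300F right white corner bracket */

struct quote_style {
    const char *tag;      /* lowercase language-tag prefix */
    const char *open[2];  /* [0] outermost q, [1] nested q */
    const char *close[2];
};

/* More specific tags come first so that zh-CN wins over zh. */
static const struct quote_style styles[] = {
    { "en-gb", { LSQUO, LDQUO },      { RSQUO, RDQUO } },
    { "zh-cn", { LDQUO, LSQUO },      { RDQUO, RSQUO } },
    { "fr",    { LAQUO NBSP, LDQUO }, { NBSP RAQUO, RDQUO } },
    { "ja",    { LCORN, LWCRN },      { RCORN, RWCRN } },
    { "zh",    { LCORN, LWCRN },      { RCORN, RWCRN } },
};

/* English-style default, used when no table entry applies. */
static const struct quote_style fallback =
    { "", { LDQUO, LSQUO }, { RDQUO, RSQUO } };

/* True when lang equals prefix or continues it with '-', ignoring case. */
static int tag_matches(const char *lang, const char *prefix)
{
    size_t i;

    for (i = 0; prefix[i] != '\0'; i++) {
        if (tolower((unsigned char)lang[i]) != prefix[i])
            return 0;
    }
    return lang[i] == '\0' || lang[i] == '-';
}

/* Look up the quote style for a lang attribute value; NULL means unset. */
static const struct quote_style *quotes_for_lang(const char *lang)
{
    size_t i;

    if (lang == NULL)
        return &fallback;
    for (i = 0; i < sizeof styles / sizeof styles[0]; i++) {
        if (tag_matches(lang, styles[i].tag))
            return &styles[i];
    }
    return &fallback;
}
```

With this table, zh-TW, zh-HK, and zh-hant all fall through to the zh entry, and any unlisted language gets the English-style default.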

I think my major sticking point is I don’t know how to get “the lang attribute of the closest enclosing block that has a lang attribute”. I also don’t think we’re looking at all tags. A while ago I was told that something like this hypothetical example would be valid HTML5 but there’d be no way for us to handle it because this basically means the set of valid HTML5 tags is infinite:

<!doctype html>
<meta charset=utf-8>
<element-i-made-up lang=en-AU>test<q>this should be enclosed in single quotes</q></element-i-made-up>
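
One possible way to find the closest enclosing lang, sketched under the assumption that the parser can call a hook for every start and end tag, including tags it does not otherwise recognise. Nothing here reflects how w3m tracks elements; `lang_stack`, `push_element`, `pop_element`, `effective_lang`, and `MAX_DEPTH` are hypothetical names. Because the stack stores only the lang value and never the tag name, a made-up element is handled like any other.

```c
#include <stddef.h>

#define MAX_DEPTH 64 /* arbitrary nesting limit for this sketch */

/* One slot per open element; NULL when the element has no lang attribute. */
struct lang_stack {
    const char *lang[MAX_DEPTH];
    int depth;
};

/* Record a start tag. Returns 0 when the nesting limit is reached. */
static int push_element(struct lang_stack *s, const char *lang)
{
    if (s->depth >= MAX_DEPTH)
        return 0;
    s->lang[s->depth++] = lang;
    return 1;
}

/* Record an end tag (explicit or implied). */
static void pop_element(struct lang_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}

/* Lang of the nearest open element that set one, or NULL if none did. */
static const char *effective_lang(const struct lang_stack *s)
{
    for (int i = s->depth - 1; i >= 0; i--) {
        if (s->lang[i] != NULL)
            return s->lang[i];
    }
    return NULL;
}
```

For the three-paragraph example earlier in this comment, the sequence would be: push body with "ja", push p with "fr", pop that p when the next p implicitly closes it, then push p with NULL. `effective_lang` then reports "ja", "fr", and "ja" again for the three q elements.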

@tats
Owner

tats commented Aug 29, 2020

> Firefox is probably language-aware.

I see. I normally use LANG=ja_JP.UTF-8, so Firefox 80.0 used brackets.
When I set LC_ALL=C, Firefox 80.0 uses quotes. Even when LANG=ja_JP.UTF-8,
Firefox 80.0 uses quotes if <html lang="en">.

Anyway, only English-style support is acceptable to me for w3m.

The current fix looks OK. I don't have a strong opinion about three levels or more.

@acli
Contributor Author

acli commented Aug 29, 2020

Great. We can revisit the language issue in the future if I manage to figure out how to pull it off.

@tats tats merged commit 8c164d8 into tats:master Aug 30, 2020
@tats
Owner

tats commented Aug 30, 2020

Merged. Thanks for your contribution.
