Cleaned version of 20200823_q branch. Changes the behaviour of the q … #139
…tag (when m17n and Unicode are configured) to use "smart" quotes if the display charset can handle them. Falls back to old behaviour (ASCII quotes with left/right quote semantics for 6/0 and 2/6) if display charset is us-ascii. Also changes the behaviour of conv_entity() to convert left/right quotes and some dashes because named entities are needed for the new code for the q tag.
Tests q2-q6 failed.
Should the condition be |
Let me double-check why q2–q6 fail. Thanks. If Firefox and Chrome do the equivalent of |
I also tested this q7.html:
<!doctype html>
<meta charset=utf-8>
<q>foo <q>bar <q>example of a <q>nested</q> quote</q> baz</q> qux</q>
With
On Debian, firefox-esr 68.12.0esr-1 and chromium 83.0.4103.116-3 |
q2–q6 are failing on my end too. I somehow introduced an error without noticing it. Sorry. I’ll work on finding and fixing it. Firefox is probably language-aware. I’ve thought about that too, but I wasn’t sure how to go about detecting the language. For example, in French we should use « “ ” » and not “ ‘ ’ ”, and in British English people would expect ‘ “ ” ’ and not “ ‘ ’ ”. In Chinese and Japanese, 「『』」 would be correct, not “ ‘ ’ ” (except in zh-CN, which I understand follows English rules). |
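To make the per-language conventions listed above concrete, here is a minimal sketch of a lookup table in Python (not w3m's C code, and not part of this patch). The table contents come from the comment above; the `quotes_for` name, the `en-gb` tag for British English, and the prefix-stripping fallback are assumptions made for illustration. It does not handle the spacing French needs around guillemets.

```python
# Each entry is (outer pair, inner pair). American English is the default.
AMERICAN = (("“", "”"), ("‘", "’"))

# Hypothetical table built from the conventions named in the discussion.
QUOTES = {
    "fr": (("«", "»"), ("“", "”")),
    "en-gb": (("‘", "’"), ("“", "”")),
    "ja": (("「", "」"), ("『", "』")),
    "zh": (("「", "」"), ("『", "』")),
    "zh-cn": AMERICAN,  # said above to follow English rules
}


def quotes_for(lang: str):
    """Return (outer, inner) quote pairs for a language tag such as 'fr-CA'."""
    tag = lang.lower()
    while tag:
        if tag in QUOTES:
            return QUOTES[tag]
        # Drop the last subtag: 'zh-tw' -> 'zh', 'fr' -> ''.
        tag = tag.rpartition("-")[0]
    return AMERICAN
```

With this table, `quotes_for("zh-TW")` and `quotes_for("zh-hant")` both fall back to the `zh` entry, while `quotes_for("zh-CN")` matches its own entry first.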
I’ve checked my reference books (Bringhurst, Chicago, Ramat, and a small reference called Detail in Typography) and I could not find a clear guideline that quotation marks should alternate. So if you prefer, we can change the test condition, but I think alternating makes reading less confusing if we do get three or more levels of q. What do you think? (Bringhurst = a standard reference in North America for designers; Chicago = a standard reference in North America for English copy editors; Ramat = a standard reference in Canada for French designers and copy editors) |
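For readers following the alternation question, a minimal sketch of "alternate by nesting depth" in Python may help. It is an illustration only, not the patch's C code; `render_q` and `SMART` are hypothetical names, and the function only understands bare `<q>` and `</q>` tokens with no attributes.

```python
import re

# Curly quote pairs, outermost level first (American English order).
SMART = [("\u201c", "\u201d"), ("\u2018", "\u2019")]


def render_q(html: str, pairs=SMART) -> str:
    """Replace <q> and </q> with quote marks that alternate by nesting depth."""
    depth = 0
    out = []
    # The capturing group keeps the <q> and </q> tokens in the split result.
    for token in re.split(r"(</?q>)", html):
        if token == "<q>":
            out.append(pairs[depth % len(pairs)][0])
            depth += 1
        elif token == "</q>":
            depth = max(depth - 1, 0)  # tolerate a stray </q>
            out.append(pairs[depth % len(pairs)][1])
        else:
            out.append(token)
    return "".join(out)
```

Applied to the q7.html body quoted earlier, this yields “foo ‘bar “example of a ‘nested’ quote” baz’ qux”, so the third level reuses the double quotes and the fourth reuses the single quotes.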
More about language-aware quoting: Suppose we had a test case that looked like
(or zh-TW, or zh-HK, or zh-hant) we should get
but if we have something like
we should get
(I just tested this on Firefox 80.0. It renders the middle line as «test», which is wrong: in French there should always be a thin or non-breaking space after « and before »; see the table on p. 191 of Ramat 2012.)
I think my major sticking point is that I don’t know how to get “the lang attribute of the closest enclosing block that has a lang attribute”. I also don’t think we’re looking at all tags. A while ago I was told that something like this hypothetical example would be valid HTML5, but there’d be no way for us to handle it because this basically means the set of valid HTML5 tags is infinite:
|
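One possible way to find "the lang attribute of the closest enclosing block that has a lang attribute" is to keep a stack of open elements and let each one inherit the language of its parent. The sketch below does that with Python's standard `html.parser`; it is an assumption-laden illustration, not a proposal for w3m's C parser. `LangTracker`, `q_langs`, and the made-up `x-note` tag in the usage note are hypothetical, and a `lang` on the `q` element itself is treated as taking priority.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Record the effective language of every <q> element encountered."""

    def __init__(self, default: str = ""):
        super().__init__()
        self.stack = [("", default)]  # (tag name, effective lang); root entry
        self.q_langs = []

    def handle_starttag(self, tag, attrs):
        # Use the element's own lang if present, else inherit from the parent.
        lang = dict(attrs).get("lang") or self.stack[-1][1]
        if tag == "q":
            self.q_langs.append(lang)
        if tag not in VOID:
            self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, never past the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
```

Because the stack stores whatever tag name the parser reports, it does not need a fixed list of known tags. Feeding it `<div lang="fr"><x-note lang="ja"><q>b</q></x-note><q>c</q></div>` records `ja` for the first `q` and `fr` for the second.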
I see. I normally use LANG=ja_JP.UTF-8, so Firefox 80.0 used brackets. Anyway, only English-style support is acceptable to me for w3m. The current fix looks OK. I don't have a strong opinion about three levels or more. |
Great. We can revisit the language issue in the future if I manage to figure out how to pull it off. |
Merged. Thanks for your contribution. |
Proposed patch to change the behaviour of the q tag (when m17n and Unicode are configured) to use "smart" quotes if the display charset can handle them. Falls back to old behaviour (ASCII quotes with left/right quote semantics for 6/0 and 2/6) if display charset is us-ascii.
The patch also changes the behaviour of conv_entity() to convert left/right quotes and some dashes because named entities are needed for the new code for the q tag.
A few test cases are included.
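The fallback idea described above (use the real character when the display charset can represent it, otherwise an ASCII stand-in) can be sketched as follows. This is a Python illustration under stated assumptions, not the patched `conv_entity()`: the `entity_text` name and the `ASCII_FALLBACK` table are hypothetical, and the particular ASCII substitutes are guesses rather than what the patch emits.

```python
from html.entities import name2codepoint

# Hypothetical ASCII stand-ins; the real patch may choose differently.
ASCII_FALLBACK = {
    "ldquo": "`", "rdquo": "'",
    "lsquo": "`", "rsquo": "'",
    "ndash": "-", "mdash": "--",
}


def entity_text(name: str, charset: str) -> str:
    """Return the character for a named entity if charset can encode it."""
    ch = chr(name2codepoint[name])
    try:
        ch.encode(charset)
        return ch
    except UnicodeEncodeError:
        # The display charset cannot represent the character.
        return ASCII_FALLBACK.get(name, "?")
```

For example, `entity_text("ldquo", "utf-8")` returns the curly left double quote, while `entity_text("ldquo", "us-ascii")` returns the stand-in from the table.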