New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-normatively visualize the indexes #89
Conversation
Oops. I forgot to actually call my |
The reason Travis fails is because the visualization resources are not passing the HTML checker. Some are not NFC and some use code points U+0080 to U+009F. We'll either need to exclude them from HTML checking or address that somehow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple nits. Looks good to me overall. I wonder if @inexorabletash would also be willing to review.
@@ -68,6 +68,8 @@ if [ $BRANCH != "master" ] ; then | |||
> $BRANCH_DIR/index.html; | |||
cp *.txt $BRANCH_DIR/; | |||
cp *.json $BRANCH_DIR/; | |||
cp *.css $BRANCH_DIR/; | |||
python visualize.py $BRANCH_DIR/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it intentional to skip commit snapshots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I just didn't understand the purpose of that directory correctly.
@@ -18,6 +18,7 @@ Markup Shorthands: css off | |||
Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions | |||
</pre> | |||
|
|||
<style>@import 'visualization-colors.css';</style> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Tab we can use <link>
and Bikeshed will move it correctly in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
@@ -653,33 +654,66 @@ changed, so has the <a>index</a>. | |||
<var>code point</var> in <var>index</var>, or null if | |||
<var>code point</var> is not in <var>index</var>. | |||
|
|||
<div class=note id=visualization> | |||
<p>There is a non-normative visualization for each index other than index gb18030 ranges. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is inside a <div>
it needs to be indented by one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also xref the indexes here I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
|
||
<p>The legend for the visualizations is: | ||
<ul class=visualizationlegend> | ||
<li class=unmapped>Unmapped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have too much indentation. <div>
+ <ul>
means two spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
<tr> | ||
<td><dfn export>index Big5</dfn> | ||
<td><a href=index-big5.txt>index-big5.txt</a> | ||
<td><a href=big5.html>Visualization</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a title here saying "index Big5 visualization". Even better would be to make each link text unique somehow since that affects screen readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did the latter.
<td>This matches the KS X 1001 standard and the Unified Hangul Code, more | ||
commonly known together as Windows Codepage 949. | ||
commonly known together as Windows Codepage 949. This index covers the Hangul Syllables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/This index/It/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
Prepended with space.
Replaced with SVG. |
|
||
/* Common styling from standard.css */ | ||
h1 { | ||
color: ##3c790a; /* WHATWG Green */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double #
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed in the latest push.
# HTML prohibits C1 controls | ||
# TODO draw some fancy SVG hex inside the square | ||
return "<svg><rect x=1 y=1 width=14 height=14 stroke=black stroke-width=2 fill=none /></svg>" | ||
as_str = unichr(code_point) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get:
Traceback (most recent call last):
File "visualize.py", line 258, in <module>
format_index(name, row_length, lang, bmp, duplicates, byte_rule)
File "visualize.py", line 206, in format_index
out_file.write((u"<td class='%s %s%s%s' aria-label='%s'><dl><dt>%d<dd lang=%s>%s<dd>U+%04X</dl>" % ("contiguous" if contiguous else "discontiguous", classify(code_point), " duplicate" if duplicate else "", check_compatibility(code_point), aria(code_point, contiguous, duplicate), pointer, lang, format_code_point(code_point), code_point)).encode("utf-8"))
File "visualize.py", line 124, in format_code_point
as_str = unichr(code_point)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
...with Python 2.7.10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python's sad Unicode support strikes again. How did you get your copy of Python? Apple?
The script works on Python 2.7.12 shipped on Ubuntu 16.04. Debian and, therefore, Ubuntu ships wide Python. It seems that Travis also runs Ubuntu, so it should be OK there.
@annevk, Is making the script work on e.g. Apple-shipped Python a blocker for merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got it from Apple, but I don't really know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apple and some OS X package managers ship narrow builds. If the issue is literally just unichr
, then it is at least easily worked around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the issue is literally just
unichr
, then it is at least easily worked around.
Yeah, I guess tomorrow I'll be writing the wrapper (Unicode string for code point regardless of 16 vs. 32-bit units) that Python should have had the good sense to put in the standard library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now works with Apple-shipped Python, too. (Actually tested on a Mac and diffed the output with Ubuntu's wide Python output.)
Builds for me now for |
https://travis-ci.org/whatwg/encoding/builds/193654621 still reports PUA and NFC warnings. I think the NFC warnings we have to accept per conversation on IRC. Does the same go for PUA? It would be good to mention in the commit message what is expected there. |
One other problem I noticed is that the wrapping is not following 100 columns which is what we're starting to adopt everywhere (though no linebreaks inside "inline" elements). I'm happy to fix that once you consider this ready. |
As noted on IRC, yes. For example, on Ubuntu, showing PUA as PUA gives the insight that the PUA code points in the left part of the last row of gb18030 show up as radicals that fit reasonably between the non-PUA radical code points around them. Then one can go read Lunde and learn why this is so: the PUA radicals weren't mapped in Unicode at the time gb18030 was first specced.
I think this is ready. Thanks. |
Okay, I'll go fix that. Could you write a commit message that explains the change and the warnings? Just post it as a comment and I'll use it when squash and merging. |
Add visualizations for the indexes PUA and NFC validation warnings are expected. Exposing the PUA code points as PUA in HTML is useful for seeing what those code points map to (if to anything) in system fonts, which may give insights about their usage. Exposing singletons (compatibility ideographs and scientific units) without normalizing them is useful for having browser "Find" functionality match whatever you get as output of a converter you might be developing. |
Thanks @hsivonen, they look great and will surely help folks understand how these encodings work! |
May I suggest a WHATWG blog post explaining this space and the new visualizations to people? :) |
Closes #78.
r? @annevk