Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-normatively visualize the indexes #89

Merged
merged 12 commits into from Jan 20, 2017
Merged

Non-normatively visualize the indexes #89

merged 12 commits into from Jan 20, 2017

Conversation

hsivonen
Copy link
Member

Closes #78.

r? @annevk

@hsivonen
Copy link
Member Author

Oops. I forgot to actually call my aria() function.

@annevk
Copy link
Member

annevk commented Jan 17, 2017

The reason Travis fails is because the visualization resources are not passing the HTML checker. Some are not NFC and some use code points U+0080 to U+009F. We'll either need to exclude them from HTML checking or address that somehow.

annevk
annevk previously requested changes Jan 17, 2017
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple nits. Looks good to me overall. I wonder if @inexorabletash would also be willing to review.

@@ -68,6 +68,8 @@ if [ $BRANCH != "master" ] ; then
> $BRANCH_DIR/index.html;
cp *.txt $BRANCH_DIR/;
cp *.json $BRANCH_DIR/;
cp *.css $BRANCH_DIR/;
python visualize.py $BRANCH_DIR/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it intentional to skip commit snapshots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I just didn't understand the purpose of that directory correctly.

@@ -18,6 +18,7 @@ Markup Shorthands: css off
Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions
</pre>

<style>@import 'visualization-colors.css';</style>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Tab we can use <link> and Bikeshed will move it correctly in the output.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@@ -653,33 +654,66 @@ changed, so has the <a>index</a>.
<var>code point</var> in <var>index</var>, or null if
<var>code point</var> is not in <var>index</var>.

<div class=note id=visualization>
<p>There is a non-normative visualization for each index other than index gb18030 ranges.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is inside a <div> it needs to be indented by one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also xref the indexes here I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK


<p>The legend for the visualizations is:
<ul class=visualizationlegend>
<li class=unmapped>Unmapped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have too much indentation. <div> + <ul> means two spaces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

<tr>
<td><dfn export>index Big5</dfn>
<td><a href=index-big5.txt>index-big5.txt</a>
<td><a href=big5.html>Visualization</a>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a title here saying "index Big5 visualization". Even better would be to make each link text unique somehow since that affects screen readers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the latter.

<td>This matches the KS X 1001 standard and the Unified Hangul Code, more
commonly known together as Windows Codepage 949.
commonly known together as Windows Codepage 949. This index covers the Hangul Syllables
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/This index/It/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

@hsivonen
Copy link
Member Author

Some are not NFC

Prepended with space.

and some use code points U+0080 to U+009F.

Replaced with SVG.

@hsivonen hsivonen dismissed annevk’s stale review January 17, 2017 12:00

Addressed all review comments.


/* Common styling from standard.css */
h1 {
color: ##3c790a; /* WHATWG Green */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double #

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in the latest push.

# HTML prohibits C1 controls
# TODO draw some fancy SVG hex inside the square
return "<svg><rect x=1 y=1 width=14 height=14 stroke=black stroke-width=2 fill=none /></svg>"
as_str = unichr(code_point)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get:

Traceback (most recent call last):
  File "visualize.py", line 258, in <module>
    format_index(name, row_length, lang, bmp, duplicates, byte_rule)
  File "visualize.py", line 206, in format_index
    out_file.write((u"<td class='%s %s%s%s' aria-label='%s'><dl><dt>%d<dd lang=%s>%s<dd>U+%04X</dl>" % ("contiguous" if contiguous else "discontiguous", classify(code_point), " duplicate" if duplicate else "", check_compatibility(code_point), aria(code_point, contiguous, duplicate), pointer, lang, format_code_point(code_point), code_point)).encode("utf-8"))
  File "visualize.py", line 124, in format_code_point
    as_str = unichr(code_point)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

...with Python 2.7.10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python's sad Unicode support strikes again. How did you get your copy of Python? Apple?

The script works on Python 2.7.12 shipped on Ubuntu 16.04. Debian and, therefore, Ubuntu ships wide Python. It seems that Travis also runs Ubuntu, so it should be OK there.

@annevk, Is making the script work on e.g. Apple-shipped Python a blocker for merging?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got it from Apple, but I don't really know.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apple and some OS X package managers ship narrow builds. If the issue is literally just unichr, then it is at least easily worked around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the issue is literally just unichr, then it is at least easily worked around.

Yeah, I guess tomorrow I'll be writing the wrapper (Unicode string for code point regardless of 16 vs. 32-bit units) that Python should have had the good sense to put in the standard library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It now works with Apple-shipped Python, too. (Actually tested on a Mac and diffed the output with Ubuntu's wide Python output.)

@zcorpan
Copy link
Member

zcorpan commented Jan 18, 2017

Builds for me now for ./deploy.sh --local. 👍

@annevk
Copy link
Member

annevk commented Jan 20, 2017

https://travis-ci.org/whatwg/encoding/builds/193654621 still reports PUA and NFC warnings. I think the NFC warnings we have to accept per conversation on IRC. Does the same go for PUA? It would be good to mention in the commit message what is expected there.

@annevk
Copy link
Member

annevk commented Jan 20, 2017

One other problem I noticed is that the wrapping is not following 100 columns which is what we're starting to adopt everywhere (though no linebreaks inside "inline" elements). I'm happy to fix that once you consider this ready.

@hsivonen
Copy link
Member Author

Does the same go for PUA?

As noted on IRC, yes. For example, on Ubuntu, showing PUA as PUA gives the insight that the PUA code points in the left part of the last row of gb18030 show up as radicals that fit reasonably between the non-PUA radical code points around them. Then one can go read Lunde and learn why this is so: the PUA radicals weren't mapped in Unicode at the time gb18030 was first specced.

One other problem I noticed is that the wrapping is not following 100 columns which is what we're starting to adopt everywhere (though no linebreaks inside "inline" elements). I'm happy to fix that once you consider this ready.

I think this is ready. Thanks.

@annevk
Copy link
Member

annevk commented Jan 20, 2017

Okay, I'll go fix that. Could you write a commit message that explains the change and the warnings? Just post it as a comment and I'll use it when squash and merging.

@hsivonen
Copy link
Member Author

hsivonen commented Jan 20, 2017

Add visualizations for the indexes

PUA and NFC validation warnings are expected. Exposing the PUA code points as PUA in HTML is useful for seeing what those code points map to (if to anything) in system fonts, which may give insights about their usage. Exposing singletons (compatibility ideographs and scientific units) without normalizing them is useful for having browser "Find" functionality match whatever you get as output of a converter you might be developing.

@annevk annevk merged commit 7696b7a into master Jan 20, 2017
@annevk annevk deleted the visualization branch January 20, 2017 10:46
@annevk
Copy link
Member

annevk commented Jan 20, 2017

Thanks @hsivonen, they look great and will surely help folks understand how these encodings work!

@domenic
Copy link
Member

domenic commented Jan 20, 2017

May I suggest a WHATWG blog post explaining this space and the new visualizations to people? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants