Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Windows drive letter handling in the file slash state #343

Merged
merged 5 commits into from
Sep 18, 2017

Conversation

rmisev
Copy link
Member

@rmisev rmisev commented Sep 11, 2017

  1. Introduces term: "a string starts with Windows drive letter". It simplifies the file state and solves the remaining problem (remaining variable ambiguity #308).
  2. Fixes Drive letters get duplicated when resolving Windows file: URL with base #303.

Preview | Diff

rmisev added a commit to rmisev/web-platform-tests that referenced this pull request Sep 11, 2017
Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just editorial feedback. I haven't reviewed the new behavior in detail. Did you look at various implementations?

url.bs Outdated
<a>Windows drive letter</a>.

<li><p>it contains two code points or the third code point is U+002F (/), U+005C (\), U+003F (?),
or U+0023 (#).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its length (xref Infra) is two*

url.bs Outdated

<ul>
<li><p>it contains at least two code points and the first two code points are
<a>Windows drive letter</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its length is greater than 1*

I'd give the second requirement its own bullet point since we already AND the whole list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, no bullet at the end here since the sentence isn't finished

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a Windows drive letter*

url.bs Outdated
@@ -1068,6 +1068,33 @@ code point is U+003A (:).
<p class="note">As per the <a href=#url-writing>URL writing</a> section, only a
<a>normalized Windows drive letter</a> is conforming.

<p>A string <dfn>starts with Windows drive letter</dfn> if all of the following are true:

<ul>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to use <ul class=brief> here without <p>s to make it more of a single paragraph.

url.bs Outdated
<table>
<tr>
<th>String
<th>Result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "Starts with a Windows drive letter" instead of "Result" to make it more clear?

url.bs Outdated
@@ -1068,6 +1068,33 @@ code point is U+003A (:).
<p class="note">As per the <a href=#url-writing>URL writing</a> section, only a
<a>normalized Windows drive letter</a> is conforming.

<p>A string <dfn>starts with Windows drive letter</dfn> if all of the following are true:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a Windows drive letter*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, to allow alternative spelling later on use lt="start with a Windows drive letter|starts with a Windows drive letter" here.

url.bs Outdated
</ul>

<p>then set <var>url</var>'s <a for=url>host</a> to <var>base</var>'s <a for=url>host</a>,
<p>If the substring from <var>pointer</var> in the <var>input</var> does not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for "the" in front of input I think. Do we need to say the substring from pointer onward? We also haven't really defined substring. Not sure if that's problematic.

url.bs Outdated
@@ -1878,7 +1897,8 @@ string <var>input</var>, optionally with a <a>base URL</a> <var>base</var>, opti
<ol>
<li>
<p>If <var>base</var> is non-null and <var>base</var>'s <a for=url>scheme</a> is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ and/,/

url.bs Outdated
@@ -1878,7 +1897,8 @@ string <var>input</var>, optionally with a <a>base URL</a> <var>base</var>, opti
<ol>
<li>
<p>If <var>base</var> is non-null and <var>base</var>'s <a for=url>scheme</a> is
"<code>file</code>", then:
"<code>file</code>" and the substring from <var>pointer</var> in the <var>input</var> does
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ and/, and/
s/in the/in/

@rmisev
Copy link
Member Author

rmisev commented Sep 13, 2017

Just editorial feedback. I haven't reviewed the new behavior in detail. Did you look at various implementations?

I tested this new behavior on patched https://github.com/jsdom/whatwg-url and it passes all the tests.
I can prepare PR.

Do we need to say the substring from pointer onward? We also haven't really defined substring. Not sure if that's problematic.

Yes. A substring isn't defined in this spec., but it's already used to define remaining.

Yet another approach is not to use a substring here, for example I created such function for testing:

const fileOtherwiseCodePoints = new Set([p("/"), p("\\"), p("?"), p("#")]);
 
function startsWithWindowsDriveLetter(input, pointer) {
  const length = input.length - pointer;
  return length >= 2 &&
    isWindowsDriveLetterCodePoints(input[pointer], input[pointer + 1]) &&
    (length === 2 || fileOtherwiseCodePoints.has(input[pointer + 2]));
}

@domenic
Copy link
Member

domenic commented Sep 13, 2017

My guess is @annevk was especially curious about browser implementations, although of course keeping jsdom/whatwg-url up to date is always appreciated.

@rmisev
Copy link
Member Author

rmisev commented Sep 13, 2017

I tested by hand (https://quuz.org/url/liveview2.html) the Chrome 63.0.3213.3 for Windows - it passes all four tests and Safari 10.1.2 - also passes the tests.

Other browser's results from https://travis-ci.org/w3c/web-platform-tests/builds/274332313

INFO: /home/travis/build/w3c/web-platform-tests/tools/ci/check_stability

Firefox Nightly:

Subtest Results Messages
Parsing: </c:/foo/bar> against <file:///c:/baz/qux> PASS
Parsing: </c|/foo/bar> against <file:///c:/baz/qux> PASS
Parsing: <file:\c:\foo\bar> against <file:///c:/baz/qux> PASS
Parsing: </c:/foo/bar> against <file://host/path> PASS

Chrome Unstable (google-chrome-unstable_current_amd64.deb)

Subtest Results Messages
Parsing: </c:/foo/bar> against <file:///c:/baz/qux> PASS
Parsing: </c|/foo/bar> against <file:///c:/baz/qux> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file:///c%7C/foo/bar"
Parsing: <file:\c:\foo\bar> against <file:///c:/baz/qux> PASS
Parsing: </c:/foo/bar> against <file://host/path> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file://host/c:/foo/bar"

Microsoft Edge 14.14393:

Subtest Results Messages
Parsing: </c:/foo/bar> against <file:///c:/baz/qux> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file:///c:/c:/foo/bar"
Parsing: </c|/foo/bar> against <file:///c:/baz/qux> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file:///c:/c%7C/foo/bar"
Parsing: <file:\c:\foo\bar> against <file:///c:/baz/qux> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file:///c:/c:/foo/bar"
Parsing: </c:/foo/bar> against <file://host/path> FAIL assert_equals: href expected "file:///c:/foo/bar" but got "file://host/c:/foo/bar"

@annevk
Copy link
Member

annevk commented Sep 13, 2017

Okay, I guess it's fine, but it seems a little weird to decide against Edge for behavior Microsoft created at some point. Reminds me of XMLHttpRequest.

@annevk
Copy link
Member

annevk commented Sep 13, 2017

I filed whatwg/infra#152 as a follow-up for the substring usage. No changes needed here.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone else want to review this? I'll probably merge this Monday (please ping me if forgot) if there are no further concerns.

@domenic
Copy link
Member

domenic commented Sep 13, 2017

Since @rmisev has done the tests and jsdom/whatwg-url work, my usual reviewing function is done, so I'm happy to have this without my review.

Copy link
Member

@GPHemsley GPHemsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial thoughts.

<td>❌
</table>
</div>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of using symbols to represent the values of the boolean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one is preferred: yes/no or true/false?

cc @annevk

url.bs Outdated
<ul class=brief>
<li>its <a for=string>length</a> is greater than or equal to 2
<li>its first two code points are a <a>Windows drive letter</a>
<li>its <a for=string>length</a> is 2 or the third code point is U+002F (/), U+005C (\),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say "its" instead of "the" for consistency.

(Though I feel like this whole phrasing is clunky somehow...)

<var>url</var>'s <a for=url>path</a> to a copy of <var>base</var>'s <a for=url>path</a>,
and then <a>shorten</a> <var>url</var>'s <a for=url>path</a>.
<p>If the substring from <var>pointer</var> in <var>input</var> does not
<a>start with a Windows drive letter</a>, then set <var>url</var>'s <a for=url>host</a> to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The substring from pointer in input" seems incomplete to me. (I assume this is intended to imply that the substring runs until the end of the string?)

What do we gain by removing reference to remaining?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the substring runs until the end of the string like in the JavaScript when indexEnd is omitted.
@annevk opened the issue whatwg/infra#152 to define substring usage.

"<code>file</code>", then:
<p>If <var>base</var> is non-null, <var>base</var>'s <a for=url>scheme</a> is
"<code>file</code>", and the substring from <var>pointer</var> in <var>input</var> does not
<a>start with a Windows drive letter</a>, then:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we gain by removing reference to remaining?

The simpler check, that there are at least two code points:

  • without remaining: substr.length >= 2
  • with remaining: c != EOF && remaining.length >= 1

annevk pushed a commit to web-platform-tests/wpt that referenced this pull request Sep 18, 2017
@annevk annevk merged commit 2eef975 into whatwg:master Sep 18, 2017
@annevk
Copy link
Member

annevk commented Sep 18, 2017

I missed the new comments by @GPHemsley, sorry. I think they're addressed to my satisfaction though. I think using emojis for true/false is reasonable in examples. We do that here and there.

@rmisev rmisev deleted the file-slash-state-fix branch September 18, 2017 11:25
bcoe pushed a commit to bcoe/node-1 that referenced this pull request Sep 21, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail, see:
whatwg/url#343
BridgeAR pushed a commit to nodejs/node that referenced this pull request Sep 24, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: #15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
domenic pushed a commit to jsdom/whatwg-url that referenced this pull request Sep 25, 2017
jasnell pushed a commit to nodejs/node that referenced this pull request Sep 25, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: #15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
addaleax pushed a commit to addaleax/ayo that referenced this pull request Sep 30, 2017
Address issue with Windows drive letter handling that was
causing es-module test suite to fail.

PR-URL: nodejs/node#15490
Ref: whatwg/url#343
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
rachelandrew pushed a commit to rachelandrew/web-platform-tests that referenced this pull request Nov 8, 2017
jakearchibald pushed a commit to jakearchibald/web-platform-tests that referenced this pull request Nov 16, 2017
rmisev added a commit to upa-url/upa that referenced this pull request May 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Drive letters get duplicated when resolving Windows file: URL with base
4 participants