Allow publication of documents with HTTPS current/previous version links. #282

Closed
mikewest opened this issue Apr 25, 2016 · 42 comments

@mikewest
Member

As a tiny, tiny step towards #145, it would be lovely to allow publication of documents whose "This version:" and "Previous version:" links are HTTPS as opposed to HTTP.

@mikewest
Member Author

/cc @plehegar @deniak @tripu

mikewest added a commit to w3c/webappsec-csp that referenced this issue Apr 25, 2016
@mikewest
Member Author

Ping? ~2 weeks ago, @plehegar suggested the following in #pub:

"""
the way I think things at the moment is two possibilities:
- either we require https
- or we simply forbid http and let authors decide whether to use absolute or relative
"""

Implementing either of those suggestions would preclude the current behavior of outright rejecting links to https://www.w3.org/. Would it be possible to at least remove that restriction while we discuss whether or not to mandate some other behavior? :)
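The two options quoted above differ only in how they treat scheme-less (relative) links; both accept an https://www.w3.org/ link, which is what this issue asks for. A minimal Python sketch of the two checks, assuming they run on the raw href of each "This version"/"Previous version" link; the function names are hypothetical and this is not pubrules/specberus code:

```python
from urllib.parse import urlsplit

# Hypothetical link checks for the two options quoted above; neither is the
# real pubrules/specberus implementation.


def option_require_https(href: str) -> bool:
    """Option 1: the link must be an absolute https URL."""
    return urlsplit(href).scheme == "https"


def option_forbid_http(href: str) -> bool:
    """Option 2: reject http, accept https or a relative (scheme-less) link."""
    return urlsplit(href).scheme in ("https", "")


if __name__ == "__main__":
    for href in ("http://www.w3.org/TR/html51/",
                 "https://www.w3.org/TR/html51/",
                 "/TR/html51/"):
        print(href, option_require_https(href), option_forbid_http(href))
```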

@plehegar
Member

We're making progress but still have some issues: we're looking at the implications for tr.rdf and the W3C API. I think we can do 1 or 2 as I proposed on IRC, but before we can do it, we have to figure out how to represent that in the data, not just the document; otherwise we'll break tools like specref.

@plehegar
Member

plehegar commented Apr 27, 2016

Let's imagine that HTML 5.1 gets published in June, July and August 2016
and that we do the switch for https-only on July 1, 2016. The current idea
is to obtain the following:

HTML document published in /TR

June 2016 document:

 This version:
   http://www.w3.org/TR/2016/WD-html51-20160602/
 Previous Version:
   http://www.w3.org/TR/2016/WD-html51-20160502/
 Latest version:
   http://www.w3.org/TR/html51/

July 2016 document:

 This version:
   https://www.w3.org/TR/2016/WD-html51-20160717/
 Previous version:
   https://www.w3.org/TR/2016/WD-html51-20160602/ (or http://...)
 Latest version:
   https://www.w3.org/TR/html51/

August 2016 document:

 This version:
   https://www.w3.org/TR/2016/WD-html51-20160817/
 Previous Version:
   https://www.w3.org/TR/2016/WD-html51-20160717/
 Latest version:
   https://www.w3.org/TR/html51/
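The rule behind these three examples: a dated URI keeps the scheme in force on its publication date, so "Previous version" stays http:// until both versions postdate the switch (for July, the sketch picks the http:// form, as in the tr.rdf representation that follows). A minimal Python sketch, assuming the July 1, 2016 switch date and the /TR/YYYY/STATUS-shortname-YYYYMMDD/ pattern from the example; `header_links` and the other names are hypothetical:

```python
from datetime import date
from typing import Dict, Optional, Tuple

# Assumption: the switch date used in the example above.
SWITCH_DATE = date(2016, 7, 1)


def scheme_for(published: date) -> str:
    """A dated URI keeps the scheme in force when it was first published."""
    return "https" if published >= SWITCH_DATE else "http"


def dated_uri(status: str, shortname: str, published: date) -> str:
    return (f"{scheme_for(published)}://www.w3.org/TR/{published.year}/"
            f"{status}-{shortname}-{published:%Y%m%d}/")


def header_links(shortname: str, status: str, published: date,
                 previous: Optional[Tuple[str, date]] = None) -> Dict[str, str]:
    """Header links for one publication; `previous` is (status, date)."""
    links = {
        "This version": dated_uri(status, shortname, published),
        # The undated "Latest version" link follows the current publication.
        "Latest version": f"{scheme_for(published)}://www.w3.org/TR/{shortname}/",
    }
    if previous is not None:
        prev_status, prev_date = previous
        links["Previous version"] = dated_uri(prev_status, shortname, prev_date)
    return links


july = header_links("html51", "WD", date(2016, 7, 17), ("WD", date(2016, 6, 2)))
for label, uri in july.items():
    print(f"{label}: {uri}")
```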

tr.rdf representation of those documents:

   <!-- W3C server redirects from http to https, so let's represent this
     equivalence here -->
   <rdf:Description rdf:about="http://www.w3.org/TR/html51/">
     <sameWorkAs rdf:resource="https://www.w3.org/TR/html51/"/>
   </rdf:Description>

   <WD rdf:about="http://www.w3.org/TR/2016/WD-html51-20160602/">
     <dc:date>2016-06-02</dc:date>
     <dc:title>HTML 5.1</dc:title>
     <doc:obsoletes rdf:resource="http://www.w3.org/TR/2016/WD-html51-20160502/"/>
     <doc:versionOf rdf:resource="http://www.w3.org/TR/html51/"/>
     <patentRules rdf:resource="https://www.w3.org/Consortium/Patent-Policy-20040205/"/>
   </WD>

   <WD rdf:about="https://www.w3.org/TR/2016/WD-html51-20160717/">
     <dc:date>2016-07-17</dc:date>
     <dc:title>HTML 5.1</dc:title>
     <!-- we force http:// for all *dated* URIs published before the switch -->
     <doc:obsoletes rdf:resource="http://www.w3.org/TR/2016/WD-html51-20160602/"/>
     <doc:versionOf rdf:resource="https://www.w3.org/TR/html51/"/>
     <patentRules rdf:resource="https://www.w3.org/Consortium/Patent-Policy-20040205/"/>
   </WD>

   <WD rdf:about="https://www.w3.org/TR/2016/WD-html51-20160817/">
     <dc:date>2016-08-17</dc:date>
     <dc:title>HTML 5.1</dc:title>
     <doc:obsoletes rdf:resource="https://www.w3.org/TR/2016/WD-html51-20160717/"/>
     <doc:versionOf rdf:resource="https://www.w3.org/TR/html51/"/>
     <patentRules rdf:resource="https://www.w3.org/Consortium/Patent-Policy-20040205/"/>
   </WD>
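A consumer can use the sameWorkAs statements to fold the http and https forms of a URI together. A minimal Python sketch using the standard-library ElementTree parser; the namespace URI bound to the unprefixed sameWorkAs element is an assumption here (check the xmlns declarations at the top of the real tr.rdf), and `same_work_aliases` and `resolve` are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: namespace of the unprefixed sameWorkAs element in tr.rdf.
REC = "http://www.w3.org/2001/02pd/rec54#"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{REC}">
  <rdf:Description rdf:about="http://www.w3.org/TR/html51/">
    <sameWorkAs rdf:resource="https://www.w3.org/TR/html51/"/>
  </rdf:Description>
</rdf:RDF>"""


def same_work_aliases(rdf_xml: str) -> dict:
    """Map each rdf:about URI to the URI it declares as sameWorkAs."""
    root = ET.fromstring(rdf_xml)
    aliases = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        for same in desc.findall(f"{{{REC}}}sameWorkAs"):
            aliases[about] = same.get(f"{{{RDF}}}resource")
    return aliases


def resolve(uri: str, aliases: dict) -> str:
    """Return the declared equivalent of `uri`, or `uri` itself."""
    return aliases.get(uri, uri)


print(resolve("http://www.w3.org/TR/html51/", same_work_aliases(SAMPLE)))
```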

w3c api:

Before the switch:

     "_embedded": {
         "version-history": [
             {
                 "status": "Working Draft",
                 "uri": "http:\/\/www.w3.org\/TR\/2016\/WD-html51-20160602\/",
                 "date": "2016-06-02",
                 "informative": false,
                 "title": "HTML 5.1",
                 "shortlink": "http:\/\/www.w3.org\/TR\/html51\/",
                 "editor-draft": "https:\/\/w3c.github.io\/html\/",
                 "process-rules": "http:\/\/www.w3.org\/2015\/Process-20150901\/"
             }, ... ]

After the switch: we can't represent the equivalence between
http://www.w3.org/TR/html51/ and https://www.w3.org/TR/html51/, so we
choose to use https. This could break things, but the API isn't widely
used yet, so we might get away with it. The most critical risk is that
we might break pheme (and consequently ash-nazg...).

     "_embedded": {
         "version-history": [
             {
                 "status": "Working Draft",
                 "uri": "http:\/\/www.w3.org\/TR\/2016\/WD-html51-20160602\/",
                 "date": "2016-06-02",
                 "title": "HTML 5.1",
                 "shortlink": "https:\/\/www.w3.org\/TR\/html51\/",
                 "process-rules": "https:\/\/www.w3.org\/2015\/Process-20150901\/"
             },
             {
                 "status": "Working Draft",
                 "uri": "https:\/\/www.w3.org\/TR\/2016\/WD-html51-20160717\/",
                 "date": "2016-07-17",
                 "title": "HTML 5.1",
                 "shortlink": "https:\/\/www.w3.org\/TR\/html51\/",
                 "process-rules": "https:\/\/www.w3.org\/2015\/Process-20150901\/"
             },
             {
                 "status": "Working Draft",
                 "uri": "https:\/\/www.w3.org\/TR\/2016\/WD-html51-20160817\/",
                 "date": "2016-08-17",
                 "title": "HTML 5.1",
                 "shortlink": "https:\/\/www.w3.org\/TR\/html51\/",
                 "process-rules": "https:\/\/www.w3.org\/2015\/Process-20150901\/"
             }, ... ]
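The post-switch entries follow a simple rule: each dated "uri" keeps the scheme it was published with, while the undated "shortlink" and the "process-rules" link are always given as https. A minimal Python sketch of building one such entry; `version_entry` and `to_https` are hypothetical helpers, not part of the W3C API code. Python's `json.dumps` does not escape `/` as `\/`, but both forms decode to the same string.

```python
import json
from datetime import date


def to_https(url: str) -> str:
    """Rewrite an http:// URL to https://; other URLs are returned unchanged."""
    return "https://" + url[len("http://"):] if url.startswith("http://") else url


def version_entry(status, uri, published, title, shortlink, process_rules):
    """One post-switch version-history entry, shaped like the example above."""
    return {
        "status": status,
        "uri": uri,  # dated URI: keep the scheme it was published with
        "date": published.isoformat(),
        "title": title,
        "shortlink": to_https(shortlink),
        "process-rules": to_https(process_rules),
    }


entry = version_entry("Working Draft",
                      "http://www.w3.org/TR/2016/WD-html51-20160602/",
                      date(2016, 6, 2), "HTML 5.1",
                      "http://www.w3.org/TR/html51/",
                      "http://www.w3.org/2015/Process-20150901/")
print(json.dumps(entry, indent=2))
```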

@plehegar plehegar self-assigned this Apr 27, 2016
@plehegar
Member

plehegar commented May 4, 2016

I circulated the proposal internally and didn't get pushback. So, my next step is to send the proposal to spec-prod and chairs. I ought to get to this this week. If you don't see anything from me, feel free to poke me in the eye until I do it.

@plehegar
Member

plehegar commented May 9, 2016

Proposal was sent to spec-prod:
https://lists.w3.org/Archives/Public/spec-prod/2016AprJun/0024.html

@marcoscaceres
Member

ReSpec has a PR waiting to support this.

@mikewest
Member Author

mikewest commented Jun 2, 2016

@plehegar: Can you update the status here? I see that there's been some discussion on that thread, but the conclusion isn't at all clear to me.

This blocked @estark37 from publishing a draft earlier in the week, and I can only imagine that it's biting other folks as well, given that HTTPS is fairly accepted as best practice.

@marcoscaceres
Member

I'm also waiting on an OK from @plehegar to merge in ReSpec: I'm worried that if I merge, Echidna and PubRules will cry because of the new URLs.

@plehegar
Member

plehegar commented Jun 2, 2016

I need to catch up with Tobie. As far as we know, he is the long pole at the moment in making the switch. If we can get specref.org to digest the https switch, then we're good to go. If we can't, doing the switch means that we would break specref, and subsequently respec and bikeshed. See also
https://lists.w3.org/Archives/Public/spec-prod/2016AprJun/0028.html

@mikewest
Member Author

mikewest commented Jun 2, 2016

@tobie: What can we do to help out? Has anything happened since your comments on that thread a month ago?

@plehegar
Member

plehegar commented Jun 2, 2016

To be fair to @tobie, I didn't follow up well enough with him. Partly that's because I got confused about what he was asking for. As far as I know, the only way to create data before the switch is to create fake data, and he was pushing back on that idea.

@tobie
Member

tobie commented Jun 2, 2016

My initial request was for a grace period during which both https and http would coexist so as to give me enough time to figure out what impact the change had.

If that's impossible to provide, the other option is to run a frozen version of Specref until I figure out the consequences of these changes.

That latter solution does imply, however, that Specref will be out of date for a while. And given I work on this in my free time, and will be OoO for part of the summer, I can't really commit to a date at which I can do the transition.

Frankly, specref code isn't the highest quality software available, and while the transition could literally take less than 2 hours, it might also be much more painful than that; I had to backpedal out of a PR recently because it broke things in weird and unexpected ways. I wouldn't be surprised to find lots of similar issues here.

@marcoscaceres
Member

If it would be of help, I'm available next week to assist in whatever way I can. I don't fully understand the problem at this point, but happy to set time aside to see if we can come up with a strategy or try some things.



@tobie
Member

tobie commented Jun 2, 2016

I don't fully understand the problem at this point,

Me neither. That's precisely the problem.

@plehegar
Member

plehegar commented Jun 2, 2016

which part of the problem is obscure? I tried to be as precise as possible in how the change would be implemented. I'm happy to clarify things if it helps.

@tobie
Member

tobie commented Jun 2, 2016

which part of the problem is obscure?

How Specref will react to these changes is what's obscure.

@plehegar
Member

So, do we know what to do here? As far as I can tell, W3C can either switch to https and break specref.org (and bikeshed and respec as a consequence), or delay the switch until a solution for specref.org is found...

@tobie
Member

tobie commented Jun 14, 2016

Well, as I said, if the preferred solution of publishing the new (https-aware) rdf file in parallel with the old one isn't possible, the next best solution is for me to freeze specref updates until I figure out how to handle the new rdf format.

@tobie
Member

tobie commented Jun 14, 2016

What I need for this 2nd solution is a schedule + ideally an on-call person on your side so we can fix possible rdf bugs quickly.

@plehegar
Member

plehegar commented Jun 14, 2016

[Tobie and I caught up on irc]

Here is the solution to do the switch sooner rather than later:
1- provide a separate tr rdf file, like tr-http.rdf or something. It won't be authoritative but will allow specref.org to keep working until Tobie finds time to upgrade it. The sooner W3C can provide this, the better.
2- on July 1st, do the https switch in tr.rdf as documented previously.

If we can't do the solution above, we're probably looking at moving the switch to August 1st instead.
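Step 1 amounts to keeping a legacy, http-only view of the data alongside the authoritative file. A minimal Python sketch, assuming the legacy copy can be produced by a plain textual downgrade of www.w3.org/TR/ URIs (the real file may well be generated differently); `legacy_http_copy` is a hypothetical helper:

```python
HTTPS_TR = "https://www.w3.org/TR/"
HTTP_TR = "http://www.w3.org/TR/"


def legacy_http_copy(rdf_text: str) -> str:
    """Downgrade /TR/ URIs so tools that index http:// URIs keep matching.

    Other https links (for example patentRules) are left untouched."""
    return rdf_text.replace(HTTPS_TR, HTTP_TR)


sample = ('<WD rdf:about="https://www.w3.org/TR/2016/WD-html51-20160717/">\n'
          '  <patentRules rdf:resource='
          '"https://www.w3.org/Consortium/Patent-Policy-20040205/"/>\n'
          '</WD>')
print(legacy_http_copy(sample))
```

A plain string replacement also rewrites the https side of any sameWorkAs statements, which is harmless for a non-authoritative copy but worth knowing.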

@tobie
Member

tobie commented Jun 17, 2016

Any news? /cc @deniak

@deniak
Member

deniak commented Jun 17, 2016

@tobie I will provide the new rdf at the beginning of next week. I talked to @plehegar and we are targeting Aug 1st for the switch.

@tobie
Member

tobie commented Jun 17, 2016

Sounds good. By new RDF, do you mean the one containing https links or the current one on a new URL?

@deniak
Member

deniak commented Jun 20, 2016

@tobie you can find the experimental rdf at https://www.w3.org/2002/01/tr-automation/tr-https.rdf. For testing purposes, I only moved the specs published after May 31 to https.

@deniak
Member

deniak commented Jun 20, 2016

Moving the earlier email discussion to GitHub:

On 06/20/2016 04:25 PM, Tobie Langel wrote:

Thanks a bunch. See inline comments.

On Mon, 20 Jun 2016, at 12:54, Denis Ah-Kang wrote:

Tobie, Philippe,

As agreed [1], here's the experimental rdf:
https://www.w3.org/2002/01/tr-automation/tr-https.rdf

For testing purposes, I tweaked the code so all the documents published
after May 31 are under https (eg.
https://www.w3.org/TR/2016/WD-audio-output-20160601/).
A few things you need to pay attention to:

  • the previous versions are under http as they were previously
    published with http. However, the next versions will have the
    doc:Obsoletes under https.

Is this going to change? I.e., are all links going to be https when you
flip the bit?

No, after the switch, the WG will first have to republish their specs
with:

  • the "This Version" under https
  • the "Previous Version" under http (because it was already published
    this way)

Then, the next versions of the spec will have both links under https.

This comes from a request from the semweb guys who want to be able to
dereference the URIs and they see http://www.w3.org and
https://www.w3.org as 2 different URIs.

  • the latest versions are under https
  • the WGs' homepage links are the ones from
    http://www.w3.org/Member/Mail. Some WGs still have their homepage under
    http. If the WG updates its homepage to https, it'll be replicated
    in the rdf.
  • patentRules, errata, translations should be under https
  • starting from line 21043, you will see the sameWorkAs between
    the http and https versions of each spec.

That rdf is here only for testing purposes; that's why I'm not
sharing that link publicly (on GitHub), as I will drop it later, but
feel free to share with the people who could be interested in it.

This is such a weird way of doing things. Why not do this in the open
and state that this link will change, or just 301 it to the main file
after the transition period?

You have a point. I was just being too lazy and wanted to avoid adding
one more redirect. OK to share publicly.

It's also not complete because we don't have any specs having a previous
version with https and unless we have fake data, it's not possible to
represent that use case yet.
Let me know if there's any problem.

I'll probably have more questions once I start working on this.

Thanks again,

--tobie

@tobie
Member

tobie commented Jun 20, 2016

Thanks for moving the conversation to a public forum. Much appreciated.

the previous versions are under http as they were previously
Is this going to change? I.e., are all links going to be https when you
flip the bit?

No, after the switch, the WG will first have to republish their specs
with:

  • the "This Version" under https
  • the "Previous Version" under http (because it was already published this way)

Then, the next versions of the spec will have both links under https.

This comes from a request from the semweb guys who want to be able to
dereference the URIs and they see http://www.w3.org and
https://www.w3.org as 2 different URIs.

OK. I'm not sure I'm following.

Couple of questions:

  1. Will https://xxx -301-> http://xxx?
  2. Will previous specs be available both over http and https?
  3. Will newer specs be available only over https and not over http?

@tobie
Member

tobie commented Jun 20, 2016

For context, here, I take a pragmatic approach to this issue, not a (technically pure) semweb one.

As far as I'm concerned, the content on both http and https versions of the spec should be the same (with the possible exception of internal links which might have matching protocols for absolute URLs). So Specref internals will consider it as such (that's what I have to fix, basically, to make sure that URL-based comparisons are protocol-agnostic).

Ultimately, I'd like Specref to expose https URLs for all W3C specs regardless of when said specs will have been published.
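The protocol-agnostic comparison described above can be done by normalizing URLs before comparing them. A minimal Python sketch, assuming only www.w3.org URLs are treated as scheme-agnostic and that https is the preferred form to expose; `canonical` and `same_spec` are hypothetical names, not Specref's actual code:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only these hosts serve identical content over http and https.
SCHEME_AGNOSTIC_HOSTS = {"www.w3.org"}


def canonical(url: str) -> str:
    """Rewrite http/https URLs on scheme-agnostic hosts to https."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https") and parts.hostname in SCHEME_AGNOSTIC_HOSTS:
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def same_spec(a: str, b: str) -> bool:
    """Compare two spec URLs, ignoring the scheme where that is safe."""
    return canonical(a) == canonical(b)


print(same_spec("http://www.w3.org/TR/html51/", "https://www.w3.org/TR/html51/"))
```

Limiting the rewrite to an allow-list of hosts keeps the comparison conservative: other hosts are not assumed to serve the same content over both schemes.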

I feel like this publish-date-based roll-out of https is strange and potentially prone to creating issues over a substantial amount of time. So I have two follow-up questions:

  1. Are http://www.w3.org/2002/01/tr-automation/tr.rdf and https://www.w3.org/2002/01/tr-automation/tr.rdf going to serve the same content or is the latter going to have all links be https?
  2. Is the JSON API moving towards all https or going through this same, publish-date-based system?

@deniak
Member

deniak commented Jun 20, 2016

To answer your questions, all the specs under /TR are and will remain available on http and https. This is the case since our recent HSTS support. You will be redirected only if your browser supports HSTS/CSP.

Now, the problem is how to represent the data and make everyone happy. We cannot simply update all the URIs to https even if that's what the user actually sees if he's using a recent browser. Take a look at that blog post @philarcher1 wrote to explain why we can't just update the scheme.

@tobie
Member

tobie commented Jun 20, 2016

I'm kind of confused.

@philarcher1's post you link above explicitly says W3C is treating resources on www.w3.org as identical regardless of their protocol:

Firstly, is the community agreed that if two URIs differ only in the scheme (http://, https:// and perhaps whatever comes in future) then they identify the same resource? We believe that this can only be asserted by the domain owner. In the specific case of http://www.w3.org/* we do make that assertion.

So I guess I don't understand why you're not switching all of the URLs within the APIs to https. (Note I'm not suggesting you actually modify the specs themselves.)

@tobie
Member

tobie commented Jun 20, 2016

To answer your questions, all the specs under /TR are and will remain available on http and https. This is the case since our recent HSTS support.

Note this link requires a member account.

@deniak
Member

deniak commented Jun 20, 2016

The blog post suggests to keep using http:

In short, keep writing “http:” and trust that the infrastructure will quietly switch over to TLS (https) whenever both client and server can handle it. Meanwhile, let’s try to get SemWeb software to be doing TLS+UIR+HSTS and be as secure as modern browsers.

On the other hand, some editors are pushing towards https (the topic of this issue), so we need to find a way to make both worlds happy.

@deniak
Member

deniak commented Jun 20, 2016

To answer your questions, all the specs under /TR are and will remain available on http and https. This is the case since our recent HSTS support.

Note this link requires a member account.

Oops, sorry. You can find the public post here: https://www.w3.org/blog/news/archives/5263

@tobie
Member

tobie commented Jun 20, 2016

On the other hand, some editors are pushing towards https (topic of that issue) so we need to find a way to make both worlds happy.

So on one hand, you have people arguing to use http and have the server/browser combo upgrade to https if they can, and on the other, you have people pushing for https everywhere. I get how that's an unfortunate position to be in. But I'm not sure how your solution to let https trickle down is going to make both worlds happy (I'm pretty sure it's going to piss off everyone instead 😃 ).

@tobie
Member

tobie commented Jun 20, 2016

Just rolled out a brute-force solution to Specref (tobie/specref#286). Let's see if it works or if I have to roll everything back.

If it does, you can consider the Specref blocker resolved.

@tobie
Member

tobie commented Jun 21, 2016

OK, Specref seems to be working properly so far serving only https links for W3C specs, so we should pretty much be good on the Specref side at this point.

@marcoscaceres
Member

Woot woot!

@philarcher

From my POV, the key thing is that the original http URIs still work.
As long as that's true, we should be OK. It doesn't prevent making new
links to https://...

The underlying issue is persistence of identifiers. No doubt this
conversation will be repeated in future when https is replaced by
something else, so today's solution shouldn't apply specifically to
today's technology.

Thanks for taking this seriously,

Phil.


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

@mikewest
Member Author

Thank you @tobie!

@deniak, @plehegar, et al: When you remove the requirement that "This version" and "Previous Version" links be insecure, could you also allow HTTPS for the following:

I can also file separate bugs for those, if that would be helpful. :)

@deniak
Member

deniak commented Jun 21, 2016

For the 3 points above, this is in line with the proposal. We will even make https a requirement.

Regarding the WG homepage link, it's a bit tricky. Specberus checks that link against what's listed on https://www.w3.org/Member/Mail. As you can see, some WGs still have their homepage under http.
Short term solution is to update the rule and look for the homepage under both http and https.
In the future, it would be better to update all the WG homepages to https.
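The short-term rule amounts to a scheme-insensitive lookup. A minimal Python sketch, assuming the homepages listed on https://www.w3.org/Member/Mail have already been collected into a set; `homepage_matches` and the example homepage URL are hypothetical, not specberus code:

```python
from urllib.parse import urlsplit


def homepage_matches(link: str, listed_homepages: set) -> bool:
    """Accept the WG homepage link if it matches a listed homepage under
    either http or https."""
    scheme = urlsplit(link).scheme
    if scheme not in ("http", "https") or "://" not in link:
        return False
    rest = link.split("://", 1)[1]
    return any(f"{s}://{rest}" in listed_homepages for s in ("http", "https"))


listed = {"http://www.w3.org/example-wg/"}  # hypothetical listing
print(homepage_matches("https://www.w3.org/example-wg/", listed))
```

Once every WG homepage is listed under https, the lookup can be tightened to accept only the https form.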

@deniak
Member

deniak commented Aug 1, 2016

Starting from Aug 1, 2016, specberus requires https links.

@deniak deniak closed this as completed Aug 1, 2016
@mikewest
Member Author

mikewest commented Aug 1, 2016

Thank you. :)

ryandel8834 added a commit to ryandel8834/WebAppSec-CSP that referenced this issue Aug 13, 2022

6 participants