Link Breakage: Address that moving to HTTPS URLs breaks links wholesale #11

Closed
timbl opened this Issue Dec 26, 2014 · 20 comments

@timbl
Member

timbl commented Dec 26, 2014

The disappearance of web material and the rotting of links is itself a major problem. This finding currently encourages the wholesale breaking of links by moving from http: to https:, perhaps doing more damage to the web than any other change in its history.

The TAG can't publish this finding without addressing this issue. Things which need to be discussed could include:

  • Actually dropping the 's' (which was possibly originally a mistake) but using encryption for all HTTP;
  • Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;
  • Recommending that web client libraries find such information and treat it automatically;
  • etc.

Search engines, which trawl the whole web with and without the 's' anyway, will easily be able to figure out where the 's' makes no difference, but general application code and libraries won't unless it is codified.

@mikewest

Contributor

mikewest commented Dec 26, 2014

  • "Dropping the 's'" would have the very bad side-effect of making the origin of secure and insecure sites indistinguishable (e.g. the secure site's storage would be accessible to the insecure site). I don't believe that's a reasonable option on today's internet.
  • This is at least partially supported (by search engines) via <link rel='canonical' ...> for sites which choose to continue serving both insecure and secure versions. For sites which choose to serve only a secure version (which I suggest that the TAG recommend), this can be done explicitly via permanent redirects, right?
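A minimal sketch of the explicit permanent redirect mentioned above, using a hypothetical `upgrade_redirect` helper (308 additionally preserves the request method):

```python
from urllib.parse import urlsplit, urlunsplit

def upgrade_redirect(url, preserve_method=False):
    """Build the permanent redirect a server could send for a legacy
    http: URL. Returns (status, headers), or None if the URL is not
    an http: URL. 308 also preserves the request method."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None  # already secure (or not http:): nothing to do
    location = urlunsplit(("https",) + tuple(parts[1:]))
    return (308 if preserve_method else 301), {"Location": location}
```

Served site-wide, this makes old http: links resolve to the secure resource instead of breaking.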
@espadrine

espadrine commented Dec 26, 2014

Client libraries can and should try https whenever http requests fail. However, having clients use http when https is asked for ought to be forbidden. Happily, it encourages servers to make the switch.
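The retry policy described above might look roughly like this, as a sketch with a hypothetical `fallback_candidates` helper: a client may fall back from http: to https:, but never the reverse:

```python
from urllib.parse import urlsplit, urlunsplit

def fallback_candidates(url):
    """URLs a client library may try, in order: an http: request that
    fails may be retried over https:, but an https: request is never
    silently downgraded to http:."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        secure = urlunsplit(("https",) + tuple(parts[1:]))
        return [url, secure]  # try http:, retry over https: on failure
    return [url]              # https: (or anything else): no downgrade
```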

@philipn

philipn commented Dec 26, 2014

@timbl I'm a bit confused here. Everyone I've seen who's transitioned to HTTPS in the past several years has properly 301 redirected from http:// to https://. Could you elaborate on what you mean by link-breaking? Are you seeing lots of organizations not 301 redirecting?

@mnot

Member

mnot commented Dec 26, 2014

also Strict Transport Security.

@wycats

Member

wycats commented Dec 26, 2014

Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;

This is precisely what 301 and rel=canonical are all about. What limitations of those mechanisms are you worried about?

@reschke

reschke commented Dec 26, 2014

301 doesn't work well here because it allows UAs to rewrite POST to GET when following the redirect. 308 might work though.
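The 301-versus-308 distinction can be sketched as a simplified model of what a UA may do with the request method on each redirect (real UA behaviour has varied historically):

```python
def method_after_redirect(status, method):
    """What request method a UA may end up using after following a
    redirect (simplified model of the HTTP semantics):
      301/302 - UAs have historically been allowed to rewrite POST to GET;
      303     - explicitly means "do a GET on the other resource";
      307/308 - the method must be preserved (308 is also permanent,
                which is why it suits an http: -> https: migration).
    """
    if status in (301, 302) and method == "POST":
        return "GET"
    if status == 303 and method != "HEAD":
        return "GET"
    return method
```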

@awwright

awwright commented Dec 29, 2014

Developing an HTTP server that produces and consumes RDF resources, and performing authentication over TLS, I've found the extra 's' in the URI scheme particularly troublesome; it actually gets in the way of producing a secure application.

In particular, my HTTP server uses http: URIs internally (for minting new URIs, but more importantly, for the database of historical ones), and it will answer queries of the form GET / HTTP/1.1 and GET http://example.com/ HTTP/1.1 but not GET https://example.net/ HTTP/1.1, because TLS is added one abstraction layer below.

If the HTTP server accepts queries in plaintext, that could result in an unacceptable leak of sensitive information from clients. And some resources just can't be renamed, so a 3xx redirect doesn't really solve the problem. (If I recall, Firefox will flag the page if any request in a sequence of redirects is non-secure, even if the final page by itself is secure; shouldn't this be noted/considered?)

For applications that require long-lived identifiers, particularly http: identifiers, there really needs to be some way to say "http:, but over TLS." Even if for non-browser user-agents. It doesn't seem like HSTS quite does this, though that seems to be the closest solution.

@domenic

Member

domenic commented Dec 29, 2014

(If I recall, Firefox will flag the page if any request in a sequence of redirects is non-secure, even if the final page by itself is secure; shouldn't this be noted/considered?)

This is false; please don't spread misinformation.

@diracdeltas

diracdeltas commented Dec 29, 2014

@domenic: that might be true if by "flag" OP means "trigger mixed content warning/blocking" (when a fetched subresource is redirected). Both Chrome's and Firefox's active mixed content blockers fire before an http resource can be redirected to https; not sure if the passive mixed content warnings work similarly.

But in any case, this is irrelevant to links, which don't trigger mixed content blocking.

@awwright

awwright commented Dec 30, 2014

@domenic https://bugzilla.mozilla.org/show_bug.cgi?id=418354

I know this is correct because I just had to fix that bug in an application! Customers were being sent first to a legacy application that didn't go through a secure origin.

I suggest trying it yourself.

@domenic

Member

domenic commented Dec 30, 2014

Yes, I didn't realize you were actually including insecure resources in your page; if that's the case it's certainly mixed content as usual.

@timbl

Member

timbl commented Jan 2, 2015

On 2014-12-26, at 15:18, Thaddee Tyl notifications@github.com wrote:

Client libraries can and should try https whenever http requests fail.

But what about web sites where the port 443 site is a completely different site from the port 80 site?
Then the user will follow a link, get no error, and get erroneous information.
Do you want to make that retrospectively illegal?
It may be rare, but there is no guarantee that if you just add an 's', what you get will have any relation at all to what you wanted.

However, having clients use http when https is asked for ought to be forbidden. Happily, it encourages servers to make the switch.



@timbl

Member

timbl commented Jan 2, 2015

On 2014-12-26, at 19:50, Yehuda Katz notifications@github.com wrote:

Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;

This is precisely what 301 and rel=canonical are all about. What limitations of those mechanisms are you worried about?

They apply to one URI only.
I was talking about a way to say in advance that any URI which starts with a given string will be guaranteed by the server to be redirected. The client then stores that and can skip all the redirects.



@mnot

Member

mnot commented Jan 3, 2015

I was talking about a way to say in advance that any URI which starts with a given string will be guaranteed by the server to be redirected. The client then stores that and can skip all the redirects.

That's HSTS, I think.
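A rough sketch of that HSTS mechanism, with hypothetical helper names (the real policy, including max-age expiry and includeSubDomains, is in RFC 6797): after one secure response carrying Strict-Transport-Security, the client upgrades matching URLs itself and skips the insecure round trip:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical in-memory HSTS cache: host -> max-age seconds.
hsts_cache = {}

def remember_hsts(host, header):
    """Record a Strict-Transport-Security response header for a host,
    e.g. "max-age=31536000; includeSubDomains" (directives other than
    max-age are ignored in this sketch)."""
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            hsts_cache[host] = int(value)

def upgrade(url):
    """Rewrite http: to https: for any host with a cached HSTS rule,
    so no insecure request (and no redirect round trip) is needed."""
    parts = urlsplit(url)
    if parts.scheme == "http" and hsts_cache.get(parts.hostname, 0) > 0:
        return urlunsplit(("https",) + tuple(parts[1:]))
    return url
```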

@mnot

Member

mnot commented Jan 3, 2015

BTW, right now when I type a hostname into a browser without a scheme, it defaults to HTTP -- an interesting discussion could be had about if and when it should ever default to HTTPS. That's in-scope for WebAppSec, but probably not this doc.

@ylafon

Member

ylafon commented Jan 3, 2015

HSTS kicks in after the first redirect plus the proper response header on an https retrieval, so there is still a need for the first hit. (DNS was a possibility, but exporting too much metadata at that level is also an issue.)

@mnot

Member

mnot commented Jan 3, 2015

Can you expand upon "need"? Is it just round trips you're looking to avoid, or...?

Considering that HSTS is cached -- usually for a very long time -- I suspect it's going to have appreciably better performance without trading off security (acknowledging that HSTS isn't perfect there).

@ylafon

Member

ylafon commented Jan 3, 2015

Well, to avoid the first redirect, and thus the first unencrypted communication to the server. Even if everything after that will be encrypted (and that choice will usually be cached for months), the first hit might be revealing.

@timbl


Member

timbl commented Jan 4, 2015

On 2015-01-03, at 16:54, Mark Nottingham notifications@github.com wrote:

Can you expand upon "need"? Is it just round trips you're looking to avoid, or...?

Never say "just" before "round trips" :-)

In the 1980s, Brian Carpenter was a thought leader in protocols at CERN, and he told me memorably that only two things matter when it comes to network protocol performance: the time taken to copy the message, and the number of round trips. Nowadays I guess the two most important things are round trips and round trips ...

That's why I would like to see tables of round-trip counts for all the options. Can someone please write this down from a knowledge of the protocols, or do we have to measure it with Wireshark or something?!

Considering that HSTS is cached -- usually for a very long time -- I suspect it's going to have appreciably better performance without trading off security (acknowledging that HSTS isn't perfect there).

The time it takes when you first use a given source or data is also important, a separate variable from the time you take on subsequent uses.

e.g. Suppose a viral social app had a very long first-use time to first byte; that would likely decrease its likelihood of spreading virally, as new users would follow another link while it loaded, and be lost.

I agree that the thing to optimize may be mainly subsequent uses, but one should not ignore first use completely.

timbl

director hat off


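A back-of-the-envelope model of the round-trip counts requested above, under stated assumptions only (TCP handshake = 1 RTT, TLS 1.2 full handshake = 2 RTTs, each HTTP request/response = 1 RTT, and a cross-scheme redirect forces a fresh connection); these are illustrative, not measurements:

```python
def first_byte_rtts(scheme, redirects=0, tls12=True):
    """Round trips before the first response byte, under the stated
    assumptions. Each redirect hop is modelled as a plaintext
    connection: TCP setup (1 RTT) plus one request/response (1 RTT)."""
    tls = 2 if tls12 else 1                        # TLS 1.3 needs only ~1 RTT
    setup = 1 + (tls if scheme == "https" else 0)  # TCP (+ TLS) handshakes
    redirect_cost = 2 * redirects                  # per hop: setup + request
    return redirect_cost + setup + 1               # final setup + final request

direct  = first_byte_rtts("https")               # no redirect (e.g. HSTS hit) -> 4 RTTs
via_301 = first_byte_rtts("https", redirects=1)  # http: hit, then redirect    -> 6 RTTs
```

Under this model, a cached HSTS rule saves two round trips on first byte, which bears on both the first-use and subsequent-use cases discussed above.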

@mnot

Member

mnot commented Jan 12, 2015

I think this is addressed in current text. Tim, if you have a concern, please reopen.

@mnot mnot closed this Jan 12, 2015
