Link Breakage: Address that moving to HTTPS URLs breaks links wholesale #11

Closed
timbl opened this Issue Dec 26, 2014 · 20 comments

@timbl
Member

timbl commented Dec 26, 2014

The disappearance of web material and the rotting of links is itself a major problem. This finding currently encourages the wholesale breaking of links by moving from http: to https:, perhaps doing more damage to the web than any other change in its history.

The TAG can't publish this finding without addressing this issue. Things which need to be discussed could include:

  • Actually dropping the 's' (which was possibly originally a mistake) but using encryption for all HTTP;
  • Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;
  • Recommending that web client libraries find such information and treat it automatically;
  • etc.

Search engines, which trawl the whole web with and without the 's' anyway, will easily be able to figure out where the 's' makes no difference, but general application code and libraries won't unless it is codified.

@mikewest

Contributor

mikewest commented Dec 26, 2014

  • "Dropping the 's'" would have the very bad side-effect of making the origin of secure and insecure sites indistinguishable (e.g. the secure site's storage would be accessible to the insecure site). I don't believe that's a reasonable option on today's internet.
  • This is at least partially supported (by search engines) via <link rel='canonical' ...> for sites which choose to continue serving both insecure and secure versions. For sites which choose to serve only a secure version (which I suggest that the TAG recommend), this can be done explicitly via permanent redirects, right?
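A minimal sketch of the explicit permanent redirect mentioned above, using a hypothetical `upgrade_redirect` helper (308 additionally preserves the request method):

```python
from urllib.parse import urlsplit, urlunsplit

def upgrade_redirect(url, preserve_method=False):
    """Build the permanent redirect a server could send for a legacy
    http: URL. Returns (status, headers), or None if the URL is not
    an http: URL. 308 also preserves the request method."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None  # already secure (or not http:): nothing to do
    location = urlunsplit(("https",) + tuple(parts[1:]))
    return (308 if preserve_method else 301), {"Location": location}
```

Served site-wide, this makes old http: links resolve to the secure resource instead of breaking.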
@espadrine

espadrine commented Dec 26, 2014

Client libraries can and should try https whenever http requests fail. However, having clients use http when https is asked for ought to be forbidden. Happily, it encourages servers to make the switch.
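The retry policy described above might look roughly like this, as a sketch with a hypothetical `fallback_candidates` helper: a client may fall back from http: to https:, but never the reverse:

```python
from urllib.parse import urlsplit, urlunsplit

def fallback_candidates(url):
    """URLs a client library may try, in order: an http: request that
    fails may be retried over https:, but an https: request is never
    silently downgraded to http:."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        secure = urlunsplit(("https",) + tuple(parts[1:]))
        return [url, secure]  # try http:, retry over https: on failure
    return [url]              # https: (or anything else): no downgrade
```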

@philipn

philipn commented Dec 26, 2014

@timbl I'm a bit confused here. Everyone I've seen who's transitioned to HTTPS in the past several years has properly 301 redirected from http:// to https://. Could you elaborate on what you mean by link-breaking? Are you seeing lots of organizations not 301 redirecting?

@mnot

Member

mnot commented Dec 26, 2014

also Strict Transport Security.

@wycats

Member

wycats commented Dec 26, 2014

Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;

This is precisely what 301 and rel=canonical are all about. What limitations of those mechanisms are you worried about?

@reschke

reschke commented Dec 26, 2014

301 doesn't work well here because it allows UAs to rewrite POST to GET when following the redirect. 308 might work though.
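The 301-versus-308 distinction can be sketched as a simplified model of what a UA may do with the request method on each redirect (real UA behaviour has varied historically):

```python
def method_after_redirect(status, method):
    """What request method a UA may end up using after following a
    redirect (simplified model of the HTTP semantics):
      301/302 - UAs have historically been allowed to rewrite POST to GET;
      303     - explicitly means "do a GET on the other resource";
      307/308 - the method must be preserved (308 is also permanent,
                which is why it suits an http: -> https: migration).
    """
    if status in (301, 302) and method == "POST":
        return "GET"
    if status == 303 and method != "HEAD":
        return "GET"
    return method
```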

@awwright

awwright commented Dec 29, 2014

Developing an HTTP server that produces and consumes RDF resources, and performing authentication over TLS, I've found the extra 's' in the URI scheme particularly troublesome; it actually gets in the way of producing a secure application.

In particular, my HTTP server uses http: URIs internally (for minting new URIs, but more importantly, for the database of historical ones), and it will answer queries of the form GET / HTTP/1.1 and GET http://example.com/ HTTP/1.1 but not GET https://example.net/ HTTP/1.1, because TLS is added one abstraction layer below.

If the HTTP server accepts queries in plaintext, that could result in an unacceptable leak of sensitive information from clients. And some resources just can't be renamed, so a 3xx redirect doesn't really solve the problem. (If I recall, Firefox will flag the page if any request in a sequence of redirects is non-secure, even if the final page by itself is secure; shouldn't this be noted/considered?)

For applications that require long-lived identifiers, particularly http: identifiers, there really needs to be some way to say "http:, but over TLS." Even if for non-browser user-agents. It doesn't seem like HSTS quite does this, though that seems to be the closest solution.

@domenic

Member

domenic commented Dec 29, 2014

(If I recall, Firefox will flag the page if any request in a sequence of redirects is non-secure, even if the final page by itself is secure; shouldn't this be noted/considered?)

This is false; please don't spread misinformation.

@diracdeltas

diracdeltas commented Dec 29, 2014

@domenic: that might be true if by "flag" OP means "trigger mixed content warning/blocking" (when a fetched subresource is redirected). Both Chrome's and Firefox's active mixed content blockers fire before an http resource can be redirected to https; not sure if the passive mixed content warnings work similarly.

But in any case, this is irrelevant to links, which don't trigger mixed content blocking.

@awwright

awwright commented Dec 30, 2014

@domenic https://bugzilla.mozilla.org/show_bug.cgi?id=418354

I know this is correct because I just had to fix that bug in an application! Customers were being sent first to a legacy application that didn't go through a secure origin.

I suggest trying it yourself.

@domenic

Member

domenic commented Dec 30, 2014

Yes, I didn't realize you were actually including insecure resources in your page; if that's the case it's certainly mixed content as usual.

@timbl

Member

timbl commented Jan 2, 2015

On 2014-12-26, at 15:18, Thaddee Tyl notifications@github.com wrote:

Client libraries can and should try https whenever http requests fail.

But what about web sites where the port 443 site is a completely different site from the port 80 site?
Then the user will follow a link, get no error, and get erroneous information.
Do you want to make that retrospectively illegal?
It may be rare, but there is no guarantee that if you just add an 's', what you get will have any relation at all to what you wanted.

However, having clients use http when https is asked for ought to be forbidden. Happily, it encourages servers to make the switch.



@timbl

Member

timbl commented Jan 2, 2015

On 2014-12-26, at 19:50, Yehuda Katz notifications@github.com wrote:

Allowing a website to declare in a machine-readable way that http: and https: URIs in a given space are equivalent;

This is precisely what 301 and rel=canonical are all about. What limitations of those mechanisms are you worried about?

They apply to one URI only.
I was talking about a way to say in advance that any URI which starts with a given string will be guaranteed by the server to be redirected. The client then stores that and can skip all the redirects.



@mnot

Member

mnot commented Jan 3, 2015

I was talking about a way to say in advance that any URI which starts with a given string will be guaranteed by the server to be redirected. The client then stores that and can skip all the redirects.

That's HSTS, I think.
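A rough sketch of that HSTS mechanism, with hypothetical helper names (the real policy, including max-age expiry and includeSubDomains, is in RFC 6797): after one secure response carrying Strict-Transport-Security, the client upgrades matching URLs itself and skips the insecure round trip:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical in-memory HSTS cache: host -> max-age seconds.
hsts_cache = {}

def remember_hsts(host, header):
    """Record a Strict-Transport-Security response header for a host,
    e.g. "max-age=31536000; includeSubDomains" (directives other than
    max-age are ignored in this sketch)."""
    for directive in header.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            hsts_cache[host] = int(value)

def upgrade(url):
    """Rewrite http: to https: for any host with a cached HSTS rule,
    so no insecure request (and no redirect round trip) is needed."""
    parts = urlsplit(url)
    if parts.scheme == "http" and hsts_cache.get(parts.hostname, 0) > 0:
        return urlunsplit(("https",) + tuple(parts[1:]))
    return url
```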

@mnot

Member

mnot commented Jan 3, 2015

BTW, right now when I type a hostname into a browser without a scheme, it defaults to HTTP -- an interesting discussion could be had about if and when it should ever default to HTTPS. That's in-scope for WebAppSec, but probably not this doc.

@ylafon

Member

ylafon commented Jan 3, 2015

HSTS kicks in after the first redirect plus the proper response header on an https retrieval, so there is still a need for the first hit. (DNS was a possibility, but exporting too much metadata at that level is also an issue.)

@mnot

Member

mnot commented Jan 3, 2015

Can you expand upon "need"? Is it just round trips you're looking to avoid, or...?

Considering that HSTS is cached -- usually for a very long time -- I suspect it's going to have appreciably better performance without trading off security (acknowledging that HSTS isn't perfect there).

@ylafon

Member

ylafon commented Jan 3, 2015

Well, to avoid the first redirect, and thus the first unencrypted communication to the server. Even if everything after that will be encrypted (and that choice will usually be cached for months), the first hit might be revealing.

@timbl


Member

timbl commented Jan 4, 2015

On 2015-01-03, at 16:54, Mark Nottingham notifications@github.com wrote:

Can you expand upon "need"? Is it just round trips you're looking to avoid, or...?

Never say "just" before "round trips" :-)

In the 1980s, Brian Carpenter was a thought leader in protocols at CERN, and he told me memorably that only two things matter when it comes to network protocol performance: the time taken to copy the message, and the number of round trips. Nowadays I guess the two most important things are round trips and round trips ...

That's why I would like to see tables of round-trip counts for all the options. Can someone please write this down from a knowledge of the protocols, or do we have to measure it with Wireshark or something?!

Considering that HSTS is cached -- usually for a very long time -- I suspect it's going to have appreciably better performance without trading off security (acknowledging that HSTS isn't perfect there).

The time it takes when you first use a given source or data is also important, a separate variable from the time you take on subsequent uses.

e.g. Suppose a viral social app had a very long first-use time to first byte; that would likely decrease its likelihood of spreading virally, as new users would follow another link while it loaded, and be lost.

I agree that the thing to optimize may be mainly subsequent uses, but one should not ignore first use completely.

timbl

director hat off


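A back-of-the-envelope model of the round-trip counts requested above, under stated assumptions only (TCP handshake = 1 RTT, TLS 1.2 full handshake = 2 RTTs, each HTTP request/response = 1 RTT, and a cross-scheme redirect forces a fresh connection); these are illustrative, not measurements:

```python
def first_byte_rtts(scheme, redirects=0, tls12=True):
    """Round trips before the first response byte, under the stated
    assumptions. Each redirect hop is modelled as a plaintext
    connection: TCP setup (1 RTT) plus one request/response (1 RTT)."""
    tls = 2 if tls12 else 1                        # TLS 1.3 needs only ~1 RTT
    setup = 1 + (tls if scheme == "https" else 0)  # TCP (+ TLS) handshakes
    redirect_cost = 2 * redirects                  # per hop: setup + request
    return redirect_cost + setup + 1               # final setup + final request

direct  = first_byte_rtts("https")               # no redirect (e.g. HSTS hit) -> 4 RTTs
via_301 = first_byte_rtts("https", redirects=1)  # http: hit, then redirect    -> 6 RTTs
```

Under this model, a cached HSTS rule saves two round trips on first byte, which bears on both the first-use and subsequent-use cases discussed above.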

@mnot

Member

mnot commented Jan 12, 2015

I think this is addressed in current text. Tim, if you have a concern, please reopen.

@mnot mnot closed this Jan 12, 2015
