"1.10 Privacy concerns" Clearly define the term "privacy" as applied within the specification #3304

Open
guest271314 opened this issue Dec 19, 2017 · 22 comments

@guest271314
Contributor

commented Dec 19, 2017

https://html.spec.whatwg.org/#toc-introduction

1.10 Privacy concerns

The term "privacy" is used at least 23 times within the specification (multipage) though the term is not clearly defined.

The term "privacy" should be clearly defined, for example:

Definitions:

"Privacy": ...

to avoid ambiguity and unrealistic expectations of viewers of the HTML specification; relevant to the history, architecture and current technical facts of communications using HTML and the World Wide Web in general.

@annevk

Member

commented Dec 19, 2017

What's wrong with applying the dictionary definition here?

@guest271314

Contributor Author

commented Dec 19, 2017

@annevk If that is the route that you decide to take then so be it. The first rule of construction is to determine what the authors meant by the term at the time the term was first used.

If there is ambiguity as to the meaning of the term the history of the term is reviewed.

If there is still ambiguity as to the meaning of the term a technical paper, dictionary or other authoritative text is consulted.

Here, the primary document is the current HTML Standard itself, where no clear definition appears.

The reality of the web is that there is no expectation of privacy of any kind due to the architecture of the web and that all web usage is tracked. From perspective here it would be fitting to convey the reality to the reader instead of abstractions as to what "privacy" is.

@guest271314

Contributor Author

commented Dec 19, 2017

The third paragraph above should actually come before the second.

@annevk

Member

commented Dec 19, 2017

If there's no definition of a term that means you should use the dictionary definition. I think the dictionary definition suffices for what the standard is trying to convey, too.

The reality of the web is that there is no expectation of privacy of any kind due to the architecture of the web and that all web usage is tracked. From perspective here it would be fitting to convey the reality to the reader instead of abstractions as to what "privacy" is.

I don't think you'll find wide agreement on that.

@guest271314

Contributor Author

commented Dec 19, 2017

"wide agreement" is irrelevant. The technical facts cannot be disputed that all web usage is tracked.

@guest271314

Contributor Author

commented Dec 19, 2017

There is no "the dictionary". The dictionary that you select as the primary source for the definition of a given term is important.

Given that HTML Standard is a technical document, the primary source should be a technical dictionary within the specific industry.

If necessary will compose a PR to substantiate the facts relevant to all web usage being tracked for inclusion into the HTML Standard.

@annevk

Member

commented Dec 19, 2017

It's still unclear to me what's actually wrong here.

@guest271314

Contributor Author

commented Dec 20, 2017

There is an overarching concern that all ongoing communications on the Web are tracked by both primary and third parties. When discussing "privacy" that fact should be clearly indicated. There should not be any expectation of "privacy" for any activity on the Web. Using the term "privacy" without addressing the facts that all communications on the Web are tracked and stored provides the inference that there is a possibility of any form of "privacy" within the domain of the Web. What is wrong is that the above fact that all communications on the Web are tracked and stored is not noted in the specification.

8.7.1.3.1 Security and privacy attempts to convey the concerns expressed above, though it stops short of disclosing to the user that the following

"Hijacking all Web usage. User agents should not allow schemes that are key to its normal operation, such as an HTTP(S) scheme, to be rerouted through third-party sites. This would allow a user's activities to be trivially tracked, and would allow user information, even in secure connections, to be collected."

occurs persistently, without exception, via an ISP, carrier, or "third party" entities which consume all of the metadata and/or content transmitted as packets all of the time.

Provide greater context to the as-applied meaning of "privacy" within the HTML Standard - that there should both generally and specifically not be any expectation of privacy on the Web.

The current version of the specification leaves open the incorrect impression that somehow "privacy" is possible on the Web

11.4 Privacy
11.4.1 User tracking
A third-party advertiser (or any entity capable of getting content distributed to multiple sites) could use a unique identifier stored in its local storage area to track a user across multiple sessions, building a profile of the user's interests to allow for highly targeted advertising. In conjunction with a site that is aware of the user's real identity (for example an e-commerce site that requires authenticated credentials), this could allow oppressive groups to target individuals with greater accuracy than in a world with purely anonymous Web usage.
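The mechanism described in the quoted 11.4.1 text can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the specification: `KeyValueStore`, `MemoryStore`, `TRACKING_KEY`, `getOrCreateTrackingId`, `recordVisit` and the `*.example` site names are all hypothetical, and the in-memory store stands in for the third party's own local storage area so the sketch runs outside a browser.

```typescript
// Minimal stand-in for the subset of the Web Storage API used here.
// In a browser, window.localStorage has getItem/setItem with this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical in-memory store modelling the third party's local storage area.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.has(key) ? this.items.get(key)! : null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const TRACKING_KEY = "tracker-id"; // hypothetical key name

// Returns the identifier already stored, or mints and stores a new one.
function getOrCreateTrackingId(store: KeyValueStore): string {
  const existing = store.getItem(TRACKING_KEY);
  if (existing !== null) return existing;
  const id = Math.random().toString(36).slice(2) + Date.now().toString(36);
  store.setItem(TRACKING_KEY, id);
  return id;
}

// Hypothetical profile log: the sites on which each identifier was seen.
const profile = new Map<string, string[]>();

function recordVisit(store: KeyValueStore, site: string): void {
  const id = getOrCreateTrackingId(store);
  const sites = profile.get(id) ?? [];
  sites.push(site);
  profile.set(id, sites);
}

// The same stored identifier links visits made on different embedding sites.
const browser = new MemoryStore();
recordVisit(browser, "news.example");
recordVisit(browser, "shop.example");
```

Because the identifier persists in storage, both visits end up under one key in `profile`, which is the cross-session profile building the quoted paragraph warns about.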

The fact is that the graph of the users' activities as a profile is an ongoing normative occurrence, not an exception, as evidenced some time ago by the sessionizers deployed in response to the Tor project. Users reviewing HTML Standard should have no illusions as to the realities of the communication medium that they are using, nor the "third parties" which actively track, profile and graph users - irrespective of the protocols for HTML specifications and implementations employed to thwart or mitigate such tracking.

A note summarizing the above facts should be added - that the Internet, Web, interwebs, etc. should be considered compromised from the onset, and that any "privacy" measures considered should be with the underlying backdrop of third party entity interests in tracking all communications all of the time.

Thank you for your concerns.

@annevk

Member

commented Dec 20, 2017

I still don't follow. How would a different definition of privacy help? How would you define it other than most dictionaries?

@guest271314

Contributor Author

commented Dec 20, 2017

I still don't follow. How would a different definition of privacy help? How would you define it other than most dictionaries?

Currently the term "privacy" is not defined at all at the specification. The term is used in several sections, though left without clear meaning.

To illustrate the importance of context when defining terms, you would not rely on a dictionary alone to define the "fetch" algorithm whatwg/fetch#646, as you have extended the base definition of the term to include a number of steps which will not be found in the dictionary to adequately describe what "fetch" is within the context of modern browsers. WHATWG in essence is now the primary source for the definition of "fetch" within the purview of the Web - any individual or group which mentions "fetch" within the context of the Web should reference the published work of WHATWG relevant to "fetch" for the most up to date and complete definition of the term "fetch".

You could use a dictionary to define the term as-applied within the specification if that is what you decide to do. Though similarly to the description of "fetch" above, "privacy" is not a static noun which does not have contours. From perspective here "privacy" would exclude third parties from reading packets on the wire and storing the associated metadata and/or content in one or more facilities forever - with the complicity and cooperation of ISPs, fibre optic maintainers, web services, etc. The reality is that this is occurring. There is no "privacy" in the sense that is defined within a common dictionary. The descriptions within the specification generally seem to refer to the context of "evilsite.com" deploying some nefarious code to track users "cross-site". The current state of privacy is that the wire itself is squeezed for all of the transmitted packets all of the time, and those packets are stored indefinitely in some N-million+ square foot garage of individuals' communications. Within such an environment "privacy" as defined by a common dictionary does not exist.

In any event, technical writing requires clear definitions, as you know. This issue is to relay the fact that the term "privacy" is not currently clearly defined within the HTML Standard.

You can either 1) ignore the facts as to total information awareness relevant to "privacy", and use a common dictionary to include the definition of privacy in the specification; 2) use a technical publication to reference a definition for "privacy", for example the Electronic Frontier Foundation; 3) do nothing and by omission leave the reader of the specification with the inference of the possibility that "privacy" can somehow be achieved by doing nothing other than using a stock browser to navigate the web - as long as you stay away from "evilsite.com" and those pesky iframes communicating with tabs to set cookies - ignoring the fact that "goodsite.com"'s traffic is tracked at the packet level and stored indefinitely.

@annevk

Member

commented Dec 20, 2017

The meaning of "fetch" comes with a lot of implementation requirements and observable differences if we don't all use the same meaning. Can you construct a test that shows the same kind of implementation differences for "privacy"? As far as I can tell it's only ever used in non-normative fashion as a catch-all phrase to encapsulate concerns that are then either detailed in place or do not need to be detailed.

@guest271314

Contributor Author

commented Dec 20, 2017

The basis of a test case is described above at

From perspective here "privacy" would exclude third parties from reading packets on the wire and storing the associated metadata and/or content in one or more facilities forever - with the complicity and cooperation of ISPs, fibre optic maintainers, web services, etc.

We could construct a form to ask individual readers of the specification what "privacy" means to them - as that is what is important here. Whether agreement is reached or not, the technical facts should be available for review and careful consideration, so that casual readers of the specification do not have the illusion that they can switch to "private browsing" and somehow become "anonymous"; that patching cross-site scripting or closing one or two loopholes which allow tabs to communicate and set cookies will cease all workarounds to achieve the same effect; or that "purely anonymous Web usage" is possible.

For example, does "privacy" include third parties not reading your packets in real-time, creating a graph from said packets and storing all of your Web activity forever?

@guest271314

Contributor Author

commented Dec 20, 2017

Can you construct a test that shows the same kind of implementation differences for "privacy"?

That is actually a sage suggestion. Will do what am able to create a template and perhaps eventually codify the definition of the term here. There is no reason HTML Standard could not define "privacy" itself within the domain of the Web.

@annevk

Member

commented Dec 20, 2017

I don't think it matters for how we use the term.

@guest271314

Contributor Author

commented Dec 20, 2017

Everything matters. But, if you would like to steer clear of the definition construction that is fine. The term should still be defined at the specification. You are extremely granular with "fetch" and "streams", yet leave the overarching concern of "privacy" undefined. Be consistent. At least defer to EFF for the current state of "privacy" on the Web. There are open issues dealing with "privacy" #3054 though the term is left undefined. Either we address the issue fully or leave it for a future generation.

@annevk

Member

commented Dec 20, 2017

I agree with your concern at a conceptual level, but I think in practice we're quite specific about what the privacy concerns in question are. Though also, there's still a lot of experimentation in the area and various browser configurations (all the way up to Tor) that are not at all ready for standardization. And in such cases being more generic is totally justified in my opinion.

@guest271314

Contributor Author

commented Dec 20, 2017

all the way up to Tor

Well, the sessionizers were deployed to address Tor usage many years ago. We are beyond that point now. There is no mitigation or remedy available to avoid all packets being read and stored, from what am able to gather. Thus, there is no "privacy" on the Web in any sense. Using the term in a generic fashion is a misnomer.

@annevk

Member

commented Dec 20, 2017

Citation needed...

(Also, note that everything gets a little blurry at the edges. We don't define the exact cache policies, we don't define OOM behavior, we don't define rendering of form controls, etc. Not everything is defined in the same level of detail as self.navigator.appName.)

@domenic

Member

commented Dec 20, 2017

To be clear, the criteria for changing the standard would be you writing a web platform test showing that a specific sentence in the spec is not followed by browsers, when using the dictionary definition of privacy. If you cannot write such a test, then the dictionary definition is fine for the purposes of this spec, and no change will be made.
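For readers unfamiliar with what such a test looks like, here is a rough sketch of its shape, using the precisely defined `self.navigator.appName` mentioned earlier in the thread as the subject. It rests on several assumptions: the `test` and `assert_equals` functions below are simplified stand-ins that only mimic the shape of the testharness.js helpers of the same names (a real web platform test would load the harness instead of defining them), `navigatorLike` is a hypothetical stand-in for the browser's `navigator` object, and the expected value `"Netscape"` is what the HTML Standard requires `appName` to return.

```typescript
// Simplified stand-ins for testharness.js `test` and `assert_equals`,
// defined here only so the sketch runs outside a browser.
const results: { name: string; passed: boolean; message?: string }[] = [];

function assert_equals<T>(actual: T, expected: T, description?: string): void {
  // Strict equality is a simplification of the real harness's comparison.
  if (actual !== expected) {
    throw new Error(
      `${description ?? "assert_equals"}: expected ${String(expected)} but got ${String(actual)}`
    );
  }
}

function test(fn: () => void, name: string): void {
  try {
    fn();
    results.push({ name, passed: true });
  } catch (e) {
    results.push({ name, passed: false, message: (e as Error).message });
  }
}

// Hypothetical stand-in for the browser's navigator object.
const navigatorLike = { appName: "Netscape" };

// One sentence of the standard, one observable check against it.
test(() => {
  assert_equals(navigatorLike.appName, "Netscape", "navigator.appName");
}, "navigator.appName returns the string the standard requires");
```

The point of the sketch is the shape of the criterion: a test pins one sentence of the standard to one observable result, which is straightforward for `appName` and is what is being asked for with respect to "privacy".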

@guest271314

Contributor Author

commented Dec 21, 2017

Citation needed...

There is ample evidence in the field at large from lectures given at various conferences, though there has recently been a film made relevant to the subject matter, A Good American.

@guest271314

Contributor Author

commented Dec 21, 2017

To be clear, the criteria for changing the standard would be you writing a web platform test showing that a specific sentence in the spec is not followed by browsers, when using the dictionary definition of privacy. If you cannot write such a test, then the dictionary definition is fine for the purposes of this spec, and no change will be made.

Well, the specification does not even include "the dictionary" definition of privacy. The word is merely used within the specification with no definition provided whatsoever.

It is not a matter of browsers not following the specification, but rather the wire itself being tapped at 20TB a second, bypassing any privacy measures browsers might employ.

@guest271314

Contributor Author

commented Dec 21, 2017

If there is an issue viewing the video at the official website here is a link to the film at *outube https://www.youtube.com/watch?v=666wsDcoNrU, where the primary verification is for once the actual primary source at the time.

4 participants