Document privacy & security implications #26

Closed
igrigorik opened this issue Sep 29, 2015 · 16 comments

Comments

@igrigorik
Member

Source: https://groups.google.com/a/chromium.org/d/msg/blink-dev/tU_Hqqytx8g/HTJebzVHBAAJ

Knowing that a user's downlinkMax is changing can reveal information about what the user is doing (e.g. assuming we get it accurate, via NQE) or where the user is going. Watching the user transition from cellular to wifi, for example, may reveal when a user is outside an office building. Watching the user transition from cellular 2g to cellular 3g to cellular 2g, along with other ambient sensors of a device (such as accelerometers, via http://w3c.github.io/deviceorientation/spec-source-orientation.html ), could reveal a user's movements as they transition between cellular towers or move around a city.

@marcoscaceres
Contributor

Yeah... that's not so great.

@jkarlin

jkarlin commented Sep 29, 2015

Note that switching between cellular and wifi is easily observable via IP address change.

@marcoscaceres
Contributor

@jkarlin, how would one observe that within a web page?

@jkarlin

jkarlin commented Sep 30, 2015

Right, it requires network requests. But the information is still easily available.

Also, this isn't just a downlinkMax issue, connection.type already exposes cellular vs wifi.
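
As a rough illustration of the "observable via IP address change" point, here is a minimal TypeScript sketch of what a server could do with its own request log. The `RequestRecord` shape, the `sessionId` field, and the `detectIpChanges` name are hypothetical and not taken from any implementation discussed here; the addresses in the usage example come from the documentation ranges.

```typescript
// Hypothetical shape of one server-side log entry; field names are illustrative.
interface RequestRecord {
  sessionId: string;   // e.g. a cookie value that ties requests together
  remoteIp: string;    // peer address seen on the socket
  timestampMs: number; // arrival time at the server
}

interface IpChange {
  sessionId: string;
  fromIp: string;
  toIp: string;
  timestampMs: number;
}

// Report every point where a session's source address differs from the
// address seen on that session's previous request.
function detectIpChanges(records: RequestRecord[]): IpChange[] {
  const lastIp: Record<string, string | undefined> = {};
  const changes: IpChange[] = [];
  // Work on a time-ordered copy so the caller's array is left untouched.
  const ordered = records.slice().sort((a, b) => a.timestampMs - b.timestampMs);
  for (const r of ordered) {
    const prev = lastIp[r.sessionId];
    if (prev !== undefined && prev !== r.remoteIp) {
      changes.push({
        sessionId: r.sessionId,
        fromIp: prev,
        toIp: r.remoteIp,
        timestampMs: r.timestampMs,
      });
    }
    lastIp[r.sessionId] = r.remoteIp;
  }
  return changes;
}

// Usage: one session moves from 192.0.2.1 to 198.51.100.7.
const exampleChanges = detectIpChanges([
  { sessionId: "s1", remoteIp: "192.0.2.1", timestampMs: 1000 },
  { sessionId: "s1", remoteIp: "198.51.100.7", timestampMs: 3000 },
]);
```

The sketch only sees the address of whichever host opens the socket to the server, and it needs repeated requests from the page to have anything to compare.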

@sleevi

sleevi commented Oct 2, 2015

> Note that switching between cellular and wifi is easily observable via IP address change

This isn't the case if the user is using a VPN or proxy; we'd end up revealing state that the remote server operator could otherwise not obtain.

This is similar to the privacy issues surrounding WebRTC. "Detect if the user is on a VPN or proxy" is a solution we've explored at great depth, and we do not believe it to be a reasonable thing a UA can accomplish, nor a good solution across platforms.

@igrigorik
Member Author

Quick recap of where we are today:

  1. connection.type is already available in Chrome and FF OS and allows you to query the 'coarse' connection type (e.g. bluetooth, wifi, cellular, ethernet, ...).
  2. connection.downlinkMax exposes an Mbps value that does not distinguish between types but may rely on type+subtype to bootstrap the value (e.g. 10Mbps value could be either via LTE, WiFi, or ...). This value may also be determined by some ~Network Quality Estimator implementation which is based on past performance of the network, signal quality, etc.

Combined, these two signals allow the developer to get information like "the user is on a cellular network, with downlink of ~X Mbps". However, even without downlinkMax you can already get information about the user switching between coarse network types (e.g. wifi -> cellular transitions).
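
To make the "combined signals" point concrete, here is a minimal TypeScript sketch of a page recording every change in `type` and `downlinkMax`. `ConnectionLike` is a hypothetical structural type covering only the members used here, and `trackTransitions` is an illustrative name; in a browser that exposes the API, the object passed in would be `navigator.connection`.

```typescript
// Hypothetical minimal view of the connection object: only what the sketch needs.
interface ConnectionLike {
  type: string;        // coarse type, e.g. "wifi" or "cellular"
  downlinkMax: number; // Mbps value discussed above
  addEventListener(event: "change", listener: () => void): void;
}

interface Transition {
  fromType: string;
  toType: string;
  fromDownlinkMax: number;
  toDownlinkMax: number;
  atMs: number;
}

// Append an entry each time the (type, downlinkMax) pair changes.
// `now` is injectable so the behaviour can be checked without real clocks.
function trackTransitions(
  conn: ConnectionLike,
  now: () => number = Date.now
): Transition[] {
  const transitions: Transition[] = [];
  let lastType = conn.type;
  let lastMax = conn.downlinkMax;
  conn.addEventListener("change", () => {
    // Ignore change events that leave both values as they were.
    if (conn.type === lastType && conn.downlinkMax === lastMax) return;
    transitions.push({
      fromType: lastType,
      toType: conn.type,
      fromDownlinkMax: lastMax,
      toDownlinkMax: conn.downlinkMax,
      atMs: now(),
    });
    lastType = conn.type;
    lastMax = conn.downlinkMax;
  });
  return transitions;
}
```

The returned array fills in as change events fire, so no network request is needed to build the transition history.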


In terms of moving forward, I think there are a couple of separate threads here:

  1. NetInfo exposes information about the user's network - e.g. combination of type and downlinkMax allows the application to track transitions between network types. As such, it does sound like it should require a privileged context.
    • This is not the case today for accessing connection.type, should we revoke this capability for non-privileged contexts?
  2. Should NetInfo require explicit user opt-in - e.g. "https://example.com wants to know your network information -- Yes / No"? Any pitfalls here?

@sleevi

sleevi commented Oct 6, 2015

To be clear, what you pose as option 1 is merely a subset of option 2.

If we accept 2, then the only answer for 1 is yes. If we say no to 1, then the only possible answer for 2 is no.

@igrigorik
Member Author

@sleevi yep. I guess the missing question here is whether there are any other in-between options?

@igrigorik
Member Author

I've merged https://github.com/w3c/netinfo/pull/31, preview: http://w3c.github.io/netinfo/#privacy.

However, the above text does not address Ryan's earlier point about VPN + connection.type transitions. Should we add an additional warning clause for this? I don't believe there is anything special we can do here, as the UA may not know if it's running over a VPN connection. Further, even if and when it does, changing behavior would leak the fact that the user is on a VPN, which has its own issues?

@igrigorik
Member Author

Closing due to inactivity. @sleevi feel free to reopen if you think there is more to be done here.

@sleevi

sleevi commented Nov 30, 2015

I think one comment I'd make regarding #31 is that "knowing end-to-end properties reveals information about the first network hop" is not necessarily true, under various scenarios. For example, from an ISP proxy level (whether mobile or transparent), you can only get so much fidelity at the server side - at best, you know the performance metrics to the ISP, but not to the actual user. Now, if you combine that with JS (whether XHR, onload, or Resource Timing), you get more fidelity, but that's if and only if the user has JS involved.

I think it'd be ideal for the privacy section to actually flesh out and spell out some of the attacks and mitigations - both to save discussion in the future and to show the threat model being addressed.

For example, the privacy section makes the claim "knowing end-to-end properties reveals information about the first network hop", but doesn't really address how / under what model. Are you presuming the availability of the Resource Timing API? Are you presuming an 'attacker' sniffing with img onload? The privacy section just sort of says "Yeah, there are privacy issues, but nothing more than existing stuff," which I know is the position you've taken, but it doesn't really elaborate.

Let's say someone wanted to mitigate the privacy concerns. They could turn this API off, but that may be either heavy-handed (has more adverse effects than intended) or it may be the wrong knob (e.g. the privacy issues only arise when coupled with other APIs / behaviours).

One way to frame it would be to examine the bits of unique information being offered by this API, and then show comparatively how this information can be obtained via other means, and why it's a similar privacy risk. Or show how 'new' attacks exist by combining with other aspects of information.

@igrigorik
Member Author

> I think one comment I'd make regarding #31 is that "knowing end-to-end properties reveals information about the first network hop" is not necessarily true, under various scenarios. For example, from an ISP proxy level (whether mobile or transparent), you can only get so much fidelity at the server side - at best, you know the performance metrics to the ISP, but not to the actual user. Now, if you combine that with JS (whether XHR, onload, or Resource Timing), you get more fidelity, but that's if and only if the user has JS involved.

Not sure I follow. NetInfo exposes last hop, not end-to-end. Also, an intermediate proxy can observe timing data in both directions, regardless of where it is in the routing chain?

> Are you presuming the availability of the Resource Timing API?

No, just observing the timing of any fetch (e.g. HTML document) reveals a lot of data.. RTT, throughput, etc. That's what existing applications are already using to get BW estimates and modify app behavior -- except, they're forced to do this after the fact / after one or more fetches.
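
A minimal TypeScript sketch of the kind of after-the-fact bandwidth estimate described above, assuming the observer has recorded the size and the start and end time of a few transfers. The `TransferSample` shape and the function names are hypothetical, and taking the median is just one simple way to damp outliers; real estimators also have to account for latency, TCP slow start, and caching.

```typescript
// Hypothetical record of one observed transfer.
interface TransferSample {
  bytes: number;   // payload size
  startMs: number; // when the transfer started
  endMs: number;   // when the last byte arrived
}

// Throughput of a single transfer in Mbps, or undefined if the sample is unusable.
function sampleMbps(s: TransferSample): number | undefined {
  const seconds = (s.endMs - s.startMs) / 1000;
  if (!(seconds > 0) || !(s.bytes > 0)) return undefined;
  return (s.bytes * 8) / 1e6 / seconds;
}

// Median throughput over all usable samples; undefined when there are none.
function estimateDownlinkMbps(samples: TransferSample[]): number | undefined {
  const rates: number[] = [];
  for (const s of samples) {
    const r = sampleMbps(s);
    if (r !== undefined) rates.push(r);
  }
  if (rates.length === 0) return undefined;
  rates.sort((a, b) => a - b);
  const mid = Math.floor(rates.length / 2);
  return rates.length % 2 === 1
    ? rates[mid]
    : (rates[mid - 1] + rates[mid]) / 2;
}
```

The estimate only exists once at least one transfer has completed, which is the "after one or more fetches" limitation mentioned above.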

@sleevi

sleevi commented Dec 2, 2015

> Not sure I follow. NetInfo exposes last hop, not end-to-end. Also, an intermediate proxy can observe timing data in both directions, regardless of where it is in the routing chain?

To be explicit: The threat model I'm presuming here is a hostile end-point server (evil.example.com), wishing to interrogate as much information as possible about the end user. For further sake of discussion, let's consider a browser that explicitly tries to be privacy preserving in all possible ways (such as Tor Browser Bundle). Finally, let's consider a reasonably paranoid user that is using an upstream proxy (such as over VPN) as a further anonymization tool.

With this model, let's think about:

  • What information NetInfo exposes
  • Whether that information was already available
  • What steps can be taken to mitigate that information

In the current "Privacy Considerations", my concern is that it doesn't really enumerate the concerns, and sort of handwaves as "This information's already out there, so no biggy". But ideally, the privacy considerations would talk about that information, and explain how it's available, so that it can be clear that "If you mitigate X, also mitigate Y", and, conversely, "If you're concerned about Y, you should also be concerned about X"

For example, you mentioned the following:

> No, just observing the timing of any fetch (e.g. HTML document) reveals a lot of data.. RTT, throughput, etc

But this still feels hand-wavy.

For example, under the above model:

  • The server knows the intermediate proxy server's address (via the socket)
  • The server knows the timing of any fetch, based on when it sees the request start and stop, and can also compute information such as RTT, throughput, etc.

Alone, this seems to contradict the statement in the spec that

> knowing end-to-end properties reveals information about the first network hop

However, I suspect you were presuming, under your threat model, that JS was enabled. This gives us:

  • With JS enabled, the server can use JS events such as onload to measure the client's perspective, thus measuring information about the proxy (and thus the first network hop).
  • With JS enabled, the server can use the Resource Timing API to measure the client's perspective, thus measuring information about the proxy (and thus the first network hop).
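
As a sketch of the Resource Timing case, the TypeScript below derives a time-to-first-byte and a download rate from the fields of a single timing entry. `ResourceTimingLike` mirrors four attributes of `PerformanceResourceTiming`; the `summarizeEntry` name and the `ClientView` result type are hypothetical. In a browser the entries would come from `performance.getEntriesByType("resource")`, and cross-origin entries served without `Timing-Allow-Origin` report zeros for these fields, which the sketch treats as "no data".

```typescript
// Subset of PerformanceResourceTiming used by the sketch.
interface ResourceTimingLike {
  requestStart: number;  // ms, when the request was sent
  responseStart: number; // ms, when the first response byte arrived
  responseEnd: number;   // ms, when the last response byte arrived
  transferSize: number;  // bytes on the wire (headers + body)
}

// Hypothetical summary of what the client-side timings suggest.
interface ClientView {
  ttfbMs: number;       // request sent -> first byte; includes server think time
  downlinkMbps: number; // rough rate while the response was arriving
}

function summarizeEntry(e: ResourceTimingLike): ClientView | undefined {
  const bodyMs = e.responseEnd - e.responseStart;
  // Zeroed fields (e.g. restricted cross-origin entries) carry no signal.
  if (e.transferSize <= 0 || e.requestStart <= 0 || bodyMs <= 0) {
    return undefined;
  }
  return {
    ttfbMs: e.responseStart - e.requestStart,
    downlinkMbps: (e.transferSize * 8) / 1e6 / (bodyMs / 1000),
  };
}
```

Because the timestamps are taken on the client, the numbers describe the path as the client sees it rather than as the server or an intermediate proxy sees it.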

Is that a clearer explanation of the concerns?

Now let's talk about concrete changes to the "Privacy Considerations" sections that might be able to address these concerns, if they're deemed to be founded (I could, after all, be a crazy ranty person)

Privacy Considerations
The Network Information API exposes information about the first network hop between the
user agent and the server; specifically, the type of connection and the upper bound of the
downlink speed, as well as signals whenever this information changes.

This opens up several privacy-hostile attacks that site operators may wish to mount:

  • If it detects that the connection type is other, none, mixed, or unknown, this may be
    a signal that the user is attempting to access the site through means such as a VPN. A site
    may use this as a signal to deny access to the user (e.g. due to geofencing policies)
  • Through detecting changes to the network type (such as transitions between wifi and
    cellular), this may be able to serve as means of:
    • Geolocating a user (are they home, at work, or in transit)
    • Fingerprinting a user (User X transitioned networks at 9:01 AM on Monday, User Y
      transitioned networks at 9:01 AM on Tuesday, and User Z transitioned on Wednesday - with
      enough signals, it may be possible to determine that X, Y, and Z are the same person and
      they are taking the train to work each day)
  • Through detecting changes to the max downlink speed, they may be able to further refine such
    information.

However, these considerations are not new, and sufficiently motivated attackers may already
obtain and exploit such information using existing technologies.

For example, in a UA that supports the Resource Timing API, the attacker can infer the measured
RTT and throughput of the overall network connection, and use that to infer the user's connection
type.

Alternatively, an attacker could use the Fetch API to constantly make (small) requests to a service,
and look for changes in the IP address of the incoming requests to infer the user's network
connection. Further, information such as the source IP may also reveal geolocation or fingerprinting
opportunities similar to those exposed here.

Yet another means of obtaining similar information would be through the WebRTC API and the
(whatever the list of IP addresses thing is called) to obtain information about the user's IP.

While privacy-conscious users may attempt to mitigate existing techniques through the use of
a configured proxy server, if any scripting is allowed at all, then an evil server can leverage
existing technologies to obtain the same information offered directly by this API. As such, while
this API makes it easier to obtain this information, by avoiding the need for additional network
requests, this information is already available to a sufficiently-motivated attacker.

That's probably quite poorly worded, but tries to concretely spell out attacks offered by this API, demonstrate how these attacks are already possible with existing APIs, and thus support the final conclusion that "additional exposure is not substantial".
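
To illustrate the fingerprinting bullet in the draft (users X, Y, and Z each transitioning at 9:01 AM on different days), here is a deliberately naive TypeScript sketch that groups visitor identifiers whose observed transitions fall into the same time-of-day bucket. The `TransitionObservation` shape, the `candidateLinks` name, and the bucketing approach are all assumptions made for illustration; a single coincidence like this proves very little, and a real attacker would need many more signals.

```typescript
// Hypothetical observation: some identifier saw a network transition at this time.
interface TransitionObservation {
  visitorId: string;   // per-session or per-cookie identifier
  minuteOfDay: number; // 0..1439, local time of the transition
}

// Group identifiers whose transitions land in the same time-of-day bucket.
// Only buckets containing more than one distinct identifier are returned.
function candidateLinks(
  observations: TransitionObservation[],
  bucketMinutes: number
): string[][] {
  const buckets: Record<string, string[] | undefined> = {};
  for (const o of observations) {
    const key = String(Math.floor(o.minuteOfDay / bucketMinutes));
    let ids = buckets[key];
    if (ids === undefined) {
      ids = [];
      buckets[key] = ids;
    }
    if (ids.indexOf(o.visitorId) === -1) ids.push(o.visitorId);
  }
  const result: string[][] = [];
  Object.keys(buckets).forEach((key) => {
    const ids = buckets[key];
    if (ids !== undefined && ids.length > 1) result.push(ids);
  });
  return result;
}
```

Called with observations for X and Y at minute 541 (9:01 AM), Z at minute 542, and an unrelated visitor at minute 900, using 5-minute buckets, the function returns a single group containing X, Y, and Z.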

As the text currently reads, it hand-waves away what threat model the spec is concerned about and how an attacker would use existing techniques. As such, it doesn't really provide meaningful guidance for implementors and reviewers as to what (some of) the possible privacy considerations are, or why proposed mitigations may not be.

@igrigorik
Member Author

Thanks Ryan, this is definitely a good improvement, reopening while we iterate on the wording..

> If it detects that the connection type is other, none, mixed, or unknown, this may be a signal that the user is attempting to access the site through means such as a VPN. A site may use this as a signal to deny access to the user (e.g. due to geofencing policies)

You're implicitly assuming that we report "unknown" when the user is on a VPN.. what's the reasoning here? Note that we don't have this as a requirement in the spec today, and it's not entirely clear to me if this is actually enforceable by the browser. As in, do we even know if we're being routed through a VPN tunnel on all platforms?

@igrigorik
Member Author

@sleevi ptal: https://github.com/w3c/netinfo/pull/35/commits - I took the liberty of rewriting some of the points. Hopefully I didn't botch it too badly :-). Also, I skipped the VPN stuff for now, per my question above.

@igrigorik
Member Author

Resolved via https://github.com/w3c/netinfo/pull/35.
