New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Our current way of notifying beacon changes can easily cause resource exhaustion #700

Closed
yaronyg opened this Issue Apr 4, 2016 · 1 comment

Comments

Projects
None yet
1 participant
@yaronyg
Member

yaronyg commented Apr 4, 2016

TL;DR - We need to make some string changes or our perf in dealing with beacon changes on native transports is going to be horrifically bad.

Our current approach and its motivation

In our current design when a peer updates its beacons it sends out a brand new peerID. The idea is that listeners shouldn't be able to track the peer across beacon changes because it constantly uses new peerIDs. This peerID then gets sent up the stack from wifi or native to ThaliMobile who will then pass the event on if the peerID is new or if something else like a port changed. This keeps the top layers from getting slammed with repeat notification (which both Wifi and the native code will send).

The problem

In the best case this approach is inefficient. For example, imagine that we are on wifi with peers A and B. At time 0 A sends out a SSDP announcement with id A1. Peer B hears the announcement but it's a bit busy and ignores it for a second. At time 1 peer B changes its beacons (because it updated its DB) and sends out a SSDP announcement with id A2. In theory peer B doesn't know that A1 and A2 came from the same devices so it is going to try and make requests to both devices instead of recognizing that A2 is a replacement for A1 and so A1 can be ignored. This is obviously wasteful but not deadly.

In iOS and Android there will be problems because unlike in WiFi, it is not possible in iOS and Android to connect twice to the same peer. Follow this logic carefully because this part can be a bit confusing.

When we send up a native peerAvailabilityChanged event above the thaliMobile layer we include an IP address (in the case of iOS and Android this will be localhost) and a port. The upper level code can connect to that IP address and port and create as many TCP connections as they want. The reason is that all those TCP connections are muxed into a single TCP connection which is sent across the single native duplex stream between the peers.

So when A1 came up it had port X. When A2 comes up as an event it will have port Y. Note that X != Y. So from the upper layers perspective it loos like A1 and A2 are pointing at different peers. But they aren't. They are the same peer. When the node.js code connects to port X that causes a native connection to be made. When the node.js code then tries to connect to port Y that will cause an attempt to create a second, separate, native connection. It is that second attempt that will fail. Only one native connection per customer.

So what does the node.js code do when it gets the error on the second attempt?

In theory it could just ignore it since the error means that the peers are already connected. But it's not that easy (you didn't think it would be that easy, did you?).

Imagine the following scenario:

Time 0 - Peer B gets announcement A1 with port X
Time 1 - Peer B gets announcement A2 with port Y
Time 2 - Peer B attaches to port X, gets back a beacon set with nothing interesting in it and kills its local TCP connections to the mux listener on port X. Killing the TCP connections to the mux listener does NOT kill the native connection. This is because it's quite common for connections to the mux listener to be replaced. So we will only kill the native connection if there is an inactivity time out or if someone uses the feature I still have to implement to allow for directly killing outgoing native connections.
Time 3 - Peer B attaches to port Y which triggers an error that Peer B is already connecting to the peer behind Port Y.

The problem is, which peer is the error talking about? Remember, Peer B does not know that A1 and A2 belong to the same peer. So now peer B can't get the new beacons from port Y and has forgotten port X! EEEK!

How to fix

A possibly hacky work around is to keep track of all connections and ping them all whenever we get a conflict. But that is messy and silly. The reason its silly is that the concept underlying this approach, that we can hide Peer A's identity is currently wrong.

In the case of WiFi every SSDP announcement might go out with a unique peerID but it always has the exact same IP and port! In the case of iOS for implementation reasons we had to create peerIDs that consist of two parts, a constant UUID followed by a generation counter that changes every time the beacons are updated. So while the Node.js code doesn't know that A1 and A2 belong to the same peer, the native code does! And in Android every beacon announcement goes out with the devices Bluetooth address which is a constant!

So the real solution here is to change things as follows.

  1. We should add a field to the wifi and native peer availability changed events that specifies if the beacons have been updated (this translates to a call to updateAdvertising). This means that a peer availability changed event can communicate a new peer, a peer leaving or an existing peer updating its beacons. This will require (minor) changes in the native layer, the native wrapper, the wifi layer, thali mobile and even notifications (so they trigger on beacon changes and know not to double schedule if they get multiple beacon changes from the same peer).
  2. On WiFi it would probably be cleanest if we changed our peerID to use a design like iOS where we have a UUID we generate when we start up and then a generation counter we use as the beacons are changed. It's tempting to just use the IP and port advertised in the location header but if a device is coming on and off wifi both can change and cause confusion. Although to be fair one could plausibly argue that is a feature.
  3. On iOS we just need to expose what we have. When we communicate up the stack we need to use the UUID we already generate as the peerID and then use the generation counter to drive beacon updates.
  4. On Android we can use the bluetooth address as the peerID and we can use a hash of the BLE address (which changes on every update) as a beacon change flag. Yes, we have counters in the previous cases which give us ordering but we really don't need that level of knowledge. We just need to know that there was a change. So we can keep things simple and leave the update flag as a Boolean.

So was the original idea always wrong?

Actually, no, it wasn't. It was driven by the assumption that we would use BLE on Android not to only to signal new beacons but also to exchange beacons. In that case we really could create anonymity since every time we update the beacons we change our BLE address. But that design will still work when we install it. It just means we won't ever get a second notification with the same ID with a beacon update set to true. And the bugs above don't apply since we wouldn't have connection semantics to the GATT server. We will at worst just get an address not found error.

So this isn't terribly hard to fix.

@yaronyg yaronyg added this to the V1 milestone Aug 3, 2016

@yaronyg yaronyg added 1 - Backlog and removed 0 - Icebox labels Aug 4, 2016

@yaronyg

This comment has been minimized.

Show comment
Hide comment
@yaronyg

yaronyg Sep 26, 2016

Member

We have done this by changing peerAvailabilityChanged to consist of a relatively persistent peerID and a generation.

Member

yaronyg commented Sep 26, 2016

We have done this by changing peerAvailabilityChanged to consist of a relatively persistent peerID and a generation.

@yaronyg yaronyg closed this Sep 26, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment