End Nodes: Guidelines for communication with S2+Supervision #5471
The Problem, Part 1
This all seems fine, but a problem arises when end nodes use S2+Supervision for their reporting. Remember that Supervision requires the target to respond, so a typical status update will look like this:
sequenceDiagram
participant Z as Z-Wave JS
participant N as End Node
N->>Z: encrypted command, with Supervision
note over Z: processes command
Z->>N: encrypted Supervision REPORT
Compared to end nodes, Z-Wave JS (the controller) often sends many more commands, e.g. when controlling multiple devices at once. Incoming supervised commands are handled as soon as possible, but only after the current send->ACK flow has completed. We'll assume that all of the following commands use S2 encryption.
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
participant N3 as Node 3
participant N4 as Node 4
Z->>N2: command
activate Z
note left of Z: busy / waiting for ACK
N4-->>Z: supervised command
activate N4
note right of N4: waiting for Supervision REPORT
N2->>Z: ACK
deactivate Z
Z-->>N4: Supervision REPORT
deactivate N4
Z->>N3: command
activate Z
note left of Z: busy / waiting for ACK
N3->>Z: ACK
deactivate Z
Impatient End Nodes vs. sensitive 700 series controllers:
Sending a command to a node can take longer than expected. Multiple seconds are not unheard of, especially in imperfect networks. End nodes often time out very quickly while waiting for the Supervision Report and re-transmit their supervised command multiple times:
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
participant N4 as Node 4
note over N2: flaky / slow<br>connection
Z->>N2: command
activate Z
note left of Z: busy / waiting for ACK
N4-->>Z: supervised command
activate N4
note right of N4: waiting
N4-->>Z: supervised command (re-transmit)
note right of N4: 500ms later
N4-->>Z: supervised command (re-transmit)
note right of N4: 500ms later
N4-->>Z: supervised command (re-transmit)
note right of N4: 500ms later
N4-->>Z: supervised command (re-transmit)
N2->>Z: ACK
deactivate Z
Z-->>N4: Supervision REPORT
deactivate N4
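To get a feel for how quickly the re-transmits in the diagram above pile up, here is a minimal sketch. It assumes the node re-sends at a fixed interval (500ms, as in the diagram) until the controller is free to answer. The function name and parameters are made up for illustration and are not part of Z-Wave JS or any device firmware.

```python
def count_retransmits(busy_ms: int, retry_interval_ms: int = 500) -> int:
    """Count the re-transmits a node sends while the controller is busy.

    Assumption: the node sends its supervised command at t=0 and repeats it
    every `retry_interval_ms` until the controller becomes free at `busy_ms`
    and can finally answer with the Supervision Report.
    """
    if retry_interval_ms <= 0:
        raise ValueError("retry_interval_ms must be positive")
    if busy_ms <= 0:
        return 0
    # Re-transmits happen at 1x, 2x, 3x ... the interval, strictly before
    # the controller becomes free.
    return (busy_ms - 1) // retry_interval_ms


if __name__ == "__main__":
    for busy in (500, 2100, 5000):
        print(f"controller busy for {busy}ms -> {count_retransmits(busy)} re-transmits")
```

With a controller that is busy for a little over 2 seconds, this yields the 4 re-transmits shown in the diagram. The sketch does not model the feedback effect described next, where the incoming traffic itself makes the controller busy for even longer.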
I've seen this repeated 10x or more. In contrast to 500 series controllers, 700 series controllers seem to have more trouble transmitting when there's lots of traffic on the network. In situations like the one above, the incoming traffic itself often causes the outgoing message to take much longer to be transmitted, which causes more messages to be re-transmitted, which causes the outgoing message to take even longer, ...
Supervision for everything:
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
participant N3 as Node 3
participant N4 as Node 4
participant N5 as Node 5
Z->>N2: Turn ON
activate Z
N2->>Z: ACK
deactivate Z
note left of Z: 1st device ON
Z->>N3: Turn ON
activate Z
N2-->>Z: supervised (A) report
activate N2
N3->>Z: ACK
deactivate Z
note left of Z: 2nd device ON
Z-->>N2: Supervision REPORT (A)
deactivate N2
N2-->>Z: supervised (W) report
activate N2
Z-->>N2: Supervision REPORT (W)
deactivate N2
Z->>N4: Turn ON
activate Z
N3-->>Z: supervised (A) report
activate N3
N3-->>Z: supervised (A) report (re-transmit)
N4->>Z: ACK
deactivate Z
note left of Z: 3rd device ON
N4-->>Z: supervised (A) report
activate N4
N4-->>Z: supervised (A) report (re-transmit)
Z-->>N3: Supervision REPORT (A)
deactivate N3
N3-->>Z: supervised (W) report
activate N3
Z-->>N4: Supervision REPORT (A)
deactivate N4
N4-->>Z: supervised (W) report
activate N4
Z-->>N3: Supervision REPORT (W)
deactivate N3
N4-->>Z: supervised (W) report (re-transmit)
Z-->>N4: Supervision REPORT (W)
deactivate N4
note left of Z: Longer and longer delays<br>between commands
Z->>N5: command
activate Z
N5->>Z: ACK
deactivate Z
note left of Z: 4th device ON
... imagine this for 10 or more devices. Suddenly the end nodes control the communication in the network, not the controller.
The Problem, Part 2
Remember that I described Supervision as a way to avoid unnecessary status queries, because the Supervision Report tells the controlling node that the controlled node has executed the command and is now in the desired state? It turns out that many devices still send an unsolicited update with their new status, even when they are controlled using Supervision. That unsolicited update uses Supervision too, of course, so each command now needs at least 4 instead of 2 messages:
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
Z->>N2: Dim to 50% (Supervised)
activate Z
N2-->>Z: ACK
deactivate Z
N2->>Z: Supervision Report: SUCCESS
note left of Z: Knows that Node 2<br> is at 50% brightness
note over Z,N2: ↓ This is completely unnecessary ↓
N2->>Z: reports brightness (supervised)
Z-->>N2: ACK
Z->>N2: Supervision REPORT: SUCCESS
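The message count from the diagram above can be scaled up to a whole scene with a few lines of arithmetic. This is only a sketch: it counts application-level messages (no protocol-level ACKs), and the function and parameter names are hypothetical.

```python
def messages_per_scene(
    node_count: int,
    unsolicited_supervised_report: bool,
    retransmits_per_report: int = 0,
) -> int:
    """Application-level messages needed to control `node_count` nodes.

    Each supervised command costs 2 messages (command + Supervision Report).
    A node that also sends a supervised status update adds 2 more
    (update + Supervision Report), plus any re-transmits of that update.
    """
    if node_count < 0 or retransmits_per_report < 0:
        raise ValueError("counts must not be negative")
    per_node = 2
    if unsolicited_supervised_report:
        per_node += 2 + retransmits_per_report
    return node_count * per_node


if __name__ == "__main__":
    print(messages_per_scene(10, False))     # well-behaved nodes
    print(messages_per_scene(10, True))      # redundant supervised updates
    print(messages_per_scene(10, True, 2))   # ... re-transmitted twice each
```

For 10 nodes this goes from 20 messages to 40, and to 60 if each redundant update is re-transmitted twice.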
This quickly gets ugly when multiple nodes are involved and the ones sending unnecessary supervised reports get impatient:
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
participant N3 as Node 3
Z->>N2: Dim to 50% (supervised)
activate Z
N2-->>Z: ACK
deactivate Z
Z->>N3: Dim to 50% (supervised)
activate Z
note left of Z: busy / waiting for ACK
N2->>Z: Supervision Report: SUCCESS
note left of Z: knows Node 2 is at 50%
N2->>Z: reports 50% brightness (supervised)
activate N2
note over N2: 500ms later
N2->>Z: reports 50% brightness (supervised, re-transmit)
N3-->>Z: ACK
deactivate Z
N2->>Z: reports 50% brightness (supervised, re-transmit)
Z->>N2: Supervision REPORT: SUCCESS
deactivate N2
N3->>Z: Supervision Report: SUCCESS
note left of Z: knows Node 3 is at 50%
N3->>Z: reports 50% brightness (supervised)
activate N3
Z->>N3: Supervision Report: SUCCESS
deactivate N3
Again, imagine this for 10+ nodes.
This is what it should look like, even if some nodes are slower to respond:
sequenceDiagram
participant Z as Z-Wave JS
participant N2 as Node 2
participant N3 as Node 3
participant N4 as Node 4
participant N5 as Node 5
participant N6 as Node 6
participant N7 as Node 7
participant N8 as Node 8
participant N9 as Node 9
participant N10 as Node 10
Z->>+N2: Dim to 50% (supervised)
activate Z
N2-->>Z: ACK
deactivate Z
Z->>+N3: Dim to 50% (supervised)
activate Z
N2->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 2 is at 50%
N3-->>Z: ACK
deactivate Z
N3->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 3 is at 50%
Z->>+N4: Dim to 50% (supervised)
activate Z
N4-->>Z: ACK
deactivate Z
Z->>+N5: Dim to 50% (supervised)
activate Z
N5-->>Z: ACK
deactivate Z
N5->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 5 is at 50%
N4->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 4 is at 50%
Z->>+N6: Dim to 50% (supervised)
activate Z
N6-->>Z: ACK
deactivate Z
Z->>+N7: Dim to 50% (supervised)
activate Z
N6->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 6 is at 50%
N7-->>Z: ACK
deactivate Z
Z->>+N8: Dim to 50% (supervised)
activate Z
N8-->>Z: ACK
deactivate Z
Z->>+N9: Dim to 50% (supervised)
activate Z
N9-->>Z: ACK
deactivate Z
N7->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 7 is at 50%
N9->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 9 is at 50%
Z->>+N10: Dim to 50% (supervised)
activate Z
N8->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 8 is at 50%
N10-->>Z: ACK
deactivate Z
N10->>-Z: Supervision Report: SUCCESS
note left of Z: knows Node 10 is at 50%
The Problem, Part 3
Just combine parts 1 and 2: Nodes which...
That's the sad reality today.
The Solution
As I wrote earlier, Z-Wave bandwidth is a scarce resource and needs to be used accordingly. So the primary goal is to avoid unnecessary traffic altogether. Parts of this may not apply to all reports a device sends. After all, you may need to make sure that some critical reports are received and understood, like an empty battery for a smoke sensor. But I think the above issues can be resolved with a few changes to communication strategy:
Do not send unsolicited updates for states controlled using Supervision
Do not use Supervision for everything
Use a configurable backoff strategy for re-transmits
Maybe: Make Supervision usage configurable
Choose sane defaults for reporting configuration:
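As an illustration of the "configurable backoff strategy for re-transmits" suggestion above, here is a minimal sketch of exponential backoff with jitter. This particular strategy and all parameter names and default values are my assumptions for the example. Nothing here is prescribed by the Z-Wave specification or taken from existing firmware.

```python
import random
from typing import List, Optional


def backoff_delays(
    initial_ms: int = 500,
    factor: float = 2.0,
    max_delay_ms: int = 8000,
    max_attempts: int = 5,
    jitter: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the wait time (in ms) before each re-transmit attempt.

    The base delay grows by `factor` after every attempt and is capped at
    `max_delay_ms`. Each delay is randomized by +/- `jitter` (a fraction of
    the base delay) so that several nodes do not re-transmit in lockstep.
    All defaults are illustrative and would be exposed as configuration.
    """
    if not 0.0 <= jitter < 1.0:
        raise ValueError("jitter must be in [0, 1)")
    rng = rng or random.Random()
    delays = []
    delay = float(initial_ms)
    for _ in range(max_attempts):
        spread = delay * jitter
        delays.append(round(delay + rng.uniform(-spread, spread)))
        delay = min(delay * factor, float(max_delay_ms))
    return delays


if __name__ == "__main__":
    print(backoff_delays(jitter=0.0))  # deterministic schedule without jitter
    print(backoff_delays())            # randomized schedule
```

Compared to re-sending every 500ms, a schedule like this gives the controller progressively more room to finish its outgoing transmission before the next re-transmit arrives.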
A lot of issues in users' networks come down to too much parallel communication. While we have a few guidelines on how to configure devices optimally to prevent this, often it is not possible due to how the devices are implemented. Especially when S2 and Supervision are involved, things can easily go sideways.
Z-Wave Communication Basics
To better understand the issue, let's take a look at how Z-Wave communication works at a high level.
Basic communication flow:
This process is typically very fast (~10ms), but can take several seconds when the controller has trouble reaching the node.
The important part to remember here is that this entire flow needs to be completed before another command can be sent.
Basic communication flow with status updates:
Even if the node received the command, this does not mean it understood or executed it. However, applications usually want to know whether a command was executed, e.g. whether a light was turned on or a door was unlocked. To guarantee that, Z-Wave JS waits for the node to report its new status. If that doesn't happen within a few seconds, it queries the current status. For simplicity, the controller and protocol-level ACKs are omitted from the following flow:
That status update does not require a response outside of the protocol-level ACK, which is sent automatically by the controller.
When the node does not automatically send status reports (or does not understand the command), this can lead to a couple of seconds of uncertainty until the status has been queried.
Supervised commands:
By using Supervision CC, the node is required to respond whether it understood and executed the command:
This eliminates the uncertainty and reduces the number of commands to 2 (instead of 2 or 3). Since Z-Wave has very limited bandwidth shared by up to 232 nodes, reducing the number of commands needed for each action is beneficial.
Encrypted commands:
When encryption is involved, things become a little more complicated. The older standard Security S0 is notorious for adding up to 2 commands overhead for each exchanged command, because it requests a nonce from the target:
Like before, it is unclear whether the target node understood the command unless it sends an update, so this exchange may be followed by a GET and a REPORT, each preceded by a new nonce exchange.
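The S0 overhead described above adds up quickly. As a rough sketch, assuming the worst case where every command needs its own nonce exchange (2 extra frames), the frame count can be estimated like this. The names are hypothetical and protocol-level ACKs are not counted.

```python
# Worst case for Security S0: nonce request + nonce report + encrypted command.
S0_FRAMES_PER_COMMAND = 3


def s0_frames(command_count: int) -> int:
    """Upper bound on the frames exchanged for `command_count` S0 commands."""
    if command_count < 0:
        raise ValueError("command_count must not be negative")
    return command_count * S0_FRAMES_PER_COMMAND


if __name__ == "__main__":
    print(s0_frames(1))  # a single command
    print(s0_frames(3))  # command, followed by a GET and a REPORT
```

So a single action that needs the command plus a GET and a REPORT can cost up to 9 frames with S0, which helps explain why S0 has such a bad reputation on busy networks.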
Security S2 does this better by establishing a shared encryption state which does not need any nonce exchange unless there are communication failures involved and one party gets out of sync.
In case of a decryption failure, the target responds with a nonce report, which will cause the sender to re-transmit its command including its nonce to re-sync the shared state:
So in order to handle cases where the target cannot decrypt the command, the sender would have to wait for a potential nonce report, so it can re-transmit the command:
While this does work, it introduces unnecessary delays. The nonce report can easily take 0.5 to 1s to be delivered, so the sender has to wait at least that long, even if the command was processed within 10ms. This is fine if only a few messages need to be delivered (e.g. 2-3 reports from a node to the controller), but it is very noticeable when trying to control many devices (e.g. when a user wants to turn on 10+ devices).
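To put numbers on that delay, here is a small back-of-the-envelope sketch. It assumes commands are sent strictly one after another and that the sender waits a fixed time for a potential nonce report after each one. The function name and defaults are illustrative only.

```python
def total_send_time_ms(
    command_count: int,
    transmit_ms: int = 10,
    nonce_wait_ms: int = 500,
) -> int:
    """Total time to send `command_count` commands one after another.

    Assumption: after each transmission (~10ms when all goes well), the
    sender waits `nonce_wait_ms` for a potential nonce report before it
    moves on to the next command.
    """
    if command_count < 0:
        raise ValueError("command_count must not be negative")
    return command_count * (transmit_ms + nonce_wait_ms)


if __name__ == "__main__":
    print(total_send_time_ms(10, nonce_wait_ms=0))     # no waiting at all
    print(total_send_time_ms(10, nonce_wait_ms=500))   # waiting 0.5s each
    print(total_send_time_ms(10, nonce_wait_ms=1000))  # waiting 1s each
```

Turning on 10 devices takes about 0.1s without waiting, but roughly 5 to 10 seconds when waiting for a possible nonce report after every command.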
Supervision to the rescue?
Again, Supervision CC can help with this. It requires the target to respond, so it increases the throughput for successful transmissions: