Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfbridge MQTT triggered via MQTT_TOPIC_RFOUT ( {identifier}/rfout/set ) is not queued #422

Closed
xoseperez opened this issue Jan 15, 2018 · 25 comments
Assignees
Milestone

Comments

@xoseperez
Copy link
Owner

Originally reported by: Phil Elson (Bitbucket: pelson, GitHub: pelson)


I've got v1.12.0 on my sonoff rf 433 bridge and am using it as a generic MQTT<>RF interface. Essentially, I have many lights configured with home assistant that, when switched, publish the appropriate rf code via the {identifier}/rfout/set MQTT channel.

This works really well for individual commands, but ends up jamming when there are multiple commands being sent at the approximately the same time. I know there is a Ticker being used to allow repeated commands (defined by passing the {command},{repeats} as the data payload to the topic), but I don't see that there is anything preventing the situation where many commands are submitted becoming an RF jammer.

Steps to reproduce:

1. Send multiple rf commands in a short period of time (<0.5s):

$ mosquitto_pub -t rf-bridge/rfout/set -m "2ABC017C042E500551"
$ mosquitto_pub -t rf-bridge/rfout/set -m "2ABC017C0438501501"
$ mosquitto_pub -t rf-bridge/rfout/set -m "2AB2017C042E501401"

2. Observe that not all commands have an effect

To confirm, modifying the payloads to include the repeat count (e.g. 2ABC017C042E500551,2) does have an effect (more likely to succeed) but also queues them up such that there is a terrible disco effect if the same switch is turned on and off in a short period of time.

Looking at the codebase, would doing something like the following have the desired effect? (Note: completely untested, and with very little understanding of the codebase)

--- a/code/espurna/rfbridge.ino
+++ b/code/espurna/rfbridge.ino
@@ -34,6 +34,7 @@ unsigned char _uartpos = 0;
 unsigned char _learnId = 0;
 bool _learnStatus = true;
 bool _rfbin = false;
+bool rfsend_loop_enabled = false;
 
 typedef struct {
     byte code[RF_MESSAGE_SIZE];
@@ -109,7 +110,12 @@ void _rfbSend(byte * message) {
 void _rfbSend() {
 
     // Check if there is something in the queue
-    if (_rfb_message_queue.size() == 0) return;
+    if (_rfb_message_queue.size() == 0) {
+        rfsend_loop_enabled = false;
+        return;
+    } else {
+        rfsend_loop_enabled = true;
+    }
 
     // Pop the first element
     rfb_message_t message = _rfb_message_queue.front();
@@ -127,6 +133,8 @@ void _rfbSend() {
     // if there are still messages in the queue...
     if (_rfb_message_queue.size() > 0) {
         _rfbTicker.once_ms(RF_SEND_DELAY, _rfbSend);
+    } else {
+        rfsend_loop_enabled = false;
     }
 
 }
@@ -141,8 +149,11 @@ void _rfbSend(byte * code, int times) {
     memcpy(message.code, code, RF_MESSAGE_SIZE);
     message.times = times;
     _rfb_message_queue.push_back(message);
-    _rfbSend();
 
+    // If the queue isn't currently being processed, process it!
+    if (rfsend_loop_enabled != true) {
+        _rfbSend();
+    }
 }

I've been struggling to build my own images of espurna (well, technically it is the upload phase that I've had problems with in platformio...), and have had to rely on the on pre-build ones so far, so might struggle to test any proposed changes.

@xoseperez
Copy link
Owner Author

There is a queue in the rfbridge module. The queue holds the individual messages and sends them one by one every RF_SEND_DELAY (500ms). When you send an MQTT message "2ABC017C042E500551,2" two messages are enqueued and sent with half a second delay. If your device uses the same code for ON and OFF you will certainly see some disco lights...

Also, another "feature" might come to play. There is a flood window for switches. It's meant to prevent physical relays to wear out but it applies to any switch actually. So if you are sending a code that ESPurna identifies with a switch (it's in the RFBridge tab in the web interface) the flood window protection will rule and it will prevent more than 5 changes in less than 3 seconds.

@xoseperez
Copy link
Owner Author

Original comment by Phil Elson (Bitbucket: pelson, GitHub: pelson):


Thanks for your reply.

I can see that repeat messages are queued, but can't see where the queue is used for new messages incoming. The problem I have is that I'm getting 5-10 messages at almost exactly the same time (a group of rf-plugs, controlled by home assistant). In this situation, I haven't paired the espurna switches with the codes, and I'm using the MQTT {{identifier}}/rfout/set directly.

Is there any code that prevents concurrent sending (and therefore turning the device into an rf jammer), as I couldn't see it? I was looking for some locking mechanism in void _rfbSend(byte * code, int times) to prevent multiple calls to void _rfbSend(). implementation I'd assumed that was the point of the Ticker, but perhaps that is simply to queue up repeated codes?

Do you have an interest in having the ability to queue up distinct codes/messages? Without such functionality, I'd basically need to push delays into the code that publishes the mqtt messages, which I'd rather do at the receiving end (i.e. in espurna).

@xoseperez
Copy link
Owner Author

Original comment by Biddster (Bitbucket: biddster, GitHub: biddster):


Hi Phil. I've noticed the same behaviour this week too.

I have 4 433mhz lamps controlled by node red which turn on at precisely the same time. Randomly, several of the lamps won't turn on. If I toggle them individually, they are very reliable indeed.

If I protect the rfout/set in node-red using a semaphore (of one handle) it's very reliable. I send 3 messages (1.5s elapsed) and release the semaphore after 2 seconds. Admittedly, it takes 8 seconds for the 4 lamps to turn on, but that's not a deal breaker as it's super reliable.

@xoseperez
Copy link
Owner Author

@pelson @biddster I'm checking the code. The _rfbSend(char*, int) was not properly enqueuing the messages since it always tries to push the first one it receives... I'm changing the code (available in the dev branch) to make it always enqueue them and start the ticker to send them every RF_SEND_DELAY including the first one. It's in the dev branch now.

Maybe one of you two can give it a try?

@xoseperez
Copy link
Owner Author

xoseperez commented Jan 22, 2018

Original comment by Phil Elson (Bitbucket: pelson, GitHub: pelson):


Thanks to you both for the update. The change in b51ffc2 looks good Xose. I'll do my best to figure out my issues with compiling espurna in the next couple of days to try and test it out.

Thanks again!

@xoseperez
Copy link
Owner Author

Original comment by Biddster (Bitbucket: biddster, GitHub: biddster):


Hi @xoseperez, I've think I've got a pio based build environment up a running thanks to homebrew.

I don't have the flash cables to hand as I used my dad's (who lives a long way away) to get started. I presume I can just build the ota and flash via the web admin?

I've tried pio run -e itead-sonoff-rfbridge-ota and I've got a firmware.bin in ~/git/espurna/code/.pioenvs/itead-sonoff-rfbridge-ota.

@xoseperez
Copy link
Owner Author

If your device is connected to the same network you are you can do :

pio run -e itead-sonoff-rfbridge-ota -t upload --upload-port XXX.XXX.XXX.XXX

Where XXX.XXX.XXX.XXX is your device IP.

@xoseperez
Copy link
Owner Author

Original comment by Biddster (Bitbucket: biddster, GitHub: biddster):


Oh that's slick. I'll try the dev branch and try to get you some meaningful results in the next few days.

@xoseperez
Copy link
Owner Author

Original comment by Biddster (Bitbucket: biddster, GitHub: biddster):


Ok, I've tried the dev branch. It doesn't switch any 433mhz things on/off.

Telnet from 1.12.1 which is working:

[MQTT] Received rfbridge/rfout/set => 348001AE050A155154,3
[RFBRIDGE] Trying to match code 155154
[RFBRIDGE] Sending MESSAGE '348001AE050A155154' 3 time(s)
[RFBRIDGE] Action 0xA0
[RFBRIDGE] Action 0xA0
[RFBRIDGE] Action 0xA0

Telnet from HEAD of dev branch:

[142849] [MQTT] Received rfbridge/rfout/set => 348001AE050A155154,3

It just stops and doesn't issue any of the actions.

@xoseperez
Copy link
Owner Author

The debug log looks good. It's sending the code 3 times as requested and receiving 3 acknowledgements from the EFM0 (that's action 0xA0).

Are you sure that code should trigger something? Is that something in the range of the rfbridge?

@xoseperez
Copy link
Owner Author

Original comment by Biddster (Bitbucket: biddster, GitHub: biddster):


The debug log that looks good is from the official build of 1.12.1. I just put it there for comparative purposes.

The second snippet is the output from the head of the dev branch. It doesn't output anything after the initial received message.

@xoseperez
Copy link
Owner Author

Original comment by Phil Elson (Bitbucket: pelson, GitHub: pelson):


Just to add that I've now tested the change, and am pleased to say that it works very well. I was half expecting to see only one light turned on per RF_SEND_DELAY, but the whole thing is relatively quick (and seems to be quicker than the 500ms setting).

Thanks for fixing this so quickly! Really great addition - I'm starting to feel like I need to write my findings down in a blog.... watch this space. :)

@xoseperez
Copy link
Owner Author

@biddster It's working for me...

[109179] [MQTT] Received RFBRIDGE02/rfout/set => 348001AE050A155154,3
[109184] [RFBRIDGE] Trying to match code 155154
[109191] [RFBRIDGE] Enqueuing MESSAGE '348001AE050A155154' 3 time(s)
[110118] [RFBRIDGE] Action 0xA0
[110618] [RFBRIDGE] Action 0xA0
[111118] [RFBRIDGE] Action 0xA0

Can you update and try again?

@pelson
Copy link
Contributor

pelson commented Jan 24, 2018

I'm happy with the change and think it is probably good to close on my behalf.

Thanks @xoseperez!

@biddster
Copy link

Must be my build system I guess. I'll update and try again.

@xoseperez xoseperez self-assigned this Jan 25, 2018
@biddster
Copy link

I've tried clean builds again, pushing the ota directly to the bridge. Same result. It doesn't execute the actions. I've checked the firmware and the version is updating when I push the ota.

If I go to the admin console and upload your released firmware, it all works again. So I think it's something environment on my machine/build. Somehow.

Given that you and @pelson have had success, it must be something I'm doing.

@xoseperez
Copy link
Owner Author

Could it be that you have TERMINAL_SUPPORT and/or SERIAL_DEBUG_ENABLED to 1 in your build?

@biddster
Copy link

I'll check. All I do is:

$ git pull
$ pio run -e itead-sonoff-rfbridge-ota -t upload

@biddster
Copy link

Nope, I've made sure that both TERMINAL_SUPPORT & SERIAL_DEBUG_ENABLED are 0 and I still have the same problem.

Have I missed some build instructions somewhere? My expectation was that git pull && pio lib update && pio run -e itead-sonoff-rfbridge-ota -t upload should be enough.

@xoseperez
Copy link
Owner Author

OK, let's recap.

  • Do you see any [RFBRIDGE] message in the debug log?
  • Are you positive you are upgrading the RF Bridge under test? The default configuration for environment itead-sonoff-rfbridge-ota is default password (fibonacci) and 192.168.4.1 as IP (which means you are connected to the AP of the device)

Can you paste here the debug log on boot?

@xoseperez xoseperez modified the milestones: 1.12.2, 1.13.0 Jan 28, 2018
@biddster
Copy link

Here's what the admin console reports after the ota:

screen shot 2018-01-28 at 22 54 43

This is all I see in the telnet debug:

[185351] [TELNET] Client #0 connected
[319689] [MQTT] Received rfbridge/rfout/set => 348001AE050A155154,3
[320313] [MQTT] Sending rfbridge//app => ESPURNA (PID 25)
[320313] [MQTT] Sending rfbridge//version => 1.12.2b (PID 26)
[320314] [MQTT] Sending rfbridge//host => RFBridge (PID 27)
[320314] [MQTT] Sending rfbridge//ip => x.x.x.x (PID 28)
[320315] [MQTT] Sending rfbridge//mac => xx:xx:xx:xx:xx:xx (PID 29)
[320315] [MQTT] Sending rfbridge//rssi => -54 (PID 30)
[320315] [MQTT] Sending rfbridge//uptime => 320 (PID 31)
[320316] [MQTT] Sending rfbridge//datetime => 2018-01-28 23:56:33 (PID 32)
[320316] [MQTT] Sending rfbridge//freeheap => 20528 (PID 33)
[320316] [MQTT] Sending rfbridge//relay/0 => 0 (PID 34)
[320317] [MQTT] Sending rfbridge//relay/1 => 0 (PID 35)
[320317] [MQTT] Sending rfbridge//relay/2 => 0 (PID 36)
[320317] [MQTT] Sending rfbridge//relay/3 => 0 (PID 37)
[320318] [MQTT] Sending rfbridge//relay/4 => 0 (PID 38)
[320318] [MQTT] Sending rfbridge//relay/5 => 0 (PID 39)
[320318] [MQTT] Sending rfbridge//relay/6 => 0 (PID 40)
[320318] [MQTT] Sending rfbridge//relay/7 => 0 (PID 41)
[320319] [MQTT] Sending rfbridge//vcc => 3160 (PID 42)
[320320] [MQTT] Sending rfbridge//status => 1 (PID 43)
[320384] [MQTT] Publish ACK for PID 25
[322226] [MQTT] Publish ACK for PID 26
[325220] [MQTT] Publish ACK for PID 27
[325221] [MQTT] Publish ACK for PID 28
[325221] [MQTT] Publish ACK for PID 29
[325221] [MQTT] Publish ACK for PID 30
[325221] [MQTT] Publish ACK for PID 31
[325221] [MQTT] Publish ACK for PID 32
[325222] [MQTT] Publish ACK for PID 33
[325222] [MQTT] Publish ACK for PID 34
[325222] [MQTT] Publish ACK for PID 35
[325222] [MQTT] Publish ACK for PID 36
[325222] [MQTT] Publish ACK for PID 37
[325222] [MQTT] Publish ACK for PID 38
[325222] [MQTT] Publish ACK for PID 39
[325223] [MQTT] Publish ACK for PID 40
[325223] [MQTT] Publish ACK for PID 41
[325223] [MQTT] Publish ACK for PID 42
[325223] [MQTT] Publish ACK for PID 43
[340784] [MQTT] Received rfbridge/rfout/set => 348001AE050A155154,3

Is there another log I'm not aware of which contains the boot log?

@xoseperez
Copy link
Owner Author

Looks like you have an extra slash at the end of the MQTT root topic. Maybe you have "{identifier}/" there? Try removing that slash.

@biddster
Copy link

That trailing slash was it:

[177566] [TELNET] Client #0 connected
[181584] [MQTT] Received rfbridge/rfout/set => 348001AE050A154554,3
[181590] [RFBRIDGE] Trying to match code 154554
[181597] [RFBRIDGE] Enqueuing MESSAGE '348001AE050A154554' 3 time(s)
[182522] [RFBRIDGE] Action 0xA0
[183024] [RFBRIDGE] Action 0xA0
[187166] [MQTT] Received rfbridge/rfout/set => 348001AE050A154555,3
[187172] [RFBRIDGE] Trying to match code 154555
[187179] [RFBRIDGE] Enqueuing MESSAGE '348001AE050A154555' 3 time(s)
[188105] [RFBRIDGE] Action 0xA0
[188605] [RFBRIDGE] Action 0xA0
[189115] [RFBRIDGE] Action 0xA0
[194879] [MQTT] Received rfbridge/rfout/set => 348001AE050A154554,3
[194885] [RFBRIDGE] Trying to match code 154554
[194892] [RFBRIDGE] Enqueuing MESSAGE '348001AE050A154554' 3 time(s)
[195819] [RFBRIDGE] Action 0xA0
[196319] [RFBRIDGE] Action 0xA0
[196820] [RFBRIDGE] Action 0xA0
[199400] [MQTT] Received rfbridge/rfout/set => 348001AE050A154555,3
[199406] [RFBRIDGE] Trying to match code 154555
[199413] [RFBRIDGE] Enqueuing MESSAGE '348001AE050A154555' 3 time(s)
[200342] [RFBRIDGE] Action 0xA0
[201340] [RFBRIDGE] Action 0xA0

I don't always see 3 actions logged, see the snippet above.

@xoseperez
Copy link
Owner Author

I'm adding some checking code to prevent that from happening again and fixing the hints on the web UI. Yes, sometimes it skips a message, increasing the RF_SEND_DELAY setting will probably fix that but it could also increase latency.

@biddster
Copy link

Many thanks for all the assistance @xoseperez

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants