Skip to content
This repository has been archived by the owner on Oct 26, 2023. It is now read-only.

MQTT to Home Assistant breaks every approx 24 hours #14

Closed
jeroenterheerdt opened this issue May 31, 2018 · 48 comments
Closed

MQTT to Home Assistant breaks every approx 24 hours #14

jeroenterheerdt opened this issue May 31, 2018 · 48 comments

Comments

@jeroenterheerdt
Copy link

After approx. 24 hours bt-mqtt-gateway messages do not make it over to Home Assistant any more. I do not see any errors in the bt-mqtt-gateway service status and a simple restart resolves the issue. It is annoying though. I am using MQTT on the Home Assistant side, which functions fine (other devices are using it). My source for the bt-mqtt-gateway is a miflora sensor that has not moved.... can I help debug somehow?

@jokerigno
Copy link

Same issue.
The service is enable and on but I have to manually re run "sudo ./gateway.py" to refresh datas

@bbbenji
Copy link
Collaborator

bbbenji commented Jun 5, 2018

Try this branch.

https://github.com/zewelor/bt-mqtt-gateway/tree/test_miflora_hanging_fix

@balsman225
Copy link

balsman225 commented Jun 7, 2018

Having the same issue - though my seems to quit long before 24 hours. I will get your new branch loaded up as soon as I can to test.

Update -
Here what is what I got after updating the miflora.py and rerunning

Starting
Added: miflora with 3600 seconds interval
Subscribing to: homeassistant/status with command: update_all
Updating all workers
name 'mac' is not defined
Traceback (most recent call last):
File "./gateway.py", line 37, in
mqtt.publish(_WORKERS_QUEUE.get(block=True).execute())
File "/root/bt-mqtt-gateway/workers_manager.py", line 21, in execute
messages = self._callback(*self._args)
File "/root/bt-mqtt-gateway/workers/miflora.py", line 19, in status_update
ret += self.update_device_state(name, mac)
NameError: name 'mac' is not defined
name 'mac' is not defined
Traceback (most recent call last):
File "./gateway.py", line 37, in
mqtt.publish(_WORKERS_QUEUE.get(block=True).execute())
File "/root/bt-mqtt-gateway/workers_manager.py", line 21, in execute
messages = self._callback(*self._args)
File "/root/bt-mqtt-gateway/workers/miflora.py", line 19, in status_update
ret += self.update_device_state(name, mac)
NameError: name 'mac' is not defined

@jokerigno
Copy link

jokerigno commented Jun 7, 2018

Same here.

Please notice that MAC are define for both miscale and miflora. Miscale is working, myflora no.

pi@raspberrypi:~/bt-mqtt-gateway $ sudo ./gateway.py -d
Starting
Added: miscale with 43200 seconds interval
Added: miflora with 3600 seconds interval
Subscribing to: homeassistant/status with command: update_all
Updating all workers
[{'topic': 'miscale/weight/kg', 'payload': 63.7}]
name 'mac' is not defined
Traceback (most recent call last):
File "./gateway.py", line 37, in
mqtt.publish(_WORKERS_QUEUE.get(block=True).execute())
File "/home/pi/bt-mqtt-gateway/workers_manager.py", line 21, in execute
messages = self._callback(*self._args)
File "/home/pi/bt-mqtt-gateway/workers/miflora.py", line 19, in status_update
ret += self.update_device_state(name, mac)
NameError: name 'mac' is not defined
name 'mac' is not defined
Traceback (most recent call last):
File "./gateway.py", line 37, in
mqtt.publish(_WORKERS_QUEUE.get(block=True).execute())
File "/home/pi/bt-mqtt-gateway/workers_manager.py", line 21, in execute
messages = self._callback(*self._args)
File "/home/pi/bt-mqtt-gateway/workers/miflora.py", line 19, in status_update
ret += self.update_device_state(name, mac)
NameError: name 'mac' is not defined

@bbbenji
Copy link
Collaborator

bbbenji commented Jun 7, 2018 via email

@jokerigno
Copy link

jokerigno commented Jun 7, 2018

Uhm I don't think so. I don't use official mi apps (mi home or flower care) because I love to see plants status directly from HA (in home or remotely).
So I think the issue have some other origin.

BTW with the new branch miflora doesn't work at all. I went back to the original branch.

@bbbenji
Copy link
Collaborator

bbbenji commented Jun 7, 2018 via email

@zewelor
Copy link
Owner

zewelor commented Jun 9, 2018

I've made error on that branch. Its updated ( commit rebased ), please check now. I dont have miflora myself so I cannot verify it. Exception with undeclared mac variable should be fixed.

@jokerigno
Copy link

Updated:

Now i received the first payload, but also an error:

Starting
Added: miscale with 43200 seconds interval
Added: miflora with 3600 seconds interval
Subscribing to: homeassistant/status with command: update_all
Updating all workers
[{'topic': 'miscale/weight/kg', 'payload': 63.7}]
Exception ignored in: <bound method BluetoothInterface.del of <btlewrap.base.BluetoothInterface object at 0x75abe4b0>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/btlewrap/base.py", line 17, in del
self._backend.disconnect()
File "/usr/local/lib/python3.5/dist-packages/btlewrap/bluepy.py", line 26, in _func_wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/btlewrap/bluepy.py", line 60, in disconnect
self._peripheral.disconnect()
AttributeError: 'NoneType' object has no attribute 'disconnect'
[{'topic': 'miflora/vita/temperature', 'payload': 22.0}, {'topic': 'miflora/vita/moisture', 'payload': 15}, {'topic': 'miflora/vita/light', 'payload': 84}, {'topic': 'miflora/vita/conductivity', 'payload': 276}, {'topic': 'miflora/vita/battery', 'payload': 99}]

@balsman225
Copy link

I updated the miflora.py Worker File. It seems to be working for now - I let it run a while using ./gateway.py -d without issue

I will reboot the Pi and let the service start up on its own and run. I'll keep and eye on it and see if if has the same issue as before. Thank you for your efforts

@bbbenji
Copy link
Collaborator

bbbenji commented Jun 9, 2018 via email

@zewelor
Copy link
Owner

zewelor commented Jun 9, 2018

@jokerigno Are you using some rpi ? Be sure to use latests bluez ( compile manually from latests code ). Bt /bluez for rpi 3 / rpi zero seems not very stable sometimes. That error doesn't provide much info, som not sure what might be wrong. If later update seems to get any info maybe its ok. I'm running that gw with eq3 thermostats and miscale for many weeks without any problems, sometimes bt update randomly fails, I think its some problems with bt stack on rpi or devices so I dont care mouch about it. From my experience bt is not most reliazle protocol to get data from devies, thats why I've wrote that gateway to keep it separated from hassio and to ensure only one bt device is updated / conntected at data refresh.

@bbbenji
Copy link
Collaborator

bbbenji commented Jun 10, 2018

@jokerigno

https://scribles.net/updating-bluez-on-raspberry-pi-5-43-to-5-48/

Replace every instance of 5.48 with 5.49

@jokerigno
Copy link

Yes I'm using a Pi3.

I've run apt-get update and the installed bluez 5.48 using your link (thank you).

pi@raspberrypi:~ $ bluetoothctl -v
bluetoothctl: 5.49

But the issue persists:

pi@raspberrypi:~/bt-mqtt-gateway $ sudo ./gateway.py -d
Starting
Added: miflora with 3600 seconds interval
Added: miscale with 43200 seconds interval
Subscribing to: homeassistant/status with command: update_all
Updating all workers
Exception ignored in: <bound method BluetoothInterface.del of <btlewrap.base.BluetoothInterface object at 0x75b09170>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/btlewrap/base.py", line 17, in del
self._backend.disconnect()
File "/usr/local/lib/python3.5/dist-packages/btlewrap/bluepy.py", line 26, in _func_wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/btlewrap/bluepy.py", line 60, in disconnect
self._peripheral.disconnect()
AttributeError: 'NoneType' object has no attribute 'disconnect'
[{'topic': 'miflora/vita/temperature', 'payload': 23.0}, {'topic': 'miflora/vita/moisture', 'payload': 14}, {'topic': 'miflora/vita/light', 'payload': 3785}, {'topic': 'miflora/vita/conductivity', 'payload': 256}, {'topic': 'miflora/vita/battery', 'payload': 99}]
[{'topic': 'miscale/weight/kg', 'payload': 63.7}]

@zewelor
Copy link
Owner

zewelor commented Jun 10, 2018

That error is inside btlewrap lib. So it looks like there is some bug. Maybe some newer / older version of that lib would solve this. Also I'm not sure if its anything to handle, from that output looks like you are raeding values correctly both from miscale and miflora ? If that exception is annoying, https://github.com/ChristianKuehnel/btlewrap looks like a place to debug issue in that lib.

@jeroenterheerdt
Copy link
Author

I am also on a Pi3 - just using a miflora. The data does not come through after approx. 18 hours and then I need to reboot the pi to make it work again. Not sure how I can help debug.

@zewelor
Copy link
Owner

zewelor commented Jun 12, 2018

@jeroenterheerdt Did you tried this branch: https://github.com/zewelor/bt-mqtt-gateway/tree/test_miflora_hanging_fix ?

@jeroenterheerdt
Copy link
Author

@zewelor oops, I missed that. Trying that one now - it seems to run fine (so no errors as above). I will keep running it for a while and see if the same lock-up happens. Thanks!

@jeroenterheerdt
Copy link
Author

@zewelor I am getting this output:

Starting
Added: miflora with 30 seconds interval
Subscribing to: homeassistant/status with command: update_all
Updating all workers
[{'topic': 'miflora/orchid/temperature', 'payload': 23.4}, {'topic': 'miflora/orchid/moisture', 'payload': 80}, {'topic': 'miflora/orchid/light', 'payload': 2449}, {'topic': 'miflora/orchid/conductivity', 'payload': 3488}, {'topic': 'miflora/orchid/battery', 'payload': 96}]
Exception ignored in: <bound method BluetoothInterface.__del__ of <btlewrap.base.BluetoothInterface object at 0x75973490>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/btlewrap/base.py", line 17, in __del__
    self._backend.disconnect()
  File "/usr/local/lib/python3.6/site-packages/btlewrap/bluepy.py", line 26, in _func_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/btlewrap/bluepy.py", line 60, in disconnect
    self._peripheral.disconnect()
AttributeError: 'NoneType' object has no attribute 'disconnect'
[]

So the first update works fine but then there is an exception after which the update never succeeds.

@zewelor
Copy link
Owner

zewelor commented Jun 12, 2018

That backtrace shows only error from btlewrap library. I dont have any experience with it, and I'm not sure what can cause it. Maybe try to raise that issue on that lib github ( https://github.com/ChristianKuehnel/btlewrap ) so we could get more insight about whats wrong or how to change code in miflora worker. Maybe that worker has some wrong usage but I don't know. Its hard to me to debug it without having miflora device. Thats the only worker I'm not using.

@ChristianKuehnel
Copy link

ChristianKuehnel commented Jun 12, 2018

So it looks like you're trying to disconnect from a sensor that you are not connected to. Thus self._peripheral is None and that raises an exception.

What I can do is check in the disconnect method if we are connected at all and only then call the disconnect.

I just pushed a patch for this. Please try with the latest version from master:
ChristianKuehnel/btlewrap@5287452

ChristianKuehnel added a commit to ChristianKuehnel/btlewrap that referenced this issue Jun 12, 2018
@zewelor
Copy link
Owner

zewelor commented Jun 13, 2018

Thanks for looking into it. From what I understand there is already check for being connected or not? ( https://github.com/ChristianKuehnel/btlewrap/blob/cd34f4096360d56d4b64ca5faad45c15bec1ac6c/btlewrap/base.py#L16 ). Looks like it gets disconnected twice ? I've done quick looking at miflora code and yours, and I think there is possible bug here (?):
https://github.com/ChristianKuehnel/btlewrap/blob/master/btlewrap/base.py#L51
and
https://github.com/ChristianKuehnel/btlewrap/blob/master/btlewrap/base.py#L15

miflora usage:
https://github.com/open-homeautomation/miflora/blob/master/miflora/miflora_poller.py#L68

If I understand it correctly there is double disconnect ? on exit and on del ? Still I'm not sure how it would forbid next values update as errors are catched. Maybe there is some weird state left in the bluetooth adapter / wrapper ? Some more cleaning needed after each update ? Worker is rather simple and recreates miflora poller on each update ( https://github.com/zewelor/bt-mqtt-gateway/blob/test_miflora_hanging_fix/workers/miflora.py#L30 ) , @ChristianKuehnel do you see anything wrong here ?

@ChristianKuehnel
Copy link

Well, I wanted to make sure that we do disconnect from the device whenever the program or part of the program ends. this is to make sure that we leave a clean system state in any case. Even if we crash.

In the normal case end should be called when exiting the context manager. If that fails for whatever reason the destructor del should be called anyway. Thus it might be called it multiple times, which is safe if the disconnect function can deal with multiple calls, which it should be able to now. As I check for None before calling disconnect().

@jeroenterheerdt
Copy link
Author

@ChristianKuehnel how would I use the new version of btlewrap?

@hobbypunk90
Copy link
Contributor

Hey,

i try to reproduce the error and a can't.
i set the update_interval to 20 sec and sync the sensor data to the app. I can reproduce the problem on both branches. 😞
it works as long as nothing else tries to connect to the sensor (perhaps after a "long time" we run into the same problem).
After the app connects and tracks the sensor data bt-mqtt-gateway seams to run into the 8 sec update_device_state timeout, i try to set a higher timeout, result is the same.
I get an RuntimeError without any message or additional information.

i tested it with btlewrap 0.0.2.

@hobbypunk90
Copy link
Contributor

i actually tested it with the updated bluepy wrapper from btlewrap. problem is the same, after a first timeout, it looks like we always get timeouts...

@hobbypunk90
Copy link
Contributor

ok, i have a solution for the problem i think, but i'm not sure about it.

@ChristianKuehnel @bbbenji @zewelor
i'm not deep enough in python and br-mqtt-gateway and btlewrap 😁 but it seams the problem is this:
when we get the btlewrap BluePy backend, first the method __init__ is called, with the with the __enter__, this enter acquires a lock. on a normal run, at the end the method __exit__ is called at the end of the with-block, am i right?
But, when we run into a timeout(i think it comes from bt-mqtt-gateway's used scheduler?), the __exit__ method is never called and the lock is never released.
When i add the destructor with the following lines in btlewrap's base.py the problem seams to be resolved:

    def __del__(self):
        if self._lock.locked():
            self._lock.release()

@zewelor
Copy link
Owner

zewelor commented Aug 9, 2018

@hobbypunk90 Thanks for investigation, that sounds like possible cause. Im not python expert either, just learning step by step. I was suspecting something like this, that why I've changed to recreate miflora poller on each update ( c4ddaae ). From my understainging it should workaround this by recreating backend, so no lock should be alive ? If adding del method solves problem, best would be to add issue on https://github.com/ChristianKuehnel/btlewrap . Also did you tried https://github.com/zewelor/bt-mqtt-gateway/compare/test_miflora_hanging_fix branch if it works ? Maybe its fixed there. I dont have miflora device to test it.

@hobbypunk90
Copy link
Contributor

@zewelor
No problem 😉
Yes I tried the branch with the fix and no it don't help. Recreating the backend don't help, the lock is always the same and I think it's correct 😉
If there would a new lock after recreating you can't be sure, that only one backend tries to use the bluetooth hardware.
Yes it should be an issue an btlewrap, but at this moment I mentioned it here, because I'm not sure if there are better solutions for this problem😁

@jeroenterheerdt
Copy link
Author

so... what do we do?

@zewelor
Copy link
Owner

zewelor commented Aug 9, 2018

@hobbypunk90 Well lock is only valid for workers that are using btlewrapper. Its not global so you still can't be sure that only 1 device access bluetooth hardware, that's general problem with current bluetooth support, that's why I've made that gateway to workaround that. Only thing I can see now is to remove that lock from the outside in each worker ? That doesn't seem good. I would first try issue in btlewrap to see @ChristianKuehnel opinion on this case.

@hobbypunk90
Copy link
Contributor

@zewelor you are absolutely right. I only talked about btlewrap 😉
i know, why i prefer an actor model instead of normal multithreading 😅
is it possible that multiple worker run at the same time?

@ChristianKuehnel
Copy link

I had a short look at the python documentation: https://docs.python.org/3/reference/datamodel.html#object.__exit__

This says that 'exit' is also called when an exception is raised.

@hobbypunk90
Copy link
Contributor

@ChristianKuehnel i know, but i tested it, it is not called if the scheduler timeout kills the function call

@ChristianKuehnel
Copy link

ChristianKuehnel commented Aug 9, 2018 via email

@hobbypunk90
Copy link
Contributor

hobbypunk90 commented Aug 9, 2018

@ChristianKuehnel sorry, i'm also not familiar with the scheduler or how it works.
I only find out, that there is the possibility that __exit__ is not called but __del__ is called, so that btlewrap ends in an unclean state.
every next call ends in an exception because the lock is not acquirable (and it waits until the timeout kills it)

@ChristianKuehnel
Copy link

It's usually a bad idea to just kill a function, as it might leave the system in some inconsistend state. So I would advise against doing this, regardless of the library used. And this example shows it.

If you do not want to wait for miflora to try to reconnect in case of problems, why not use the available parameters for that:

poller = MifloraPoller(interface, retries=0)

And I just added a test case, to make sure if an exception is thrown, that it's handles correctely in the with statement.

@zewelor
Copy link
Owner

zewelor commented Aug 9, 2018

@hobbypunk90 Every update from workers is queued into single queue and updated in single thread to solve concurent usage of bt hardware.

Timeout is handled by https://pypi.org/project/interruptingcow/ . Ive added to to be sure some hanging worker wont block entire gateway. I'm using it with thermostats worker and it works stable for past few months. Thermostat lib used some different bluetooth library inside. As for retries im not confident it will solve all edge cases for example with some weird behaving bluetooth firmware on miflora. From my experience IOT bluetooth devices has some weird bluetooth stack bugs etc, and won't count on its stability etc, thats why I wanted some hard timeout to keep gateway updates going.

@hobbypunk90
Copy link
Contributor

hobbypunk90 commented Aug 9, 2018

@zewelor i understand your problem and i would resolve this problem in the same way😉
@jeroenterheerdt it looks like we now hat to wait to a new release of btlewrap, than this problem should be fixed. 🙂 looks like the needed updates are nearly done 😁

@ChristianKuehnel
Copy link

ChristianKuehnel commented Aug 9, 2018

The change is merged in the master branch of miflora, so you can test it from there.

If this fixes the issue we then need to request a new release from miflora.

@hobbypunk90
Copy link
Contributor

@ChristianKuehnel I tested it, it seams to work 🙂 i can block the sensor with the app and after closing the app we can collect data again 👍😁

@hobbypunk90
Copy link
Contributor

i run the last days with the miflora lib from master and the bug don't occur anymore.
it seems to be solved

@zewelor
Copy link
Owner

zewelor commented Aug 20, 2018

Great, closing :D

@jeroenterheerdt
Copy link
Author

I am still having issues after I did a pull of bt-mqtt-gateway. What do I need to do to get the latest version working?
My current version runs for a while and after some hours the log is filled with:

Sep 17 12:15:36 hostname gateway.py[549]: []

Doing a systemctl restart helps, but not ideal.

@hobbypunk90
Copy link
Contributor

Did you update miflora to the version from git?
I have also this "problem" sometimes, but only for 1 or 2 messages

@jeroenterheerdt
Copy link
Author

@hobbypunk90 this is going to be a stupid question, but how would I do that? what do I need to run where to update miflora?

@jeroenterheerdt
Copy link
Author

Found it. I'll leave it here in case anyone is wondering. First remove any installed miflora library that might be installed using pip: pip uninstall miflora. Then, clone the miflora library from git: https://github.com/open-homeautomation/miflora. Then run setup.py for miflora (setup.py install). I feel stupid for not knowing this.

@zewelor
Copy link
Owner

zewelor commented Sep 22, 2018

I've added miflora from GH as miflora worker requirement, so it should install newest version from GH by default.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants