Connection brokes after publishing in batch #41

Closed
cat4anna opened this issue Jul 24, 2022 · 11 comments
Comments

@cat4anna

Hi again ;)
Thanks for the quick fix for #40; it makes working with Docker much easier.

I bumped into another problem while using the copas connector, but I'm not sure if it's strictly related to it.
The connection to the broker gets closed after publishing ~150 messages in a batch. Adding "copas.sleep(0)" after each publish solved the issue completely. Doing that is good enough for me, but I wanted to let you know about the problem.
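
For readers following along, the workaround described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual script: `publish_batch`, the topic name, and the stub client are hypothetical, and the only assumption about the real client is that it exposes a luamqtt-style `client:publish{ ... }` method. The stub stands in for a real client so the sketch runs without a broker.

```lua
-- Publish `count` QoS 0 messages, yielding to the scheduler after each one.
-- `client`: any object with a luamqtt-style publish method (assumption).
-- `yield`:  a function that gives control back to the scheduler,
--           e.g. function() copas.sleep(0) end when running under Copas.
local function publish_batch(client, count, yield)
  for i = 1, count do
    client:publish{
      topic = "test/batch",   -- hypothetical topic name
      payload = tostring(i),
      qos = 0,
      retain = false,
    }
    yield()                   -- the workaround: yield after every publish
  end
end

-- Stand-in client (hypothetical) that only records what was published.
local sent = {}
local stub_client = {}
function stub_client:publish(opts)
  sent[#sent + 1] = opts.payload
  return true
end

-- Count how often the batch loop yields.
local yields = 0
publish_batch(stub_client, 150, function() yields = yields + 1 end)
print(#sent, yields)
```

In a real Copas setup the call would run inside a Copas thread, with `function() copas.sleep(0) end` passed as `yield`.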

@xHasKx
Owner

xHasKx commented Jul 24, 2022

Hi @pgrabas ,
What are your Lua and Copas versions? Which broker do you use? What QoS do you use for publishing?
Maybe you can check the mqtt protocol traffic with tcpdump/wireshark?

@cat4anna
Author

I'm using luajit 5.1, copas 3.0.0-3, luasocket 3.0.0-1. Mqtt protocol 3.3.
All messages are qos=0 and retain=false.
The broker is mosquitto 2.0.13-1 running on OpenWrt 21.02.1.
The issue does not seem to reproduce using interpreted Lua 5.1.

pcap: mqtt.zip
script: mqtt_batch.zip

@Tieske
Contributor

Tieske commented Jul 25, 2022

I cannot reproduce this. With or without the sleep, or with a bigger loop (up to 1000).

That said, I have seen this problem before, also with Copas 3, LuaSocket 3-beta, and my own branch of luamqtt.

@xHasKx
Owner

xHasKx commented Jul 25, 2022

@pgrabas,
I parsed the TCP stream of your pcap file with a Lua script and it's encoded well, so there is no MQTT protocol violation in the TCP traffic. Wireshark shows the same.

But when I open your pcap file in the Wireshark GUI, it shows the last publish frame (no. 68) in grey because it has the TCP FIN flag set. It looks like a disconnection initiated by the client side, not by the server.

When luamqtt works with a TCP socket through the luasocket module, it does not mix sending with disconnecting (I'm not sure there is any way at all to send data with the FIN flag set using luasocket methods). So the other layers involved, copas and luasocket, should be checked for the issue.

Maybe you can try to reproduce such a bug in your environment with luasocket-only sync mode like in the example here - https://github.com/xHasKx/luamqtt/blob/master/examples/sync.lua ?

I also suggest you check the detailed logs of your broker, maybe some limits are hit on its side, just in case...

@Tieske
Contributor

Tieske commented Jul 25, 2022

I noticed the flag as well, and indeed, LuaSocket doesn't allow setting such flags afaik. Copas 3 is definitely a suspect, since it had a lot of changes.

I retried reproducing it, including with my own code, but failed again. I'm currently on a VPN to the broker, though, so maybe it is related to latency, since in your case adding a sleep made it go away as well. I'll try to reproduce it when I'm on the local network with my broker again.

@Tieske
Contributor

Tieske commented Jul 28, 2022

@pgrabas with Copas 3.0.0, can you try again, but with this line disabled:

https://github.com/lunarmodules/copas/blob/3.0.0/src/copas.lua#L909

To disable the "autoclose" feature.

@Tieske
Contributor

Tieske commented Jul 29, 2022

A fix to Copas was merged: lunarmodules/copas#125 so you can try the master branch.
This definitely fixed my issue (I checked with wireshark and had the exact same issue in my capture).

Hopefully I'll be releasing a new Copas version later today.

@Tieske
Contributor

Tieske commented Jul 31, 2022

FYI; Copas 4 has been released.

@cat4anna
Author

@Tieske I'm amazed that you were able to find and fix it. With copas 4.0.0 the problem no longer reproduces for me.

@xHasKx
Owner

xHasKx commented Jul 31, 2022

Thanks, @Tieske , thanks @pgrabas

@xHasKx xHasKx closed this as completed Jul 31, 2022
@Tieske
Contributor

Tieske commented Aug 1, 2022

Lol, the fix was simple. It’s just that it took 2 days and 400 lines of debug code to find it.
