
Crash report when broker stops #1004

Closed
Jooyyce opened this issue Dec 14, 2018 · 13 comments · Fixed by #1027

@Jooyyce commented Dec 14, 2018

VerneMQ version:

1.2.3

Problem:

When we stop the VerneMQ broker (with the systemctl stop vernemq.service or vernemq stop commands), we see the following crash report in the console.log file:

YYYY-MM-DD HH:MM:SS.XXX [error] <0.482.0>@vmq_cluster:is_ready:69 CRASH REPORT Process <0.482.0> with 0 neighbours crashed with reason: bad argument in call to ets:lookup(vmq_status, ready) in vmq_cluster:is_ready/0 line 68

Each crash report corresponds to one failed Erlang process associated with one MQTT client (<0.482.0> is an Erlang PID). In every case, the broker continues to operate; only that client's process fails. Other clients are not affected by this problem.

The configuration file does not seem to be in question. We can reproduce the crash report with the default VerneMQ config file plus the following changes:

  • allow anonymous connections (allow_anonymous)
  • bind the TCP listener to 0.0.0.0:1883 (instead of 127.0.0.1:1883)
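
Concretely, against a default vernemq.conf this corresponds to roughly the following (a sketch; the option names allow_anonymous and listener.tcp.default are taken from the VerneMQ documentation and may differ between versions):

```
allow_anonymous = on
listener.tcp.default = 0.0.0.0:1883
```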

The MQTT client seems to trigger the error: its session process does not exit correctly when the broker stops or the connection is dropped abruptly.

@Jooyyce (Author) commented Dec 14, 2018

We can also reproduce the problem on VerneMQ 1.6.2 with a simple MQTT client (using Paho MQTT).
The AutoReconnect parameter seems to determine whether crash reports appear:

  • when it is set to false: no crash report
  • when it is set to true: crash report

We used Wireshark to understand the network exchanges and the differences between these two types of connections.
A TCP connection is initiated before the MQTT connection request. This connection is kept active until the VerneMQ broker shuts down (at that point the client receives a TCP reset from the broker).
With AutoReconnect set to true, the client then immediately tries to establish a new TCP connection and a new MQTT connection. The TCP connection is accepted by the broker, but the MQTT connection is not: no CONNACK is returned (which is expected, since the broker is going away). The broker, while shutting down, still receives the connection request but is unable to find an available cluster node, so the client's session process crashes.

@larshesel (Contributor)

Hi, thanks for the report. Looks like there's a case we're not handling while shutting down VerneMQ. Do you have the output from the crash.log file available as well?

@m-manns commented Dec 14, 2018

Hi,
I work with @Jooyyce and am following this issue.

You can find the output from the crash.log file below (VerneMQ 1.6.2):

2018-12-13 15:03:45 =CRASH REPORT====
  crasher:
    initial call: vmq_ranch:init/5
    pid: <0.464.0>
    registered_name: []
    exception error: bad argument: [{ets,lookup,[vmq_status,ready],[]},{vmq_cluster,is_ready,0,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_cluster.erl"},{line,69}]},{vmq_reg,register_subscriber,4,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_reg.erl"},{line,121}]},{vmq_mqtt_fsm,check_user,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_fsm.erl"},{line,556}]},{vmq_mqtt_fsm,init,3,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_fsm.erl"},{line,144}]},{vmq_mqtt_pre_init,data_in,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_pre_init.erl"},{line,67}]},{vmq_ranch,handle_message,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_ranch.erl"},{line,174}]},{vmq_ranch,loop_,1,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_ranch.erl"},{line,117}]}]
    ancestors: [<0.332.0>,<0.331.0>,ranch_sup,<0.108.0>]
    message_queue_len: 0
    messages: []
    links: [<0.332.0>,#Port<0.6733>]
    dictionary: [{rand_seed,{#{jump => #Fun<rand.16.15449617>,max => 288230376151711743,next => #Fun<rand.15.15449617>,type => exsplus},[285222742060469046|74841631306680564]}},{max_msg_size,0}]
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 1153
  neighbours:
2018-12-13 15:03:45 =ERROR REPORT====
Ranch listener {{0,0,0,0},1883} had connection process started with vmq_ranch:start_link/4 at <0.464.0> exit with reason: {badarg,[{ets,lookup,[vmq_status,ready],[]},{vmq_cluster,is_ready,0,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_cluster.erl"},{line,69}]},{vmq_reg,register_subscriber,4,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_reg.erl"},{line,121}]},{vmq_mqtt_fsm,check_user,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_fsm.erl"},{line,556}]},{vmq_mqtt_fsm,init,3,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_fsm.erl"},{line,144}]},{vmq_mqtt_pre_init,data_in,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_mqtt_pre_init.erl"},{line,67}]},{vmq_ranch,handle_message,2,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_ranch.erl"},{line,174}]},{vmq_ranch,loop_,1,[{file,"/opt/vernemq/distdir/1.6.2/_build/default/lib/vmq_server/src/vmq_ranch.erl"},{line,117}]}]}

Feel free to ask us for more information. Thanks!

@larshesel (Contributor)

If I understand it correctly, I think we'd just have to protect the ETS table lookup so this doesn't cause errors to be logged when shutting down the broker. But just for my further understanding: does this cause an operational problem for you?
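
Something along these lines, as a minimal sketch only (not the actual patch; the exact entry format in vmq_status and the return values of is_ready/0 are assumptions):

```erlang
%% Sketch only: guard the lookup so a missing vmq_status table (e.g. while
%% the broker is shutting down) reads as "not ready" instead of crashing the
%% calling session process with badarg.
is_ready() ->
    try ets:lookup(vmq_status, ready) of
        [{ready, true}] -> true;   %% assumed entry format
        _ -> false
    catch
        error:badarg -> false      %% table already deleted during shutdown
    end.
```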

@bestouff

(I'm also on the team.)
We count on VerneMQ to be the secure gateway between our cloud and our client boxes in the wild. Seeing it crash during operation, even if it's harmless in theory, is not reassuring.

@larshesel added this to the 1.6.3 milestone Dec 17, 2018
@larshesel added the bug label Dec 18, 2018
@dergraf (Contributor) commented Jan 7, 2019

@bestouff @m-manns is this in a clustered setup?

@bestouff commented Jan 7, 2019

No, for now there's a single broker.

@larshesel (Contributor)

@dergraf I think this makes sense: we don't shut down the listeners before tearing down the server processes.

I think the ideal solution would be to have the listeners (the client-facing ones; we probably don't want to include the internal ones?) be the last thing started when the broker starts and the first thing stopped when the broker shuts down.

I was wondering if we could split vmq_config into two and have the listener part be its own child of vmq_server_sup, so that it is the last to be started and the first to be stopped...
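
For illustration: OTP supervisors start their children in list order and terminate them in reverse order, so a dedicated listener supervisor placed last in the child list would give exactly that behaviour. The child names below are hypothetical and not the real vmq_server_sup tree:

```erlang
%% Hypothetical child layout, only to illustrate start/stop ordering:
%% children are started top-to-bottom and terminated bottom-to-top, so the
%% client-facing listener supervisor starts last and stops first.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Children =
        [#{id => config_sup,   start => {config_sup,   start_link, []}, type => supervisor},
         #{id => session_sup,  start => {session_sup,  start_link, []}, type => supervisor},
         #{id => listener_sup, start => {listener_sup, start_link, []}, type => supervisor}],
    {ok, {SupFlags, Children}}.
```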

@larshesel (Contributor)

We did a partial fix for this by making sure to stop the MQTT listeners before starting the shutdown of the rest of VerneMQ. It's still possible for an active connection to trigger the error if it subscribes or unsubscribes while VerneMQ is shutting down, but that edge case should be handled once we move the sessions themselves into the supervision hierarchy together with the listeners.
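
Roughly this kind of ordering, sketched here only to illustrate the idea (this is not the actual change in #1027; the use of prep_stop/1 is an assumption, and the listener ref is taken from the Ranch listener name in the crash log):

```erlang
%% Sketch: stop the client-facing Ranch listener(s) in the application's
%% prep_stop/1 callback, so no new MQTT connections reach the session code
%% while the rest of the supervision tree is torn down.
prep_stop(State) ->
    Listeners = [{{0,0,0,0}, 1883}],   %% listener ref taken from the crash log
    lists:foreach(fun(Ref) -> catch ranch:stop_listener(Ref) end, Listeners),
    State.
```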

@bestouff commented Jan 9, 2019

OK, thanks. In which release will we be able to see it?

@larshesel (Contributor)

1.6.3 or 1.7.0.

@sriram05

@larshesel I am also experiencing a similar issue. When two broker nodes are running successfully behind HAProxy and I stop one of them (vernemq stop), all the client connections are refused and no further clients can establish a connection.

@larshesel (Contributor)

@sriram05 please open a separate issue describing the exact details of your problem.
