Processing a received PUBCOMP after client reconnect crashes the session #762
dergraf changed the title from "A QoS2 publish (from broker to cluster)" to "Processing a received PUBCOMP after client reconnect crashes the session" on Jul 10, 2018.
Fixed in PR #763.
dergraf added a commit to dergraf/vernemq that referenced this issue on Jul 23, 2018:
- Store the *parsed* PUBREL inside the waiting acks instead of the serialized binary. This makes it possible to reuse the same functionality for handling retried PUBRELs, regardless of whether the retry is caused by a timeout or by a reconnect. Clean up places that relied on the serialized binary.
- Fixing the above made it possible to implement a small performance improvement: we no longer need to call `length()` on the outgoing batch of frames to increment the `mqtt_publishes_sent` counter.
- Improved `vmq_publish_SUITE.erl` to test that a client responds to PINGREQs before closing the socket. This validates, first, that the client hasn't crashed in the meantime (e.g. because it could not handle the PUBCOMP, see bug vernemq#762), and second, that the last frame received on this socket is the PINGRESP.
- The improvement above uncovered an edge case in which a PUBREL frame was retried too quickly, indicating a bug in the retry mechanism. This has been fixed by tagging the message IDs inside the retry queue, so the retry mechanism can now differentiate between retrying a PUBLISH and retrying a PUBREL. Without this fix, a retried, already acked PUBLISH would have retried an unacked PUBREL right away.
- Removed several function clauses that were forgotten when fixing vernemq#750.
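The two central changes in this commit (keeping the *parsed* PUBREL in the waiting acks, and tagging retry-queue entries with the kind of frame they were scheduled for) can be sketched as follows. This is a simplified Python model under stated assumptions, not VerneMQ's Erlang implementation: the class `OutgoingQos2`, the `Publish`/`Pubrel` records, the helper `kind_of`, and the `(kind, msg_id)` tag format are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Publish:
    msg_id: int
    topic: str
    payload: bytes


@dataclass(frozen=True)
class Pubrel:
    msg_id: int


Frame = Union[Publish, Pubrel]


def kind_of(frame: Frame) -> str:
    # Hypothetical tag names used in the retry queue
    return "publish" if isinstance(frame, Publish) else "pubrel"


class OutgoingQos2:
    """Hypothetical model of a session's outgoing QoS 2 bookkeeping."""

    def __init__(self) -> None:
        self.waiting_acks: Dict[int, Frame] = {}             # msg_id -> parsed frame
        self.retry_queue: Deque[Tuple[str, int]] = deque()   # (kind tag, msg_id)
        self.sent: List[Frame] = []                          # frames written to the client

    def send_publish(self, pub: Publish) -> None:
        self.waiting_acks[pub.msg_id] = pub
        self.retry_queue.append(("publish", pub.msg_id))
        self.sent.append(pub)

    def on_pubrec(self, msg_id: int) -> None:
        # The PUBLISH is acked: store the *parsed* PUBREL, not its serialized bytes
        rel = Pubrel(msg_id)
        self.waiting_acks[msg_id] = rel
        self.retry_queue.append(("pubrel", msg_id))
        self.sent.append(rel)

    def on_pubcomp(self, msg_id: int) -> None:
        # Same path whether the PUBREL was sent once, retried, or resent on reconnect
        self.waiting_acks.pop(msg_id, None)

    def fire_retry(self) -> None:
        if not self.retry_queue:
            return
        kind, msg_id = self.retry_queue.popleft()
        frame = self.waiting_acks.get(msg_id)
        # Retry only if the tag matches the message's current stage, so a stale
        # PUBLISH entry cannot resend an unacked PUBREL early
        if frame is not None and kind_of(frame) == kind:
            self.sent.append(frame)
            self.retry_queue.append((kind, msg_id))

    def on_reconnect(self) -> None:
        # Resend pending frames from the same parsed records the retry path uses
        self.sent.extend(self.waiting_acks.values())
```

In this model, the first `fire_retry()` after the PUBREC pops the stale `("publish", 1)` entry and sends nothing, while the second one retries the PUBREL. Without the tag check, the stale entry would find the PUBREL in `waiting_acks` and resend it immediately, which is the "retried too fast" behavior the commit describes.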
dergraf added a commit that referenced this issue on Jul 23, 2018:
…ry mechanism (#763)
* Fix QoS2 retry for reconnecting clients, and retry mechanism improvement (same bullet points as the Jul 23 commit to dergraf/vernemq above)
* rename deliver_bin to deliver_pubrel
dergraf added a commit to dergraf/vernemq that referenced this issue on Aug 21, 2018.
dergraf added a commit that referenced this issue on Aug 21, 2018.
Expected behavior
A client dies while PUBREL frames sent to it are still unacknowledged. Once the client reconnects, VerneMQ resends the PUBREL and expects a PUBCOMP. When VerneMQ processes the PUBCOMP, it removes the PUBREL from the "waiting acks".
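The expected flow can be illustrated at the wire level. In MQTT 3.1.1, PUBREL and PUBCOMP are both four-byte packets: a first byte (0x62 for PUBREL, whose reserved flags must be 0b0010; 0x70 for PUBCOMP), a remaining length of 2, and the 16-bit packet identifier. The sketch below is hypothetical: `encode_ack`, `decode_ack`, and `handle_after_reconnect` are illustrative names, and silently ignoring a PUBCOMP for an unknown packet id is an assumption, not documented VerneMQ behavior.

```python
import struct
from typing import Set, Tuple

# First byte of the MQTT 3.1.1 fixed header: (packet type << 4) | flags
PUBREL = 0x62   # type 6; the spec requires reserved flags 0b0010
PUBCOMP = 0x70  # type 7; flags 0


def encode_ack(first_byte: int, msg_id: int) -> bytes:
    """Encode a PUBREL/PUBCOMP: remaining length is always 2 (the packet id)."""
    return struct.pack("!BBH", first_byte, 2, msg_id)


def decode_ack(frame: bytes) -> Tuple[int, int]:
    """Return (first_byte, msg_id) for a 4-byte acknowledgement frame."""
    first_byte, remaining, msg_id = struct.unpack("!BBH", frame)
    if remaining != 2:
        raise ValueError("malformed acknowledgement frame")
    return first_byte, msg_id


def handle_after_reconnect(waiting_pubrels: Set[int], frame: bytes) -> None:
    """Expected behavior from the issue: a PUBCOMP for a resent PUBREL removes
    that PUBREL from the waiting acks and the session keeps running."""
    first_byte, msg_id = decode_ack(frame)
    if first_byte == PUBCOMP:
        # Assumption: an unknown packet id is ignored rather than treated as fatal
        waiting_pubrels.discard(msg_id)


# Usage: one PUBREL (packet id 7) was still unacknowledged when the client died
pending = {7}
resent = [encode_ack(PUBREL, m) for m in sorted(pending)]  # sent after reconnect
handle_after_reconnect(pending, encode_ack(PUBCOMP, 7))   # the client's PUBCOMP
```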
Actual behavior
A client dies while PUBREL frames sent to it are still unacknowledged. Once the client reconnects, VerneMQ resends the PUBREL and expects a PUBCOMP. When VerneMQ processes the PUBCOMP, the session crashes.
This case should be covered by https://github.com/erlio/vernemq/blob/master/apps/vmq_server/test/vmq_publish_SUITE.erl#L177; however, the test case doesn't check that the session is actually still alive after the PUBCOMP has been sent.
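A liveness check in the spirit of the one described for `vmq_publish_SUITE.erl` in the fix (send a PINGREQ after the PUBCOMP and require the last received frame to be a PINGRESP) might look like the sketch below. `FakeSession` and `session_survives_pubcomp` are hypothetical: the session is an in-memory stand-in whose crash is simulated with a flag, whereas the real suite is written in Erlang and talks to a running broker over a socket.

```python
from typing import List

PUBCOMP_7 = b"\x70\x02\x00\x07"  # PUBCOMP for packet id 7
PINGREQ = b"\xc0\x00"            # MQTT 3.1.1 PINGREQ (type 12)
PINGRESP = b"\xd0\x00"           # MQTT 3.1.1 PINGRESP (type 13)


class FakeSession:
    """Hypothetical stand-in for a broker session; not VerneMQ code."""

    def __init__(self, crash_on_pubcomp: bool) -> None:
        self.crash_on_pubcomp = crash_on_pubcomp
        self.alive = True
        self.waiting_pubrels = {7}

    def handle(self, frame: bytes) -> List[bytes]:
        """Process one client frame and return the frames sent back."""
        if not self.alive:
            return []  # a dead session never answers
        if frame[0] == 0x70:  # PUBCOMP
            if self.crash_on_pubcomp:
                self.alive = False  # simulates the crash reported in this issue
            else:
                self.waiting_pubrels.discard(int.from_bytes(frame[2:4], "big"))
            return []
        if frame == PINGREQ:
            return [PINGRESP]
        return []


def session_survives_pubcomp(session: FakeSession) -> bool:
    received: List[bytes] = []
    for frame in (PUBCOMP_7, PINGREQ):
        received.extend(session.handle(frame))
    # The session must still answer, and the last frame seen must be the PINGRESP
    return bool(received) and received[-1] == PINGRESP
```

Checking only that no error was raised would not catch the bug, because a crashed session simply stops responding. Requiring a PINGRESP as the final frame turns that silence into a test failure.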
The crash log: