messages blocking a queue and causing runaway message_sender #14862
Comments
@pohutukawa if you're up for it, I'd recommend upgrading to master with … Another option is to stop the Zulip server and then use …
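A sketch of that recovery path as shell commands, assuming a standard Zulip install layout (the script path, the `message_sender` queue name, and the use of `supervisorctl`/`rabbitmqctl` are assumptions based on a typical Zulip production deploy, not quoted from the thread):

```shell
# Stop all Zulip services so nothing is consuming from the queue
sudo supervisorctl stop all

# Drop the stuck messages (rabbitmqctl's purge_queue subcommand,
# available in RabbitMQ 3.x and later)
sudo rabbitmqctl purge_queue message_sender

# Alternatively/additionally, upgrade to the current development version
sudo /home/zulip/deployments/current/scripts/upgrade-zulip-from-git master

# Bring the services back up
sudo supervisorctl start all
```

Note that purging throws away everything in the queue, so any stuck-but-legitimate messages would be lost.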
Hello @zulip/server-production members, this issue was labeled with the "area: production" label, so you may want to check it out!
Thanks @timabbott, I'll try the upgrade to Git master. As for the Markdown DoS scenario: I doubt that any of our people would be doing that, at least not intentionally. Having said that, though, is there a way to selectively access the content of the problematic message to get to the bottom of the problem?
Upgrade from Git to master done. Let's see what happens now. Note: …
That seems wrong; is your server actually working? But in any case, the …
No, because the message is failing to complete sending, so it's not stored on the server at all.
The server is working, but the …
But the RabbitMQ broker is running: …
Yes, but I guess it must be persisted in the queue. Anyway, the server "feels" normal since the upgrade to the Git master version yesterday, even though inspection of the queues via the RabbitMQ tools doesn't work. Very odd ...
I would expect this to totally break Zulip, so that's very odd. Maybe you somehow have two instances of RabbitMQ with different node names, only one of which is down? Check …
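One way to check the two-instances hypothesis from the shell, using standard `rabbitmqctl` subcommands (the exact output format varies by RabbitMQ version, and these commands assume a live broker on the host):

```shell
# Status of the node rabbitmqctl connects to by default
sudo rabbitmqctl status

# Queues and their message counts on that node
sudo rabbitmqctl list_queues name messages

# More than one Erlang VM (beam) process can indicate a second
# RabbitMQ instance running under a different node name
ps aux | grep '[b]eam'
```

If `list_queues` shows an empty or missing `message_sender` queue while the Zulip workers are clearly backed up, the workers may be talking to a different node than `rabbitmqctl` is.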
Just referenced this ticket from elsewhere, and noticed I had not given final feedback. Now (I don't know why) the …
We've recently had the issue that our Zulip chat server (on a DigitalOcean instance with 2 vCPUs and 4 GB of RAM) has started to behave strangely and no longer delivers its services reliably. It is a fully patched Ubuntu 18.04 instance with an upgraded Zulip 2.1.4 install on the basis of the DigitalOcean droplet image (originally with Zulip 2.1.2).
In particular, one or more of the `message_sender` processors/workers go ballistic and start to guzzle nearly 100% CPU load each. When that happens, some messages are not delivered any more, and on restarts, messages are delivered wildly out of order. The situation does not get better with service restarts (even repeated ones) or entire server (host) restarts.

Things I have done over the last week include purging the `message_sender` queue on the RabbitMQ broker.

Note: On initial install, we migrated previous public Slack content to the Zulip server to have better continuity.

Here the runaway `message_sender` worker is visible (`worker_num=1`):

When this happens, apparently the `message_sender` queue runs fuller and fuller through use:

The queue doesn't get emptier, but retains an increasing number of messages that don't seem to get pulled off the queue. In the DigitalOcean load graph, it is visible that at 4:20 pm I had purged the `message_sender` queue, "normalising" things again. Then at about 8:45 am the problem recurred:

How can this problem be solved? We really like Zulip (and don't want to go back to Slack or move to something like MS Teams), but at the moment the system is not really usable due to unreliability. How can I fix things?
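One way to confirm which queue is accumulating messages rather than draining is to compare two snapshots of `rabbitmqctl list_queues name messages` output taken some minutes apart. A minimal sketch with made-up sample numbers (on a live server, the snapshots would come from the actual command):

```shell
# Two snapshots of `rabbitmqctl list_queues name messages` output,
# taken a few minutes apart (numbers here are illustrative only)
snapshot_1='message_sender 120
user_activity 3'
snapshot_2='message_sender 450
user_activity 1'

# Join the snapshots on queue name and keep queues whose depth grew
grew=$(join <(printf '%s\n' "$snapshot_1" | sort) \
            <(printf '%s\n' "$snapshot_2" | sort) \
       | awk '$3 > $2 { print $1 }')
echo "$grew"   # prints: message_sender
```

A queue whose depth only ever grows between snapshots, like `message_sender` here, is the one whose consumer has stopped pulling messages off.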