bug: nwaku node gets blocked because the postgres database is blocked #2783
Comments
In case it's unclear why the node might be stuck, note that it generally is possible to use GDB on the binary - i.e. attach to the running (stalled) process and use |
Thanks for the comment! I'll do that next time. For now, I started to stress the |
After further analysis with @NagyZoltanPeter 🙌, we ran the following command:
And saw that there were two processes with
Then, from within the database docker container, we ran
With that, we concluded that we need to strengthen the logic around partition creation so that multiple nodes can create partitions concurrently without blocking each other. Another option would be a separate app dedicated to database maintenance (partition creation, database migrations, etc.), but first we'll try to improve the current approach.
I'm thinking that your PR #2784 should stop the issue from happening!
It should. Note, though, that a failure to acquire the lock is not an error; it is just a sign that another node is attempting to create the necessary partition(s).
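To illustrate the point above about a failed lock not being an error, here is a minimal sketch of the try-lock pattern. It is not the code from the PR: `TryLock`, `ensure_partition`, and the partition name are hypothetical, and an in-memory lock stands in for a database-level one. In PostgreSQL, `pg_try_advisory_lock` is one primitive with these non-blocking semantics; whether nwaku uses it is an assumption here.

```python
import threading


class TryLock:
    """In-memory stand-in for a database try-lock (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Returns immediately: True if we got the lock, False if it is held.
        return self._lock.acquire(blocking=False)

    def release(self) -> None:
        self._lock.release()


def ensure_partition(lock: TryLock, partitions: set, name: str) -> str:
    """Create `name` unless it exists or another node is creating it.

    Returns "exists", "created", or "skipped". "skipped" is not an error.
    """
    if name in partitions:
        return "exists"
    if not lock.try_acquire():
        # Another node holds the lock, so it is presumably creating the
        # partition. Do not wait and do not raise.
        return "skipped"
    try:
        # Re-check under the lock: the partition may have appeared meanwhile.
        if name in partitions:
            return "exists"
        partitions.add(name)  # stands in for the real partition-creation statement
        return "created"
    finally:
        lock.release()
```

The key design choice is that the caller never waits on the lock, so a node that loses the race keeps serving requests instead of stalling behind the winner.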
This is still happening nowadays in |
Problem
While QA was working on shards.test, they realized that the Store queries didn't work.
After analyzing further, we found that the whole nwaku node (store-01.do-ams3.shards.test.status.im) was completely stalled, most likely because the Postgres database was blocked.
We (@cammellos , @richard-ramos , @Ivansete-status ) went to the Postgres server (store-db-01.do-ams3.shards.test.)
Curiously, even a simple query such as SELECT * from messages limit 1; got blocked too.
We managed to make the nwaku node progress again after killing the blocked SELECT queries.
Impact
The node gets completely blocked
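
When queries pile up behind each other as described under Problem, it can help to find the session at the head of the queue before killing anything. PostgreSQL's `pg_blocking_pids(pid)` reports which backends are blocking a given one. The sketch below assumes that output has already been collected into a dict; the function name and the PIDs are made up for illustration, and this is not the procedure that was followed in this issue.

```python
def root_blockers(blocked_by: dict) -> set:
    """Return PIDs that block others but are not blocked themselves.

    `blocked_by` maps a waiting PID to the list of PIDs blocking it
    (assumed to come from pg_blocking_pids; an empty list means not waiting).
    """
    # Sessions that are themselves waiting on someone else.
    waiting = {pid for pid, blockers in blocked_by.items() if blockers}
    # Sessions that at least one other session is waiting on.
    blocking = {b for blockers in blocked_by.values() for b in blockers}
    # Head of each wait chain: blocks others, waits on nobody.
    return blocking - waiting
```

For example, with hypothetical PIDs where 102 waits on 101 and 101 waits on 100, `root_blockers({100: [], 101: [100], 102: [101]})` returns `{100}`, which is the session worth inspecting first.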
To reproduce
We don't have a clear procedure to reproduce the issue, but it happened in shards.test running version v0.28.1, while the node was receiving new messages and regular store requests.
Screenshots/logs
Example of evidence about how the nwaku node gets blocked:
nwaku version/commit hash
v0.28.0-2-ga96a6b94
Additional context
Discord thread: https://discord.com/channels/1110799176264056863/1246045563833815080