bug: wakunode2 systemd unit restarts about 10-15x per day #2173
Comments
@vpavlin as discussed this morning, I've upgraded the nodes to
Thanks for verifying! We'll start digging then :) Can you share more logs from before the error kicks in?
@vpavlin here are more logs. Everything seems completely healthy before the failure:
I am not 100% sure, but my hunch is a potential race condition in how we use Nim … because we could potentially be reading from, adding to and deleting items from the … @jm-clius any thoughts? Is it possible that multiple filter requests get handled in parallel? Should we use SharedTable instead?
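To make the "handled in parallel" question concrete, here is a minimal sketch in Python's asyncio rather than Nim. It assumes filter requests are served by a single-threaded async event loop (as with chronos in nwaku). In that model handlers do not run truly in parallel, but they can interleave at every await point, so one handler can delete an entry while another is iterating over the same table. SharedTable is aimed at sharing a table across threads and would likely not address that case on its own. The names `notify_all`, `unsubscribe` and the peer/topic data are hypothetical and not taken from nwaku.

```python
import asyncio


async def notify_all(subscribers: dict[str, list[str]], safe: bool) -> list[str]:
    """Push a message to every subscribed peer, yielding to the loop per peer (hypothetical)."""
    delivered = []
    # safe=True iterates over a snapshot of the keys, so deletions made by
    # other handlers while we are suspended cannot invalidate the loop.
    peers = list(subscribers.keys()) if safe else subscribers.keys()
    for peer in peers:
        if peer not in subscribers:  # removed by another handler meanwhile
            continue
        await asyncio.sleep(0)  # await point: other handlers may run here
        delivered.append(peer)
    return delivered


async def unsubscribe(subscribers: dict[str, list[str]], peer: str) -> None:
    """A concurrent request that removes a subscription (hypothetical)."""
    subscribers.pop(peer, None)


async def run(safe: bool) -> str:
    subscribers = {"peerA": ["/topic"], "peerB": ["/topic"], "peerC": ["/topic"]}
    try:
        delivered, _ = await asyncio.gather(
            notify_all(subscribers, safe),
            unsubscribe(subscribers, "peerB"),
        )
        return f"ok: {delivered}"
    except RuntimeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(asyncio.run(run(safe=False)))  # error: dictionary changed size during iteration
    print(asyncio.run(run(safe=True)))   # ok: ['peerA', 'peerC']
```

Snapshotting the keys (or holding a lock across the whole loop) is one way to avoid the interleaving problem; whether either applies to nwaku's filter code would need to be checked against the actual implementation.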
Regarding the systemd restarts, I consulted a friend from the systemd team and he suggested that maybe we do not set … He says it is possible that the socket then stays in some locked state in the kernel for a while, which results in that error when systemd tries to restart the unit. Given where the error happens, I'd say it is due to the DiscV5 port. @jm-clius any thoughts on who would be the right person to reach out to?
Morning all! The next error...
...happens due to an assert failure in:
IMHO, the most plausible cause of this issue is the following code:
nwaku/waku/node/peer_manager/peer_store/waku_peer_storage.nim, lines 55 to 59 in 2cb0989
Given that this problem is very difficult to reproduce on our side, I suggest building a tailored image with additional logging to pinpoint exactly where this happens.
Problem
We are seeing a lot of restarts of the systemd unit running wakunode2 with version 0.20.0.
To reproduce
Our systemd unit definition:
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots/logs
wakunode2 runs for a while, is connected to peers, and eventually starts failing with the following:
Additionally, the systemd unit tries to restart it, and quite often the restart fails with the following: