bug: restart loop of current master #2362

Closed
jakubgs opened this issue Jan 18, 2024 · 7 comments · Fixed by #2363
Labels
bug Something isn't working status-waku-integ All issues relating to the Status Waku integration.

Comments

@jakubgs
Contributor

jakubgs commented Jan 18, 2024

Problem

Current builds of master get stuck in a restart loop on status.prod nodes.

The current setup uses a PostgreSQL database; for flags see this.

Logs

Here's the loop in logs:

DBG 1/7 Setting up storage                     topics="wakunode main" tid=1 file=wakunode2.nim:63
DBG 2/7 Retrieve dynamic bootstrap nodes       topics="wakunode main" tid=1 file=wakunode2.nim:71
DBG Discovering nodes using Waku DNS discovery topics="wakunode app" tid=1 file=app.nim:192 url=enrtree://AL65EKLJAUXKKPG43HVTML5EFFWEZ7L4LOKTLZCLJASG4DSESQZEC@prod.status.nodes.status.im
DBG init WakuDnsDiscovery                      topics="wakunode app" tid=1 file=waku_dnsdisc.nim:93 locationUrl=enrtree://AL65EKLJAUXKKPG43HVTML5EFFWEZ7L4LOKTLZCLJASG4DSESQZEC@prod.status.nodes.status.im
DBG init success                               topics="wakunode app" tid=1 file=waku_dnsdisc.nim:99
INF Finding peers using Waku DNS discovery     topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:52
INF Successfully discovered ENR                topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:66 count=6
INF Successfully discovered nodes              topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:83 count=6
DBG 3/7 Initializing node                      topics="wakunode main" tid=1 file=wakunode2.nim:78
INF Initializing networking                    tid=1 file=waku_node.nim:143 addrs="@[/dns4/node-02.do-ams3.status.prod.statusim.net/tcp/30303, /dns4/node-02.do-ams3.status.prod.statusim.net/tcp/443/wss]"
DBG no relay sharding information, peer filtering disabled topics="waku discv5" tid=1 file=waku_discv5.nim:62
DBG 4/7 Mounting protocols                     topics="wakunode main" tid=1 file=wakunode2.nim:85
INF Created WakuMetadata protocol              topics="waku node" tid=1 file=protocol.nim:115 clusterId=0
DBG Setting max message size                   topics="wakunode app" tid=1 file=app.nim:448 num_bytes=153600
INF mounting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:377
INF relay mounted successfully                 topics="waku node" tid=1 file=waku_node.nim:396
DBG subscribe                                  topics="waku node" tid=1 file=waku_node.nim:269 pubsubTopic=/waku/2/default-waku/proto
DBG subscribe                                  topics="waku relay" tid=1 file=protocol.nim:222 pubsubTopic=/waku/2/default-waku/proto
INF mounting rendezvous discovery protocol     topics="waku node" tid=1 file=waku_node.nim:1079
INF mounting libp2p ping protocol              topics="waku node" tid=1 file=waku_node.nim:1037
ERR 4/7 Mounting protocols failed              topics="wakunode main" tid=1 file=wakunode2.nim:89 error="failed to setup archive driver: Postgres has been configured but not been compiled. Check compiler definitions."

Then it just restarts:

admin@node-02.do-ams3.status.prod:/docker/nim-waku % d
CONTAINER ID   NAMES      IMAGE                                                     CREATED         STATUS
364fa9231fb2   nim-waku   harbor.status.im/wakuorg/nwaku:deploy-status-prod-trace   8 minutes ago   Restarting (1) 54 seconds ago
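
When a container restarts this quickly, the failing step is easy to miss in the repeated startup output. The following is a minimal sketch, not part of nwaku, that pulls the first `ERR` line and its `error="..."` field out of log lines shaped like the ones above. The function name `first_error` and the regular expression are illustrative assumptions based only on the log format visible in this issue.

```python
import re

# Matches an error="..." field, allowing backslash-escaped characters inside.
ERROR_FIELD = re.compile(r'\berror="((?:[^"\\]|\\.)*)"')


def first_error(lines):
    """Return the error text of the first ERR-level line, or None if there is none.

    Assumes each line starts with a three-letter level (DBG, INF, ERR, ...),
    as in the log excerpt in this issue.
    """
    for line in lines:
        if line.startswith("ERR "):
            match = ERROR_FIELD.search(line)
            # Fall back to the whole line if it has no error="..." field.
            return match.group(1) if match else line.strip()
    return None


if __name__ == "__main__":
    log = [
        'INF mounting libp2p ping protocol topics="waku node" tid=1 file=waku_node.nim:1037',
        'ERR 4/7 Mounting protocols failed topics="wakunode main" tid=1 '
        'file=wakunode2.nim:89 error="failed to setup archive driver: Postgres has '
        'been configured but not been compiled. Check compiler definitions."',
    ]
    print(first_error(log))
```

Run against the excerpt above, this surfaces the "Postgres has been configured but not been compiled" message, which is the actual cause of the loop.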

nwaku version/commit hash

8a9fad29

@jakubgs jakubgs added the bug Something isn't working label Jan 18, 2024
@jakubgs
Contributor Author

jakubgs commented Jan 18, 2024

Interestingly, when I build v0.23.0 with -d:chronicles_enabled_topics:"waku\ node":TRACE, it also loops:

INF Finding peers using Waku DNS discovery     topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:51
INF Successfully discovered ENR                topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:65 count=6
INF Successfully discovered nodes              topics="waku dnsdisc" tid=1 file=waku_dnsdisc.nim:82 count=6
DBG no relay sharding information, peer filtering disabled topics="waku discv5" tid=1 file=waku_discv5.nim:62
INF Created WakuMetadata protocol              topics="waku node" tid=1 file=protocol.nim:115 clusterId=0
INF mounting relay protocol                    topics="waku node" tid=1 file=waku_node.nim:378
INF relay mounted successfully                 topics="waku node" tid=1 file=waku_node.nim:397
DBG subscribe                                  topics="waku node" tid=1 file=waku_node.nim:271 pubsubTopic=/waku/2/default-waku/proto
DBG subscribe                                  topics="waku relay" tid=1 file=protocol.nim:219 pubsubTopic=/waku/2/default-waku/proto
INF mounting rendezvous discovery protocol     topics="waku node" tid=1 file=waku_node.nim:1062
INF mounting libp2p ping protocol              topics="waku node" tid=1 file=waku_node.nim:1020

@jakubgs
Contributor Author

jakubgs commented Jan 18, 2024

Known to work fine on v0.23.1-rc.0/dcd93339.

@gabrielmer gabrielmer self-assigned this Jan 18, 2024
@gabrielmer
Contributor

We generally see that error when the image is not being compiled with the -d:postgres flag.

If I use the PR's image instead (quay.io/wakuorg/nwaku-pr:2355), the node seems to come up properly.

Where can we see the compilation flags used for harbor.status.im/wakuorg/nwaku:deploy-status-prod-trace?

@chair28980 chair28980 added the status-waku-integ All issues relating to the Status Waku integration. label Jan 19, 2024
@apentori
Contributor

The Jenkins parameter for NIMFLAGS is -d:disableMarchNative -d:chronicles_colors:none -d:insecure -d:chronicles_enabled_topics:"waku\ node":TRACE

Adding the flag -d:postgres to build https://ci.infra.status.im/job/nim-waku/job/deploy-status-prod-trace/6
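
A missing define like this could be caught before the image is deployed. The following is a minimal sketch of such a pre-build check, assuming the flags are available as a single shell-style string. The helper `has_define` is hypothetical and not part of the nwaku build or the Jenkins job; it only recognizes the `-d:` and `--define:` spellings.

```python
import shlex


def has_define(nimflags: str, symbol: str) -> bool:
    """Return True if the flag string defines `symbol` via -d: or --define:."""
    for token in shlex.split(nimflags):
        for prefix in ("-d:", "--define:"):
            if token.startswith(prefix):
                rest = token[len(prefix):]
                # A define may carry a value (name:value or name=value);
                # only the name part is compared.
                name = rest.replace("=", ":").split(":", 1)[0]
                if name == symbol:
                    return True
    return False


if __name__ == "__main__":
    # The NIMFLAGS value quoted in this issue, before the fix.
    flags = (
        r'-d:disableMarchNative -d:chronicles_colors:none -d:insecure '
        r'-d:chronicles_enabled_topics:"waku\ node":TRACE'
    )
    print(has_define(flags, "postgres"))                   # False
    print(has_define(flags + " -d:postgres", "postgres"))  # True
```

A check like this only inspects the flag string; it does not confirm that the resulting binary was actually built with PostgreSQL support.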

@gabrielmer
Contributor

Issue seems to be fixed after adding the flag. Closing :)

@apentori
Contributor

The image is starting fine.

There are some errors due to duplicate inserts with the same messageIndex (a consequence of the architecture, where 2 nodes save to the same DB):

TRC 2024-01-19 10:59:25.410+00:00 failed to insert message                   topics="waku archive" tid=1 file=archive.nim:119 err="error in runStmt: error in dbConnQueryPrepared calling waitQueryToFinish: error in query: ERROR:  duplicate key value violates unique constraint \"messageindex\"\nDETAIL:  Key (messagehash)=(48d9ba522308a1d3725b5e1d08783e91f3395e7578849b448f53b78ecd1c1916) already exists.\n

I will have to filter this log in Logstash.
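
The matching rule for such a filter can be prototyped outside Logstash first. The following is a sketch in Python rather than Logstash configuration; the function name `duplicate_key_constraint` and the regular expression are assumptions derived only from the log line quoted above.

```python
import re

# The constraint name appears as \"messageindex\" in the log, i.e. with
# backslash-escaped quotes, so the backslash is matched optionally.
DUPLICATE_KEY = re.compile(
    r'duplicate key value violates unique constraint \\?"([^"\\]+)'
)


def duplicate_key_constraint(line: str):
    """Return the violated constraint name, or None if the line is not a
    PostgreSQL duplicate-key error."""
    match = DUPLICATE_KEY.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = (
        r'TRC 2024-01-19 10:59:25.410+00:00 failed to insert message '
        r'topics="waku archive" tid=1 file=archive.nim:119 '
        r'err="error in query: ERROR:  duplicate key value violates unique '
        r'constraint \"messageindex\"\nDETAIL:  Key (messagehash)=(...) already exists.\n'
    )
    print(duplicate_key_constraint(line))  # messageindex
```

Lines for which the function returns a constraint name are the ones to drop or down-rank; other archive errors pass through untouched.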

One of the interesting lines seems to be the following one; however, the messageHash isn't readable.

TRC 2024-01-19 10:59:25.817+00:00 handling message                           topics="waku archive" tid=1 file=archive.nim:114 pubsubTopic=/waku/2/default-waku/proto contentTopic=/waku/1/0x73cfe5ea/rfc26 timestamp=1705661965681370155 digest="(data: [168, 205, 226, 148, 152, 192, 146, 7, 4, 184, 237, 224, 149, 155, 59, 230, 218, 201, 181, 21, 156, 176, 82, 6, 129, 35, 76, 133, 164, 81, 60, 133])" messageHash="[95, 170, 23, 190, 182, 78, 76, 195, 184, 41, 132, 11, 131, 70, 248, 93, 162, 191, 114, 2, 200, 4, 239, 1, 156, 153, 59, 36, 184, 210, 103, 196]"

@gabrielmer could you update the log to print the hash in a readable format, please?
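
Until the log format changes, the byte-array form can be converted by hand so it can be compared with the hex `messagehash` that PostgreSQL reports. This is a minimal sketch in Python, not the Nim change made in the linked PR; the function name `byte_array_to_hex` is hypothetical.

```python
import json


def byte_array_to_hex(field: str) -> str:
    """Convert a logged byte array such as "[95, 170, 23]" to a hex string."""
    values = json.loads(field)  # the logged form is a valid JSON array of ints
    return bytes(values).hex()  # bytes() raises ValueError outside 0..255


if __name__ == "__main__":
    # messageHash field copied from the "handling message" log line above.
    message_hash = (
        "[95, 170, 23, 190, 182, 78, 76, 195, 184, 41, 132, 11, 131, 70, 248, 93, "
        "162, 191, 114, 2, 200, 4, 239, 1, 156, 153, 59, 36, 184, 210, 103, 196]"
    )
    print(byte_array_to_hex(message_hash))
```

The result is a 64-character lowercase hex string, the same shape as the key shown in the duplicate-key error.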

@gabrielmer
Contributor

Created draft PR with the logging fix. I haven't been able to reproduce the log in my local setup though.

@apentori can you please try the image quay.io/wakuorg/nwaku-pr:2363 to confirm that the log format is OK before merging?


4 participants