Updating docker stack crashes cluster #1783

regnerisch · 2024-06-11T14:05:25Z

Description

We run a 3-node Docker Swarm cluster with Typesense on each node. It's configured as described in this guide: https://typesense.org/docs/guide/docker-swarm-high-availability.html

After updating the Typesense configuration (e.g. adding new labels to the services), I run docker stack deploy -c docker-stack.yaml my_stack. Looking at the logs inside the Typesense container, I get the following errors:

W20240611 13:49:50.319334   166 raft_server.cpp:721] Multi-node with no leader: refusing to reset peers.
I20240611 13:49:55.816380   205 node.cpp:1579] node default_group:10.0.2.163:8107:8108 term 2 start pre_vote
W20240611 13:49:55.816478   205 node.cpp:1589] node default_group:10.0.2.163:8107:8108 can't do pre_vote as it is not in 10.0.2.51:8107:8108,10.0.2.57:8107:8108,10.0.2.63:8107:8108
I20240611 13:50:00.321345   166 raft_server.cpp:693] Term: 2, pending_queue: 0, last_index: 3, committed: 0, known_applied: 0, applying: 0, pending_writes: 0, queued_writes: 0, local_sequence: 0

The nodes file looks like: typesense-1:8107:8108,typesense-2:8107:8108,typesense-3:8107:8108

It seems that Typesense resolves the hostnames and caches the resolved IP addresses. This leads to a cluster crash after updating the stack, as the containers may get different IPs. I specified 10.0.2.0/24 as the subnet and would expect Typesense to find the new IP addresses, especially because I specified hostnames and not static IPs.
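
The pre_vote warning in the log above can be read as a membership check: the node's own address (10.0.2.163) is missing from the peer list it currently holds. Below is a minimal sketch of that check, assuming the host:peering_port:api_port entry format seen in the nodes file. The names `Peer`, `parse_nodes`, and `is_member` are hypothetical and for illustration only; this is not Typesense's actual implementation.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    host: str
    peering_port: int
    api_port: int


def parse_nodes(nodes: str) -> list[Peer]:
    """Parse a comma-separated string of host:peering_port:api_port entries."""
    peers = []
    for entry in nodes.strip().split(","):
        # Split from the right so the last two fields are always the ports.
        host, peering_port, api_port = entry.strip().rsplit(":", 2)
        peers.append(Peer(host, int(peering_port), int(api_port)))
    return peers


def is_member(self_addr: str, peer_list: str) -> bool:
    """Return True if self_addr appears in peer_list."""
    return parse_nodes(self_addr)[0] in parse_nodes(peer_list)


# Values copied from the log lines in this issue.
resolved_peers = "10.0.2.51:8107:8108,10.0.2.57:8107:8108,10.0.2.63:8107:8108"
print(is_member("10.0.2.163:8107:8108", resolved_peers))  # False
```

With the addresses from the log, the check fails, which matches the "can't do pre_vote as it is not in ..." message.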

Steps to reproduce

1. Set up Docker Swarm (at least 2 nodes)
2. Set up Typesense as described at https://typesense.org/docs/guide/docker-swarm-high-availability.html
3. Change some service config inside docker-stack.yaml, e.g. add labels or a depends_on, to force docker stack deploy to create a new container
4. Run docker stack deploy -c docker-stack.yaml {stack_name}
5. Watch the logs with docker logs -f {container_id}

Expected Behavior

Cluster will stay alive.

Actual Behavior

Cluster crashes.

Metadata

Typesense Version: 26

OS: Fedora 39 (Docker version 26.1.4, build 5650f9b)

kishorenc · 2024-06-12T07:28:54Z

> It seems that typesense resolves the hostnames and caches the resolved ip addresses.

We don't cache the IP -- we resolve it periodically when we check for changes to the nodes file.

You have to do a rolling rotation of the nodes, with enough time for the cluster to recover between restarts, because a 3-node Raft cluster requires at least 2 nodes to be up to function properly.
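
The "at least 2 of 3" figure follows from the usual Raft majority rule. As a small worked example (the function names are hypothetical and only illustrate the arithmetic):

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be down while a majority is still up."""
    return cluster_size - quorum(cluster_size)


print(quorum(3), tolerated_failures(3))  # 2 1
```

So a 3-node cluster can lose one node at a time. Recreating two or three containers in the same deploy drops it below quorum.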

Otherwise, you can add the --reset-peers-on-error flag to make the cluster hard-reset its clustering state if multiple containers have been rotated in one go.

regnerisch · 2024-06-12T12:11:34Z

Thanks. If --reset-peers-on-error is enabled, is it possible to lose all the data, as only one container writes it? Or will it keep the data and "recover" it?

kishorenc · 2024-06-12T15:33:35Z

We don't recommend using that flag if you have a high volume of writes. The best way to do a rotation is container by container. Some people have mostly static clusters, where they don't mind using this flag for fast recovery.
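
A container-by-container rotation could be scripted roughly as follows. This is a sketch under several assumptions, not an official procedure:

- each Typesense node is its own Swarm service
- the script runs somewhere that can reach both the docker CLI on a manager node and the nodes' API port (8108)
- Typesense's /health endpoint is used as the recovery signal

The service names in the usage comment (my_stack_typesense-1, etc.) and the function names are hypothetical.

```python
import json
import subprocess
import time
import urllib.request
from typing import Callable


def restart_service(service: str) -> None:
    """Force Swarm to recreate the service's task (needs the docker CLI on a manager)."""
    subprocess.run(["docker", "service", "update", "--force", service], check=True)


def is_healthy(host: str, api_port: int = 8108, timeout: float = 2.0) -> bool:
    """Return True if the node's /health endpoint answers with {"ok": true}."""
    url = f"http://{host}:{api_port}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return bool(json.load(resp).get("ok"))
    except (OSError, ValueError, AttributeError):
        # Connection errors, non-2xx responses, or unexpected bodies count as unhealthy.
        return False


def rolling_rotate(
    nodes: dict[str, str],
    restart: Callable[[str], None] = restart_service,
    healthy: Callable[[str], bool] = is_healthy,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Restart one service at a time; wait until all nodes are healthy before the next."""
    for service, host in nodes.items():
        restart(service)
        for _ in range(max_polls):
            if all(healthy(h) for h in nodes.values()):
                break
            sleep(poll_seconds)
        else:
            raise TimeoutError(f"cluster did not recover after restarting {service}")


# Hypothetical usage, mapping Swarm service name -> hostname from the nodes file:
# rolling_rotate({
#     "my_stack_typesense-1": "typesense-1",
#     "my_stack_typesense-2": "typesense-2",
#     "my_stack_typesense-3": "typesense-3",
# })
```

The restart and health-check callables are injected so the loop can be exercised without a running cluster. Only one node is ever restarted between health checks, so quorum is kept.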

regnerisch · 2024-06-13T08:51:19Z

Thank you very much!

regnerisch changed the title ~~Updating docker stack crashes cluster config~~ Updating docker stack crashes cluster Jun 11, 2024

regnerisch closed this as completed Jun 13, 2024
