This repository has been archived by the owner on Feb 18, 2021. It is now read-only.

After failover, master and slave both alive #22

Open
adnklauser opened this issue Nov 19, 2018 · 5 comments

Comments

@adnklauser

Scenario
Restart one master node (kubectl delete pod xxx) to simulate a service interruption.

Expected behaviour
The slave becomes active immediately, and when the master is back up (restarted by k8s) and synchronized, there is still only one active ActiveMQ Artemis instance for that master/slave pair.

Actual behaviour
The slave becomes active immediately (✔️), but after k8s restarts the master pod, it, too, is considered active (❌), at least from the perspective of k8s (1/1 pods). The consequence is that k8s would route requests to both master and slave (via the service DNS).

Additional information
I haven't really tested much beyond this observation. I don't know if the master node would have actually responded to requests. But I find it a bit weird that the system doesn't return to the original state after a failover.

The Artemis HA documentation suggests using <allow-failback>true</allow-failback> on the slave and <check-for-live-server>true</check-for-live-server> on the master. I must confess I don't understand why the chart explicitly configures the opposite, but my experience with Artemis is very limited so far.
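For reference, this is roughly the shape of the broker.xml fragments the HA documentation describes (a sketch only, assuming the replication HA policy; the element names follow the Artemis docs, not necessarily this chart's configmaps):

  <!-- master broker.xml (inside <core>) -->
  <ha-policy>
    <replication>
      <master>
        <check-for-live-server>true</check-for-live-server>
      </master>
    </replication>
  </ha-policy>

  <!-- slave broker.xml (inside <core>) -->
  <ha-policy>
    <replication>
      <slave>
        <allow-failback>true</allow-failback>
      </slave>
    </replication>
  </ha-policy>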

@DanSalt

DanSalt commented Jun 5, 2019

Hi @adnklauser

Yes - you're right. I've just checked the latest version of the charts we have locally, and we do indeed have the flags you mention set. The other main difference between our charts and the ones here is in the shared configmap, where it sets:
<address>jms</address>
which means the cluster connection only matches addresses starting with 'jms' - and for us that didn't include everything. The default in these charts should probably be blank as the generic case (to match everything).
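For illustration, the cluster-connection section would then look something like this (a sketch; the connector and discovery-group names are placeholders, not necessarily what this chart generates):

  <cluster-connections>
    <cluster-connection name="my-cluster">
      <!-- an empty address matches all addresses instead of only 'jms...' -->
      <address></address>
      <connector-ref>netty-connector</connector-ref>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <discovery-group-ref discovery-group-name="my-discovery-group"/>
    </cluster-connection>
  </cluster-connections>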

I'll work on a PR for these charts. Hope this helps!

@andrusstrockiy

We faced the same issue when we tried to run the above chart without any persistence for the live (master) node.

Even with <allow-failback>true</allow-failback> on the slave and <check-for-live-server>true</check-for-live-server> on the master:
After a restart, the new master pod starts from scratch and forms a new cluster (apparently, without persistence the data dir configuration in broker.xml is ignored completely).

Hence, you get a split brain with two running masters:

  • the master from the old cluster formation (slave0 took over its role)
  • a newly formed cluster from the restarted master0
    Conclusion:
  1. Don't run this chart without persistent storage in your cluster, even with the above options (see the broker.xml sketch at the end of this comment).
  2. This is an Artemis problem; checked with 2.10.
    To reproduce on your local setup, just form a cluster, then remove the data dir for the live master0 while its broker.xml is still available, and start a new live server.

Workaround:
In case of split brain, recreate the slave once again and keep an eye on your cluster formation.
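For context, these are the broker.xml data-directory settings that only survive a pod restart if they live on a persistent volume (a sketch; the paths are placeholders, the chart's actual mount path may differ):

  <persistence-enabled>true</persistence-enabled>
  <journal-directory>/var/lib/artemis/data/journal</journal-directory>
  <bindings-directory>/var/lib/artemis/data/bindings</bindings-directory>
  <large-messages-directory>/var/lib/artemis/data/large-messages</large-messages-directory>
  <paging-directory>/var/lib/artemis/data/paging</paging-directory>

Without a PersistentVolumeClaim mounted at those paths, the restarted master comes up with an empty journal and no record of the previous cluster formation, which is what produces the split brain described above.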

@chandras-xl

Are there any updates on this issue? I want to use this Helm chart in k8s production, but the aforementioned issue still exists; as a workaround I delete the slave pod when the master restarts. I also tried adding <check-for-live-server>true</check-for-live-server> and <allow-failback>true</allow-failback> to the master and slave configmap files respectively, but it still doesn't work. Can we expect an upgraded Helm chart with proper failover and failback?

@andrusstrockiy

@chandras-xl The issue is not with the chart but with the Artemis cluster configuration itself.
So if you don't have any kind of persistent storage inside your k8s cluster, move your Artemis cluster formation to virtual machines, e.g. as Docker images (docker-compose) or running as a daemon.

@chandras-xl

@andrusstrockiy Thank you! Failover and failback worked after enabling persistent storage on my k8s cluster.
