This repository has been archived by the owner on Feb 18, 2021. It is now read-only.

Production ready? #14

Closed
somejavadev opened this issue Aug 7, 2018 · 5 comments

Comments

@somejavadev

Hi, thank you for this chart. Would you say it is suitable for use in a production environment?

@vromero
Owner

vromero commented Sep 11, 2018

I wouldn't use it in production, for a number of reasons:

  • Haven't decided whether to generate the config or use KUBEPING.
  • Artemis can't handle dynamic cluster sizes (a cluster of static size has to be formed on start); I have no idea what to do about this.
  • Haven't completed the integration with Prometheus; a messaging broker without metrics/alarms is more of a problem than a solution.
  • Not sure what to do about load balancing. Today the slave is not-ready, but not-ready pods mess up things like helm install --wait or the deployment of replicas>1 StatefulSets. No idea yet what to do about this.

@vromero vromero closed this as completed Sep 11, 2018
@DanSalt

DanSalt commented Sep 12, 2018

Hi @vromero, (and @somejavadev for info)

Agree with your assessment, but it's close! If it helps, we've been using a modified version of your charts in our environment, with the aim of taking them to production. We have a number of changes, and at some point I'll aim to fold them into a PR for you to take a look at.

A few comments on your points above:

  • We have dynamic clustering working in Artemis, with a couple of caveats. It currently uses static connectors (as per your latest changes), which means the set of nodes used for discovery is fixed. Artemis nodes do keep the whole cluster state in memory, but use the static connectors to determine the cluster topology. If you scale up the cluster, the new nodes have larger lists of static connectors (e.g. references to ALL nodes), but the existing nodes only know about the ones they were given at deploy time. Scaling down is more of an issue, because you're taking away nodes that might then be called upon to provide the cluster topology. But as long as each node has at least one available node in its list, discovery works 'good enough'. The restart time on pods is pretty small too, so depending on your use case and number of nodes, it doesn't cause too much of a problem.

  • Your Docker image was pretty Prometheus-ready, to be honest. All we had to do in the charts was enable the JMX_EXPORTER and create a ServiceMonitor for the Prometheus Operator to scrape it (see the ServiceMonitor sketch after this list). We have created a neat Grafana dashboard that shows all the instances and their important data. I much prefer this to using the ActiveMQ console, because hooking up all the individual console instances via Ingress is a pain - it would be better if the AMQ console could connect to all the other nodes (it can show them in the cluster diagram, so it's theoretically possible).

  • Load balancing was an interesting one, and we 'fixed' it by (a) telling Kubernetes not to wait for unready endpoints on slave nodes only and (b) changing the readiness probe to the 'core' endpoint, not the console (see the Service/probe sketch after this list). This way the slave nodes remain not-ready, which excludes them from the load balancer/DNS (which is what we want). As soon as a node drops and the slave becomes ready, it is included in DNS/load balancing. The only annoyance is that probing the core endpoint causes log entries for badly terminated connections. Whilst this isn't a perfect fix, it's good enough.

  • Finally, we do have a prototype version of the charts that uses JGroups for dynamic discovery (backed by a DB), but there are a number of worrying version mismatches between JGroups and KUBEPING that prevent us from going fully down this route. Once the versions align better, we may resume this path.
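
For the Prometheus point, here is a minimal ServiceMonitor sketch for the Prometheus Operator. The label selectors (app: activemq-artemis, release: prometheus) and the metrics port name are assumptions, not values taken from the chart; they need to match whatever Service exposes the JMX_EXPORTER output.

```yaml
# Hypothetical ServiceMonitor; the label selectors and the "metrics" port name
# are assumptions and must match the Service that exposes the JMX exporter.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: artemis-metrics
  labels:
    release: prometheus        # assumed: whatever selector your Prometheus instance watches
spec:
  selector:
    matchLabels:
      app: activemq-artemis    # assumed: label on the chart's metrics Service
  endpoints:
    - port: metrics            # assumed: named Service port serving the JMX exporter
      interval: 30s
```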
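And for the load-balancing point, a rough sketch of one way the two tweaks might look, assuming the broker's 'core' acceptor listens on 61616 and that a plain TCP check is an acceptable readiness signal; all names and labels are illustrative, not taken from the chart.

```yaml
# (a) Slave-side headless Service that still publishes DNS/endpoints for pods
#     that are not-ready (the successor to the tolerate-unready-endpoints annotation).
apiVersion: v1
kind: Service
metadata:
  name: artemis-slave            # hypothetical name
spec:
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: activemq-artemis
    role: slave                  # assumed label distinguishing slave pods
  ports:
    - name: core
      port: 61616
---
# (b) Readiness probe pointed at the 'core' acceptor instead of the web console
#     (fragment of the broker container spec in the StatefulSet):
readinessProbe:
  tcpSocket:
    port: 61616                  # assumed core acceptor port
  initialDelaySeconds: 30
  periodSeconds: 10
```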

Hope all this helps.

Cheers,
Dan

@vromero
Owner

vromero commented Sep 12, 2018

This sounds awesome, I'm looking forward to seeing the PR. Feel free to drop small PRs whenever you feel like it; no need to wait for a big thing.

@azman0101

Is there any update on this project's production-readiness status?

@vromero
Owner

vromero commented Dec 20, 2019

I'm afraid not. I keep playing with this in my head, and even with the fantastic insights from @DanSalt I still believe the clustering model of Artemis does not play well with K8s. Hence, I'd probably end up reducing the chart to a master-slave configuration (which is what Artemis does anyway in a >2 cluster; it just picks two nodes to be master and (a single) slave). I'd also get rid of the load balancer, as the Artemis model expects you to know the master and slave addresses, and it plays awfully badly with the K8s load balancers.
That is probably the only thing still missing. I'd probably also add some example Grafana dashboards and a great deal of documentation.
