Clustering fails when k8s cluster is not "cluster.local" #74

Closed
cbluth opened this issue Oct 23, 2018 · 14 comments

Comments

@cbluth

cbluth commented Oct 23, 2018

When the Kubernetes cluster domain is something other than cluster.local, VerneMQ clustering fails.

See here: https://github.com/erlio/docker-vernemq/blob/ec8ffe2bba98c7a8cc96e7e471cc3118f93e2c24/bin/vernemq.sh#L32

root@vernemq-0:~# vmq-admin cluster show
Node 'VerneMQ@vernemq-0..backend.svc.cluster.local' not responding to pings.
root@vernemq-0:~# 
@francois-travais
Contributor

I'm not sure that the cluster name is to blame here; there is no namespace in the nodename URL. This may be related to #70.

@cbluth
Author

cbluth commented Oct 24, 2018

What happens if the FQDN of the Kubernetes API ClusterIP service is kubernetes.default.svc.cluster2.local instead of kubernetes.default.svc.cluster.local?

@cbluth
Author

cbluth commented Oct 24, 2018

I'm using kubespray and changing the value here: https://github.com/kubernetes-incubator/kubespray/blob/master/inventory/sample/group_vars/k8s-cluster/k8s-cluster.yml#L109-L110

Changing that value causes VerneMQ to fail when joining the cluster.
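
One way to avoid hard-coding cluster.local is to read the cluster domain from the pod's DNS configuration. The sketch below assumes the pod uses the default DNS policy, where the search line in /etc/resolv.conf contains an entry of the form svc.<cluster-domain>. The helper name detect_cluster_domain is hypothetical and not part of docker-vernemq; DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME is the variable the start script already uses.

```shell
# Hypothetical helper (not part of docker-vernemq): print the cluster domain
# found in a resolv.conf-style file by locating a search entry of the form
# "svc.<cluster-domain>" and stripping the leading "svc.".
detect_cluster_domain() {
    local resolv_file="${1:-/etc/resolv.conf}"
    awk '$1 == "search" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^svc\./) { print substr($i, 5); exit }
         }' "$resolv_file"
}

# Prefer an explicit setting, then the detected value, then the common default.
detected="$(detect_cluster_domain 2>/dev/null || true)"
CLUSTER_NAME="${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME:-${detected:-cluster.local}}"
echo "cluster domain: $CLUSTER_NAME"
```

With a search line such as "search backend.svc.cluster2.local svc.cluster2.local cluster2.local", the helper would print cluster2.local. Setting the environment variable explicitly remains the more predictable option.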

francois-travais added a commit to francois-travais/docker-vernemq that referenced this issue Oct 25, 2018
Signed-off-by: François Travais <francois.travais@gmail.com>
dergraf pushed a commit that referenced this issue Oct 26, 2018
Signed-off-by: François Travais <francois.travais@gmail.com>
@zeisi

zeisi commented Dec 10, 2018

I'm having the same problem. It looks like the subdomain gets set even if it's empty.

VERNEMQ_KUBERNETES_SUBDOMAIN=$(curl -X GET $insecure --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt https://kubernetes.default.svc.$DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME/api/v1/namespaces/$DOCKER_VERNEMQ_KUBERNETES_NAMESPACE/pods?labelSelector=app=$DOCKER_VERNEMQ_KUBERNETES_APP_LABEL -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" | jq '.items[0].spec.subdomain' | sed 's/"//g' | tr '\n' '\0')
if [ $VERNEMQ_KUBERNETES_SUBDOMAIN == "null" ]; then
VERNEMQ_KUBERNETES_HOSTNAME=${MY_POD_NAME}.${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}
else
VERNEMQ_KUBERNETES_HOSTNAME=${MY_POD_NAME}.${VERNEMQ_KUBERNETES_SUBDOMAIN}.${DOCKER_VERNEMQ_KUBERNETES_NAMESPACE}.svc.${DOCKER_VERNEMQ_KUBERNETES_CLUSTER_NAME}
fi

In line 29, there is a call to curl and jq, which can't be found?
Excerpt from my startup log:

/usr/sbin/start_vernemq: line 28: jq: command not found
/usr/sbin/start_vernemq: line 28: curl: command not found
/usr/sbin/start_vernemq: line 29: [: ==: unary operator expected
/usr/sbin/start_vernemq: line 37: jq: command not found
/usr/sbin/start_vernemq: line 37: curl: command not found

This results in a vernemq node name with two consecutive dots (as the subdomain is empty)

cat /etc/vernemq/vm.args
...
-name VerneMQ@vernemq-2..labor.svc.cluster.local
...

Whereas my namespace is labor.
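
A minimal sketch of a more defensive host name construction follows. It quotes the variable (an unquoted empty value is what triggers the "unary operator expected" error in the log above) and treats an empty subdomain the same as the literal string "null". The function name build_node_host and the subdomain value "vernemq" are hypothetical and only for illustration; this is not the fix that was committed to the repository.

```shell
# Hypothetical helper (not part of docker-vernemq): build the node host name,
# omitting the subdomain label when it is empty or the literal string "null".
build_node_host() {
    local pod="$1" subdomain="$2" namespace="$3" cluster="$4"
    # Quoting "$subdomain" keeps the test valid even when the value is empty.
    if [ -z "$subdomain" ] || [ "$subdomain" = "null" ]; then
        printf '%s.%s.svc.%s\n' "$pod" "$namespace" "$cluster"
    else
        printf '%s.%s.%s.svc.%s\n' "$pod" "$subdomain" "$namespace" "$cluster"
    fi
}

# Empty subdomain (the failing case above): no double dot in the result.
build_node_host "vernemq-2" "" "labor" "cluster.local"
# prints: vernemq-2.labor.svc.cluster.local

# Hypothetical subdomain "vernemq".
build_node_host "vernemq-2" "vernemq" "labor" "cluster.local"
# prints: vernemq-2.vernemq.labor.svc.cluster.local
```

This only hides the symptom when curl and jq are missing; the lookup itself still needs both tools to return the real subdomain.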

If I understand correctly, the Dockerfile for 1.6.2 on GitHub does not list curl and jq among the apt-get install packages:

RUN \
apt-get update \
&& apt-get -y install openssl iproute2 \
&& rm -rf /var/lib/apt/lists/*

Interestingly enough, the Dockerfile on Dockerhub DOES contain the correct parameters for curl & jq:
https://hub.docker.com/r/erlio/docker-vernemq/~/dockerfile/

Is it possible that somehow the tags on Dockerhub got messed up and the thing we are pulling isn't the latest one?

@zeisi

zeisi commented Dec 10, 2018

I've just realized that the missing apt-get arguments for curl and jq are fixed in the master branch:

RUN \
apt-get update \
&& apt-get -y install openssl iproute2 curl jq \
&& rm -rf /var/lib/apt/lists/*

I will try to build the image myself and report back.
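
A quick way to tell whether a given image has the tools the start script needs is a preflight check like the sketch below. The function name require_commands is hypothetical; it relies only on the standard command -v builtin.

```shell
# Hypothetical preflight check (not part of docker-vernemq): report each
# required command that is missing from PATH and return non-zero if any is.
require_commands() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_commands curl jq; then
    echo "curl and jq are available"
else
    echo "curl and/or jq not found; Kubernetes discovery would fail" >&2
fi
```

Run inside the container, this shows up front what the start script would otherwise only reveal through "command not found" lines in the startup log.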

@zeisi

zeisi commented Dec 10, 2018

It looks like the image on Dockerhub is in fact NOT 1.6.2 (the one with the highest version number), but a fixed version with the correct apt-get parameters for curl and jq.
So using the image erlio/docker-vernemq:latest works, with a now-working vmq-admin. It was pushed 4 days ago (06.12.2018).

vernemq@vernemq-2:~$ vmq-admin
Usage: vmq-admin <sub-command>

  Administrate the cluster.

  Sub-commands:
    node        Manage this node
    cluster     Manage this node's cluster membership
    session     Retrieve session information
    plugin      Manage plugin system
    listener    Manage listener interfaces
    metrics     Retrieve System Metrics
    api-key     Manage API keys for the HTTP management interface
    trace       Trace various aspects of VerneMQ
  Use --help after a sub-command for more details.

Would be great if the latest would get a tag like 1.6.2-kubernetes-fix or similar :)
Then this issue could be closed?

@larshesel
Contributor

larshesel commented Dec 10, 2018

@dergraf can we rebuild 1.6.2 with latest master, or push under another tag as proposed?

@dergraf
Contributor

dergraf commented Dec 10, 2018

This is indeed strange.
The way it is supposed to work at the moment is the following:
erlio/docker-vernemq:latest should contain a release that's built from the VerneMQ sources at the 1.6.2 release tag. What is not obvious is that latest doesn't point to VerneMQ master; it always points to the VerneMQ release tag. So in that sense latest means the latest version of the docker-vernemq repo and not the latest version of VerneMQ. We're currently moving away from Dockerhub automatic builds to a Travis-based build pipeline, which would allow us to address this request.

@zeisi what does vernemq version give you in those containers?

@dergraf
Contributor

dergraf commented Dec 10, 2018

@larshesel do you think it would be safe to always fetch master (unless a proper version is provided with the build-args)?

@francois-travais
Contributor

francois-travais commented Dec 10, 2018

The annoying thing for me right now is that if I don't want to use the tag latest for a production setup, I cannot rely on the version tags you're pushing. Right now the tag 1.6.2 is just broken, and the fix is only in latest. One expects to be able to rely on the tagged version.

Moreover the tag 1.6.2 introduced a completely new directory layout which prevents me from upgrading my resources in k8s with persistent volumes. If I want to upgrade from 1.6.1 to 1.6.2 I have to drain and shut down the entire cluster before starting a new one. It's not something one expects when bumping a patch version.

@zeisi

zeisi commented Dec 10, 2018

@dergraf hm, running vernemq version with the latest tag just returns an empty newline.
It looks like there's no proper version set?

$ cat /vernemq/lib/env.sh | grep APP_VERSION
APP_VERSION=

APP_VERSION is actually empty here (not sure if it is the only place where it should be set).

$ ls -l /vernemq/lib/
....
drwxr-xr-x. 4 vernemq vernemq    33 Dec  6 11:55 vernemq_dev-0.0.0+build.17.ref6e41cce
....

So it looks like the latest tag on Dockerhub points to a dev version :(
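
The APP_VERSION check above can be scripted as in the sketch below. The helper name read_app_version is hypothetical; the path /vernemq/lib/env.sh is the one from the output above, and the sketch assumes the file contains a plain APP_VERSION=... assignment at the start of a line.

```shell
# Hypothetical helper (not part of VerneMQ): print the value assigned to
# APP_VERSION in the given env file, or nothing if it is empty or absent.
read_app_version() {
    sed -n 's/^APP_VERSION=//p' "$1" | head -n 1
}

env_file="/vernemq/lib/env.sh"   # path taken from the output above
if [ -r "$env_file" ]; then
    version="$(read_app_version "$env_file")"
    if [ -z "$version" ]; then
        echo "APP_VERSION is empty in $env_file" >&2
    else
        echo "APP_VERSION=$version"
    fi
fi
```

An empty value only shows that the build did not record a version string; on its own it does not say which sources the image was built from.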

@dergraf
Contributor

dergraf commented Dec 10, 2018

@francois-travais I completely understand your points (both).

The way we currently build Docker images might not be the best way to do it. We try to match the Docker tags with the exact VerneMQ version. I see the pros and cons of this approach, but I am happy to discuss this, especially as we're moving away from erlio/docker-vernemq to a new vernemq/vernemq repo where we push images built via Travis instead of the Docker automatic builds.

As mentioned, I completely understand your point, but we actually tried to reach out to the community precisely because of the mentioned directory changes (see #80); we even requested feedback over Slack on this matter. As we got agreement from several of our rather heavy Docker users, we adapted the images that way. I am sorry that you missed this; it would have been great to know your perspective.

@dergraf
Contributor

dergraf commented Dec 10, 2018

@zeisi the vernemq_dev is a library that's used inside VerneMQ and has nothing to do with the build target.

@dergraf
Contributor

dergraf commented Dec 11, 2018
