sftd: add support for multiple SFT servers (#1325)
* sftd: add support for multiple SFT servers

* The ingress assigns an SFT allocation request to a random SFT
* Each sftd pod is made aware of a URL on which it is directly
reachable, and returns that URL in the response to the client, e.g.
pod `sftd-0` will be assigned `https://sft.example.com/sfts/sftd-0`
* The client tells this URL to other clients willing to join the call
* Other clients make a request to this URL
* The ingress routes requests under `/sfts` to the `join-call` deployment,
which proxies them to the specific pod, so that the client can join
the other client's conference call
arianvp committed Feb 17, 2021
1 parent a144ede commit 3f819a6
Showing 10 changed files with 306 additions and 23 deletions.
179 changes: 164 additions & 15 deletions charts/sftd/README.md
@@ -1,27 +1,84 @@
# SFTD Chart

In theory the `sftd` chart can be installed on its own, but it's usually
installed as part of the `wire-server` umbrella chart.

## Parameters

### Required
| Parameter | Description |
|-----------------|---------------------------------------------------------------------------------------------|
| `host` | The domain name on which the SFT will be reachable. Should point to your ingress controller |
| `allowOrigin` | Allows CORS requests on this domain. Set this to the domain of your wire webapp. |


### Bring your own certificate
| Parameter | Description |
|-----------------|---------------------------------------------------------------------------------------------|
| `tls.key` | Private key of the TLS certificate for `host` |
| `tls.crt` | TLS certificate for `host` |

### Cert-manager certificate

| Parameter | Description |
|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| `tls.issuerRef` | Describes which [Issuer](https://cert-manager.io/docs/reference/api-docs/#meta.cert-manager.io/v1.ObjectReference) to use to request a certificate |


### Other (optional) parameters

| Parameter | Default | Description |
|---------------------------------|---------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `terminationGracePeriodSeconds` | `10` | How long to wait after signalling an SFT pod to terminate before forcefully shutting it down. Useful to let ongoing calls drain; the pod will not accept new calls whilst terminating |
| `replicaCount` | `1` | Number of SFT servers to run. Only one SFT server can run per node, so `replicaCount <= nodeCount` |
| `nodeSelector`, `affinity` | `{}` | Used to constrain SFT servers to run only on specific nodes |

Please see [values.yaml](./values.yaml) for an overview of other parameters that can be configured.
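
For instance, a minimal sketch of a values override using these optional parameters (the numbers and the node label are purely illustrative, not recommendations):

```yaml
terminationGracePeriodSeconds: 3600
replicaCount: 3
nodeSelector:
  wire.com/role: sftd
```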

## Deploy

Replace `example.com` with your own domain in the examples below.

#### As part of the `wire-server` umbrella chart

`sftd` is deployed as part of the `wire-server` umbrella chart. You can
edit the `values.yaml` of your `wire-server` chart to configure `sftd`.

```yaml
sftd:
  host: sftd.example.com
  allowOrigin: https://webapp.example.com
  tls:
    # The https://cert-manager.io issuer to use to retrieve a certificate
    issuerRef:
      kind: ClusterIssuer
      name: letsencrypt-prod
```

Using your own certificates:

```
helm install sftd wire/sftd \
  --set host=sftd.example.com \
  --set allowOrigin=https://webapp.example.com \
  --set-file tls.crt=/path/to/tls.crt \
  --set-file tls.key=/path/to/tls.key
```

#### Standalone

You can also install `sftd` as a stand-alone chart. This is useful if you want to be
more careful with releases and want to decouple the release lifecycle of `sftd`
and `wire-server`. For example, if you set `terminationGracePeriodSeconds` to a
large value (say, a few hours) to allow calls to drain, deployments of the
`wire-server` umbrella chart, which are usually snappy, would become very slow.

In the `wire-server` chart's `values.yaml` you should set:

```yaml
tags:
  sftd: false
```

to make sure that the umbrella chart does not deploy `sftd` too.

Using cert-manager:

```
helm install sftd wire/sftd \
  --set host=sftd.example.com \
  --set allowOrigin=https://webapp.example.com \
  --set tls.issuerRef.name=letsencrypt-staging
```


The `host` option will be used to set up an `Ingress` object.

The domain in `host` must point to the public IP you have deployed to handle
@@ -31,12 +88,78 @@ You can switch between `cert-manager` and own-provided certificates at any
time. Helm will delete the `sftd` secret automatically and then cert-manager
will create it instead.

`allowOrigin` MUST be in sync with the domain where the web app is hosted,
as configured in the `wire-server` chart, or the webapp will not be able to
contact the SFT server.

You MUST configure `brig` to hand out the SFT server to clients, in order for
clients to be able to use the new conference calling features:

```yaml
brig:
  # ...
  optSettings:
    # ...
    setSftStaticUrl: https://sftd.example.com:443
```
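
Equivalently, when driving the `wire-server` chart from the command line, a sketch (the release name `wire-server` and the `wire/wire-server` chart reference are assumptions; adapt to your install):

```
helm upgrade wire-server wire/wire-server \
  --set brig.optSettings.setSftStaticUrl=https://sftd.example.com:443 \
  --reuse-values
```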

## Routability

We currently require network connectivity between clients and the SFT server,
and between the SFT server and the restund servers. In other words, the SFT
server needs to be directly reachable by clients on its public IP and must be
able to reach the restund servers on their public IPs.

More exotic setups _are_ possible but are currently *not* officially supported. Please
contact us if you have different constraints.

## Rollout

Kubernetes shuts down pods and starts new ones when rolling out a release. Any
calls that were in progress on a pod being shut down will be terminated, causing those calls to drop.

Kubernetes can be configured to wait for a certain number of seconds before
stopping the pod. During this timeframe new calls will not be initiated on the
pod, but existing calls will also not be disrupted. If you want to roll out a
release with minimal impact you can set the
[`terminationGracePeriodSeconds`](./values.yaml#L18) option to the maximum
time you want to wait before cutting off calls.

For example, to cordon SFTs for one hour before dropping calls:
```
helm upgrade sftd wire/sftd --set terminationGracePeriodSeconds=3600
```

Because we currently use a `StatefulSet` to orchestrate update rollouts, and a
`StatefulSet` replaces pods one by one (a *rolling update*) rather than all at
once, a rollout of a release takes up to `oldReplicas * terminationGracePeriodSeconds`
to complete. For example, with 3 replicas and `terminationGracePeriodSeconds=3600`,
a rollout can take up to 3 hours.


## Scaling up or down

You can scale up and down by specifying `replicaCount`:

```yaml
sftd:
  replicaCount: 3
```

By default we provision *1* replica.

Note that due to the use of `hostNetwork`, there can only be _one_ instance of
`sftd` per Kubernetes node. You will need at least as many nodes available as you have
replicas.
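
As a quick sanity check before scaling, a sketch (the release name `sftd` and the `wire/sftd` chart reference are assumptions from the examples above):

```
# count the nodes that could each host one sftd pod
kubectl get nodes --no-headers | wc -l
# then scale, keeping all other values as-is
helm upgrade sftd wire/sftd --set replicaCount=3 --reuse-values
```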

As a rule of thumb, we support *50* concurrent connections per *1 vCPU*. These
numbers might improve as we work on optimizing the SFTD code. You should adjust
the number of replicas based on your expected usage patterns and Kubernetes
node specifications. For example, under this rule of thumb roughly 150 concurrent
connections would call for about 3 vCPUs of SFT capacity.

If you're using a Kubernetes cloud offering, we recommend setting up cluster
auto-scaling so that new Kubernetes nodes are provisioned automatically when the
number of replicas exceeds the number of available nodes.


## Multiple sftd deployments in a single cluster
@@ -69,8 +192,8 @@ node4
Then we can make two `sftd` deployments and make sure Kubernetes schedules them on distinct sets of nodes:

```
helm install wire-prod charts/wire-server --set 'nodeSelector.wire\.com/role=sftd-prod' ...other-flags
helm install wire-staging charts/wire-server --set 'nodeSelector.wire\.com/role=sftd-staging' ...other-flags
```
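
The `wire\.com/role` labels referenced by `nodeSelector` have to exist on the nodes; a sketch of labelling them (the node names are illustrative):

```
kubectl label nodes node1 node2 wire.com/role=sftd-prod
kubectl label nodes node3 node4 wire.com/role=sftd-staging
```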

## No public IP on default interface
@@ -110,3 +233,29 @@ kernel for free ports, which by default are in the `32768-61000` range
On a default installation these ranges do not overlap, and sftd should never
conflict with Kubernetes components. You should, however, check that these
ranges aren't configured differently on your OS.
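
To check the ephemeral port range on a node, a sketch using the standard Linux sysctl key:

```
# prints something like: net.ipv4.ip_local_port_range = 32768 60999
sysctl net.ipv4.ip_local_port_range
```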



## Future work

We're (ab-)using a `StatefulSet` to give each pod a stable DNS name and use
that to route call join requests to the right calling service.

A downside of a `StatefulSet` is that rollouts are slow -- proportionally to how
high you set `terminationGracePeriodSeconds`.

However, it seems that `coredns` can be configured to provide the same DNS
behaviour for any pod, not just pods in `StatefulSet`s.
(https://github.com/kubernetes/kubernetes/issues/47992#issuecomment-499580692)

This requires a deployer of Wire to edit their cluster's CoreDNS config to set
the
[`endpoint_pod_names`](https://github.com/coredns/coredns/tree/master/plugin/kubernetes)
option, which they might not have the ability to do.
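
For reference, a hedged sketch of what enabling that option in the cluster's CoreDNS Corefile could look like (zone names as in a default cluster; verify against your own Corefile before changing anything):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        # resolve pod A records by pod name, not only for StatefulSet pods
        endpoint_pod_names
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
}
```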

If you are able to set this option, you could use a `Deployment` instead of a
`StatefulSet`. The benefit of a `Deployment` is that it replaces all pods at
once, so that you do not have to wait `replicaCount * terminationGracePeriodSeconds`
for a rollout to finish, but just `terminationGracePeriodSeconds`. This
drastically improves operations. We should expose this as an option in a
future release.
9 changes: 9 additions & 0 deletions charts/sftd/templates/_helpers.tpl
@@ -41,6 +41,11 @@ app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{- define "sftd.join-call.labels" -}}
helm.sh/chart: {{ include "sftd.chart" . }}
{{ include "sftd.join-call.selectorLabels" . }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
@@ -49,3 +54,7 @@
app.kubernetes.io/name: {{ include "sftd.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{- define "sftd.join-call.selectorLabels" -}}
app.kubernetes.io/name: join-call
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
20 changes: 20 additions & 0 deletions charts/sftd/templates/configmap-join-call.yaml
@@ -0,0 +1,20 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "sftd.fullname" . }}-join-call
  labels:
    {{- include "sftd.join-call.labels" . | nindent 4 }}

data:
  default.conf.template: |
    server {
      listen 8080;
      resolver ${NAMESERVER};
      location /healthz { return 204; }
      location ~ ^/sfts/([a-z0-9\-]+)/(.*) {
        proxy_pass http://$1.sftd.${POD_NAMESPACE}.svc.cluster.local:8585/$2;
      }
    }
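
As an illustration (release fullname `sftd` and namespace `wire` assumed), a request for `https://sftd.example.com/sfts/sftd-0/join` would be proxied by this config to `http://sftd-0.sftd.wire.svc.cluster.local:8585/join`.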
64 changes: 64 additions & 0 deletions charts/sftd/templates/deployment-join-call.yaml
@@ -0,0 +1,64 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "sftd.fullname" . }}-join-call
  labels:
    {{- include "sftd.join-call.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.joinCall.replicaCount }}
  selector:
    matchLabels:
      {{- include "sftd.join-call.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "sftd.join-call.selectorLabels" . | nindent 8 }}
      annotations:
        checksum/configmap: {{ include (print .Template.BasePath "/configmap-join-call.yaml") . | sha256sum }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      volumes:
        - name: nginx-config
          configMap:
            name: {{ include "sftd.fullname" . }}-join-call
      containers:
        - name: nginx
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.joinCall.image.repository }}:{{ .Values.joinCall.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          volumeMounts:
            - mountPath: /etc/nginx/conf.d/default.conf.template
              name: nginx-config
              subPath: default.conf.template
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - "/bin/sh"
            - "-c"
            - |
              export NAMESERVER=`cat /etc/resolv.conf | grep "nameserver" | awk '{print $2}' | tr '\n' ' '`
              envsubst '$NAMESERVER $POD_NAMESPACE' < /etc/nginx/conf.d/default.conf.template > /etc/nginx/conf.d/default.conf
              exec nginx -g 'daemon off;'
6 changes: 5 additions & 1 deletion charts/sftd/templates/ingress.yaml
@@ -17,7 +17,11 @@ spec:
    - host: "{{ .Values.host }}"
      http:
        paths:
          - path: /sft/
            backend:
              serviceName: "{{ include "sftd.fullname" . }}"
              servicePort: sft
          - path: /sfts/
            backend:
              serviceName: "{{ include "sftd.fullname" . }}-join-call"
              servicePort: http
13 changes: 13 additions & 0 deletions charts/sftd/templates/service-join-call.yaml
@@ -0,0 +1,13 @@
apiVersion: v1
kind: Service
metadata:
  name: {{ include "sftd.fullname" . }}-join-call
  labels:
    {{- include "sftd.join-call.labels" . | nindent 4 }}
spec:
  ports:
    - port: 80
      targetPort: http
      name: http
  selector:
    {{- include "sftd.join-call.selectorLabels" . | nindent 4 }}
14 changes: 9 additions & 5 deletions charts/sftd/templates/statefulset.yaml
@@ -5,10 +5,10 @@ metadata:
  labels:
    {{- include "sftd.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  # Allows sfts to start up and shut down in parallel when scaling up and down.
  # However this does not affect upgrades.
  podManagementPolicy: Parallel
  serviceName: {{ include "sftd.fullname" . }}
  selector:
    matchLabels:
@@ -65,6 +65,10 @@ spec:
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          volumeMounts:
            - name: external-ip
              mountPath: /external-ip
@@ -79,7 +83,7 @@ spec:
            else
              ACCESS_ARGS="-A ${EXTERNAL_IP}"
            fi
            exec sftd -I "${POD_IP}" -M "${POD_IP}" ${ACCESS_ARGS} -u "https://{{ required "must specify host" .Values.host }}/sfts/${POD_NAME}"
          ports:
            - name: sft
              containerPort: 8585
