This repository has been archived by the owner on Mar 5, 2024. It is now read-only.

Helm charts #53

Closed
ewbankkit wants to merge 4 commits

ewbankkit

Addresses #30.
I did not address #29 and assume that the TLS certificates and private keys are stored as Kubernetes secrets as described in https://github.com/uswitch/kiam/blob/master/docs/TLS.md.
The chart is based on the deployment YAMLs in https://github.com/uswitch/kiam/tree/master/deploy.
This is still a WIP, as I have not yet got kiam working in my environment: the server health and liveness checks are failing, and I still have to enable debugging as described in #17.

ewbankkit mentioned this pull request Apr 23, 2018
@pingles
Contributor

pingles commented Apr 23, 2018

Oh wow, I was looking at #30 and added the "Help needed" label a few minutes ago :)

Let us know what happens once you've got the additional gRPC logs with more detail about why the health checks fail.

@ewbankkit
Author

The health/liveness check error was my fault:

/ # GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=8 /health --cert=/etc/kiam/tls/kiam.server.pem --key=/etc/kiam/tls/kiam.server-key.pem --ca=/etc/kiam/tls/kiam.ca.pem --server-address=localhost:443
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc42047b320
INFO: 2018/04/24 13:58:34 dialing to target with scheme: ""
INFO: 2018/04/24 13:58:34 could not get resolver for scheme: ""
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/04/24 13:58:34 balancerWrapper: is pickfirst: false
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/04/24 13:58:34 grpc: failed dns SRV record lookup due to lookup _grpclb._tcp.localhost on 10.100.0.10:53: no such host.
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no address available 
INFO: 2018/04/24 13:58:34 balancerWrapper: got update addr from Notify: [{127.0.0.1:443 <nil>} {[::1]:443 <nil>}]
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: new subconn: [{127.0.0.1:443 0  <nil>}]
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: new subconn: [{[::1]:443 0  <nil>}]
WARNING: 2018/04/24 13:58:34 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp [::1]:443: connect: cannot assign requested address"; Reconnecting to {[::1]:443 0  <nil>}
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242500, CONNECTING
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242550, CONNECTING
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242550, TRANSIENT_FAILURE
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242550, CONNECTING
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242550, TRANSIENT_FAILURE
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242500, TRANSIENT_FAILURE
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc42047b320
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242500, CONNECTING
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc42047b320
WARNING: 2018/04/24 13:58:34 Failed to dial 127.0.0.1:443: connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for Kiam Server, not localhost:443"; please retry.
INFO: 2018/04/24 13:58:34 balancerWrapper: handle subconn state change: 0xc420242500, SHUTDOWN
INFO: 2018/04/24 13:58:34 ccBalancerWrapper: updating state and picker called by balancer: TRANSIENT_FAILURE, 0xc42047b320
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
WARN[0000] error checking health: rpc error: code = Unavailable desc = there is no connection available 
FATA[0001] error retrieving health: rpc error: code = Unavailable desc = there is no connection available 

Copy and paste error when generating the server's key and cert 😀.
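The x509 line in the log is Go's TLS hostname check failing: the health command dials localhost, but the server certificate is only valid for "Kiam Server". The sketch below reproduces that check with Go's standard library via `x509.Certificate.VerifyHostname`, the check a Go TLS client applies to the server's leaf certificate. The helper `newServerCert` and the throwaway self-signed certificates are illustrative assumptions, not kiam code. On current Go versions names are matched only against subject alternative names (SANs), so the server cert presumably needs `localhost` plus the in-cluster service name (e.g. `test-kiam-server` from the dry run later in this thread) as SANs.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newServerCert builds a throwaway self-signed certificate (illustration only).
// The CommonName mirrors the "Kiam Server" subject from the log; dnsNames are
// the SANs to embed.
func newServerCert(dnsNames []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "Kiam Server"},
		DNSNames:     dnsNames,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	// Self-signed: the template is also its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	for _, sans := range [][]string{nil, {"localhost", "test-kiam-server"}} {
		cert, err := newServerCert(sans)
		if err != nil {
			panic(err)
		}
		// VerifyHostname returns nil only if "localhost" matches one of the SANs.
		fmt.Printf("SANs=%v localhost ok=%v\n", sans, cert.VerifyHostname("localhost") == nil)
	}
}
```

Running it prints one line per certificate; only the certificate carrying a `localhost` SAN passes the check.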

@ewbankkit
Author

OK, got kiam working on the Amazon EKS preview using this chart.
The only slightly wonky things were:

  • Realizing that I needed to mount the host's CA certificates into the server pod (thanks to #36, "failed to load system roots and no roots provided - TLS error"). On Amazon Linux 2 the CA certificates are in /etc/ssl/certs, but the PEM files there are symlinks, so I had to mount the real host location, /etc/pki/ca-trust/extracted/pem.
  • I couldn't use kubernetes.io/role: node vs. kubernetes.io/role: master as the node selectors for the agent vs. the server, because you can't run pods on the Amazon-managed masters. Instead I used the AZ (e.g. failure-domain.beta.kubernetes.io/zone=us-west-2b) to partition agent and server nodes, and made sure that any pods requiring IAM roles run on the same nodes as the agent.

Rebased and squashed my previous commits, plus today's fixes, into a single commit.

Next is to upgrade to the v2.6 image.

@ewbankkit
Author

It looks like the only configuration change for v2.6 is the new --assume-role-arn command-line flag for the server. I'll need to add it to values.yaml and document it.

@ewbankkit
Copy link
Author

ewbankkit commented Apr 30, 2018

I'll update to v2.7, now that it's the latest version.
This will include the new --session-duration command-line flag for the server.

@ewbankkit
Author

Updated to kiam v2.7.
This is the generated yaml for the default values:

helm install --dry-run --debug --name test deploy/charts/kiam
[debug] Created tunnel using local port: '58647'

[debug] SERVER: "127.0.0.1:58647"

[debug] Original chart version: ""
[debug] CHART PATH: /development/golang/src/github.com/uswitch/kiam/deploy/charts/kiam

NAME:   test
REVISION: 1
RELEASED: Mon Apr 30 11:01:00 2018
CHART: kiam-0.3.0
USER-SUPPLIED VALUES:
{}

COMPUTED VALUES:
agent:
  dnsPolicy: ClusterFirstWithHostNet
  extraArgs: {}
  extraEnv: {}
  extraHostPathMounts: []
  host:
    interface: cali+
    iptables: false
    port: 8181
  image:
    pullPolicy: IfNotPresent
    repository: quay.io/uswitch/kiam
    tag: v2.7
  log:
    jsonOutput: true
    level: info
  nodeSelector: {}
  podAnnotations: {}
  podLabels: {}
  prometheus:
    port: 9620
    scrape: true
    syncInterval: 5s
  resources: {}
  tls:
    caFileName: ca.pem
    certFileName: agent.pem
    keyFileName: agent-key.pem
    mountPath: /etc/kiam/tls
    secretName: kiam-agent-tls
  tolerations: []
  updateStrategy: OnDelete
extraArgs: {}
rbac:
  create: false
  serviceAccountName: default
server:
  assumeRoleArn: null
  cache:
    syncInterval: 1m
  extraArgs: {}
  extraEnv: {}
  extraHostPathMounts: []
  image:
    pullPolicy: Always
    repository: quay.io/uswitch/kiam
    tag: v2.7
  log:
    jsonOutput: true
    level: info
  nodeSelector: {}
  podAnnotations: {}
  prometheus:
    port: 9620
    scrape: true
    syncInterval: 5s
  resources: {}
  roleBaseArn: null
  service:
    port: 443
    targetPort: 443
  sessionDuration: 15m
  tls:
    caFileName: ca.pem
    certFileName: server.pem
    keyFileName: server-key.pem
    mountPath: /etc/kiam/tls
    secretName: kiam-server-tls
  tolerations: []
  updateStrategy: OnDelete

HOOKS:
MANIFEST:

---
# Source: kiam/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kiam
    chart: kiam-0.3.0
    heritage: Tiller
    release: test
  name: test-kiam-server
  namespace: default
spec:
  clusterIP: None
  selector:
    app: kiam
    role: server
  ports:
  - name: grpc
    port: 443
    targetPort: 443
    protocol: TCP
---
# Source: kiam/templates/agent-daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: kiam
    chart: kiam-0.3.0
    heritage: Tiller
    release: test
  name: test-kiam-agent
  namespace: default
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9620"
      labels:
        app: kiam
        role: agent
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      tolerations:
        []
        
      volumes:
        - name: tls
          secret:
            secretName: kiam-agent-tls
        - name: xtables
          hostPath:
            path: /run/xtables.lock
      containers:
        - name: kiam
          image: "quay.io/uswitch/kiam:v2.7"
          imagePullPolicy: IfNotPresent
          command:
            - /agent
          args:
            - --host-interface=cali+
            - --json-log
            - --level=info
            - --port=8181
            - --cert=/etc/kiam/tls/agent.pem
            - --key=/etc/kiam/tls/agent-key.pem
            - --ca=/etc/kiam/tls/ca.pem
            - --server-address=test-kiam-server:443
            - --prometheus-listen-addr=0.0.0.0:9620
            - --prometheus-sync-interval=5s
          env:
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          volumeMounts:
            - mountPath: /etc/kiam/tls
              name: tls
            - mountPath: /var/run/xtables.lock
              name: xtables
          livenessProbe:
            httpGet:
              path: /ping
              port: 8181
            initialDelaySeconds: 3
            periodSeconds: 3
  updateStrategy:
    type: OnDelete
---
# Source: kiam/templates/server-daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: kiam
    chart: kiam-0.3.0
    heritage: Tiller
    release: test
  name: test-kiam-server
  namespace: default
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9620"
      labels:
        app: kiam
        role: server
    spec:
      serviceAccountName: "default"
      tolerations:
        []
        
      volumes:
        - name: tls
          secret:
            secretName: kiam-server-tls
      containers:
        - name: kiam
          image: "quay.io/uswitch/kiam:v2.7"
          imagePullPolicy: Always
          command:
            - /server
          args:
            - --json-log
            - --level=info
            - --bind=0.0.0.0:443
            - --cert=/etc/kiam/tls/server.pem
            - --key=/etc/kiam/tls/server-key.pem
            - --ca=/etc/kiam/tls/ca.pem
            - --role-base-arn-autodetect
            - --session-duration=15m
            - --sync=1m
            - --prometheus-listen-addr=0.0.0.0:9620
            - --prometheus-sync-interval=5s
          volumeMounts:
            - mountPath: /etc/kiam/tls
              name: tls
          livenessProbe:
            exec:
              command:
              - /health
              - --cert=/etc/kiam/tls/server.pem
              - --key=/etc/kiam/tls/server-key.pem
              - --ca=/etc/kiam/tls/ca.pem
              - --server-address=localhost:443
              - --server-address-refresh=2s
              - --timeout=5s
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 10
          readinessProbe:
            exec:
              command:
              - /health
              - --cert=/etc/kiam/tls/server.pem
              - --key=/etc/kiam/tls/server-key.pem
              - --ca=/etc/kiam/tls/ca.pem
              - --server-address=localhost:443
              - --server-address-refresh=2s
              - --timeout=5s
            initialDelaySeconds: 3
            periodSeconds: 10
            timeoutSeconds: 10
  updateStrategy:
    type: OnDelete

@ewbankkit
Author

@pingles Once you are happy I will submit the chart as an Incubator chart to https://github.com/kubernetes/charts and work on the items listed here.
Who would you like listed as chart maintainers?

ewbankkit changed the title from "[WIP] Helm charts" to "Helm charts" Apr 30, 2018
@pingles
Contributor

pingles commented Apr 30, 2018

@ewbankkit cool, that'd be great, thanks!

We don't use Helm, so I'd be delighted if you'd like to take ownership of the Helm integration 😄

@ewbankkit
Author

Sure, why not.

@ewbankkit
Copy link
Author

Work moved to helm/charts#5330.
Closing this PR.

ewbankkit closed this Apr 30, 2018
@pingles
Contributor

pingles commented Apr 30, 2018

Thank you for contributing this!

ewbankkit deleted the issue-30 branch May 14, 2018 17:48