
[tf:env+module] Refactor k8s module and introduce LB#418

Merged
lucendio merged 11 commits intodevelopfrom
tf-env-refactor-k8s-creation-hetzner
Feb 23, 2021

Conversation


@lucendio lucendio commented Feb 8, 2021

Please note that this PR is best reviewed by following the commit history

  • Step 1:

    • define default machine size within the k8s module
    • remove var in env definition that is used nowhere
  • Step 2:

    • splitting things up into multiple files
  • Step 3:

    • refactor hetzner-kubernetes module:
      • Features:
        • allow defining a volume per group
        • all k8s machines now share a private network
      • address FUTUREWORK
      • move from 'count' to 'for_each' in order to gain more control over which machine is created or destroyed
    • introduce a new interface for the environment configuration
      • move from a k8s node count to k8s_cluster and machine groups

Examples

Default, in case no k8s cluster should be defined. (Not nice, but allowing the cluster to not exist at all might complicate the code.)

k8s_cluster = {
  cloud = ""
  machine_groups = []
}

A single-node instance (e.g. Wire Demo)

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cpn"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane", "node" ]
    }
  ]
}

Separate control plane and node groups with volumes (e.g. for etcd)

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cps"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane" ]
      volume = {
        size = 10
      }
    },
    {
      group_name = "nodes"
      machine_ids = [ "1", "2" ]
      machine_type = "cx11"
      component_classes = [ "node" ]
    }
  ]
}
  • Step 4:

    • Introduce LB

k8s_cluster = {
  ...
  load_balancer = true
  ...
}
  • Step 5:

    • refactor DNS module to support defining different IPs for SPF
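As a rough sketch of what Step 5 enables (variable, resource, and provider names here are illustrative assumptions, not the module's actual interface), the DNS module could build the SPF record from an explicit IP list and skip the record when the list is empty:

```hcl
variable "zone_id" { type = string }
variable "domain"  { type = string }

variable "spf_ips" {
  description = "IPv4 addresses allowed to send mail; an empty list disables the SPF record"
  type        = list(string)
  default     = []
}

locals {
  # e.g. ["192.0.2.10"] -> "v=spf1 ip4:192.0.2.10 -all"
  spf_value = "v=spf1 ${join(" ", [for ip in var.spf_ips : "ip4:${ip}"])} -all"
}

# The record is only created when at least one IP is given
resource "aws_route53_record" "spf" {
  count   = length(var.spf_ips) > 0 ? 1 : 0
  zone_id = var.zone_id
  name    = var.domain
  type    = "TXT"
  ttl     = 300
  records = [local.spf_value]
}
```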

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch 2 times, most recently from f1c5cd5 to 2c1e37a on February 10, 2021 00:29

arianvp commented Feb 10, 2021

Having to manually specify a list of machine ids feels like a smell to me. Why can't we use count here? You only run into trouble with count if you use it to index variables etc.; but for "I have a pool of nodes that are exactly the same" it seems the right choice to me.
I only care about the size of the node-pool; I should never care about the names of the nodes in a node-pool in kubernetes.

In the way we're currently using count, we do not run into the "which machine gets created or destroyed" problem yet. This only occurs when you're using count.index to index some variable, e.g. var.names[count.index], which is something we currently do not do: https://www.terraform.io/docs/language/meta-arguments/count.html#when-to-use-for_each-instead-of-count
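To illustrate the distinction being discussed (a minimal sketch; the server attributes are example values, not the module's configuration): indexing a list via count couples each machine to its position, while for_each keys each machine by name:

```hcl
variable "names" {
  type    = list(string)
  default = ["a", "b", "c"]
}

# With count + list indexing, removing "b" shifts "c" from index 2 to
# index 1, so Terraform would destroy and recreate it.
resource "hcloud_server" "by_index" {
  count       = length(var.names)
  name        = var.names[count.index]
  server_type = "cx11"
  image       = "ubuntu-20.04"
}

# With for_each keyed by name, removing "b" only destroys machine "b".
resource "hcloud_server" "by_name" {
  for_each    = toset(var.names)
  name        = each.key
  server_type = "cx11"
  image       = "ubuntu-20.04"
}
```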


arianvp commented Feb 10, 2021

Second remark:

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cpn"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane", "node" ]
    }
  ]
}

I don't know if I would put this much freedom at the vars level. I'd push some of these concerns down to the terraform code. We're now at a point where we're sort of duplicating a subset of the terraform DSL in vars. We're giving a bit too much freedom, in that it allows for configurations that might not work in practice. E.g. etcd only works in odd-sized configurations: 1, 3, or 5 nodes. But the variables give us the freedom to configure etcd with 2 nodes. This is not ideal.

I would just hardcode at the terraform level that we have 3 control-plane nodes and only make it configurable whether we consider them as a worker pool or not.

resource "hcloud_server" "controlplane" {
  count = 3
  name  = "controlplane${count.index}"
  labels = {
    "component-classes.wire.com/control-plane" = "true"
    "component-classes.wire.com/node"          = var.controlplane_is_worker
  }
}

I can get behind being able to define multiple node pools in vars though. That sounds useful (due to the lack of autoscaling groups in hetzner)


lucendio commented Feb 10, 2021

Having to manually specify a list of machine ids feels like a smell to me. Why can't we use count here?

No good reason. You are right here. Especially since the group_name still allows machines to be lifecycled by lifecycling the whole group. We were already talking about this aspect when we discussed the schema initially. It just happened that I forgot about this and still had the old way (machine_ids) in the test configuration. (addressed here)

A note on exposing this kind of level in the environment configuration: in a follow-up verbal discussion, we decided to go with this for now, migrate the different configurations across all environments to the current schema, and then reiterate on the overall approach by looking into the idea of having some kind of thin TF layer in the form of opinionated 'template modules', which could then leverage the modules doing the heavy lifting and in turn would look very similar (or maybe even simpler) compared to the k8s_cluster schema. The downside of this is that an environment would again start to contain some actual TF code.

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch 3 times, most recently from 9a27dbc to 186793a on February 16, 2021 21:40
- file:
path: "{{ artifacts_dir }}/admin.conf"
state: absent

Contributor Author


@arianvp this probably breaks an assumption in the offline package, doesn't it?

Note: this is the first step (out of two) toward automating the sops-encryption of the kubeconfig in an $ENV_DIR when bootstrapping a K8s cluster.

@lucendio lucendio marked this pull request as ready for review February 17, 2021 01:30
Member

@akshaymankar akshaymankar left a comment


I dropped some questions inline.
Apart from that, I have one question: what is the migration path for the old environments? Do we recreate them, or is there some terraform state migration that we have to do?

# group_name = string
# machine_id = string
# machine_type = string
# component_classes = list(string)
Member


As I understand, there are 2 valid component classes, "controlplane" and "node", and terraform provides no way of doing enums. So, can we do the next best thing, which would be to document and validate this?
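A minimal sketch of what such validation could look like (assuming Terraform >= 0.13, which supports validation blocks on variables; the variable name is illustrative):

```hcl
variable "component_classes" {
  description = "Roles of the machines in this group"
  type        = list(string)

  validation {
    # setsubtract yields the classes that are not in the allowed set
    condition     = length(setsubtract(var.component_classes, ["controlplane", "node"])) == 0
    error_message = "Each component class must be either \"controlplane\" or \"node\"."
  }
}
```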

I have a feeling that we are over-generalizing here. Nobody wants a "machine" in their kubernetes cluster. Maybe it is better to accept "pools" for nodes and controlplane separately. This will also get rid of the enum problem.

Contributor

@arianvp arianvp Feb 17, 2021


This seems to be the same concern as I expressed here #418 (comment)

My biggest concern is that we are giving too much freedom. E.g. nothing is stopping you from accidentally assigning 2 or 4 control-plane nodes, whilst those setups do not work with etcd. You should only have 1 or 3 control-plane nodes, and in very rare cases, where you have many regions or zones, an etcd cluster of 5 is possible (but has worse performance, as every write needs consensus of 3 nodes instead of 2).

I would still suggest us having two sections:

control_plane = {
  count        = 3     # can be 3 or 1, but no other value
  is_worker    = true  # whether the control plane group should be treated as a node pool or not
  machine_type = "cx31"
}
node_pools = [
  { count = 200, name = "basic",     machine_type = "cx41" },
  { count = 50,  name = "highpower", machine_type = "cx61" },
]

Comment on lines +3 to +29
variable "k8s_cluster" {
  description = "represents Kubernetes cluster"
  # type = object({
  #   cloud = string
  #   load_balancer = optional(bool)
  #   load_balancer_ports = optional(list(
  #     object({
  #       name = string
  #       protocol = string
  #       listen = number
  #       destination = number
  #     })
  #   ))
  #   machine_groups = list(object({
  #     group_name = string
  #     machine_count = optional(number)
  #     machine_type = string
  #     component_classes = list(string)
  #     volume = optional(object({
  #       size = number
  #       format = optional(string)
  #     }))
  #   }))
  # })
  type    = any
  default = {}
}
Member


Why do we want to encode all kinds of k8s_clusters in one type of specification? I have a feeling that this will bite us in the future, as we might get into specifying AWS-specific things somehow. Maybe it is better to just call this variable "hetzner_k8s_cluster". When we start providing an option to support AWS, we can just create another variable called "aws_k8s_cluster"?

Member


We should also ensure that we don't create an even number of etcds, as it provides the same fault tolerance as creating 1 less etcd.
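If the count stays configurable, a validation block could at least rule out even cluster sizes (a sketch assuming Terraform >= 0.13; the variable name is hypothetical):

```hcl
variable "control_plane_count" {
  description = "Number of control plane / etcd machines"
  type        = number
  default     = 3

  validation {
    # Raft quorum: an even-sized cluster tolerates no more failures than
    # a cluster with one machine fewer.
    condition     = contains([1, 3, 5], var.control_plane_count)
    error_message = "The control plane count must be 1, 3, or 5."
  }
}
```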

}
vars = merge(
{
# NOTE: please consider setting variables in the inventory of the respective environment instead
Member


I may not have enough context, but I cannot figure out what this comment means.
Also, why is merging with an empty map required?

Comment on lines +7 to +11
load_balancer_type = (
length(var.machines) <= 5 ? "lb11" : (
length(var.machines) <= 75 ? "lb21" : (
length(var.machines) <= 150 ? "lb31" : null))
)
Member


I see that this is a required attribute which could potentially be null. Does terraform plan fail in this case? What error would the user get?

Contributor Author


It fails with

Error: Required attribute is not set

  on ../modules/hetzner-kubernetes/load-balancer.resources.tf line 7, in resource "hcloud_load_balancer" "lb":
   7:   load_balancer_type = null

This is IMHO sufficiently self-explanatory after looking at the corresponding LOCs. Please yell if you disagree!

Contributor


Hmm, it is a bit scary though. Say I bump my cluster from 5 to 6 nodes; that will get the load balancer destroyed and recreated and cause a period of downtime. This sounds less than ideal. Wouldn't this better be something we bubble up and let the administrator make a conscious decision about?

Contributor Author


You absolutely have a point here, but before drawing any conclusion, I think we should test whether it actually causes destruction and downtime. Additionally, the fact that the LB might get destroyed should be completely transparent in the TF plan. Wouldn't this suffice as a heads-up, with the option to pull back if necessary?

Contributor


Yes I suppose it does.

Contributor


I'm convinced :) Consider this resolved

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from 186793a to a43db9c on February 18, 2021 14:25
}


resource "hcloud_server_network" "snw" {
Contributor

@arianvp arianvp Feb 19, 2021


I think this should preferably not live in the kubernetes module, but should be an input to the kubernetes module.

variable "network_id" {
   description = "The network in which to deploy this kubernetes cluster"
}

Say I have another module defining cassandra cluster or elasticsearch cluster; I want these to be in the same private network as the kubernetes cluster. (Because I want kubernetes nodes to be able to talk to these services).

This can then be composed in the environment folder:

resource "hcloud_network" "wire_network" {
  # (name and ip_range are example values)
  name     = "wire-network"
  ip_range = "10.0.0.0/16"
}

module "cassandra-cluster" {
  source     = "../modules/hetzner-cassandra"
  network_id = hcloud_network.wire_network.id
  count      = 24
}

module "elasticsearch-cluster" {
  source     = "../modules/hetzner-elasticsearch"
  network_id = hcloud_network.wire_network.id
  count      = 24
}


module "k8s-cluster" {
  source     = "../modules/hetzner-kubernetes"
  network_id = hcloud_network.wire_network.id
  count      = 5
}

Another option would be to let the kubernetes module provide the private network as an output, but this is harder for refactoring and a bit awkward (why would the cassandra module have to depend on the kubernetes module, for example?).

See also the section about Dependency Inversion in the module composition guide.

Contributor Author


The scenario you are describing was exactly @jmatsushita's and my reasoning behind introducing machine_groups. This allows creating not only K8s groups (controlplanes & nodes) but also other groups in addition, by deriving the Ansible groups from whatever items are defined in component_classes. Meaning, an item of component_classes corresponds to an Ansible inventory group. This way, all machines would share the same private network.

Contributor

@arianvp arianvp Feb 22, 2021


Hmm, I really feel like we're trying to reinvent the wheel here a bit too much still. What do we gain from using machine_groups in tfvars over defining a resource (or a module) in terraform code? We're coming up with our own DSL for defining machine groups, which terraform itself already has... Defining how we provision cassandra and elasticsearch groups whilst calling the kubernetes module also sounds somewhat confusing to me. I am still not really fond of this idea.

Contributor


Why I am not fond of it:

The terraform code alone will not describe the topology of a Wire installation anymore. There will just be one node in the graph called "machine" and it's up to the user to define a topology in their tfvars that makes sense.

If we instead have explicit cassandra, kubernetes and elasticsearch hcloud_server resources (or modules if there is a group of resources you want to abstract) then it's clear from the terraform code what a wire deployment topology looks like.

This means we have to explicitly document how people should configure this topology in tfvars instead of having it evident in the code itself.



resource "hcloud_volume" "volumes" {
for_each = {
Contributor

@arianvp arianvp Feb 19, 2021


What do we need statically defined volumes for? If we need storage in kubernetes, we should do that through PersistentVolume abstraction and the csi-driver.

Etcd will require local storage to function properly; so it can't use external volumes either.

Contributor Author


That was a FUTUREWORK I just tried to address.

Contributor


As we're already experiencing disk latency issues on the local disks on hetzner every once in a while (or at least that's the running theory), I'm not sure it's a good idea to run the etcd data directory on a network disk.

Usually, replacing an etcd node with an empty node will just work, as it will backfill from the other server nodes. (When the data is large, it can help to copy over the data from the old node before killing it, and then copy it to the new node; but we can do that manually.)

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from a43db9c to 48e4884 on February 22, 2021 15:05
@lucendio
Contributor Author

After a verbal discussion, we decided to merge as is and address the following remaining issues in a follow-up PR (#423), because with a slight change of design it will be straightforward to make the kubernetes-hetzner module more robust and specialized.

  • validate the etcd count (or only offer specific choices) according to Raft's fault-tolerance behaviour
  • change interface design according to the suggestion
  • decide whether to keep volume creation in there
  • factor out private network creation


arianvp commented Feb 22, 2021

The CI failure is fixed in #424

* define default machine size within the k8s module
* remove var in env definition that is used nowhere
* affected: SFT, K8s cluster
* removed unnecessary quoting of identifiers/names
This change set introduces the concept of cluster groups, which allows control
plane and nodes to be defined separately. In the future it can be adapted for any
sort of services, where a service might consist of one or more groups. And, it
still allows defining a single-machine k8s installation.

Features:
* allow to define a volume per group
* all k8s machines are now sharing a private network

Fixes:
* move from 'count' to 'for_each' in order to gain more control over which machine
  is created or destroyed
module:
* expose IPs (either nodes public IPs or LB)
* allow LB to be optional

env:
* rename some files to better represent their content
A group as a whole can already be lifecycled; there is no need to explicitly
maintain machine identifiers
* inform Kubespray about the LB IP so that it's being added as SAN to kube-api cert
The LB usually does not send emails, thus using its IP(s) in an SPF record
does not make sense. So, we need to provide the node IP(s), because the SMTP
server (deployed via Helm) will run on one of them.
This change set makes it possible to define a list of IPv4 addresses
specifically for the SPF record. If the list is empty, no SPF record is being
created.

NOTE: it would have been nicer to generate the node IPs list similar to how
it's done in kubernetes.inventory.tf, but Terraform had other plans. At the
moment, I guess it's a bug that Tf won't see the indirect dependency. Otherwise
it would not explain why it works when the logic is moved into the k8s module.
* move Kubespray/Ansible variables, whose value is not determined by Terraform
  configuration, over into individual environment configuration
* at the same time, set LB variables to ensure kubeconfig (admin.conf) has
  LB's external IP configured, while cluster-internal kube-apiserver communication
  may not take a detour through the LB (depends on environment-specific inventory)

The following vars are not being migrated:
* ansible_python_interpreter -> the current kubespray version already requires Ansible
  2.9, which has the more sophisticated way of discovering the interpreter
* helm_enabled -> the Tiller times are over, and if we need to, we deploy charts from
  the local file system
* kubeconfig_localhost -> will be moved to ansible/kubernetes.yaml
* bootstrap_os -> deprecated in Kubespray

See https://github.com/zinfra/cailleach/pull/515/commits/49c9a88ff5bcda5888b955b74aaec5052c5f3a0d
This ensures that the conventions for this item within an ENV_DIR are being
adhered to. (encrypted: kubeconfig, decrypted: kubeconfig.dec)
…nt containing K8s

* moved TF vars in the environment definition into separate files to allow
  easier referencing from docs
@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from 5d6c6ae to 9b3fc87 on February 22, 2021 20:10
@lucendio lucendio merged commit ed85895 into develop Feb 23, 2021
@lucendio lucendio deleted the tf-env-refactor-k8s-creation-hetzner branch February 23, 2021 10:42
lucendio added a commit that referenced this pull request Mar 4, 2021
We now use Ansible v2.8 (#404) but also moved the variable out of the generated
inventory (#418). This change re-introduces the setting on a global level. Environments
that rely on an older version of Ansible, as well as those running on older systems,
still have to set it in the respective inventory.