
[tf:env+module] Refactor k8s module and introduce LB#418

Merged
lucendio merged 11 commits intodevelopfrom
tf-env-refactor-k8s-creation-hetzner
Feb 23, 2021

Conversation


@lucendio lucendio commented Feb 8, 2021

Please note that this PR is best reviewed by following the commit history

  • Step 1:

    • define default machine size within the k8s module
    • remove var in env definition that is used nowhere
  • Step 2:

    • splitting things up into multiple files
  • Step 3:

    • refactor hetzner-kubernetes module:
      • Features:
        • allow defining a volume per group
        • all k8s machines now share a private network
      • address FUTUREWORK
      • move from 'count' to 'for_each' in order to gain more control over which machine is created or destroyed
    • introduce a new interface for the environment configuration
      • move from a k8s node count to k8s_cluster and machine groups

Examples

Default, in case no k8s cluster should be defined. (Not nice, but allowing the cluster to not exist at all might complicate the code.)

k8s_cluster = {
  cloud = ""
  machine_groups = []
}

A single-node instance (e.g. Wire Demo)

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cpn"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane", "node" ]
    }
  ]
}

Separate control plane and node groups with volumes (e.g. for etcd)

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cps"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane" ]
      volume = {
        size = 10
      }
    },
    {
      group_name = "nodes"
      machine_ids = [ "1", "2" ]
      machine_type = "cx11"
      component_classes = [ "node" ]
    }
  ]
}
  • Step 4:

    • Introduce LB

k8s_cluster = {
  ...
  load_balancer = true
  ...
}
  • Step 5:

    • refactor DNS module to support defining different IPs for SPF
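As a rough sketch of what Step 5 enables (variable, resource, and provider names here are illustrative assumptions, not the module's actual interface), the DNS module could build the SPF record from an explicit IP list and skip the record when the list is empty:

```hcl
variable "zone_id" { type = string }
variable "domain"  { type = string }

variable "spf_ips" {
  description = "IPv4 addresses allowed to send mail; an empty list disables the SPF record"
  type        = list(string)
  default     = []
}

locals {
  # e.g. ["192.0.2.10"] -> "v=spf1 ip4:192.0.2.10 -all"
  spf_value = "v=spf1 ${join(" ", [for ip in var.spf_ips : "ip4:${ip}"])} -all"
}

# The record is only created when at least one IP is given
resource "aws_route53_record" "spf" {
  count   = length(var.spf_ips) > 0 ? 1 : 0
  zone_id = var.zone_id
  name    = var.domain
  type    = "TXT"
  ttl     = 300
  records = [local.spf_value]
}
```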

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch 2 times, most recently from f1c5cd5 to 2c1e37a on February 10, 2021 00:29

arianvp commented Feb 10, 2021

Having to manually specify a list of machine ids feels like a smell to me. Why can't we use count here? You only run into trouble with count if you use it to index variables etc.; but for "I have a pool of nodes that are exactly the same" it seems the right choice to me.
I only care about the size of the node-pool; I should never care about the names of the nodes in a node-pool in kubernetes.

In the way we're currently using count, we do not run into the "which machine gets created or destroyed" problem yet. This only occurs when you're using count.index to index some variable, e.g. var.names[count.index], which is something we currently do not do: https://www.terraform.io/docs/language/meta-arguments/count.html#when-to-use-for_each-instead-of-count
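To illustrate the distinction being discussed (a minimal sketch; the server attributes are example values, not the module's configuration): indexing a list via count couples each machine to its position, while for_each keys each machine by name:

```hcl
variable "names" {
  type    = list(string)
  default = ["a", "b", "c"]
}

# With count + list indexing, removing "b" shifts "c" from index 2 to
# index 1, so Terraform would destroy and recreate it.
resource "hcloud_server" "by_index" {
  count       = length(var.names)
  name        = var.names[count.index]
  server_type = "cx11"
  image       = "ubuntu-20.04"
}

# With for_each keyed by name, removing "b" only destroys machine "b".
resource "hcloud_server" "by_name" {
  for_each    = toset(var.names)
  name        = each.key
  server_type = "cx11"
  image       = "ubuntu-20.04"
}
```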


arianvp commented Feb 10, 2021

Second remark:

k8s_cluster = {
  cloud = "hetzner"
  machine_groups = [
    {
      group_name = "cpn"
      machine_ids = [ "1" ]
      machine_type = "cx51"
      component_classes = [ "controlplane", "node" ]
    }
  ]
}

I don't know if I would put this much freedom at the vars level. I'd push some of these concerns down to the terraform code. We're now at a point where we're sort of duplicating a subset of the terraform DSL in vars. We're giving a bit too much freedom, in that it allows for configurations that might not work in practice. E.g. etcd only works in odd-sized configurations: 1, 3, or 5 nodes. But the variables give us the freedom to configure etcd with 2 nodes. This is not ideal.

I would just hardcode at the terraform level that we have 3 control-plane nodes and only make it configurable whether we consider them as a worker pool or not.

resource "hcloud_server" "controlplane" {
  count = 3
  name  = "controlplane${count.index}"
  labels = {
    "component-classes.wire.com/control-plane" = "true"
    "component-classes.wire.com/node"          = var.controlplane_is_worker
  }
}

I can get behind being able to define multiple node pools in vars though. That sounds useful (due to the lack of autoscaling groups in hetzner)


lucendio commented Feb 10, 2021

Having to manually specify a list of machine ids feels like a smell to me. Why can't we use count here?

No good reason. You are right here. Especially since the group_name still allows machines to be lifecycled by lifecycling the whole group. We were already talking about this aspect when we discussed the schema initially. It just happened that I forgot about this and still had the old way (machine_ids) in the test configuration. (addressed here)

A note on exposing this kind of level in the environment configuration: in a follow-up verbal discussion, we decided to go with this for now, migrate the different configurations across all environments to the current schema, and then reiterate on the overall approach by looking into the idea of having some kind of thin TF layer in the form of opinionated 'template modules', which could then leverage the modules doing the heavy lifting and in turn would look very similar (or maybe even simpler) compared to the k8s_cluster schema. The downside of this is that an environment would again start to contain some actual TF code.

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch 3 times, most recently from 9a27dbc to 186793a on February 16, 2021 21:40
- file:
path: "{{ artifacts_dir }}/admin.conf"
state: absent

Contributor Author


@arianvp this probably breaks an assumption in the offline package, doesn't it?

Note: this is the first step (out of two) toward automating the sops-encryption of the kubeconfig in an $ENV_DIR when bootstrapping a K8s cluster.

@lucendio lucendio marked this pull request as ready for review February 17, 2021 01:30
Member

@akshaymankar akshaymankar left a comment


I dropped some questions inline.
Apart from that, I have one question: what is the migration path for the old environments? Do we recreate them, or is there some terraform state migration that we have to do?

# group_name = string
# machine_id = string
# machine_type = string
# component_classes = list(string)
Member


As I understand, there are 2 valid component classes, "controlplane" and "node", and terraform provides no way of doing enums. So, can we do the next best thing, which would be to document and validate this?
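A minimal sketch of what such validation could look like (assuming Terraform >= 0.13, which supports validation blocks on variables; the variable name is illustrative):

```hcl
variable "component_classes" {
  description = "Roles of the machines in this group"
  type        = list(string)

  validation {
    # setsubtract yields the classes that are not in the allowed set
    condition     = length(setsubtract(var.component_classes, ["controlplane", "node"])) == 0
    error_message = "Each component class must be either \"controlplane\" or \"node\"."
  }
}
```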

I have a feeling that we are over-generalizing here. Nobody wants a "machine" in their kubernetes cluster. Maybe it is better to accept "pools" for nodes and controlplane separately. This will also get rid of the enum problem.

Contributor

@arianvp arianvp Feb 17, 2021


This seems to be the same concern as I expressed here #418 (comment)

My biggest concern is that we are giving too much freedom. E.g. nothing is stopping you from accidentally assigning 2 or 4 control-plane nodes, whilst those setups do not work with etcd. You should only have 1 or 3 control-plane nodes, and in very rare cases, where you have many regions or zones, an etcd cluster of 5 is possible (but has worse performance, as every write needs consensus of 3 nodes instead of 2).

I would still suggest us having two sections:

control_plane = {
  count        = 3     # can be 3 or 1, but no other value
  is_worker    = true  # whether the control plane group should be treated as a node pool or not
  machine_type = "cx31"
}
node_pools = [
  { count = 200, name = "basic",     machine_type = "cx41" },
  { count = 50,  name = "highpower", machine_type = "cx61" },
]

Comment on lines +3 to +29
variable "k8s_cluster" {
  description = "represents Kubernetes cluster"
  # type = object({
  #   cloud = string
  #   load_balancer = optional(bool)
  #   load_balancer_ports = optional(list(
  #     object({
  #       name = string
  #       protocol = string
  #       listen = number
  #       destination = number
  #     })
  #   ))
  #   machine_groups = list(object({
  #     group_name = string
  #     machine_count = optional(number)
  #     machine_type = string
  #     component_classes = list(string)
  #     volume = optional(object({
  #       size = number
  #       format = optional(string)
  #     }))
  #   }))
  # })
  type    = any
  default = {}
}
Member


Why do we want to encode all kinds of k8s_clusters in one type of specification? I have a feeling that this will bite us in the future, as we might get into specifying AWS-specific things somehow. Maybe it is better to just call this variable "hetzner_k8s_cluster". When we start providing an option to support AWS, we can just create another variable called "aws_k8s_cluster"?

Member


We should also ensure that we don't create an even number of etcds, as it provides the same fault tolerance as creating 1 less etcd.
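If the count stays configurable, a validation block could at least rule out even cluster sizes (a sketch assuming Terraform >= 0.13; the variable name is hypothetical):

```hcl
variable "control_plane_count" {
  description = "Number of control plane / etcd machines"
  type        = number
  default     = 3

  validation {
    # Raft quorum: an even-sized cluster tolerates no more failures than
    # a cluster with one machine fewer.
    condition     = contains([1, 3, 5], var.control_plane_count)
    error_message = "The control plane count must be 1, 3, or 5."
  }
}
```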

}
vars = merge(
{
# NOTE: please consider setting variables in the inventory of the respective environment instead
Member


I may not have enough context, but I cannot figure out what this comment means.
Also, why is merging with an empty map required?

Comment on lines +7 to +11
load_balancer_type = (
length(var.machines) <= 5 ? "lb11" : (
length(var.machines) <= 75 ? "lb21" : (
length(var.machines) <= 150 ? "lb31" : null))
)
Member


I see that this is a required attribute which could potentially be null. Does terraform plan fail in this case? What error would the user get?

Contributor Author


It fails with

Error: Required attribute is not set

  on ../modules/hetzner-kubernetes/load-balancer.resources.tf line 7, in resource "hcloud_load_balancer" "lb":
   7:   load_balancer_type = null

This is IMHO sufficiently self-explanatory after looking at the corresponding LOCs. Please yell if you disagree!

Contributor


Hmm, it is a bit scary though. Say I bump my cluster from 5 to 6 nodes; that will get the load balancer destroyed and recreated and cause a period of downtime. This sounds less than ideal. Wouldn't this better be something we bubble up and let the administrator make a conscious decision about?

Contributor Author


You absolutely have a point here, but before drawing any conclusion, I think we should test whether it actually causes destruction and downtime. Additionally, the fact that the LB might get destroyed should be completely transparent in the TF plan. Wouldn't this suffice as a heads-up, with the option to pull back if necessary?

Contributor


Yes I suppose it does.

Contributor


I'm convinced :) Consider this resolved

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from 186793a to a43db9c on February 18, 2021 14:25
}


resource "hcloud_server_network" "snw" {
Contributor

@arianvp arianvp Feb 19, 2021


I think this should preferably not live in the kubernetes module, but should be an input to the kubernetes module.

variable "network_id" {
   description = "The network in which to deploy this kubernetes cluster"
}

Say I have another module defining cassandra cluster or elasticsearch cluster; I want these to be in the same private network as the kubernetes cluster. (Because I want kubernetes nodes to be able to talk to these services).

This can then be composed in the environment folder:

resource "hcloud_network" "wire_network" {
  # (name and ip_range are example values)
  name     = "wire-network"
  ip_range = "10.0.0.0/16"
}

module "cassandra-cluster" {
  source     = "../modules/hetzner-cassandra"
  network_id = hcloud_network.wire_network.id
  count      = 24
}

module "elasticsearch-cluster" {
  source     = "../modules/hetzner-elasticsearch"
  network_id = hcloud_network.wire_network.id
  count      = 24
}


module "k8s-cluster" {
  source     = "../modules/hetzner-kubernetes"
  network_id = hcloud_network.wire_network.id
  count      = 5
}

Another option would be to let the kubernetes module provide the private network as an output, but this is harder for refactoring and a bit awkward (why would the cassandra module have to depend on the kubernetes module, for example?).

See also the section about Dependency Inversion in the module composition guide.

Contributor Author


The scenario you are describing was exactly @jmatsushita's and my reasoning behind introducing machine_groups. This allows creating not only K8s groups (controlplanes & nodes) but also other groups in addition, by deriving the Ansible groups from whatever items are defined in component_classes. Meaning, an item of component_classes corresponds to an Ansible inventory group. This way, all machines would share the same private network.

Contributor

@arianvp arianvp Feb 22, 2021


Hmm, I really feel like we're trying to reinvent the wheel here a bit too much still. What do we gain from using machine_groups in tfvars over defining a resource (or a module) in terraform code? We're coming up with our own DSL for defining machine groups, which terraform itself already has... Defining how we provision cassandra and elasticsearch groups whilst calling the kubernetes module also sounds somewhat confusing to me. I am still not really fond of this idea.

Contributor


Why I am not fond of it:

The terraform code alone will not describe the topology of a Wire installation anymore. There will just be one node in the graph called "machine" and it's up to the user to define a topology in their tfvars that makes sense.

If we instead have explicit cassandra, kubernetes and elasticsearch hcloud_server resources (or modules if there is a group of resources you want to abstract) then it's clear from the terraform code what a wire deployment topology looks like.

This means we have to explicitly document how people should configure this topology in tfvars instead of having it evident in the code itself.



resource "hcloud_volume" "volumes" {
for_each = {
Contributor

@arianvp arianvp Feb 19, 2021


What do we need statically defined volumes for? If we need storage in kubernetes, we should do that through PersistentVolume abstraction and the csi-driver.

Etcd will require local storage to function properly; so it can't use external volumes either.

Contributor Author


That was a FUTUREWORK I just tried to address.

Contributor


As we're already experiencing disk latency issues on the local disks on hetzner every once in a while (or at least that's the running theory), I'm not sure it's a good idea to run the etcd data directory on a network disk.

Usually, replacing an etcd node with an empty node will just work, as it will backfill from the other server nodes. (When the data is large, it can help to copy over the data from the old node before killing it, and then copy it to the new node; but we can do that manually.)

@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from a43db9c to 48e4884 on February 22, 2021 15:05
@lucendio
Contributor Author

After a verbal discussion, we decided to merge as is and address the following remaining issues in a follow-up PR (#423), because with a slight change of design it will be straightforward to make the kubernetes-hetzner module more robust and specialized.

  • validate the etcd count (or only offer specific choices) according to Raft's fault-tolerance behaviour
  • change interface design according to the suggestion
  • decide whether to keep volume creation in there
  • factor out private network creation


arianvp commented Feb 22, 2021

The CI failure is fixed in #424

* define default machine size within the k8s module
* remove var in env definition that is used nowhere
* affected: SFT, K8s cluster
* removed unnecessary quoting of identifiers/names
This change set introduces the concept of cluster groups, which allows control
plane and nodes to be defined separately. In the future it can be adapted for any
sort of services, where a service might consist of one or more groups. And, it
still allows defining a single-machine k8s installation.

Features:
* allow to define a volume per group
* all k8s machines are now sharing a private network

Fixes:
* move from 'count' to 'for_each' in order to gain more control over which machine
  is created or destroyed
module:
* expose IPs (either nodes public IPs or LB)
* allow LB to be optional

env:
* rename some files to better represent their content
A group as a whole can already be lifecycled; there is no need to explicitly
maintain machine identifiers
* inform Kubespray about the LB IP so that it's being added as SAN to kube-api cert
The LB usually does not send emails, thus using its IP(s) in an SPF record
does not make sense. So, we need to provide the node IP(s), because the SMTP
server (deployed via Helm) will run on one of them.
This change set makes it possible to define a list of IPv4 addresses
specifically for the SPF record. If the list is empty, no SPF record is being
created.

NOTE: it would have been nicer to generate the node IPs list similar to how
it's done in kubernetes.inventory.tf, but Terraform had other plans. At the
moment, I guess it's a bug that Tf won't see the indirect dependency. Otherwise
it would not explain why it works when the logic is moved into the k8s module.
* move Kubespray/Ansible variables, whose value is not determined by Terraform
  configuration, over into individual environment configuration
* at the same time, set LB variables to ensure kubeconfig (admin.conf) has
  LB's external IP configured, while cluster-internal kube-apiserver communication
  may not take a detour through the LB (depends on environment-specific inventory)

The following vars are not being migrated:
* ansible_python_interpreter -> the current kubespray version already requires Ansible
  2.9, which has the more sophisticated way of discovering the interpreter
* helm_enabled -> the Tiller times are over, and if we need to, we deploy charts from
  the local file system
* kubeconfig_localhost -> will be moved to ansible/kubernetes.yaml
* bootstrap_os -> deprecated in Kubespray

See https://github.com/zinfra/cailleach/pull/515/commits/49c9a88ff5bcda5888b955b74aaec5052c5f3a0d
This ensures that the conventions for this item within an ENV_DIR are being
adhered to. (encrypted: kubeconfig, decrypted: kubeconfig.dec)
…nt containing K8s

* moved TF vars in the environment definition into separate files to allow
  easier referencing from docs
@lucendio lucendio force-pushed the tf-env-refactor-k8s-creation-hetzner branch from 5d6c6ae to 9b3fc87 on February 22, 2021 20:10
@lucendio lucendio merged commit ed85895 into develop Feb 23, 2021
@lucendio lucendio deleted the tf-env-refactor-k8s-creation-hetzner branch February 23, 2021 10:42
lucendio added a commit that referenced this pull request Mar 4, 2021
We now use Ansible v2.8 (#404) but also moved the variable out of the generated
inventory (#418). This change re-introduces the setting on a global level. Environments
that rely on an older version of Ansible, as well as those running on older systems,
still have to set it in the respective inventory.