[tf:env+module] Refactor k8s module and introduce LB #418
Conversation
Force-pushed from f1c5cd5 to 2c1e37a
Having to manually specify a list of machine ids feels like a smell to me. Why can't we use … in the way we're currently using …?
Second remark: I don't know if I would put this much freedom at the vars level. I'd push some of these concerns down to the Terraform code. We're now at a point where we're sort of duplicating a subset of the Terraform DSL in vars. We're giving a bit too much freedom, in that it allows for configurations that might not work in practice. E.g. etcd only works in uneven configurations. I would just hardcode at the Terraform level that we have:

resource "hcloud_server" "controlplane" {
  count  = 3
  name   = "controlplane${count.index}"
  labels = {
    "component-classes.wire.com/control-plane" : true,
    "component-classes.wire.com/node" : var.controlplane_is_worker
  }
}

I can get behind being able to define multiple node pools in vars though. That sounds useful (due to the lack of autoscaling groups in Hetzner).
No good reason. You are right here. Especially since …

A note on exposing this kind of level in the environment configuration: in a follow-up verbal discussion, we decided to go for this for now, migrate the different configurations across all environments to the current schema, and then reiterate on the overall approach by looking into the idea of having some kind of thin TF layer in the form of opinionated 'template modules', which could then leverage the modules doing the heavy lifting and in turn would look very similar (or maybe even simpler) compared to the …
Force-pushed from 9a27dbc to 186793a
- file:
    path: "{{ artifacts_dir }}/admin.conf"
    state: absent
@arianvp this probably breaks an assumption for the offline package, doesn't it?
Note, this is the first step (out of two) to automate sopsing kubeconfig in an $ENV_DIR when bootstrapping a K8s cluster.
akshaymankar
left a comment
I dropped some questions inline.
Apart from that I have one question: what is the migration path for the old environments? Do we recreate them, or is there some Terraform state migration that we have to do?
# group_name        = string
# machine_id        = string
# machine_type      = string
# component_classes = list(string)
As I understand, there are 2 valid component classes, "controlplane" and "node", and Terraform provides no way of doing enums. So, can we do the next best thing, which would be to document and validate this?
I have a feeling that we are over-generalizing here. Nobody wants a "machine" in their kubernetes cluster. Maybe it is better to accept "pools" for nodes and controlplane separately. This will also get rid of the enum problem.
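Until the schema moves to explicit pools, documenting and validating the classes could look roughly like the following sketch (hedged: the variable shape is taken from the commented schema above, and custom `validation` blocks require Terraform 0.13+):

```hcl
variable "machine_groups" {
  type = list(object({
    group_name        = string
    machine_id        = string
    machine_type      = string
    component_classes = list(string)
  }))
  default = []

  # Terraform has no enum type, so emulate one: collect every class that is
  # not one of the two documented values and require that list to be empty.
  validation {
    condition = length(flatten([
      for g in var.machine_groups : [
        for c in g.component_classes : c if !contains(["controlplane", "node"], c)
      ]
    ])) == 0
    error_message = "Entries in component_classes must be \"controlplane\" or \"node\"."
  }
}
```

This rejects unknown classes at `terraform plan` time instead of producing machines that no Ansible group picks up.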
This seems to be the same concern as I expressed here #418 (comment)
My biggest concern is that we are giving too much freedom. E.g. nothing is stopping you from accidentally assigning 2 or 4 control-plane nodes, whilst those setups do not work with etcd. You should only have 1 or 3 control-plane nodes, and in very rare cases where you have many regions or zones, an etcd cluster of 5 is possible (but has bad performance, as every write needs consensus of 3 nodes instead of 2).
I would still suggest us having two sections:
control_plane = {
  count        = 3      # can be 3 or 1, but no other value
  is_worker    = true   # whether the control-plane group should also be treated as a node pool
  machine_type = "cx31"
}
node_pools = [
  { count = 200, name = "basic",     machine_type = "cx41" },
  { count = 50,  name = "highpower", machine_type = "cx61" }
]
variable "k8s_cluster" {
  description = "represents Kubernetes cluster"
  # type = object({
  #   cloud               = string
  #   load_balancer       = optional(bool)
  #   load_balancer_ports = optional(list(
  #     object({
  #       name        = string
  #       protocol    = string
  #       listen      = number
  #       destination = number
  #     })
  #   ))
  #   machine_groups = list(object({
  #     group_name        = string
  #     machine_count     = optional(number)
  #     machine_type      = string
  #     component_classes = list(string)
  #     volume = optional(object({
  #       size   = number
  #       format = optional(string)
  #     }))
  #   }))
  # })
  type    = any
  default = {}
}
Why do we want to encode all kinds of k8s_clusters in one type of specification? I have a feeling that this will bite us in the future, as we might get into specifying AWS-specific things somehow. Maybe it is better to just call this variable "hetzner_k8s_cluster". When we start providing an option to support AWS, we can just create another variable called "aws_k8s_cluster"?
We should also ensure that we don't create an even number of etcds, as that provides the same fault tolerance as creating one etcd fewer.
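This could also be enforced at plan time. A hedged sketch (the variable name is illustrative, not from the PR; requires Terraform 0.13+ for custom validation):

```hcl
variable "control_plane_count" {
  type    = number
  default = 3

  # Even-sized etcd clusters tolerate no more failures than the next smaller
  # odd size, so only allow the sensible odd values.
  validation {
    condition     = contains([1, 3, 5], var.control_plane_count)
    error_message = "Value of control_plane_count must be 1, 3, or 5."
  }
}
```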
}
vars = merge(
  {
    # NOTE: please consider setting variables in the inventory of the respective environment instead
I may not have enough context, but I cannot figure out what this comment means.
Also, why is merging with an empty map required?
load_balancer_type = (
  length(var.machines) <= 5 ? "lb11" : (
    length(var.machines) <= 75 ? "lb21" : (
      length(var.machines) <= 150 ? "lb31" : null))
)
I see that this is a required thing which could potentially be null, does terraform plan fail in this case? What error would the user get?
It fails with:

Error: Required attribute is not set

  on ../modules/hetzner-kubernetes/load-balancer.resources.tf line 7, in resource "hcloud_load_balancer" "lb":
   7: load_balancer_type = null
This is IMHO sufficiently self-explanatory after looking at the corresponding LOCs. Please yell if you disagree!
Hmm, it is a bit scary though. Say I bump my cluster from 5 to 6 nodes; that will get the load balancer destroyed and recreated, and cause a period of downtime. This sounds less than ideal. Wouldn't this better be something we should bubble up and let the administrator make a conscious decision about?
You absolutely have a point here, but before drawing any conclusion, I think we should test whether it actually causes destruction and downtime. Additionally, that the LB might get destroyed should be completely transparent in the TF plan. Wouldn't this suffice as a heads-up and pull back if necessary?
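If testing shows the LB does get replaced, one option for forcing that conscious decision is Terraform's `prevent_destroy` lifecycle flag. A hedged sketch (resource attributes are illustrative, not from the PR):

```hcl
resource "hcloud_load_balancer" "lb" {
  name               = "k8s-lb"
  load_balancer_type = local.lb_type   # e.g. the size computed from the node count
  location           = "nbg1"

  lifecycle {
    # Any plan that would destroy/recreate the LB now fails with an error,
    # so an operator has to opt in explicitly instead of getting silent downtime.
    prevent_destroy = true
  }
}
```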
I'm convinced :) Consider this resolved
Force-pushed from 186793a to a43db9c
resource "hcloud_server_network" "snw" {
I think this should preferably not live in the kubernetes module, but should be an input to the kubernetes module.
variable "network_id" {
  description = "The network in which to deploy this kubernetes cluster"
}

Say I have another module defining a cassandra cluster or an elasticsearch cluster; I want these to be in the same private network as the kubernetes cluster (because I want kubernetes nodes to be able to talk to these services).
This can then be composed in the environment folder :
resource "hcloud_network" "wire_network" {
  name     = "wire"          # illustrative values
  ip_range = "10.0.0.0/16"
}

module "cassandra-cluster" {
  source     = "../modules/hetzner-cassandra"
  network_id = hcloud_network.wire_network.id
  count      = 24
}

module "elasticsearch-cluster" {
  source     = "../modules/hetzner-elasticsearch"
  network_id = hcloud_network.wire_network.id
  count      = 24
}

module "k8s-cluster" {
  source     = "../modules/hetzner-kubernetes"
  network_id = hcloud_network.wire_network.id
  count      = 5
}
Another option would be to let the kubernetes module provide the private network as an output, but this is harder for refactoring and a bit awkward (why would the cassandra module have to depend on the kubernetes module, for example).
See also the section about Dependency Inversion in the module composition guide.
The scenario you are describing was exactly @jmatsushita's and my reasoning behind introducing machine_groups. This allows creating not only K8s groups (control planes & nodes) but also other groups in addition, by deriving the Ansible groups from whatever items are defined in component_classes. Meaning, an item of component_classes corresponds to an Ansible inventory group. This way, all machines would share the same private network.
Hmm I really feel like we're trying to re-invent the wheel here a bit too much still. What do we gain from using machine_groups in tfvars over defining a resource (or a module) in terraform code? We're coming up with our own DSL for defining machine groups; which terraform itself already has... Defining how we provision cassandra and elasticsearch groups whilst calling the kubernetes module also sounds somewhat confusing to me. I am still not really fond of this idea.
Why I am not fond of it:
The terraform code alone will not describe the topology of a Wire installation anymore. There will just be one node in the graph called "machine" and it's up to the user to define a topology in their tfvars that makes sense.
If we instead have explicit cassandra, kubernetes and elasticsearch hcloud_server resources (or modules if there is a group of resources you want to abstract) then it's clear from the terraform code what a wire deployment topology looks like.
This means we have to explicitly document how people should configure this topology in tfvars instead of having it evident in the code itself.
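To illustrate the alternative being argued for (a sketch only; names and sizes are made up, not taken from the PR): one explicit resource per role, so the topology is readable from the Terraform code rather than from tfvars:

```hcl
# The Terraform graph itself now documents what a Wire deployment looks like:
# a cassandra group and a k8s node group, each with a fixed, visible shape.
resource "hcloud_server" "cassandra" {
  count       = 3
  name        = "cassandra${count.index}"
  server_type = "cx41"
  image       = "ubuntu-20.04"
}

resource "hcloud_server" "k8s_node" {
  count       = 5
  name        = "node${count.index}"
  server_type = "cx41"
  image       = "ubuntu-20.04"
}
```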
resource "hcloud_volume" "volumes" {
  for_each = {
What do we need statically defined volumes for? If we need storage in kubernetes, we should do that through the PersistentVolume abstraction and the csi-driver.
Etcd will require local storage to function properly; so it can't use external volumes either.
That was a FUTUREWORK I just tried to address.
As we're already experiencing disk latency issues on the local disks on Hetzner every once in a while (or at least that's the running theory), I'm not sure it's a good idea to run the etcd data directory on a network disk.
Usually replacing an etcd node with an empty node will just work; as it will backfill from the other server nodes. (When the data is large; it can help copying over data from the old node first before killing it; and then copying it to the new node; but we can do that manually)
Force-pushed from a43db9c to 48e4884
After a verbal discussion, we decided to merge as is and address the remaining issues in a follow-up PR (#423), because with a slight change of design it will be straightforward to make the kubernetes-hetzner module more robust and specialized.
Force-pushed from 48e4884 to 5d6c6ae
The CI failure is fixed in #424
* define default machine size within the k8s module
* remove var in env definition that is used nowhere
* affected: SFT, K8s cluster
* removed unnecessary quoting of identifiers/names
This change set introduces the concept of cluster groups, which allows defining control plane and nodes separately. In the future it can be adapted for any sort of service, where a service might consist of one or more groups. And it still allows defining a single-machine k8s installation.

Features:
* allow defining a volume per group
* all k8s machines now share a private network

Fixes:
* move from 'count' to 'for_each' in order to gain more control over which machine is created or destroyed
module:
* expose IPs (either nodes' public IPs or LB)
* allow LB to be optional

env:
* rename some files to better represent their content
A group as a whole can already be lifecycled; no need to explicitly maintain machine identifiers.
* inform Kubespray about the LB IP so that it's being added as SAN to kube-api cert
The LB usually does not send emails, thus using its IP(s) in an SPF record does not make sense. So we need to provide the node IP(s), because the SMTP server (deployed via Helm) will run on one of them. This change set makes it possible to define a list of IPv4 addresses specifically for the SPF record. If the list is empty, no SPF record is being created. NOTE: it would have been nicer to generate the node IPs list similar to how it's done in kubernetes.inventory.tf, but Terraform had other plans. At the moment, I guess it's a bug that TF won't see the indirect dependency. Otherwise it would not explain why it works when the logic is moved into the k8s module.
* move Kubespray/Ansible variables, whose value is not determined by Terraform configuration, over into the individual environment configuration
* at the same time, set LB variables to ensure kubeconfig (admin.conf) has the LB's external IP configured, while cluster-internal kube-apiserver communication may not take a detour through the LB (depends on environment-specific inventory)

The following vars are not being migrated:
* ansible_python_interpreter -> the current Kubespray version already requires Ansible 2.9, which has a more sophisticated way of discovering the interpreter
* helm_enabled -> the Tiller times are over, and if we need to, we deploy charts from the local file system
* kubeconfig_localhost -> will be moved to ansible/kubernetes.yaml
* bootstrap_os -> deprecated in Kubespray

See https://github.com/zinfra/cailleach/pull/515/commits/49c9a88ff5bcda5888b955b74aaec5052c5f3a0d
This ensures that the conventions for this item within an ENV_DIR are being adhered to (encrypted: kubeconfig, decrypted: kubeconfig.dec).
…nt containing K8s * moved TF vars in the environment definition into separate files to allow easier referencing from docs
Force-pushed from 5d6c6ae to 9b3fc87
Please note that this PR is best being reviewed by following the commit history
Step 1:
Step 2:
Step 3:
k8s_cluster and machine groups

Examples
Default, in case no k8s cluster should be defined. (Not nice, but allowing it to not exist might complicate code)
A single-node instance (e.g. Wire Demo)
Separate control plane and node groups with volumes (e.g. for etcd)
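Going by the commented schema of the k8s_cluster variable, the last example could look roughly like this (all values are illustrative, not taken from an actual environment):

```hcl
k8s_cluster = {
  cloud         = "hetzner"
  load_balancer = true
  machine_groups = [
    {
      group_name        = "controlplane"
      machine_count     = 3
      machine_type      = "cx31"
      component_classes = ["controlplane"]
      # dedicated volume, e.g. for the etcd data directory
      volume = { size = 50, format = "ext4" }
    },
    {
      group_name        = "node"
      machine_count     = 2
      machine_type      = "cx41"
      component_classes = ["node"]
    }
  ]
}
```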