Firewall rules for Autopilot clusters are ineffective #1230
Comments
This might indeed need further configuration. Most likely we are not setting the network tags appropriately for the node pools that Autopilot auto-provisions.
Maybe. The firewall rules that are automatically created by GKE do include a tag, but it's not just the name of the cluster; there's a machine-generated suffix as well. I could not figure out how to generate that tag in my own config.
Yeah, we currently rely on the network tag matching the cluster name. For Autopilot, that association might not hold true. Unfortunately it looks like we're blocked on adding support to the provider: hashicorp/terraform-provider-google#11051
This is what I did to create a rule to target nodes in autopilot cluster after the node pool is created.
It's neither elegant nor robust. But afaik there's no way to interrogate GKE to get a node's network tags.
I think the issue might be about how the Autopilot modules set network tags for firewall rules:
target_tags = [local.cluster_network_tag]
The cluster_network_tag = "gke-${var.name}"
Other (GKE-managed) firewall rules have a different tag:
Yes, as @ferrarimarco highlights, the target_tags for the FW rules use a tag that differs from the GKE-managed tag that ends up on the nodes. We can't access that GKE-managed tag directly; we only see it turn up in the FW rules for intra-cluster comms or LBs.
The problem is this: the shadow firewall rules, webhook firewall rules, etc. all have target_tags set to a tag that is never attached to the cluster. I have a PR open to attach it. The edit is to add:
This has the highest impact for the intra-egress rule, which is required if the VPC has a deny 0.0.0.0/0 rule. Without this rule the control plane can't talk to the respective other CPs and the cluster fails to even come up.
PR, if people have thoughts: #1655
Is the Autopilot module setting
The beta-autopilot-private-cluster and beta-autopilot-public-cluster modules set
I don't know the how or what of the tag we see in the GKE auto-created FW rules. Presumably it is on the nodes, though, because the GKE-provisioned firewall rules do work. I also note that for a Service of type LoadBalancer, the firewall rules provisioned for health checks have this same tag as a target_tag, and they work. But this tag is inaccessible, of course, other than the "reflections" of it we see in the FW rules etc., because we can't get our hands on the nodes with Autopilot clusters.
But we can add auto-provisioning network tags, so we can add our "cluster_network_tag" there to make sure additional FW rules are effective at least.
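The idea in the comment above, attaching the module's own tag to the nodes Autopilot auto-provisions so that firewall rules targeting that tag take effect, might look roughly like this in Terraform. This is a minimal sketch, not the module's code and not the change in #1655. The variable names, resource names, protocol, and priority are assumptions for illustration, and it assumes a google provider version that supports the node_pool_auto_config network_tags block on google_container_cluster.

```hcl
# Hypothetical inputs; these names are illustrative, not the module's interface.
variable "name" {
  type        = string
  description = "Cluster name"
}

variable "region" {
  type = string
}

variable "network" {
  type        = string
  description = "Name or self link of the VPC network"
}

variable "egress_destination_ranges" {
  type        = list(string)
  description = "Ranges the nodes must reach (assumed: node, pod, and control plane CIDRs)"
}

locals {
  # Same naming convention the thread quotes from the module.
  cluster_network_tag = "gke-${var.name}"
}

resource "google_container_cluster" "autopilot" {
  name             = var.name
  location         = var.region
  network          = var.network
  enable_autopilot = true

  # Tags listed here are applied to the nodes that Autopilot auto-provisions,
  # so a firewall rule whose target_tags includes the same tag can match them.
  node_pool_auto_config {
    network_tags {
      tags = [local.cluster_network_tag]
    }
  }
}

resource "google_compute_firewall" "intra_cluster_egress" {
  name      = "gke-${var.name}-intra-cluster-egress"
  network   = var.network
  direction = "EGRESS"

  # Must be numerically lower than any deny-all egress rule to win over it.
  priority = 1000

  # Targets the tag attached above instead of a tag that never reaches the nodes.
  target_tags        = [local.cluster_network_tag]
  destination_ranges = var.egress_destination_ranges

  allow {
    protocol = "tcp"
  }
}
```

Subnetwork, secondary ranges, and private-cluster settings are left out to keep the sketch short; a shared VPC setup would need them.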
I believe the issue here is that the module creates its own local:
terraform-google-kubernetes-engine/modules/beta-autopilot-private-cluster/main.tf, line 111 in 493149d
terraform-google-kubernetes-engine/modules/beta-autopilot-private-cluster/firewall.tf, line 36 in 29d9259
However, when creating the cluster, the module takes the user-provided variable:
terraform-google-kubernetes-engine/modules/beta-autopilot-private-cluster/cluster.tf, lines 100 to 107 in 493149d
I believe the firewall rules should just use
@gustaff-weldon Hey, thanks for commenting. I haven't really got any eyes on this, so I appreciate the engagement!
Hmm, I don't think we can use
And the second thing, I think, is that it would be a bit obscuring. This var is understood as the list of network tags to attach to the nodes, which people may have several of, e.g. to allow reaching internal services or whatever. We don't want to create new FW rules targeting those tags; they presumably already have a purpose. For instance, in terraform-example-foundation the allow-google-apis tag is an example of a tag that would be on a bunch of VMs that are fairly "locked down", and you may chuck it on your cluster too, but we wouldn't want all the instances with that tag to get a bunch of other FW rules targeting them as well. The var is an option for users to make sure any other required FW rules or tag values they already know they'll need get on the nodes.
We already have the local
IDK if you've taken a look, but lmk what you think: https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/pull/1655/files#diff-a3a07be3819af6f1a65ed243a613e69b544b7274d248a4b4d7d4d127aec40aee
@GorginZ thanks for getting back. What I suggested is only one of the possible solutions. I was thinking that you might raise empty network tags as a concern :) And I understand that you do not want to create firewall rules targeting
Your solution would work. I would also consider keeping the conditions at the locals level, e.g.:
Firewall rules would target
Thanks @gustaff-weldon. Conditions in locals is probably nicer.
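For readers following along, a locals-level condition of the kind discussed here could be sketched as below. This is a hypothetical illustration, not the snippet from the comment above and not the code in #1655. The variable name network_tags and the local name node_network_tags are assumptions; add_cluster_firewall_rules is the option named in the issue description.

```hcl
# Hypothetical inputs; names other than add_cluster_firewall_rules are assumed.
variable "name" {
  type = string
}

variable "network_tags" {
  type        = list(string)
  default     = []
  description = "User-supplied tags to attach to the nodes (may be empty)"
}

variable "add_cluster_firewall_rules" {
  type    = bool
  default = false
}

locals {
  cluster_network_tag = "gke-${var.name}"

  # Tags to attach to the auto-provisioned nodes: the user's own tags, plus the
  # module's tag only when the module is also creating firewall rules for it.
  # distinct() guards against the user having already listed the same tag.
  node_network_tags = distinct(concat(
    var.network_tags,
    var.add_cluster_firewall_rules ? [local.cluster_network_tag] : [],
  ))
}
```

With this shape the firewall rules keep targeting only local.cluster_network_tag, so user-supplied tags such as allow-google-apis never become firewall targets, and an empty network_tags list is not a problem.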
Hey guys, I don't know if this will be helpful for you or not, but I see it's been giving you a hassle for a while; such an oversight by Google. So I automated the process and shared the script. It is a Docker container that fetches the current list of IPs on your cluster and then updates your firewall rule, removing any IPs not in the cluster and making sure the IPs of the current cluster nodes are present. It does not update the firewall rule unless there is a discrepancy. It could probably be improved; I just needed something that worked so that our production cluster was not impacted every time Autopilot updated our nodes.
TL;DR
The add_cluster_firewall_rules and related firewall options do not appear to have any effect for Autopilot clusters. I've configured all of them, with higher priority than my default firewall rules in a shared VPC, but the hit counters show 0 hits. Meanwhile, the lower priority generic rules in my firewall are showing hits and are permitting the cluster to pass traffic.
Expected behavior
The higher priority firewall rules added by the module should be in effect, passing traffic, and showing hits.
Observed behavior
The rules are ignored.
Terraform Configuration
Terraform Version
Additional information
I'm not yet very experienced with GKE and firewall rules, but I have not yet figured out how to make firewall tags work with GKE Autopilot clusters in my own configs. It was only after I removed them that my firewall rules became effective for GKE Autopilot clusters. I noticed that this module is applying tags to the generated firewall rules (where the tag is equal to the GKE Autopilot cluster's name), so perhaps that's what's going on?