
store ClusterConfig #642

Closed
errordeveloper opened this issue Mar 18, 2019 · 20 comments

Comments

@errordeveloper (Contributor)

With another API version bump (v1alpha5) we should take the opportunity to start storing the cluster config inside the cluster. We will probably need a design proposal for this that outlines the intended uses, limitations, etc.

It's also been suggested that SSM Parameter Store (#583) may be a good option, but we should probably start with a ConfigMap.

Currently we have a few use-cases:

  • eksctl get -o=yaml
  • eksctl apply

We might still need to create something that lets us compile a ClusterConfig from CloudFormation stacks, but caching the ClusterConfig inside the cluster would still be a good idea anyway.

@errordeveloper (Contributor Author) commented Mar 20, 2019

The challenge with this is how to track whether the user has interfered with the cluster config outside of eksctl.
A plausible approach that comes to mind is this:

  • store config used when anything gets created or modified
  • store checksums and details of the stacks at the time when the create/update action was completed
  • prior to any action:
    • validate checksum
    • run CloudFormation drift detector
    • on failure, diff the stored stack template against the actual stack template to help the user understand what the problem is

This way we should be able to avoid having to compile configuration objects from a running cluster, essentially by tracking all changes we make with proof that we can verify when needed.
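
For illustration, here is a rough sketch of the "run CloudFormation drift detector" step using the AWS SDK for Go (v1). The stack name and the polling loop are assumptions made for the sketch, not eksctl's actual code:

```go
package main

import (
	"fmt"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

// detectDrift kicks off CloudFormation drift detection for one stack and
// polls until the run completes, returning the final drift status
// (e.g. "IN_SYNC" or "DRIFTED").
func detectDrift(cfn *cloudformation.CloudFormation, stackName string) (string, error) {
	out, err := cfn.DetectStackDrift(&cloudformation.DetectStackDriftInput{
		StackName: aws.String(stackName),
	})
	if err != nil {
		return "", err
	}
	for {
		status, err := cfn.DescribeStackDriftDetectionStatus(
			&cloudformation.DescribeStackDriftDetectionStatusInput{
				StackDriftDetectionId: out.StackDriftDetectionId,
			})
		if err != nil {
			return "", err
		}
		if aws.StringValue(status.DetectionStatus) != cloudformation.StackDriftDetectionStatusDetectionInProgress {
			return aws.StringValue(status.StackDriftStatus), nil
		}
		time.Sleep(3 * time.Second) // detection runs asynchronously
	}
}

func main() {
	sess := session.Must(session.NewSession())
	status, err := detectDrift(cloudformation.New(sess), "eksctl-my-cluster-cluster") // placeholder stack name
	if err != nil {
		panic(err)
	}
	fmt.Println("stack drift status:", status)
}
```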

@mumoshu (Contributor) commented Mar 20, 2019

Sharing quick thoughts:

  • SSM Parameter Store is known to have a very low API rate limit, so we should probably use DynamoDB or, more simply, S3 (it depends on our requirements; DynamoDB would be better for its powerful querying capability)
  • I'd love it if eksctl-created CloudFormation stacks remained the authoritative source of info for the real clusters. That is, it would make sense to me if eksctl get -o yaml $cluster_name fetched the cfn stack, located the stored config via a stack tag (like eksctl.io/stored-at: s3://mybucket/eksctl/clusters/YOUR_CLUSTER_NAME), and printed the stored config.
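
A sketch of the lookup described above, using the eksctl.io/stored-at tag convention from the comment; the bucket/key parsing and the function name are simplifying assumptions:

```go
package config

import (
	"fmt"
	"io"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
	"github.com/aws/aws-sdk-go/service/s3"
)

// fetchStoredConfig reads the eksctl.io/stored-at tag from a cluster's
// CloudFormation stack and downloads the config it points at from S3.
func fetchStoredConfig(stackName string) ([]byte, error) {
	sess := session.Must(session.NewSession())

	stacks, err := cloudformation.New(sess).DescribeStacks(
		&cloudformation.DescribeStacksInput{StackName: aws.String(stackName)})
	if err != nil {
		return nil, err
	}

	// DescribeStacks errors on unknown names, so Stacks[0] is safe here.
	var storedAt string
	for _, tag := range stacks.Stacks[0].Tags {
		if aws.StringValue(tag.Key) == "eksctl.io/stored-at" {
			storedAt = aws.StringValue(tag.Value)
		}
	}

	// Split "s3://bucket/key" into bucket and key, then fetch the object.
	parts := strings.SplitN(strings.TrimPrefix(storedAt, "s3://"), "/", 2)
	if !strings.HasPrefix(storedAt, "s3://") || len(parts) != 2 {
		return nil, fmt.Errorf("no usable stored-config tag on stack %s", stackName)
	}
	obj, err := s3.New(sess).GetObject(&s3.GetObjectInput{
		Bucket: aws.String(parts[0]),
		Key:    aws.String(parts[1]),
	})
	if err != nil {
		return nil, err
	}
	defer obj.Body.Close()
	return io.ReadAll(obj.Body)
}
```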

@errordeveloper (Contributor Author) commented Mar 20, 2019

@mumoshu thanks for sharing this!

SSM Parameter Store is known to have a very low API rate limit

I didn't know that. Personally, I wasn't sure it would be our best authoritative store, but I think it could be of use as a destination for storing a record of cluster configuration for consumption by other things that integrate with SSM, as it seems like the kind of tool that corporate IT compliance teams may have good uses for.

I'd love it if eksctl-created CloudFormation stacks remained

Yes, I think that should be the case, but let's discuss once we have a more concrete implementation ready.

S3 or DynamoDB

I'd use either of those, but I'm hearing from some users that introducing yet another AWS service could be a challenge.
Actually, I'm thinking we can just use Kubernetes ConfigMaps, because EKS is reliable and we don't have to worry about control plane availability. Eventually we can create a CRD, perhaps when the API goes to beta. That makes the most sense, as we are certainly looking to implement Cluster API in the near future.
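
For context, a minimal sketch of what storing the ClusterConfig as a ConfigMap could look like with client-go; the namespace, ConfigMap name, and data key are illustrative assumptions, not eksctl's actual choices:

```go
package store

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// storeClusterConfig writes the raw ClusterConfig YAML into a ConfigMap in
// the cluster it describes, so tools can later read it back via the API.
func storeClusterConfig(ctx context.Context, client kubernetes.Interface, rawConfig []byte) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "eksctl-cluster-config", // hypothetical name
			Namespace: "kube-system",           // hypothetical namespace
		},
		Data: map[string]string{
			// The ClusterConfig exactly as the user supplied it.
			"cluster-config.yaml": string(rawConfig),
		},
	}
	_, err := client.CoreV1().ConfigMaps(cm.Namespace).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```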

@mumoshu (Contributor) commented Mar 20, 2019

Actually, I'm thinking we can just use Kubernetes ConfigMaps, because EKS is reliable and we don't have to worry about control plane availability

Yep, that makes sense!

From another perspective, I've started seeing various multi-cluster deployment tools, not only federation v2 but also shipper and so on, that rely on a "control-plane Kubernetes cluster".

An eksctl controller deployed onto the control-plane EKS cluster, storing cluster configs as ConfigMaps, seems to fit naturally into the overall picture of an organization that manages multiple eksctl clusters.

I'd suggest DynamoDB or S3 if we REALLY need stores other than ConfigMaps. But it would be best, in my view, if we didn't need to add them for the first implementation.

@errordeveloper (Contributor Author)

eksctl controller deployed onto the control-plane EKS cluster

Yes, so the idea is to finish up #19, then implement #20, and after that we can focus on alignment with Cluster API.

Shipper is an interesting example, but there are others like kubermatic/machine-controller, Gardener and, of course, Giant Swarm.

@errordeveloper (Contributor Author)

This would also potentially allow us to implement monitoring alerts for divergence.

errordeveloper removed this from the v1alpha5 milestone Apr 29, 2019
@errordeveloper (Contributor Author)

I'm removing this from the milestone, as the MVP is not as simple as I hoped. We should also create a design document for this.

@martina-if (Contributor)

related to #2255

@JoshuaFox commented Jun 3, 2020

Apparently the reason that eksctl cannot read the configuration of an EKS cluster is that the cluster config is not built into EKS clusters but rather must be stored separately.

And yet the information does exist -- you can get it with the AWS SDK for EKS, for example with the DescribeNodegroup function.

So... if the SDK can do it, why can't eksctl?
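
For reference, reading nodegroup settings back via the SDK looks roughly like this in Go (aws-sdk-go v1); the cluster and nodegroup names are placeholders:

```go
package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/eks"
)

func main() {
	svc := eks.New(session.Must(session.NewSession()))

	out, err := svc.DescribeNodegroup(&eks.DescribeNodegroupInput{
		ClusterName:   aws.String("my-cluster"), // placeholder
		NodegroupName: aws.String("ng-1"),       // placeholder
	})
	if err != nil {
		panic(err)
	}

	// A few of the fields a tool would need to reconstruct its config.
	ng := out.Nodegroup
	fmt.Println("instance types:", aws.StringValueSlice(ng.InstanceTypes))
	fmt.Println("AMI type:      ", aws.StringValue(ng.AmiType))
	fmt.Println("desired size:  ", aws.Int64Value(ng.ScalingConfig.DesiredSize))
}
```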

@martina-if (Contributor)

@JoshuaFox cross-replying here: as I said in #2255, eksctl can definitely retrieve all or most of the configuration used to create the cluster, but this has not been implemented yet. If there are specific things you are missing in the output of any of the eksctl get subcommands, we would be very happy to review a PR for that.

@Callisto13 (Contributor) commented Jan 20, 2021

I'm removing this from the milestone, as the MVP is not as simple as I hoped. We should also create a design document for this.

Hey @errordeveloper, we are looking at perhaps bringing this back into scope; can you tell us anything about the complications/complexity you found when you looked into it? Hopefully you can save us covering the same ground 😄

@michaelbeaumont (Contributor)

I think we should only do this if we can't recreate the config from the existing resources; that's the reason terraform introduced state, but it's not clear we have the same restriction with EKS.

@errordeveloper (Contributor Author)

kops has the ability to discover resources that belong to it, but it's not trivial and is version-dependent.

For deletion purposes CloudFormation is actually very handy, as you only need to delete a stack and don't need to know which resources a certain version of eksctl created and how they were labelled, etc. (At least that's the theory.)

The reason for storing the config was that it would make it possible to re-generate the CloudFormation template for updates, and any drift could be detected as well.

I did share some more extensive notes on this before... I'll ping you on slack.

@Callisto13 (Contributor)

Pasting in highlights from the above-mentioned slack conversation for context:

so my original thinking was that not storing the config in the cluster is a shortcoming
one ought to be able to look at a cluster and tell how it is configured
getting that info from AWS APIs is technically possible, but quite tedious
and actually different versions of eksctl may produce different CloudFormation templates for the same config, e.g. due to a bug/workaround
and config is certainly meant to be the source of truth, right? 😉
the idea was basically to store it as a cache, and build validation mechanisms around that
the reason I gave up on the MVP originally was probably because I've not had the time for it at all
ah, I recall now what the blocker may have been
there was no way to actually render ClusterConfig to CloudFormation templates without making additional calls
and at the time we kept adding out-of-band calls to AWS for stuff that CloudFormation didn't support, and those were some of the important newly added features at the time
…so in an ideal world one should be able to go from config to CloudFormation templates and back
but going from CloudFormation to config is hard, as CloudFormation has many fields that eksctl is not aware of, and their values may have dramatic effects
that's basically how I arrived at the idea of caching the config and building cache validation by means of rendering the templates and checking hashes of the results
and if the hashes don't match, I thought we could do a diff and show it to the user
with the drift detector being another thing to consider, as it may indicate which resources the user messed with
so that's what that issue was all about…
because a full update to all stacks was not possible without having the correct config
I am quite convinced that terraform wouldn't make life easier
mostly just because terraform is designed with the assumption that users interact with it directly
I don't see a way of wrapping terraform to achieve what eksctl has to achieve
openshift-install actually wraps terraform and hides it from the user completely, but they use it in create-only mode
they don't even use it to delete resources
I've looked a few times and haven't seen any other attempts at creating something more high-level on top of terraform
a terraform user is really expected to be able to observe state changes
you may be able to hide that, but you may end up building an AI of some sort 😉
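
A minimal sketch of the render-and-hash validation idea from these notes; renderTemplate is a hypothetical stand-in for eksctl's real template rendering, and the diff is deliberately naive:

```go
package validate

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// renderTemplate is a hypothetical stand-in for whatever turns a cached
// ClusterConfig into a CloudFormation template body.
func renderTemplate(config []byte) (string, error) {
	return string(config), nil // real rendering would happen here
}

// validateCache re-renders the template from the cached config, compares its
// hash against the hash stored at create/update time, and on mismatch
// returns a naive line-by-line diff to show the user.
func validateCache(config []byte, storedHash, storedBody string) (bool, string, error) {
	body, err := renderTemplate(config)
	if err != nil {
		return false, "", err
	}
	sum := sha256.Sum256([]byte(body))
	if hex.EncodeToString(sum[:]) == storedHash {
		return true, "", nil // cache is still valid
	}

	// Hashes differ: build a crude diff of stored vs freshly rendered lines.
	var diff strings.Builder
	oldLines := strings.Split(storedBody, "\n")
	newLines := strings.Split(body, "\n")
	for i := 0; i < len(oldLines) || i < len(newLines); i++ {
		var o, n string
		if i < len(oldLines) {
			o = oldLines[i]
		}
		if i < len(newLines) {
			n = newLines[i]
		}
		if o != n {
			fmt.Fprintf(&diff, "- %s\n+ %s\n", o, n)
		}
	}
	return false, diff.String(), nil
}
```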

@github-actions (bot)

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.


github-actions bot added the stale label Mar 25, 2021
Callisto13 removed the stale label Mar 25, 2021
@github-actions (bot)

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.


github-actions bot added the stale label May 27, 2021
@github-actions (bot) commented Jun 1, 2021

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as completed Jun 1, 2021
torredil pushed a commit to torredil/eksctl that referenced this issue May 20, 2022