
Introduce 'crd' datasource to exploit custom k8s resources #111

Merged: 9 commits, May 18, 2020

Conversation

tommasopozzetti
Contributor

This PR is the logical continuation of the work started in #75 and #89.

The Kubernetes world is evolving around the concept of operators and custom resources to manage the various aspects and behaviors of components.

This PR introduces a CRD for representing fluentd configurations, moving away from the generic ConfigMap. This approach has two benefits. On one hand, it provides a dedicated resource for fluentd configurations, which makes it possible to manage them more consistently. It is possible, for example, to obtain fluentd configs simply by querying the cluster for that resource:

kubectl get fluentdconfig

It is also possible to create configs for a new application simply by attaching a FluentdConfig resource to the application manifests, rather than using a more generic ConfigMap with specific names and/or labels:

apiVersion: logging.csp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fd-config
spec:
  fluentconf: |
    <match kube.ns.**>
      @type relabel
      @label @NOTIFICATIONS
    </match>

    <label @NOTIFICATIONS>
      <match **>
        @type null
      </match>
    </label>

On the other hand, using CRDs should improve the efficiency of the config-reloader. Especially in clusters with many applications that use ConfigMaps dynamically (for example, for leader election), ConfigMaps are updated frequently, and every update is delivered to the config-reloader, which must do some work on each one just to realize it is not meant for it and discard it. With CRDs, the config-reloader can register with the API Server to watch only FluentdConfig resources, and is notified only when really necessary.
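
As a concrete illustration of this watch pattern, here is a minimal sketch using the generated clientset and informer packages that appear later in this PR's diff; the group accessor names below follow standard Kubernetes code-generator conventions and may not match the actual code exactly:

package main

import (
	"time"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"

	kfoClient "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/clientset/versioned"
	kfoInformers "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/informers/externalversions"
)

func watchFluentdConfigs(cfg *rest.Config, onChange func()) error {
	client, err := kfoClient.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// The factory subscribes only to FluentdConfig events, so the
	// config-reloader is never woken up by unrelated ConfigMap churn.
	factory := kfoInformers.NewSharedInformerFactory(client, 30*time.Second)
	informer := factory.Logging().V1beta1().FluentdConfigs().Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { onChange() },
		UpdateFunc: func(oldObj, newObj interface{}) { onChange() },
		DeleteFunc: func(obj interface{}) { onChange() },
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, informer.HasSynced)
	return nil
}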

The "CRD" has been introduced as a new datasource, configurable through the helm chart values, to allow users that are currently set up with ConfigMaps and do not want to perform the switchover to FluentdConfigs, to be able to keep on using them.

The config-reloader has also been given the capability of installing the CRD at startup if requested, so no manual action is needed to enable it on the cluster.
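
A hedged sketch of that startup auto-install, assuming the standard apiextensions clientset; the actual logic in this PR may differ (and, as discussed below, it also has to handle the older v1beta1 CRD API):

package main

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// ensureCRD creates the given CRD if it does not already exist, so the
// operator needs no manual installation step on the cluster.
func ensureCRD(ctx context.Context, cfg *rest.Config, crd *apiextensionsv1.CustomResourceDefinition) error {
	client, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	_, err = client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // already installed, e.g. manually or via the Helm chart
	}
	return err
}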

@carpenterm
Contributor

Hi @tommasopozzetti, thank you for this contribution, it looks great! It may take a little time for us to review and test this out, but it's definitely something we are keen to merge.

"github.com/vmware/kube-fluentd-operator/config-reloader/config"
kfoClient "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/clientset/versioned"
kfoInformers "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/informers/externalversions"
kfoListersV1beta1 "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/listers/logging.csp.vmware.com/v1beta1"
Contributor

Could we please change logging.csp.vmware.com to logs.vdp.vmware.com? I appreciate it's a rather pedantic change, but it aligns more appropriately with our internal group names.

Contributor Author

Thanks for the feedback. I will try to get the change done either today or early next week.
I went with logging.csp.vmware.com as it is the prefix used by kube-fluentd-operator in the namespace annotations. Does it make sense to change those too to keep it consistent?

Contributor

It would make sense to change them and keep them consistent, but we'd just need to highlight it in the release notes of the next release to make sure people weren't relying on it to point to a different configmap name.

@vmwclabot
Member

@tommasopozzetti, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding Signed-off-by: John Doe <john.doe@email.org> to the last line of each Git commit message. The e-mail address used to sign must match the e-mail address of the Git author. Click here to view the Developer Certificate of Origin agreement.

@tommasopozzetti
Contributor Author

Hi @carpenterm,
I pushed the change from logging.csp.vmware.com to logs.vdp.vmware.com.
An example FluentdConfig resource now looks like this:

apiVersion: logs.vdp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fd-config
spec:
  fluentconf: |
    <match kube.ns.**>
      @type relabel
      @label @NOTIFICATIONS
    </match>

    <label @NOTIFICATIONS>
      <match **>
        @type null
      </match>
    </label>

// CRD Management for autoinstall
///////////////////////////////////////////

var fluentdConfigCRD = v1.CustomResourceDefinition{
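
The excerpt above cuts off at the opening brace. For orientation, here is a hedged reconstruction of how such an in-code definition could look, with field values inferred from the examples in this PR rather than copied from the diff (the sketch uses the renamed group agreed on above):

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func boolPtr(b bool) *bool { return &b }

// Illustrative only; the real definition lives in the diff above.
var fluentdConfigCRD = apiextensionsv1.CustomResourceDefinition{
	ObjectMeta: metav1.ObjectMeta{Name: "fluentdconfigs.logs.vdp.vmware.com"},
	Spec: apiextensionsv1.CustomResourceDefinitionSpec{
		Group: "logs.vdp.vmware.com",
		Scope: apiextensionsv1.NamespaceScoped,
		Names: apiextensionsv1.CustomResourceDefinitionNames{
			Plural:   "fluentdconfigs",
			Singular: "fluentdconfig",
			Kind:     "FluentdConfig",
		},
		Versions: []apiextensionsv1.CustomResourceDefinitionVersion{{
			Name:    "v1beta1",
			Served:  true,
			Storage: true,
			// apiextensions/v1 requires a structural schema; accept any
			// object, since spec.fluentconf holds free-form fluentd text.
			Schema: &apiextensionsv1.CustomResourceValidation{
				OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{
					Type:                   "object",
					XPreserveUnknownFields: boolPtr(true),
				},
			},
		}},
	},
}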
Contributor

@tommasopozzetti This is a great PR. I think the v1 CRD API is only supported from Kubernetes 1.16 onwards. Can we handle this with a feature flag, or do you see a better approach to choosing between v1 and v1beta1?
I observed this when I pulled this change and tested it locally. In the worst case, if those CRDs are created outside the code, can that be offered as a workaround?

Since this is a major feature, can you also consider a doc update?

Contributor

Agreed, we need to ensure it doesn't fail on Kubernetes versions prior to 1.16.

The config-reloader has been equipped with the capability of installing the CRD at startup if requested, so no manual actions to enable it on the cluster are needed.

We should definitely have the ability to turn this off and do it manually. It's an interesting approach to install the CRDs from the code, but we need to decide if this is something we really want to do longer term - if we install from the code we also need to maintain and upgrade the definition in code rather than in a YAML file. This has the advantage of fewer loosely coupled dependencies; I guess it depends on how frequently we see that CRD changing.

Contributor Author

Thank you for the feedback.
This approach to CRDs is similar to the one used by coreos/dex. I like it because it simplifies CRD management for the user, it verifies CRD availability at each restart of the operator, and it couples the CRD definition to the code, allowing them to evolve together easily.
If the CRD evolves, the code can evolve to apply any required updates automatically, again taking a possible source of errors out of users' hands.
For the problem of Kubernetes versions preceding 1.16, I think we can also add support for v1beta1 and automatically choose the correct version based on the environment.
We could also make this auto-management of CRDs a controllable feature activated through the chart. However, this would require maintaining the CRD YAML file as well, so that users can apply it manually (or through the Helm chart) when the auto-management feature is not active.
Let me know your thoughts and I'll be happy to add the needed modifications to this PR.

Contributor

I am fine with the approach of handling both versions in the code. Since it performs an existence check, this won't stop users from maintaining it outside. I believe your other commit #110 will clear the old configs if we toggle the behaviour.

@carpenterm Any thoughts?

Contributor Author

@dimalbaby @carpenterm I pushed the commit adding the capability of automatically choosing the right version of the CRD API based on the environment.
Unfortunately I don't have a Kubernetes cluster with version < 1.16 handy at the moment, so could you help with testing it in that environment?
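
For reference, a sketch of one simple way to make that choice, assuming the decision is based on whether the API server serves the v1 CRD group version; the commit's actual detection logic may differ:

package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// useV1CRDAPI reports whether the API server serves apiextensions.k8s.io/v1,
// which Kubernetes does from 1.16 onward; older clusters only serve v1beta1.
func useV1CRDAPI(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	// Any error here is treated as "not served"; a stricter implementation
	// would distinguish transient failures from a missing group version.
	if _, err := dc.ServerResourcesForGroupVersion("apiextensions.k8s.io/v1"); err != nil {
		return false, nil
	}
	return true, nil
}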

Contributor

I tend to agree. We run clusters with lots of different service teams working on different products sharing the same kube-fluentd-operator instance, and I'm assuming others are in that situation too. Migrating all of the teams to CRDs at the same time we switch the server to CRDs would be a bit of a headache; it would be better if we could have both CRDs and ConfigMaps enabled for a while so teams can migrate gradually. It would be great to dynamically figure out which one to use based on whether a ConfigMap or CRD exists, or, if that's too much work, perhaps teams could control it themselves via an annotation on the namespace, with the cluster administrator setting a default?

Contributor Author

Thank you for testing and for fixing the bug @dimalbaby

We could look at creating an extra "virtual" datasource that internally manages an instance of the configmap datasource and an instance of the crd datasource and concatenates the fluentd configs from both. This would allow running both datasources side by side. The virtual datasource could then be deprecated and possibly removed in a later release, whether out of interest in dropping ConfigMap support or once migration is complete (mostly to avoid the complexity of managing an extra datasource as the operator evolves). Let me know your thoughts.
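
A sketch of that proposal, assuming a simplified, hypothetical Datasource interface; the operator's real datasource contract has more methods than shown here:

package main

// Datasource is a reduced stand-in for the operator's datasource contract.
type Datasource interface {
	// GetFluentdConfig returns the fluentd config fragment for a namespace.
	GetFluentdConfig(namespace string) (string, error)
}

// migrationDatasource runs the legacy ConfigMap datasource (default or
// multimap) and the new crd datasource side by side, concatenating the
// fluentd configs from both so teams can migrate gradually.
type migrationDatasource struct {
	legacy Datasource
	crd    Datasource
}

func (m *migrationDatasource) GetFluentdConfig(namespace string) (string, error) {
	legacyConf, err := m.legacy.GetFluentdConfig(namespace)
	if err != nil {
		return "", err
	}
	crdConf, err := m.crd.GetFluentdConfig(namespace)
	if err != nil {
		return "", err
	}
	// Collisions between the two sources are left to the namespace owner;
	// syntactic conflicts are caught downstream by fluentd validation.
	return legacyConf + "\n" + crdConf, nil
}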

@dimalbaby
Contributor, May 12, 2020

I agree @tommasopozzetti. But what would happen if there is a collision, with configs for a namespace loaded from both sources? Would the CRDs override?

Contributor Author

I think handling that should be left up to whoever manages the namespace. I see it as the same issue that arises in the multimap datasource when multiple configmaps are defined with collisions.
The datasource simply provides the input, and this new virtual datasource would allow providing input from two sources at the same time, but it is up to whoever writes that input to make it correct. If a collision produces a syntactic error, fluentd validation will catch it. If, instead, it creates a semantic error, then it is allowed, just as it would be in the multimap datasource, or even in the default datasource if that same concatenated config were put in one configmap.
The correct migration process would be to first install the new release and enable this migration mode. Then, one by one, the configmaps should be replaced by the corresponding CRDs. This coexistence allows the process to last as long as needed to get all teams up to speed; finally, the datasource can be switched to pure CRD, disabling configmaps altogether.

Contributor

That makes sense. Thanks @tommasopozzetti

@tommasopozzetti
Contributor Author

@dimalbaby @carpenterm I have pushed another commit adding a feature called crd-migration-mode, which includes the changes we discussed.
My proposal for the process of adopting the new release of the config-reloader is the following:

  • A new user installing kube-fluentd-operator for the first time should set the datasource: crd option in the chart and can use the new system right away without any issues (maybe it makes sense to make crd the default datasource if nothing else is specified in the new chart?)
  • A user already running kube-fluentd-operator with either datasource: default or datasource: multimap will update to the new chart, keep the datasource property as is, and set the crdMigrationMode property to true. The config-reloader will then be launched with both the crd datasource and the legacy datasource (either default or multimap, depending on the datasource property) enabled. The user can then migrate all configmap resources to the corresponding fluentdconfig resources, one by one. Finally, when the process is complete, the Helm release can be upgraded by changing the crdMigrationMode property back to false and switching the datasource property to crd, which effectively disables the legacy datasource and sets the config-reloader to watch only fluentdconfig resources (see the values sketch below).

Let me know your thoughts on this
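
For illustration, the chart values for the second scenario might evolve like this during and after the migration (property names as proposed above; the exact chart layout may differ):

# During migration: keep the legacy datasource and run the crd
# datasource alongside it.
datasource: multimap    # or "default", kept as-is
crdMigrationMode: true

# After every ConfigMap has been replaced by a FluentdConfig resource:
# datasource: crd
# crdMigrationMode: false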

tommasopozzetti and others added 9 commits May 13, 2020 12:59
This commit introduces an interface to decouple the type of Kubernetes
resource used to store the fluentd configs from the K8S datasource.

This enhancement should bring no change in behavior, but it sets the
stage for allowing different types of k8s resources to hold fluentd
configs.

This is needed in particular to simplify the creation of a CRD that
specifies a dedicated resource to hold fluentd configs.

Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
The Kubernetes world is evolving around the concept of operators and
custom resources to manage the various aspects and behaviors of
components.

This commit introduces a CRD for representing fluentd configurations,
to move away from the generic ConfigMap. This provides a dedicated
resource for fluentd configurations, making it easy to obtain fluentd
configurations by querying the `fluentdconfig` resource, or to create
new configurations for an application by simply attaching a
`fluentdconfig` resource to the application specs.

Furthermore, the use of CRDs should benefit the efficiency of
the config-reloader. Especially in clusters with many applications
that use ConfigMaps dynamically (for example for leader election),
a lot of updates happen to ConfigMaps, all of which are delivered
to the config-reloader, which has to realize they are not meant
for it and discard them. Using CRDs, the config-reloader can
register with the API Server to watch only `fluentdconfig`
resources, and is notified only when really necessary.

Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
…ion must be specified as top level fields for v1beta1

Signed-off-by: Dimal Baby <dbaby@vmware.com>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
@tommasopozzetti
Contributor Author

I also rebased onto master to resolve conflicts.

@dimalbaby
Contributor

@tommasopozzetti I am merging the changes. Thanks for addressing the comments. Great PR
