
Introduce 'crd' datasource to exploit custom k8s resources #111

Merged: 9 commits, May 18, 2020

Conversation

tommasopozzetti
Contributor

This PR is the logical continuation of the work started in #75 and #89.

The Kubernetes world is evolving around the concept of operators and custom resources to manage the various aspects and behaviors of components.

This PR introduces a CRD for representing fluentd configurations, moving away from the generic ConfigMap. This approach has two benefits. On one hand, it provides a dedicated resource for fluentd configurations, which makes it possible to manage them more consistently. It is possible, for example, to obtain fluentd configs simply by querying the cluster for that resource:

kubectl get fluentdconfig

It is also possible to create configs for a new application simply by attaching a FluentdConfig resource to the application manifests, rather than using a more generic ConfigMap with specific names and/or labels:

apiVersion: logging.csp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fd-config
spec:
  fluentconf: |
    <match kube.ns.**>
      @type relabel
      @label @NOTIFICATIONS
    </match>

    <label @NOTIFICATIONS>
      <match **>
        @type null
      </match>
    </label>

On the other hand, using CRDs should improve the efficiency of the config-reloader. Especially in clusters with many applications that use ConfigMaps dynamically (for example, for leader election), ConfigMaps are updated frequently, and every update is delivered to the config-reloader, which must do some work on each one just to realize it is not meant for it and discard it. With CRDs, the config-reloader can register with the API Server to watch only FluentdConfig resources, and is notified only when really necessary.
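
As a concrete illustration of this watch pattern, here is a minimal sketch using the generated clientset and informer packages that appear later in this PR's diff; the group accessor names below follow standard Kubernetes code-generator conventions and may not match the actual code exactly:

package main

import (
	"time"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"

	kfoClient "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/clientset/versioned"
	kfoInformers "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/informers/externalversions"
)

func watchFluentdConfigs(cfg *rest.Config, onChange func()) error {
	client, err := kfoClient.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// The factory subscribes only to FluentdConfig events, so the
	// config-reloader is never woken up by unrelated ConfigMap churn.
	factory := kfoInformers.NewSharedInformerFactory(client, 30*time.Second)
	informer := factory.Logging().V1beta1().FluentdConfigs().Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { onChange() },
		UpdateFunc: func(oldObj, newObj interface{}) { onChange() },
		DeleteFunc: func(obj interface{}) { onChange() },
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	cache.WaitForCacheSync(stopCh, informer.HasSynced)
	return nil
}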

The "CRD" has been introduced as a new datasource, configurable through the helm chart values, to allow users that are currently set up with ConfigMaps and do not want to perform the switchover to FluentdConfigs, to be able to keep on using them.

The config-reloader has also been given the capability of installing the CRD at startup if requested, so no manual action is needed to enable it on the cluster.
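
A hedged sketch of that startup auto-install, assuming the standard apiextensions clientset; the actual logic in this PR may differ (and, as discussed below, it also has to handle the older v1beta1 CRD API):

package main

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

// ensureCRD creates the given CRD if it does not already exist, so the
// operator needs no manual installation step on the cluster.
func ensureCRD(ctx context.Context, cfg *rest.Config, crd *apiextensionsv1.CustomResourceDefinition) error {
	client, err := apiextensionsclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	_, err = client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		return nil // already installed, e.g. manually or via the Helm chart
	}
	return err
}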

@carpenterm
Contributor

Hi @tommasopozzetti, thank you for this contribution, it looks great! It may take a little time for us to review and test this out, but it's definitely something we are keen to merge.

"github.com/vmware/kube-fluentd-operator/config-reloader/config"
kfoClient "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/clientset/versioned"
kfoInformers "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/informers/externalversions"
kfoListersV1beta1 "github.com/vmware/kube-fluentd-operator/config-reloader/datasource/kubedatasource/fluentdconfig/client/listers/logging.csp.vmware.com/v1beta1"
Contributor

Could we please change logging.csp.vmware.com to logs.vdp.vmware.com? I appreciate it's a rather pedantic change, but it aligns more appropriately with our internal group names.

Contributor Author

Thanks for the feedback. I will try to get the change done either today or early next week.
I went with logging.csp.vmware.com as it is the prefix used by kube-fluentd-operator in the namespace annotations. Does it make sense to change those too to keep it consistent?

Contributor

It would make sense to change them and keep them consistent, but we'd just need to highlight it in the release notes of the next release to make sure people weren't relying on it to point to a different configmap name.

@vmwclabot
Member

@tommasopozzetti, you must sign every commit in this pull request acknowledging our Developer Certificate of Origin before your changes are merged. This can be done by adding Signed-off-by: John Doe <john.doe@email.org> to the last line of each Git commit message. The e-mail address used to sign must match the e-mail address of the Git author. Click here to view the Developer Certificate of Origin agreement.

@tommasopozzetti
Contributor Author

Hi @carpenterm,
I pushed the change from logging.csp.vmware.com to logs.vdp.vmware.com.
An example FluentdConfig resource now looks like this:

apiVersion: logs.vdp.vmware.com/v1beta1
kind: FluentdConfig
metadata:
  name: fd-config
spec:
  fluentconf: |
    <match kube.ns.**>
      @type relabel
      @label @NOTIFICATIONS
    </match>

    <label @NOTIFICATIONS>
      <match **>
        @type null
      </match>
    </label>

// CRD Management for autoinstall
///////////////////////////////////////////

var fluentdConfigCRD = v1.CustomResourceDefinition{
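
The excerpt above cuts off at the opening brace. For orientation, here is a hedged reconstruction of how such an in-code definition could look, with field values inferred from the examples in this PR rather than copied from the diff (the sketch uses the renamed group agreed on above):

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func boolPtr(b bool) *bool { return &b }

// Illustrative only; the real definition lives in the diff above.
var fluentdConfigCRD = apiextensionsv1.CustomResourceDefinition{
	ObjectMeta: metav1.ObjectMeta{Name: "fluentdconfigs.logs.vdp.vmware.com"},
	Spec: apiextensionsv1.CustomResourceDefinitionSpec{
		Group: "logs.vdp.vmware.com",
		Scope: apiextensionsv1.NamespaceScoped,
		Names: apiextensionsv1.CustomResourceDefinitionNames{
			Plural:   "fluentdconfigs",
			Singular: "fluentdconfig",
			Kind:     "FluentdConfig",
		},
		Versions: []apiextensionsv1.CustomResourceDefinitionVersion{{
			Name:    "v1beta1",
			Served:  true,
			Storage: true,
			// apiextensions/v1 requires a structural schema; accept any
			// object, since spec.fluentconf holds free-form fluentd text.
			Schema: &apiextensionsv1.CustomResourceValidation{
				OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{
					Type:                   "object",
					XPreserveUnknownFields: boolPtr(true),
				},
			},
		}},
	},
}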
Contributor

@tommasopozzetti This is a great PR. I think the v1 CRD API is only supported from Kubernetes 1.16 onwards. Can we handle this with a feature flag, or do you see a better approach to choosing between v1 and v1beta1?
I observed this when I pulled this change and tested it locally. In the worst case, if those CRDs are created outside the code, can that be offered as a workaround?

Since this is a major feature, can you also consider a doc update?

Contributor

Agreed, we need to ensure it doesn't fail on Kubernetes versions prior to 1.16.

The config-reloader has been equipped with the capability of installing the CRD at startup if requested, so no manual actions to enable it on the cluster are needed.

We should definitely have the ability to turn this off and do it manually. It's an interesting approach to install the CRDs from the code, but we need to decide if this is something we really want to do longer term - if we install from the code we also need to maintain and upgrade the definition in code rather than in a YAML file. This has the advantage of fewer loosely coupled dependencies; I guess it depends on how frequently we see that CRD changing.

Contributor Author

Thank you for the feedback.
This approach to CRDs is similar to the one used by coreos/dex. I like it because it simplifies CRD management for the user, it verifies CRD availability at each restart of the operator, and it couples the CRD definition to the code, allowing them to evolve together easily.
If the CRD evolves, the code can evolve to apply any required updates automatically, again taking a possible source of errors out of users' hands.
For the problem of Kubernetes versions preceding 1.16, I think we can also add support for v1beta1 and automatically choose the correct version based on the environment.
We could also make this auto-management of CRDs a controllable feature activated through the chart. However, this would require maintaining the CRD YAML file as well, so that users can apply it manually (or through the Helm chart) when the auto-management feature is not active.
Let me know your thoughts and I'll be happy to add the needed modifications to this PR.

Contributor

I am fine with the approach of handling both versions in the code. Since it performs an existence check, this won't stop users from maintaining it outside. I believe your other commit #110 will clear the old configs if we toggle the behaviour.

@carpenterm Any thoughts?

Contributor Author

@dimalbaby @carpenterm I pushed the commit adding the capability of automatically choosing the right version of the CRD API based on the environment.
Unfortunately I don't have a Kubernetes cluster with version < 1.16 handy at the moment, so could you help with testing it in that environment?
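
For reference, a sketch of one simple way to make that choice, assuming the decision is based on whether the API server serves the v1 CRD group version; the commit's actual detection logic may differ:

package main

import (
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// useV1CRDAPI reports whether the API server serves apiextensions.k8s.io/v1,
// which Kubernetes does from 1.16 onward; older clusters only serve v1beta1.
func useV1CRDAPI(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	// Any error here is treated as "not served"; a stricter implementation
	// would distinguish transient failures from a missing group version.
	if _, err := dc.ServerResourcesForGroupVersion("apiextensions.k8s.io/v1"); err != nil {
		return false, nil
	}
	return true, nil
}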

Contributor

I tend to agree. We run clusters with lots of different service teams working on different products sharing the same kube-fluentd-operator instance, and I'm assuming others are in that situation too. Migrating all of the teams to CRDs at the same time we switch the server to CRDs would be a bit of a headache; it would be better if we could have both CRDs and ConfigMaps enabled for a while so teams can migrate gradually. It would be great to dynamically figure out which one to use based on whether a ConfigMap or CRD exists, or, if that's too much work, perhaps teams could control it themselves via an annotation on the namespace, with the cluster administrator setting a default?

Contributor Author

Thank you for testing and for fixing the bug @dimalbaby

We could look at creating an extra "virtual" datasource that internally manages an instance of the configmap datasource and an instance of the crd datasource and concatenates the fluentd configs from both. This would allow running both datasources side by side. The virtual datasource could then be deprecated and possibly removed in a later release, whether out of interest in dropping ConfigMap support or once migration is complete (mostly to avoid the complexity of managing an extra datasource as the operator evolves). Let me know your thoughts.
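
A sketch of that proposal, assuming a simplified, hypothetical Datasource interface; the operator's real datasource contract has more methods than shown here:

package main

// Datasource is a reduced stand-in for the operator's datasource contract.
type Datasource interface {
	// GetFluentdConfig returns the fluentd config fragment for a namespace.
	GetFluentdConfig(namespace string) (string, error)
}

// migrationDatasource runs the legacy ConfigMap datasource (default or
// multimap) and the new crd datasource side by side, concatenating the
// fluentd configs from both so teams can migrate gradually.
type migrationDatasource struct {
	legacy Datasource
	crd    Datasource
}

func (m *migrationDatasource) GetFluentdConfig(namespace string) (string, error) {
	legacyConf, err := m.legacy.GetFluentdConfig(namespace)
	if err != nil {
		return "", err
	}
	crdConf, err := m.crd.GetFluentdConfig(namespace)
	if err != nil {
		return "", err
	}
	// Collisions between the two sources are left to the namespace owner;
	// syntactic conflicts are caught downstream by fluentd validation.
	return legacyConf + "\n" + crdConf, nil
}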

@dimalbaby
Contributor, May 12, 2020

I agree @tommasopozzetti. But what would happen if there is a collision, with configs for a namespace loaded from both sources? Would the CRDs override?

Contributor Author

I think handling that should be left up to whoever manages the namespace. I see it as the same issue that arises in the multimap datasource when multiple configmaps are defined with collisions.
The datasource simply provides the input, and this new virtual datasource would allow providing input from two sources at the same time, but it is up to whoever writes that input to make it correct. If a collision produces a syntactic error, fluentd validation will catch it. If, instead, it creates a semantic error, then it is allowed, just as it would be in the multimap datasource, or even in the default datasource if that same concatenated config were put in one configmap.
The correct migration process would be to first install the new release and enable this migration mode. Then, one by one, the configmaps should be replaced by the corresponding CRDs. This coexistence allows the process to last as long as needed to get all teams up to speed; finally, the datasource can be switched to pure CRD, disabling configmaps altogether.

Contributor

That makes sense. Thanks @tommasopozzetti

@tommasopozzetti
Contributor Author

@dimalbaby @carpenterm I have pushed another commit adding a feature called crd-migration-mode, which includes the changes we discussed.
My proposal for the process of adopting the new release of the config-reloader is the following:

  • A new user installing kube-fluentd-operator for the first time should set the datasource: crd option in the chart and can use the new system right away without any issues (maybe it makes sense to make crd the default datasource if nothing else is specified in the new chart?)
  • A user already running kube-fluentd-operator with either datasource: default or datasource: multimap will update to the new chart, keep the datasource property as is, and set the crdMigrationMode property to true. The config-reloader will then be launched with both the crd datasource and the legacy datasource (either default or multimap, depending on the datasource property) enabled. The user can then migrate all configmap resources to the corresponding fluentdconfig resources, one by one. Finally, when the process is complete, the Helm release can be upgraded by changing the crdMigrationMode property back to false and switching the datasource property to crd, which effectively disables the legacy datasource and sets the config-reloader to watch only fluentdconfig resources (see the values sketch below).

Let me know your thoughts on this
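
For illustration, the chart values for the second scenario might evolve like this during and after the migration (property names as proposed above; the exact chart layout may differ):

# During migration: keep the legacy datasource and run the crd
# datasource alongside it.
datasource: multimap    # or "default", kept as-is
crdMigrationMode: true

# After every ConfigMap has been replaced by a FluentdConfig resource:
# datasource: crd
# crdMigrationMode: false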

tommasopozzetti and others added 9 commits May 13, 2020 12:59
This commit introduces an interface to decouple the type of Kubernetes
resource used to store the fluentd configs from the K8S datasource.

This enhancement should bring no change in behavior, but it sets the
stage for allowing different types of k8s resources to hold fluentd
configs.

This is needed in particular to simplify the creation of a CRD that
specifies a dedicated resource to hold fluentd configs.

Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
The Kubernetes world is evolving around the concept of operators and
custom resources to manage the various aspects and behaviors of
components.

This commit introduces a CRD for representing fluentd configurations,
to move away from the generic ConfigMap. This provides a dedicated
resource for fluentd configurations, making it easy to obtain fluentd
configurations by querying the `fluentdconfig` resource, or to create
new configurations for an application by simply attaching a
`fluentdconfig` resource to the application specs.

Furthermore, the use of CRDs should benefit the efficiency of
the config-reloader. Especially in clusters with many applications
that use ConfigMaps dynamically (for example for leader election),
a lot of updates happen to ConfigMaps, all of which are delivered
to the config-reloader, which has to realize they are not meant
for it and discard them. Using CRDs, the config-reloader can
register with the API Server to watch only `fluentdconfig`
resources, and is notified only when really necessary.

Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
…ion must be specified as top level fields for v1beta1

Signed-off-by: Dimal Baby <dbaby@vmware.com>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
Signed-off-by: Tommaso Pozzetti <tommypozzetti@hotmail.it>
@tommasopozzetti
Contributor Author

I also rebased onto master to resolve conflicts.

@dimalbaby
Contributor

@tommasopozzetti I am merging the changes. Thanks for addressing the comments. Great PR
