
Namespace config operator is consuming too much memory #96

Description

@hanzala1234 (Contributor)

Whenever we start the operator, memory consumption goes up to 20 GB and our API server becomes unresponsive. The API server starts consuming more than 15 GB, then it gets killed and the master becomes unhealthy.

We have to scale down the namespace-config operator to make the API server responsive again. What could be the reason it consumes so much memory once it starts? Could there be a memory leak? Is it possible for it to reconcile resources in chunks rather than all together? How can we find the root cause?
(screenshot: ns-operator-cropped)

Activity

raffaelespazzoli (Collaborator) commented on Apr 5, 2021

rasheedamir commented on Apr 5, 2021

@raffaelespazzoli we currently have version "1.0.3" running.

On the cluster where we are experiencing this issue we have only 20 NamespaceConfig objects.

We have been experiencing this issue for quite some time now, and it's now a blocker! Last time memory spiked to 20 GB and then became stable at 6.5 GB; but even 6.5 GB is way too much for an operator.

Currently it is scaled down to zero!

rasheedamir commented on Apr 5, 2021

Here is the last 7 days' usage:

(screenshot: Grafana "Kubernetes / Compute Resources / Workload" dashboard, captured 2021-04-05)

rasheedamir commented on Apr 6, 2021

@raffaelespazzoli any thoughts on how we can troubleshoot this?

raffaelespazzoli (Collaborator) commented on Apr 6, 2021

hanzala1234 (Contributor, Author) commented on Apr 6, 2021

> Which types of objects are created by your namespaceconfigs?

We mostly create Secrets, Roles, RoleBindings, and Tekton resources (TriggerTemplates, TriggerBindings, Pipelines, EventListeners).

> How big is your cluster and how big is the etcd database?

We have 16 nodes, including 3 master nodes. The etcd size right now is 818 MiB on average.

> Besides the namespace pod using a lot of memory did you see any other side effects?

The API server is crashing. Once we scale up the namespace-config operator, the whole cluster gets affected.

raffaelespazzoli (Collaborator) commented on Apr 7, 2021

How many NamespaceConfig objects do you have, and how many namespaces?

Can you run an experiment in which you create your NamespaceConfig objects one every 5 minutes and monitor how the memory increases?
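For the bookkeeping side of such an experiment, a small helper like this can turn periodic `kubectl top pod` readings into per-object memory deltas, making it easy to see whether growth is roughly linear per NamespaceConfig. This is an illustrative sketch only; the sample readings below are invented, and you would substitute the values you actually observe:

```python
# Illustrative helper: given memory readings taken after each NamespaceConfig
# is applied, print the memory increase contributed by each additional object.
# The sample values below are made up for demonstration.

def memory_deltas(samples):
    """samples: ordered list of (object_count, memory_mib) tuples."""
    deltas = []
    for (_, prev_mem), (n, mem) in zip(samples, samples[1:]):
        deltas.append((n, mem - prev_mem))
    return deltas

if __name__ == "__main__":
    # e.g. one reading every 5 minutes, sampled via `kubectl top pod`
    readings = [(0, 120), (1, 410), (2, 730), (3, 1095)]
    for count, delta in memory_deltas(readings):
        print(f"after object {count}: +{delta} MiB")
```

If the per-object delta keeps growing instead of staying constant, that points at something beyond a simple fixed cost per NamespaceConfig.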

hanzala1234 (Contributor, Author) commented on Apr 7, 2021

We have 20 NamespaceConfig objects and a total of 134 namespaces, but the namespace-config operator only applies to 30-40 of them. Also, in our environment we create namespaces dynamically for PR testing.

raffaelespazzoli (Collaborator) commented on Apr 7, 2021

rasheedamir commented on Apr 7, 2021

> 20 is high number :(

Why is a "controller manager" allocated per NamespaceConfig object?

raffaelespazzoli (Collaborator) commented on Apr 7, 2021

> 20 is high number :(

By that I mean that I had never before seen a deployment where so many definitions were needed, and that perhaps there is a way to collapse some of them and optimize. I didn't mean that the operator should not support it.

> Why is a "controller manager" allocated per NamespaceConfig object?

That's how the operator is designed. One can't dynamically add watchers to a running controller-manager, so each time a NamespaceConfig object is created, the needed watchers are grouped into a new controller-manager.
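As an illustration of the consolidation idea: a NamespaceConfig selects namespaces by label, so several per-namespace objects that apply the same resources can often collapse into one. The sketch below follows the field names in the NamespaceConfig CRD as I understand them (`labelSelector`, `templates`/`objectTemplate`), but the label and quota values are made up:

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: team-defaults
spec:
  # One object covers every namespace carrying this label,
  # instead of one NamespaceConfig per namespace.
  labelSelector:
    matchLabels:
      team: backend
  templates:
  - objectTemplate: |
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: standard-quota
        namespace: {{ .Name }}
      spec:
        hard:
          requests.cpu: "4"
```

Fewer NamespaceConfig objects means fewer controller-managers and watch caches, which is where the memory goes in this design.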

raffaelespazzoli (Collaborator) commented on Nov 15, 2021

May I close this issue?

Florian-94 commented on Feb 22, 2022

Hello,
We are using version 1.2.0 of the nsconfig operator on a 4.8.14 OCP cluster. We really appreciate it, except for the RAM consumption ...
On one of our OpenShift BUILD clusters, we have 125 NamespaceConfig objects, one for each namespace (and its associated RoleBindings, NetworkPolicies, ResourceQuotas, LimitRanges, ...). And we plan to host new clients (i.e. namespaces) soon.
The limit for the nsconfig operator pod is 7 GB of RAM, and it's not enough: the container restarts every 15 minutes. We are going to raise the limit to 10 GB of RAM, which is huge and increases the risk of scheduling this pod on our workers.
Is there a way you could change the behaviour of the operator to limit this need for RAM?
Thank you,

Florian

P.S.: On another cluster, we have 35 NamespaceConfig objects with RAM utilization stable at 1.15 GB. It seems RAM consumption is not linear with the number of NamespaceConfig objects.

raffaelespazzoli (Collaborator) commented on Feb 22, 2022

I recommend upgrading, but that will probably not solve your problem, @Florian-94.
There is definitely a correlation between the number of NamespaceConfig objects (and the types of objects being configured) and the memory used by this operator. This cannot be eliminated.
Having one NamespaceConfig object per namespace is technically possible, but it's not what was intended for this operator.
Can you share your use case? Maybe a couple of NamespaceConfigs could each cover several namespaces? I wonder if the operator can be used in a way that is more in line with what was intended.
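The scaling described here can be sketched with a toy model (pure illustration, not the operator's actual code; the class names and numbers are invented): if each NamespaceConfig gets its own controller-manager with one watch per referenced resource type, the number of informer caches grows with (number of NamespaceConfigs) × (resource types per config), and each cache can hold every matching object in the cluster:

```python
# Toy model of the per-NamespaceConfig controller-manager design
# described in this thread (illustration only; names are invented).

class ToyManager:
    """Stands in for one controller-manager: one informer cache per watched type."""
    def __init__(self, watched_types):
        self.caches = {t: [] for t in watched_types}

def spawn_managers(namespace_configs):
    """One manager per NamespaceConfig, as the operator's design implies."""
    return [ToyManager(types) for types in namespace_configs]

if __name__ == "__main__":
    # Roughly the situation earlier in this thread: 20 NamespaceConfigs, each
    # templating ~7 kinds (Secrets, Roles, RoleBindings, Tekton resources).
    configs = [["Secret", "Role", "RoleBinding", "TriggerTemplate",
                "TriggerBinding", "Pipeline", "EventListener"]] * 20
    managers = spawn_managers(configs)
    total_caches = sum(len(m.caches) for m in managers)
    print(f"{len(managers)} managers, {total_caches} informer caches")
```

The model is linear in the number of configs, but the per-cache cost depends on how many objects each watch matches cluster-wide, which would explain why two clusters with different object populations show very different memory per NamespaceConfig.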

Florian-94 commented on Feb 22, 2022

We have a web access portal where our customers can choose all the specific parameters for LimitRanges and ResourceQuotas (the portal manages a validation process before creating NamespaceConfig CRs on the OpenShift cluster).
Maybe we could use the "t-shirt size" system offered by the nsconfig operator for this usage.
We also apply 2 network policies (in the NamespaceConfig CR) to be sure users can't modify / delete them (they are the same for all namespaces).

But on this portal our customers also manage the users who will have access to the namespace (kind: Group in the NamespaceConfig objects, with specific user IDs in the user field), so I can't see how to use a shared NamespaceConfig template for this usage.
Maybe the nsconfig operator was not the right choice for our needs. Maybe we should just apply the Kubernetes objects directly and prevent namespace admin users from editing them (with Gatekeeper, for example). We didn't see the RAM problem caused by too many NamespaceConfig resources when we made this choice.
Thanks for your help.

raffaelespazzoli (Collaborator) commented on Feb 22, 2022

raffaelespazzoli (Collaborator) commented on Jan 8, 2024

May I close this issue?

Florian-94 commented on Jan 8, 2024

Yes, you can close the issue for me. Thank you. We are no longer using the namespace-configuration-operator. Maybe one day, if we decide to use size templates to manage quotas/limitranges for our projects.


        Namespace config operator is consuming too much memory · Issue #96 · redhat-cop/namespace-configuration-operator