What happened:
Installing and running MagTape on worker nodes with 24 or more CPUs causes Gunicorn to spawn a high number of workers/threads, and there appears to be a memory leak of some sort.
What you expected to happen:
Pods to start up normally
How to reproduce it (as minimally and precisely as possible):
Run the simple install in a cluster with worker nodes that have 24 or more CPUs
Anything else we need to know?:
Experienced on worker nodes with 24 cores and 128 GB of RAM
Example output from MagTape container logs:
[2020-10-02 04:52:27 +0000] [107] [INFO] Booting worker with pid: 107
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:62)
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:63)
[2020-10-02 04:52:27 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:64)
[2020-10-02 04:52:27 +0000] [62] [INFO] Worker exiting (pid: 62)
[2020-10-02 04:52:27 +0000] [64] [INFO] Worker exiting (pid: 64)
[2020-10-02 04:52:27 +0000] [63] [INFO] Worker exiting (pid: 63)
[2020-10-02 04:52:29 +0000] [108] [INFO] Booting worker with pid: 108
[2020-10-02 04:52:29 +0000] [109] [INFO] Booting worker with pid: 109
[2020-10-02 04:52:30 +0000] [1] [INFO] Unhandled exception in main loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 211, in run
    self.manage_workers()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 545, in manage_workers
    self.spawn_workers()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 616, in spawn_workers
    self.spawn_worker()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 567, in spawn_worker
    pid = os.fork()
OSError: [Errno 12] Out of memory
Environment:
Kubernetes version (use kubectl version): v1.15.5
Worker Node OS: Ubuntu 16.04
Cloud provider or hardware configuration:
Others:
This is related to the dynamic sizing of workers/threads in the Gunicorn config. The Gunicorn docs recommend (2 x $num_cores) + 1 workers, but they also recommend not going above 12 workers total; on a 24-core node the formula yields 49 workers, which is enough to exhaust memory when the arbiter forks. I'm testing hard-coding the workers/threads values to a reasonable default and using an HPA for scaling out vs. up; a rough sketch of the capped config follows below.
Thanks to Alex, @ilrudie , and Shahar for the consultations!
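For illustration, here is a minimal Gunicorn config sketch of the capping approach described above. The file name, cap value, and thread count are assumptions for the sketch, not MagTape's actual settings:

# gunicorn.conf.py -- illustrative sketch, not MagTape's shipped config
import multiprocessing

# Dynamic sizing per the Gunicorn docs: (2 x $num_cores) + 1.
# On a 24-core node this yields 49 worker processes.
dynamic_workers = (2 * multiprocessing.cpu_count()) + 1

# Cap at a fixed, reasonable default and rely on an HPA
# to scale out (more pods) instead of up (more workers per pod).
workers = min(dynamic_workers, 4)
threads = 2

Scaling out could then be handled by an HPA, e.g. kubectl autoscale deployment magtape --cpu-percent=80 --min=2 --max=10 (deployment name and thresholds hypothetical).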