High memory usage if fs.nr_open is very high and no ulimit set on Linux systems #2299

Open
RedRoserade opened this issue Apr 2, 2021 · 6 comments

@RedRoserade

While debugging kubernetes-sigs/kind#2175, I tried to understand why uwsgi wasn't running well on a Kind cluster on Fedora 33.

I came to the conclusion that the cause is a too-high value for fs.nr_open, which defaults to 1073741816 on Fedora 33 but only 1048576 on Ubuntu 20.10. On my machine, the very high limit causes the uWSGI --http process in a pod to consume >8Gi of memory, and if memory limits are set, the process gets OOM-killed by the kernel (please see the issue above for a test repo and logs).

The issue doesn't manifest when running uwsgi outside a container/pod because of the per-user limit of 1024 set with ulimit. Also, the containerd.service unit seems to set fs.nr_open to 1048576 by default, which helps avoid this issue when the container with uwsgi is run via docker run.
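
For reference, this is roughly how I compared the limits on the host and inside the pod (just a sketch; the example-pod name matches the test setup below, and it assumes the container image ships cat):

# On the host: kernel-wide ceiling plus the per-user soft/hard limits
sysctl fs.nr_open
ulimit -Sn   # soft limit
ulimit -Hn   # hard limit
# Inside the pod: the limits the uWSGI process actually inherits
kubectl exec example-pod -- cat /proc/1/limits | grep 'open files'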

Pod logs (high limit set deliberately via sysctl -w fs.nr_open=1073741816):

red@noctis:~/Development/kube-stuff$ kubectl logs example-pod
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 18:05:18 2021] ***
compiled with version: 8.3.0 on 02 April 2021 17:51:03
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: example-pod
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /
detected binary path: /usr/local/bin/uwsgi
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
(continues, and then hangs. The `--http` process is OOM-killed)

Raising the limit on an Ubuntu 20.10 machine to 1073741816 and trying again, without a container:

(Note that I had to use both sysctl -w and ulimit -n to raise both limits; it seems Ubuntu has a per-user limit of 1024.)

noctis# ulimit -n 1073741816
noctis# ulimit -n           
1073741816
noctis# source venv/bin/activate
(venv) noctis# ./docker-entrypoint.sh 
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 19:17:18 2021] ***
compiled with version: 10.2.0 on 02 April 2021 18:03:34
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: noctis
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/red/Development/kube-stuff
detected binary path: /home/red/Development/kube-stuff/venv/bin/uwsgi
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** 
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 62286
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
(doesn't hang, but consumes 8193M of memory)

Lowering the value of fs.nr_open to 1048576 makes things work well on the pod. However, I wonder why the uwsgi process consumes so much memory when this limit is high.

Running the same uwsgi app without changing any limits, and without containers:

red@noctis:~/Development/kube-stuff$ sysctl fs.nr_open
fs.nr_open = 1048576
red@noctis:~/Development/kube-stuff$ ulimit -n
1024
red@noctis:~/Development/kube-stuff$ source venv/bin/activate
(venv) red@noctis:~/Development/kube-stuff$ ./docker-entrypoint.sh 
*** Starting uWSGI 2.0.19.1 (64bit) on [Fri Apr  2 19:48:49 2021] ***
compiled with version: 10.2.0 on 02 April 2021 18:03:34
os: Linux-5.8.0-48-generic #54-Ubuntu SMP Fri Mar 19 14:25:20 UTC 2021
nodename: noctis
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 8
current working directory: /home/red/Development/kube-stuff
detected binary path: /home/red/Development/kube-stuff/venv/bin/uwsgi
*** WARNING: you are running uWSGI without its master process manager ***
your processes number limit is 62286
your memory page size is 4096 bytes
detected max file descriptor number: 1024

Memory usage is normal in this case.

Finally, I noticed that if I run with --http-socket instead of --http, memory usage is what I would consider "normal" (a few hundred MiB at most), but according to the documentation these options are not equivalent.
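
For comparison, the two invocations look roughly like this (a sketch; app.py stands in for my real WSGI entrypoint):

# --http spawns the separate "uWSGI http 1" router process, which is where the memory blows up
uwsgi --http 0.0.0.0:8080 --wsgi-file app.py
# --http-socket makes the workers speak HTTP directly (meant to sit behind a proxy/router), with no extra router process
uwsgi --http-socket 0.0.0.0:8080 --wsgi-file app.py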

@xrmx
Collaborator

xrmx commented Apr 3, 2021

Does it have the same effect if you pass a lower count with --max-fd?

@RedRoserade
Author

@xrmx It does not; I wasn't aware of that option. I tested it with --max-fd 1024 and it no longer consumes a huge amount of memory, even when running as non-root (despite https://uwsgi-docs.readthedocs.io/en/latest/Options.html#max-fd saying that it requires root privileges).

The 1024 limit is probably a bit too low, but it works.
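
For reference, the workaround looks roughly like this (a sketch; app.py is a placeholder for the real entrypoint, and 1048576 is just the Ubuntu/Debian-style ceiling discussed above, not an official recommendation):

# Cap the number of fds uWSGI will use, regardless of how high the inherited limit is
uwsgi --http 0.0.0.0:8080 --wsgi-file app.py --max-fd 1048576

The same option should also work from an ini config as max-fd = 1048576.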

@xrmx
Collaborator

xrmx commented Apr 3, 2021

Yeah, the root problem is that some data structures are sized according to the number of fds available, hence the ill effect you have seen.

@RedRoserade
Author

I see, thanks for the explanation. Assuming the data structures cannot be changed, I wonder if a default limit of something like 1048576 would be sane. At the same time, though, setting such a default could break some use cases.

tiborsimko added a commit to tiborsimko/reana that referenced this issue Mar 7, 2023
Fixes memory consumption of the "uWSGI http 1" process that was rising
above 8 GiB on systems like Fedora 37 (locally) and Fedora CoreOS 36 (in
the cloud) due to very high file descriptor limits (`fs.nr_open =
1073741816`). See <kubernetes-sigs/kind#2175>
and <unbit/uwsgi#2299>.

Sets the uWSGI `max-fd` value to 1048576 as per
<https://github.com/kubernetes-sigs/kind/pull/1799/files>. If need be,
we can make it configurable via Helm chart values later.
@polarathene

Just chiming in to share here since it was a result on the first page of a search query. Should help with visibility 👍

This is likely due to a config on your system for the container runtime (dockerd.service, containerd.service, etc.) that sets LimitNOFILE=infinity.

Typically infinity will be approx 2^30 (over 1 billion), while some distros like Debian (and Ubuntu, which derives from it) have a lower 2^20 limit (1,024 times less), which is the default sysctl fs.nr_open value.

This goes back to the systemd v240 release (2018 Q4), which raised fs.nr_open and fs.file-max to the highest possible value, with fs.nr_open then being used for infinity, IIRC.

  • On some distros like Fedora, at least outside of containers, this was a non-issue as infinity was not used (systemd 240 kept the soft limit of 1024, but raised the hard limit to 512k vs the kernel's default of 4096).
  • On others like Debian, pam_limits.so has been carrying a patch for something like two decades that set infinity as the hard limit, IIRC (or it just took whatever the hard limit was on PID 1, which would be fs.nr_open; same outcome AFAIK). That caused the v240 release to not play well, since 2^20 got raised to 2^30, so they build systemd without the fs.nr_open bump (instead of fixing the patch for pam_limits.so 🤷‍♂️).

Anyway... container runtimes shipping systemd units configure LimitNOFILE and have bounced between 2^20 (1048576) and 2^30 (infinity) a few times, with infinity being present since 2018-2021 depending on what you installed (and when your distro got the update). That is what raised the limits in the container, which may not match what you see on your host.

Often you can configure the ulimit per container (e.g. docker run has --ulimit; compose and k8s have similar ulimit config settings). Or you can set LimitNOFILE in the systemd service config to a sane value... or, if you're lucky, as in this case, the affected software has an option to impose a limit itself.
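
A rough sketch of those two knobs (values are illustrative only; my-uwsgi-image is a placeholder):

# Per container, e.g. with docker run (soft:hard)
docker run --ulimit nofile=1048576:1048576 my-uwsgi-image
# Or a systemd drop-in for the runtime, e.g. /etc/systemd/system/containerd.service.d/override.conf
# containing:
#   [Service]
#   LimitNOFILE=1048576
# followed by: systemctl daemon-reload && systemctl restart containerd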

Just to clarify, this typically only affects the soft limit value, although some software internally raises the soft limit to the hard limit (perfectly acceptable; it's just that 2^30 is not a sane hard limit, 2^19 is often plenty and many can get away with 2^16 just fine).


As for the memory usage: from what I've read in other affected software (Java), an array is allocated sized to the soft limit, at 8 bytes per element, so a 2^30 limit uses approx 8.6 GB of memory. The saner 2^20 hard limit you'd see on Debian would only use 8.4 MB in comparison, and likewise, if the default soft limit of 1024 is fine, only 8.2 KB is needed.
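
Quick back-of-the-envelope, assuming the 8 bytes per element mentioned above:

echo $(( 1073741816 * 8 ))   # ~8.6 GB (Fedora's fs.nr_open, i.e. infinity)
echo $(( 1048576 * 8 ))      # ~8.4 MB (the 2^20 limit)
echo $(( 1024 * 8 ))         # ~8.2 KB (the default soft limit of 1024)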


For dockerd and containerd, this problem is likely to be resolved this year, as there is a fair amount of discussion going on about no longer using infinity.

github-merge-queue bot pushed a commit to grafana/oncall that referenced this issue Aug 25, 2023
# What this PR does

Fixes #1521 (see unbit/uwsgi#2299)

@jualvarez

Just adding some more visibility here. This still bit me on Fedora 39's docker. For people running into this issue:

  • If exposing uWSGI directly to the world isn't essential, not using the --http option removes the issue.
  • As mentioned above, lowering the ulimit works, either in systemd or (for a faster fix) in your bootstrap script; ulimit -n 1048576 should do the trick! (A sketch follows this list.)
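
For the bootstrap-script route, something like this at the top of the entrypoint should be enough (sketch only; adjust the value and the uwsgi arguments to your setup, app.py is a placeholder):

#!/bin/sh
# Lower the fd limit before uWSGI starts, so its fd-sized data structures stay small
ulimit -n 1048576
exec uwsgi --http 0.0.0.0:8080 --wsgi-file app.py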
