
Valkey support for memory locking (?) #669

Open
ranshid opened this issue Jun 19, 2024 · 4 comments
Labels
pending-refinement This issue/request is still a high level idea that needs to be further refined

Comments

@ranshid
Member

ranshid commented Jun 19, 2024

The problem/use-case that the feature addresses

Valkey users mostly expect their data to stay in memory. Today, when server memory utilization peaks, the OS may page memory out to swap, increasing latency; client applications often react by opening more connections, which consumes even more memory. OS memory thrashing also slows down maintenance operations (defrag, lazy evictions, client evictions, etc.), which contributes to further swap utilization and degrades overall system health. When the system is thrashing, other crucial server-side operations (replication keepalives, pings, and cluster-bus notifications) lag as well, which usually ends with an unresponsive server being taken down or failed over. Speaking from AWS experience, I would say we have rarely seen a system stabilize on its own once the OS starts thrashing.

Description of the feature
The suggestion is to provide a configuration that mlockall's the process pages in memory. I think we can start with a startup-only configuration and potentially make it dynamic later (so we can also switch between mlockall/munlockall at runtime).

What alternatives are there?
Disable swap - The best way to avoid system swapping is to simply disable swap memory. On Linux this can be done with swapoff -a. However, this affects the entire system and might cause other services to crash when memory pressure is high.
Tune system swappiness - AFAIK most Linux distros ship with the swappiness level set to 60, which makes the system fairly aggressive about swapping pages of inactive processes. In most cases I have seen, it is suggested to set the swappiness level to 1 to prevent unplanned paging to swap, but the right value largely depends on the system Valkey is running on.

What risks are there in locking all process pages?
From my past experience, running a process with mlockall while system swap is still enabled can lead to many unplanned crashes. Unless the process's virtual memory is bounded to the real amount of memory it is expected to use, page allocations will still succeed, and a later access to an allocated virtual page can then fault fatally (segfault, or the process gets OOM-killed). These cases are harder to analyze and probably require tuning other system configs (like swappiness) to make sure there is always some RAM available to back pages for the locked process.

Are there any external examples providing this config?
I looked at the Elasticsearch documentation and found that it offers three alternatives for avoiding swap (similar to the ones I mentioned earlier), but Elasticsearch also relies on the JVM's bounded heap configuration, which limits heap-size utilization. I think this falls under the "expert" level of configuration, so we should consider carefully whether we want to expose users to such options.

Should we also mlock BGSAVE pages in RAM?
Page locks are not inherited across fork, so by default BGSAVE will run without lock protection. I think that at the first stage we should NOT lock BGSAVE pages in memory, given the ephemeral nature of the child process.

@ranshid ranshid changed the title Valkey support for memory locking Valkey support for memory locking (?) Jun 20, 2024
@hpatro
Contributor

hpatro commented Jun 21, 2024

@ranshid One anecdote I would like to share here: some users want predictable behavior throughout the lifetime of the process, and once the engine enters swapping, that no longer holds. As everything slows down, from command processing to maintenance operations (defrag, lazy evictions, client evictions, etc.), the engine becomes pretty much unresponsive, which can lead to cascading effects like all clients getting disconnected and an eventual connection storm. I think this would be a good configuration to have for such users.

@JimB123

JimB123 commented Jun 21, 2024

I think this idea may be misguided. Swap is never the problem. Swap is a SOLUTION to the problem of insufficient memory.

Swap provides a safety net for transient OOM situations. Without this safety net, the OS would kill the process. Eliminating swap, or crippling it by exempting the largest single process only affects the safety-net, without addressing the cause of the low-memory condition.

Please note - I'm well aware that active swapping kills performance. Nobody wants to be running on a system that's heavily in swap. I view swap ONLY as a transient safety net and I would never advocate for planned usage of swap as part of normal processing.

Attempting to lock Valkey into memory (making it exempt from swapping) is likely to be counter-productive. Valkey is likely to be the single largest user of memory on the system. What this would really be saying is: "I want the safety net of swapping, but I want to exempt the single largest user of memory from that function". This essentially REQUIRES the OS to swap out pages which are more necessary than pages in Valkey. That can't be good.

If you don't want swap, that's pretty simple - just configure the host system without swap. When the system has low memory, Valkey (the largest process) will be killed. Another option may be to configure accounting limits on the Valkey process - and when Valkey exceeds those limits, it will be killed.

most linux distros come with swappiness level set to 60 which means the system will be aggressive swapping pages of inactive processes

This is not a completely true statement. There is a lack of good information regarding swappiness, and a lot of mis-information. Start up a linux system. Leave it idle. Check the swap - it will be zero. Clearly the system is not "aggressively swapping pages of inactive processes".

Swapping only occurs as the system gets close to memory exhaustion, regardless of the swappiness setting. I don't claim to be an expert on this setting, but as best I can tell from the differing information I've found, it is primarily a preference between paging out data and discarding file-backed memory (like code). A lower swappiness value (range 0-200) makes it more likely that code is discarded than that data is written to disk. (At least to the best of my understanding.) I think this reference seems a little better than most: https://www.howtogeek.com/449691/what-is-swapiness-on-linux-and-how-to-change-it/ Also, it's mostly corroborated by this reference: https://www.baeldung.com/linux/swap-space-settings - but you have to read both references carefully. In some cases, the word "swapping" is used to refer only to paging out (writing to disk) data pages. The words "less swapping" sound good; however, the unstated alternative - discarding file-backed pages (like code) - is very similar in impact. In either case, when an address fault occurs, the page (be it data or code) needs to be read back from disk.

@madolson
Member

madolson commented Jun 21, 2024

Broadly I'm in agreement with @JimB123. I don't think we should be locking memory for performance reasons. The inverse is crashing, which I would argue is almost always worse than swapping. I think there might be some arguments about selectively locking pages, or collating data from the main dictionary into pages that are locked, to try to improve the performance of the engine while swapping, but I don't think that is a high priority.

With that said, I do think we should consider locking memory for security purposes. I would really like to lock any security related options (like masterauth) so that it doesn't ever get spilled to disk.

@zvi-code

zvi-code commented Jun 27, 2024

I think we could do something better here. If we separate memory by usage, we can have a different lock policy for each kind of memory. We could then, for example, lock all memory that is not in user DBs, or all memory except user value data. This would allow many non-data-intensive commands to keep executing while leaving most of the memory available to swap.

This can be achieved by using separate memory regions for these allocations. Depending on what we want to lock, I believe we can achieve it with any allocator; with jemalloc specifically, the natural way would be a user-defined arena. There are many other benefits to this approach IMO (for example my comment here).

@madolson madolson added the pending-refinement This issue/request is still a high level idea that needs to be further refined label Jul 1, 2024