Twemcache has three basic eviction strategies now when eviction is enabled:
- item LRU eviction (
- slab random eviction (
- slab LRU eviction (
-M 4) All eviction strategies have the same trigger, which is when a server runs out of existing free slots and quota for new slabs.
Option No.1, commonly known as just "LRU eviction", has been around forever and is what all Memcached protocol implementation supports. It maintains an item LRU queue for each slab class, and accessing an item puts it to the back of the queue (but the update is rate-limited to avoid too much churn). When a server tries to evict, it scans from the head of the item LRU queue that the new item should go into, until an item not being currently accessed is found. Mainline Memcached allows at least one slab allocated to each slab class, even if that violates the number specified with
--max-memory, to avoid total rejection of items between a particular size range. Twemcache does not make such attempt. Regardless, since this eviction strategy works only at the item-level and within the same slab class, it leads to the slab calcification problem that threatens the long-term caching effectiveness as soon as item distribution among slab classes changes.
Option No.2, sometimes shortened to "random eviction", is our first attempt to solve the issue with item LRU eviction. It was introduced in Twemcache 2.0, before Twemcache was even open sourced. Fundamentally, we need an eviction strategy that evicts at the same level as memory allocation, which is at the slab level. Random eviction requires a simple data structure called the slab table, which is an array of all slabs allocated by far. A server evicts by repeatedly picking a random slab until one that is not currently being accessed is found. Then all items in this slab are evicted before the server re-initialize it to be used by the next item. Random eviction is naive and by no means perfect in itself, but it is conceptually simple- it responses to memory requests instantaneously, and the laws of statistics ensure that a migrating item size distribution will lead to the same change to happen in slab distribution over time. In our opinion, it is perfect as a first attack on the slab calcification problem. It establishes a baseline for future exploration of a better solution.
Option No.3, slab LRU eviction, is introduced in Twemcache 2.5.0. It furthers the search for a good slab-level eviction strategy and tries to be a little cleverer on choosing a slab to evict. It is based on the observation that with random eviction, a recently updated slab containing hot items may be evicted during the blind selection process. This naturally leads us to ask if we can adopt the same criteria as in item LRU eviction, but make it work at the slab-level. Slab LRU eviction is the result of such a thought. It uses a global slab LRU queue, which gets updated whenever an update to an item LRU queue is attempted (in other words, it shares the same update trigger as item LRU eviction). The updated slab is put to the tail of the queue, also in a rate-limited fashion, while eviction attempts are made from the head. It outperforms random eviction in terms of keeping hot data in memory when read requests are highly temporal and never reach beyond a small, recently updated portion of the data. On the other hand, it is more effective in the longer run compared to item LRU eviction if the item size distribution is dynamic.
Twemcache allows any combination of these eviction strategies by setting the corresponding bit and pass the total numeric value to your
--eviction-strategy option. Eviction strategies will be checked from the most significant bit to the least.
A Note on Slab Automove
Automove is the slab-level eviction strategy provided by mainline Memcached. The main difference between slab automove and slab random eviction is that the former only evicts a slab when it meets certain criteria. To quote from the Release Note of Memcached 1.4.11, that criteria is "if a slab class is seen as having the highest eviction count 3 times 10 seconds apart, it will take a page from a slab class which has had zero evictions in the last 30 seconds and move the memory". From the phrasing, it is not difficult to tell that this is a more conservative strategy than both slab random eviction and slab LRU eviction. For example, the slab automover does not kick in if all slab classes have seen evictions within the past 30 seconds, or if there are two alternating slab classes witnessing the highest eviction in each 10 second window. Therefore, one should expect the involvement of slab automover to be dependent on the traffic pattern. We have a note that compares slab automove with random eviction through an experiment. Continue reading about the result in random_eviction.md