Dear Twemcache team,
I would like to ask about the scalability of Twemcache. What is the maximum throughput you have achieved, in requests per second, and how many cores have you been able to scale up to?
Based on my own experience, I could saturate only two (Xeon) cores with a single twemcache process achieving a maximum throughput of 320K requests per second. My request mix is 95% reads to 5% writes and each request reads/writes a 1KB record.
Is there any evidence that Twemcache can scale to a higher number of cores, or to a higher maximum number of concurrent requests?
Could you please share your experience with Twemcache scalability issues? Are there any known scalability bottlenecks that I should consider in my tuning process?
I believe your numbers are in the same ballpark as what others have done. And in our own throughput tests, we've observed similar performance between memcached trunk and twemcache.
Addressing the existing bottlenecks (global locks, mostly) is on our roadmap. We've dealt with stats_lock (gotta do the yak-shaving before taking down the beast), but that alone won't prove of much help until we've taken care of the others as well. The next one will be cache_lock. Until then, we expect to have comparable performance with memcached.
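To make the global-lock point concrete, here is a minimal sketch of the general pattern. It is written in Python for brevity and is not twemcache's actual C code. The names `GlobalStats`, `PerThreadStats` and `run_workers` are hypothetical. It contrasts a counter behind a single lock, which every worker thread contends on, with per-thread slots that are only summed when the stats are read. Because of Python's GIL this shows the structure rather than the speedup.

```python
import threading


class GlobalStats:
    """One counter behind one lock: every worker thread contends on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = 0

    def incr(self):
        with self._lock:
            self._hits += 1

    def total(self):
        with self._lock:
            return self._hits


class PerThreadStats:
    """One slot per worker; no lock on the hot path, slots are summed on read."""

    def __init__(self, nthreads):
        self._slots = [0] * nthreads

    def incr(self, tid):
        # Only thread `tid` ever writes to its own slot.
        self._slots[tid] += 1

    def total(self):
        # May be slightly stale while workers are still running.
        return sum(self._slots)


def run_workers(nthreads, ops, incr):
    """Start `nthreads` workers that each call incr(tid) `ops` times."""

    def work(tid):
        for _ in range(ops):
            incr(tid)

    threads = [threading.Thread(target=work, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    g = GlobalStats()
    run_workers(4, 10_000, lambda tid: g.incr())
    p = PerThreadStats(4)
    run_workers(4, 10_000, p.incr)
    print(g.total(), p.total())
```

The trade-off is that reads become more expensive (and approximate while workers are running) in exchange for an uncontended write path, which is usually acceptable for statistics. A lock that protects the cache structures themselves cannot be split this simply.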
The performance tuning we've done shows that 4 and 8 threads are both reasonable numbers to use; beyond that point, contention makes scaling hard. On CPU consumption, we notice that the binary itself does very little computation. Usually the CPU cycles are attributed to sending data over the socket (sendmsg()), which is highly workload-dependent.
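As a rough back-of-the-envelope check on how much data that workload pushes through the socket, the calculation below uses the numbers quoted in the question (320K requests per second, 1KB records). It assumes that 1KB means 1024 bytes, that every request moves one full record, and that protocol and TCP/IP overhead can be ignored. The helper name `payload_gbit_per_s` is made up for this example.

```python
def payload_gbit_per_s(requests_per_s, record_bytes):
    """Payload bandwidth in Gbit/s, ignoring protocol and TCP/IP overhead."""
    return requests_per_s * record_bytes * 8 / 1e9


if __name__ == "__main__":
    # Numbers from the question: 320K requests/s, 1KB (assumed 1024 B) records.
    rate = payload_gbit_per_s(320_000, 1024)
    print(f"{rate:.2f} Gbit/s")  # prints "2.62 Gbit/s"
```

Under those assumptions the payload alone is about 2.6 Gbit/s, so it is worth checking whether the network path, rather than the cache, is the limiting factor at that request rate.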