tomkaith13/memcachedClone
simple memcached clone using socket programming and threadpool

README for MiniMemcachedServer:

API Details:

This memcached clone implements a small subset of the original
memcached server, based on the memcached text protocol.
The operations implemented are:

1) SET    - set a key/value pair
2) GET    - get the value held by a key
3) DELETE - delete the key/value pair for a given key
4) GETS   - obtain the value along with a 64-bit cas_unique id
5) CAS    - pass the key, the new value, and the cas_unique id; the new value
            is stored only if the cas_unique id has not changed since the GETS
6) QUIT   - close the connection

The details and the syntax of the memcached text protocol can be found here:
https://raw.githubusercontent.com/memcached/memcached/master/doc/protocol.txt
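
For reference, a session using the commands above looks roughly like the
following. This is an illustrative sketch of the standard text protocol; exact
responses depend on this clone's implementation, the expiry field is
effectively unused, and the cas_unique value shown (12) is made up.

    set mykey 0 0 5
    hello
    STORED

    get mykey
    VALUE mykey 0 5
    hello
    END

    gets mykey
    VALUE mykey 0 5 12
    hello
    END

    cas mykey 0 0 5 12
    world
    STORED

    delete mykey
    DELETED

    quit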

===============================================================================
DESIGN CHOICES:
This server was designed and implemented in C++11.
The language was chosen mainly out of interest in implementing many of the
concurrency features myself; it could easily be written in other languages.
C++11's built-in thread support makes the server portable to most Unix
platforms. Also, since memcached itself is written in C, a low-level language
is a reasonable choice for performance. C++11 in particular gives access to
the STL containers and the language's built-in concurrency constructs.

The networking functionality is implemented using the Unix socket programming
APIs; again, portability was the main reason for this choice. Currently, all
socket calls are embedded in the main MiniMemcached class. They should
probably be isolated into a separate class if Windows support ever needs to
be added.

Configuration Limits:
The server configuration parameters are set when the server is instantiated.
The default values of the CONFIG parameters can be found at the start of
MiniMemcachedServer.hpp as macros.
The server admin can also provide custom config parameters via the
MiniMemcached constructor.
An improvement here would be the ability to read the config parameters from a
config file rather than relying on user input.

Here are the config parameters (with their default values):
MAX_BYTES_LIMIT 100
MAX_CONNECTIONS_LIMIT 20
CONNECTION_TIMEOUT 90 (in seconds, although a microseconds option could be added)
MAX_CACHE_SIZE 100 (total cache size)
DEFAULT_PORT "11211"
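
In MiniMemcachedServer.hpp these defaults would appear roughly as the
following macros (a sketch of the layout; the names and values simply mirror
the list above):

    #define MAX_BYTES_LIMIT       100
    #define MAX_CONNECTIONS_LIMIT 20
    #define CONNECTION_TIMEOUT    90        // seconds
    #define MAX_CACHE_SIZE        100       // total cache size
    #define DEFAULT_PORT          "11211"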

MiniMemcached is implemented as a class that uses threads.
The server's main thread is the one that accepts connections
from clients. A total of MAX_CONNECTIONS_LIMIT clients can be connected
to the server simultaneously. Normally, if this number is exceeded,
the user would be shown a message asking them to try again later.
However, in this server's design, we added a condition variable
which causes the server to wait until the total number of accepted clients
drops below MAX_CONNECTIONS_LIMIT. This way, the user does not have to poll,
but can wait until one of the existing connections is terminated via a QUIT
command or a timeout.
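
The wait-for-a-free-slot logic looks roughly like the sketch below. The names
(mConnMutex, mActiveClients, acceptLoop) are illustrative and not taken from
the actual source.

    #include <condition_variable>
    #include <mutex>
    #include <sys/socket.h>

    std::mutex              mConnMutex;
    std::condition_variable mConnCv;
    int                     mActiveClients = 0;

    void acceptLoop(int listenFd, int maxConnections) {
        while (true) {
            {
                // Block until an existing client disconnects and frees a slot.
                std::unique_lock<std::mutex> lock(mConnMutex);
                mConnCv.wait(lock, [&] { return mActiveClients < maxConnections; });
                ++mActiveClients;
            }
            int clientFd = accept(listenFd, nullptr, nullptr);
            // ... hand clientFd off to a thread in the pool ...
        }
    }

    // When a client QUITs or times out, the servicing thread does:
    //     { std::lock_guard<std::mutex> lock(mConnMutex); --mActiveClients; }
    //     mConnCv.notify_one();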

These connections are serviced by the MiniMemcached server via a ThreadPool
which creates MAX_CONNECTIONS_LIMIT threads up front (a simplified sketch of
such a pool appears after the pros and cons below).

The advantages of using a ThreadPool are:
1) The threads are created up front by the server and stay alive
   for reuse.
2) The cost of the underlying system calls for LWP/kernel thread creation
   is paid before any traffic reaches the server, which limits
   user-to-kernel mode switches while serving requests.
3) Threads created by the Pool execute a member function of MiniMemcached.

Disadvantages:
1) ThreadPool == shared mutable state. This means we need mutexes to control
   access to the critical sections of the class. These critical sections should
   be kept very small to avoid blocking.
2) The current threadpool is completely static and implemented by us. It does
   not currently have the ability to grow the number of threads in the pool
   if the need arises. This could be an add-on feature.
3) The number of threads that can be spawned on a single machine is limited.
   The limit can be viewed via the `ulimit -a` command.
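
For context, a fixed-size pool of the kind described above can be shaped
roughly as follows in C++11. This is a simplified sketch, not the actual
ThreadPool class in this repository; the class and member names are
illustrative.

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Minimal fixed-size pool: N worker threads pull tasks off a shared queue.
    class SimpleThreadPool {
    public:
        explicit SimpleThreadPool(size_t n) {
            for (size_t i = 0; i < n; ++i)
                mWorkers.emplace_back([this] { workerLoop(); });
        }
        ~SimpleThreadPool() {
            {
                std::lock_guard<std::mutex> lock(mMutex);
                mStop = true;
            }
            mCv.notify_all();
            for (auto& t : mWorkers) t.join();
        }
        void enqueue(std::function<void()> task) {
            {
                std::lock_guard<std::mutex> lock(mMutex);
                mTasks.push(std::move(task));
            }
            mCv.notify_one();
        }
    private:
        void workerLoop() {
            while (true) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(mMutex);
                    mCv.wait(lock, [this] { return mStop || !mTasks.empty(); });
                    if (mStop && mTasks.empty()) return;
                    task = std::move(mTasks.front());
                    mTasks.pop();
                }
                task();   // e.g. a bound MiniMemcached member function per client
            }
        }
        std::vector<std::thread> mWorkers;
        std::queue<std::function<void()>> mTasks;
        std::mutex mMutex;
        std::condition_variable mCv;
        bool mStop = false;
    };

In this design, each enqueued task would be a bound MiniMemcached member
function that services one client socket.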

Alternative to ThreadPool:
As discussed above, the downside of a ThreadPool is that we can only achieve
vertical scaling, bounded by the resources of a single server. One could
instead use the Actor model, which involves creating processes and passing
messages between them; this would make horizontal scaling easier.

The reason for choosing a threadpool for now is the time constraint and the
ability to build something with simple C++ constructs rather than third-party
libraries.

Coming back to the rest of the functionality: once the server receives the new
socket fd for a client, the processing of that client is delegated to a single
thread in the pool. Each thread waits for commands from its client and keeps
waiting for new data for up to CONNECTION_TIMEOUT seconds.
The connection timeout is implemented using select(2). Alternatives to
select(2), a slow syscall, are libraries like libevent or other asynchronous
I/O facilities provided by the kernel. select(2) was ultimately chosen purely
for portability. The trade-off between performance and portability is best
made when the admin knows the OS/machine the server will run on; portability
matters if the server runs in a cloud that can migrate the compute instance
between OSes.
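
A per-client read with a timeout via select(2) can be sketched as follows.
This is illustrative; the actual buffer sizes and error handling in the
server may differ.

    #include <sys/select.h>
    #include <unistd.h>

    // Wait up to timeoutSec for data on clientFd, then read one chunk.
    // Returns false on timeout, error, or client hang-up.
    bool readWithTimeout(int clientFd, int timeoutSec,
                         char* buf, size_t len, ssize_t& nRead) {
        fd_set readFds;
        FD_ZERO(&readFds);
        FD_SET(clientFd, &readFds);

        struct timeval tv;
        tv.tv_sec  = timeoutSec;   // CONNECTION_TIMEOUT
        tv.tv_usec = 0;

        int ready = select(clientFd + 1, &readFds, nullptr, nullptr, &tv);
        if (ready <= 0)            // 0 == timed out, < 0 == error
            return false;

        nRead = read(clientFd, buf, len);
        return nRead > 0;          // 0 == client closed the connection
    }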

The clients currently can connect to the server via a simple telnet command.
Example:
telnet <ip-address> <port-number>

Once the connection is successful, the user receives a '>' prompt
to enter further commands.

In the current implementation, command input validation is strict.
Any deviation from the syntax described in the protocol document will not
be handled and will generate an ERROR message. Extra whitespace is
not handled currently. This could be an improvement for later.

The cache limit can be set via the class constructor; the default value is
MAX_CACHE_SIZE. A good improvement here would be to add a cache eviction
algorithm such as Least Recently Used (LRU) or Least Frequently Used (LFU);
a sketch of what an LRU policy could look like follows.
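
If LRU eviction were added later, one possible shape for it is sketched
below. Nothing like this exists in the current code; the class and names are
purely illustrative.

    #include <list>
    #include <string>
    #include <unordered_map>

    // Sketch of an LRU-evicting map: most recently used keys live at the
    // front of the list; on overflow the back (least recently used) is dropped.
    class LruCache {
    public:
        explicit LruCache(size_t capacity) : mCapacity(capacity) {}

        void set(const std::string& key, const std::string& value) {
            auto it = mIndex.find(key);
            if (it != mIndex.end())
                mOrder.erase(it->second);                 // drop the stale node
            mOrder.push_front({key, value});
            mIndex[key] = mOrder.begin();
            if (mOrder.size() > mCapacity) {              // evict the LRU entry
                mIndex.erase(mOrder.back().first);
                mOrder.pop_back();
            }
        }

        bool get(const std::string& key, std::string& value) {
            auto it = mIndex.find(key);
            if (it == mIndex.end()) return false;
            mOrder.splice(mOrder.begin(), mOrder, it->second);   // mark as MRU
            value = it->second->second;
            return true;
        }

    private:
        size_t mCapacity;
        std::list<std::pair<std::string, std::string>> mOrder;
        std::unordered_map<std::string,
            std::list<std::pair<std::string, std::string>>::iterator> mIndex;
    };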

Another improvement that is currently not handled by this version is a
client-side API that would enable sending and receiving key/value pairs to and
from the server programmatically rather than via telnet.

The use of a single mutex for both reads and writes creates a choke point when
multiple readers try to read the same data. The easy solution is to use
shared_mutex, C++'s built-in read-write lock, which lets readers proceed
concurrently. However, C++11 does not have built-in support for shared_mutex;
std::shared_mutex only arrives in C++17, which was not available in the Xcode
toolchain used here.
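
If the toolchain were upgraded, the read and write paths could be guarded
roughly as follows. This is a C++17 sketch, not part of the current C++11
code, and the map shown here is a stand-in for the server's internal store.

    #include <shared_mutex>
    #include <string>
    #include <unordered_map>

    std::shared_mutex gCacheMutex;                        // hypothetical guard
    std::unordered_map<std::string, std::string> gCache;  // hypothetical store

    std::string readKey(const std::string& key) {
        std::shared_lock<std::shared_mutex> lock(gCacheMutex);  // many readers at once
        auto it = gCache.find(key);
        return it == gCache.end() ? std::string() : it->second;
    }

    void writeKey(const std::string& key, const std::string& value) {
        std::unique_lock<std::shared_mutex> lock(gCacheMutex);  // exclusive for writers
        gCache[key] = value;
    }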


===============================================================================
HOW TO BUILD:
The project comes with an Xcode project which should work on any Mac.
If you prefer building via the command line, just unzip the archive
and issue the command:
bash# g++ -std=c++11 main.cpp ./src/*cpp -I ./src -o memcached

followed by:
bash# ./memcached

Please ensure the build toolchain is installed on your machine.
If you are running it on a Mac, install Xcode.


===============================================================================
TESTING:
In order to run this server, you need a small driver program in C++11.
The easiest way to go about it is to use MiniMemcached's constructor:

MiniMemcached(string portNum = DEFAULT_PORT,
    int connectionCount = MAX_CONNECTIONS_LIMIT,
    int cacheSize = MAX_CACHE_SIZE,
    int connectTimeout = CONNECTION_TIMEOUT,
    string configFile = "");

As mentioned above, defaults can be found as macros in MiniMemcached.hpp

Once we create the server object, all we need to do is run
serverObj.initServer();

This starts the server. Ideally, it would run as a daemon; however,
that could be phase two of the project.

serverObj.shutdown() is meant for a graceful shutdown: it stops the threadpool
and frees the internally allocated memory. However, the server currently
expects to be terminated by a SIGINT (Ctrl+C).
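
Putting this together, a driver similar in spirit to main.cpp would look
roughly like the sketch below (see the actual main.cpp in /src for the real
usage; the header name follows the earlier mention of MiniMemcachedServer.hpp).

    #include "MiniMemcachedServer.hpp"

    int main() {
        // Use the defaults (DEFAULT_PORT, MAX_CONNECTIONS_LIMIT,
        // MAX_CACHE_SIZE, CONNECTION_TIMEOUT).
        MiniMemcached serverObj;

        serverObj.initServer();   // start accepting client connections

        // Currently the process is expected to be stopped with SIGINT,
        // so this graceful path may not be reached.
        serverObj.shutdown();     // stop the threadpool, free internal memory
        return 0;
    }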

Once the server is running, a user can connect to the server via
bash# telnet <localhost/ip address> 11211

The usage of this server can be seen in the main.cpp found in /src

Test Automation of the current project can only be done via expect
scripts.
======================== Actual Requirements ===================================


This exercise is meant to demonstrate:
1) An understanding of servers, networking, and protocols.
2) An understanding of concurrency, performance, and resource constraints, and
   an ability to anticipate future issues and implement solutions.
3) An ability to write clear, easy to understand code, communicate your
   approach, and reason about tradeoffs that you have made.

The exercise:
1) Write a memcache server that has support for the Get, Set, and Delete
   methods using either the Memcache Text Protocol or the Memcache Binary
   Protocol. Include support for Compare And Swap, either by implementing the
   GETS and CAS methods in the text protocol or by using the CAS field in the
   binary protocol. You don't need to implement support for the expiry fields.
2) Support a configurable limit for the resource consumption of your service.
3) Handle concurrency issues that arise in an environment with multiple
   simultaneous writers and readers.

Discussion:
1) Tell us what you expect the performance of your implementation to be, and
   where you anticipate bottlenecks.
2) Suggest ways in which the performance, reliability, or other limitations of
   your implementation could be improved upon in the future.
3) Suggest or implement ways in which your service could be monitored and
   managed in a production setting.
4) Show how you tested your server.
5) Provide instructions on what we need to do to build and run your server
   locally on a standard unix-like development environment (e.g. Mac laptop or
   Linux server).

Notes:
1) You should choose an appropriate language and runtime given the requirements
   of the exercise. Feel free to run your choice by us if you have questions.
2) You may use basic collections and support classes from the language of your
   choice, but you are responsible for implementing the resource constraints,
   eviction algorithm, and concurrency control.
3) We don't expect you to reimplement all of memcache in a few hours. (Please
   don't try to!) It's more important that your solution work and be complete,
   and that you can reason about the performance tradeoffs and potential
   concurrency issues that you'd expect given real-world usage.
4) Please provide all source code via a Git repository – create a local repo
   for your code, and when the exercise is complete, please tar or zip your
   main directory and send it along to us (and please remember to include your
   .git directory/files). If you're not comfortable with Git, you may simply
   provide a tgz of the source code.

