This is a list of fun project ideas that no one is currently working on.
One of the primary problems for a practical distributed system is knowing the state of the system. Voldemort has a rudimentary GUI that provides basic information. This project would be to build a first-rate management GUI and the corresponding control functionality, so that operators can see the performance and availability of each node in the system and perform more intensive operations such as starting / stopping nodes, restoring from replication, rebalancing, etc.
Voldemort comes with a very simple text shell. A better way to build such a thing is to fully integrate a language with an interpreter and provide a set of predefined administrative commands as functions in the shell. Scala has a flexible syntax and integrates easily with Java so it would be a good choice for such a shell.
Voldemort supports a pluggable storage engine interface. For example, we have a Krati-based storage engine in contrib. Although we have dramatically improved the performance of the BDB-JE storage engine, we definitely want to try out other databases, since no single database works for all workloads. The first phase of this project would be building JNA / JNI bindings for the storage engine, followed by integration with Voldemort.
Contributions to https://github.com/vinothchandar/rocksdb-jna would be a good way to start. There is also https://github.com/ankgup87/rocksdbjni
Now that Voldemort has a Coordinator service that exposes a REST endpoint for Voldemort operations, it is time to get cracking on clients for different languages and frameworks such as Ruby, Node.js, PHP, and Python.
An easy project would be to provide the same API as Memcache.
We want to add the ability to push jars to a central Maven repository. There have been active discussions about this on the forums, so please get some context from there before diving in.
Storage systems have become much more specialized in recent years, with each system providing expertise in certain areas: Hadoop and proprietary data warehouses provide batch processing capabilities, search indexes provide support for complex ranked text queries, and a variety of distributed databases have sprung up. Voldemort is a specialized key-value system, but the same data stored in Voldemort may need to be indexed by search, churned over in Hadoop, or otherwise processed by another system. Each of these systems needs the ability to subscribe to the changes happening in Voldemort and get a stream of such changes that it can process in its own specialized way.
Indeed, even Voldemort nodes could subscribe to one another as a quick catch-up mechanism for recovering from failure.
Amazon has implemented this functionality as a “Merkle tree” data structure in their Dynamo system, which allows nodes to compare their contents quickly and catch up on the differences they have missed, but this is not the only approach. It could be a simple secondary index that implements a node-specific logical counter that tracks a modification number for each key.
The API provided would be something like getAllChangesSince(int changeNumber), and it would return the latest change for each key.
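To make the secondary-index idea concrete, here is a minimal in-memory sketch of a node-local logical counter. Everything in it is an assumption for illustration only: the names ChangeIndex, Change, and recordWrite are hypothetical and not part of Voldemort, keys are plain strings, and the change number is a long rather than the int in the signature above. A real implementation would need to persist the index alongside the storage engine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical node-local change index; a sketch, not Voldemort code. */
final class ChangeIndex {

    /** A key together with the change number of its most recent write. */
    static final class Change {
        final long changeNumber;
        final String key;

        Change(long changeNumber, String key) {
            this.changeNumber = changeNumber;
            this.key = key;
        }
    }

    // Node-specific logical counter, incremented on every write.
    private long counter = 0;
    // Latest change number recorded for each key.
    private final Map<String, Long> latestByKey = new HashMap<>();
    // Change number -> key, holding only the latest change per key, in order.
    private final TreeMap<Long, String> keyByChange = new TreeMap<>();

    /** Records a write to the given key and returns its new change number. */
    synchronized long recordWrite(String key) {
        long changeNumber = ++counter;
        Long previous = latestByKey.put(key, changeNumber);
        if (previous != null) {
            // Drop the superseded entry so each key appears at most once.
            keyByChange.remove(previous);
        }
        keyByChange.put(changeNumber, key);
        return changeNumber;
    }

    /** Returns the latest change for every key modified after changeNumber, oldest first. */
    synchronized List<Change> getAllChangesSince(long changeNumber) {
        List<Change> changes = new ArrayList<>();
        // tailMap(..., false) excludes changeNumber itself.
        for (Map.Entry<Long, String> e : keyByChange.tailMap(changeNumber, false).entrySet()) {
            changes.add(new Change(e.getKey(), e.getValue()));
        }
        return changes;
    }
}
```

A subscriber would remember the highest change number it has processed and pass it back on its next call. Because superseded entries are dropped, the index grows with the number of distinct keys rather than with the number of writes, at the cost of not exposing intermediate versions.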