Intrinsic datastores for Node.js (nodejsdb)
An experimental project. Runnable artefacts will be published as standalone npm modules by various authors.
Proposed base Synchronous API is here.
However, a very important area where this kind of explosive evolution is desperately needed, but where it is not happening, is database development. We have only a handful of projects to choose from, and even fewer architectural models. Problems such as clustering, interfacing, query languages, and persistence strategies are currently mostly the domain of lower-level languages. Intrinsic datastores for Node.js is an attempt to support this portion of the Node.js ecosystem.
What we inevitably see during the evolution of almost any database product is its extension with some form of secondary language (the query language being the primary one). This comes either in the form of stored procedures (e.g. T-SQL, PL/SQL) or an embedded scripting language (e.g. Lua in Redis).
So the idea here is to bring datastore functionality and scripting into the same process, just as dedicated databases do with stored procedures, but this time the other way around: bring the database to the scripting environment.
Advantages, when building standalone database servers:
- Utilization of the Node.js platform and its ecosystem to evolve database products.
Advantages, when using this approach to join the application and the database layer:
- The OS will not have to handle the extra TCP/IP or IPC traffic that out-of-process databases incur.
- Data access latency will be lower.
- In simple implementations, data access may be synchronous.
- Simplified software stack.
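To make the last two advantages concrete, here is a minimal sketch of what a synchronous, in-process key:value store could look like. The class and method names (`SyncStore`, `get`/`set`/`del`) are illustrative assumptions, not part of any published nodejsdb module.

```javascript
// A minimal in-process key:value store sketch. Because it lives in the
// same process as the application, a read is a plain function call:
// no sockets, no serialization, no callbacks required.
class SyncStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  del(key) {
    return this.data.delete(key);
  }
}

const store = new SyncStore();
store.set('user:1', { name: 'Ada' });
const user = store.get('user:1'); // synchronous access, no round trip
```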
- Rawhash - in-memory key:value cache where keys are binary Buffers (mem, kv)
- Node-LevelDB - Node.js bindings to LevelDB, with SSTable disk storage approach (disk, kv)
- node-cask - Bitcask clone for node, based on node-mmap (disk, kv)
- node-gdbm - interface to GNU GDBM (disk, kv)
- node-sqlite3 - "Asynchronous, non-blocking SQLite3 bindings for Node.js" (disk, sql)
- node-memcache - in-process memcached for Node.js (mem, kv)
- hive - "In memory store for Node JS"
- nStore - "uses a safe append-only data format for quick inserts, updates, and deletes. Also a index of all documents and their exact location on the disk is stored in in memory for fast reads of any document"
- node-tiny - "largely inspired by nStore, however, its goal was to implement real querying which goes easy on the memory"
- BarricaneDB - "a transparent object persistence mechanism"
- chaos - "we exploit the sha1 chaotic randomness to store the keys evenly in the filesystem"
- Alfred - "a fast in-process key-value store for node.js"
- awesome - "A Redis implementation in node.js"
- nedis - "Redis server implementation written with nodejs"
- EventVat - "evented in-process key/value store with an API like that of Redis"
Is v8 good for in-memory data storage? Data would be a first-class citizen and a lot of wheel-reinventing could be avoided. v8 compiles JavaScript directly to machine code; how can this best be leveraged?

A simple test on Node v0.7.4 (Mac, 2.8 GHz dual-core) revealed that around 40M objects is where v8 starts to choke. Given the high level of optimization already done in v8, it is probably safe to assume that any significant improvement of v8's GC on the current architecture is not possible, at least not without adding significant memory overhead or significantly rewriting the current implementation. New possibilities are on the horizon in the form of GPU-supported GC, albeit patent-encumbered (see here and here), but we have already seen far worse situations where patent-free solutions were developed by working around existing patents, e.g. WebM vs. H.264, and many others. At the present time (2/2012), however, it is best not to consider Node/v8 a good store for a large number of objects.
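The allocation test described above can be reproduced with a loop of roughly this shape. The count is kept small here; on the machine mentioned above, the choking point was around 40M objects.

```javascript
// Sketch of an object-allocation stress test of the kind described above.
// Raise COUNT toward 4e7 to observe the GC pressure; kept small here.
const COUNT = 1e5;

function allocate(n) {
  const objs = new Array(n);
  for (let i = 0; i < n; i++) {
    objs[i] = { id: i, payload: 'x'.repeat(8) };
  }
  return objs;
}

const start = Date.now();
const objs = allocate(COUNT);
const elapsed = Date.now() - start;
console.log(objs.length + ' objects allocated in ' + elapsed + ' ms');
```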
Decide on a good set of primitive data structures and operations that would allow modelling the most common DB use cases, including pub-sub:
- binary-safe keys and data
- map, ordered-map, deque
- atomic ops? transactions? (plan ahead for the concurrent implementation?)
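In today's JavaScript, the three proposed primitives can be sketched as follows. The binary-safe key encoding and the sort-on-demand ordered map are illustrative choices, not a settled design.

```javascript
// map: binary-safe keys can be modeled by encoding Buffer contents into
// a string key; 'latin1' maps each byte to exactly one code unit.
const map = new Map();
const binKey = Buffer.from([0x00, 0xff]).toString('latin1');
map.set(binKey, 'value');

// ordered-map: sorted keys enable range scans and prefix queries;
// shown here by sorting on demand (a real store would keep a sorted index).
map.set('a', 1);
map.set('c', 3);
map.set('b', 2);
const sortedKeys = [...map.keys()].sort();

// deque: double-ended queue, a building block for queues and pub-sub
// fan-out; an array illustrates the operation set, though shift/unshift
// are O(n) and a real deque implementation would avoid that.
const deque = [];
deque.push('back');           // push at the back
deque.unshift('front');       // push at the front
const first = deque.shift();  // pop from the front
const last = deque.pop();     // pop from the back
```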
Provide drop-in replaceable implementations with varying tradeoffs:
- all data in memory (with or without secondary storage) - fastest, RAM-limited
- all keys in memory - still pretty fast, less RAM-limited
- keys and data loaded into memory on demand

Provide further variations:
- single-process - fast, but multiple cores and multiple Node instances cannot work with the same data; clustering must be applied
- shared-memory implementation - some overhead and latency, but higher total performance above a certain number of cores (atomic ops and an async API become necessary at this point)
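Drop-in replaceability mostly comes down to a shared interface. A sketch, assuming hypothetical names (`MemoryBackend`, `KeysInMemoryBackend`, `createStore`) that are not part of any existing module:

```javascript
// Two backends sharing one get/set interface: an application written
// against the interface can trade RAM for speed by swapping constructors.

// All data in memory: fastest, RAM-limited.
class MemoryBackend {
  constructor() { this.m = new Map(); }
  get(k) { return this.m.get(k); }
  set(k, v) { this.m.set(k, v); }
}

// All keys in memory: values would live in slower secondary storage
// behind the same signatures; a second Map stands in for that here.
class KeysInMemoryBackend {
  constructor() { this.keys = new Set(); this.slowValues = new Map(); }
  get(k) { return this.keys.has(k) ? this.slowValues.get(k) : undefined; }
  set(k, v) { this.keys.add(k); this.slowValues.set(k, v); }
}

function createStore(Backend) { return new Backend(); }

const fast = createStore(MemoryBackend);
const lean = createStore(KeysInMemoryBackend);
fast.set('a', 1);
lean.set('a', 1);
```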
Asynchronous API will be added in the future.
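One plausible shape for that asynchronous API is the Node-style trailing callback, mirroring each synchronous call one-to-one. This is an illustrative sketch, not a specification.

```javascript
const data = new Map();

// Synchronous form: a direct return value.
function getSync(key) {
  return data.get(key);
}

// Asynchronous form: the same operation with a trailing (err, value)
// callback. setImmediate stands in for the event-loop deferral that a
// disk- or shared-memory-backed implementation would actually need.
function getAsync(key, cb) {
  setImmediate(() => cb(null, data.get(key)));
}

data.set('k', 'v');
const syncValue = getSync('k');  // 'v', immediately
getAsync('k', (err, value) => {
  // value is 'v', delivered on a later event-loop turn
});
```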