Intrinsic datastores for Node.js (nodejsdb)

This is an experimental project. Runnable artefacts will be published as standalone npm modules by various authors.

Add your comments in the form of Issues, or contribute to this discussion.

News

Proposed base Synchronous API is here.

Rationale

A few years ago, server-side JavaScript was unimaginable. Today, at the beginning of 2012, more and more businesses rely on the high performance, low development costs, short time-to-market, and explosively growing library ecosystem of the Node.js platform. However, Node.js is not an exception but rather a confirmation of the rule that JavaScript is the most potent environment for software evolution available today. Other notable JavaScript ecosystems with explosive growth are Firefox Extensions, OS X Dashboard Widgets, Chrome Extensions, and of course the client side of the web, with millions of libraries, frameworks and applications.

However, database development is a very important area where this kind of explosive evolution is desperately needed but is not happening. We have only a handful of projects to choose from, and even fewer architectural models. Problems like clustering, interfacing, query languages, and persistence strategies are currently mostly in the domain of lower-level languages. Intrinsic datastores for Node.js is an attempt to support this portion of the Node.js ecosystem.

What we inevitably see during the evolution of almost any database product is its extension with some form of secondary language (the query language being the primary one). This comes either in the form of stored procedures (e.g. T-SQL, PL/SQL) or a scripting language (e.g. Lua in Redis).

So the idea here is to bring datastore functionality and scripting into the same process, as dedicated databases do with stored procedures, but from the opposite direction: bring the database to the scripting environment.

Advantages, when building standalone database servers:

  • Utilization of the Node.js platform and its ecosystem to evolve database products.

Advantages, when using this approach to join the application and the database layer:

  • The OS will not have to process the extra TCP/IP or IPC that occurs with out-of-process databases.
  • Data access latency will be lower.
  • In simple implementations, data access may be synchronous.
  • Simplified software stack.
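
To illustrate the point about synchronous access, here is a minimal sketch of an in-process store. The `InProcessStore` class and its `set`/`get`/`del` methods are hypothetical names chosen for this example, backed by a plain JavaScript `Map`; they are not the project's proposed API. Because the data lives in the same process as the application, a read is an ordinary function call that returns a value directly, with no socket round-trip and no callback.

```javascript
// Hypothetical in-process key-value store (illustration only, not the nodejsdb API).
class InProcessStore {
  constructor() {
    this.data = new Map();
  }

  // Store a value under a key; returns the store so calls can be chained.
  set(key, value) {
    this.data.set(key, value);
    return this;
  }

  // Synchronous read: the value is returned directly to the caller.
  get(key) {
    return this.data.get(key);
  }

  // Returns true if the key existed and was removed.
  del(key) {
    return this.data.delete(key);
  }
}

const store = new InProcessStore();
store.set('user:1', { name: 'Ada' });
const user = store.get('user:1');
console.log(user.name);
```

An out-of-process database would need a network or IPC request for the same read, which is where the extra latency and the asynchronous interface come from.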

Known Efforts

Native

In-VM

  • hive - "In memory store for Node JS"
  • nStore - "uses a safe append-only data format for quick inserts, updates, and deletes. Also a index of all documents and their exact location on the disk is stored in in memory for fast reads of any document"
  • node-tiny - "largely inspired by nStore, however, its goal was to implement real querying which goes easy on the memory"
  • BarricaneDB - "a transparent object persistence mechanism"
  • chaos - "we exploit the sha1 chaotic randomness to store the keys evenly in the filesystem"
  • Alfred - "a fast in-process key-value store for node.js"
  • awesome - "A Redis implementation in node.js"
  • nedis - "Redis server implementation written with nodejs"
  • EventVat - "evented in-process key/value store with an API like that of Redis"
  • PouchDB - "Portable CouchDB JavaScript implementation"

Scratchpad

  • Is V8 good for in-memory data storage? Data would be a first-class citizen and a lot of wheel-reinventing could be avoided. V8 translates JS directly into machine code; how can this best be leveraged? A simple test on Node v0.7.4 on a Mac with a 2.8 GHz dual-core CPU showed that V8 starts to choke at about 40M objects. Given the high level of optimization already done in V8, it is probably safe to assume that no significant improvement of V8's GC is possible on the current architecture, at least not without significant memory overhead or a significant rewrite of the current implementation. New possibilities are on the horizon in the form of GPU-supported GC, albeit patent-encumbered (see here and here), but we have already seen much worse situations where patent-free solutions were developed by working around existing patents, e.g. WebM vs. H.264. However, at present (2/2012), it is best not to consider Node/V8 a good store for a large number of objects.
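
The test mentioned above is not included in this README. A measurement of that general kind might look like the following sketch, which allocates a number of small objects and reports heap growth using `process.memoryUsage()`. The function name `measureHeap`, the object shape, and the object count are assumptions made for illustration; the count is deliberately small so the script finishes quickly.

```javascript
// Allocate `count` small objects and estimate how much V8 heap each one uses.
// Illustrative sketch only; not the test referred to in the text.
function measureHeap(count) {
  const before = process.memoryUsage().heapUsed;
  const objects = new Array(count);
  for (let i = 0; i < count; i++) {
    objects[i] = { id: i, value: 'v' + i };
  }
  const after = process.memoryUsage().heapUsed;
  return {
    count: objects.length,
    // Rough estimate: garbage collection during the loop can skew this figure.
    bytesPerObject: (after - before) / count,
  };
}

const result = measureHeap(100000);
console.log(result.count + ' objects, ~' + Math.round(result.bytesPerObject) + ' bytes each');
```

Raising the count step by step and timing each run is one way to look for the point where allocation slows down on a given machine and Node version.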

Plans

  • decide on a good set of primitive data structures and operations that can model the most common DB use cases, including pub-sub

    • binary safe keys and data

    • map, ordered-map, deque

    • key timeouts

    • event emitter

    • atomic ops? transactions? (plan ahead for the concurrent impl?)

  • provide drop-in replaceable implementations with varying tradeoffs:

    • all data in memory (with or without secondary storage) - fastest, RAM-limited

    • all keys in memory - still pretty fast, less RAM-limited

    • keys and data in memory on-demand

  • provide further variations:

    • single-process - fast, but multiple cores and multiple Node processes cannot work with the same data; clustering must be applied

    • shared-memory implementation - some overhead and latency, but higher total performance above a certain number of cores (atomic ops and an async API are necessary at this point)
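
A few of the planned primitives can be sketched together to show how they might interact in the simplest variant (all data in memory, single process). Everything below is hypothetical: the `SketchStore` class, its method names, the hex encoding of `Buffer` keys, and the lazy expiry strategy are assumptions made for this example, not decisions the project has taken. The clock is injectable so that key timeouts can be exercised without real timers.

```javascript
const { EventEmitter } = require('events');

// Hypothetical store combining binary-safe keys, key timeouts and events
// (illustration only, not the nodejsdb API).
class SketchStore extends EventEmitter {
  // `now` returns the current time in milliseconds; injectable for testing.
  constructor(now = Date.now) {
    super();
    this.now = now;
    this.entries = new Map(); // encoded key -> { value, expiresAt }
  }

  // Binary-safe keys: Buffers are hex-encoded so equal bytes map to the same entry.
  static encode(key) {
    return Buffer.isBuffer(key) ? 'b:' + key.toString('hex') : 's:' + String(key);
  }

  // Store a value; `ttlMs` is an optional time-to-live in milliseconds.
  set(key, value, ttlMs) {
    const expiresAt = ttlMs === undefined ? Infinity : this.now() + ttlMs;
    this.entries.set(SketchStore.encode(key), { value, expiresAt });
    this.emit('set', key, value); // hook that pub-sub could build on
  }

  get(key) {
    const id = SketchStore.encode(key);
    const entry = this.entries.get(id);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      // Lazy expiry: a timed-out key is dropped the next time it is read.
      this.entries.delete(id);
      this.emit('expired', key);
      return undefined;
    }
    return entry.value;
  }
}

let clock = 0; // fake clock, in milliseconds
const db = new SketchStore(() => clock);
let setCount = 0;
db.on('set', () => { setCount++; });

db.set(Buffer.from([0x00, 0xff]), 'binary key');
db.set('session', 'abc', 1000);
clock = 1500; // advance past the session key's timeout
const expired = db.get('session');
```

A drop-in replaceable implementation, such as one keeping only keys in memory, would expose the same `set` and `get` methods and change only what sits behind `this.entries`.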

API

The proposed base synchronous API is here.

An asynchronous API will be added in the future.

Notes
