Initial commit

0 parents commit c596123c396830c86e668d2db0074d34e58ad83a @vaterlaus committed Sep 14, 2010
@@ -0,0 +1,17 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<projectDescription>
+ <name>django_cassandra_backend</name>
+ <comment></comment>
+ <projects>
+ </projects>
+ <buildSpec>
+ <buildCommand>
+ <name>org.python.pydev.PyDevBuilder</name>
+ <arguments>
+ </arguments>
+ </buildCommand>
+ </buildSpec>
+ <natures>
+ <nature>org.python.pydev.pythonNature</nature>
+ </natures>
+</projectDescription>
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<?eclipse-pydev version="1.0"?>
+
+<pydev_project>
+<pydev_property name="org.python.pydev.PYTHON_PROJECT_INTERPRETER">Default</pydev_property>
+<pydev_property name="org.python.pydev.PYTHON_PROJECT_VERSION">python 2.6</pydev_property>
+
+</pydev_project>
@@ -0,0 +1,153 @@
+Introduction
+============
+This is an early development release of a Django backend for the Cassandra database.
+It has only been under development for a short time and there are almost certainly
+issues/bugs with this release -- see the end of this document for a list of known
+issues. Needless to say, you shouldn't use this release in a production setting: the
+format of the data stored in Cassandra may change in future versions, there's no
+promise of backwards compatibility with this version, and so on.
+
+Please let me know if you find any bugs or have any suggestions for how to improve
+the backend. You can contact me at: rob.vaterlaus@bigswitch.com
+
+Installation
+============
+The backend requires the 0.7 version of Cassandra. 0.7 has several features
+(e.g. programmatic creation/deletion of keyspaces & column families, secondary index
+support) that are useful when running as a Django database backend, so I targeted
+that version instead of 0.6. Unfortunately, the Cassandra Thrift API changed between
+0.6 and 0.7 so the two versions are incompatible.
+
+There's a beta1 version of 0.7 available at the Cassandra web site. I'm actually using a
+somewhat later daily binary release dated 8/23. I obtained the daily release by following
+the "Latest Builds" link in the Cassandra downloads page, but the last few times I've
+tried it that link was dead, so I'm not sure what's going on with that. I had switched
+to the 8/23 release, because I had read that there was an issue with the secondary
+index support in the beta1 release and I was trying to get secondary index support
+working in the backend. As it turns out I was still seeing problems with the 8/23
+release, so I wound up disabling the secondary index code (details below), so it's
+possible/probable that the backend will work with the beta1 release, especially
+if you don't try to enable secondary index support (i.e. don't set the db_index
+to True for any of the fields). But I haven't tested with beta1, so no promises.
+
+The backend also requires the Django-nonrel fork of Django and djangotoolbox.
+Both are available here: <http://www.allbuttonspressed.com/projects/django-nonrel>.
+I installed the Django-nonrel version of Django globally in site-packages and
+copied djangotoolbox into the directory where I'm testing the Cassandra backend,
+but there are probably other (better?) ways to install those things.
+
+You also need to generate the Python Thrift API code as described in the Cassandra
+documentation and copy the generated "cassandra" directory (from Cassandra's
+interface/gen-py directory) over to the top-level Django project directory.
+
+To configure a project to use the Cassandra backend all you have to do is change
+the database settings in the settings.py file. Change the ENGINE value to be
+'django_cassandra.db' and the NAME value to be the name of the keyspace to use.
+You can set HOST and PORT to override the default values of 'localhost' and 9160.
+In theory you can also set USER and PASSWORD if you're using authentication with
+Cassandra, but this hasn't been tested yet, so it may not work.
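As a minimal sketch of the settings described above, the database entry in settings.py might look like the following. The keyspace name 'my_keyspace' is a placeholder, and the HOST and PORT values simply spell out the defaults; USER and PASSWORD are omitted because authentication is untested.

```python
# settings.py (sketch): database settings for the Cassandra backend.
DATABASES = {
    'default': {
        'ENGINE': 'django_cassandra.db',
        'NAME': 'my_keyspace',   # placeholder: name of the keyspace to use
        'HOST': 'localhost',     # optional, this is the default
        'PORT': 9160,            # optional, this is the default
    }
}
```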
+
+Configure Cassandra as described in the Cassandra documentation.
+If you want to be able to do range queries over primary keys then you need to set the
+partitioner in the cassandra.yaml config file to be the OrderPreservingPartitioner.
+
+Once you're finished configuring Cassandra start up the Cassandra daemon process as
+described in the Cassandra documentation.
+
+Run syncdb. This creates the keyspace (if necessary) and the column families for the
+models in the installed apps. The Cassandra backend creates one column family per
+model. It will use the db_table value from the meta settings for the name of the
+column family if it's specified; otherwise it uses the default name similar to
+other backends.
+
+Now you should be able to use the normal model and query set calls from your
+Django code.
+
+This release includes a test project and app. If you want to use the backend in
+another project you just need to copy the django_cassandra directory to the
+top-level directory of the project (along with the cassandra and djangotoolbox
+directories).
+
+What Works
+==========
+- the basics: creating model instances, querying (get/filter/exclude), count,
+ update/save, delete, order_by
+- efficient queries for exact matches on the primary key. It can also do range
+ queries on the primary key, but your Cassandra cluster must be configured to use the
+ OrderPreservingPartitioner if you want to do that. Unfortunately, currently it
+ doesn't fail gracefully if you try to do range queries when using the
+ RandomPartitioner, so just don't do that for now :-)
+- inefficient queries for everything else that can't be done efficiently in
+ Cassandra. The basic approach used in the query processing code is to first try
+ to prune the number of rows to look at by finding a part of the query that can
+ be evaluated efficiently (i.e. a primary key filter predicate or a secondary
+ index predicate, once that's working). Then it evaluates the remaining filter
+ predicates over the pruned rows to obtain the final result. If there's no part
+ of the query that can be evaluated efficiently, then it just fetches the entire
+ set of rows and does all of the filtering in the backend code.
+- programmatic creation of the keyspace & column families via syncdb
+- Django admin UI, except for users in the auth application (see below)
+- I think all of the filter operations (e.g. gt, startswith, regex, etc.) are supported
+ although it's possible I missed something
+- complex queries with Q nodes
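The prune-then-filter approach described in the list above can be sketched roughly as follows. This is an illustration only, not the backend's actual code: the function run_query, the in-memory rows dictionary, and the predicate callables are all hypothetical stand-ins for the real Cassandra calls.

```python
def run_query(rows_by_key, pk_value=None, predicates=()):
    """Sketch of prune-then-filter over an in-memory stand-in for a column family."""
    # Step 1: prune. An exact primary key match only has to look at one row.
    # If no part of the query can be evaluated efficiently, fall back to
    # fetching every row.
    if pk_value is not None:
        row = rows_by_key.get(pk_value)
        candidates = [row] if row is not None else []
    else:
        candidates = list(rows_by_key.values())

    # Step 2: evaluate the remaining filter predicates over the pruned rows.
    return [row for row in candidates if all(p(row) for p in predicates)]


rows = {
    'a': {'id': 'a', 'name': 'alice', 'age': 30},
    'b': {'id': 'b', 'name': 'bob', 'age': 25},
    'c': {'id': 'c', 'name': 'carol', 'age': 35},
}
older_than_26 = lambda row: row['age'] > 26

# No efficient predicate: every row is fetched and filtered in Python.
result_scan = run_query(rows, predicates=[older_than_26])
# Primary key predicate: only row 'b' is examined, then filtered.
result_pk = run_query(rows, pk_value='b', predicates=[older_than_26])
```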
+
+What Doesn't Work (Yet)
+=======================
+- Secondary Index Support: There's code in there to use secondary indexes, but
+ I was seeing weird results when I tried to execute Cassandra queries using the
+ secondary indexes so I disabled that code. Hopefully that's just an issue with the
+ specific version of Cassandra I'm using, but I haven't tried it out with a more
+ recent version to see if it's working now. If you're feeling adventurous you could
+ try it out with a newer version of Cassandra and enable the secondary index code
+ by setting the value of SECONDARY_INDEX_SUPPORT_ENABLED to True in predicate.py.
+ You enable secondary index support for fields by setting the db_index argument to
+ True when constructing the field.
+- I haven't tested all of the different field types, so there are probably
+ issues there with how the data is converted to and from Cassandra with some of the
+ field types. My use case was mostly string fields, so most of the testing was with
+ that. I've also tried out date, datetime, time, and decimal fields, so I think
+ those should work too, but I haven't tried anything else.
+- joins
+- chunked queries. It just tries to get everything all at once from Cassandra.
+ Currently the maximum that it can get (i.e. the count value in the Cassandra
+ Thrift API) is set semi-arbitrarily to 10000, so if you try to query over a
+ column family with more rows (or columns) than that it may not work.
+ Probably the value could be set higher than that, but at some point Cassandra
+ fails if it's too big (i.e. it didn't work if I set it to 0x7fffffff).
+ If you want to make it bigger you can change the MAX_FETCH_COUNT variable
+ in compiler.py.
+- ListModel/ListField support from djangotoolbox (I think?). I haven't
+ investigated how this works and if it's feasible to support in Cassandra,
+ although I'm guessing it probably wouldn't be too hard. For now, this means
+ that several of the unit tests from djangotoolbox fail if you have that
+ in your installed apps.
+- there's no way to configure the settings used to create the keyspaces
+ and column families (e.g. replication strategy, replication factor) or the
+ read & write consistency levels used when querying or inserting/mutating
+ columns in Cassandra. My plan was to add global database settings and
+ per-model Meta settings to configure those things, but I haven't gotten to
+ it yet.
+- Cassandra authentication. Actually this may work but I haven't tested it yet.
+ There's code in there that tries to login to Cassandra if the USER and
+ PASSWORD are specified in the database settings, but I've only tested with
+ the AllowAllAuthenticator.
+- probably a lot of other stuff that I've forgotten or am unaware of :-)
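For the missing chunked queries, one possible approach is to page through the rows a fixed number at a time instead of fetching everything in one call. The sketch below is an assumption about how that could work, not something the backend does: fetch_range is a hypothetical in-memory stand-in for a range query whose start key is inclusive, and it assumes rows come back in key order (as they would with the OrderPreservingPartitioner).

```python
def fetch_range(sorted_rows, start_key, count):
    """Hypothetical stand-in for a range query: return up to `count`
    (key, value) pairs with key >= start_key, in key order."""
    return [(k, v) for k, v in sorted_rows if k >= start_key][:count]


def iter_all_rows(sorted_rows, chunk_size):
    """Yield every row, fetching at most chunk_size rows per request."""
    if chunk_size < 2:
        # With an inclusive start key, a chunk of 1 would never advance.
        raise ValueError('chunk_size must be at least 2')
    start_key = ''
    first_page = True
    while True:
        page = fetch_range(sorted_rows, start_key, chunk_size)
        # The start key is inclusive, so every page after the first begins
        # with the last row of the previous page; skip that repeated row.
        new_rows = page if first_page else page[1:]
        for row in new_rows:
            yield row
        if len(page) < chunk_size:
            break  # a short page means there is nothing left to fetch
        start_key = page[-1][0]
        first_page = False


data = [(key, index) for index, key in enumerate('abcdefg')]
all_rows = list(iter_all_rows(data, 3))
```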
+
+Known Issues
+============
+- I haven't been able to get the admin UI to work for users in the Django
+ authentication middleware. I included djangotoolbox in my installed apps, as
+ suggested on the Django-nonrel web site, which got me further, but I still get
+ an error in some Django template code that tries to render a change list (I think).
+ I still need to track down what's going on there.
+- If you enable the authentication and session middleware a bunch of the
+ associated unit tests fail if you run all of the unit tests.
+ This may be related to the issue with editing users in the admin UI.
+- the code needs a cleanup pass for things like the exception handling/safety,
+ some refactoring, more pydoc comments, etc.
+- I have a feeling there are some places where I haven't completely leveraged
+ the code in djangotoolbox, so there may be places where I haven't done
+ things in the optimal way
+- the error handling/messaging isn't great for things like the Cassandra
+ daemon not running, a versioning mismatch between client and Cassandra
+ daemon, etc.
No changes.
No changes.
No changes.
@@ -0,0 +1,88 @@
+from djangotoolbox.db.base import NonrelDatabaseFeatures, \
+    NonrelDatabaseOperations, NonrelDatabaseWrapper, NonrelDatabaseClient, \
+    NonrelDatabaseValidation, NonrelDatabaseIntrospection, \
+    NonrelDatabaseCreation
+
+from thrift import Thrift
+from thrift.transport import TTransport
+from thrift.transport import TSocket
+from thrift.protocol import TBinaryProtocol
+from cassandra import Cassandra
+from cassandra.ttypes import *
+import time
+from .creation import DatabaseCreation
+from .introspection import DatabaseIntrospection
+
+class DatabaseFeatures(NonrelDatabaseFeatures):
+    string_based_auto_field = True
+
+class DatabaseOperations(NonrelDatabaseOperations):
+    compiler_module = __name__.rsplit('.', 1)[0] + '.compiler'
+
+    def sql_flush(self, style, tables, sequence_list):
+        for table_name in tables:
+            self.connection.creation.flush_table(table_name)
+        return ""
+
+class DatabaseClient(NonrelDatabaseClient):
+    pass
+
+class DatabaseValidation(NonrelDatabaseValidation):
+    pass
+
+# TODO: Maybe move this somewhere else? db.utils.py maybe?
+class CassandraConnection(object):
+    def __init__(self, client, transport):
+        self.client = client
+        self.transport = transport
+
+    def commit(self):
+        pass
+
+    def open(self):
+        if self.transport:
+            self.transport.open()
+
+    def close(self):
+        if self.transport:
+            self.transport.close()
+
+class DatabaseWrapper(NonrelDatabaseWrapper):
+    def __init__(self, *args, **kwds):
+        super(DatabaseWrapper, self).__init__(*args, **kwds)
+
+        # Set up the associated backend objects
+        self.features = DatabaseFeatures(self)
+        self.ops = DatabaseOperations(self)
+        self.client = DatabaseClient(self)
+        self.creation = DatabaseCreation(self)
+        self.validation = DatabaseValidation(self)
+        self.introspection = DatabaseIntrospection(self)
+
+        # Get the host and port specified in the database backend settings.
+        # Default to the standard Cassandra settings.
+        host = self.settings_dict.get('HOST')
+        if not host or host == '':
+            host = 'localhost'
+        port = self.settings_dict.get('PORT')
+        if not port or port == '':
+            port = 9160
+
+        # Create the client connection to the Cassandra daemon
+        socket = TSocket.TSocket(host, port)
+        transport = TTransport.TFramedTransport(TTransport.TBufferedTransport(socket))
+        protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
+        client = Cassandra.Client(protocol)
+
+        # Create our connection wrapper
+        self.db_connection = CassandraConnection(client, transport)
+        self.db_connection.open()
+
+        version = client.describe_version()
+        # FIXME: Should do some version check here to make sure that we're
+        # talking to a cassandra daemon that supports the operations we require
+
+        # Set up the Cassandra keyspace
+        keyspace_name = self.settings_dict.get('NAME')
+        self.creation.init_keyspace(keyspace_name)
+
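The FIXME about a version check could be addressed along the following lines. This is only a sketch, not code from the commit: it assumes that describe_version() returns a dotted version string such as '19.4.0', and both the helper name check_api_version and the required major version passed to it are hypothetical.

```python
def check_api_version(version_string, required_major):
    """Raise if the Thrift API major version differs from the one required.

    Assumes version_string is dotted, e.g. '19.4.0' (an assumption about
    what describe_version() returns).
    """
    try:
        major = int(version_string.split('.')[0])
    except ValueError:
        raise ValueError('Unrecognized version string: %r' % (version_string,))
    if major != required_major:
        raise RuntimeError(
            'Cassandra Thrift API version %s is not supported; '
            'expected major version %d' % (version_string, required_major))
    return major
```

In the wrapper this would be called with the value returned by client.describe_version(), right after the connection is opened, so that a mismatch fails early with a readable message.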