A plugin for flume that allows you to use Cassandra as a sink.
Latest commit a93dd65 Jan 13, 2012 @thobbs Bump version to 1.0.0
Failed to load latest commit information.
.gitignore initial version Jan 3, 2012
LICENSE Added failover, license, changed data model. Sep 8, 2010
pom.xml Bump version to 1.0.0 Jan 13, 2012



The flume-cassandra-plugin allows you to use Apache Cassandra as a Flume sink.

Getting Started

  1. Download flume-cassandra-plugin-X.Y.tar.gz from the Downloads section.

  2. Extract the tarball with tar -xzf.

  3. Set $FLUME_CLASSPATH for all terminals which will run Flume master or node:

export FLUME_CLASSPATH=`pwd`/flume-cassandra-plugin-1.0.0.jar:`pwd`/uuid-3.2.0.jar:`pwd`/cassandra-thrift-1.0.6.jar
  1. Modify flume-site.xml (you may start out by copying flume-site.xml.template and removing the body of the file) to include:
    <description>Comma separated list of plugin classes</description>
  1. Start the flume master and a node. The node should log something like:
2011-08-07 21:29:54,793 [main] INFO conf.SinkFactoryImpl: Found sink builder simpleCassandraSink in org.apache.cassandra.plugins.flume.sink.SimpleCassandraSink
2011-08-07 21:29:54,793 [main] INFO conf.SinkFactoryImpl: Found sink builder logsandraSyslogSink in org.apache.cassandra.plugins.flume.sink.LogsandraSyslogSink


This plugin primarily targets log storage right now.

There are two sinks available for use: the SimpleCassandraSink and the LogsandraSyslogSink.

Simple Cassandra Sink

The Simple Cassandra Sink requires four arguments for its constructor:

  1. A keyspace (String). For example, 'Keyspace1'.
  2. A column family name (String) for storing data in.
  3. A column family name (String) for storing indexes in.
  4. A list Cassandra server hostname:port combinations (Strings)

Cassandra must already be configured so that the keyspace and both of the column families must already exist. The index column family should use a TimeUUIDType comparator. The data storage column family can use BytesType.

For example, in cassandra-cli you might create them like:

[default@unknown] connect localhost/9160;
Connected to: "Test Cluster" on localhost/9160
[default@unknown] create keyspace Keyspace1;
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1
[default@Keyspace1] create column family FlumeData;
Waiting for schema agreement...
... schemas agree across the cluster
[default@Keyspace1] create column family FlumeIndexes with comparator = 'TimeUUIDType';
Waiting for schema agreement...
... schemas agree across the cluster

When creating this sink with web UI (which you can access by default at http://localhost:35871/flumeconfig.jsp), you will use a sink like:

simpleCassandraSink("Keyspace1", "FlumeData", "FlumeIndexes", "localhost:9160")

For multiple servers use:

simpleCassandraSink("Keyspace1", "FlumeData", "FlumeIndexes", "localhost:9160", "hostname2:9160", "hostname3:9160")

If you're new to flume and you want to test that the plugin works, I recommend using a Source like asciisynth(20, 100). You should see 20 corresponding entries in each of the column families if you use list in cassandra-cli.

How it Works

When the Cassandra sink receives an event, it does the following:

  1. In the index column family: a. Creates a column where the name is a type 1 UUID (timestamp based) and the value is empty. b. Inserts it into row "YYYYMMDDHH" (the current date and hour) in the given column family.
  2. In the data column family: a. Creates a column where the name is 'data' and the value is the flume event body. b. Inserts it into a row with a key that is the same uuid from step 1.

This allows you to easily fetch all logs for a slice of time. Simply use something like get_slice() on the index column family to get the uuids you want for a particular slice of time, and then multiget the data column family using those uuids as the keys.


"No Cassandra servers available" Exception when trying to write data to a single server setup.

Your keyspace may of defaulted to the org.apache.cassandra.locator.NetworkTopologyStrategy replication strategy resulting in Cassandra being unable to successfully commit. You can check using describe Keyspace1; in the cassandra-cli. Try using the following create keyspace statement instead:

create keyspace Keyspace1 with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};