Lucandra - A Cassandra Backend for Lucene/Solr
By Jake Luciani -
Lucandra provides a Lucene IndexReader and IndexWriter that interfaces with Cassandra.
Solr is also supported.
Mailing List:
Useful articles:
You can see Lucandra in action here:
Bookmarks demo:
Lucandra includes a Delicious-like bookmarks demo to get you started. To build it, run the following:
1. Set up Cassandra 0.6 with storage-conf.xml in config
2. ant lucandra.jar
3. ant test -Dcassandra.port=9160 -Dcassandra.framed=false
#edit run-demo with appropriate settings (delicious clone)
4. -index bookmarks.tsv
5. -search title:linu*
Solr example:
Lucandra also supports Solr. To build it, run the following:
1. Set up Cassandra 0.6 with storage-conf.xml in config
2. ant lucandra.jar
3. cd solr-example; java -jar start.jar
4. cd exampledocs; ./ *.xml
5. surf to http://localhost:8983/solr/admin/
Storing an inverted index in Cassandra was the initial use-case for Cassandra at Facebook.
The Cassandra wiki discusses this:
"You can think of each super column name as a term and the columns within as the docids
with rank info and other attributes being a part of it. If you have keys as the userids
then you can have a per-user index stored in this form. This is how the per user index
for term search is laid out for Inbox search at Facebook."
Initially we implemented Lucene support with supercolumns as described, but we ran into
a major scaling issue when we tried to index all of Wikipedia.
It turns out that Cassandra keeps the supercolumn in memory for a given key.
Also, all columns for a key are tied to one Cassandra node, so we don't gain much scalability this way.
Thankfully, Cassandra recently added support for distributed ordered keys, which
allows us to use keys to store index terms without supercolumns.
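To illustrate why ordered keys help, here is a minimal sketch that uses an in-memory TreeMap as a stand-in for an ordered key space. It does not talk to Cassandra and is not Lucandra's code; the class name OrderedKeySketch, its methods, and the "/" separator in the sample keys are assumptions made for the example. The point it shows is that when term keys sort lexicographically, a wildcard such as title:linu* becomes a single range scan over adjacent keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for an ordered key space: term key -> document ids.
final class OrderedKeySketch {
    private final TreeMap<String, List<String>> rows = new TreeMap<>();

    // Record that docId contains the term identified by termKey.
    void add(String termKey, String docId) {
        rows.computeIfAbsent(termKey, k -> new ArrayList<>()).add(docId);
    }

    // Return every term key that starts with prefix, in sorted order.
    // Because keys are ordered, this is one contiguous range:
    // [prefix, prefix + '\uFFFF').
    List<String> keysWithPrefix(String prefix) {
        SortedMap<String, List<String>> range =
                rows.subMap(prefix, prefix + Character.MAX_VALUE);
        return new ArrayList<>(range.keySet());
    }
}
```

For example, after adding keys for the terms "linux", "linus" and "mac" under the same index and field, keysWithPrefix("bookmarks/title/linu") returns only the first two, without touching any other key.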
Implementation Notes
The Lucandra Cassandra config looks like this:
<Keyspace Name="Lucandra">
  <ColumnFamily CompareWith="BytesType" Name="TermVectors"/>
  <ColumnFamily CompareWith="BytesType" Name="Documents"/>
</Keyspace>
* Document IDs are currently random and autogenerated.
* Term keys and document keys are encoded as follows (using a random binary delimiter):
Term Key                       col name     value
"index_name/field/term"   => { documentId , position vector }

Document Key
"index_name/documentId"   => { fieldName  , value }
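The key layout above could be built along these lines. This is a sketch under stated assumptions rather than Lucandra's actual implementation: the class KeySketch and its method names are hypothetical, and the delimiter U+FFFF is chosen only for illustration, since the README says no more than that a random binary delimiter is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper that composes and splits row keys.
final class KeySketch {
    // Assumed delimiter; unlikely to appear in index names, fields or terms.
    static final String DELIM = "\uFFFF";

    // "index_name/field/term" row key for the TermVectors column family.
    static String termKey(String indexName, String field, String term) {
        return indexName + DELIM + field + DELIM + term;
    }

    // "index_name/documentId" row key for the Documents column family.
    static String documentKey(String indexName, String documentId) {
        return indexName + DELIM + documentId;
    }

    // Split a key back into its components.
    static String[] split(String key) {
        return key.split(Pattern.quote(DELIM), -1);
    }

    // Keys are compared as bytes (BytesType), so encode them explicitly.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

Keeping the index name and field at the front of the term key means that all terms of one field in one index sort next to each other.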
The IndexReader caches terms aggressively during search and tries to avoid lots of back and forth with Cassandra.
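The caching idea can be pictured as a simple read-through cache. The sketch below is only an illustration of that general pattern, not the IndexReader's real caching code: TermCacheSketch, its backend function, and the call counter are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache: look up locally first, and only
// call the backend (standing in for Cassandra) on a miss.
final class TermCacheSketch<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final Function<String, V> backend;
    private int backendCalls = 0;

    TermCacheSketch(Function<String, V> backend) {
        this.backend = backend;
    }

    V get(String termKey) {
        V value = cache.get(termKey);
        if (value == null) {
            // Cache miss: one round trip to the backend, then remember the result.
            // (A null result is not remembered and will be fetched again.)
            backendCalls++;
            value = backend.apply(termKey);
            cache.put(termKey, value);
        }
        return value;
    }

    // Number of times the backend was actually consulted.
    int backendCalls() {
        return backendCalls;
    }
}
```

Repeated lookups of the same term during a search then cost one backend call instead of one per lookup.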
What Works
* Real-Time indexing (documents become available almost immediately)
* No optimizing
* Search
* Sort
* Delete
* Wildcards and other Lucene magic
What's Missing (for now)
* You can't walk the documents with the IndexReader
* Faceting isn't yet supported