Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Browse files

[split] update readme and docs for cassie

  • Loading branch information...
commit 8368b2a36cf81c1aebbe83146022046f75175c82 1 parent 12a24ea
Ryan King authored
Showing with 111 additions and 164 deletions.
  1. +6 −0 CHANGELOG
  2. +20 −0 ISSUES
  3. +10 −52 LICENSE
  4. +1 −2  OWNERS
  5. +74 −110 README.md
6 CHANGELOG
View
@@ -1,3 +1,9 @@
+# 0.19.0
+* add a ServerSetsCluster constructor that accepts a ZooKeeperClient instance (matthew billoti)
+* upgrade to util 1.12.12 and finagle 1.10.0
+* support out-of-order operations on FakeCassandra
+* numerous cleanups in preparation for open-sourcing
+
# 0.18.0
* make FakeCassandra spin up on random port (@kmx)
* Allow the column iteratee to take in order and limit. (@skr)
20 ISSUES
View
@@ -0,0 +1,20 @@
+Herein are a list of known issues and areas for improvement–
+
+# Mixed Java/Scala data structures
+
+We currently use a mix of java and scala data structures. We should really standardize on one,
+then provide wrappers for others. THe plan is to standardize on scala data structures internally and
+then provide wrappers for java compatibility/ease-of-use.
+
+# Code duplication
+
+We've duplicated code across ColumFamily, SuperColumFamily, CounterColumFamily and SuperCounterColumFamily.
+We need to pull this back together in a more sane way. (also Column and CounterColumn)
+
+# Incomplete Cassandra feature support
+
+Not all cassandra operations are supported (we've taken a JIT approach).
+
+# Iteratee code is fragile and complicated
+
+We should move to Finagle Spools.
62 LICENSE
View
@@ -1,55 +1,13 @@
-Copyright (c) 2010 Coda Hale
+Copyright 2010 Coda Hale; 2011-2012 Twitter, Inc.
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
+http://www.apache.org/licenses/LICENSE-2.0
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
---------------------------------------------------------------------------------
-
-Apache Thrift
-Copyright 2006-2009 The Apache Software Foundation, et al.
-
-This product includes software developed at
-The Apache Software Foundation (http://www.apache.org/).
-
---------------------------------------------------------------------------------
-
-Apache Cassandra
-Copyright 2009, 2010 The Apache Software Foundation
-
-This product includes software developed by The Apache Software
-Foundation (http://www.apache.org/).
-
-Some alternate data structures provided by high-scale-lib from
-http://sourceforge.net/projects/high-scale-lib/.
-Written by Cliff Click and released as Public Domain.
-
-Some alternate data structures provided by concurrentlinkedhashmap
-from http://code.google.com/p/concurrentlinkedhashmap/.
-Copyright 2009 Benjamin Manes
-
-Alternative collection types provided by google-collections from
-http://code.google.com/p/google-collections/.
-Copyright (C) 2007 Google Inc.
-
-JSON (de)serialization provided by jackson (http://jackson.codehaus.org).
-Copyright (C) 2010 Tatu Saloranta and others.
-
-Alternative JSON (de)serialization by json-simple from
-(http://code.google.com/p/json-simple).
-Copyright (C) 2009 Fang Yidong and Chris Nokleberg
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
3  OWNERS
View
@@ -1,3 +1,2 @@
*: ryan, stuhood, harish
-review_group:cassandra
-review_group:data-services
+review_group:cassandra
184 README.md
@@ -2,8 +2,10 @@ Cassie
======
Cassie is a small, lightweight Cassandra client built on
-[Finagle](http://github.com/twitter/finagle) with with all that provides plus
-column name/value encoding and decoding.
+[Finagle](http://github.com/twitter/finagle) with with all that provides plus column name/value
+encoding and decoding.
+
+It is heavily used in production at Twitter so such be considered stable, yet it is incomplete in that it doesn't support the full feature set of Cassandra and will continue to evolve.
Requirements
------------
@@ -11,6 +13,7 @@ Requirements
* Java SE 6
* Scala 2.8
* Cassandra 0.8 or later
+* sbt 0.7
Note that Cassie *is* usable from Java. Its not super easy, but we're working
to make it easier.
@@ -22,8 +25,13 @@ In your [simple-build-tool](http://code.google.com/p/simple-build-tool/) project
file, add Cassie as a dependency:
val twttr = "Twitter's Repository" at "http://maven.twttr.com/"
- val cassie = "com.twitter" % "cassie" % "0.16.0"
+ val cassie = "com.twitter" % "cassie" % "0.19.0"
+
+Finagle
+-------
+Before going further, you should probably learn about Finagle and its paradigm for asynchronous
+computing– https://github.com/twitter/finagle.
Connecting To Your Cassandra Cluster
------------------------------------
@@ -58,34 +66,29 @@ A Longer Note, This Time On Column Names And Values
---------------------------------------------------
Cassandra stores the name and value of a column as an array of bytes. To
-convert these bytes to and from useful Scala types, Cassie uses implicit `Codec`
+convert these bytes to and from useful Scala types, Cassie uses `Codec`
parameters for the given type.
For example, take adding a column to a column family of UTF-8 strings:
+ val strings = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]
strings.insert("newstring", Column("colname", "colvalue"))
-The `insert` method looks for implicit parameters of type `Codec[String]` to
-convert the key, name and value to byte arrays. In this case, the `codecs`
-package already provides `Utf8Codec` as an implicit parameter, so the conversion
-is seamless. Cassie handles `String` and `Array[Byte]` instances out of the box,
-and also provides some useful non-standard types:
-
-* `AsciiString`: character sequence encoded with `US-ASCII`
-* `Int`: 32-bit integer stored as a 4-byte sequence
-* `Long`: 64-bit integer stored as an 8-byte sequence
-
-These types also have implicit conversions defined, so if you have an instance
-of `ColumnFamily[String, String, VarLong]` you can use regular `Long`s.
+The `insert` method here requires a String and Column[String, String] because the type parameters of the columnFamily call were all `Codec[String]`. The conversion between Strings and ByteArrays will be seamless. Cassie has codecs for a number of data types already:
+* `Utf8Codec`: character sequence encoded with `UTF-8`
+* `IntCodec`: 32-bit integer stored as a 4-byte sequence
+* `LongCodec`: 64-bit integer stored as an 8-byte sequence
+* `LexicalUUIDCodec` a UUID stored as a 16-byte sequence
+* `ThriftCodec` a Thrift struct stored as variable-length sequence of bytes
Accessing Column Families
-------------------------
Once you've got a `Keyspace` instance, you can load your column families:
- val people = keyspace.columnFamily[String, String, String]("People", MicrosecondEpochClock)
- val numbers = keyspace.columnFamily[String, String, VarInt]("People", MicrosecondEpochClock,
+ val people = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]("People")
+ val numbers = keyspace.columnFamily[Utf8Codec, Utf8Codec, IntCodec]("People",
defaultReadConsistency = ReadConsistency.One,
defaultWriteConsistency = WriteConsistency.Any)
@@ -96,8 +99,6 @@ can change this default or simply pass a different consistency level to specific
read and write operations.
-TODO: write or link to docs on Futures
-
Reading Data From Cassandra
---------------------------
@@ -105,15 +106,30 @@ Now that you've got your `ColumnFamily`, you can read some data from Cassandra:
people.getColumn("codahale", "name")
-`getColumn` returns an `Future[Option[Column[Name, Value]]]` where `Name` and `Value`
-are the type parameters of the `ColumnFamily`. If the row or column doesn't
-exist, `None` is returned.
+`getColumn` returns an `Future[Option[Column[Name, Value]]]` where `Name` and `Value` are the type
+parameters of the `ColumnFamily`. If the row or column doesn't exist, `None` is returned. Explaining
+Futures is out of scope for this README, go the Finagle docs to learn more. But in essence you can
+do this:
+
+ people.getColumn("codahale", "name") map {
+ _ match {
+ case col: Some(Column[String, String]) => # we have data
+ case None => # there was no column
+ }
+ } handle {
+ case e => {
+ # there was an exception, do something about it
+ }
+ }
-You can also get a set of columns:
+This whole block returns a Future which will be satisfied when the thrift rpc is done and the
+callbacks have run.
+
+Anyway, continuing– you can also get a set of columns:
people.getColumns("codahale", Set("name", "motto"))
-This returns a `Future[Map[Name, Column[Name, Value]]]`, where each column is mapped by
+This returns a `Future[java.util.Map[Name, Column[Name, Value]]]`, where each column is mapped by
its name.
If you want to get all columns of a row, that's cool too:
@@ -125,50 +141,29 @@ Cassie also supports multiget for columns and sets of columns:
people.multigetColumn(Set("codahale", "darlingnikles"), "name")
people.multigetColumns(Set("codahale", "darlingnikles"), Set("name", "motto"))
-`multigetColumn` returns a `Future[Map[Key, Map[Name, Column[Name, Value]]]]` which
-maps row keys to column names to columns.
-
+`multigetColumn` returns a `Future[Map[Key, Map[Name, Column[Name, Value]]]]` whichmaps row keys to
+column names to columns.
-Iterating Through Rows
-----------------------
-Cassie provides functionality for iterating through the rows of a column family.
-This works with both the random partitioner and the order-preserving
-partitioner.
+Asynchronous Iteration Through Rows and Columns
+-----------------------------------------------
-It does this by requesting a certain number of rows, starting with the first
-possible row (`""`) and ending with the last row possible row (`""`). The last
-key of the returned rows is then used as the start key for the next request,
-until either no rows are returned or the last row is returned twice.
+NOTE: This is new/experimental and likely to change in the future.
-(The performance hit in this is that the last row of one request will be the
-first row of the next.)
+Cassie provides functionality for iterating through the rows of a column family and columns in a
+row. This works with both the random partitioner and the order-preserving partitioner, though
+iterating through rows in the random partitioner had undefined order.
You can iterate over every column of every row:
- for ((key, col) <- people.rowIteratee(100) {
- println(" Found column %s in row %s", col, key)
- }
-
-(This gets 100 rows at a time.)
-
-Or just one column from every row:
+ val finished = cf.rowsIteratee(100).foreach { case(key, columns) =>
+ println(key) //this function is executed async for each row
+ println(cols)
+ }
+ finished() //this is a Future[Unit]. wait on it to know when the iteration is done
- for ((key, col) <- people.columnIteratee(100, "name") {
- println(" Found column %s in row %s", col, key)
- }
-
-Or a set of columns from every row:
-
- for ((key, col) <- people.columnsIteratee(100, Set("name", "motto")) {
- println(" Found column %s in row %s", col, key)
- }
+This gets 100 rows at a time and calls the above partial function on each one.
-The 'ColumnIteratee' object returned by these methods implements Iterable for
-use in loops like those shown, but it also allows for async iteration. An
-Iteratee contains a batch of values, and has a hasNext() method indicating
-whether more batches are available. If more batches are available, continue()
-will request the next batch and return a Future[Iteratee].
Writing Data To Cassandra
-------------------------
@@ -183,21 +178,14 @@ You can insert a value with a specific timestamp:
people.insert("darlingnikles", Column("name", "Niki").timestamp(200L))
people.insert("darlingnikles", Column("motto", "Told ya.").timestamp(201L))
-Or even insert column names and values of a different type than those of the
-`ColumnFamily`:
-
- people.insert("biscuitfoof", Column[AsciiString, AsciiString]("name", "Biscuit"))
- people.insert("biscuitfoof", Column[AsciiString, AsciiString]("motto", "Mlalm."))
-
Batch operations are also possible:
people.batch() { cf =>
cf.insert("puddle", Column("name", "Puddle"))
cf.insert("puddle", Column("motto", "Food!"))
- }
+ }.execute()
-(See `BatchMutationBuilder` for a better idea of which operations are
-available.)
+(See `BatchMutationBuilder` for a better idea of which operations are available.)
Deleting Data From Cassandra
@@ -222,73 +210,49 @@ Or even a row:
Generating Unique IDs
---------------------
-If you're going to be storing data in Cassandra and don't have a naturally
-unique piece of data to use as a key, you've probably looked into UUIDs. The
-only problem with UUIDs is that they're mental, requiring access to MAC
-addresses or Gregorian calendars or POSIX ids. In general, people want UUIDs
+If you're going to be storing data in Cassandra and don't have a naturally unique piece of data to
+use as a key, you've probably looked into UUIDs. The only problem with UUIDs is that they're mental,
+requiring access to MAC addresses or Gregorian calendars or POSIX ids. In general, people want UUIDs
which are:
* Unique across a large set of workers without requiring coordination.
* Partially ordered by time.
-Cassie's `LexicalUUID`s meet these criteria. They're 128 bits long. The most
-significant 64 bits are a timestamp value (from one of Cassie's
-strictly-increasing `Clock` implementations -- `NanosecondEpochClock` is
-recommended). The least significant 64 bits are a worker ID, with the default
-value being a hash of the machine's hostname.
+Cassie's `LexicalUUID`s meet these criteria. They're 128 bits long. The most significant 64 bits are
+a timestamp value (from Cassie's strictly-increasing `Clock` implementation). The least significant
+64 bits are a worker ID, with the default value being a hash of the machine's hostname.
-When sorted using Cassandra's `LexicalUUIDType`, `LexicalUUID`s will be
-partially ordered by time -- that is, UUIDs generated in order on a single
-process will be totally ordered by time; UUIDs generated simultaneously (i.e.,
-within the same clock tick, given clock skew) will not have a deterministic
-order; UUIDs generated in order between single processes (i.e., in different
-clock ticks, given clock skew) will be totally ordered by time.
+When sorted using Cassandra's `LexicalUUIDType`, `LexicalUUID`s will be partially ordered by time --
+that is, UUIDs generated in order on a single process will be totally ordered by time; UUIDs
+generated simultaneously (i.e., within the same clock tick, given clock skew) will not have a
+deterministic order; UUIDs generated in order between single processes (i.e., in different clock
+ticks, given clock skew) will be totally ordered by time.
-See *Lamport. Time, clocks, and the ordering of events in a distributed system.
-Communications of the ACM (1978) vol. 21 (7) pp. 565* and *Mattern. Virtual time
-and global states of distributed systems. Parallel and Distributed Algorithms
-(1989) pp. 215–226* for a more thorough discussion.
+See *Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of
+the ACM (1978) vol. 21 (7) pp. 565* and *Mattern. Virtual time and global states of distributed
+systems. Parallel and Distributed Algorithms (1989) pp. 215–226* for a more thorough discussion.
-`LexicalUUID`s can be used as column names, in which case they're stored as
-16-byte values and are sortable by `LexicalUUIDType`, or as keys, in which case
-they're stored as traditional, hex-encoded strings. Cassie provides implicit
-conversions between `LexicalUUID` and `String`:
- val uuid = LexicalUUID(people.clock)
-
- people.insert(uuid, Column("one", "two")) // converted to hex automatically
-
- people.insert("key", Column(uuid, "what")) // converted to a byte array
-
-
-TODO counter column families
Things What Ain't Done Yet
==========================
-* Anything relating to super columns
-* Range queries
* Authentication
-* Counting
* Meta data (e.g., `describe_*`)
-Why? I don't need it yet.
-
-
Thanks
======
-Many thanks to:
+Many thanks to (pre twitter fork):
* Cliff Moon
* James Golick
* Robert J. Macomber
-
License
-------
Copyright (c) 2010 Coda Hale
-Copyright (c) 2011 Twitter, Inc.
+Copyright (c) 2011-2012 Twitter, Inc.
-Published under The MIT License, see LICENSE
+Published under The Apache 2.0 License, see LICENSE.
Please sign in to comment.
Something went wrong with that request. Please try again.