
[split] update readme and docs for cassie

commit 8368b2a36cf81c1aebbe83146022046f75175c82 1 parent 12a24ea
Ryan King authored January 31, 2012
6  CHANGELOG
@@ -1,3 +1,9 @@
+# 0.19.0
+* add a ServerSetsCluster constructor that accepts a ZooKeeperClient instance (matthew billoti)
+* upgrade to util 1.12.12 and finagle 1.10.0
+* support out-of-order operations on FakeCassandra
+* numerous cleanups in preparation for open-sourcing
+
 # 0.18.0
 * make FakeCassandra spin up on random port (@kmx)
 * Allow the column iteratee to take in order and limit. (@skr)
20  ISSUES
@@ -0,0 +1,20 @@
+Herein is a list of known issues and areas for improvement:
+
+# Mixed Java/Scala data structures
+
+We currently use a mix of Java and Scala data structures. We should really standardize on one,
+then provide wrappers for others. The plan is to standardize on Scala data structures internally and
+then provide wrappers for Java compatibility and ease of use.
+
+# Code duplication
+
+We've duplicated code across ColumnFamily, SuperColumnFamily, CounterColumnFamily and SuperCounterColumnFamily
+(and also between Column and CounterColumn). We need to pull this back together in a saner way.
+
+# Incomplete Cassandra feature support
+
+Not all Cassandra operations are supported (we've taken a just-in-time approach, adding operations as we need them).
+
+# Iteratee code is fragile and complicated
+
+We should move to Finagle Spools.
62  LICENSE
@@ -1,55 +1,13 @@
-Copyright (c) 2010 Coda Hale
+Copyright 2010 Coda Hale; 2011-2012 Twitter, Inc.
 
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
 
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
+http://www.apache.org/licenses/LICENSE-2.0
 
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
---------------------------------------------------------------------------------
-
-Apache Thrift
-Copyright 2006-2009 The Apache Software Foundation, et al.
-
-This product includes software developed at
-The Apache Software Foundation (http://www.apache.org/).
-
---------------------------------------------------------------------------------
-
-Apache Cassandra
-Copyright 2009, 2010 The Apache Software Foundation
-
-This product includes software developed by The Apache Software
-Foundation (http://www.apache.org/).
-
-Some alternate data structures provided by high-scale-lib from
-http://sourceforge.net/projects/high-scale-lib/.
-Written by Cliff Click and released as Public Domain.
-
-Some alternate data structures provided by concurrentlinkedhashmap
-from http://code.google.com/p/concurrentlinkedhashmap/.
-Copyright 2009 Benjamin Manes
-
-Alternative collection types provided by google-collections from
-http://code.google.com/p/google-collections/.
-Copyright (C) 2007 Google Inc.
-
-JSON (de)serialization provided by jackson (http://jackson.codehaus.org).
-Copyright (C) 2010 Tatu Saloranta and others.
-
-Alternative JSON (de)serialization by json-simple from
-(http://code.google.com/p/json-simple).
-Copyright (C) 2009 Fang Yidong and Chris Nokleberg
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
3  OWNERS
@@ -1,3 +1,2 @@
 *: ryan, stuhood, harish
-review_group:cassandra
-review_group:data-services
+review_group:cassandra
184  README.md
@@ -2,8 +2,10 @@ Cassie
 ======
 
 Cassie is a small, lightweight Cassandra client built on
-[Finagle](http://github.com/twitter/finagle) with with all that provides plus
-column name/value encoding and decoding.
+[Finagle](http://github.com/twitter/finagle), with all that it provides, plus column name/value
+encoding and decoding.
+
+It is heavily used in production at Twitter, so it should be considered stable. However, it is incomplete: it doesn't support the full feature set of Cassandra, and it will continue to evolve.
 
 Requirements
 ------------
@@ -11,6 +13,7 @@ Requirements
 * Java SE 6
 * Scala 2.8
 * Cassandra 0.8 or later
+* sbt 0.7
 
 Note that Cassie *is* usable from Java. It's not super easy, but we're working
 to make it easier.
@@ -22,8 +25,13 @@ In your [simple-build-tool](http://code.google.com/p/simple-build-tool/) project
 file, add Cassie as a dependency:
 
     val twttr = "Twitter's Repository" at "http://maven.twttr.com/"
-    val cassie = "com.twitter" % "cassie" % "0.16.0"
+    val cassie = "com.twitter" % "cassie" % "0.19.0"
+
+Finagle
+-------
 
+Before going further, you should probably learn about Finagle and its paradigm for asynchronous
+computing: https://github.com/twitter/finagle.
 
 Connecting To Your Cassandra Cluster
 ------------------------------------
@@ -58,34 +66,29 @@ A Longer Note, This Time On Column Names And Values
 ---------------------------------------------------
 
 Cassandra stores the name and value of a column as an array of bytes. To
-convert these bytes to and from useful Scala types, Cassie uses implicit `Codec`
+convert these bytes to and from useful Scala types, Cassie uses `Codec`
 parameters for the given type.
 
 For example, take adding a column to a column family of UTF-8 strings:
 
+    val strings = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]
     strings.insert("newstring", Column("colname", "colvalue"))
 
-The `insert` method looks for implicit parameters of type `Codec[String]` to
-convert the key, name and value to byte arrays. In this case, the `codecs`
-package already provides `Utf8Codec` as an implicit parameter, so the conversion
-is seamless. Cassie handles `String` and `Array[Byte]` instances out of the box,
-and also provides some useful non-standard types:
-
-* `AsciiString`: character sequence encoded with `US-ASCII`
-* `Int`: 32-bit integer stored as a 4-byte sequence
-* `Long`: 64-bit integer stored as an 8-byte sequence
-
-These types also have implicit conversions defined, so if you have an instance
-of `ColumnFamily[String, String, VarLong]` you can use regular `Long`s.
+Here `insert` takes a `String` key and a `Column[String, String]`, because the type parameters of the `columnFamily` call are all `Codec[String]`s. Conversion between `String`s and byte arrays is seamless. Cassie already has codecs for a number of data types:
 
+* `Utf8Codec`: character sequence encoded with `UTF-8`
+* `IntCodec`: 32-bit integer stored as a 4-byte sequence
+* `LongCodec`: 64-bit integer stored as an 8-byte sequence
+* `LexicalUUIDCodec`: a UUID stored as a 16-byte sequence
+* `ThriftCodec`: a Thrift struct stored as a variable-length sequence of bytes
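To make the idea of a codec concrete, here is a minimal sketch of what one does: turn a value into the bytes Cassandra stores and back again. `SketchCodec` and the `Sketch*Codec` objects are hypothetical names for illustration, not Cassie's actual classes or signatures, and the big-endian layout for integers is an assumption (it is `java.nio.ByteBuffer`'s default byte order).

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical codec interface: converts a value to and from the bytes Cassandra stores.
trait SketchCodec[A] {
  def encode(value: A): ByteBuffer
  def decode(buf: ByteBuffer): A
}

// UTF-8 strings: the stored bytes are the string's UTF-8 encoding.
object SketchUtf8Codec extends SketchCodec[String] {
  def encode(value: String): ByteBuffer =
    ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8))

  def decode(buf: ByteBuffer): String = {
    val bytes = new Array[Byte](buf.remaining())
    buf.duplicate().get(bytes) // read from a duplicate so the caller's buffer position is untouched
    new String(bytes, StandardCharsets.UTF_8)
  }
}

// 32-bit integers as a 4-byte big-endian sequence (absolute put keeps position at 0).
object SketchIntCodec extends SketchCodec[Int] {
  def encode(value: Int): ByteBuffer = ByteBuffer.allocate(4).putInt(0, value)
  def decode(buf: ByteBuffer): Int = buf.duplicate().getInt()
}

// 64-bit integers as an 8-byte big-endian sequence.
object SketchLongCodec extends SketchCodec[Long] {
  def encode(value: Long): ByteBuffer = ByteBuffer.allocate(8).putLong(0, value)
  def decode(buf: ByteBuffer): Long = buf.duplicate().getLong()
}
```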
 
 Accessing Column Families
 -------------------------
 
 Once you've got a `Keyspace` instance, you can load your column families:
 
-    val people  = keyspace.columnFamily[String, String, String]("People", MicrosecondEpochClock)
-    val numbers = keyspace.columnFamily[String, String, VarInt]("People", MicrosecondEpochClock,
+    val people  = keyspace.columnFamily[Utf8Codec, Utf8Codec, Utf8Codec]("People")
+    val numbers = keyspace.columnFamily[Utf8Codec, Utf8Codec, IntCodec]("People",
                     defaultReadConsistency = ReadConsistency.One,
                     defaultWriteConsistency = WriteConsistency.Any)
 
@@ -96,8 +99,6 @@ can change this default or simply pass a different consistency level to specific
 read and write operations.
 
 
-TODO: write or link to docs on Futures
-
 Reading Data From Cassandra
 ---------------------------
 
@@ -105,15 +106,30 @@ Now that you've got your `ColumnFamily`, you can read some data from Cassandra:
 
     people.getColumn("codahale", "name")
 
-`getColumn` returns an `Future[Option[Column[Name, Value]]]` where `Name` and `Value`
-are the type parameters of the `ColumnFamily`. If the row or column doesn't
-exist, `None` is returned.
+`getColumn` returns a `Future[Option[Column[Name, Value]]]`, where `Name` and `Value` are the type
+parameters of the `ColumnFamily`. If the row or column doesn't exist, `None` is returned. Explaining
+Futures is out of scope for this README (see the Finagle docs to learn more), but in essence you can
+do this:
+
+    people.getColumn("codahale", "name") map {
+      case Some(col) => // we have data: col is a Column[String, String]
+      case None      => // there was no such column
+    } handle {
+      case e: Exception => // the request failed; do something about it
+    }
 
-You can also get a set of columns:
+This whole expression returns a `Future`, which is satisfied once the Thrift RPC has completed and
+the callbacks have run.
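The `map`/`handle` pattern above uses Finagle's `com.twitter.util.Future`. To show the same shape in a form that runs without a cluster or Finagle on the classpath, the sketch below swaps in the standard library's `scala.concurrent.Future`, where `recover` plays roughly the role of `handle`. `fakeGetColumn` is a hypothetical stand-in for `people.getColumn`, and a column is modelled as a plain `(name, value)` pair rather than Cassie's `Column`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

implicit val ec: ExecutionContext = ExecutionContext.global

// Hypothetical stand-in for people.getColumn; a column is modelled as a (name, value) pair.
def fakeGetColumn(row: String, name: String): Future[Option[(String, String)]] = Future {
  if (row == "broken") throw new IllegalStateException("simulated RPC failure")
  if (row == "codahale" && name == "name") Some(("name", "Coda Hale")) else None
}

// Same shape as the README example: transform the result, then handle failures.
def describe(row: String): Future[String] =
  fakeGetColumn(row, "name") map {
    case Some((_, value)) => s"found $value"  // we have data
    case None             => "no such column" // the row or column doesn't exist
  } recover {
    case e: Exception => s"failed: ${e.getMessage}" // the (simulated) request failed
  }

val result = Await.result(describe("codahale"), 5.seconds)
```

Blocking with `Await` is only for the demonstration; in a server you would keep composing futures rather than wait on them.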
+
+You can also get a set of columns:
 
     people.getColumns("codahale", Set("name", "motto"))
 
-This returns a `Future[Map[Name, Column[Name, Value]]]`, where each column is mapped by
+This returns a `Future[java.util.Map[Name, Column[Name, Value]]]`, where each column is mapped by
 its name.
 
 If you want to get all columns of a row, that's cool too:
@@ -125,50 +141,29 @@ Cassie also supports multiget for columns and sets of columns:
     people.multigetColumn(Set("codahale", "darlingnikles"), "name")
     people.multigetColumns(Set("codahale", "darlingnikles"), Set("name", "motto"))
 
-`multigetColumn` returns a `Future[Map[Key, Map[Name, Column[Name, Value]]]]` which
-maps row keys to column names to columns.
-
+`multigetColumn` returns a `Future[Map[Key, Map[Name, Column[Name, Value]]]]`, which maps row keys
+to column names to columns.
 
-Iterating Through Rows
------------------------
 
-Cassie provides functionality for iterating through the rows of a column family.
-This works with both the random partitioner and the order-preserving
-partitioner.
+Asynchronous Iteration Through Rows and Columns
+-----------------------------------------------
 
-It does this by requesting a certain number of rows, starting with the first
-possible row (`""`) and ending with the last row possible row (`""`). The last
-key of the returned rows is then used as the start key for the next request,
-until either no rows are returned or the last row is returned twice.
+NOTE: This is new/experimental and likely to change in the future.
 
-(The performance hit in this is that the last row of one request will be the
-first row of the next.)
+Cassie provides functionality for iterating through the rows of a column family and the columns in a
+row. This works with both the random partitioner and the order-preserving partitioner, though with
+the random partitioner the rows come back in an undefined order.
 
 You can iterate over every column of every row:
 
-    for ((key, col) <- people.rowIteratee(100) {
-      println(" Found column %s in row %s", col, key)
-    }
-
-(This gets 100 rows at a time.)
-
-Or just one column from every row:
+    val finished = people.rowsIteratee(100).foreach { case (key, columns) =>
+      println(key)     // this function is executed asynchronously for each row
+      println(columns)
+    }
+    finished() // finished is a Future[Unit]; block on it to know when the iteration is done
 
-    for ((key, col) <- people.columnIteratee(100, "name") {
-      println(" Found column %s in row %s", col, key)
-    }
-
-Or a set of columns from every row:
-
-    for ((key, col) <- people.columnsIteratee(100, Set("name", "motto")) {
-      println(" Found column %s in row %s", col, key)
-    }
+This fetches 100 rows at a time and calls the partial function above on each row.
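Batched row iteration is easiest to picture with the strategy the earlier version of this README described: request a fixed number of rows starting at the first possible key (`""`), use the last key of each batch as the start key of the next request, and stop once a request brings back no new rows. Because the start key is inclusive, every batch after the first repeats one row, which is skipped. The sketch below applies that strategy to an in-memory `TreeMap`; `rows`, `fetchBatch` and `iterateRows` are hypothetical, and nothing here claims to be how Cassie's current iteratee is implemented.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical data standing in for a column family: row key -> columns.
val rows: TreeMap[String, Map[String, String]] = TreeMap(
  "a" -> Map("name" -> "A"), "b" -> Map("name" -> "B"), "c" -> Map("name" -> "C"),
  "d" -> Map("name" -> "D"), "e" -> Map("name" -> "E")
)

// Hypothetical range request: up to `count` rows whose key is >= startKey (inclusive).
def fetchBatch(startKey: String, count: Int): List[(String, Map[String, String])] =
  rows.iterator.dropWhile { case (k, _) => k < startKey }.take(count).toList

// Visits every row, `batchSize` rows per request, and returns them in visit order.
def iterateRows(batchSize: Int): List[(String, Map[String, String])] = {
  require(batchSize >= 2, "each batch repeats one row, so it needs room for a new one")
  val visited = List.newBuilder[(String, Map[String, String])]
  var startKey = ""                  // "" sorts before every other key: the first possible row
  var lastSeen: Option[String] = None
  var done = false
  while (!done) {
    val batch = fetchBatch(startKey, batchSize)
    // After the first batch, the leading row is the previous batch's last row: skip it.
    val fresh = batch.dropWhile { case (k, _) => lastSeen.contains(k) }
    if (fresh.isEmpty) done = true   // no rows, or only the already-seen row: finished
    else {
      visited ++= fresh
      lastSeen = Some(fresh.last._1)
      startKey = fresh.last._1
    }
  }
  visited.result()
}
```

The one-row overlap between consecutive requests is the performance cost the old text mentioned; a larger batch size makes it proportionally cheaper.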
 
-The 'ColumnIteratee' object returned by these methods implements Iterable for
-use in loops like those shown, but it also allows for async iteration. An
-Iteratee contains a batch of values, and has a hasNext() method indicating
-whether more batches are available. If more batches are available, continue()
-will request the next batch and return a Future[Iteratee].
 
 Writing Data To Cassandra
 -------------------------
@@ -183,21 +178,14 @@ You can insert a value with a specific timestamp:
     people.insert("darlingnikles", Column("name", "Niki").timestamp(200L))
     people.insert("darlingnikles", Column("motto", "Told ya.").timestamp(201L))
 
-Or even insert column names and values of a different type than those of the
-`ColumnFamily`:
-
-    people.insert("biscuitfoof", Column[AsciiString, AsciiString]("name", "Biscuit"))
-    people.insert("biscuitfoof", Column[AsciiString, AsciiString]("motto", "Mlalm."))
-
 Batch operations are also possible:
 
     people.batch() { cf =>
       cf.insert("puddle", Column("name", "Puddle"))
       cf.insert("puddle", Column("motto", "Food!"))
-    }
+    }.execute()
 
-(See `BatchMutationBuilder` for a better idea of which operations are
-available.)
+(See `BatchMutationBuilder` for a better idea of which operations are available.)
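The idea behind `batch()` is to collect several mutations and only apply them when `execute()` is called. The following is a minimal sketch of that builder pattern against an in-memory map; `SketchStore`, `SketchBatch` and `batch` are hypothetical names, and unlike this sketch the real `BatchMutationBuilder` sends its mutations to Cassandra.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a column family: row key -> (column name -> value).
class SketchStore {
  val rows = mutable.Map.empty[String, mutable.Map[String, String]]

  def insert(key: String, name: String, value: String): Unit = {
    val row = rows.getOrElseUpdate(key, mutable.Map.empty[String, String])
    row(name) = value
  }
}

// Hypothetical batch builder: queues mutations and applies nothing until execute() is called.
class SketchBatch(store: SketchStore) {
  private val queued = mutable.ListBuffer.empty[SketchStore => Unit]

  def insert(key: String, name: String, value: String): this.type = {
    queued += ((s: SketchStore) => s.insert(key, name, value))
    this
  }

  // Applies every queued mutation in order, empties the queue, and returns how many ran.
  def execute(): Int = {
    queued.foreach(mutation => mutation(store))
    val applied = queued.size
    queued.clear()
    applied
  }
}

// Mirrors the README's shape: describe the batch in a block, then call execute() explicitly.
def batch(store: SketchStore)(build: SketchBatch => Unit): SketchBatch = {
  val b = new SketchBatch(store)
  build(b)
  b
}

val store = new SketchStore
val puddleBatch = batch(store) { cf =>
  cf.insert("puddle", "name", "Puddle")
  cf.insert("puddle", "motto", "Food!")
}
val rowsBeforeExecute = store.rows.size // 0: nothing has been applied yet
val applied = puddleBatch.execute()     // applies both inserts and returns 2
```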
 
 
 Deleting Data From Cassandra
@@ -222,73 +210,49 @@ Or even a row:
 Generating Unique IDs
 ---------------------
 
-If you're going to be storing data in Cassandra and don't have a naturally
-unique piece of data to use as a key, you've probably looked into UUIDs. The
-only problem with UUIDs is that they're mental, requiring access to MAC
-addresses or Gregorian calendars or POSIX ids. In general, people want UUIDs
+If you're going to be storing data in Cassandra and don't have a naturally unique piece of data to
+use as a key, you've probably looked into UUIDs. The only problem with UUIDs is that they're mental,
+requiring access to MAC addresses or Gregorian calendars or POSIX ids. In general, people want UUIDs
 which are:
 
 * Unique across a large set of workers without requiring coordination.
 * Partially ordered by time.
 
-Cassie's `LexicalUUID`s meet these criteria. They're 128 bits long. The most
-significant 64 bits are a timestamp value (from one of Cassie's
-strictly-increasing `Clock` implementations -- `NanosecondEpochClock` is
-recommended). The least significant 64 bits are a worker ID, with the default
-value being a hash of the machine's hostname.
+Cassie's `LexicalUUID`s meet these criteria. They're 128 bits long. The most significant 64 bits are
+a timestamp value (from Cassie's strictly-increasing `Clock` implementation). The least significant
+64 bits are a worker ID, with the default value being a hash of the machine's hostname.
 
-When sorted using Cassandra's `LexicalUUIDType`, `LexicalUUID`s will be
-partially ordered by time -- that is, UUIDs generated in order on a single
-process will be totally ordered by time; UUIDs generated simultaneously (i.e.,
-within the same clock tick, given clock skew) will not have a deterministic
-order; UUIDs generated in order between single processes (i.e., in different
-clock ticks, given clock skew) will be totally ordered by time.
+When sorted using Cassandra's `LexicalUUIDType`, `LexicalUUID`s will be partially ordered by time --
+that is, UUIDs generated in order on a single process will be totally ordered by time; UUIDs
+generated simultaneously (i.e., within the same clock tick, given clock skew) will not have a
+deterministic order; UUIDs generated in order between single processes (i.e., in different clock
+ticks, given clock skew) will be totally ordered by time.
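The bit layout above can be sketched directly: pack a 64-bit timestamp into the most significant half and a 64-bit worker ID into the least significant half, then compare the 16 bytes lexically (byte by byte, unsigned), which is how this sketch assumes `LexicalUUIDType` orders them. `SketchLexicalUUID`, `workerIdFor` and `lexicalOrdering` are hypothetical, and the hostname hash (`String.hashCode`) is an arbitrary choice, not necessarily the one Cassie uses.

```scala
import java.nio.ByteBuffer

// Hypothetical 128-bit ID: timestamp in the most significant 64 bits, worker ID in the least.
final case class SketchLexicalUUID(timestamp: Long, workerId: Long) {
  // 16 big-endian bytes: 8 bytes of timestamp followed by 8 bytes of worker ID.
  def bytes: Array[Byte] =
    ByteBuffer.allocate(16).putLong(timestamp).putLong(workerId).array()
}

// Assumed default worker ID: a hash of the hostname (String.hashCode is an arbitrary choice).
def workerIdFor(hostname: String): Long = hostname.hashCode.toLong

// Lexical comparison: byte by byte, treating each byte as unsigned (0 to 255).
// For non-negative timestamps this orders by timestamp first, then by the worker ID's bytes.
val lexicalOrdering: Ordering[SketchLexicalUUID] = new Ordering[SketchLexicalUUID] {
  def compare(a: SketchLexicalUUID, b: SketchLexicalUUID): Int = {
    val (x, y) = (a.bytes, b.bytes)
    var i = 0
    while (i < x.length && x(i) == y(i)) i += 1
    if (i == x.length) 0 else (x(i) & 0xff) - (y(i) & 0xff)
  }
}

val worker = workerIdFor("worker-1.example.com") // hypothetical hostname
val ids = Seq(
  SketchLexicalUUID(300L, worker),
  SketchLexicalUUID(100L, worker),
  SketchLexicalUUID(200L, worker)
)
val sorted = ids.sorted(lexicalOrdering) // timestamps 100, 200, 300
```

With a strictly increasing clock, IDs from one process sort in creation order. Two IDs stamped in the same tick on different workers fall back to comparing worker IDs, which says nothing about which came first; that is the partial ordering described above.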
 
-See *Lamport. Time, clocks, and the ordering of events in a distributed system.
-Communications of the ACM (1978) vol. 21 (7) pp. 565* and *Mattern. Virtual time
-and global states of distributed systems. Parallel and Distributed Algorithms
-(1989) pp. 215–226* for a more thorough discussion.
+See *Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of
+the ACM (1978) vol. 21 (7) pp. 558–565* and *Mattern. Virtual time and global states of distributed
+systems. Parallel and Distributed Algorithms (1989) pp. 215–226* for a more thorough discussion.
 
-`LexicalUUID`s can be used as column names, in which case they're stored as
-16-byte values and are sortable by `LexicalUUIDType`, or as keys, in which case
-they're stored as traditional, hex-encoded strings. Cassie provides implicit
-conversions between `LexicalUUID` and `String`:
 
-    val uuid = LexicalUUID(people.clock)
-
-    people.insert(uuid, Column("one", "two")) // converted to hex automatically
-
-    people.insert("key", Column(uuid, "what")) // converted to a byte array
-
-
-TODO counter column families
 
 Things What Ain't Done Yet
 ==========================
 
-* Anything relating to super columns
-* Range queries
 * Authentication
-* Counting
 * Metadata (e.g., `describe_*`)
 
-Why? I don't need it yet.
-
-
 Thanks
 ======
 
-Many thanks to:
+Many thanks to (before the Twitter fork):
 
 * Cliff Moon
 * James Golick
 * Robert J. Macomber
 
-
 License
 -------
 
 Copyright (c) 2010 Coda Hale
-Copyright (c) 2011 Twitter, Inc.
+Copyright (c) 2011-2012 Twitter, Inc.
 
-Published under The MIT License, see LICENSE
+Published under the Apache License, Version 2.0; see LICENSE.
