Scrap the old cassandra.yaml file in favor of a management command to create the keyspace.
commit f0cfec3390b96487dbc2254dcb41da1ae3a01267 (1 parent: d9c7c77)
Authored by Eric Florenzano (ericflo)
23 README.md
@@ -44,22 +44,10 @@ Then we need to create our database directories on disk:
44 44 sudo mkdir -p /var/lib/cassandra
45 45 sudo chown -R `whoami` /var/lib/cassandra
46 46
47   -Now we copy the Cassandra configuration from the Twissandra source tree, and
48   -put it in its proper place in the Cassandra directory structure:
49   -
50   - cp ../twissandra/cassandra.yaml conf/
51   -
52 47 Finally we can start Cassandra:
53 48
54 49 ./bin/cassandra -f
55 50
56   -This will run the Cassandra database (configured for Twissandra) in the
57   -foreground, so to continue, we'll need to open a new terminal.
58   -
59   -Finally we need to load the Twissandra schema into the database:
60   -
61   - ./bin/schematool 127.0.0.1 8080 import
62   -
63 51 ### Install Thrift
64 52
65 53 Follow the instructions [provided on the Thrift website itself](http://wiki.apache.org/thrift/ThriftInstallation)
@@ -88,11 +76,18 @@ Now let's install all of the dependencies:
88 76 Now that we've got all of our dependencies installed, we're ready to start up
89 77 the server.
90 78
91   -### Start up the webserver
  79 +### Create the schema
92 80
93   -Make sure you're in the Twissandra checkout, and then start up the server:
  81 +Make sure you're in the Twissandra checkout, and then run the sync_cassandra
  82 +command to create the proper keyspace in Cassandra:
94 83
95 84 cd twissandra
  85 + python manage.py sync_cassandra
  86 +
  87 +### Start up the webserver
  88 +
  89 +This is the fun part! We're done setting everything up; now we just need to run it:
  90 +
96 91 python manage.py runserver
97 92
98 93 Now go to http://127.0.0.1:8000/ and you can play with Twissandra!
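
A quick way to sanity-check the result of sync_cassandra (the command is added later in this commit) is to ask Cassandra for the keyspace directly. This is only a sketch, reusing the pycassa calls the command itself relies on:

    # Sketch: confirm the Twissandra keyspace exists after running
    # `python manage.py sync_cassandra`. Assumes Cassandra is running locally.
    import pycassa

    client = pycassa.connect('system')
    try:
        client.describe_keyspace('Twissandra')
        print 'Twissandra keyspace is present.'
    except pycassa.NotFoundException:
        print 'Keyspace not found -- run `python manage.py sync_cassandra` first.'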
283 cassandra.yaml
... ... @@ -1,283 +0,0 @@
1   -# Cassandra storage config YAML
2   -# See http://wiki.apache.org/cassandra/StorageConfiguration for
3   -# explanations of configuration directives.
4   -
5   -# name of the cluster
6   -cluster_name: 'Test Cluster'
7   -
8   -# Set to true to make new [non-seed] nodes automatically migrate data
9   -# to themselves from the pre-existing nodes in the cluster. Defaults
10   -# to false because you can only bootstrap N machines at a time from
11   -# an existing cluster of N, so if you are bringing up a cluster of
12   -# 10 machines with 3 seeds you would have to do it in stages. Leaving
13   -# this off for the initial start simplifies that.
14   -auto_bootstrap: false
15   -
16   -# See http://wiki.apache.org/cassandra/HintedHandoff
17   -hinted_handoff_enabled: true
18   -
19   -# authentication backend, implementing IAuthenticator; used to identify users
20   -authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
21   -
22   -# authorization backend, implementing IAuthority; used to limit access/provide permissions
23   -authority: org.apache.cassandra.auth.AllowAllAuthority
24   -
25   -# any IPartitioner may be used, including your own as long as it is on
26   -# the classpath. Out of the box, Cassandra provides
27   -# org.apache.cassandra.dht.RandomPartitioner
28   -# org.apache.cassandra.dht.OrderPreservingPartitioner, and
29   -# org.apache.cassandra.dht.CollatingOrderPreservingPartitioner.
30   -partitioner: org.apache.cassandra.dht.RandomPartitioner
31   -
32   -# directories where Cassandra should store data on disk.
33   -data_file_directories:
34   - - /var/lib/cassandra/data
35   -
36   -# Addresses of hosts that are deemed contact points.
37   -# Cassandra nodes use this list of hosts to find each other and learn
38   -# the topology of the ring. You must change this if you are running
39   -# multiple nodes!
40   -seeds:
41   - - 127.0.0.1
42   -
43   -# Access mode. mmapped i/o is substantially faster, but only practical on
44   -# a 64bit machine (which notably does not include EC2 "small" instances)
45   -# or relatively small datasets. "auto", the safe choice, will enable
46   -# mmapping on a 64bit JVM. Other values are "mmap", "mmap_index_only"
47   -# (which may allow you to get part of the benefits of mmap on a 32bit
48   -# machine by mmapping only index files) and "standard".
49   -# (The buffer size settings that follow only apply to standard,
50   -# non-mmapped i/o.)
51   -disk_access_mode: auto
52   -
53   -# Unlike most systems, in Cassandra writes are faster than reads, so
54   -# you can afford more of those in parallel. A good rule of thumb is 2
55   -# concurrent reads per processor core. Increase ConcurrentWrites to
56   -# the number of clients writing at once if you enable CommitLogSync +
57   -# CommitLogSyncDelay.
58   -concurrent_reads: 8
59   -concurrent_writes: 32
60   -
61   -# This sets the amount of memtable flush writer threads. These will
62   -# be blocked by disk io, and each one will hold a memtable in memory
63   -# while blocked. If you have a large heap and many data directories,
64   -# you can increase this value for better flush performance.
65   -# By default this will be set to the amount of data directories defined.
66   -#memtable_flush_writers: 1
67   -
68   -# Buffer size to use when performing contiguous column slices.
69   -# Increase this to the size of the column slices you typically perform
70   -sliced_buffer_size_in_kb: 64
71   -
72   -# TCP port, for commands and data
73   -storage_port: 7000
74   -
75   -# Address to bind to and tell other nodes to connect to. You _must_
76   -# change this if you want multiple nodes to be able to communicate!
77   -listen_address: localhost
78   -
79   -# The address to bind the Thrift RPC service to
80   -rpc_address: localhost
81   -# port for Thrift to listen on
82   -rpc_port: 9160
83   -
84   -# Frame size for thrift (maximum field length).
85   -# 0 disables TFramedTransport in favor of TSocket. This option
86   -# is deprecated; we strongly recommend using Framed mode.
87   -thrift_framed_transport_size_in_mb: 15
88   -
89   -# The max length of a thrift message, including all fields and
90   -# internal thrift overhead.
91   -thrift_max_message_length_in_mb: 16
92   -
93   -snapshot_before_compaction: false
94   -
95   -# change this to increase the compaction thread's priority. In java, 1 is the
96   -# lowest priority and that is our default.
97   -# compaction_thread_priority: 1
98   -
99   -# The threshold size in megabytes the binary memtable must grow to,
100   -# before it's submitted for flushing to disk.
101   -binary_memtable_throughput_in_mb: 256
102   -# The maximum time to leave a dirty memtable unflushed.
103   -# (While any affected columnfamilies have unflushed data from a
104   -# commit log segment, that segment cannot be deleted.)
105   -# This needs to be large enough that it won't cause a flush storm
106   -# of all your memtables flushing at once because none has hit
107   -# the size or count thresholds yet.
108   -memtable_flush_after_mins: 60
109   -# Size of the memtable in memory before it is flushed
110   -memtable_throughput_in_mb: 64
111   -# Number of objects in millions in the memtable before it is flushed
112   -memtable_operations_in_millions: 0.3
113   -
114   -column_index_size_in_kb: 64
115   -
116   -in_memory_compaction_limit_in_mb: 64
117   -
118   -# commit log
119   -commitlog_directory: /var/lib/cassandra/commitlog
120   -
121   -# Size to allow commitlog to grow to before creating a new segment
122   -commitlog_rotation_threshold_in_mb: 128
123   -
124   -# commitlog_sync may be either "periodic" or "batch."
125   -# When in batch mode, Cassandra won't ack writes until the commit log
126   -# has been fsynced to disk. It will wait up to
127   -# CommitLogSyncBatchWindowInMS milliseconds for other writes, before
128   -# performing the sync.
129   -commitlog_sync: periodic
130   -
131   -# the other option is "periodic," where writes may be acked immediately
132   -# and the CommitLog is simply synced every commitlog_sync_period_in_ms
133   -# milliseconds.
134   -commitlog_sync_period_in_ms: 10000
135   -
136   -# Time to wait for a reply from other nodes before failing the command
137   -rpc_timeout_in_ms: 10000
138   -
139   -# phi value that must be reached for a host to be marked down.
140   -# most users should never need to adjust this.
141   -# phi_convict_threshold: 8
142   -
143   -# endpoint_snitch -- Set this to a class that implements
144   -# IEndpointSnitch, which will let Cassandra know enough
145   -# about your network topology to route requests efficiently.
146   -# Out of the box, Cassandra provides
147   -# org.apache.cassandra.locator.SimpleSnitch,
148   -# org.apache.cassandra.locator.RackInferringSnitch, and
149   -# org.apache.cassandra.locator.PropertyFileSnitch.
150   -endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
151   -
152   -# dynamic_snitch -- This boolean controls whether the above snitch is
153   -# wrapped with a dynamic snitch, which will monitor read latencies
154   -# and avoid reading from hosts that have slowed (due to compaction,
155   -# for instance)
156   -dynamic_snitch: true
157   -
158   -# request_scheduler -- Set this to a class that implements
159   -# RequestScheduler, which will schedule incoming client requests
160   -# according to the specific policy. This is useful for multi-tenancy
161   -# with a single Cassandra cluster.
162   -# NOTE: This is specifically for requests from the client and does
163   -# not affect inter node communication.
164   -# org.apache.cassandra.scheduler.NoScheduler - No scheduling takes place
165   -# org.apache.cassandra.scheduler.RoundRobinScheduler - Round robin of
166   -# client requests to a node with a separate queue for each
167   -# request_scheduler_id. The requests are throttled based on the limit set
168   -# in throttle_limit in the request_scheduler_options
169   -request_scheduler: org.apache.cassandra.scheduler.NoScheduler
170   -
171   -# Scheduler Options vary based on the type of scheduler
172   -# NoScheduler - Has no options
173   -# RoundRobin
174   -# - throttle_limit -- The throttle_limit is the number of in-flight
175   -# requests per client. Requests beyond
176   -# that limit are queued up until
177   -# running requests can complete.
178   -# The value of 80 here is twice the number of
179   -# concurrent_reads + concurrent_writes.
180   -# request_scheduler_options:
181   -# throttle_limit: 80
182   -
183   -# request_scheduler_id -- An identifier based on which to perform
184   -# the request scheduling. The current supported option is "keyspace"
185   -request_scheduler_id: keyspace
186   -
187   -# A ColumnFamily is the Cassandra concept closest to a relational table.
188   -#
189   -# Keyspaces are separate groups of ColumnFamilies. Except in very
190   -# unusual circumstances you will have one Keyspace per application.
191   -#
192   -# Keyspace required parameters:
193   -# - name: name of the keyspace; "system" and "definitions" are
194   -# reserved for Cassandra Internals.
195   -# - replica_placement_strategy: the class that determines how replicas
196   -# are distributed among nodes. Contains both the class as well as
197   -# configuration information. Must extend AbstractReplicationStrategy.
198   -# Out of the box, Cassandra provides
199   -# * org.apache.cassandra.locator.SimpleStrategy
200   -# * org.apache.cassandra.locator.NetworkTopologyStrategy
201   -# * org.apache.cassandra.locator.OldNetworkTopologyStrategy
202   -#
203   -# SimpleStrategy merely places the first
204   -# replica at the node whose token is closest to the key (as determined
205   -# by the Partitioner), and additional replicas on subsequent nodes
206   -# along the ring in increasing Token order.
207   -#
208   -# With NetworkTopologyStrategy,
209   -# for each datacenter, you can specify how many replicas you want
210   -# on a per-keyspace basis. Replicas are placed on different racks
211   -# within each DC, if possible. This strategy also requires rack aware
212   -# snitch, such as RackInferringSnitch or PropertyFileSnitch.
213   -# An example:
214   -# - name: Keyspace1
215   -# replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
216   -# strategy_options:
217   -# DC1 : 3
218   -# DC2 : 2
219   -# DC3 : 1
220   -#
221   -# OldNetworkTopologyStrategy [formerly RackAwareStrategy]
222   -# places one replica in each of two datacenters, and the third on a
223   -# different rack in the first. Additional datacenters are not
224   -# guaranteed to get a replica. Additional replicas after three are placed
225   -# in ring order after the third without regard to rack or datacenter.
226   -#
227   -# - replication_factor: Number of replicas of each row
228   -# - column_families: column families associated with this keyspace
229   -#
230   -# ColumnFamily required parameters:
231   -# - name: name of the ColumnFamily. Must not contain the character "-".
232   -# - compare_with: tells Cassandra how to sort the columns for slicing
233   -# operations. The default is BytesType, which is a straightforward
234   -# lexical comparison of the bytes in each column. Other options are
235   -# AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType, LongType,
236   -# and IntegerType (a generic variable-length integer type).
237   -# You can also specify the fully-qualified class name to a class of
238   -# your choice extending org.apache.cassandra.db.marshal.AbstractType.
239   -#
240   -# ColumnFamily optional parameters:
241   -# - keys_cached: specifies the number of keys per sstable whose
242   -# locations we keep in memory in "mostly LRU" order. (JUST the key
243   -# locations, NOT any column values.) Specify a fraction (value less
244   -# than 1) or an absolute number of keys to cache. Defaults to 200000
245   -# keys.
246   -# - rows_cached: specifies the number of rows whose entire contents we
247   -# cache in memory. Do not use this on ColumnFamilies with large rows,
248   -# or ColumnFamilies with high write:read ratios. Specify a fraction
249   -# (value less than 1) or an absolute number of rows to cache.
250   -# Defaults to 0. (i.e. row caching is off by default)
251   -# - comment: used to attach additional human-readable information about
252   -# the column family to its definition.
253   -# - read_repair_chance: specifies the probability with which read
254   -# repairs should be invoked on non-quorum reads. must be between 0
255   -# and 1. defaults to 1.0 (always read repair).
256   -# - preload_row_cache: If true, will populate row cache on startup.
257   -# Defaults to false.
258   -# - gc_grace_seconds: specifies the time to wait before garbage
259   -# collecting tombstones (deletion markers). defaults to 864000 (10
260   -# days). See http://wiki.apache.org/cassandra/DistributedDeletes
261   -#
262   -# NOTE: this keyspace definition is for demonstration purposes only.
263   -# Cassandra will not load these definitions during startup. See
264   -# http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation.
265   -keyspaces:
266   - - name: Twissandra
267   - replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
268   - replication_factor: 1
269   - column_families:
270   - - name: User
271   - compare_with: UTF8Type
272   - - name: Username
273   - compare_with: BytesType
274   - - name: Friends
275   - compare_with: BytesType
276   - - name: Followers
277   - compare_with: BytesType
278   - - name: Tweet
279   - compare_with: UTF8Type
280   - - name: Timeline
281   - compare_with: LongType
282   - - name: Userline
283   - compare_with: LongType
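
Each column family removed from this YAML maps one-to-one onto a CfDef in the new sync_cassandra command below. For example, the User entry becomes the following (a sketch, using the same thrift types the command imports):

    # Sketch: the YAML entry "name: User / compare_with: UTF8Type" is now
    # expressed programmatically as a CfDef in the Twissandra keyspace.
    from pycassa.cassandra.ttypes import CfDef

    user_cf = CfDef('Twissandra', 'User', comparator_type='UTF8Type')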
2  settings.py
@@ -86,4 +86,6 @@
86 86
87 87 INSTALLED_APPS = (
88 88 'django.contrib.sessions',
  89 + 'tweets',
  90 + 'users',
89 91 )
1  tweets/management/__init__.py
... ... @@ -0,0 +1 @@
  1 +
1  tweets/management/commands/__init__.py
... ... @@ -0,0 +1 @@
  1 +
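
The two empty __init__.py files above make tweets.management.commands an importable package; together with adding 'tweets' to INSTALLED_APPS, that is how Django discovers the new command. A quick way to confirm it was picked up (a sketch, run from `python manage.py shell` so settings are configured):

    # Sketch: confirm Django registered the custom management command.
    from django.core.management import get_commands

    print 'sync_cassandra' in get_commands()  # expect: True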
47 tweets/management/commands/sync_cassandra.py
... ... @@ -0,0 +1,47 @@
  1 +import pycassa
  2 +from pycassa.cassandra.ttypes import KsDef, CfDef
  3 +
  4 +from django.core.management.base import NoArgsCommand
  5 +
  6 +class Command(NoArgsCommand):
  7 +
  8 + def handle_noargs(self, **options):
  9 + # First we define all our column families
  10 + column_families = [
  11 + CfDef('Twissandra', 'User', comparator_type='UTF8Type'),
  12 + CfDef('Twissandra', 'Username', comparator_type='BytesType'),
  13 + CfDef('Twissandra', 'Friends', comparator_type='BytesType'),
  14 + CfDef('Twissandra', 'Followers', comparator_type='BytesType'),
  15 + CfDef('Twissandra', 'Tweet', comparator_type='UTF8Type'),
  16 + CfDef('Twissandra', 'Timeline', comparator_type='LongType'),
  17 + CfDef('Twissandra', 'Userline', comparator_type='LongType'),
  18 + ]
  19 + # Now we define our keyspace (with column families inside)
  20 + keyspace = KsDef(
  21 + 'Twissandra', # Keyspace Name
  22 + 'org.apache.cassandra.locator.SimpleStrategy', # Placement Strat.
  23 + {}, # Options for the Placement Strat.
  24 + 1, # Replication factor
  25 + column_families,
  26 + )
  27 +
  28 + client = pycassa.connect('system')
  29 +
  30 + # If there is already a Twissandra keyspace, we have to ask the user
  31 + # what they want to do with it.
  32 + try:
  33 + client.describe_keyspace('Twissandra')
  34 + # If there were no keyspace, describe_keyspace would have raised an exception.
  35 + msg = 'Looks like you already have a Twissandra keyspace.\nDo you '
  36 + msg += 'want to delete it and recreate it? All current data will '
  37 + msg += 'be deleted! (y/n): '
  38 + resp = raw_input(msg)
  39 + if not resp or resp[0] != 'y':
  40 + print "Ok, then we're done here."
  41 + return
  42 + client.system_drop_keyspace('Twissandra')
  43 + except pycassa.NotFoundException:
  44 + pass
  45 +
  46 + client.system_add_keyspace(keyspace)
  47 + print 'All done!'
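
Once the keyspace exists, application code can talk to it through pycassa. The exact ColumnFamily constructor varies between pycassa releases, so treat this as a hypothetical sketch for the same API generation used above (where connect() takes a keyspace name); the key and column values are made up for illustration:

    # Hypothetical usage sketch: write and read one column in the User
    # column family created by sync_cassandra.
    import pycassa

    client = pycassa.connect('Twissandra')
    user_cf = pycassa.ColumnFamily(client, 'User')

    user_cf.insert('some-user-id', {'username': 'ericflo'})
    print user_cf.get('some-user-id')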
