Skip to content


Subversion checkout URL

You can clone with
Download ZIP
100644 117 lines (89 sloc) 6.34 KB
fbd0f95 @jkreps Initial import
jkreps authored
1 Why it is time for a new database
3 Management of shared mutable state is the crux of scalability. An application without shared state can be scaled trivially
4 by buying a load balancer and more servers.
6 [why we need to share state]
8 Shared state is what makes the application interesting. Modern web applications are getting more shared state--all
9 the personalization and user data that is associated with web 2.0 is a whole lot of state.
11 Relational databases are the gold standard of state management and have been for the last 30 years. The relational
12 data model is one of the better contributions of computer science. Relational databases are here to stay, but they are not
13 appropriate for every storage situation (we still have files and hashmaps, after all). In particular relational databases,
14 and in particular the current SQL products available, present problems for large scale software.
16 But first lets review what is good about the relational model:
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
17 1. It separates data from look-up strategy, queries are declarative
18 2. It is incredibly flexible--most desirable data operations can be expressed in relations
fbd0f95 @jkreps Initial import
jkreps authored
20 The problems with current RDBMSes are the following:
21 1. Building a website on a centralized database combines the two worst enemies, disk io and network io, for every
22 server in the system, and centralizes them onto a single machine. This is clearly not feasible.
23 2. The relational model does not parallelize well, due to the difficulties of sharing state. Most databases favor
24 consistency over performance (2PC). You can't have both.
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
25 3. SQL sucks for application development. Most web apps are programs that construct little sql statement mini-programs
fbd0f95 @jkreps Initial import
jkreps authored
26 by concatenating together strings. This completely ungainly. SQL is vender specific so you are tied to the vendors
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
27 database. This means development suffers immensely. Unit testing any piece of code that involves database access is quite
28 problematic because the unit test will be slow and will depend on a server running on another machine which can't be
29 bootstrapped as part of the test code. This could easily be solved if vendors provided a light-weight drop in
fbd0f95 @jkreps Initial import
jkreps authored
30 replacement that could be used for testing. Approximately 10000 products attempt to work around this problem
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
31 by automatically mapping your application to sql. This has been called the Vietnam of Computer Science (referring to the
fbd0f95 @jkreps Initial import
jkreps authored
32 American war, not the country...and of course computer scientists are not particularly interested in this problem as
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
33 it is too practical).
fbd0f95 @jkreps Initial import
jkreps authored
35 To begin to understand the problems with relational databases, let's first review some basic facts
37 For context let's review some basic facts about a modern computer:
38 roundtrip throughput
39 Java HashMap
40 BDB Btree
41 MySQL local
42 MySQL remote
44 1000 iterations of a trivial loop
45 allocating and initializing 1000 10 byte objects
46 1000 200 byte local http requests
47 1000 200 byte remote http requests (100 mbit ethernet)
49 1. Disk IO is stagnant
50 2. Moore's law is going gangbusters for parallel applications (e.g. any application that has shared state)
52 So from this you can get some feel for the cost of various operations. It is clear that the disk operations are
53 our greatest enemy followed by the network. We can iterate over XXX items of an array in the time required to
54 look-up a single item via a log(n) b-tree index. We can iterate over YYY items in the time required to complete
55 a mysql request.
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
57 The relational model is great, when accessed programmatically, but it is best for little applications (of which there are many).
fbd0f95 @jkreps Initial import
jkreps authored
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
59 In a high scalability scenario with a shared db, the advantages of the relational model disappear. No longer do you have
fbd0f95 @jkreps Initial import
jkreps authored
60 flexibility, what you have is a system in which most operations will bring the database to its knees along with
61 everyone depending on it. Think about the average table, of the set of possible queries, only a few can be issued without
62 difficulty. With this being the case, the abstraction of the lookup structure becomes a major problem--it is impossible to
63 tell by looking at code whether it will run quickly or bring down the system, that is totally dependent on data sizes and
64 indexing structure.
66 There are two basic strategies for scaling up databases
67 1. Partitioning and replication
68 2. Caching
70 Caching is the most popular because it is the simplest to implement. Basically you replace all your fancy queries with
71 simple puts and gets, and each operation goes first to the cache and then to the db. The cache may not actually be much
72 faster than the db, but it can be distributed whereas the db cannot.
74 There are two basic caching strategies
23b7001 correct some spelling errors (docs/purpose.txt)
Barak A. Pearlmutter authored
75 1. Local read/write with background replication
fbd0f95 @jkreps Initial import
jkreps authored
76 2. Remote access
78 In the first strategy we try to keep each object on each server. We read and write to the local cache, and in the background
79 the cache replicates our changes out to the other servers. This is the ehcache strategy.
81 The problem with this is that we must store the whole cache local which means either our entire dataset must fit in memory
82 or we have to use the local disk. Of course the local disk may well be slower than the network, so our cache may become slower
83 then the database itself (though, importantly, it is distributed). In addition each write now must be written to the db as well
84 as every application server.
86 How it works?
88 Most storage systems look up your data in a table. This is difficult in a distributed system because the lookup table is large and
89 always changing. Keeping this in sync and uncorrupted is a scalability problem--metadata updates are a big bottleneck.
91 A simpler approach is much faster--use the key to calculate the location. This means coming up with a function
92 f(key)->(n1,n2,n3)
93 where n1, n2, n3 are the nodes containing the data.
95 In this model only the key and the cluster topology (which servers are where) are needed to locate and retrieve data.
98 - Relational data model is excellent
99 - SQL is a pain
100 - relational very hard to distribute
101 - SQL breaks testability (without great pain)
102 - Don't care about POSIX filesystem type stuff
104 Needs For A High-Scalability Distributed Storage System
105 - Commodity hardware -- cheap hardware, commodity network, shared nothing
106 - Incremental scalability to hundreds of nodes
107 - High throughput
108 - Low latency
109 - No single point of failure
110 - Easy enough for OPS team or software engineer to manage (No DBA)
112 References
113 - Google BigTable
114 - Amazon Dynamo
115 - One-size fits all
Something went wrong with that request. Please try again.