# FAQ

## Why do I need Redis instead of memcachedb, Tokyo Cabinet, ...?

Memcachedb is basically memcached made persistent. Redis is a different
evolution path among key-value DBs: the idea is to retain the main
advantages of key-value DBs without the severe loss of comfort of plain
key-value stores. So Redis offers more features:

* Keys can store different data types, not just strings, notably Lists and
  Sets. For example, if you want to use Redis as a log storage system for
  different computers, every computer can just `RPUSH` its data to its own
  `computer_ID` key. Don't want to keep more than 1000 log lines per
  computer? Just issue a `LTRIM computer_ID 0 999` command to trim the list
  after every push (see the sketch after this list).
* Another example is about Sets. Imagine building a social news site like
  [Reddit][reddit]. Every time a user upvotes a given news item, you can
  just add the ID of the user who did the upmodding to the `news_ID_upmods`
  key, holding a value of type SET. Sets can also be used to index things:
  every key can be a tag holding a SET with the IDs of all the objects
  associated with that tag. Using Redis set intersection you obtain the list
  of IDs having all these tags at the same time.
* We wrote a [simple Twitter clone][retwis] using just Redis as the
  database. Download the source code from the download section and imagine
  writing it with a plain key-value DB without support for lists and
  sets... it's *much* harder.
* Multiple DBs. Using the SELECT command the client can select different
  datasets. This is useful because Redis provides an atomic MOVE primitive
  that moves a key from one DB to another; if the target DB already contains
  such a key it returns an error. This basically gives you a way to perform
  locking in distributed processing.
* *So what is Redis really about?* The user interface with the programmer.
  Redis aims to give the programmer the right tools to model a wide range of
  problems. *Sets, Lists with O(1) push operations, LRANGE and LTRIM, fast
  server-side intersection between sets: these are primitives that let you
  model complex problems with a key-value database.*

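To make the list and set examples above concrete, here is a minimal sketch
using the redis-py client (the client library and all key names are
illustrative assumptions, not something the FAQ prescribes):

```python
import redis

r = redis.Redis(host='localhost', port=6379)

# Capped per-computer log: push a line, then trim the list so it never
# holds more than 1000 entries, as described above.
r.rpush('logs:computer_42', 'disk almost full')
r.ltrim('logs:computer_42', 0, 999)

# Upvotes as a Set: adding the same user twice has no effect.
r.sadd('news:1234:upmods', 'user:56')

# Tags as Sets of object IDs; a server-side intersection answers
# "which objects carry all of these tags?" in a single command.
r.sadd('tag:redis', 1, 2, 3)
r.sadd('tag:database', 2, 3, 4)
print(r.sinter('tag:redis', 'tag:database'))  # {b'2', b'3'}
```
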
[reddit]: http://reddit.com
[retwis]: http://retwis.antirez.com

## Isn't this key-value thing just hype?

I imagine key-value DBs, in the short term future, being used much like you
use memory in a program: with lists, hashes, and so on. With Redis it's like
this, but this special kind of memory containing your data structures is
shared, atomic, and persistent.

When we write code it is obvious, when we hold data in memory, to use the
most sensible data structure for the job, right? Incredibly, when data is
put inside a relational DB this is no longer true, and we create an absurd
data model even if all we need is to put data in and get it back in the same
order we put it in (an ORDER BY is required when the data should already be
sorted. Strange, don't you think?).

Key-value DBs bring this back home: create sensible data models and use the
right data structures for the problem you are trying to solve.

## Can I back up a Redis DB while the server is working?

Yes you can. When Redis saves the DB it actually creates a temp file, then
uses rename(2) to move that temp file to the destination file name. So even
while the server is working it is safe to save the database file just with
the _cp_ UNIX command. Note that you can use master-slave replication in
order to have redundancy of data, but if all you need is backups, cp or scp
will do the job pretty well.

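For example, a backup script can trigger a save and then copy the dump file.
A minimal sketch, assuming redis-py and a default dump location (both are
assumptions; adjust to your configured `dir` and `dbfilename`):

```python
import shutil
import time

import redis

r = redis.Redis()

# Trigger a background save and wait for it to complete by watching the
# timestamp of the last successful save.
before = r.lastsave()
r.bgsave()
while r.lastsave() == before:
    time.sleep(0.1)

# The dump file is now complete and consistent, so a plain copy is safe.
shutil.copyfile('/var/lib/redis/dump.rdb', '/backups/dump-latest.rdb')
```
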
## What's the Redis memory footprint?

Worst case scenario: 1 million keys, with the keys being the natural numbers
from 0 to 999999 and the string "Hello World" as value, use 100 MB on my
Intel MacBook (32 bit). Note that the same data stored linearly in a unique
string takes something like 16 MB; this is expected because with small keys
and values there is a lot of overhead. Memcached will perform similarly.

With large keys/values the ratio is much better of course.

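If you want to reproduce this kind of measurement yourself, a quick sketch
with redis-py (the client is an assumption; any client works) is to load the
keys in pipelined batches and then read INFO back:

```python
import redis

r = redis.Redis()

# Load 1 million keys "0".."999999", each holding "Hello World",
# pipelining in batches to keep round trips down.
pipe = r.pipeline(transaction=False)
for i in range(1_000_000):
    pipe.set(str(i), 'Hello World')
    if i % 10_000 == 9_999:
        pipe.execute()
pipe.execute()

print(r.info()['used_memory_human'])
```
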
64 bit systems will use much more memory than 32 bit systems to store the
same keys, especially if the keys and values are small. This is because
pointers take 8 bytes on 64 bit systems. But of course the advantage is that
you can have a lot of memory on 64 bit systems, so to run large Redis
servers a 64 bit system is more or less required.

## I like Redis high level operations and features, but I don't like that it keeps everything in memory and I can't have a dataset larger than memory. Plans to change this?

Short answer: if you are using a Redis client that supports consistent
hashing you can distribute the dataset across different nodes. For instance
the Ruby client supports this feature. There are plans to develop
redis-cluster, basically a dummy Redis server that is only used in order to
distribute the requests among N different nodes using consistent hashing.

## Why does Redis keep the whole dataset in RAM?

Redis keeps the whole dataset in memory and writes to disk asynchronously in
order to be very fast. You get the best of both worlds: hyper-speed and
persistence of data. The price to pay is exactly this: the dataset must fit
in your computer's RAM.

If the data is larger than memory, and this data is stored on disk, what
happens is that the bottleneck of disk I/O speed will start to ruin the
performance. Maybe not in benchmarks, but once you have real load from
multiple clients with distributed key accesses, the data must come from
disk, and the disk is damn slow. Not only that, but Redis supports higher
level data structures than plain values. Implementing these things on disk
is even slower.

Redis will always continue to hold the whole dataset in memory because these
days scalability requires using RAM as the storage medium, and RAM is
getting cheaper and cheaper. Today it is common for an entry level server to
have 16 GB of RAM! And in the 64-bit era there are, in theory, no longer
limits to the amount of RAM you can have.

Amazon EC2 now provides instances with 32 or 64 GB of RAM.

## If my dataset is too big for RAM and I don't want to use consistent hashing or other ways to distribute the dataset across different nodes, what can I do to use Redis anyway?

You may try to load a dataset larger than your memory in Redis and see what
happens. Basically, if you are using a modern operating system, and you have
a lot of data in the DB that is rarely accessed, the OS's virtual memory
implementation will try to swap rarely used pages of memory to disk, and
only recall these pages when they are needed. If you have many large values
that are rarely used this will work. If your DB is big because you have tons
of little values accessed at random without a specific pattern, this will
not work (at the low level a page is usually 4096 bytes, and different
keys/values can be stored in a single page; the OS can't swap this page to
disk if even a few of its keys are used frequently).

Another possible solution is to use both MySQL and Redis at the same time:
basically keep the state in Redis, along with all the things that get
accessed very frequently (user auth tokens, Redis Lists with chronologically
ordered IDs of the last N comments, N posts, and so on), and use MySQL as a
simple storage engine for larger data. That is, just create a table with an
auto-incrementing ID as primary key and a large BLOB field as data field,
and access the MySQL data only by primary key (the ID). The application will
run the high traffic queries against Redis, but when it has to fetch the big
data it will ask MySQL for the specific resource IDs.

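A minimal sketch of this hybrid pattern is below; sqlite3 stands in for
MySQL just to keep the example self-contained, and the key and table names
are made up for illustration:

```python
import sqlite3

import redis

r = redis.Redis()
db = sqlite3.connect('bulk.db')  # sqlite3 standing in for MySQL here
db.execute('CREATE TABLE IF NOT EXISTS posts '
           '(id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)')

def store_post(body: bytes) -> int:
    # The big payload goes into the relational store...
    cur = db.execute('INSERT INTO posts (data) VALUES (?)', (body,))
    db.commit()
    post_id = cur.lastrowid
    # ...while Redis keeps the hot, chronologically ordered index.
    r.lpush('posts:latest', post_id)
    r.ltrim('posts:latest', 0, 99)  # keep only the last 100 IDs hot
    return post_id

def latest_posts(n: int = 10):
    ids = [int(x) for x in r.lrange('posts:latest', 0, n - 1)]
    if not ids:
        return []
    # Fetch the large payloads by primary key only.
    marks = ','.join('?' * len(ids))
    return db.execute(f'SELECT id, data FROM posts WHERE id IN ({marks})',
                      ids).fetchall()
```
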
Update: it could be interesting to test how Redis performs with datasets
larger than memory if the OS swap partition is on one of those very fast
Intel SSD disks.

## Do you plan to implement Virtual Memory in Redis? Why not just let the Operating System handle it for you?

Yes, in order to support datasets bigger than RAM there is a plan to
implement transparent Virtual Memory in Redis, that is, the ability to
transfer large, rarely used values associated with keys to disk, and reload
them transparently into memory when these values are requested in some way.

So you may ask why not let the operating system VM do the work for us. There
are two main reasons: in Redis even a large value stored at a given key, for
instance a 1 million element list, is not allocated in a contiguous piece of
memory. It's actually *very* fragmented, since Redis uses quite aggressive
object sharing and reuses allocated Redis Object structures.

So you can imagine the memory layout as composed of 4096 byte pages that
actually contain different parts of different large values. Not only that,
but many values that are large enough for us to swap out to disk, like a
1024 byte value, are just one quarter the size of a memory page, and likely
the same page contains other values that are not rarely used. So such a
value will never be swapped out by the operating system. This is the first
reason for implementing application-level virtual memory in Redis.

There is another one, as important as the first. A complex object in memory
like a list or a set is something *10 times bigger* than the same object
serialized on disk. You probably already noticed how much smaller Redis
snapshots on disk are compared to the memory usage of Redis for the same
objects. This happens because when data is in memory it is full of pointers,
reference counters and other metadata. Add to this the malloc fragmentation
and the need to return word-aligned chunks of memory and you have a clear
picture of what happens. So this means having 10 times the I/O between
memory and disk than otherwise needed.

## Is there something I can do to lower the Redis memory usage?

Yes, try compiling it for a 32 bit target if you are using a 64 bit box.

If you are using Redis >= 1.3, try using the Hash data type: it can save a
lot of memory.

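For example, group the small fields of an object into one Hash instead of
using one top-level key per field. A sketch with redis-py (the client and
key names are illustrative assumptions):

```python
import redis

r = redis.Redis()

# One Hash with many small fields instead of many top-level keys:
r.hset('user:1000', 'name', 'alice')
r.hset('user:1000', 'email', 'alice@example.com')
print(r.hget('user:1000', 'name'))  # b'alice'
```
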
If you are using hashes or any other type with values bigger than 128 bytes,
try also this to lower the RSS usage (Resident Set Size):
`export MMAP_THRESHOLD=4096`

## I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!

This may happen and it's perfectly okay. Redis objects are small C
structures allocated and freed many times. This costs a lot of CPU, so
instead of being freed, released objects are placed into a free list and
reused when needed. This memory is taken exactly by these free objects ready
to be reused.

## What happens if Redis runs out of memory?

With modern operating systems malloc() returning NULL is not common; usually
the server will start swapping and Redis performance will degrade
disastrously, so you'll know it's time to use more Redis servers or get more
RAM.

The INFO command (a work in progress these days) reports the amount of
memory Redis is using, so you can write scripts that monitor your Redis
servers, checking for critical conditions.

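Such a monitoring script can be as simple as the sketch below (redis-py and
the 3 GB threshold are arbitrary assumptions):

```python
import redis

r = redis.Redis()
info = r.info()

used = info['used_memory']    # bytes currently allocated by Redis
if used > 3 * 1024 ** 3:      # alert above 3 GB, for example
    print(f"WARNING: Redis is using {info['used_memory_human']}")
```
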
You can also use the "maxmemory" option in the config file to put a limit on
the memory Redis can use. If this limit is reached, Redis will start
replying with an error to write commands (but will continue to accept
read-only commands).

## Does Redis use more memory running on 64 bit boxes? Can I use 32 bit Redis on 64 bit systems?

Redis uses a lot more memory when compiled for a 64 bit target, especially
if the dataset is composed of many small keys and values. Such a database
will, for instance, consume 50 MB of RAM when compiled for the 32 bit
target, and 80 MB for 64 bit! That's a big difference.

You can run 32 bit Redis binaries on a 64 bit Linux or Mac OS X system
without problems. For OS X just use *make 32bit*. For Linux, make sure you
have *libc6-dev-i386* installed, then use *make 32bit* if you are using the
latest Git version. For Redis `<= 1.2.2` you instead have to edit the
Makefile and replace "-arch i386" with "-m32".

If your application is already able to perform application-level sharding,
it is very advisable to run N instances of 32 bit Redis on a big 64 bit box
(with more than 4 GB of RAM) instead of a single 64 bit instance, as this is
much more memory efficient.

## How much time does it take to load a big database at server startup?

Just an example on normal hardware: it takes about 45 seconds to restore a
2 GB database on a fairly standard system with no RAID. This can give you
some feeling for the order of magnitude of the time needed to load data when
you restart the server.

## Background saving is failing with a fork() error under Linux even if I have a lot of free RAM!

Short answer: `echo 1 > /proc/sys/vm/overcommit_memory` :)

And now the long one:

The Redis background saving schema relies on the copy-on-write semantics of
fork in modern operating systems: Redis forks, creating a child process that
is an exact copy of the parent. The child process dumps the DB to disk and
finally exits. In theory the child should use as much memory as the parent,
being a copy, but actually, thanks to the copy-on-write semantics
implemented by most modern operating systems, the parent and child process
will _share_ the common memory pages. A page will be duplicated only when it
changes in the child or in the parent. Since in theory all the pages may
change while the child process is saving, Linux can't tell in advance how
much memory the child will take, so if the `overcommit_memory` setting is
set to zero, fork will fail unless there is as much free RAM as required to
really duplicate all the parent's memory pages, with the result that if you
have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.

Setting `overcommit_memory` to 1 tells Linux to relax and perform the fork
in a more optimistic allocation fashion, and this is indeed what you want
for Redis. (To make the setting persist across reboots, add
`vm.overcommit_memory = 1` to `/etc/sysctl.conf`.)

A good source to understand how the Linux Virtual Memory works, and other
alternatives for `overcommit_memory` and `overcommit_ratio`, is this classic
from Red Hat Magazine: ["Understanding Virtual Memory"][redhatvm].

[redhatvm]: http://www.redhat.com/magazine/001nov04/features/vm/

## Are Redis on-disk snapshots atomic?

Yes, the Redis background saving process is always fork(2)ed when the server
is outside of the execution of a command, so every command reported to be
atomic in RAM is also atomic from the point of view of the disk snapshot.

## Redis is single threaded, how can I exploit multiple CPUs / cores?

Simply start multiple instances of Redis on different ports of the same box
and treat them as different servers! Given that Redis is a distributed
database anyway, in order to scale you need to think in terms of multiple
computational units. At some point a single box may not be enough anyway.

In general key-value databases are very scalable because different keys can
live on different servers independently.

In Redis there are client libraries, such as Redis-rb (the Ruby client),
that are able to handle multiple servers automatically using _consistent
hashing_. We are going to implement consistent hashing in all the other
major client libraries. If you use a different language you can implement it
yourself, or otherwise just hash the key before SETting / GETting it on a
given server. For example, imagine having N Redis servers: server-0,
server-1, ..., server-(N-1). You want to store the key "foo"; what's the
right server on which to put "foo" in order to distribute keys evenly among
the different servers? Just compute _crc_ = CRC32("foo"), then _servernum_ =
_crc_ % N (the remainder of the division by N). This will give a number
between 0 and N-1 for every key. Connect to that server and store the key.
The same goes for gets.

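In Python the same scheme looks like the sketch below (redis-py and the
hostnames are placeholder assumptions):

```python
import zlib

import redis

# One connection per server; the hostnames are placeholders.
servers = [redis.Redis(host=f'server-{i}') for i in range(3)]

def node_for(key: str) -> redis.Redis:
    # CRC32(key) % N picks the same server for the same key every time.
    return servers[zlib.crc32(key.encode()) % len(servers)]

node_for('foo').set('foo', 'bar')
value = node_for('foo').get('foo')
```
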
This is a basic way of performing key partitioning; consistent hashing is
much better, and this is why after Redis 1.0 is released we'll try to
implement it in every widely used client library, starting with Python and
PHP (Ruby already implements this support).

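For reference, a toy consistent hashing ring looks like the following;
unlike the modulo scheme above, adding or removing a node only remaps a
small fraction of the keys (the node names and replica count are arbitrary):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on a circle of hashes.
        self._keys = []
        self._nodes = {}
        for node in nodes:
            for i in range(replicas):
                h = int(hashlib.md5(f'{node}:{i}'.encode()).hexdigest(), 16)
                self._keys.append(h)
                self._nodes[h] = node
        self._keys.sort()

    def node_for(self, key):
        # Walk clockwise from the key's hash to the first virtual point.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        i = bisect.bisect(self._keys, h) % len(self._keys)
        return self._nodes[self._keys[i]]

ring = HashRing(['server-0', 'server-1', 'server-2'])
print(ring.node_for('foo'))
```
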
## I'm using some form of key hashing for partitioning, but what about SORT BY?

With SORT BY you need all the _weight keys_ to be on the same Redis instance
as the list/set you are trying to sort. In order to make this possible we
developed a concept called _key tags_. A key tag is a special pattern inside
a key that, if present, is the only part of the key hashed in order to
select the server for this key. For example, in order to hash the key "foo"
I simply perform the CRC32 checksum of the whole string, but if this key has
a pattern in the form of the characters {...} I only hash this substring. So
for example for the key "foo{bared}" the key hashing code will simply
perform the CRC32 of "bared". This way, using key tags, you can ensure that
related keys will be stored on the same Redis instance just by using the
same key tag for all these keys. Redis-rb already implements key tags.

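A sketch of the key tag rule in Python (the helper name is made up):

```python
import zlib

def shard_for(key: str, n_servers: int) -> int:
    # If the key contains a {...} key tag, only the tag's content is
    # hashed, so "foo{bared}" and "bar{bared}" map to the same server.
    start = key.find('{')
    end = key.find('}', start + 1)
    if start != -1 and end > start + 1:
        key = key[start + 1:end]
    return zlib.crc32(key.encode()) % n_servers

assert shard_for('foo{bared}', 4) == shard_for('bar{bared}', 4)
```
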
## What is the maximum number of keys a single Redis instance can hold? And what is the max number of elements in a List, Set, or Ordered Set?

In theory Redis can handle up to 2^32 keys, and it was tested in practice to
handle at least 150 million keys per instance. We are working in order to
experiment with larger values.

Every list, set, and ordered set can hold 2^32 elements.

Actually Redis internals are ready to allow up to 2^64 elements, but the
current disk dump format doesn't support this. There is plenty of time to
fix this issue in the future, as currently even with 128 GB of RAM it's
impossible to reach 2^32 elements.

## What does Redis actually mean?

Redis means two things:

* It means REmote DIctionary Server.
* It is a joke on the word Redistribute (instead of using just a relational
  DB, redistribute your workload among Redis servers).

## Why did you start the Redis project?

In order to scale [LLOOGG][lloogg]. But after I got the basic server working
I liked the idea of sharing the work with other people, and Redis was turned
into an open source project.

[lloogg]: http://lloogg.com