The ziplist -> hashtable conversion code is triggered every time a hash value must be promoted to a full hash table because the number or size of elements reached the threshold. If a problem in the ziplist causes the same field to be present multiple times, the assertion of successful addition of the element inside the hash table will fail, crashing the server with a failed assertion, but providing little information about the problem. This commit adds a new logging function to perform the hex dump of binary data, and makes sure that the ziplist -> hashtable conversion code uses this new logging facility to dump the content of the ziplist when the assertion fails. This change was originally made in order to investigate issue #547.
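As an illustration of the kind of hex dump described above, here is a minimal sketch in C. The function name hex_dump_to_buffer() and its signature are hypothetical and are not taken from the Redis source; the real logging function may format and emit the data differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the Redis function): write the lowercase hex
 * representation of `len` bytes from `data` into `out`, NUL-terminated.
 * `out` must have room for len*2+1 bytes. Returns the number of hex
 * characters written, not counting the terminator. */
size_t hex_dump_to_buffer(const void *data, size_t len,
                          char *out, size_t outlen) {
    static const char charset[] = "0123456789abcdef";
    const unsigned char *p = data;
    size_t j = 0;

    assert(out != NULL && outlen >= len * 2 + 1);
    for (size_t i = 0; i < len; i++) {
        out[j++] = charset[p[i] >> 4];   /* high nibble */
        out[j++] = charset[p[i] & 0x0f]; /* low nibble */
    }
    out[j] = '\0';
    return j;
}
```

A caller would dump the raw ziplist bytes into a buffer like this right before the failing assertion, then write the buffer to the log.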
A new stress test was added for the code converting a ziplist into a hash table. In this commit the randomValue helper function was also modified so that it can return negative values.
wait_for_condition is now used instead of the usual "after 1000" (that is the way to sleep in Tcl). This should avoid finding the replica in a state where it is still loading the RDB in memory and returns a -LOADING error. This test used to fail when running the test suite under valgrind, due to the added latencies.
(additional commit notes by firstname.lastname@example.org): The rdbIsObjectType() macro was not updated when the new RDB object type of ziplist encoded hashes was added. As a result RESTORE, which uses rdbLoadObjectType(), failed when a ziplist encoded hash was loaded. This did not affect normal RDB loading because in that case we use the lower-level function rdbLoadType(). The commit also adds a regression test.
Improved comments to make clear that rdbLoadType() just loads a general TYPE in the context of RDB, which can be an object type, an expire type, end-of-file, and so forth, while rdbLoadObjectType() enforces that the type is a valid Object Type and otherwise returns -1.
In issue #529 a user reported a bug that can be triggered with the following code:

    flushdb
    set a "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
    bitop or x a b

The bug was introduced with the speed optimization in commit 8bbc076 that specializes every BITOP operation loop up to the minimum length of the input strings. However the computation of the minimum length contained an error when a non existing key was present in the input after a key of non-zero length. This commit fixes the bug and adds a regression test for it.
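To make the failure mode concrete, here is a minimal sketch in C of a minimum/maximum length computation that treats a non existing key as a zero-length string. The function bitop_lengths() is hypothetical and is not the code from the fix; it only shows the invariant the fast path relies on, namely that the minimum must drop to 0 as soon as any input is missing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the actual fix): `len[j]` is the length of
 * the j-th source string, with a non existing key represented as 0.
 * Stores the longest and shortest input lengths. */
void bitop_lengths(const size_t *len, size_t numkeys,
                   size_t *maxlen, size_t *minlen) {
    assert(numkeys > 0);
    *maxlen = len[0];
    *minlen = len[0];
    for (size_t j = 1; j < numkeys; j++) {
        if (len[j] > *maxlen) *maxlen = len[j];
        /* A missing key (length 0) must lower the minimum too, even
         * when it comes after a key of non-zero length. */
        if (len[j] < *minlen) *minlen = len[j];
    }
}
```

With the inputs from the report (a 32 byte string followed by a missing key) the minimum is 0, so no byte would be processed by a loop that assumes data from every key.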
Commit 33e1db3 modified the name of a few INFO fields. This commit changes the Redis test to account for these changes.
The 'persistence' section of INFO output now contains four additional fields related to RDB and AOF persistence:

    rdb_last_bgsave_time_sec       Duration of latest BGSAVE in sec.
    rdb_current_bgsave_time_sec    Duration of current BGSAVE in sec.
    aof_last_rewrite_time_sec      Duration of latest AOF rewrite in sec.
    aof_current_rewrite_time_sec   Duration of current AOF rewrite in sec.

The 'current' fields are set to -1 if a BGSAVE / AOF rewrite is not in progress. The 'last' fields are set to -1 if no previous BGSAVE / AOF rewrites were performed.

Additionally a few fields in the persistence section were renamed for consistency:

    changes_since_last_save  -> rdb_changes_since_last_save
    bgsave_in_progress       -> rdb_bgsave_in_progress
    last_save_time           -> rdb_last_save_time
    last_bgsave_status       -> rdb_last_bgsave_status
    bgrewriteaof_in_progress -> aof_rewrite_in_progress
    bgrewriteaof_scheduled   -> aof_rewrite_scheduled

After the renaming, fields in the persistence section start with the rdb_ or aof_ prefix depending on the persistence method they describe. The field 'loading' and related fields are not prefixed because they are common to both persistence methods.
This commit adds a fast-path to BITOP that can be used for all the bytes from 0 to the minimal length of the input strings, as long as there are at most 16 input keys. Often the intersected bitmaps are roughly the same size, so this optimization can provide a 10x speed boost to most real world usages of the command. Bytes are processed four full words at a time, in loops specialized for the specific BITOP sub-command, without the need to check for length issues with the inputs (since we run this algorithm only as far as there is data from all the keys at the same time). The remaining part of the string is intersected in the usual way using the slow but generic algorithm. It is possible to do better than this with inputs that are not roughly the same size, by sorting the input keys by length, by initializing the result string in a smarter way, and by noticing that the final part of the output string composed of only data from the longest string does not need any processing, since AND, OR and XOR against an empty string do not alter the output (zero in the first case, and the original string in the other two cases). More optimizations will likely be implemented later, but this should be enough to release Redis 2.6-RC4 with bitops merged in. Note: this commit also adds better testing for the BITOP NOT command, which is currently the fastest sub-command and hard to optimize further since it just flips the bits of a single input string.
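The two-phase structure described above can be sketched in C as follows, for the AND case only. This is a simplified illustration under stated assumptions, not the Redis implementation: bitop_and() and its parameters are hypothetical, the 16 keys limit is not modeled, and memcpy() is used to load words so that the sketch does not depend on pointer alignment (the real code may access memory differently).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: res = src[0] AND src[1] AND ... AND src[numkeys-1].
 * Inputs shorter than maxlen are treated as zero padded. `res` must have
 * room for maxlen bytes. minlen/maxlen are the shortest/longest lengths. */
void bitop_and(unsigned char *res, const unsigned char **src,
               const size_t *len, size_t numkeys,
               size_t minlen, size_t maxlen) {
    size_t j = 0;

    assert(numkeys > 0 && minlen <= maxlen);

    /* Fast path: every input has data in [0, minlen), so no per-byte
     * length checks are needed. Four words are processed per iteration. */
    while (minlen - j >= sizeof(unsigned long) * 4) {
        unsigned long acc[4], w[4];
        memcpy(acc, src[0] + j, sizeof(acc));
        for (size_t i = 1; i < numkeys; i++) {
            memcpy(w, src[i] + j, sizeof(w));
            acc[0] &= w[0];
            acc[1] &= w[1];
            acc[2] &= w[2];
            acc[3] &= w[3];
        }
        memcpy(res + j, acc, sizeof(acc));
        j += sizeof(acc);
    }

    /* Slow generic path: one byte at a time, checking each input length. */
    for (; j < maxlen; j++) {
        unsigned char out = (j < len[0]) ? src[0][j] : 0;
        for (size_t i = 1; i < numkeys; i++)
            out &= (j < len[i]) ? src[i][j] : 0;
        res[j] = out;
    }
}
```

The OR and XOR loops would have the same shape with a different operator, which is why each sub-command gets its own specialized loop instead of a switch inside the inner loop.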
A bug in the implementation caused BITOP to crash the server if at least one of the source objects was integer encoded. The new implementation takes an additional array of Redis object pointers and calls getDecodedObject() to get a reference to a string encoded object, and then uses decrRefCount() to release the object. Tests were modified to cover the regression and improve coverage.
At Redis's default optimization level the command is now much faster, always using a constant-time bit manipulation technique to count bits instead of the GCC builtin popcount, and unrolling the loop. The current implementation performance is 1.5GB/s on a MBA 11" (1.8 GHz i7) compiled with both GCC and clang. The algorithm used is described here: http://graphics.stanford.edu/~seander/bithacks.html
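For reference, here is a minimal sketch in C of the parallel bit counting technique from the page linked above, applied to a byte buffer. The function names are hypothetical, and the sketch counts one 32 bit word per iteration without the loop unrolling mentioned in the commit, so it illustrates the technique rather than reproducing the committed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the bits set in a 32 bit word in constant time, using the
 * parallel counting method described in the "Bit Twiddling Hacks" page. */
static uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 /* 2 bit sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4 bit sums */
    return (((v + (v >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24;
}

/* Hypothetical helper: count the bits set in `count` bytes at `s`. */
size_t popcount_bytes(const void *s, size_t count) {
    const unsigned char *p = s;
    size_t bits = 0;

    assert(s != NULL || count == 0);
    while (count >= 4) {
        uint32_t w;
        memcpy(&w, p, 4); /* avoids unaligned access */
        bits += popcount32(w);
        p += 4;
        count -= 4;
    }
    while (count--) bits += popcount32(*p++); /* trailing bytes */
    return bits;
}
```

Since only the number of set bits matters, the byte order of the loaded word is irrelevant.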
All the general string operations are implemented in t_string.c, however the bit operations, while targeting the string type, are better served in a specific file where we have the implementations of the following four commands and helper functions:

    GETBIT
    SETBIT
    BITOP
    BITCOUNT

In the future this file will probably contain more code related to making the BITOP and BITCOUNT operations faster.
The motivation for these new commands is to be found in the usage of Redis for real time statistics. See the article "Fast real time metrics using Redis". http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/

In general Redis strings, when used as bitmaps via the SETBIT/GETBIT commands, provide a very space-efficient and fast way to store statistics. For instance in a web application with users, every user can be associated with a key that shows every day in which the user visited the web service. This information can be really valuable to extract user behaviour information. With Redis bitmaps doing this is very simple: just say that a given day is 0 (the date the service was put online) and all the next days are 1, 2, 3, and so forth. So with SETBIT it is possible to set the bit corresponding to the current day every time the user visits the site.

It is possible to count the set bits on the fly, and this is extremely easy using a Lua script. However a fast native bit count operation can be useful, especially if it can operate on ranges, or when the string is small like in the case of days (even if you consider many years it is still extremely little data). For this reason BITCOUNT was introduced. The command counts the number of bits set to 1 in a string, with an optional range:

    BITCOUNT key [start end]

The start/end parameters are similar to GETRANGE. If omitted the whole string is tested.

Population counting is more useful when bit-level operations like AND, OR and XOR are available. For instance I can test multiple users to see the number of days three users visited the site at the same time. To do this we can take the AND of all the bitmaps, and then count the set bits. For this reason the BITOP command was introduced:

    BITOP [AND|OR|XOR|NOT] dest_key src_key1 src_key2 src_key3 ... src_keyN

In the special case of NOT (that inverts the bits) only one source key can be passed.
The judicious use of BITCOUNT and BITOP combined can lead to interesting use cases with very space efficient representation of data. The implementation provided is not yet tested nor optimized for speed; the next commits will introduce unit tests. Later the implementation will be profiled to see if it is possible to gain an important amount of speed without making the code much more complex.
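The daily visits example can be modeled outside of Redis with a small C sketch that works on plain byte arrays. The helpers mark_visit() and common_days() are hypothetical names used only for this illustration; they mimic what SETBIT, then BITOP AND, then BITCOUNT would compute, using the bit order SETBIT uses (bit 0 is the most significant bit of the first byte).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of SETBIT key day 1: bit 0 is the most
 * significant bit of the first byte. */
void mark_visit(unsigned char *bitmap, size_t nbytes, size_t day) {
    assert(day / 8 < nbytes);
    bitmap[day >> 3] |= (unsigned char)(1u << (7 - (day & 7)));
}

/* Hypothetical model of BITOP AND followed by BITCOUNT: the number of
 * days on which all three users visited the site. */
size_t common_days(const unsigned char *a, const unsigned char *b,
                   const unsigned char *c, size_t nbytes) {
    size_t days = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char v = a[i] & b[i] & c[i];
        while (v) { /* clear the lowest set bit until none are left */
            v &= (unsigned char)(v - 1);
            days++;
        }
    }
    return days;
}
```

Even ten years of daily data fits in about 457 bytes per user, which is why the per-user bitmap stays tiny.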
The INFO output, persistence section, already contained the field describing the size of the current AOF buffer to flush on disk. However the other AOF buffer, used to accumulate changes during an AOF rewrite, was not mentioned in the INFO output. This commit introduces a new field called aof_rewrite_buffer_length with the length of the rewrite buffer.
During the AOF rewrite process, the parent process needs to accumulate the new writes in an in-memory buffer: when the child terminates the AOF rewriting process this buffer (that is the difference between the dataset when the rewrite was started, and the current dataset) is flushed to the new AOF file. We used to implement this buffer using an sds.c string, but sds.c has a 2GB limit. Sometimes the dataset can be big enough, the amount of writes so high, and the rewrite process slow enough that we overflow the 2GB limit, causing a crash, documented on github by issue #504. In order to prevent this from happening, this commit introduces a new system to accumulate writes, implemented by a linked list of blocks of 10 MB each, so that we also avoid paying the reallocation cost. Note that theoretically modern operating systems may implement realloc() simply as a remapping of the old pages, thus with very good performance, see for instance the mremap() syscall on Linux. However this is not always true, and jemalloc by default avoids doing this because there are issues with the current implementation of mremap(). For this reason we are using a linked list of blocks instead of a single block that gets reallocated again and again. The changes in this commit lack testing, which will be performed before merging into the unstable branch. This fix will not enter 2.4 because it is too invasive. However 2.4 will log a warning when the AOF rewrite buffer is near the 2GB limit.
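The block list idea can be sketched in C as follows. All the names here (rwblock, rwbuf, rwbuf_append(), rwbuf_free(), BLOCK_SIZE) are hypothetical and chosen for the illustration; the actual data structure in the commit may be laid out differently, for example by reusing the Redis list type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE (1024 * 1024 * 10) /* 10 MB per block */

typedef struct rwblock {
    struct rwblock *next;
    size_t used; /* bytes filled in buf */
    char buf[BLOCK_SIZE];
} rwblock;

typedef struct rwbuf {
    rwblock *head, *tail;
    size_t total; /* total bytes accumulated, not limited to 2GB */
} rwbuf;

/* Append data, filling the tail block and allocating new blocks as
 * needed. Existing blocks are never reallocated or moved.
 * Returns 0 on success, -1 if out of memory. */
int rwbuf_append(rwbuf *rb, const char *data, size_t len) {
    assert(rb != NULL);
    while (len > 0) {
        rwblock *b = rb->tail;
        if (b == NULL || b->used == BLOCK_SIZE) {
            b = malloc(sizeof(*b));
            if (b == NULL) return -1;
            b->next = NULL;
            b->used = 0;
            if (rb->tail) rb->tail->next = b; else rb->head = b;
            rb->tail = b;
        }
        size_t n = BLOCK_SIZE - b->used;
        if (n > len) n = len;
        memcpy(b->buf + b->used, data, n);
        b->used += n;
        rb->total += n;
        data += n;
        len -= n;
    }
    return 0;
}

/* Release every block and reset the buffer to the empty state. */
void rwbuf_free(rwbuf *rb) {
    rwblock *b = rb->head;
    while (b) {
        rwblock *next = b->next;
        free(b);
        b = next;
    }
    rb->head = rb->tail = NULL;
    rb->total = 0;
}
```

Flushing to the new AOF file then becomes a walk over the list writing `used` bytes from each block, and appending never copies data that was already accumulated.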
The user @jokea noticed that the following line of code in replication.c made little sense:

    addReplySds(slave,sdsempty());

Investigating a bit I found that this was introduced by commit 6208b3a three years ago in the early stages of Redis. The code apparently is not useful at all, so I'm removing it. This change will not be backported into 2.4 so that in the rare case this should introduce a bug, we'll have a chance to detect it in the development branch. However following the code path it seems like the code is not useful at all, so the risk is truly small.
Weeks ago, trying to fix a harmless GCC warning, I introduced a bug in the ziplist-encoded implementation of sorted sets. The bug completely broke the zuiNext() iterator, which is used in the ZINTERSTORE and ZUNIONSTORE implementation, so those two commands are no longer reliable starting from Redis version 2.4.12 and the latest 2.6.0-RC releases. This commit fixes the problem and adds a regression test.
Due to a change in the format of the bug report in case of crash or failed assertion, the test suite was no longer able to properly log it. Instead just a protocol error was logged by the Redis TCL client, which provided no clue about the actual problem. This commit resolves the issue by logging everything from the first line of the log including the string REDIS BUG REPORT, till the end of the file.
This makes the code more readable. It is not yet worth splitting the file itself into three different files, but the logical separation improves the readability, especially since new commits are going to introduce an additional section.
Full changelog here: http://www.canonware.com/cgi-bin/gitweb.cgi?p=jemalloc.git;a=blob_plain;f=ChangeLog;hb=master

Notable improvements from the point of view of Redis:

1) Bugfixing.
2) Support for Valgrind.
3) Support for OSX Lion, FreeBSD.