Stale tuples in secondary index under certain conditions #3607

locker · 2018-08-03T11:57:47Z

Branch 1.9 commit f29466c (1.10 and 2.0 are affected as well)

yaml = require('yaml')

box.cfg{log_level = 4}

s = box.schema.space.create('test', {engine = 'vinyl'})
s:create_index('pk')
s:create_index('sk', {parts = {2, 'unsigned'}})

s:insert{1, 10}
box.snapshot()

s:update(1, {{'=', 2, 10}})
s:delete(1)
box.snapshot()

s:insert{1, 20}

print(yaml.encode(s.index.sk:select()))

os.exit(0)

The script produces the following output:

2018-08-03 14:55:33.261 [13023] main/101/test.lua C> Tarantool 1.9.1-59-gf29466cd
2018-08-03 14:55:33.262 [13023] main/101/test.lua C> log level 4
---
- [1, 20]
- [1, 20]
...

[1, 20] is printed twice, because [1, 10] wasn't removed from the secondary index.

The text was updated successfully, but these errors were encountered:

index.update() looks up the old tuple in the primary index, applies update operations to it, then writes a DELETE statement to secondary indexes to delete the old tuple and a REPLACE statement to all indexes to insert the new tuple. It also sets a column mask for both DELETE and REPLACE statements. The column mask is a bit mask which has a bit set if the corresponding field is updated by update operations. It is used by the write iterator for two purposes. First, the write iterator skips REPLACE statements that don't update key fields. Second, the write iterator turns a REPLACE that has a column mask that intersects with key fields into an INSERT (so that it can get annihilated with a DELETE when the time comes). The latter is correct, because if an update() does update secondary key fields, then it must have deleted the old tuple and hence the new tuple is unique in terms of extended key (merged primary and secondary key parts, i.e. cmp_def). The problem is that a bit may be set in a column mask even if the corresponding field does not actually get updated. For example, consider the following example. s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 10} box.snapshot() s:update(1, {{'=', 2, 10}}) The update() doesn't modify the secondary key field so it only writes REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10} too, but it gets overwritten by the REPLACE). However, the REPLACE has column mask that says that update() does modify the key field, because a column mask is generated solely from update operations, before applying them. As a result, the write iterator will not skip this REPLACE on dump. This won't have any serious consequences, because this is a mere optimization. What is worse, the write iterator will also turn the REPLACE into an INSERT, which is absolutely wrong as the REPLACE is preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE statement and the INSERT created by the write iterator from the REPLACE will get annihilated, leaving the old INSERT{1, 10} visible. The issue may result in invalid select() output as demonstrated in the issue description. It may also result in crashes, because the tuple cache is very sensible to invalid select() output. To fix this issue let's clear key bits in the column mask if we detect that an update() doesn't actually update secondary key fields although the column mask says it does. We do that in vy_tx_set(), if we see that the statement overwritten by a REPLACE is a DELETE: the point is a DELETE may be written to a secondary index only if the tuple it is supposed to delete actually exists (we can't generate a surrogate DELETE otherwise); so if a REPLACE overwrites a DELETE, there must be the same REPLACE in the index and hence we can clear key bits in the REPLACE column mask. Closes #3607

index.update() looks up the old tuple in the primary index, applies update operations to it, then writes a DELETE statement to secondary indexes to delete the old tuple and a REPLACE statement to all indexes to insert the new tuple. It also sets a column mask for both DELETE and REPLACE statements. The column mask is a bit mask which has a bit set if the corresponding field is updated by update operations. It is used by the write iterator for two purposes. First, the write iterator skips REPLACE statements that don't update key fields. Second, the write iterator turns a REPLACE that has a column mask that intersects with key fields into an INSERT (so that it can get annihilated with a DELETE when the time comes). The latter is correct, because if an update() does update secondary key fields, then it must have deleted the old tuple and hence the new tuple is unique in terms of extended key (merged primary and secondary key parts, i.e. cmp_def). The problem is that a bit may be set in a column mask even if the corresponding field does not actually get updated. For example, consider the following example. s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 10} box.snapshot() s:update(1, {{'=', 2, 10}}) The update() doesn't modify the secondary key field so it only writes REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10} too, but it gets overwritten by the REPLACE). However, the REPLACE has column mask that says that update() does modify the key field, because a column mask is generated solely from update operations, before applying them. As a result, the write iterator will not skip this REPLACE on dump. This won't have any serious consequences, because this is a mere optimization. What is worse, the write iterator will also turn the REPLACE into an INSERT, which is absolutely wrong as the REPLACE is preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE statement and the INSERT created by the write iterator from the REPLACE will get annihilated, leaving the old INSERT{1, 10} visible. The issue may result in invalid select() output as demonstrated in the issue description. It may also result in crashes, because the tuple cache is very sensible to invalid select() output. To fix this issue let's clear key bits in the column mask if we detect that an update() doesn't actually update secondary key fields although the column mask says it does. Closes #3607

locker · 2018-08-10T18:49:43Z

Fixed by e72867c

If an UPDATE request doesn't touch key parts of a secondary index, we don't need to write it to the index memory level or dump it to disk, as this would only increase IO load. Historically, we use column mask set by the UPDATE operation to skip secondary indexes that are not affected by the operation on commit. However, there's a problem here: the column mask isn't precise - it may have a bit set even if the corresponding column doesn't really get updated, e.g. consider {'+', 2, 0}. Not taking this into account may result in appearance of phantom tuples on disk as the write iterator assumes that statements that have no effect aren't written to secondary indexes (this is needed to apply INSERT+DELETE "annihilation" optimization). We fixed that by clearing column mask bits in vy_tx_set in case we detect that the key isn't changed, for more details see #3607 and commit e72867c ("vinyl: fix appearance of phantom tuple in secondary index after update"). It was rather an ugly hack, but it worked. However, it turned out that apart from looking hackish this code has a nasty bug that may lead to tuples missing from secondary indexes. Consider the following example: s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 3, 2}}) box.commit() The first update operation writes DELETE{1,1} and REPLACE{2,1} to the secondary index write set. The second update replaces REPLACE{2,1} with DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with REPLACE{2,1} in the write set, we assume that the update doesn't modify secondary index key parts and clear the column mask so as not to commit a pointless request, see vy_tx_set. As a result, we skip the first update too and get key {2,1} missing from the secondary index. Actually, it was a dumb idea to use column mask to skip statements in the first place, as there's a much easier way to filter out statements that have no effect for secondary indexes. The thing is every DELETE statement inserted into a secondary index write set acts as a "single DELETE", i.e. there's exactly one older statement it is supposed to purge. This is, because in contrast to the primary index we don't write DELETE statements blindly - we always look up the tuple overwritten in the primary index first. This means that REPLACE+DELETE for the same key is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE can be treated as no-op, too, because secondary indexes don't store full tuples hence all REPLACE statements for the same key are equivalent. By marking such pair of statements as no-op in vy_tx_set, we guarantee that no-op statements don't make it to secondary index memory or disk levels. Closes #4242

If an UPDATE request doesn't touch key parts of a secondary index, we don't need to re-index it in the in-memory secondary index, as this would only increase IO load. Historically, we use column mask set by the UPDATE operation to skip secondary indexes that are not affected by the operation on commit. However, there's a problem here: the column mask isn't precise - it may have a bit set even if the corresponding column value isn't changed by the update operation, e.g. consider {'+', 2, 0}. Not taking this into account may result in appearance of phantom tuples on disk as the write iterator assumes that statements that have no effect aren't written to secondary indexes (this is needed to apply INSERT+DELETE "annihilation" optimization). We fixed that by clearing column mask bits in vy_tx_set in case we detect that the key isn't changed, for more details see #3607 and commit e72867c ("vinyl: fix appearance of phantom tuple in secondary index after update"). It was rather an ugly hack, but it worked. However, it turned out that apart from looking hackish this code has a nasty bug that may lead to tuples missing from secondary indexes. Consider the following example: s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 3, 2}}) box.commit() The first update operation writes DELETE{1,1} and REPLACE{2,1} to the secondary index write set. The second update replaces REPLACE{2,1} with DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with REPLACE{2,1} in the write set, we assume that the update doesn't modify secondary index key parts and clear the column mask so as not to commit a pointless request, see vy_tx_set. As a result, we skip the first update too and get key {2,1} missing from the secondary index. Actually, it was a dumb idea to use column mask to skip statements in the first place, as there's a much easier way to filter out statements that have no effect for secondary indexes. The thing is every DELETE statement inserted into a secondary index write set acts as a "single DELETE", i.e. there's exactly one older statement it is supposed to purge. This is, because in contrast to the primary index we don't write DELETE statements blindly - we always look up the tuple overwritten in the primary index first. This means that REPLACE+DELETE for the same key is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE can be treated as no-op, too, because secondary indexes don't store full tuples hence all REPLACE statements for the same key are equivalent. By marking both statements as no-op in vy_tx_set, we guarantee that no-op statements don't make it to secondary index memory or disk levels. Closes #4242

If an UPDATE request doesn't touch key parts of a secondary index, we don't need to re-index it in the in-memory secondary index, as this would only increase IO load. Historically, we use column mask set by the UPDATE operation to skip secondary indexes that are not affected by the operation on commit. However, there's a problem here: the column mask isn't precise - it may have a bit set even if the corresponding column value isn't changed by the update operation, e.g. consider {'+', 2, 0}. Not taking this into account may result in appearance of phantom tuples on disk as the write iterator assumes that statements that have no effect aren't written to secondary indexes (this is needed to apply INSERT+DELETE "annihilation" optimization). We fixed that by clearing column mask bits in vy_tx_set in case we detect that the key isn't changed, for more details see #3607 and commit e72867c ("vinyl: fix appearance of phantom tuple in secondary index after update"). It was rather an ugly hack, but it worked. However, it turned out that apart from looking hackish this code has a nasty bug that may lead to tuples missing from secondary indexes. Consider the following example: s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 3, 2}}) box.commit() The first update operation writes DELETE{1,1} and REPLACE{2,1} to the secondary index write set. The second update replaces REPLACE{2,1} with DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with REPLACE{2,1} in the write set, we assume that the update doesn't modify secondary index key parts and clear the column mask so as not to commit a pointless request, see vy_tx_set. As a result, we skip the first update too and get key {2,1} missing from the secondary index. Actually, it was a dumb idea to use column mask to skip statements in the first place, as there's a much easier way to filter out statements that have no effect for secondary indexes. The thing is every DELETE statement inserted into a secondary index write set acts as a "single DELETE", i.e. there's exactly one older statement it is supposed to purge. This is, because in contrast to the primary index we don't write DELETE statements blindly - we always look up the tuple overwritten in the primary index first. This means that REPLACE+DELETE for the same key is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE can be treated as no-op, too, because secondary indexes don't store full tuples hence all REPLACE statements for the same key are equivalent. By marking both statements as no-op in vy_tx_set, we guarantee that no-op statements don't make it to secondary index memory or disk levels. Closes #4242 (cherry picked from commit 69aee6f)

If an UPDATE request doesn't touch key parts of a secondary index, we don't need to re-index it in the in-memory secondary index, as this would only increase IO load. Historically, we use column mask set by the UPDATE operation to skip secondary indexes that are not affected by the operation on commit. However, there's a problem here: the column mask isn't precise - it may have a bit set even if the corresponding column value isn't changed by the update operation, e.g. consider {'+', 2, 0}. Not taking this into account may result in appearance of phantom tuples on disk as the write iterator assumes that statements that have no effect aren't written to secondary indexes (this is needed to apply INSERT+DELETE "annihilation" optimization). We fixed that by clearing column mask bits in vy_tx_set in case we detect that the key isn't changed, for more details see #3607 and commit e72867c ("vinyl: fix appearance of phantom tuple in secondary index after update"). It was rather an ugly hack, but it worked. However, it turned out that apart from looking hackish this code has a nasty bug that may lead to tuples missing from secondary indexes. Consider the following example: s = box.schema.space.create('test', {engine = 'vinyl'}) s:create_index('pk') s:create_index('sk', {parts = {2, 'unsigned'}}) s:insert{1, 1, 1} box.begin() s:update(1, {{'=', 2, 2}}) s:update(1, {{'=', 3, 2}}) box.commit() The first update operation writes DELETE{1,1} and REPLACE{2,1} to the secondary index write set. The second update replaces REPLACE{2,1} with DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with REPLACE{2,1} in the write set, we assume that the update doesn't modify secondary index key parts and clear the column mask so as not to commit a pointless request, see vy_tx_set. As a result, we skip the first update too and get key {2,1} missing from the secondary index. Actually, it was a dumb idea to use column mask to skip statements in the first place, as there's a much easier way to filter out statements that have no effect for secondary indexes. The thing is every DELETE statement inserted into a secondary index write set acts as a "single DELETE", i.e. there's exactly one older statement it is supposed to purge. This is, because in contrast to the primary index we don't write DELETE statements blindly - we always look up the tuple overwritten in the primary index first. This means that REPLACE+DELETE for the same key is basically a no-op and can be safely skip. Moreover, DELETE+REPLACE can be treated as no-op, too, because secondary indexes don't store full tuples hence all REPLACE statements for the same key are equivalent. By marking both statements as no-op in vy_tx_set, we guarantee that no-op statements don't make it to secondary index memory or disk levels. Closes #4242

locker added bug Something isn't working vinyl labels Aug 3, 2018

locker self-assigned this Aug 3, 2018

kyukhin added this to the 1.9.2 milestone Aug 3, 2018

locker added the ready for review label Aug 10, 2018

locker closed this as completed Aug 10, 2018

locker removed the ready for review label Aug 10, 2018

locker mentioned this issue May 27, 2019

Improve test checking secondary index consistency #4251

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stale tuples in secondary index under certain conditions #3607

Stale tuples in secondary index under certain conditions #3607

locker commented Aug 3, 2018

locker commented Aug 10, 2018

Stale tuples in secondary index under certain conditions #3607

Stale tuples in secondary index under certain conditions #3607

Comments

locker commented Aug 3, 2018

locker commented Aug 10, 2018