Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileStorage: fix rare data corruption when using restore after multiple undos #395

Conversation

perrinjerome
Copy link
Contributor

@perrinjerome perrinjerome commented Jan 31, 2024

The case of a history like this:

  • T0 initialize an object in state 0
  • T1 modifies object to state 1
  • T2 modifies object to state 2
  • T3 undo T2 and T1, bringing back to state 0
  • T4 modified object to state 3
  • T5 undo T4, bringing back object to state 0

was not correct after restore: the state is 1 instead of the expected 0.

This is because T3 contains two data records:

  • an undo of T2, with a backpointer to the data of state 1
  • an undo of T1, with a backpointer to the data of state 0

When restoring T5 (the undo of T4), the transaction is made of one data record, with a backpointer that is copied from the backpointer from T3, but this uses backpointer from the first record for this object, which is incorrect, in such a case where there is more than one backpointer for the same oid, we need to iterate in all data record to find the oldest version.

The case of a history like this:
 - T0 initialize an object in state 0
 - T1 modifies object to state 1
 - T2 modifies object to state 2
 - T3 undo T2 and T1, bringing back to state 0
 - T4 modified object to state 3
 - T5 undo T4, bringing back object to state 0

was not correct after `restore`: the state is 1 instead of the expected 0.

This is because T3 contains two data records:
 - an undo of T2, with a backpointer to the data of state 1
 - an undo of T1, with a backpointer to the data of state 0
When restoring T5 (the undo of T4), the transaction is made of one data
record, with a backpointer that is copied from the backpointer from T3,
but this uses backpointer from the first record for this object, which
is incorrect, in such a case where there is more than one backpointer
for the same oid, we need to iterate in all data record to find the
oldest version.
@perrinjerome perrinjerome force-pushed the fix/restore_undo_to_multiple_undo branch from e68c5f3 to 44d36b0 Compare January 31, 2024 14:44
@perrinjerome perrinjerome changed the title FileStorage: fix data corruption when using restore after multiple undos FileStorage: fix rare data corruption when using restore after multiple undos Jan 31, 2024
@perrinjerome
Copy link
Contributor Author

For reference, the file format is described in

# File-based ZODB storage
#
# Files are arranged as follows.
#
# - The first 4 bytes are a file identifier.
#
# - The rest of the file consists of a sequence of transaction
# "records".
#
# A transaction record consists of:
#
# - 8-byte transaction id, which is also a time stamp.
#
# - 8-byte transaction record length - 8.
#
# - 1-byte status code
# ' ' (a blank) completed transaction that hasn't been packed
# 'p' completed transaction that has been packed
# 'c' checkpoint -- a transaction in progress, at the end of the file;
# it's been thru vote() but not finish(); if finish() completes
# normally, it will be overwritten with a blank; if finish() dies
# (e.g., out of disk space), cleanup code will try to truncate
# the file to chop off this incomplete transaction
# 'u' uncertain; no longer used; was previously used to record something
# about non-transactional undo
#
# - 2-byte length of user name
#
# - 2-byte length of description
#
# - 2-byte length of extension attributes
#
# - user name
#
# - description
#
# - extension attributes
#
# * A sequence of data records
#
# - 8-byte redundant transaction length -8
#
# A data record consists of
#
# - 8-byte oid.
#
# - 8-byte tid, which matches the transaction id in the transaction record.
#
# - 8-byte previous-record file-position.
#
# - 8-byte beginning of transaction record file position.
#
# - 2-bytes with zero values. (Was version length.)
#
# - 8-byte data length
#
# ? data
# (data length > 0)
#
# ? 8-byte position of data record containing data
# (data length == 0)
#
# Note that the lengths and positions are all big-endian.
# Also, the object ids time stamps are big-endian, so comparisons
# are meaningful.
#
# Backpointers
#
# When we undo a record, we don't copy (or delete)
# data. Instead, we write records with back pointers.

@perrinjerome perrinjerome marked this pull request as ready for review January 31, 2024 14:47
src/ZODB/FileStorage/FileStorage.py Show resolved Hide resolved
# looking at other records from this transaction,
# if there is multiple undo records, we use the
# oldest.
backpointer = pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we sure that the "backpointer" found at the highest position is the right one? I.e. why have older undo records a higher position?

With your change, the code no longer matches the introductory source comment. The source comment should get updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello everyone. I had the same concern and actually this issue is not only about undo, but also how FileStorage handles and/or should handle a situation when one transaction contains two data records with the same oid. What loadBefore should return for such case - data from the first data record or from the second? Current behaviour is to return the data from the second data record - the one that was last to be .store()' ed. Proof:

create database with multiple data entries in the same transaction for oid=1

---- 8< ---- (1.txn)

user "author"
description "T1"
extension ""
obj 0000000000000001 3 null:00
AAA
obj 0000000000000001 3 null:00
BBB
$ python -m zodbtools.zodb commit 1.fs 0000000000000000 <1.txn 
03f63ee48fa55c00
$ fs1 dump 1.fs
Trans #00000 tid=03f63ee48fa55c00 time=2024-02-01 10:44:33.667016 offset=35
    status=' ' user='author' description='T1'
  data #00000 oid=0000000000000001 size=3 class=?.?
  data #00001 oid=0000000000000001 size=3 class=?.?

try to read that oid=1

---- 8< ---- (catobj.py)

from __future__ import print_function

from ZODB.FileStorage import FileStorage
from ZODB.utils import p64, u64

stor = FileStorage('1.fs', read_only=True)
head = stor.lastTransaction()
head_ = p64(u64(head)+1)
oid = p64(1)
x = stor.loadBefore(oid, head_)
print(x)
$ python catobj.py 
('BBB', '\x03\xf6>\xe4\x8f\xa5\\\x00', None)

I'm not sure 100% but probably this is kind of expected behaviour.


So taking this into account for consistency it should be also the latest undo record that is taking the effect.

And for generality we should also consider cases like:

  • data record 1: undo to T2
  • data record 2: undo to T1
  • data record 3: store data3

because here it should be data3 to be returned by loadBefore, not data from T1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(undo record can be, and probably should be, perceived as regular store with instruction to copy data from particular previous data record)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above point of view I suggest to amend the patch with the following:

--- a/src/ZODB/FileStorage/FileStorage.py
+++ b/src/ZODB/FileStorage/FileStorage.py
@@ -663,6 +663,10 @@ def _data_find(self, tpos, oid, data):
         # Else oid's data record contains the data, and the file offset of
         # oid's data record is returned.  This data record should contain
         # a pickle identical to the 'data' argument.
+        # When looking for oid's data record we scan all data records in
+        # the transaction till the end looking for the latest record with oid,
+        # even if there are multiple such records. This matches load behaviour
+        # which uses the data record created by last store.
 
         # Unclear:  If the length of the stored data doesn't match len(data),
         # an exception is raised.  If the lengths match but the data isn't
@@ -674,31 +678,36 @@ def _data_find(self, tpos, oid, data):
         self._file.read(ul + dl + el)
         tend = tpos + tl
         pos = self._file.tell()
-        backpointer = None
+        data_hdr = None
+        data_pos = 0
+        # scan all data records in this transaction looking for the latest
+        # record with our oid
         while pos < tend:
             h = self._read_data_header(pos)
             if h.oid == oid:
-                # Make sure this looks like the right data record
-                if h.plen == 0:
-                    # This is also a backpointer, remember it and keep
-                    # looking at other records from this transaction,
-                    # if there is multiple undo records, we use the
-                    # oldest.
-                    backpointer = pos
-                else:
-                    if h.plen != len(data):
-                        # The expected data doesn't match what's in the
-                        # backpointer.  Something is wrong.
-                        logger.error("Mismatch between data and"
-                                     " backpointer at %d", pos)
-                        return 0
-                    _data = self._file.read(h.plen)
-                    if data != _data:
-                        return 0
-                    return pos
+                data_hdr = h
+                data_pos = pos
             pos += h.recordlen()
             self._file.seek(pos)
-        return backpointer or 0
+        if data_hdr is None:
+            return 0
+
+        # return position of found data record, but make sure this looks like
+        # the right data record to return.
+        if data_hdr.plen == 0:
+            # This is also a backpointer,  Gotta trust it.
+            return data_pos
+        else:
+            if data_hdr.plen != len(data):
+                # The expected data doesn't match what's in the
+                # backpointer.  Something is wrong.
+                logger.error("Mismatch between data and"
+                             " backpointer at %d", pos)
+                return 0
+            _data = self._file.read(data_hdr.plen)
+            if data != _data:
+                return 0
+            return data_pos
 
     def restore(self, oid, serial, data, version, prev_txn, transaction):
         # A lot like store() but without all the consistency checks.  This

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I like this patch better, it's more logical. I have integrated this in the pull request.

When I wrote that patch, the reason to take the latest data record was to match the behavior of _txn_undo_write, it iterates while pos < tend: and updates the index.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Jérome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I see about your original observations.

@d-maurer
Copy link
Contributor

d-maurer commented Feb 1, 2024

I very much prefer PRs from the main repository (rather than from forks): for difficult code, it is important to see the modification in a larger context and not only the diff. To support this, I have cloned the main repository and do not like to add additional clones (or "remote"s) for reviews.

@navytux
Copy link
Contributor

navytux commented Feb 1, 2024

Hello Dieter. I understand your rationale, and on my side try to follow your advice after you expressed the same reasoning to me. But just for the reference, if you add the following into ZODB/.git/config

[remote "origin"]
        url = https://github.com/zopefoundation/ZODB.git
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = +refs/pull/*/head:refs/remotes/origin/pr/*          <-- this line

you will be able to check out code of all pull requests without adding any remote or something. For example with

hereby PR it would be git fetch origin; git checkout origin/pr/395.

Co-authored-by: Kirill Smelkov <kirr@nexedi.com>
@d-maurer
Copy link
Contributor

d-maurer commented Feb 1, 2024 via email

@perrinjerome
Copy link
Contributor Author

Ah yes, nowadays zope.testrunner's -D no longer enter debugger on skipped tests ( was fixed in zopefoundation/zope.testrunner#141 ) , but there is still something that teardown is not always executed ( zopefoundation/zope.testrunner#142 (comment) ), so it seems still safer to just keep using the "pass" pattern.

perrinjerome added a commit to perrinjerome/ZODB that referenced this pull request Feb 14, 2024
…txn_prev

if restore is called with an existing txn_prev that does not contain
the oid, the "8-byte redundant transaction length -8" and the beginning
of the next transaction was read as a data record.

The condition to stop was incorrect as discussed in [1], in practice
this error would only happens when restore is passed a bad txn_prev so
this should not be a problem in normal conditions because we are
supposed to find the data record for that oid.

[1] zopefoundation#395 (comment)
perrinjerome added a commit to perrinjerome/ZODB that referenced this pull request Feb 14, 2024
…txn_prev

if restore is called with an existing txn_prev that does not contain
the oid, the "8-byte redundant transaction length -8" and the beginning
of the next transaction was read as a data record.

The condition to stop was incorrect as discussed in [1], in practice
this error would only happens when restore is passed a bad txn_prev so
this should not be a problem in normal conditions because we are
supposed to find the data record for that oid.

[1] zopefoundation#395 (comment)
@d-maurer
Copy link
Contributor

d-maurer commented Feb 14, 2024 via email

@navytux
Copy link
Contributor

navytux commented Feb 16, 2024

@perrinjerome, thanks for adding the test and robustifying handling of records reader.

@d-maurer, thanks also for confirming you are ok with the fix and we all now share common understanding.

I think that tests are added not because the code is wrong, but to preserve it to stay working correctly. So in general whenever we fix something, we try to add a test for the fixed problem so that we know for sure it won't return after follow-up changes.

Sometimes it might be hard to do a test and the difficulties outweight usefulness of having tests. But in this case we already have tests done as part of #397 so unless there is a strong reason not to take them, let's just add both the fix and the test to make sure that quality is improved and stays ok.

Anyway, besides the tests, am I right that we all three agree that this patch is ok now? Or did I missed something and there is still something else to be improved?

Kirill

@d-maurer
Copy link
Contributor

d-maurer commented Feb 16, 2024 via email

@navytux
Copy link
Contributor

navytux commented Feb 16, 2024

Dieter, thanks for feedback. What you say is reasonable.

@perrinjerome
Copy link
Contributor Author

I feel that we agreed on merging this after merging #397 , I "requested review" from github interface, if you agree please approve the review and I'll apply the changes.

@perrinjerome perrinjerome merged commit 8160568 into zopefoundation:master Mar 4, 2024
20 checks passed
@perrinjerome perrinjerome deleted the fix/restore_undo_to_multiple_undo branch March 4, 2024 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants