-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deleteObject in history-preserving mode to conform to ZODB interface #484
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ZODB specifies deleteObject to create new revision that indicates object removal: def deleteObject(oid, serial, transaction): """Mark an object as deleted This method marks an object as deleted VIA A NEW OBJECT REVISION. Subsequent attempts to load current data for the object will fail with a POSKeyError, but loads for non-current data will succeed if there are previous non-delete records. The object will be removed from the storage when all not-delete records are removed. https://github.com/zopefoundation/ZODB/blob/bc13ca74/src/ZODB/interfaces.py#L1292-L1307 (emphasis mine) However currently for history-preserving mode, as explained in zopefoundation/ZODB#318 (comment), RelStorage purges latest object revision instead of creating new one with whiteout indication. This goes against deleteObject specification and, as demonstrated by attached test program, against particular FileStorage behaviour. -> Fix it. P.S. I'm complete RelStorage newbie and looked only briefly. It could be that my patch is e.g. incomplete, or not optimal. However it demonstrates a real problem, and it fixes both adjusted testcase and failure of attached tdelete.py P.P.S. Tested only with SQLite backend. ---- 8< ---- (tdelete.py) #!/usr/bin/env python """tdelete.py demonstrates that deleteObject should create new whiteout record, and that older data records should be still accessible. e.g. with FileStorage: $ ./tdelete.py file://1.db @03e40964a0766f33 (= 280359404597309235) obj<0000000000000001> -> int(0) @03e40964a0790944 (= 280359404597479748) obj<0000000000000001> -> int(1) -------- @03e40964a0766f33 obj<0000000000000001> -> int(0) # must be int(0) @03e40964a0790944 obj<0000000000000001> -> int(1) # must be int(1) However it currently fails with RelStorage, because deleteObject does not create new whiteout revision and instead purges already committed data: $ rm x/*; ./tdelete.py sqlite://?data_dir=`pwd`/x @03e40972d5408022 (= 280359465612509218) obj<0000000000000001> -> int(0) @03e40972d541ddee (= 280359465612598766) obj<0000000000000001> -> int(1) -------- @03e40972d5408022 obj<0000000000000001> -> int(0) # must be int(0) Traceback (most recent call last): File "./tdelete.py", line 84, in <module> main() File "./tdelete.py", line 80, in main dumpObjAt(at1, "must be int(1)") File "./tdelete.py", line 75, in dumpObjAt obj = conn.get(oid) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/Connection.py", line 238, in get obj = self._reader.getGhost(p) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/serialize.py", line 598, in getGhost unpickler = self._get_unpickler(pickle) File "/home/kirr/src/wendelin/z/ZODB/src/ZODB/serialize.py", line 478, in _get_unpickler file = BytesIO(pickle) TypeError: StringIO() argument 1 must be string or buffer, not None """ from __future__ import print_function import zodburi from persistent import Persistent from ZODB.DB import DB from ZODB.Connection import TransactionMetaData from ZODB.utils import u64 import transaction import sys class PInt(Persistent): def __init__(self, i): self.i = i def __str__(self): return "int(%d)" % self.i def h(tid): return tid.encode('hex') def dump(obj): print("@%s (= %d) obj<%s> -> %s" % (h(obj._p_serial), u64(obj._p_serial), h(obj._p_oid), obj)) def main(): zurl = sys.argv[1] zstoropen, dbkw = zodburi.resolve_uri(zurl) stor = zstoropen() db = DB(stor, **dbkw) conn = db.open() root = conn.root() root['X'] = obj = PInt(0) transaction.commit() dump(obj) at0 = obj._p_serial oid = obj._p_oid obj.i += 1 transaction.commit() dump(obj) at1 = obj._p_serial txn_meta = TransactionMetaData() stor.tpc_begin(txn_meta) stor.deleteObject(oid, at1, txn_meta) stor.tpc_vote(txn_meta) stor.tpc_finish(txn_meta) print('\n--------\n') def dumpObjAt(at, comment): conn = db.open(at=at) obj = conn.get(oid) print("@%s obj<%s> -> %s\t# %s" % (h(at), h(oid), obj, comment)) conn.close() dumpObjAt(at0, "must be int(0)") dumpObjAt(at1, "must be int(1)") if __name__ == '__main__': main() P.P.P.S. SQLite URI resolver is currently broken after 08259fa (Finer control over sqlite storage locking, oid allocation and stats). I've used the following local patch as a workaround: --- a/src/relstorage/zodburi_resolver.py +++ b/src/relstorage/zodburi_resolver.py @@ -121,14 +121,14 @@ def factory(options): return factory, unused class SqliteAdapterHelper(Resolver): - _string_args = ('path',) + _string_args = ('data_dir',) def __call__(self, parsed_uri, kw): kw, unused = self.interpret_kwargs(kw) def factory(options): from relstorage.adapters.sqlite.adapter import Sqlite3Adapter - return Sqlite3Adapter(options=options, **kw) + return Sqlite3Adapter(options=options, pragmas={}, **kw) return factory, unused # The relstorage support is inspired by django-zodb.
… interface Fix failure of test_exits_critical_section for history-preserving mode: Failure in test test_exits_critical_section (relstorage.storage.tpc.tests.test_vote.TestHistoryPreservingDeleteOnly) Traceback (most recent call last): File "C:\Python27-x64\lib\unittest\case.py", line 329, in run testMethod() File "c:\projects\relstorage\src\relstorage\storage\tpc\tests\test_vote.py", line 136, in test_exits_critical_section self._check_lock_and_move_commit(committed) File "c:\projects\relstorage\src\relstorage\storage\tpc\tests\test_vote.py", line 218, in _check_lock_and_move_commit self.assertFalse(committed) File "C:\Python27-x64\lib\unittest\case.py", line 416, in assertFalse raise self.failureException(msg) AssertionError: True is not false I was originally running tests with --layer sqlite which does not include this particular test - that's why it was missed. Committed becomes true, because now deletions are committed the same way as regular data records - only with data payload = NULL.
… interface Let's try to fix the following failure with PostgreSQL: Error in test test_check_refs_missing_after_prepack (relstorage.tests.testpostgresql.PostgreSQLHistoryFreeTestPack_psycopg2) Traceback (most recent call last): File "C:\Python27-x64\lib\unittest\case.py", line 329, in run testMethod() File "c:\projects\relstorage\src\relstorage\tests\packundo.py", line 238, in test_check_refs_missing_after_prepack missing = self._storage.pack(None, referencesf, check_refs=True, skip_prepack=True) File "c:\projects\relstorage\src\relstorage\storage\__init__.py", line 922, in pack result = pack.check_refs(referencesf) File "src\\perfmetrics\\metric.py", line 72, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\storage\pack.py", line 223, in check_refs return self.packundo.check_refs(tid_int) File "c:\projects\relstorage\src\relstorage\adapters\packundo.py", line 320, in check_refs self.runner.run_script_stmt(ss_load_cursor, stmt) File "c:\projects\relstorage\src\relstorage\adapters\scriptrunner.py", line 81, in run_script_stmt cursor.execute(stmt, generic_params) - __traceback_info__: SELECT zoid, to_zoid FROM pack_object INNER JOIN object_ref USING (zoid) WHERE keep = TRUE AND NOT EXISTS ( SELECT state FROM ( SELECT state, MAX(tid) FROM object_state WHERE object_state.zoid = to_zoid ) WHERE state IS NOT NULL ) SyntaxError: subquery in FROM must have an alias LINE 7: SELECT state FROM ( ^ HINT: For example, FROM (SELECT ...) [AS] foo.
… interface Now try to fix the next problem with PostgreSQL: Error in test test_check_refs_missing_after_prepack (relstorage.tests.testpostgresql.PostgreSQLHistoryFreeTestPack_psycopg2) Traceback (most recent call last): File "C:\Python38-x64\lib\unittest\case.py", line 60, in testPartExecutor yield File "C:\Python38-x64\lib\unittest\case.py", line 676, in run self._callTestMethod(testMethod) File "C:\Python38-x64\lib\unittest\case.py", line 633, in _callTestMethod method() File "c:\projects\relstorage\src\relstorage\tests\packundo.py", line 238, in test_check_refs_missing_after_prepack missing = self._storage.pack(None, referencesf, check_refs=True, skip_prepack=True) File "c:\projects\relstorage\src\relstorage\storage\__init__.py", line 922, in pack result = pack.check_refs(referencesf) File "src\\perfmetrics\\metric.py", line 72, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\storage\pack.py", line 223, in check_refs return self.packundo.check_refs(tid_int) File "c:\projects\relstorage\src\relstorage\adapters\packundo.py", line 320, in check_refs self.runner.run_script_stmt(ss_load_cursor, stmt) File "c:\projects\relstorage\src\relstorage\adapters\scriptrunner.py", line 81, in run_script_stmt cursor.execute(stmt, generic_params) - __traceback_info__: SELECT zoid, to_zoid FROM pack_object INNER JOIN object_ref USING (zoid) WHERE keep = TRUE AND NOT EXISTS ( SELECT state FROM ( SELECT state, MAX(tid) FROM object_state WHERE object_state.zoid = to_zoid ) t WHERE state IS NOT NULL ) psycopg2.errors.GroupingError: column "object_state.state" must appear in the GROUP BY clause or be used in an aggregate function LINE 8: SELECT state, MAX(tid) ^
… interface Try to fix the following problem with MySQL: Error in test test_check_refs_missing_after_prepack (relstorage.tests.testmysql.MySQLHistoryFreeTestPack_PyMySQL) Traceback (most recent call last): File "C:\Python38-x64\lib\unittest\case.py", line 60, in testPartExecutor yield File "C:\Python38-x64\lib\unittest\case.py", line 676, in run self._callTestMethod(testMethod) File "C:\Python38-x64\lib\unittest\case.py", line 633, in _callTestMethod method() File "c:\projects\relstorage\src\relstorage\tests\packundo.py", line 238, in test_check_refs_missing_after_prepack missing = self._storage.pack(None, referencesf, check_refs=True, skip_prepack=True) File "c:\projects\relstorage\src\relstorage\storage\__init__.py", line 922, in pack result = pack.check_refs(referencesf) File "src\\perfmetrics\\metric.py", line 72, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\storage\pack.py", line 223, in check_refs return self.packundo.check_refs(tid_int) File "c:\projects\relstorage\src\relstorage\adapters\packundo.py", line 325, in check_refs self.runner.run_script_stmt(ss_load_cursor, stmt) File "c:\projects\relstorage\src\relstorage\adapters\scriptrunner.py", line 81, in run_script_stmt cursor.execute(stmt, generic_params) - __traceback_info__: SELECT zoid, to_zoid FROM pack_object INNER JOIN object_ref USING (zoid) WHERE keep = TRUE AND NOT EXISTS ( SELECT state FROM ( SELECT state FROM object_state WHERE object_state.zoid = to_zoid AND object_state.tid = ( SELECT MAX(tid) FROM object_state WHERE object_state.zoid = to_zoid ) ) t WHERE state IS NOT NULL ) File "C:\Python38-x64\lib\site-packages\pymysql\cursors.py", line 148, in execute result = self._query(query) File "C:\Python38-x64\lib\site-packages\pymysql\cursors.py", line 415, in _query conn.query(q, unbuffered=True) File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 548, in query self._affected_rows = self._read_query_result(unbuffered=unbuffered) File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 768, in _read_query_result result.init_unbuffered_query() File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 1173, in init_unbuffered_query first_packet = self.connection._read_packet() File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 725, in _read_packet packet.raise_for_error() File "C:\Python38-x64\lib\site-packages\pymysql\protocol.py", line 221, in raise_for_error err.raise_mysql_exception(self._data) File "C:\Python38-x64\lib\site-packages\pymysql\err.py", line 143, in raise_mysql_exception raise errorclass(errno, errval) pymysql.err.OperationalError: (1054, "Unknown column 'to_zoid' in 'where clause'")
… interface More SQL rewrite trying to please MySQL: Error in test test_check_refs_nothing_missing (relstorage.tests.testmysql.MySQLHistoryPreservingTestPack_PyMySQL) Traceback (most recent call last): File "C:\Python38-x64\lib\unittest\case.py", line 60, in testPartExecutor yield File "C:\Python38-x64\lib\unittest\case.py", line 676, in run self._callTestMethod(testMethod) File "C:\Python38-x64\lib\unittest\case.py", line 633, in _callTestMethod method() File "c:\projects\relstorage\src\relstorage\tests\packundo.py", line 201, in test_check_refs_nothing_missing missing = self._storage.pack(None, referencesf, check_refs=True) File "c:\projects\relstorage\src\relstorage\storage\__init__.py", line 922, in pack result = pack.check_refs(referencesf) File "src\\perfmetrics\\metric.py", line 72, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\storage\pack.py", line 223, in check_refs return self.packundo.check_refs(tid_int) File "c:\projects\relstorage\src\relstorage\adapters\packundo.py", line 325, in check_refs self.runner.run_script_stmt(ss_load_cursor, stmt) File "c:\projects\relstorage\src\relstorage\adapters\scriptrunner.py", line 81, in run_script_stmt cursor.execute(stmt, generic_params) - __traceback_info__: SELECT zoid, to_zoid FROM pack_object INNER JOIN object_ref USING (zoid) WHERE keep = TRUE AND NOT EXISTS ( SELECT state FROM ( SELECT state FROM object_state WHERE object_state.zoid = object_ref.to_zoid AND object_state.tid = ( SELECT MAX(tid) FROM object_state WHERE object_state.zoid = object_ref.to_zoid ) ) t WHERE state IS NOT NULL ) File "C:\Python38-x64\lib\site-packages\pymysql\cursors.py", line 148, in execute result = self._query(query) File "C:\Python38-x64\lib\site-packages\pymysql\cursors.py", line 415, in _query conn.query(q, unbuffered=True) File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 548, in query self._affected_rows = self._read_query_result(unbuffered=unbuffered) File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 768, in _read_query_result result.init_unbuffered_query() File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 1173, in init_unbuffered_query first_packet = self.connection._read_packet() File "C:\Python38-x64\lib\site-packages\pymysql\connections.py", line 725, in _read_packet packet.raise_for_error() File "C:\Python38-x64\lib\site-packages\pymysql\protocol.py", line 221, in raise_for_error err.raise_mysql_exception(self._data) File "C:\Python38-x64\lib\site-packages\pymysql\err.py", line 143, in raise_mysql_exception raise errorclass(errno, errval) pymysql.err.OperationalError: (1054, "Unknown column 'object_ref.to_zoid' in 'where clause'")
… interface Try to fix the following error on PostgreSQL: Error in test checkImplementsIExternalGC (relstorage.tests.testpostgresql.PostgreSQLHistoryPreservingRelStorageTests_psycopg2) Traceback (most recent call last): File "C:\Python38-x64\lib\unittest\case.py", line 60, in testPartExecutor yield File "C:\Python38-x64\lib\unittest\case.py", line 676, in run self._callTestMethod(testMethod) File "C:\Python38-x64\lib\unittest\case.py", line 633, in _callTestMethod method() File "c:\projects\relstorage\src\relstorage\tests\hptestbase.py", line 408, in checkImplementsIExternalGC invalidations = storage.tpc_vote(t) File "src\\perfmetrics\\metric.py", line 72, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\storage\__init__.py", line 498, in tpc_vote next_phase = self._tpc_phase.tpc_vote(self, transaction) File "c:\projects\relstorage\src\relstorage\storage\tpc\begin.py", line 101, in tpc_vote next_phase.enter(storage) File "c:\projects\relstorage\src\relstorage\storage\tpc\vote.py", line 158, in enter resolved_in_vote_oid_ints = self._vote(storage) File "c:\projects\relstorage\src\relstorage\storage\tpc\vote.py", line 637, in _vote return super(HistoryPreservingDeleteOnly, self)._vote(storage) File "c:\projects\relstorage\src\relstorage\storage\tpc\vote.py", line 200, in _vote self._flush_temps_to_db(cursor) - __traceback_info__: (<StoreConnection at 0xe7d5b5f9a0 active=False description={'backend_pid': 2820} conn=None cur=None>, <cursor object at 0x000000E7D6742740; closed: -1>) File "c:\projects\relstorage\src\relstorage\_util.py", line 304, in f result = func(*args, **kwargs) File "c:\projects\relstorage\src\relstorage\storage\tpc\vote.py", line 166, in _flush_temps_to_db self.shared_state.adapter.mover.store_temps(cursor, self.shared_state.temp_storage) File "src\\perfmetrics\\metric.py", line 66, in perfmetrics._metric._AbstractMetricImpl.__call__ File "c:\projects\relstorage\src\relstorage\adapters\postgresql\mover.py", line 230, in store_temps self._do_store_temps(cursor, state_oid_tid_iter, 'temp_store') File "c:\projects\relstorage\src\relstorage\adapters\postgresql\mover.py", line 226, in _do_store_temps cursor.copy_expert(buf.COPY_COMMAND, buf) psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: TypeError object of type 'NoneType' has no len() PostgreSQL backend provides optimized mover which copies data in binary format directly into the database (see 334f3e4 "Store objects using COPY FROM BINARY on pg8000 and psycopg2"). So in addition to TPCTemporaryStorage - which was extended to cope with data=NULL in the main patch - this specialized PostgreSQL-specific TempStoreCopyBuffer has to be adjusted as well. Not tested locally - let's see how it goes on CI.
@jamadden, all AppVeyor tests passed. This pull-request is no longer work-in-progress. |
navytux
added a commit
to navytux/relstorage
that referenced
this pull request
Nov 11, 2021
loadBeforeEx is like loadBefore, but simpler, provides better information for object delete records and can be more efficiently implemented by many storages: zopefoundation/ZODB#323 On RelStorage loadBefore is currently implemented via 3 SQL queries: 1) check whether object record exists at all 2) retrieve object state 3) retrieve serial of next object revision Compared to that loadBeforeEx is implemented via only one SQL query "2" from the above - "retrieve object state". It is exactly the same query that loadBefore uses and after the patch loadBefore actually invokes loadBeforeEx for step 2. This change was outlined in zopefoundation/ZODB#318 (comment) and zopefoundation/ZODB#318 (comment) and as explained in the first link this patch is also semantically coupled with zodb#484 This patch passes tests with both ZODB5 and with ZODB5+zopefoundation/ZODB#323: - when ran with ZODB5 it verifies that loadBefore implementation does not become broken. - when ran with ZODB5+zopefoundation/ZODB#323 it verifies that loadBeforeEx implementation is correct. For tests to pass with ZODB5+zopefoundation/ZODB#323 we also need zopefoundation/zc.zlibstorage#11 because without that fix zc.zlibstorage does not decompress data on loadBeforeEx.
jamadden
added a commit
that referenced
this pull request
Jun 28, 2023
…with a whiteout row for each deleted object. Fixes #484.
jamadden
added a commit
that referenced
this pull request
Jun 28, 2023
…with a whiteout row for each deleted object. Fixes #484.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
ZODB specifies deleteObject to create new revision that indicates object
removal:
https://github.com/zopefoundation/ZODB/blob/bc13ca74/src/ZODB/interfaces.py#L1292-L1307
(emphasis mine)
However currently for history-preserving mode, as explained in
zopefoundation/ZODB#318 (comment),
RelStorage purges latest object revision instead of creating new one with
whiteout indication. This goes against deleteObject specification and, as
demonstrated by attached test program, against particular FileStorage
behaviour.
-> Fix it.
P.S. I'm complete RelStorage newbie and looked only briefly. It could be that
my patch is e.g. incomplete, or not optimal. However it demonstrates a real
problem, and it fixes both adjusted testcase and failure of attached tdelete.py
P.P.S. Tested only with SQLite backend.
P.P.P.S. SQLite URI resolver is currently broken after 08259fa (Finer
control over sqlite storage locking, oid allocation and stats). I've used the
following local patch as a workaround: