Implement IExternalGC for history-preserving storage. Fixes #76
jamadden committed Jul 15, 2019
1 parent aa0e501 commit f27a91d
Showing 16 changed files with 444 additions and 168 deletions.
17 changes: 17 additions & 0 deletions CHANGES.rst
@@ -36,6 +36,23 @@
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

It is critical that ``pack-gc`` be turned off (set to false) in a
multi-database and that only ``multi-zodb-gc`` be used to perform
garbage collection.

3.0a5 (2019-07-11)
==================

43 changes: 34 additions & 9 deletions docs/relstorage-options.rst
@@ -244,17 +244,42 @@ GC and Packing
==============

pack-gc
If pack-gc is false, pack operations do not perform garbage
collection. Garbage collection is enabled by default.

If garbage collection is disabled, pack operations keep at
least one revision of every object that hasn't been deleted.
With garbage collection disabled, the pack code does not need
to follow object references, making packing conceivably much
faster. However, some of that benefit may be lost due to an
ever-increasing number of unused objects.

Disabling garbage collection is **required** in a
multi-database to prevent breaking inter-database references.
The only safe way to collect and then pack databases in a
multi-database is to use `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_ and run
``multi-zodb-gc``, and only then pack each individual
database (see the sketch at the end of this entry).

.. note::

In history-free databases, packing after running
``multi-zodb-gc`` is not necessary. The garbage collection
process itself handles the packing. Packing is only
required in history-preserving databases.

.. versionchanged:: 3.0

Add support for ``zc.zodbdgc`` to history-preserving
databases.

Objects that have been deleted will be removed during a
pack with ``pack-gc`` disabled.

.. versionchanged:: 2.0

Add support for ``zc.zodbdgc`` to history-free databases.
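
A minimal sketch of the collect-then-pack sequence; the configuration
string and connection details here are illustrative, not
prescriptive::

    # Run multi-zodb-gc against the multi-database first; it marks
    # unreachable objects as deleted. Then pack each
    # history-preserving database to reclaim their space.
    import ZODB.config

    config = """
    %import relstorage
    <zodb>
      <relstorage>
        pack-gc false
        <postgresql>
          dsn dbname='zodb'
        </postgresql>
      </relstorage>
    </zodb>
    """

    db = ZODB.config.databaseFromString(config)
    try:
        db.pack()  # pack to "now", removing the delete records
    finally:
        db.close()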

pack-prepack-only
If pack-prepack-only is true, pack operations perform a full analysis
26 changes: 26 additions & 0 deletions src/relstorage/adapters/interfaces.py
@@ -635,6 +635,32 @@ def pack(pack_tid, sleep=None, packed_func=None):
pauses.
"""

def deleteObject(cursor, oid_int, tid_int):
"""
Delete the revision of *oid_int* in transaction *tid_int*.

This method marks an object as deleted via a new object
revision. Subsequent attempts to load current data for the
object will fail with a POSKeyError, but loads for
non-current data will succeed if there are previous
non-delete records. The object will be removed from the
storage when all non-delete records are removed.

The serial argument must match the most recently committed
serial for the object. This is a seat belt.

--- Documentation for ``IExternalGC``

In history-free databases there is no such thing as a delete
record, so this should remove the single revision of *oid_int*
(which *should* be checked to verify it is at *tid_int*),
leading all access to *oid_int* in the future to throw
``POSKeyError``.

In history-preserving databases, this means to set the state
for the object at the transaction to NULL, signifying that
it's been deleted. A subsequent pack operation is required to
actually remove these deleted items.
"""

class IPoller(Interface):
"""Poll for new data"""
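
A hedged sketch of how an external collector such as zc.zodbdgc drives
the ``deleteObject`` operation documented above through the storage's
normal two-phase commit; the ``storage``, ``oid``, and ``serial``
names are placeholders, not part of this diff:

import transaction
from ZODB.POSException import POSKeyError

txn = transaction.begin()
storage.tpc_begin(txn)
# serial must match the most recently committed serial for oid
# (the "seat belt" described in the docstring above).
storage.deleteObject(oid, serial, txn)
storage.tpc_vote(txn)
storage.tpc_finish(txn)

# Afterwards, the current revision is unavailable in both modes:
try:
    storage.load(oid)
except POSKeyError:
    pass
# A history-preserving storage keeps older revisions readable via
# loadBefore() until a pack removes the NULL-state row; a history-free
# storage drops the object immediately.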
133 changes: 102 additions & 31 deletions src/relstorage/adapters/packundo.py
@@ -26,13 +26,14 @@
from ..iter import fetchmany
from ..treemark import TreeMarker
from .interfaces import IPackUndo
from ._util import DatabaseHelpersMixin

# pylint:disable=too-many-lines,unused-argument


log = logging.getLogger(__name__)

class PackUndo(object):
class PackUndo(DatabaseHelpersMixin):
"""Abstract base class for pack/undo"""

verify_sane_database = False
@@ -52,6 +53,19 @@ def __init__(self, database_driver, connmanager, runner, locker, options):
def _fetchmany(self, cursor):
return fetchmany(cursor)

def with_options(self, options):
"""
Return a new instance that will use the given options instead
of the options this object was constructed with.
"""
if options == self.options:
# If the options haven't changed, return ourself. This is
# for tests that make changes to the structure of this
# object not captured in the constructor or options.
# (checkPackWhileReferringObjectChanges)
return self
return self.__class__(self.driver, self.connmanager, self.runner, self.locker, options)

def choose_pack_transaction(self, pack_point):
"""Return the transaction before or at the specified pack time.
@@ -159,6 +173,21 @@ def upload_batch():
if batch:
upload_batch()

# The only things to worry about are object_state and blob_chunk.
# Blob chunks are deleted automatically by a foreign key.

# We shouldn't *have* to verify the oldserial in the delete statement,
# because our only consumer is zc.zodbdgc which only calls us for
# unreachable objects, so they shouldn't be modified and get a new
# TID. But it's safer to do so.
_delete_object_stmt = None

def deleteObject(self, cursor, oid, oldserial):
self.runner.run_script_stmt(
cursor,
self._delete_object_stmt,
{'oid': u64(oid), 'tid': u64(oldserial)})
return cursor.rowcount
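# Concrete subclasses supply ``_delete_object_stmt``: the
# history-preserving class nulls out the row's state so a later pack
# can reclaim it, while the history-free class deletes the row
# outright. Both statements appear later in this file.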

@implementer(IPackUndo)
class HistoryPreservingPackUndo(PackUndo):
@@ -502,8 +531,13 @@ def pre_pack(self, pack_tid, get_references):
"""
conn, cursor = self.connmanager.open_for_pre_pack()
try:
# The pre-pack functions are responsible for managing
# their own commits; when they return, the transaction
# should be committed.
#
# ``pack_object`` should be populated,
# essentially with the distinct list of all objects and their
# maximum (newest) transaction ids.
if self.options.pack_gc:
log.info("pre_pack: start with gc enabled")
self._pre_pack_with_gc(
@@ -519,7 +553,8 @@ def pre_pack(self, pack_tid, get_references):
to_remove = 0

if self.options.pack_gc:
# Mark all objects we said not to keep as something
# we should discard.
stmt = """
INSERT INTO pack_state (tid, zoid)
SELECT tid, zoid
@@ -533,8 +568,27 @@
self.runner.run_script_stmt(
cursor, stmt, {'pack_tid': pack_tid})
to_remove += cursor.rowcount
else:
# Support for IExternalGC. Also remove deleted objects.
stmt = """
INSERT INTO pack_state (tid, zoid)
SELECT t.tid, t.zoid
FROM (
SELECT zoid, tid
FROM object_state
WHERE state IS NULL
AND tid = (
SELECT MAX(i.tid)
FROM object_state i
WHERE i.zoid = object_state.zoid
)
) t
"""
self.runner.run_script_stmt(cursor, stmt)
to_remove += cursor.rowcount

# Pack object states with the keep flag set to true,
# excluding their current TID.
stmt = """
INSERT INTO pack_state (tid, zoid)
SELECT tid, zoid
Expand All @@ -550,6 +604,7 @@ def pre_pack(self, pack_tid, get_references):
cursor, stmt, {'pack_tid': pack_tid})
to_remove += cursor.rowcount

# Make a simple summary of the transactions to examine.
log.info("pre_pack: enumerating transactions to pack")
stmt = "%(TRUNCATE)s pack_state_tid"
self.runner.run_script_stmt(cursor, stmt)
@@ -572,6 +627,13 @@
self.connmanager.close(conn, cursor)

def __initial_populate_pack_object(self, conn, cursor, pack_tid, keep):
"""
Put all objects into ``pack_object`` that have revisions equal
to or below *pack_tid*, setting their initial ``keep`` status
to *keep*.

Commits the transaction to release locks.
"""
# Access the tables that are used by online transactions
# in a short transaction and immediately commit to release any
# locks.
@@ -600,14 +662,9 @@ def __initial_populate_pack_object(self, conn, cursor, pack_tid, keep):
# Since we switched MySQL back to READ COMMITTED (what PostgreSQL uses)
# I haven't been able to produce the error anymore. So don't explicitly lock.

# lock_affected_objects = affected_objects + '\n' + self._lock_for_update + ';\n'

# self.runner.run_script(cursor, lock_subquery, {'pack_tid': pack_tid})
# cursor.fetchall() # Consume but discard.

stmt = """
INSERT INTO pack_object (zoid, keep, keep_tid)
SELECT zoid, """ + ('%(TRUE)s' if keep else '%(FALSE)s') + """, MAX(tid)
FROM ( """ + affected_objects + """ ) t
GROUP BY zoid;
@@ -620,18 +677,20 @@
conn.commit()

def _pre_pack_without_gc(self, conn, cursor, pack_tid):
"""
Determine what to pack, without garbage collection.

With garbage collection disabled, there is no need to follow
object references.
"""
# Fill the pack_object table with OIDs, but configure them
# all to be kept by setting keep to true.
log.debug("pre_pack: populating pack_object")
self.__initial_populate_pack_object(conn, cursor, pack_tid, keep=True)

def _pre_pack_with_gc(self, conn, cursor, pack_tid, get_references):
"""
Determine what to pack, with garbage collection.
"""
stmt = self._script_create_temp_pack_visit
if stmt:
@@ -643,7 +702,7 @@ def _pre_pack_with_gc(self, conn, cursor, pack_tid, get_references):
# Fill the pack_object table with OIDs that either will be
# removed (if nothing references the OID) or whose history will
# be cut.
self.__initial_populate_pack_object(conn, cursor, pack_tid, keep=False)

stmt = """
-- Keep objects that have been revised since pack_tid.
@@ -707,6 +766,13 @@ def pack(self, pack_tid, packed_func=None):
conn, cursor = self.connmanager.open_for_store()
try: # pylint:disable=too-many-nested-blocks
try:
# If we have a transaction entry in ``pack_state_tid`` (that is,
# we found a transaction with an object in the range of transactions
# we can pack away) that matches an actual transaction entry (XXX:
# How could we be in the state where the transaction row is gone but we still
# have object_state with that transaction id?), then we need to pack that
# transaction. The presence of an entry in ``pack_state_tid`` means that all
# object states from that transaction should be removed.
stmt = """
SELECT transaction.tid,
CASE WHEN packed = %(TRUE)s THEN 1 ELSE 0 END,
@@ -781,7 +847,12 @@

def _pack_transaction(self, cursor, pack_tid, tid, packed,
has_removable, packed_list):
"""
Pack one transaction. Requires populated pack tables.

If *has_removable* is true, then we have object states and current
object pointers to remove.
"""
log.debug("pack: transaction %d: packing", tid)
removed_objects = 0
removed_states = 0
@@ -865,6 +936,15 @@ def _pack_cleanup(self, conn, cursor):
stmt = '%(TRUNCATE)s ' + _table
self.runner.run_script_stmt(cursor, stmt)

_delete_object_stmt = """
UPDATE object_state
SET state = NULL,
state_size = 0,
md5 = ''
WHERE zoid = %(oid)s
and tid = %(tid)s
"""


@implementer(IPackUndo)
class HistoryFreePackUndo(PackUndo):
@@ -1207,18 +1287,9 @@ def _pack_cleanup(self, conn, cursor):
"""
self.runner.run_script(cursor, stmt)

_delete_object_stmt = """
DELETE FROM object_state
WHERE zoid = %(oid)s
and tid = %(tid)s
"""
16 changes: 12 additions & 4 deletions src/relstorage/adapters/schema.py
@@ -377,10 +377,14 @@ def _create_pack_object(self, cursor):
@noop_when_history_free
def _create_pack_state(self, cursor):
"""
Temporary state populated during pre-packing.

This is only used in history-preserving databases.

This table is poorly named. What it actually holds is the set
of objects, along with their maximum TID, that are potentially
eligible to be discarded because their most recent change
(maximum TID) is earlier than the pack time.
"""
self.runner.run_script(cursor, self.CREATE_PACK_STATE_TMPL)

@@ -393,10 +397,14 @@ def _create_pack_state(self, cursor):
@noop_when_history_free
def _create_pack_state_tid(self, cursor):
"""
Temporary state during packing: the list of
transactions that have at least one object state to pack.
Temporary state during pre-packing:
This is only used in history-preserving databases.
This table is poorly named. What it actually holds is simply a
summary of the distinct transaction IDs found in
``pack_state``. In other words, it's the list of transaction
IDs that are eligible to be discarded.
"""
self.runner.run_script(cursor, self.CREATE_PACK_STATE_TID_TMPL)

