
Commit

Merge 9d76c53 into cb8af9c
jamadden committed Jun 8, 2021
2 parents cb8af9c + 9d76c53 commit 4a0d886
Showing 14 changed files with 292 additions and 101 deletions.
23 changes: 23 additions & 0 deletions CHANGES.rst
@@ -7,6 +7,29 @@

- Stop closing RDBMS connections when ``tpc_vote`` raises a
semi-expected ``TransientError`` such as a ``ConflictError``.
- PostgreSQL: Now uses advisory locks instead of row-level locks
during the commit process. This benchmarks substantially faster and
reduces the potential for table bloat.

For environments that process many large, concurrent transactions,
or deploy many RelStorage instances to the same database server, it
might be necessary to increase the PostgreSQL configuration value
``max_locks_per_transaction``. The default value of 64 is multiplied
by the default value of ``max_connections`` (100) to allow for 6,400
total objects to be locked across the entire database server. See
`the PostgreSQL documentation
<https://www.postgresql.org/docs/13/runtime-config-locks.html>`_ for
more information.
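
  One way to gauge how much of that capacity a deployment actually
  uses is to watch ``pg_locks`` while commits are in flight. This is
  only an inspection sketch; advisory locks appear with a
  ``locktype`` of ``advisory``::

      SELECT locktype, mode, count(*) AS held
        FROM pg_locks
       GROUP BY locktype, mode
       ORDER BY held DESC;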

.. caution:: Be careful deploying this version while older versions
are executing. There could be a small window of time
where the locking strategies are different, leading to
database corruption.

.. note:: Deploying multiple RelStorage instances to separate
schemas in the same PostgreSQL database (e.g., the default
of "public" plus another) has never been supported. It is
even less supported now.


3.5.0a3 (2021-05-26)
1 change: 1 addition & 0 deletions docs/internals.rst
@@ -50,6 +50,7 @@ Internal Details
relstorage.adapters.postgresql.schema
relstorage.adapters.postgresql.stats
relstorage.adapters.postgresql.txncontrol
relstorage.adapters.postgresql.util
relstorage.adapters.replica
relstorage.adapters.schema
relstorage.adapters.scriptrunner
6 changes: 3 additions & 3 deletions docs/postgresql/index.rst
@@ -11,9 +11,9 @@

.. tip::

Using ZODB's ``readCurrent(ob)`` method will result in taking
shared locks (``SELECT FOR SHARE``) in PostgreSQL for the row
holding the data for *ob*.
Prior to version 3.5.0a4, using ZODB's ``readCurrent(ob)`` method
resulted in taking shared locks (``SELECT FOR SHARE``) in
PostgreSQL for the row holding the data for *ob*.

This operation performs disk I/O, and consequently has an
associated cost. We recommend using this method judiciously.
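
For illustration only (these are not the literal statements
RelStorage issues, and ``12345`` is a made-up oid), the two locking
styles differ roughly like this::

    -- Prior to 3.5.0a4: a shared row-level lock on the object's row.
    SELECT zoid FROM object_state WHERE zoid = 12345 FOR SHARE;

    -- 3.5.0a4 and later use advisory locks keyed on the oid, which
    -- never touch the table's rows at all.
    SELECT pg_advisory_xact_lock_shared(12345);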
144 changes: 132 additions & 12 deletions docs/postgresql/setup.rst
@@ -4,6 +4,11 @@

.. highlight:: shell

.. important::

RelStorage can only be installed into a single schema within a
database. This is usually the default "public" schema. It may be
possible to use other schemas, but this is not supported or tested.

If you installed PostgreSQL from a binary package, you probably have a
user account named ``postgres``. Since PostgreSQL respects the name of
@@ -40,18 +45,133 @@ configuration file::
Configuration
=============

.. tip::
The default PostgreSQL server configuration will work fine for most
users. However, some configuration changes may yield increased performance.

Defaults and Background
-----------------------

This section is current for PostgreSQL 13 and earlier versions.

``max_connections`` (100) gives the number of worker processes that
could possibly be active at a time. Each worker consumes (at most)
``work_mem`` (4MB) + ``temp_buffers`` (8MB) = 12MB (plus a tiny bit
of overhead).

``shared_buffers`` is the amount of memory that PostgreSQL allocates
to keeping database data in memory. It is perhaps the single most
important tunable; larger values are better. If the data is not in
this cache, a worker has to go to the operating system with an I/O
request (or two). The default is a measly 128MB.

``max_wal_size`` determines how often data must be taken from the
write-ahead log and placed into the main tables. Reasons to keep this
small are (a) limited disk space; (b) reduced crash recovery time;
(c) keeping online replicas more up-to-date if you're doing WAL-based
replication.

``random_page_cost`` (4.0) is relative to ``seq_page_cost`` (1.0) and
expresses how expensive random I/O is compared to large blocks of
sequential I/O. This in turn influences whether the planner will use
an index or not. For solid-state drives, ``random_page_cost`` should
generally be lowered.
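
To check what a given server is actually using for the parameters
mentioned above, query the ``pg_settings`` view (a sketch; add or
remove names as needed)::

    SELECT name, setting, unit
      FROM pg_settings
     WHERE name IN ('max_connections', 'work_mem', 'temp_buffers',
                    'shared_buffers', 'max_wal_size',
                    'random_page_cost', 'seq_page_cost');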


General
-------

Many PostgreSQL configuration defaults are conservative on modern
machines. Without knowing the resources available to any particular
installation, some general tips are listed below; a consolidated
sketch of these settings follows the list.

.. important:: Be sure you understand the consequences before changing
any settings. Some of those listed here trade durability
or recovery time for speed and may be too risky for your
environment.

* Increase ``temp_buffers``. This prevents having to use disk tables for
temporary storage. RelStorage does a lot with temp tables. In my
benchmarks, I use 32MB.

* Increasing ``work_mem`` improves sorting, hashing and the like.
RelStorage doesn't do much of that *except* when you do a native GC,
and then it can make a big difference. Because this is a ceiling that's
not allocated unless needed, it should be safe to increase. In my
benchmarks, I leave this alone.

* Increase ``shared_buffers`` as much as you are able. When I
benchmark, on my 16GB laptop, I use 2GB. The rule of thumb for
dedicated servers is 25% of available RAM.

* If deploying on SSDs, the cost of random page access
(``random_page_cost``) can probably be lowered some more. This holds
even for older SSDs, because the cost is relative to sequential
access, not absolute. It is probably not important, though, unless
you're experiencing issues accessing blobs (the only thing doing
sequential scans).

* If you are not doing replication, setting ``wal_level = minimal``
will improve write speed and reduce disk usage. Similarly, setting
``wal_compression = on`` will reduce disk IO for writes (at a tiny
CPU cost). I benchmark with both those settings.

* If you're not doing replication and can stand somewhat longer recovery
times, increasing ``max_wal_size`` (I use 10GB) benefits heavy
writes. Even if you are doing replication, increasing
``checkpoint_timeout`` (I use 30 minutes, up from 5),
``checkpoint_completion_target`` (I use 0.9, up from 0.5) and either
increasing or disabling ``checkpoint_flush_after`` (I disable it; the
default is a skimpy 256KB) also help. This especially helps on
spinning rust and for very "bursty" workloads.

* If your I/O bandwidth is constrained and you can't increase
``shared_buffers`` enough to compensate, disabling the background
writer can help too: set ``bgwriter_lru_maxpages = 0`` and
``bgwriter_flush_after = 0``. I set these when I benchmark using
spinning rust.

* Setting ``synchronous_commit = off`` makes for faster turnaround
time on ``COMMIT`` calls. This is safe in the sense that it can
never corrupt the database in the event of a crash, but it might
leave the application *thinking* something was saved when it really
wasn't. Since the whole site will go down in the event of a database
crash anyway, you might consider setting this to off if you're
struggling with database performance. I benchmark with it off.
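
The consolidated sketch mentioned above, using the benchmark values
from this list (they are not universal recommendations, and the
``random_page_cost`` value is only an example since the text leaves
the exact number open). ``shared_buffers``, ``wal_level`` and
``max_wal_senders`` only take effect after a restart; the rest apply
on reload::

    ALTER SYSTEM SET temp_buffers = '32MB';
    ALTER SYSTEM SET shared_buffers = '2GB';               -- restart required
    ALTER SYSTEM SET random_page_cost = 1.1;               -- example value for SSDs
    ALTER SYSTEM SET wal_level = 'minimal';                -- no replication; restart required
    ALTER SYSTEM SET max_wal_senders = 0;                  -- needed when wal_level = minimal
    ALTER SYSTEM SET wal_compression = 'on';
    ALTER SYSTEM SET max_wal_size = '10GB';
    ALTER SYSTEM SET checkpoint_timeout = '30min';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    ALTER SYSTEM SET checkpoint_flush_after = 0;           -- disabled
    ALTER SYSTEM SET bgwriter_lru_maxpages = 0;            -- only if I/O-constrained
    ALTER SYSTEM SET bgwriter_flush_after = 0;             -- only if I/O-constrained
    ALTER SYSTEM SET synchronous_commit = 'off';           -- see the caveat above
    SELECT pg_reload_conf();                               -- apply the reloadable settings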


Large Sites
-----------

* Beginning with RelStorage 3.5.0a4, very large sites processing many
large or concurrent transactions, or deploying many RelStorage
instances to a single database server, may need to increase
``max_locks_per_transaction``. The default value (64) allows about
6,400 objects to be locked because it is multiplied by the value of
``max_connections`` (which defaults to 100). Large sites may have
already increased this second value. (See the sketch after this
list.)

* For systems with very high write levels, setting
``wal_writer_flush_after = 10MB`` (or something higher than the
default of 1MB) and ``wal_writer_delay = 10s`` will improve write
speed without any appreciable safety loss (because your write volume
is so high already). I run write benchmarks this way.

* Likewise for high writes, I increase ``autovacuum_max_workers`` from
the default of 3 to 8 so they can keep up. Similarly, consider
lowering ``autovacuum_vacuum_scale_factor`` from its default of 20%
to 10% or even 1%. You might also raise
``autovacuum_vacuum_cost_limit`` from its default of 200 to 1000
or 2000.
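
The corresponding sketch for these write-heavy settings, using the
values given above (``max_locks_per_transaction`` and
``autovacuum_max_workers`` require a restart, and 128 is only an
example value)::

    ALTER SYSTEM SET max_locks_per_transaction = 128;       -- restart required; example value
    ALTER SYSTEM SET wal_writer_flush_after = '10MB';
    ALTER SYSTEM SET wal_writer_delay = '10s';
    ALTER SYSTEM SET autovacuum_max_workers = 8;             -- restart required
    ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.1;   -- 10%; default is 0.2
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;
    SELECT pg_reload_conf();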

For packing large databases, a larger value of the PostgreSQL
configuration paramater ``work_mem`` is likely to yield improved
performance. The default is 4MB; try 16MB if packing performance is
unacceptable.
Packing
-------

.. tip::
* For packing large databases, a larger value of the PostgreSQL
configuration parameter ``work_mem`` is likely to yield improved
performance. The default is 4MB; try 16MB if packing performance is
unacceptable. (See the sketch after this list.)

For packing large databases, setting the ``pack_object``,
``object_ref`` and ``object_refs_added`` tables to `UNLOGGED
<https://www.postgresql.org/docs/12/sql-createtable.html#SQL-CREATETABLE-UNLOGGED>`_
can provide a performance boost (if replication doesn't matter and
you don't care about the contents of these tables). This can be
done after the schema is created with ``ALTER TABLE table SET UNLOGGED``.
* For packing large databases, setting the ``pack_object``,
``object_ref`` and ``object_refs_added`` tables to `UNLOGGED
<https://www.postgresql.org/docs/12/sql-createtable.html#SQL-CREATETABLE-UNLOGGED>`_
can provide a performance boost (if replication doesn't matter and
you don't care about the contents of these tables). This can be done
after the schema is created with ``ALTER TABLE table SET UNLOGGED``.
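
A sketch of preparing for a large pack, combining the two tips above.
It assumes you can issue the ``work_mem`` change on the connection
doing the pack (otherwise set it server-wide); both changes are
reversible::

    -- Give the session doing the pack more sort/hash memory.
    SET work_mem = '16MB';

    -- Skip WAL for the pack bookkeeping tables; only do this if replication
    -- and the contents of these tables don't matter to you.
    ALTER TABLE pack_object SET UNLOGGED;
    ALTER TABLE object_ref SET UNLOGGED;
    ALTER TABLE object_refs_added SET UNLOGGED;

    -- Afterwards, restore normal WAL-logged behaviour.
    ALTER TABLE pack_object SET LOGGED;
    ALTER TABLE object_ref SET LOGGED;
    ALTER TABLE object_refs_added SET LOGGED;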
42 changes: 26 additions & 16 deletions src/relstorage/adapters/locker.py
@@ -118,18 +118,14 @@ class will call this method when :meth:`hold_commit_lock` is
("""
SELECT zoid
FROM current_object
WHERE zoid IN (
SELECT zoid
FROM temp_store
)
INNER JOIN temp_store USING (zoid)
WHERE temp_store.prev_tid <> 0
""", 'current_object'),
("""
SELECT zoid
FROM object_state
WHERE zoid IN (
SELECT zoid
FROM temp_store
)
INNER JOIN temp_store USING (zoid)
WHERE temp_store.prev_tid <> 0
""", 'object_state'),
)

@@ -194,24 +190,38 @@ def lock_current_objects(self, cursor, read_current_oid_ints, shared_locks_block
# possibly * N
self._lock_rows_being_modified(cursor)

def _lock_readCurrent_oids_for_share(self, cursor, current_oids, shared_locks_block):
_, table = self._get_current_objects_query
oids_to_lock = sorted(set(current_oids))
batcher = self.make_batcher(cursor)

locking_suffix = ' %s ' % (
def _lock_suffix_for_readCurrent(self, shared_locks_block):
return ' %s ' % (
self._lock_share_clause
if shared_locks_block
else
self._lock_share_clause_nowait
)

def _lock_column_name_for_readCurrent(self, shared_locks_block):
# subclasses use the argument
# pylint:disable=unused-argument
return 'zoid'

def _lock_consume_rows_for_readCurrent(self, rows, shared_locks_block):
# subclasses use the argument
# pylint:disable=unused-argument
consume(rows)

def _lock_readCurrent_oids_for_share(self, cursor, current_oids, shared_locks_block):
_, table = self._get_current_objects_query
oids_to_lock = sorted(set(current_oids))
batcher = self.make_batcher(cursor)

locking_suffix = self._lock_suffix_for_readCurrent(shared_locks_block)
lock_column = self._lock_column_name_for_readCurrent(shared_locks_block)
try:
rows = batcher.select_from(
('zoid',), table,
(lock_column,), table,
suffix=locking_suffix,
**{'zoid': oids_to_lock}
)
consume(rows)
self._lock_consume_rows_for_readCurrent(rows, shared_locks_block)
except self.illegal_operation_exceptions: # pragma: no cover
# Bug in our code
raise
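
In rough terms, the hooks above let a subclass change both the
selected column and the locking suffix used for ``readCurrent``
oids. The following is only an illustration of the shape of the
generated SQL, not the exact statements any adapter emits:

    -- Base behaviour: select the oid column with a row-share suffix
    -- (e.g. FOR SHARE NOWAIT on PostgreSQL).
    SELECT zoid FROM current_object WHERE zoid IN (1, 2, 3) FOR SHARE NOWAIT;

    -- A subclass could instead select a locking expression with no suffix,
    -- such as a shared advisory lock keyed on each oid.
    SELECT pg_advisory_xact_lock_shared(zoid) FROM current_object WHERE zoid IN (1, 2, 3);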
8 changes: 2 additions & 6 deletions src/relstorage/adapters/mysql/packundo.py
@@ -20,12 +20,8 @@
from ..packundo import HistoryPreservingPackUndo
from ..schema import Schema

class _LockStmt(object):
# 8.0 supports 'FOR SHARE' but before that we have
# this.
_lock_for_share = 'LOCK IN SHARE MODE'

class MySQLHistoryPreservingPackUndo(_LockStmt, HistoryPreservingPackUndo):
class MySQLHistoryPreservingPackUndo(HistoryPreservingPackUndo):

# Previously we needed to work around a MySQL performance bug by
# avoiding an expensive subquery.
@@ -112,5 +108,5 @@ class MySQLHistoryPreservingPackUndo(_LockStmt, HistoryPreservingPackUndo):
).limit(1000)


class MySQLHistoryFreePackUndo(_LockStmt, HistoryFreePackUndo):
class MySQLHistoryFreePackUndo(HistoryFreePackUndo):
pass
7 changes: 1 addition & 6 deletions src/relstorage/adapters/packundo.py
@@ -51,10 +51,6 @@ class PackUndo(DatabaseHelpersMixin):

_choose_pack_transaction_query = None


_lock_for_share = 'FOR SHARE'
_lock_for_update = 'FOR UPDATE'

driver = None
connmanager = None
runner = None
@@ -106,8 +102,7 @@ def with_options(self, options):
# (checkPackWhileReferringObjectChanges)
return self
result = self.__class__(self.driver, self.connmanager, self.runner, self.locker, options)
# Setting the MAX_TID is important for SQLite,
# as is the _lock_for_share.
# Setting the MAX_TID is important for SQLite.
# This should probably be handled directly in subclasses.
for k, v in vars(self).items():
if k != 'options' and getattr(result, k, None) is not v:
2 changes: 0 additions & 2 deletions src/relstorage/adapters/postgresql/adapter.py
@@ -150,8 +150,6 @@ def _create(self):
locker=self.locker,
options=options,
)
# TODO: Subclass for this.
self.packundo._lock_for_share = 'FOR KEY SHARE OF object_state'
self.dbiter = HistoryFreeDatabaseIterator(
driver,
)
