Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: Can Connection.new_oid use its own storage instead of DB.storage? #139

Closed
jamadden opened this issue Jan 27, 2017 · 2 comments
Closed

Comments

@jamadden
Copy link
Member

Right now, Connection.new_oid is DB.new_oid. That is, in Connection.__init__ it does self.new_oid = db.new_oid. DB.new_oid in turn goes through its own storage instance's new_oid method.

For IMVCCStorages, especially native IMVCCStorages like RelStorage, this means that the new_oid is divorced from any particular Connection and its storage's view of the world.

Since OIDs are globally unique to the database, that's not necessarily automatically a bad thing.

However, in the case of RelStorage, this has two important consequences:

  • It forces the DB's storage to open and keep open a connection to the RDBMS. Moreover, it causes that connection to open---and keep open---a transaction (there's nothing to tell new_oid to commit). Long running transactions in general are not a good thing.

  • As a consequence of the former point, the implementation of new_oid has to be extremely careful not to interfere with any actual storage work. It can't take any locks of any kind. In MySQL, this means that the table used to allocate OIDs has to be MyISAM and not InnoDB. MyISAM tables are completely independent of the transaction system and bypass all kinds of journaling, etc, flushing IO immediately to disk. I suspect there's a potential performance issue here.

A smaller consequence is that even though we preallocate a small pool of OIDs in the master, they're not used effectively, and require global locking to access.

I could have new_oid just "know" that it's only called from DB and that it should autocommit its transaction---that involves extra round trips to the RDBMS server, though.

It seems cleaner to me to have Connection delegate new_oid right to its own storage. For non-IMVCCStorage instances, this changes nothing. For IMVCCStorage instances that need to coordinate with the "master" storage, they'd be responsible for handling that in cooperation with the new_instance method---the easy thing to do would be to copy the method, just like what Connection currently does. For RelStorage, it frees up the connection from the master storage and makes things more cleanly transactional.

Thoughts?

@jimfulton
Copy link
Member

+1

@jamadden
Copy link
Member Author

I used the following monkey-patch to implement this and run RelStorage's tests. They all pass (for mysql and postgresql). So hopefully it's not a big deal to add to ZODB proper.

from ZODB.Connection import Connection

if not hasattr(Connection, 'new_oid'):
    # Monkey patch to make new_oid use the connection's
    # storage. This has to be a data descriptor because
    # connection will try to reset this.

    class NewOid(object):

        __name__ = 'new_oid'

        def __get__(self, inst, klass=None):
            if klass is None:
                return self
            return inst._storage.new_oid

        def __set__(self, inst, value):
            pass

    Connection.new_oid = NewOid()

I didn't see a performance difference (concurrency=6) for either MySQL or PostgreSQL under stock zodbshootout with this. But that uses one storage per process. If I change to threads for a more real-world test, then I see a 5% improvement for postgres and 10-15% improvement for MySQL (that's before any tweaks that might be needed to use the more concurrent InnoDB engine for oid allocation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants