selectinload omit join

The "selectin" loader strategy now omits the JOIN in the case of a simple one-to-many load, where it instead relies upon the foreign key columns of the related table in order to match up to primary keys in the parent table. This optimization can be disabled by setting the :paramref:`.relationship.omit_join` flag to False. Many thanks to Jayson Reis for the efforts on this. As part of this change, horizontal shard no longer relies upon the _mapper_zero() method to get the query-bound mapper, instead using the more generalized _bind_mapper() (which will use mapper_zero if no explicit FROM is present). A short check for the particular recursive condition is added to BundleEntity and it no longer assigns itself as the "namespace" to its ColumnEntity objects which creates a reference cycle. Co-authored-by: Mike Bayer <mike_mp@zzzcomputing.com> Fixes: #4340 Change-Id: I649587e1c07b684ecd63f7d10054cd165891baf4 Pull-request: https://bitbucket.org/zzzeek/sqlalchemy/pull-requests/7
zzzeek · Oct 10, 2018 · 21fbb5e · 21fbb5e
1 parent bb65193
commit 21fbb5e
Show file tree

Hide file tree

Showing 11 changed files with 322 additions and 138 deletions.
diff --git a/doc/build/changelog/migration_13.rst b/doc/build/changelog/migration_13.rst
@@ -20,6 +20,70 @@ potentially backwards-incompatible changes in behavior.
 New Features and Improvements - ORM
 ===================================
 
+.. _change_4340:
+
+selectin loading no longer uses JOIN for simple one-to-many
+------------------------------------------------------------
+
+The "selectin" loading feature added in 1.2 introduced an extremely
+performant new way to eagerly load collections, in many cases much faster
+than that of "subquery" eager loading, as it does not rely upon restating
+the original SELECT query and instead uses a simple IN clause.  However,
+the "selectin" load still relied upon rendering a JOIN between the
+parent and related tables, since it needs the parent primary key values
+in the row in order to match rows up.     In 1.3, a new optimization
+is added which will omit this JOIN in the most common case of a simple
+one-to-many load, where the related row already contains the primary key
+of the parent row expressed in its foreign key columns.   This again provides
+for a dramatic performance improvement as the ORM now can load large numbers
+of collections all in one query without using JOIN or subqueries at all.
+
+Given a mapping::
+
+    class A(Base):
+        __tablename__ = 'a'
+
+        id = Column(Integer, primary_key=True)
+        bs = relationship("B", lazy="selectin")
+
+
+    class B(Base):
+        __tablename__ = 'b'
+        id = Column(Integer, primary_key=True)
+        a_id = Column(ForeignKey("a.id"))
+
+In the 1.2 version of "selectin" loading, a load of A to B looks like:
+
+.. sourcecode:: sql
+
+    SELECT a.id AS a_id FROM a
+    SELECT a_1.id AS a_1_id, b.id AS b_id, b.a_id AS b_a_id
+    FROM a AS a_1 JOIN b ON a_1.id = b.a_id
+    WHERE a_1.id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ORDER BY a_1.id
+    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+With the new behavior, the load looks like:
+
+.. sourcecode:: sql
+
+
+    SELECT a.id AS a_id FROM a
+    SELECT b.a_id AS b_a_id, b.id AS b_id FROM b
+    WHERE b.a_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ORDER BY b.a_id
+    (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
+
+The behavior is being released as automatic, using a similar heuristic that
+lazy loading uses in order to determine if related entities can be fetched
+directly from the identity map.   However, as with most querying features,
+the feature's implementation became more complex as a result of advanced
+scenarios regarding polymorphic loading.   If problems are encountered,
+users should report a bug, however the change also incldues a flag
+:paramref:`.relationship.omit_join` which can be set to False on the
+:func:`.relationship` to disable the optimization.
+
+
+:ticket:`4340`
+
 .. _change_4257:
 
 info dictionary added to InstanceState

diff --git a/doc/build/changelog/unreleased_13/4340.rst b/doc/build/changelog/unreleased_13/4340.rst
@@ -0,0 +1,14 @@
+.. change::
+    :tags: feature, orm
+    :tickets: 4340
+
+    The "selectin" loader strategy now omits the JOIN in the case of a simple
+    one-to-many load, where it instead relies loads only from the related
+    table, relying upon the foreign key columns of the related table in order
+    to match up to primary keys in the parent table.   This optimization can be
+    disabled by setting the :paramref:`.relationship.omit_join` flag to False.
+    Many thanks to Jayson Reis for the efforts on this.
+
+    .. seealso::
+
+        :ref:`change_4340`
diff --git a/doc/build/orm/loading_relationships.rst b/doc/build/orm/loading_relationships.rst
@@ -692,6 +692,13 @@ eager loading, is compatible with batching of results using
 :meth:`.Query.yield_per`, provided the database driver supports simultaneous
 cursors.
 
+Overall, especially as of the 1.3 series of SQLAlchemy, selectin loading
+is the most simple and efficient way to eagerly load collections of objects
+in most cases.  The only scenario in which selectin eager loading is not feasible
+is when the model is using composite primary keys, and the backend database
+does not support tuples with IN, which includes SQLite, Oracle and
+SQL Server.
+
 .. versionadded:: 1.2
 
 "Select IN" eager loading is provided using the ``"selectin"`` argument
@@ -714,20 +721,49 @@ in order to load related associations:
     WHERE users.name = ? OR users.name = ?
     ('jack', 'ed')
     SELECT
-        users_1.id AS users_1_id,
         addresses.id AS addresses_id,
         addresses.email_address AS addresses_email_address,
         addresses.user_id AS addresses_user_id
-    FROM users AS users_1
-    JOIN addresses ON users_1.id = addresses.user_id
-    WHERE users_1.id IN (?, ?)
-    ORDER BY users_1.id, addresses.id
+    FROM addresses
+    WHERE addresses.user_id IN (?, ?)
+    ORDER BY addresses.user_id, addresses.id
     (5, 7)
 
-Above, the second SELECT refers to ``users_1.id IN (5, 7)``, where the
+Above, the second SELECT refers to ``addresses.user_id IN (5, 7)``, where the
 "5" and "7" are the primary key values for the previous two ``User``
 objects loaded; after a batch of objects are completely loaded, their primary
 key values are injected into the ``IN`` clause for the second SELECT.
+Because the relatonship between ``User`` and ``Address`` provides that the
+primary key values for ``User`` can be derived from ``Address.user_id``, the
+statement has no joins or subqueries at all.
+
+.. versionchanged:: 1.3 selectin loading can omit the JOIN for a simple
+   one-to-many collection.
+
+In the case where the primary key of the parent object isn't present in
+the related row, "selectin" loading will also JOIN to the parent table so that
+the parent primary key values are present:
+
+.. sourcecode:: python+sql
+
+    >>> session.query(Address).\
+    ... options(selectinload('user')).all()
+    {opensql}SELECT
+        addresses.id AS addresses_id,
+        addresses.email_address AS addresses_email_address,
+        addresses.user_id AS addresses_user_id
+        FROM addresses
+    SELECT
+        addresses_1.id AS addresses_1_id,
+        users.id AS users_id,
+        users.name AS users_name,
+        users.fullname AS users_fullname,
+        users.password AS users_password
+    FROM addresses AS addresses_1
+    JOIN users ON users.id = addresses_1.user_id
+    WHERE addresses_1.id IN (?, ?)
+    ORDER BY addresses_1.id
+    (1, 2)
 
 "Select IN" loading is the newest form of eager loading added to SQLAlchemy
 as of the 1.2 series.   Things to know about this kind of loading include:
@@ -746,7 +782,8 @@ as of the 1.2 series.   Things to know about this kind of loading include:
 * "selectin" loading, unlike joined or subquery eager loading, always emits
   its SELECT in terms of the immediate parent objects just loaded, and
   not the original type of object at the top of the chain.  So if eager loading
-  many levels deep, "selectin" loading still uses exactly one JOIN in the statement.
+  many levels deep, "selectin" loading still uses no more than one JOIN,
+  and usually no JOINs, in the statement.   In comparison,
   joined and subquery eager loading always refer to multiple JOINs up to
   the original parent.
 

diff --git a/doc/build/orm/tutorial.rst b/doc/build/orm/tutorial.rst
@@ -1643,23 +1643,22 @@ which involves custom criterion.   All three are usually invoked via functions k
 as :term:`query options` which give additional instructions to the :class:`.Query` on how
 we would like various attributes to be loaded, via the :meth:`.Query.options` method.
 
-Subquery Load
+Selectin Load
 -------------
 
 In this case we'd like to indicate that ``User.addresses`` should load eagerly.
 A good choice for loading a set of objects as well as their related collections
-is the :func:`.orm.subqueryload` option, which emits a second SELECT statement
+is the :func:`.orm.selectinload` option, which emits a second SELECT statement
 that fully loads the collections associated with the results just loaded.
-The name "subquery" originates from the fact that the SELECT statement
-constructed directly via the :class:`.Query` is re-used, embedded as a subquery
-into a SELECT against the related table.   This is a little elaborate but
-very easy to use:
+The name "selectin" originates from the fact that the SELECT statement
+uses an IN clause in order to locate related rows for multiple objects
+at once:
 
 .. sourcecode:: python+sql
 
-    >>> from sqlalchemy.orm import subqueryload
+    >>> from sqlalchemy.orm import selectinload
     {sql}>>> jack = session.query(User).\
-    ...                 options(subqueryload(User.addresses)).\
+    ...                 options(selectinload(User.addresses)).\
     ...                 filter_by(name='jack').one()
     SELECT users.id AS users_id,
             users.name AS users_name,
@@ -1668,27 +1667,19 @@ very easy to use:
     FROM users
     WHERE users.name = ?
     ('jack',)
-    SELECT addresses.id AS addresses_id,
-            addresses.email_address AS addresses_email_address,
-            addresses.user_id AS addresses_user_id,
-            anon_1.users_id AS anon_1_users_id
-    FROM (SELECT users.id AS users_id
-        FROM users WHERE users.name = ?) AS anon_1
-    JOIN addresses ON anon_1.users_id = addresses.user_id
-    ORDER BY anon_1.users_id, addresses.id
-    ('jack',)
+    SELECT addresses.user_id AS addresses_user_id,
+            addresses.id AS addresses_id,
+            addresses.email_address AS addresses_email_address
+    FROM addresses
+    WHERE addresses.user_id IN (?)
+    ORDER BY addresses.user_id, addresses.id
+    (5,)
     {stop}>>> jack
     <User(name='jack', fullname='Jack Bean', password='gjffdd')>
 
     >>> jack.addresses
     [<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]
 
-.. note::
-
-   :func:`.subqueryload` when used in conjunction with limiting such as
-   :meth:`.Query.first`, :meth:`.Query.limit` or :meth:`.Query.offset`
-   should also include :meth:`.Query.order_by` on a unique column in order to
-   ensure correct results.  See :ref:`subqueryload_ordering`.
 
 Joined Load
 -----------
@@ -1731,11 +1722,14 @@ one instance of ``User`` back.  This is because :class:`.Query` applies a "uniqu
 strategy, based on object identity, to the returned entities.  This is specifically
 so that joined eager loading can be applied without affecting the query results.
 
-While :func:`.joinedload` has been around for a long time, :func:`.subqueryload`
-is a newer form of eager loading.   :func:`.subqueryload` tends to be more appropriate
+While :func:`.joinedload` has been around for a long time, :func:`.selectinload`
+is a newer form of eager loading.   :func:`.selectinload` tends to be more appropriate
 for loading related collections while :func:`.joinedload` tends to be better suited
 for many-to-one relationships, due to the fact that only one row is loaded
-for both the lead and the related object.
+for both the lead and the related object.   Another form of loading,
+:func:`.subqueryload`, also exists, which can be used in place of
+:func:`.selectinload` when making use of composite primary keys on certain
+backends.
 
 .. topic:: ``joinedload()`` is not a replacement for ``join()``
 

diff --git a/lib/sqlalchemy/ext/horizontal_shard.py b/lib/sqlalchemy/ext/horizontal_shard.py
@@ -45,7 +45,7 @@ def _execute_and_instances(self, context):
         def iter_for_shard(shard_id):
             context.attributes['shard_id'] = context.identity_token = shard_id
             result = self._connection_from_session(
-                mapper=self._mapper_zero(),
+                mapper=self._bind_mapper(),
                 shard_id=shard_id).execute(
                 context.statement,
                 self._params)

diff --git a/lib/sqlalchemy/orm/query.py b/lib/sqlalchemy/orm/query.py
@@ -4020,13 +4020,17 @@ def __init__(self, query, bundle, setup_entities=True):
                 if isinstance(expr, Bundle):
                     _BundleEntity(self, expr)
                 else:
-                    _ColumnEntity(self, expr, namespace=self)
+                    _ColumnEntity(self, expr)
 
         self.supports_single_entity = self.bundle.single_entity
 
     @property
     def mapper(self):
-        return self.entity_zero.mapper
+        ezero = self.entity_zero
+        if ezero is not None:
+            return ezero.mapper
+        else:
+            return None
 
     @property
     def entities(self):

diff --git a/lib/sqlalchemy/orm/relationships.py b/lib/sqlalchemy/orm/relationships.py
@@ -116,7 +116,8 @@ def __init__(self, argument,
                  bake_queries=True,
                  _local_remote_pairs=None,
                  query_class=None,
-                 info=None):
+                 info=None,
+                 omit_join=None):
         """Provide a relationship between two mapped classes.
 
         This corresponds to a parent-child or associative table relationship.
@@ -816,6 +817,13 @@ class Parent(Base):
           the full set of related objects, to prevent modifications of the
           collection from resulting in persistence operations.
 
+        :param omit_join:
+          Allows manual control over the "selectin" automatic join
+          optimization.  Set to ``False`` to disable the "omit join" feature
+          added in SQLAlchemy 1.3.
+
+          .. versionadded:: 1.3
+
 
         """
         super(RelationshipProperty, self).__init__()
@@ -843,6 +851,7 @@ class Parent(Base):
         self.doc = doc
         self.active_history = active_history
         self.join_depth = join_depth
+        self.omit_join = omit_join
         self.local_remote_pairs = _local_remote_pairs
         self.extension = extension
         self.bake_queries = bake_queries