Skip to content

Commit

Permalink
selectinload omit join
Browse files Browse the repository at this point in the history
The "selectin" loader strategy now omits the JOIN in the case of a
simple one-to-many load, where it instead relies upon the foreign key
columns of the related table in order to match up to primary keys in
the parent table.   This optimization can be disabled by setting
the :paramref:`.relationship.omit_join` flag to False.
Many thanks to Jayson Reis for the efforts on this.

As part of this change, horizontal shard no longer relies upon
the _mapper_zero() method to get the query-bound mapper, instead
using the more generalized _bind_mapper() (which will use mapper_zero
if no explicit FROM is present).  A short check for the particular
recursive condition is added to BundleEntity and it no longer assigns
itself as the "namespace" to its ColumnEntity objects which creates
a reference cycle.

Co-authored-by: Mike Bayer <mike_mp@zzzcomputing.com>
Fixes: #4340
Change-Id: I649587e1c07b684ecd63f7d10054cd165891baf4
Pull-request: https://bitbucket.org/zzzeek/sqlalchemy/pull-requests/7
  • Loading branch information
jaysonsantos and zzzeek committed Oct 10, 2018
1 parent bb65193 commit 21fbb5e
Show file tree
Hide file tree
Showing 11 changed files with 322 additions and 138 deletions.
64 changes: 64 additions & 0 deletions doc/build/changelog/migration_13.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,70 @@ potentially backwards-incompatible changes in behavior.
New Features and Improvements - ORM
===================================

.. _change_4340:

selectin loading no longer uses JOIN for simple one-to-many
------------------------------------------------------------

The "selectin" loading feature added in 1.2 introduced an extremely
performant new way to eagerly load collections, in many cases much faster
than that of "subquery" eager loading, as it does not rely upon restating
the original SELECT query and instead uses a simple IN clause. However,
the "selectin" load still relied upon rendering a JOIN between the
parent and related tables, since it needs the parent primary key values
in the row in order to match rows up. In 1.3, a new optimization
is added which will omit this JOIN in the most common case of a simple
one-to-many load, where the related row already contains the primary key
of the parent row expressed in its foreign key columns. This again provides
for a dramatic performance improvement as the ORM now can load large numbers
of collections all in one query without using JOIN or subqueries at all.

Given a mapping::

class A(Base):
__tablename__ = 'a'

id = Column(Integer, primary_key=True)
bs = relationship("B", lazy="selectin")


class B(Base):
__tablename__ = 'b'
id = Column(Integer, primary_key=True)
a_id = Column(ForeignKey("a.id"))

In the 1.2 version of "selectin" loading, a load of A to B looks like:

.. sourcecode:: sql

SELECT a.id AS a_id FROM a
SELECT a_1.id AS a_1_id, b.id AS b_id, b.a_id AS b_a_id
FROM a AS a_1 JOIN b ON a_1.id = b.a_id
WHERE a_1.id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ORDER BY a_1.id
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

With the new behavior, the load looks like:

.. sourcecode:: sql


SELECT a.id AS a_id FROM a
SELECT b.a_id AS b_a_id, b.id AS b_id FROM b
WHERE b.a_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) ORDER BY b.a_id
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

The behavior is being released as automatic, using a similar heuristic that
lazy loading uses in order to determine if related entities can be fetched
directly from the identity map. However, as with most querying features,
the feature's implementation became more complex as a result of advanced
scenarios regarding polymorphic loading. If problems are encountered,
users should report a bug, however the change also incldues a flag
:paramref:`.relationship.omit_join` which can be set to False on the
:func:`.relationship` to disable the optimization.


:ticket:`4340`

.. _change_4257:

info dictionary added to InstanceState
Expand Down
14 changes: 14 additions & 0 deletions doc/build/changelog/unreleased_13/4340.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
.. change::
:tags: feature, orm
:tickets: 4340

The "selectin" loader strategy now omits the JOIN in the case of a simple
one-to-many load, where it instead relies loads only from the related
table, relying upon the foreign key columns of the related table in order
to match up to primary keys in the parent table. This optimization can be
disabled by setting the :paramref:`.relationship.omit_join` flag to False.
Many thanks to Jayson Reis for the efforts on this.

.. seealso::

:ref:`change_4340`
51 changes: 44 additions & 7 deletions doc/build/orm/loading_relationships.rst
Original file line number Diff line number Diff line change
Expand Up @@ -692,6 +692,13 @@ eager loading, is compatible with batching of results using
:meth:`.Query.yield_per`, provided the database driver supports simultaneous
cursors.

Overall, especially as of the 1.3 series of SQLAlchemy, selectin loading
is the most simple and efficient way to eagerly load collections of objects
in most cases. The only scenario in which selectin eager loading is not feasible
is when the model is using composite primary keys, and the backend database
does not support tuples with IN, which includes SQLite, Oracle and
SQL Server.

.. versionadded:: 1.2

"Select IN" eager loading is provided using the ``"selectin"`` argument
Expand All @@ -714,20 +721,49 @@ in order to load related associations:
WHERE users.name = ? OR users.name = ?
('jack', 'ed')
SELECT
users_1.id AS users_1_id,
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id
FROM users AS users_1
JOIN addresses ON users_1.id = addresses.user_id
WHERE users_1.id IN (?, ?)
ORDER BY users_1.id, addresses.id
FROM addresses
WHERE addresses.user_id IN (?, ?)
ORDER BY addresses.user_id, addresses.id
(5, 7)

Above, the second SELECT refers to ``users_1.id IN (5, 7)``, where the
Above, the second SELECT refers to ``addresses.user_id IN (5, 7)``, where the
"5" and "7" are the primary key values for the previous two ``User``
objects loaded; after a batch of objects are completely loaded, their primary
key values are injected into the ``IN`` clause for the second SELECT.
Because the relatonship between ``User`` and ``Address`` provides that the
primary key values for ``User`` can be derived from ``Address.user_id``, the
statement has no joins or subqueries at all.

.. versionchanged:: 1.3 selectin loading can omit the JOIN for a simple
one-to-many collection.

In the case where the primary key of the parent object isn't present in
the related row, "selectin" loading will also JOIN to the parent table so that
the parent primary key values are present:

.. sourcecode:: python+sql

>>> session.query(Address).\
... options(selectinload('user')).all()
{opensql}SELECT
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id
FROM addresses
SELECT
addresses_1.id AS addresses_1_id,
users.id AS users_id,
users.name AS users_name,
users.fullname AS users_fullname,
users.password AS users_password
FROM addresses AS addresses_1
JOIN users ON users.id = addresses_1.user_id
WHERE addresses_1.id IN (?, ?)
ORDER BY addresses_1.id
(1, 2)

"Select IN" loading is the newest form of eager loading added to SQLAlchemy
as of the 1.2 series. Things to know about this kind of loading include:
Expand All @@ -746,7 +782,8 @@ as of the 1.2 series. Things to know about this kind of loading include:
* "selectin" loading, unlike joined or subquery eager loading, always emits
its SELECT in terms of the immediate parent objects just loaded, and
not the original type of object at the top of the chain. So if eager loading
many levels deep, "selectin" loading still uses exactly one JOIN in the statement.
many levels deep, "selectin" loading still uses no more than one JOIN,
and usually no JOINs, in the statement. In comparison,
joined and subquery eager loading always refer to multiple JOINs up to
the original parent.

Expand Down
46 changes: 20 additions & 26 deletions doc/build/orm/tutorial.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1643,23 +1643,22 @@ which involves custom criterion. All three are usually invoked via functions k
as :term:`query options` which give additional instructions to the :class:`.Query` on how
we would like various attributes to be loaded, via the :meth:`.Query.options` method.

Subquery Load
Selectin Load
-------------

In this case we'd like to indicate that ``User.addresses`` should load eagerly.
A good choice for loading a set of objects as well as their related collections
is the :func:`.orm.subqueryload` option, which emits a second SELECT statement
is the :func:`.orm.selectinload` option, which emits a second SELECT statement
that fully loads the collections associated with the results just loaded.
The name "subquery" originates from the fact that the SELECT statement
constructed directly via the :class:`.Query` is re-used, embedded as a subquery
into a SELECT against the related table. This is a little elaborate but
very easy to use:
The name "selectin" originates from the fact that the SELECT statement
uses an IN clause in order to locate related rows for multiple objects
at once:

.. sourcecode:: python+sql

>>> from sqlalchemy.orm import subqueryload
>>> from sqlalchemy.orm import selectinload
{sql}>>> jack = session.query(User).\
... options(subqueryload(User.addresses)).\
... options(selectinload(User.addresses)).\
... filter_by(name='jack').one()
SELECT users.id AS users_id,
users.name AS users_name,
Expand All @@ -1668,27 +1667,19 @@ very easy to use:
FROM users
WHERE users.name = ?
('jack',)
SELECT addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address,
addresses.user_id AS addresses_user_id,
anon_1.users_id AS anon_1_users_id
FROM (SELECT users.id AS users_id
FROM users WHERE users.name = ?) AS anon_1
JOIN addresses ON anon_1.users_id = addresses.user_id
ORDER BY anon_1.users_id, addresses.id
('jack',)
SELECT addresses.user_id AS addresses_user_id,
addresses.id AS addresses_id,
addresses.email_address AS addresses_email_address
FROM addresses
WHERE addresses.user_id IN (?)
ORDER BY addresses.user_id, addresses.id
(5,)
{stop}>>> jack
<User(name='jack', fullname='Jack Bean', password='gjffdd')>

>>> jack.addresses
[<Address(email_address='jack@google.com')>, <Address(email_address='j25@yahoo.com')>]

.. note::

:func:`.subqueryload` when used in conjunction with limiting such as
:meth:`.Query.first`, :meth:`.Query.limit` or :meth:`.Query.offset`
should also include :meth:`.Query.order_by` on a unique column in order to
ensure correct results. See :ref:`subqueryload_ordering`.

Joined Load
-----------
Expand Down Expand Up @@ -1731,11 +1722,14 @@ one instance of ``User`` back. This is because :class:`.Query` applies a "uniqu
strategy, based on object identity, to the returned entities. This is specifically
so that joined eager loading can be applied without affecting the query results.

While :func:`.joinedload` has been around for a long time, :func:`.subqueryload`
is a newer form of eager loading. :func:`.subqueryload` tends to be more appropriate
While :func:`.joinedload` has been around for a long time, :func:`.selectinload`
is a newer form of eager loading. :func:`.selectinload` tends to be more appropriate
for loading related collections while :func:`.joinedload` tends to be better suited
for many-to-one relationships, due to the fact that only one row is loaded
for both the lead and the related object.
for both the lead and the related object. Another form of loading,
:func:`.subqueryload`, also exists, which can be used in place of
:func:`.selectinload` when making use of composite primary keys on certain
backends.

.. topic:: ``joinedload()`` is not a replacement for ``join()``

Expand Down
2 changes: 1 addition & 1 deletion lib/sqlalchemy/ext/horizontal_shard.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ def _execute_and_instances(self, context):
def iter_for_shard(shard_id):
context.attributes['shard_id'] = context.identity_token = shard_id
result = self._connection_from_session(
mapper=self._mapper_zero(),
mapper=self._bind_mapper(),
shard_id=shard_id).execute(
context.statement,
self._params)
Expand Down
8 changes: 6 additions & 2 deletions lib/sqlalchemy/orm/query.py
Original file line number Diff line number Diff line change
Expand Up @@ -4020,13 +4020,17 @@ def __init__(self, query, bundle, setup_entities=True):
if isinstance(expr, Bundle):
_BundleEntity(self, expr)
else:
_ColumnEntity(self, expr, namespace=self)
_ColumnEntity(self, expr)

self.supports_single_entity = self.bundle.single_entity

@property
def mapper(self):
return self.entity_zero.mapper
ezero = self.entity_zero
if ezero is not None:
return ezero.mapper
else:
return None

@property
def entities(self):
Expand Down
11 changes: 10 additions & 1 deletion lib/sqlalchemy/orm/relationships.py
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,8 @@ def __init__(self, argument,
bake_queries=True,
_local_remote_pairs=None,
query_class=None,
info=None):
info=None,
omit_join=None):
"""Provide a relationship between two mapped classes.
This corresponds to a parent-child or associative table relationship.
Expand Down Expand Up @@ -816,6 +817,13 @@ class Parent(Base):
the full set of related objects, to prevent modifications of the
collection from resulting in persistence operations.
:param omit_join:
Allows manual control over the "selectin" automatic join
optimization. Set to ``False`` to disable the "omit join" feature
added in SQLAlchemy 1.3.
.. versionadded:: 1.3
"""
super(RelationshipProperty, self).__init__()
Expand Down Expand Up @@ -843,6 +851,7 @@ class Parent(Base):
self.doc = doc
self.active_history = active_history
self.join_depth = join_depth
self.omit_join = omit_join
self.local_remote_pairs = _local_remote_pairs
self.extension = extension
self.bake_queries = bake_queries
Expand Down

0 comments on commit 21fbb5e

Please sign in to comment.