Added FULL OUTER JOIN through Join(full=True) #167

Stiivi · 2015-04-17T19:18:37Z

I've added support for FULL OUTER JOIN through join argument full (as suggested in this post)

FULL OUTER JOIN, besides other uses, is very useful in ETLs for doing transformations on top of table differences. Using two outer joins, as suggested in the above mentioned post, unnecessarily overcomplicates statements.

Stiivi · 2015-04-17T19:27:20Z

Short script to demonstrate it:

import sqlalchemy as sa

engine = sa.create_engine("postgresql://localhost/data")
md = sa.MetaData(engine)

left = sa.Table("left", md,
                sa.Column("id", sa.Integer))
right = sa.Table("right", md,
                sa.Column("id", sa.Integer))

md.create_all()

insert = left.insert()
for i in range(1, 6):
    engine.execute(insert.values([i]))

insert = right.insert()
for i in range(4, 10):
    engine.execute(insert.values([i]))


cond = left.c.id == right.c.id
select = sa.select([left.c.id, right.c.id],
                   from_obj=left.join(right, onclause=cond, full=True))

result = engine.execute(select)

for row in result:
    print row

Output:

(1, None)
(2, None)
(3, None)
(4, 4)
(5, 5)
(None, 6)
(None, 7)
(None, 8)
(None, 9)

zzzeek · 2015-04-17T21:13:38Z

lib/sqlalchemy/sql/compiler.py

take a look in some of the dialects, particularly mysql/base.py, which has its own routine here; need to upgrade that also.

zzzeek · 2015-04-17T21:14:35Z

it's good, will need tests in test/sql/test_compiler.py at least

seth-p · 2015-04-18T00:15:24Z

lib/sqlalchemy/sql/selectable.py

Shouldn't this say instead of LEFT OUTER JOIN rather than instead of JOIN?

With current implementation the full precedes isouter therefore if full == True then there will be FULL OUTER JOIN instead of any other kind of join, regardless of what the value of isouter is. The assumption here is: "full join" implies "outer join".

It would be more appropriate to have it a tri-state argument, however that would break backward compatibility. Another maybe correct way would be to set isouter to True everytime full is set to True, however that would unnecessarily increase complexity.

Right. My comment isn't about the logic, just the documentation of the full parameter. All I mean to say is that if full == False here, it will render LEFT OUTER JOIN (rather than just JOIN, as implied by the comment), since it's constructing a Join with isouter=True. So I would think the parameter documentation should be:

:param full: if True, render a FULL OUTER JOIN, instead of LEFT OUTER JOIN.

Makes sense, updated the docs.

minor documentation improvement; pass keyword argument properly

zzzeek · 2015-05-01T21:53:32Z

just a reminder that this still needs some test coverage somewhere.

Stiivi · 2015-05-21T21:42:27Z

Which critical tests are missing for this pull requests to be merged?

zzzeek · 2015-05-21T22:05:25Z

our core tests for "compiler" functions are unfortunately still pretty crufty in their format, nevertheless having something in test/sql/test_compiler.py would be the beginning of it. There's a ball-of-mud method right now called test_joins(), I'd leave that but for the moment probably add a test_full_outer_join() method below. Then there's some custom stuff for MySQL here, so that indicates some tests in test/dialect/mysql/test_compiler.py as well (also an older crufty suite at the moment), I'd build into the suite called SQLTest for now.

NEVER_SET symbol could easily leak into a lazyload query, subsequent to the flush of a pending object. This would occur typically for a many-to-one relationship that does not use a simple "get" strategy. The good news is that the fix improves efficiency vs. 0.9, because we can now skip the SELECT statement entirely when we detect NEVER_SET symbols present in the parameters; prior to :ticket:`3061`, we couldn't discern if the None here were set or not. fixes #3368

assignments that we would see from #3060; fixes #3369

to transient objects with attributes unset would leak NEVER_SET, and negated_contains_or_equals would do so for any transient object as the comparison used only the committed value. Repaired the NEVER_SET cases, fixes #3371, and also made negated_contains_or_equals() use state_attr_by_column() just like a non-negated comparison, fixes #3374

assign the proper result type of Boolean to the result mapping, and instead would leak column types from within the query into the result map. This issue exists in 0.9 and earlier as well, however has less of an impact in those versions. In 1.0, due to #918 this becomes a regression in that we now rely upon the result mapping to be very accurate, else we can assign result-type processors to the wrong column. In all versions, this issue also has the effect that a simple EXISTS will not apply the Boolean type handler, leading to simple 1/0 values for backends without native boolean instead of True/False. The fix includes that an EXISTS columns argument will be anon-labeled like other column expressions; a similar fix is implemented for pure-boolean expressions like ``not_(True())``. fixes #3372

:paramref:`.Pool.reset_on_return` parameter as a synonym for ``None``, so that string values can be used for all settings, allowing .ini file utilities like :func:`.engine_from_config` to be usable without issue. fixes #3375

applying any topological sort to tables on SQLite. See the changelog for details, but we now continue to sort tables for SQLite on DROP, prohibit the sort from considering alter, and only warn if we encounter an unresolvable cycle, in which case, then we forego the ordering. use_alter as always is used to break such a cycle. fixes #3378

Fix TypeError: Boolean value of this clause is not defined

with Firebird, so that the values are again rendered inline when this is selected. Related to 🎫`3034`. fixes #3381

- repair category for EXISTS issue

This makes unique_list approx 2x faster in my (simple) tests

and ``__declare_last__`` accessors where these would no longer be called on the superclass of the declarative base. fixes #3383

without primary keys to be reflected as well as ensured that a SQL statement involved in foreign key detection is pre-fetched up front to avoid driver issues upon nested queries. Fixes here courtesy Eugene Zapolsky; note that we cannot currently test Sybase to locally verify these changes. fixes #3508 fixes #3509

to function for a many-to-one relationship. The loader used an API to place "None" into the dictionary which no longer actually writes a value; this is a side effect of 🎫`3061`. - remove InstanceState._initialize() totally, it's used nowhere else and no longer does what it says it does - fill in fowards-port version ids throughout the changes for 1.0.9

such as :meth:`.Query.union` now handle the case where the embedded SELECT statements need to be parenthesized due to the fact that they include LIMIT, OFFSET and/or ORDER BY. These queries **do not work on SQLite**, and will fail on that backend as they did before, but should now work on all other backends. fixes #2528

- clean up other things we aren't using anymore

By telling wheel that we have extension modules, even though we have none, wheel will create a Wheel which is platform and interpreter specific. This will ensure that the pure Python wheels on PyPy do not trigger installs on CPython without the C speedups.

- ensure that kwargs can be modified in-place within InstanceEvents.init and that these take effect for the __init__ method. - improve documentation for these and related events, including that kwargs can be modified in-place.

Would be nice to fix Oracle here

- The "hashable" flag on special datatypes such as :class:`.postgresql.ARRAY`, :class:`.postgresql.JSON` and :class:`.postgresql.HSTORE` is now set to False, which allows these types to be fetchable in ORM queries that include entities within the row. fixes #3499 - The Postgresql :class:`.postgresql.ARRAY` type now supports multidimensional indexed access, e.g. expressions such as ``somecol[5][6]`` without any need for explicit casts or type coercions, provided that the :paramref:`.postgresql.ARRAY.dimensions` parameter is set to the desired number of dimensions. fixes #3487 - The return type for the :class:`.postgresql.JSON` and :class:`.postgresql.JSONB` when using indexed access has been fixed to work like Postgresql itself, and returns an expression that itself is of type :class:`.postgresql.JSON` or :class:`.postgresql.JSONB`. Previously, the accessor would return :class:`.NullType` which disallowed subsequent JSON-like operators to be used. part of fixes #3503 - The :class:`.postgresql.JSON`, :class:`.postgresql.JSONB` and :class:`.postgresql.HSTORE` datatypes now allow full control over the return type from an indexed textual access operation, either ``column[someindex].astext`` for a JSON type or ``column[someindex]`` for an HSTORE type, via the :paramref:`.postgresql.JSON.astext_type` and :paramref:`.postgresql.HSTORE.text_type` parameters. also part of fixes #3503 - The :attr:`.postgresql.JSON.Comparator.astext` modifier no longer calls upon :meth:`.ColumnElement.cast` implicitly, as PG's JSON/JSONB types allow cross-casting between each other as well. Code that makes use of :meth:`.ColumnElement.cast` on JSON indexed access, e.g. ``col[someindex].cast(Integer)``, will need to be changed to call :attr:`.postgresql.JSON.Comparator.astext` explicitly. This is part of the refactor in references #3503 for consistency in operator use.

- Fixes to the ORM and to the postgresql JSON type regarding the ``None`` constant in conjunction with the Postgresql :class:`.JSON` type. When the :paramref:`.JSON.none_as_null` flag is left at its default value of ``False``, the ORM will now correctly insert the Json "'null'" string into the column whenever the value on the ORM object is set to the value ``None`` or when the value ``None`` is used with :meth:`.Session.bulk_insert_mappings`, **including** if the column has a default or server default on it. This makes use of a new type-level flag "evaluates_none" which is implemented by the JSON type based on the none_as_null flag. fixes #3514 - Added a new constant :attr:`.postgresql.JSON.NULL`, indicating that the JSON NULL value should be used for a value regardless of other settings. part of fixes #3514

"super" instead of hardcoding to "self.type" for the default return value, the base Comparator was returning other_comparator.type. It's not clear what the rationale for this was, though in theory the base Comparator should possibly even throw an exception if the two types aren't the same (or of the same affinity?) . - mysql.SET was broken on this because the bitwise version adds "0" to the value to force an integer within column_expression, we are doing type_coerces here now in any case so that there is no type ambiguity for this operation

- new test for json col['x']['y']['z'] seems to fail pre PG 9.4, fails on comparisons for non-compatible data instead of not matching - no need to call SpecPredicate(db) directly in exclusion functions, by using Predicate.as_predicate() the spec strings can have version comparisons

refer mostly to the DDL object; this system is primarily useful in that case, and not for built-in objects. Reference that the built-in case is not really viable right now. References #3442.

or mapped instances into contexts where they are interpreted as SQL bound parameters; a new exception is raised for this. fixes #3321

by mock and other __getattr__ impostors

could encounter empty history, and where a column keyed to an alternate attribute name would fail to track properly. Fixes courtesy Alex Fraser.

mapper_configured(), after_configured(), and before_configured().

Stiivi · 2015-08-26T06:59:01Z

Added the compiler test, but can't add the MySQL test, since I don't have MySQL instance running.

zzzeek · 2015-08-26T13:43:10Z

OK github is showing hundreds of subsequent commits, maybe if you rebase this it will clean up b.c. i can't even view the diff online at the moment.

mysql-specific tests can run without mysql installed in test/dialect/mysql/test_compiler.py.

Stiivi · 2015-08-26T16:19:29Z

Re commits: This happened after rebase with master. Anyone here with better experience with Git to tell what might have gone wrong?

If this can't be undone, I'll create a new branch/PR.

Stiivi · 2015-10-24T20:36:39Z

Replaced by cleaning-up the rebase mess in #209.

Stiivi added 2 commits April 17, 2015 15:15

Added FULL OUTER JOIN through Join(full=True)

78789eb

Added full= to outerjoin()

32d08ff

zzzeek reviewed Apr 17, 2015
View reviewed changes

seth-p reviewed Apr 18, 2015
View reviewed changes

Update FULL OUTER JOIN in the MySQL dialect

948fe4d

minor documentation improvement; pass keyword argument properly

zzzeek and others added 19 commits August 25, 2015 23:55

1.0.1

27bf39b

- add documentation describing the behavioral change in relationship

7b52e60

assignments that we would see from #3060; fixes #3369

- add migration note for #3084

163381d

- Added the string value "none" to those accepted by the

6b64e83

:paramref:`.Pool.reset_on_return` parameter as a synonym for ``None``, so that string values can be used for all settings, allowing .ini file utilities like :func:`.engine_from_config` to be usable without issue. fixes #3375

- add a warning to SingletonThreadPool that it isn't for general use

23551ae

- typo

f36fd0f

- correct migration note for #3084; doesn't apply to create/drop

a75eeaf

Update base.py

4f35766

Fix TypeError: Boolean value of this clause is not defined

changelog for #3380, fixes #3380

dcf73b6

- Fixed support for "literal_binds" mode when using limit/offset

eb865e3

with Firebird, so that the values are again rendered inline when this is selected. Related to 🎫`3034`. fixes #3381

- change/migration edits

df6e4bf

- update date

d3a0258

- repair category for EXISTS issue

1.0.1

9dcbd89

micro-optimize unique_list

406b57d

This makes unique_list approx 2x faster in my (simple) tests

- Fixed regression regarding the declarative __declare_first__

312e013

and ``__declare_last__`` accessors where these would no longer be called on the superclass of the declarative base. fixes #3383

halfcrazy and others added 23 commits August 25, 2015 23:56

fix typo int->into

bf63607

- pep8 test/orm/test_mapper

d187407

- we can again use setuptools.find_packages since we require setuptools

2048b62

- clean up other things we aren't using anymore

- make these failure cases more specific to catch Oracle.

43aeedb

Would be nice to fix Oracle here

- correct this, need brackets

868afaa

- document workaround type for ARRAY of ENUM, fixes #3467

473ebd8

- forgot to implement py3k logic for JSON.NULL

13c4583

- some cleanups in compiler.py

c3dda37

- rework the "controlling DDL sequences" documentation to

6562087

refer mostly to the DDL object; this system is primarily useful in that case, and not for built-in objects. Reference that the built-in case is not really viable right now. References #3442.

- Added new checks for the common error case of passing mapped classes

cef0470

or mapped instances into contexts where they are interpreted as SQL bound parameters; a new exception is raised for this. fixes #3321

- repair the inspection hook in sqltypes to not be fooled

f6f83c1

by mock and other __getattr__ impostors

- Fixed two issues in the "history_meta" example where history tracking

32d9466

could encounter empty history, and where a column keyed to an alternate attribute name would fail to track properly. Fixes courtesy Alex Fraser.

- re-document and fully cross link all of configure_mappers(),

7e06585

mapper_configured(), after_configured(), and before_configured().

added sql compiler test for FULL OUTER JOIN

72478db

Stiivi mentioned this pull request Oct 24, 2015

Added FULL OUTER JOIN through Join(full=True) #209

Closed

Stiivi closed this Oct 24, 2015

sqlalchemy-bot mentioned this pull request Nov 27, 2018

Full outer join support as a sql expression sqlalchemy/sqlalchemy#1957

Closed

Uh oh!

Added FULL OUTER JOIN through Join(full=True) #167

Added FULL OUTER JOIN through Join(full=True) #167

Uh oh!

Conversation

Stiivi commented Apr 17, 2015

Uh oh!

Stiivi commented Apr 17, 2015

Uh oh!

zzzeek Apr 17, 2015

Choose a reason for hiding this comment

Uh oh!

Stiivi May 1, 2015

Choose a reason for hiding this comment

Uh oh!

zzzeek commented Apr 17, 2015

Uh oh!

seth-p Apr 18, 2015

Choose a reason for hiding this comment

Uh oh!

Stiivi Apr 26, 2015

Choose a reason for hiding this comment

Uh oh!

seth-p Apr 27, 2015

Choose a reason for hiding this comment

Uh oh!

Stiivi May 1, 2015

Choose a reason for hiding this comment

Uh oh!

zzzeek commented May 1, 2015

Uh oh!

Stiivi commented May 21, 2015

Uh oh!

zzzeek commented May 21, 2015

Uh oh!

Stiivi commented Aug 26, 2015

Uh oh!

zzzeek commented Aug 26, 2015

Uh oh!

Stiivi commented Aug 26, 2015

Uh oh!

Stiivi commented Oct 24, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

19 participants