Skip to content

Conversation

@Stiivi
Copy link

@Stiivi Stiivi commented Apr 17, 2015

I've added support for FULL OUTER JOIN through join argument full (as suggested in this post)

FULL OUTER JOIN, besides other uses, is very useful in ETLs for doing transformations on top of table differences. Using two outer joins, as suggested in the above mentioned post, unnecessarily overcomplicates statements.

@Stiivi
Copy link
Author

Stiivi commented Apr 17, 2015

Short script to demonstrate it:

import sqlalchemy as sa

engine = sa.create_engine("postgresql://localhost/data")
md = sa.MetaData(engine)

left = sa.Table("left", md,
                sa.Column("id", sa.Integer))
right = sa.Table("right", md,
                sa.Column("id", sa.Integer))

md.create_all()

insert = left.insert()
for i in range(1, 6):
    engine.execute(insert.values([i]))

insert = right.insert()
for i in range(4, 10):
    engine.execute(insert.values([i]))


cond = left.c.id == right.c.id
select = sa.select([left.c.id, right.c.id],
                   from_obj=left.join(right, onclause=cond, full=True))

result = engine.execute(select)

for row in result:
    print row

Output:

(1, None)
(2, None)
(3, None)
(4, 4)
(5, 5)
(None, 6)
(None, 7)
(None, 8)
(None, 9)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take a look in some of the dialects, particularly mysql/base.py, which has its own routine here; need to upgrade that also.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@zzzeek
Copy link
Owner

zzzeek commented Apr 17, 2015

it's good, will need tests in test/sql/test_compiler.py at least

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this say instead of LEFT OUTER JOIN rather than instead of JOIN?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With current implementation the full precedes isouter therefore if full == True then there will be FULL OUTER JOIN instead of any other kind of join, regardless of what the value of isouter is. The assumption here is: "full join" implies "outer join".

It would be more appropriate to have it a tri-state argument, however that would break backward compatibility. Another maybe correct way would be to set isouter to True everytime full is set to True, however that would unnecessarily increase complexity.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. My comment isn't about the logic, just the documentation of the full parameter. All I mean to say is that if full == False here, it will render LEFT OUTER JOIN (rather than just JOIN, as implied by the comment), since it's constructing a Join with isouter=True. So I would think the parameter documentation should be:

        :param full: if True, render a FULL OUTER JOIN, instead of LEFT OUTER JOIN.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, updated the docs.

minor documentation improvement; pass keyword argument properly
@zzzeek
Copy link
Owner

zzzeek commented May 1, 2015

just a reminder that this still needs some test coverage somewhere.

@Stiivi
Copy link
Author

Stiivi commented May 21, 2015

Which critical tests are missing for this pull requests to be merged?

@zzzeek
Copy link
Owner

zzzeek commented May 21, 2015

our core tests for "compiler" functions are unfortunately still pretty crufty in their format, nevertheless having something in test/sql/test_compiler.py would be the beginning of it. There's a ball-of-mud method right now called test_joins(), I'd leave that but for the moment probably add a test_full_outer_join() method below. Then there's some custom stuff for MySQL here, so that indicates some tests in test/dialect/mysql/test_compiler.py as well (also an older crufty suite at the moment), I'd build into the suite called SQLTest for now.

zzzeek and others added 19 commits August 25, 2015 23:55
NEVER_SET symbol could easily leak into a lazyload query, subsequent
to the flush of a pending object.  This would occur typically
for a many-to-one relationship that does not use a simple
"get" strategy.   The good news is that the fix improves efficiency
vs. 0.9, because we can now skip the SELECT statement entirely
when we detect NEVER_SET symbols present in the parameters; prior to
:ticket:`3061`, we couldn't discern if the None here were set or not.
fixes #3368
assignments that we would see from #3060; fixes #3369
to transient objects with attributes unset would leak NEVER_SET,
and negated_contains_or_equals would do so for any transient
object as the comparison used only the committed value.
Repaired the NEVER_SET cases, fixes #3371, and also made
negated_contains_or_equals() use state_attr_by_column() just
like a non-negated comparison, fixes #3374
assign the proper result type of Boolean to the result mapping, and
instead would leak column types from within the query into the
result map.  This issue exists in 0.9 and earlier as well, however
has less of an impact in those versions.  In 1.0, due to #918
this becomes a regression in that we now rely upon the result mapping
to be very accurate, else we can assign result-type processors to
the wrong column.   In all versions, this issue also has the effect
that a simple EXISTS will not apply the Boolean type handler, leading
to simple 1/0 values for backends without native boolean instead of
True/False.   The fix includes that an EXISTS columns argument
will be anon-labeled like other column expressions; a similar fix is
implemented for pure-boolean expressions like ``not_(True())``.
fixes #3372
:paramref:`.Pool.reset_on_return` parameter as a synonym for ``None``,
so that string values can be used for all settings, allowing
.ini file utilities like :func:`.engine_from_config` to be usable
without issue.
fixes #3375
applying any topological sort to tables on SQLite.  See the
changelog for details, but we now continue to sort
tables for SQLite on DROP, prohibit the sort from considering
alter, and only warn if we encounter an unresolvable cycle, in
which case, then we forego the ordering.  use_alter as always
is used to break such a cycle.
fixes #3378
Fix TypeError: Boolean value of this clause is not defined
with Firebird, so that the values are again rendered inline when
this is selected.  Related to 🎫`3034`.
fixes #3381
- repair category for EXISTS issue
This makes unique_list approx 2x faster in my (simple) tests
and ``__declare_last__`` accessors where these would no longer be
called on the superclass of the declarative base.
fixes #3383
halfcrazy and others added 23 commits August 25, 2015 23:56
without primary keys to be reflected as well as ensured that
a SQL statement involved in foreign key detection is pre-fetched up
front to avoid driver issues upon nested queries.  Fixes here
courtesy Eugene Zapolsky; note that we cannot currently test
Sybase to locally verify these changes.
fixes #3508  fixes #3509
to function for a many-to-one relationship.  The loader used an
API to place "None" into the dictionary which no longer actually
writes a value; this is a side effect of 🎫`3061`.
- remove InstanceState._initialize() totally, it's used nowhere
else and no longer does what it says it does
- fill in fowards-port version ids throughout the changes for 1.0.9
such as :meth:`.Query.union` now handle the case where the embedded
SELECT statements need to be parenthesized due to the fact that they
include LIMIT, OFFSET and/or ORDER BY.   These queries **do not work
on SQLite**, and will fail on that backend as they did before, but
should now work on all other backends.
fixes #2528
By telling wheel that we have extension modules, even though we
have none, wheel will create a Wheel which is platform and
interpreter specific. This will ensure that the pure Python wheels
on PyPy do not trigger installs on CPython without the C speedups.
- ensure that kwargs can be modified in-place within InstanceEvents.init
and that these take effect for the __init__ method.
- improve documentation for these and related events, including
that kwargs can be modified in-place.
- The "hashable" flag on special datatypes such as :class:`.postgresql.ARRAY`,
:class:`.postgresql.JSON` and :class:`.postgresql.HSTORE` is now
set to False, which allows these types to be fetchable in ORM
queries that include entities within the row.  fixes #3499
- The Postgresql :class:`.postgresql.ARRAY` type now supports multidimensional
indexed access, e.g. expressions such as ``somecol[5][6]`` without
any need for explicit casts or type coercions, provided
that the :paramref:`.postgresql.ARRAY.dimensions` parameter is set to the
desired number of dimensions. fixes #3487
- The return type for the :class:`.postgresql.JSON` and :class:`.postgresql.JSONB`
when using indexed access has been fixed to work like Postgresql itself,
and returns an expression that itself is of type :class:`.postgresql.JSON`
or :class:`.postgresql.JSONB`.  Previously, the accessor would return
:class:`.NullType` which disallowed subsequent JSON-like operators to be
used. part of fixes #3503
- The :class:`.postgresql.JSON`, :class:`.postgresql.JSONB` and
:class:`.postgresql.HSTORE` datatypes now allow full control over the
return type from an indexed textual access operation, either ``column[someindex].astext``
for a JSON type or ``column[someindex]`` for an HSTORE type,
via the :paramref:`.postgresql.JSON.astext_type` and
:paramref:`.postgresql.HSTORE.text_type` parameters. also part of fixes #3503
- The :attr:`.postgresql.JSON.Comparator.astext` modifier no longer
calls upon :meth:`.ColumnElement.cast` implicitly, as PG's JSON/JSONB
types allow cross-casting between each other as well.  Code that
makes use of :meth:`.ColumnElement.cast` on JSON indexed access,
e.g. ``col[someindex].cast(Integer)``, will need to be changed
to call :attr:`.postgresql.JSON.Comparator.astext` explicitly.  This is
part of the refactor in references #3503 for consistency in operator
use.
- Fixes to the ORM and to the postgresql JSON type regarding the
``None`` constant in conjunction with the Postgresql :class:`.JSON` type.  When
the :paramref:`.JSON.none_as_null` flag is left at its default
value of ``False``, the ORM will now correctly insert the Json
"'null'" string into the column whenever the value on the ORM
object is set to the value ``None`` or when the value ``None``
is used with :meth:`.Session.bulk_insert_mappings`,
**including** if the column has a default or server default on it.  This
makes use of a new type-level flag "evaluates_none" which is implemented
by the JSON type based on the none_as_null flag. fixes #3514
- Added a new constant :attr:`.postgresql.JSON.NULL`, indicating
that the JSON NULL value should be used for a value
regardless of other settings. part of fixes #3514
"super" instead of hardcoding to "self.type" for the default return
value, the base Comparator was returning other_comparator.type.   It's
not clear what the rationale for this was, though in theory the
base Comparator should possibly even throw an exception if the two
types aren't the same (or of the same affinity?) .
- mysql.SET was broken on this because the bitwise version adds "0"
to the value to force an integer within column_expression, we are doing type_coerces here
now in any case so that there is no type ambiguity for this
operation
- new test for json col['x']['y']['z'] seems to fail pre PG 9.4,
fails on comparisons for non-compatible data instead of not matching
- no need to call SpecPredicate(db) directly in exclusion functions,
by using Predicate.as_predicate() the spec strings can have version
comparisons
refer mostly to the DDL object; this system is primarily useful
in that case, and not for built-in objects.  Reference that
the built-in case is not really viable right now. References #3442.
or mapped instances into contexts where they are interpreted as
SQL bound parameters; a new exception is raised for this.
fixes #3321
by mock and other __getattr__ impostors
could encounter empty history, and where a column keyed to an alternate
attribute name would fail to track properly.  Fixes courtesy
Alex Fraser.
mapper_configured(), after_configured(), and before_configured().
@Stiivi
Copy link
Author

Stiivi commented Aug 26, 2015

Added the compiler test, but can't add the MySQL test, since I don't have MySQL instance running.

@zzzeek
Copy link
Owner

zzzeek commented Aug 26, 2015

OK github is showing hundreds of subsequent commits, maybe if you rebase this it will clean up b.c. i can't even view the diff online at the moment.

mysql-specific tests can run without mysql installed in test/dialect/mysql/test_compiler.py.

@Stiivi
Copy link
Author

Stiivi commented Aug 26, 2015

Re commits: This happened after rebase with master. Anyone here with better experience with Git to tell what might have gone wrong?

If this can't be undone, I'll create a new branch/PR.

@Stiivi
Copy link
Author

Stiivi commented Oct 24, 2015

Replaced by cleaning-up the rebase mess in #209.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.