-
-
Notifications
You must be signed in to change notification settings - Fork 875
Added FULL OUTER JOIN through Join(full=True) #167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Short script to demonstrate it: import sqlalchemy as sa
engine = sa.create_engine("postgresql://localhost/data")
md = sa.MetaData(engine)
left = sa.Table("left", md,
sa.Column("id", sa.Integer))
right = sa.Table("right", md,
sa.Column("id", sa.Integer))
md.create_all()
insert = left.insert()
for i in range(1, 6):
engine.execute(insert.values([i]))
insert = right.insert()
for i in range(4, 10):
engine.execute(insert.values([i]))
cond = left.c.id == right.c.id
select = sa.select([left.c.id, right.c.id],
from_obj=left.join(right, onclause=cond, full=True))
result = engine.execute(select)
for row in result:
print rowOutput: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take a look in some of the dialects, particularly mysql/base.py, which has its own routine here; need to upgrade that also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
it's good, will need tests in test/sql/test_compiler.py at least |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this say instead of LEFT OUTER JOIN rather than instead of JOIN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With current implementation the full precedes isouter therefore if full == True then there will be FULL OUTER JOIN instead of any other kind of join, regardless of what the value of isouter is. The assumption here is: "full join" implies "outer join".
It would be more appropriate to have it a tri-state argument, however that would break backward compatibility. Another maybe correct way would be to set isouter to True everytime full is set to True, however that would unnecessarily increase complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. My comment isn't about the logic, just the documentation of the full parameter. All I mean to say is that if full == False here, it will render LEFT OUTER JOIN (rather than just JOIN, as implied by the comment), since it's constructing a Join with isouter=True. So I would think the parameter documentation should be:
:param full: if True, render a FULL OUTER JOIN, instead of LEFT OUTER JOIN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, updated the docs.
minor documentation improvement; pass keyword argument properly
|
just a reminder that this still needs some test coverage somewhere. |
|
Which critical tests are missing for this pull requests to be merged? |
|
our core tests for "compiler" functions are unfortunately still pretty crufty in their format, nevertheless having something in test/sql/test_compiler.py would be the beginning of it. There's a ball-of-mud method right now called test_joins(), I'd leave that but for the moment probably add a test_full_outer_join() method below. Then there's some custom stuff for MySQL here, so that indicates some tests in test/dialect/mysql/test_compiler.py as well (also an older crufty suite at the moment), I'd build into the suite called SQLTest for now. |
NEVER_SET symbol could easily leak into a lazyload query, subsequent to the flush of a pending object. This would occur typically for a many-to-one relationship that does not use a simple "get" strategy. The good news is that the fix improves efficiency vs. 0.9, because we can now skip the SELECT statement entirely when we detect NEVER_SET symbols present in the parameters; prior to :ticket:`3061`, we couldn't discern if the None here were set or not. fixes #3368
assignments that we would see from #3060; fixes #3369
to transient objects with attributes unset would leak NEVER_SET, and negated_contains_or_equals would do so for any transient object as the comparison used only the committed value. Repaired the NEVER_SET cases, fixes #3371, and also made negated_contains_or_equals() use state_attr_by_column() just like a non-negated comparison, fixes #3374
assign the proper result type of Boolean to the result mapping, and instead would leak column types from within the query into the result map. This issue exists in 0.9 and earlier as well, however has less of an impact in those versions. In 1.0, due to #918 this becomes a regression in that we now rely upon the result mapping to be very accurate, else we can assign result-type processors to the wrong column. In all versions, this issue also has the effect that a simple EXISTS will not apply the Boolean type handler, leading to simple 1/0 values for backends without native boolean instead of True/False. The fix includes that an EXISTS columns argument will be anon-labeled like other column expressions; a similar fix is implemented for pure-boolean expressions like ``not_(True())``. fixes #3372
:paramref:`.Pool.reset_on_return` parameter as a synonym for ``None``, so that string values can be used for all settings, allowing .ini file utilities like :func:`.engine_from_config` to be usable without issue. fixes #3375
applying any topological sort to tables on SQLite. See the changelog for details, but we now continue to sort tables for SQLite on DROP, prohibit the sort from considering alter, and only warn if we encounter an unresolvable cycle, in which case, then we forego the ordering. use_alter as always is used to break such a cycle. fixes #3378
Fix TypeError: Boolean value of this clause is not defined
with Firebird, so that the values are again rendered inline when this is selected. Related to 🎫`3034`. fixes #3381
- repair category for EXISTS issue
This makes unique_list approx 2x faster in my (simple) tests
and ``__declare_last__`` accessors where these would no longer be called on the superclass of the declarative base. fixes #3383
without primary keys to be reflected as well as ensured that a SQL statement involved in foreign key detection is pre-fetched up front to avoid driver issues upon nested queries. Fixes here courtesy Eugene Zapolsky; note that we cannot currently test Sybase to locally verify these changes. fixes #3508 fixes #3509
to function for a many-to-one relationship. The loader used an API to place "None" into the dictionary which no longer actually writes a value; this is a side effect of 🎫`3061`. - remove InstanceState._initialize() totally, it's used nowhere else and no longer does what it says it does - fill in fowards-port version ids throughout the changes for 1.0.9
such as :meth:`.Query.union` now handle the case where the embedded SELECT statements need to be parenthesized due to the fact that they include LIMIT, OFFSET and/or ORDER BY. These queries **do not work on SQLite**, and will fail on that backend as they did before, but should now work on all other backends. fixes #2528
- clean up other things we aren't using anymore
By telling wheel that we have extension modules, even though we have none, wheel will create a Wheel which is platform and interpreter specific. This will ensure that the pure Python wheels on PyPy do not trigger installs on CPython without the C speedups.
- ensure that kwargs can be modified in-place within InstanceEvents.init and that these take effect for the __init__ method. - improve documentation for these and related events, including that kwargs can be modified in-place.
Would be nice to fix Oracle here
- The "hashable" flag on special datatypes such as :class:`.postgresql.ARRAY`, :class:`.postgresql.JSON` and :class:`.postgresql.HSTORE` is now set to False, which allows these types to be fetchable in ORM queries that include entities within the row. fixes #3499 - The Postgresql :class:`.postgresql.ARRAY` type now supports multidimensional indexed access, e.g. expressions such as ``somecol[5][6]`` without any need for explicit casts or type coercions, provided that the :paramref:`.postgresql.ARRAY.dimensions` parameter is set to the desired number of dimensions. fixes #3487 - The return type for the :class:`.postgresql.JSON` and :class:`.postgresql.JSONB` when using indexed access has been fixed to work like Postgresql itself, and returns an expression that itself is of type :class:`.postgresql.JSON` or :class:`.postgresql.JSONB`. Previously, the accessor would return :class:`.NullType` which disallowed subsequent JSON-like operators to be used. part of fixes #3503 - The :class:`.postgresql.JSON`, :class:`.postgresql.JSONB` and :class:`.postgresql.HSTORE` datatypes now allow full control over the return type from an indexed textual access operation, either ``column[someindex].astext`` for a JSON type or ``column[someindex]`` for an HSTORE type, via the :paramref:`.postgresql.JSON.astext_type` and :paramref:`.postgresql.HSTORE.text_type` parameters. also part of fixes #3503 - The :attr:`.postgresql.JSON.Comparator.astext` modifier no longer calls upon :meth:`.ColumnElement.cast` implicitly, as PG's JSON/JSONB types allow cross-casting between each other as well. Code that makes use of :meth:`.ColumnElement.cast` on JSON indexed access, e.g. ``col[someindex].cast(Integer)``, will need to be changed to call :attr:`.postgresql.JSON.Comparator.astext` explicitly. This is part of the refactor in references #3503 for consistency in operator use.
- Fixes to the ORM and to the postgresql JSON type regarding the ``None`` constant in conjunction with the Postgresql :class:`.JSON` type. When the :paramref:`.JSON.none_as_null` flag is left at its default value of ``False``, the ORM will now correctly insert the Json "'null'" string into the column whenever the value on the ORM object is set to the value ``None`` or when the value ``None`` is used with :meth:`.Session.bulk_insert_mappings`, **including** if the column has a default or server default on it. This makes use of a new type-level flag "evaluates_none" which is implemented by the JSON type based on the none_as_null flag. fixes #3514 - Added a new constant :attr:`.postgresql.JSON.NULL`, indicating that the JSON NULL value should be used for a value regardless of other settings. part of fixes #3514
"super" instead of hardcoding to "self.type" for the default return value, the base Comparator was returning other_comparator.type. It's not clear what the rationale for this was, though in theory the base Comparator should possibly even throw an exception if the two types aren't the same (or of the same affinity?) . - mysql.SET was broken on this because the bitwise version adds "0" to the value to force an integer within column_expression, we are doing type_coerces here now in any case so that there is no type ambiguity for this operation
- new test for json col['x']['y']['z'] seems to fail pre PG 9.4, fails on comparisons for non-compatible data instead of not matching - no need to call SpecPredicate(db) directly in exclusion functions, by using Predicate.as_predicate() the spec strings can have version comparisons
refer mostly to the DDL object; this system is primarily useful in that case, and not for built-in objects. Reference that the built-in case is not really viable right now. References #3442.
or mapped instances into contexts where they are interpreted as SQL bound parameters; a new exception is raised for this. fixes #3321
by mock and other __getattr__ impostors
could encounter empty history, and where a column keyed to an alternate attribute name would fail to track properly. Fixes courtesy Alex Fraser.
mapper_configured(), after_configured(), and before_configured().
|
Added the compiler test, but can't add the MySQL test, since I don't have MySQL instance running. |
|
OK github is showing hundreds of subsequent commits, maybe if you rebase this it will clean up b.c. i can't even view the diff online at the moment. mysql-specific tests can run without mysql installed in test/dialect/mysql/test_compiler.py. |
|
Re commits: This happened after rebase with master. Anyone here with better experience with Git to tell what might have gone wrong? If this can't be undone, I'll create a new branch/PR. |
|
Replaced by cleaning-up the rebase mess in #209. |
I've added support for
FULL OUTER JOINthrough join argumentfull(as suggested in this post)FULL OUTER JOIN, besides other uses, is very useful in ETLs for doing transformations on top of table differences. Using two outer joins, as suggested in the above mentioned post, unnecessarily overcomplicates statements.