-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joinable subselects #408
Joinable subselects #408
Conversation
@@ -618,49 +624,60 @@ def _select_wcols(self, query, fields, left=False, join=False, | |||
#: build joins (inner, left outer) and table names | |||
if join: | |||
( | |||
# FIXME? ijoin_tables is never used | |||
ijoin_tables, ijoin_on, itables_to_merge, ijoin_on_tables, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible bug: ijoin_tables
is never used.
boys I so don't like _select() not returning a string. People may have been grow fond of that. On another entire matter, I don't feel writing that much of code over a raw t-sql when in need of such complications...the proposed piece of pydal code to write is maybe longer than the query itself, and it's not readable at all. |
@niphlod on the other hand I find the idea quite interesting, but I'm thinking about it in the matter of how pydal builds sql strings. I think we should really move the discussion on a different "floor" here, but is definitely another story from this PR and the idea behind it. @mdipierro what do you think? |
_select MUST return a string. it is in the specs. select() makes a query and _select() gives me the corresponding SQL. Same for update and _update, insert and _insert, count and _count. That is also needed for subselects like db(db.table.field.belongs(db(subquery)._select(db.table.field)).select() We should not change it. It is ok to return an object that extends str. |
If I understand correctly how Python strings work, making the |
Yes. I would like that better. No need to break the current functionality since we can have a nested_select that is more clear anyway. |
Current coverage is 66.38% (diff: 85.03%)@@ master #408 diff @@
==========================================
Files 69 69
Lines 8908 9102 +194
Methods 0 0
Messages 0 0
Branches 1836 1987 +151
==========================================
+ Hits 5924 6042 +118
- Misses 2482 2548 +66
- Partials 502 512 +10
|
It appears that nobody actually tested Rows.render(). Unit tests call it but don't check the output. If you called it as |
@mdipierro I'm for changing the behaviour: I think it's quite useless to stay on strings if we want to alter queries on the go, because it's just loosing time in building a string that will be rebuilt again and again. Having a @nextghost maybe before you continue with subselects we can design something more generic for the way strings are built in pyDAL, so that you can benefit from the new structure to implement subselects? @niphlod opinions? |
GAE adapter should be working again, at least according to unit tests. IMAP, Mongo and CouchDB adapters don't seem to support joins anyway so I'll skip refactoring them. However, I highly recommend refactoring the rest of Also, @gi0baro I'm almost done with adapter refactoring so there's no point in waiting for another big redesign. Just 2 or 3 more commits and I can wrap it up by writing unit tests. But here are some tips you might find useful:
|
@gi0baro : on top of the notes on #411, I don't think that caching the generated query will solve THAT amount of performance to be noticeable. If we want to really save time, it should be saved in avoiding expanding arguments by ourselves, as the 99% of the drivers are mature enough to do it by themselves (and query plans for prepared statements are cached on the backend, that's one real speedup). It'll eventually close also the requirement for using binaries instead of base64 encoding them (and all the memory issues in heavy-processing pieces of code). |
Defaults to True. When set to False, the subquery will be self-contained, meaning it will never reference any tables from the parent scope.
what's the status here? |
@mdipierro : everything scheduled for the next or the following release (meaning hopefully for december) |
@mdipierro I still haven't had time to finish the unit tests. If things go well, I might get it done next week. |
Aaaand, done. If you don't have any further comments or suggestions for this pull request, you can merge it. |
@nextghost @mdipierro will review this carefully in the next days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made few questions and a suggestion for a change. I will also perform a performance test before the merge. Thank you for the hard work @nextghost
colnames = list(map(self._colexpand, fields)) | ||
sql_fields = ', '.join(map(self._geoexpand, fields)) | ||
for item in outer_scoped: | ||
# FIXME: check for name conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost what about this? Should be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing it would break backward compatibility with third-party code. outer_scoped
is currently a list of internal table names. It would have to be changed into a dict of table objects, otherwise there is no way to check for conflicts (same name referencing different objects).
tokens = [table_alias(cross_joins[0])] | ||
tokens += [self.dialect.cross_join(table_alias(t), query_env) | ||
for t in cross_joins[1:]] | ||
# FIXME: WTF? This is not correct syntax at least on PostgreSQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost can you explain this better? Which is the correct syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
join_tables
gets expanded below the comment into something like LEFT JOIN tab1, tab2, tab3, ...
. MySQL and PostgreSQL both raise syntax error on that but SQLite seems to accept it. At the very least, the expansion should be handled by the dialect, not SQLAdapter._select_wcols()
.
The ON
clause is mandatory for each table in LEFT JOIN
in MySQL and PostgreSQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost ok, will inspect that apart from this PR
tokens += [self.dialect.cross_join(table_alias(t), query_env) | ||
for t in tables_not_in_joinon[1:]] | ||
tokens += [self.dialect.join(t, query_env) for t in ijoin_on] | ||
# FIXME: WTF? This is not correct syntax at least on PostgreSQL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost same as the other comment
colnames, sql = self._colnames_cache, self._sql_cache | ||
if with_alias and self._tablename is not None: | ||
alias = self._db._adapter.dialect.quote(self._tablename) | ||
if 'Oracle' in str(type(self._db._adapter)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost you should use dialect's _as
method instead of checking the adapter type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit was copied from Table.__str__()
. Should I fix both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nextghost would be awesome :)
@nextghost can you also please update the description of this PR, also adding a quick example of the usage? |
@gi0baro I've fixed table/subselect alias expansion as you requested and added another minor bugfix. I've also updated PR description and added usage example. (Should I remove the old description as well?) But before you merge it, you should also take another look at the two pieces of code which I've highlighted right below the PR description. |
Thank you @nextghost! The description looks good to me right now. |
@nextghost ok, the performance seems quite the same from my benchmarks. If you don't have any other considerations, I will merge this. |
I have nothing more to add, you can merge this. |
* added codecov settings file this should silence unnecessary messages from codecov. we can then adjust settings going forward * checkpoint * possibly fixed the issue with id in import_from_csv * fixed import_from_csv_file * removed unwanted print * another import fix * more import debugging * Enabled entity quoting as default also on DAL class (web2py#365) * Enhancement/teradata lastrowid (web2py#363) * Initial change to remove ESCAPE from teradata syntax * Revert "Initial change to remove ESCAPE from teradata syntax" This reverts commit 81489e9. * Fix lastrowid error for Teradata adapter * further update to teradata adapter * Allow custom Rows classes (web2py#367) * Added Rows.append and Rows.insert (web2py#366) * [WIP] Indexes support (web2py#361) * First implementation of web2py#152 * Returning True on index creation/drop * Added where option on postgre create_index * Added tests for indexes * Added support for additional arguments to Table.create_index * Better expansion in sql instructions for indexes, better errors for indexes * Re-implemented _lastsql on DAL class (web2py#368) * Fixed web2py#319 and web2py#333 * turn off codecov/changes report hopefully :) * Some fixes on StringIO usages with py3 * Better blobs handling with MongoDB and python3 * Added missing representation of 'password' fields on NoSQL adapters * Updated mongoDB tests on py3 and blobs * Updated changelog * Releasing 16.06 * Added postgres3 adapters, put back multiple values type boolean parsing in base parser * Removed postgres2 adapter from tests matrix since implicitly tested on postgres3 * Updated dev version * Making connection and cursors fork-safe in multiprocess environments * Fixing missing inheritance of additional fields expressions in dialects * Releasing 16.06.09 * Updated dev version * Added forced parsing of date and datetime fields in SQLite parser * Added aggregation type detection test for fields * Fix web2py#376 * Fixed web2py#377 * default quote_template for CommonDialect * Enhance d13e82f * Fix web2py#380 * Releasing 16.06.20 * Updated dev version * fixes web2py#382 * Improving parsing performance (web2py#385) * Fix web2py#383 * Releasing 16.06.28 * Updated dev version * added rows.join('name',lambda ids: db.referenced.field.belongs(ids)) * remove test that depends on details of json serialization * fixed removal order of new table in test * yet simpler and smarter row.join * fixed rows.join(...) * one more improvement in join * allow many2many in row.join * ignore pool_size for sqlite * better security * enabled python3.5 on travis (web2py#393) * fix py35 Field.__hash__(), close web2py#392 (web2py#394) * added py35 to tox.ini * Avoid to run travis tests on py35 and google * fixes web2py#396 * Skip travis tests on python 3.3 * Releasing 16.07 * Updated dev version * Fix web2py#399 - postgres adapter lastrowid() to use currval(seq) instead of lastval() * reinstated self._last_insert, no need to remove working logic * defied long in _compat * merged pydal * portalocker: removed pywin32 dep, added tests (web2py#401) * Avoid upload errors on py3 if upload content is a bytes string * Compat module cleaning * Releasing 16.08 * Updated dev version * MongoDB: testcase and fix for query(field==list:reference) * Enhancement/teradata test (web2py#413) * Initial change to remove ESCAPE from teradata syntax * Revert "Initial change to remove ESCAPE from teradata syntax" This reverts commit 81489e9. * Initial teradata tests changes. More to come * first sql.py change, TestAddMethod now OK * include teradata in setup * Postgres fixes (web2py#414) * Fixed postgres2 and postgres3 adapters inheritance, updated `PostgreMeta` to choose the right class depending on uri * Fixed representation for lists on postgres2 and postgres3 adapters * Fixed typo in PostgreDialectArrays * Ensure strings in PostgreArrayRepresenter._listify_elements * New test helper class DALtest that handles connection boilerplate (web2py#415) * Enhancement/teradata test (web2py#418) * Initial change to remove ESCAPE from teradata syntax * Revert "Initial change to remove ESCAPE from teradata syntax" This reverts commit 81489e9. * TestBelongs for Teradata * More Teradata test changes * run the test. * fix for opening files in py3 (web2py#420) * fix for opening files in py3 fixes also web2py/web2py#1451 * fix for PY2 detection * Releasing 16.09 * Updated dev version * Enhancement/teradatatest (web2py#426) * Initial change to remove ESCAPE from teradata syntax * Revert "Initial change to remove ESCAPE from teradata syntax" This reverts commit 81489e9. * Skipping TestChainedJoinUnique for Teradata. * Clean connection data on thread local on DAL.close (web2py#427) * Releasing 16.11 * Updated dev version * Joinable subselects (web2py#408) * Minimal implementation of joinable subselects (work in progress) * Generate subselects in Set.nested_select() instead of BaseAdapter._select() * Add support for subselects to BaseAdapter._count() * Prevent table name collisions when applying record versioning filters * Fix common filters in select with subselects * Add support for subselect fields to Rows.render() and fix bugs * Refactor Google datastore adapter and BaseAdapter.get_table() * Check for table name collisions when building a SELECT query * When building subselect string, ignore cache if outer_scoped is not empty * Minor optimization in SQLAdapter._select_wcols() * Add parameter "correlated" to Select constructor defaults to True. When set to False, the subquery will be self-contained, meaning it will never reference any tables from the parent scope. * Implement proper expansion of correlated subqueries * Check that subquery in belongs() expression has exactly 1 column * Implement expansion of correlated subqueries in count(), update(), delete() * Update Mongo unit tests to new adapter API * Move part of Set.nested_select() to adapter code * Unit tests for the Select class * Additional unit tests for fixed bugs * Expand aliased table names using dialect methods * Fix expansion of correlated subqueries used in UPDATE/DELETE statements * Refactor table name attributes and fix related aliasing/rname bugs * Update IMAP adapter methods to new API * Add table alias support to UPDATE/DELETE in MSSQL dialect * Store values for callbacks on insert/update/delete operations (web2py#431) * Changed listify to store computations in values * Moved attempt_upload logic to callbacks * Fixed callbacks access to fields * One more check that Table.with_alias() updates all field attributes * Rename Table.sqlsafe and Table.sqlsafe_alias * Refactor field name attributes and fix related rname bugs * Fix type error in MSSQL TFK reference template * Use the correct constraint name in MSSQL TFK references * Added missing 'OpRow.get' method * Fixed after_update callbacks argument in validate_and_update * tablemap should contains all tables to avoid key errors in adapter _select_wcols * Fix field migrations and some final refactoring * Fixed wrong add operation with float types web2py#445 * Updated changelog * Releasing 17.01 * Fix DAL(None).define_table() * Fix perpetual migrations due to changes in name quoting * Fix storing Rows objects in pickle-based cache * Pass the list of field references when creating a subset of Rows object * Disable part of pickling test on MSSQL, driver problem... * Completely disable the pickling test on MSSQL (driver problem...) * Updated dev version * Correctly configure adapters that need connection for configuration * Added locking for reconnect method switch, created a mixin for adapters that need reconnection mocking * added tests for py36 * Always find the driver on adapters * Cleaner solution for web2py#446 * Updated changelog * Releasing 17.03 * added missing "oracle" in adapters.register_for() (web2py#457) Fixes web2py/web2py#1599 * field.set_attributes(..) could return the instance (web2py#458) Would let the code be more straightforward -- example, when passing the "decorated" field to the function... Now I have to "scatter" the code in two places for each field... * testing py36 on appveyor (web2py#456) * Fix get in pydal.helpers.classes Reference class calling __getattr__ … (web2py#465) * Fix get in pydal.helpers.classes Reference class calling __getattr__ with 3 variables instead of 2 as per signature * Move fix of wrong method call in Reference class from get to __getattr__ * Make DAL support jsonb for postgres (web2py#469) * Make DAL support jsonb for postgres * added parser for jsonb * experimantal code to address issue #1613 * fixed error introduced in recent refactoring by me * mysql and postgres unix_socket support * fixed _find_work_folder * support for gcloud sql v2. thanks Karoly Kantor * small typo in warnings * Releasing 17.07 * fixed issue web2py#480 * Fixes web2py/web2py#1683 (web2py#484) * allow DAL(..., adapter_args=dict(migrator=InDBMigrator)) * fixed custom migrator * Updated dev version * Cleaning up f71aade * Fix Table objects in common fields (web2py#486) This fixes the problem of putting and tables on common_fields and having lazy_define_table throw an exception because Table doesn't have a clone attribute. * fixed typo changing `oracle_fix` to `cmd_fix` in adapters/oracle.py (web2py#471) fixed error where %-formats were only applying to second part of concatenated SQL statement in dialects/oracle.py * override _select_aux_execute method in adapters/oracle.py to use oracle-specific `fetchall` (web2py#473) * fixed MSSQL TOP * possible fix for tTIZLy33Yw8 * fixed del of missing key, thanks Lazaro Aulan * Fixed st_asgeojson (web2py#489) see https://groups.google.com/forum/#!topic/web2py/EixMPrPr9SY
This changeset implements subselect objects that can be used in a join like a virtual table or passed to
Expression.belongs()
. Subselect objects can be created usingSet.nested_select()
which accepts the same arguments as regularselect()
. One additional argument,correlated
, determines whether the subselect may reference tables from parent select (defaults toTrue
which means referencing is permitted). Example usage:I'm opening this pull request early for design discussion. Unit tests for Python 2.7, 3.4 & 3.5 and SQLite, MySQL and PostgreSQL are passing on my machine but there's still work to do. Comments and suggestions are appreciated.
Design changes
DAL.__getitem__()
. They must pass Table/Select objects around.Table
:query_name()
- Full table name for query FROM clause, including possible alias.query_alias()
- The appropriate string that identifies the table throughout a query (e.g. for building fully qualified field names).Rows.__init__()
now takes additional argument - list ofField
/Expression
objects corresponding to columnsBaseAdapter._select()
now returns aSelect
object instead of stringBaseAdapter.tables()
now returns a dict mapping table names to table objects instead of name list.BaseAdapter.common_filter()
now accepts table objects as well as table names.BaseAdapter
internal methods now take or return table objects instead of table names.