Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joinable subselects #408

Merged
merged 21 commits into from
Dec 1, 2016
Merged

Joinable subselects #408

merged 21 commits into from
Dec 1, 2016

Conversation

nextghost
Copy link
Contributor

@nextghost nextghost commented Aug 23, 2016

This changeset implements subselect objects that can be used in a join like a virtual table or passed to Expression.belongs(). Subselect objects can be created using Set.nested_select() which accepts the same arguments as regular select(). One additional argument, correlated, determines whether the subselect may reference tables from parent select (defaults to True which means referencing is permitted). Example usage:

total = table1.field2.sum().with_alias('total')
sub1 = db(table1.field2 != None).nested_select(table1.field1, total, groupby=table1.field1)
sub1 = sub1.with_alias('tmp')
result1 = db(table2.field1 == sub1.field1).select(table2.ALL, sub1.total)

total = table1.field2.sum().with_alias('total')
sub2 = db(table1.field2 != None).nested_select(total, groupby=table1.field1)
result2 = db(table2.field2.belongs(sub2)).select(table2.ALL)

I'm opening this pull request early for design discussion. Unit tests for Python 2.7, 3.4 & 3.5 and SQLite, MySQL and PostgreSQL are passing on my machine but there's still work to do. Comments and suggestions are appreciated.

Design changes

  • Adapters and dialects must no longer resolve table names over and over through DAL.__getitem__(). They must pass Table/Select objects around.
  • New methods for class Table:
    • query_name() - Full table name for query FROM clause, including possible alias.
    • query_alias() - The appropriate string that identifies the table throughout a query (e.g. for building fully qualified field names).
  • Rows.__init__() now takes additional argument - list of Field/Expression objects corresponding to columns
  • BaseAdapter._select() now returns a Select object instead of string
  • BaseAdapter.tables() now returns a dict mapping table names to table objects instead of name list.
  • BaseAdapter.common_filter() now accepts table objects as well as table names.
  • many other BaseAdapter internal methods now take or return table objects instead of table names.

@nextghost nextghost mentioned this pull request Aug 23, 2016
@@ -618,49 +624,60 @@ def _select_wcols(self, query, fields, left=False, join=False,
#: build joins (inner, left outer) and table names
if join:
(
# FIXME? ijoin_tables is never used
ijoin_tables, ijoin_on, itables_to_merge, ijoin_on_tables,
Copy link
Contributor Author

@nextghost nextghost Aug 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible bug: ijoin_tables is never used.

@niphlod
Copy link
Member

niphlod commented Aug 25, 2016

boys I so don't like _select() not returning a string. People may have been grow fond of that.

On another entire matter, I don't feel writing that much of code over a raw t-sql when in need of such complications...the proposed piece of pydal code to write is maybe longer than the query itself, and it's not readable at all.
Also, writing adapters for all backends could be a nightmare. I feel that a relevant part of the problems you can solve with correlated subselects would be better served by window functions.

@gi0baro
Copy link
Member

gi0baro commented Aug 26, 2016

@niphlod on the other hand I find the idea quite interesting, but I'm thinking about it in the matter of how pydal builds sql strings.
I think would be quite powerful to change the way the dialects builds sql strings, 'cause using custom classes and objects (that could implement the __str__ method and behave like strings with __repr__ so users wouldn't notice the difference) would allow us to really customize the sql generation. Moreover this opens the chance to implement caching on sql generation, and to simplify the flow of quoting-unquoting strings when talking to the driver (= performance improvements).

I think we should really move the discussion on a different "floor" here, but is definitely another story from this PR and the idea behind it. @mdipierro what do you think?

@gi0baro gi0baro changed the title Joinable subselects [WIP] Joinable subselects Aug 26, 2016
@mdipierro
Copy link
Contributor

_select MUST return a string. it is in the specs.

select() makes a query and _select() gives me the corresponding SQL. Same for update and _update, insert and _insert, count and _count.

That is also needed for subselects like db(db.table.field.belongs(db(subquery)._select(db.table.field)).select()

We should not change it. It is ok to return an object that extends str.

@nextghost
Copy link
Contributor Author

If I understand correctly how Python strings work, making the Select class an extension of str would require building the full SQL subquery string immediately in Select.__new__() and never changing it afterwards. I don't think that's a good idea and preparing subselects that reference tables from parent query would be more complex than it needs to be. Expanding a Select object in the belongs() expression is already implemented in SQLDialect.belongs(). Only Mongo and Google adapters reimplement that and they don't seem to support nested selects there anyway. But if you insist on _select() returning a string, I'll just revert the one-line change there and move it to nested_select() instead.

@mdipierro
Copy link
Contributor

Yes. I would like that better. No need to break the current functionality since we can have a nested_select that is more clear anyway.

@codecov-io
Copy link

codecov-io commented Aug 31, 2016

Current coverage is 66.38% (diff: 85.03%)

Merging #408 into master will decrease coverage by 0.12%

@@             master       #408   diff @@
==========================================
  Files            69         69          
  Lines          8908       9102   +194   
  Methods           0          0          
  Messages          0          0          
  Branches       1836       1987   +151   
==========================================
+ Hits           5924       6042   +118   
- Misses         2482       2548    +66   
- Partials        502        512    +10   

Powered by Codecov. Last update a77e208...883b97e

@nextghost
Copy link
Contributor Author

It appears that nobody actually tested Rows.render(). Unit tests call it but don't check the output. If you called it as result.render(fields=[tab.f1, tab.f2, tab.f3]), those three fields would be rendered three times each. Fixed in the refactored version.

@gi0baro
Copy link
Member

gi0baro commented Sep 1, 2016

@mdipierro I'm for changing the behaviour: I think it's quite useless to stay on strings if we want to alter queries on the go, because it's just loosing time in building a string that will be rebuilt again and again. Having a SQLString object that behaves like a string, and that can be cached and takes care of quoting sounds quite promising regarding performance and customization.

@nextghost maybe before you continue with subselects we can design something more generic for the way strings are built in pyDAL, so that you can benefit from the new structure to implement subselects?

@niphlod opinions?

@nextghost
Copy link
Contributor Author

GAE adapter should be working again, at least according to unit tests. IMAP, Mongo and CouchDB adapters don't seem to support joins anyway so I'll skip refactoring them. However, I highly recommend refactoring the rest of self.db[tablename] anti-pattern later. It can break if you get too creative with table aliasing.

Also, BaseAdapter should probably have its own nested_select() method so that NoSQL adapters can throw NotOnNOSQLError from it.

@gi0baro I'm almost done with adapter refactoring so there's no point in waiting for another big redesign. Just 2 or 3 more commits and I can wrap it up by writing unit tests. But here are some tips you might find useful:

  • I recommend using different subclasses of Expression for each operation rather than storing a dialect method in the Expression object. E.g. the AndExpression should call Dialect._and() directly. Then you can easily test what kind of expression you're looking at using isinstance().
  • The new interface should have preliminary support for using placeholders in the finished query. For example the query expansion method should return a tuple ('SQL string', [placeholder values]). If you look at Select.query_name() in this PR, it already works like that.
  • The return value of Table.on() should not be an instance of Expression. You should probably keep it only for backward compatibility and instead implement Join objects that allow arbitrary tree of joins. The current interface for example doesn't allow you to do this: SELECT ... FROM a LEFT JOIN (b INNER JOIN c ON ...) ON ....

@niphlod
Copy link
Member

niphlod commented Sep 1, 2016

@gi0baro : on top of the notes on #411, I don't think that caching the generated query will solve THAT amount of performance to be noticeable.
Finalizing on the "_select() returning a string", I'd say that if
q = db(db.table.id>0)._select()
db.executesql(q[:-1])
works with what you want to make _select() do, we should be fine.

If we want to really save time, it should be saved in avoiding expanding arguments by ourselves, as the 99% of the drivers are mature enough to do it by themselves (and query plans for prepared statements are cached on the backend, that's one real speedup). It'll eventually close also the requirement for using binaries instead of base64 encoding them (and all the memory issues in heavy-processing pieces of code).

@nextghost
Copy link
Contributor Author

@niphlod Sorry, no. The bug is very much still present here. This pull request is already huge and I will not add another 200 lines of code just to fix something I didn't break myself.

We can discuss the solution in issue #411. I'll open a new pull request once this gets merged.

@mdipierro
Copy link
Contributor

what's the status here?

@niphlod
Copy link
Member

niphlod commented Oct 3, 2016

@mdipierro : everything scheduled for the next or the following release (meaning hopefully for december)

@nextghost
Copy link
Contributor Author

@mdipierro I still haven't had time to finish the unit tests. If things go well, I might get it done next week.

@nextghost
Copy link
Contributor Author

Aaaand, done. If you don't have any further comments or suggestions for this pull request, you can merge it.

@gi0baro gi0baro self-assigned this Nov 7, 2016
@gi0baro
Copy link
Member

gi0baro commented Nov 7, 2016

@nextghost @mdipierro will review this carefully in the next days.

Copy link
Member

@gi0baro gi0baro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made few questions and a suggestion for a change. I will also perform a performance test before the merge. Thank you for the hard work @nextghost

colnames = list(map(self._colexpand, fields))
sql_fields = ', '.join(map(self._geoexpand, fields))
for item in outer_scoped:
# FIXME: check for name conflicts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost what about this? Should be fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing it would break backward compatibility with third-party code. outer_scoped is currently a list of internal table names. It would have to be changed into a dict of table objects, otherwise there is no way to check for conflicts (same name referencing different objects).

tokens = [table_alias(cross_joins[0])]
tokens += [self.dialect.cross_join(table_alias(t), query_env)
for t in cross_joins[1:]]
# FIXME: WTF? This is not correct syntax at least on PostgreSQL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost can you explain this better? Which is the correct syntax?

Copy link
Contributor Author

@nextghost nextghost Nov 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join_tables gets expanded below the comment into something like LEFT JOIN tab1, tab2, tab3, .... MySQL and PostgreSQL both raise syntax error on that but SQLite seems to accept it. At the very least, the expansion should be handled by the dialect, not SQLAdapter._select_wcols().

The ON clause is mandatory for each table in LEFT JOIN in MySQL and PostgreSQL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost ok, will inspect that apart from this PR

tokens += [self.dialect.cross_join(table_alias(t), query_env)
for t in tables_not_in_joinon[1:]]
tokens += [self.dialect.join(t, query_env) for t in ijoin_on]
# FIXME: WTF? This is not correct syntax at least on PostgreSQL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost same as the other comment

colnames, sql = self._colnames_cache, self._sql_cache
if with_alias and self._tablename is not None:
alias = self._db._adapter.dialect.quote(self._tablename)
if 'Oracle' in str(type(self._db._adapter)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost you should use dialect's _as method instead of checking the adapter type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit was copied from Table.__str__(). Should I fix both?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nextghost would be awesome :)

@gi0baro gi0baro modified the milestones: 17.01, 16.12 Nov 16, 2016
@gi0baro
Copy link
Member

gi0baro commented Nov 16, 2016

@nextghost can you also please update the description of this PR, also adding a quick example of the usage?

@nextghost nextghost changed the title [WIP] Joinable subselects Joinable subselects Nov 19, 2016
@nextghost
Copy link
Contributor Author

@gi0baro I've fixed table/subselect alias expansion as you requested and added another minor bugfix. I've also updated PR description and added usage example. (Should I remove the old description as well?)

But before you merge it, you should also take another look at the two pieces of code which I've highlighted right below the PR description.

@gi0baro
Copy link
Member

gi0baro commented Nov 22, 2016

Thank you @nextghost! The description looks good to me right now.
I will update this later during the week regarding your two comments and some benchmarks.
I feel quite sorry about the slowness of this but it's quite a large PR you know? :)

@gi0baro
Copy link
Member

gi0baro commented Nov 30, 2016

@nextghost ok, the performance seems quite the same from my benchmarks.
Regarding your comments, I would like to open up two new issues for those (as for the wrong syntax about the left joins) and discuss about them separately from this.

If you don't have any other considerations, I will merge this.

@nextghost
Copy link
Contributor Author

I have nothing more to add, you can merge this.

@gi0baro gi0baro merged commit 5cd99f5 into web2py:master Dec 1, 2016
@nextghost nextghost deleted the subselect branch December 2, 2016 14:36
valq7711 added a commit to valq7711/pydal that referenced this pull request Aug 26, 2017
* added codecov settings file

this should silence unnecessary messages from codecov. we can then
adjust settings going forward

* checkpoint

* possibly fixed the issue with id in import_from_csv

* fixed import_from_csv_file

* removed unwanted print

* another import fix

* more import debugging

* Enabled entity quoting as default also on DAL class (web2py#365)

* Enhancement/teradata lastrowid (web2py#363)

* Initial change to remove ESCAPE from teradata syntax

* Revert "Initial change to remove ESCAPE from teradata syntax"

This reverts commit 81489e9.

* Fix lastrowid error for Teradata adapter

* further update to teradata adapter

* Allow custom Rows classes (web2py#367)

* Added Rows.append and Rows.insert (web2py#366)

* [WIP] Indexes support (web2py#361)

* First implementation of web2py#152

* Returning True on index creation/drop

* Added where option on postgre create_index

* Added tests for indexes

* Added support for additional arguments to Table.create_index

* Better expansion in sql instructions for indexes, better errors for indexes

* Re-implemented _lastsql on DAL class (web2py#368)

* Fixed web2py#319 and web2py#333

* turn off codecov/changes report

hopefully :)

* Some fixes on StringIO usages with py3

* Better blobs handling with MongoDB and python3

* Added missing representation of 'password' fields on NoSQL adapters

* Updated mongoDB tests on py3 and blobs

* Updated changelog

* Releasing 16.06

* Added postgres3 adapters, put back multiple values type boolean parsing in base parser

* Removed postgres2 adapter from tests matrix since implicitly tested on postgres3

* Updated dev version

* Making connection and cursors fork-safe in multiprocess environments

* Fixing missing inheritance of additional fields expressions in dialects

* Releasing 16.06.09

* Updated dev version

* Added forced parsing of date and datetime fields in SQLite parser

* Added aggregation type detection test for fields

* Fix web2py#376

* Fixed web2py#377

* default quote_template for CommonDialect

* Enhance d13e82f

* Fix web2py#380

* Releasing 16.06.20

* Updated dev version

* fixes web2py#382

* Improving parsing performance (web2py#385)

* Fix web2py#383

* Releasing 16.06.28

* Updated dev version

* added rows.join('name',lambda ids: db.referenced.field.belongs(ids))

* remove test that depends on details of json serialization

* fixed removal order of new table in test

* yet simpler and smarter row.join

* fixed rows.join(...)

* one more improvement in join

* allow many2many in row.join

* ignore pool_size for sqlite

* better security

* enabled python3.5 on travis (web2py#393)

* fix py35 Field.__hash__(), close web2py#392 (web2py#394)

* added py35 to tox.ini

* Avoid to run travis tests on py35 and google

* fixes web2py#396

* Skip travis tests on python 3.3

* Releasing 16.07

* Updated dev version

* Fix web2py#399 - postgres adapter lastrowid() to use currval(seq) instead of lastval()

* reinstated self._last_insert, no need to remove working logic

* defied long in _compat

* merged pydal

* portalocker: removed pywin32 dep, added tests (web2py#401)

* Avoid upload errors on py3 if upload content is a bytes string

* Compat module cleaning

* Releasing 16.08

* Updated dev version

* MongoDB: testcase and fix for query(field==list:reference)

* Enhancement/teradata test (web2py#413)

* Initial change to remove ESCAPE from teradata syntax

* Revert "Initial change to remove ESCAPE from teradata syntax"

This reverts commit 81489e9.

* Initial teradata tests changes.  More to come

* first sql.py change, TestAddMethod now OK

* include teradata in setup

* Postgres fixes (web2py#414)

* Fixed postgres2 and postgres3 adapters inheritance, updated `PostgreMeta` to choose the right class depending on uri

* Fixed representation for lists on postgres2 and postgres3 adapters

* Fixed typo in PostgreDialectArrays

* Ensure strings in PostgreArrayRepresenter._listify_elements

* New test helper class DALtest that handles connection boilerplate (web2py#415)

* Enhancement/teradata test (web2py#418)

* Initial change to remove ESCAPE from teradata syntax

* Revert "Initial change to remove ESCAPE from teradata syntax"

This reverts commit 81489e9.

* TestBelongs for Teradata

* More Teradata test changes

* run the test.

* fix for opening files in py3 (web2py#420)

* fix for opening files in py3

fixes also web2py/web2py#1451

* fix for PY2 detection

* Releasing 16.09

* Updated dev version

* Enhancement/teradatatest (web2py#426)

* Initial change to remove ESCAPE from teradata syntax

* Revert "Initial change to remove ESCAPE from teradata syntax"

This reverts commit 81489e9.

* Skipping TestChainedJoinUnique for Teradata.

* Clean connection data on thread local on DAL.close (web2py#427)

* Releasing 16.11

* Updated dev version

* Joinable subselects (web2py#408)

* Minimal implementation of joinable subselects (work in progress)
* Generate subselects in Set.nested_select() instead of BaseAdapter._select()
* Add support for subselects to BaseAdapter._count()
* Prevent table name collisions when applying record versioning filters
* Fix common filters in select with subselects
* Add support for subselect fields to Rows.render() and fix bugs
* Refactor Google datastore adapter and BaseAdapter.get_table()
* Check for table name collisions when building a SELECT query
* When building subselect string, ignore cache if outer_scoped is not empty
* Minor optimization in SQLAdapter._select_wcols()
* Add parameter "correlated" to Select constructor defaults to True. When set to False, the subquery will be self-contained, meaning it will never reference any tables from the parent scope.
* Implement proper expansion of correlated subqueries
* Check that subquery in belongs() expression has exactly 1 column
* Implement expansion of correlated subqueries in count(), update(), delete()
* Update Mongo unit tests to new adapter API
* Move part of Set.nested_select() to adapter code
* Unit tests for the Select class
* Additional unit tests for fixed bugs
* Expand aliased table names using dialect methods
* Fix expansion of correlated subqueries used in UPDATE/DELETE statements

* Refactor table name attributes and fix related aliasing/rname bugs

* Update IMAP adapter methods to new API

* Add table alias support to UPDATE/DELETE in MSSQL dialect

* Store values for callbacks on insert/update/delete operations (web2py#431)

* Changed listify to store computations in values

* Moved attempt_upload logic to callbacks

* Fixed callbacks access to fields

* One more check that Table.with_alias() updates all field attributes

* Rename Table.sqlsafe and Table.sqlsafe_alias

* Refactor field name attributes and fix related rname bugs

* Fix type error in MSSQL TFK reference template

* Use the correct constraint name in MSSQL TFK references

* Added missing 'OpRow.get' method

* Fixed after_update callbacks argument in validate_and_update

* tablemap should contains all tables to avoid key errors in adapter _select_wcols

* Fix field migrations and some final refactoring

* Fixed wrong add operation with float types web2py#445

* Updated changelog

* Releasing 17.01

* Fix DAL(None).define_table()

* Fix perpetual migrations due to changes in name quoting

* Fix storing Rows objects in pickle-based cache

* Pass the list of field references when creating a subset of Rows object

* Disable part of pickling test on MSSQL, driver problem...

* Completely disable the pickling test on MSSQL (driver problem...)

* Updated dev version

* Correctly configure adapters that need connection for configuration

* Added locking for reconnect method switch, created a mixin for adapters that need reconnection mocking

* added tests for py36

* Always find the driver on adapters

* Cleaner solution for web2py#446

* Updated changelog

* Releasing 17.03

* added missing "oracle" in adapters.register_for() (web2py#457)

Fixes web2py/web2py#1599

* field.set_attributes(..) could return the instance (web2py#458)

Would let the code be  more straightforward -- example, when passing the "decorated" field to the function...
Now I have to   "scatter" the code in two places for each field...

* testing py36 on appveyor (web2py#456)

* Fix get in pydal.helpers.classes Reference class calling __getattr__ … (web2py#465)

* Fix get in pydal.helpers.classes Reference class calling __getattr__ with 3 variables instead of 2 as per signature

* Move fix of wrong method call in Reference class from get to __getattr__

* Make DAL support jsonb for postgres (web2py#469)

* Make DAL support jsonb for postgres

* added parser for jsonb

* experimantal code to address issue #1613

* fixed error introduced in recent refactoring by me

* mysql and postgres unix_socket support

* fixed _find_work_folder

* support for gcloud sql v2. thanks Karoly Kantor

* small typo in warnings

* Releasing 17.07

* fixed issue web2py#480

* Fixes web2py/web2py#1683 (web2py#484)

* allow DAL(..., adapter_args=dict(migrator=InDBMigrator))

* fixed custom migrator

* Updated dev version

* Cleaning up f71aade

* Fix Table objects in common fields (web2py#486)

This fixes the problem of putting and tables on common_fields and having lazy_define_table throw an exception because Table doesn't have a clone attribute.

* fixed typo changing `oracle_fix` to `cmd_fix` in adapters/oracle.py (web2py#471)

fixed error where %-formats were only applying to second part of concatenated SQL statement in dialects/oracle.py

* override _select_aux_execute method in adapters/oracle.py to use oracle-specific `fetchall` (web2py#473)

* fixed MSSQL TOP

* possible fix for tTIZLy33Yw8

* fixed del of missing key, thanks Lazaro Aulan

* Fixed st_asgeojson (web2py#489)

see https://groups.google.com/forum/#!topic/web2py/EixMPrPr9SY
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants