Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to specify catalog as a SQLALchemy table argument #186

Merged
merged 1 commit into from
Aug 16, 2022

Conversation

laserkaplan
Copy link
Member

@laserkaplan laserkaplan commented Jun 23, 2022

This change allows for specifying a trino_catalog table argument to SQLALchemy Table objects, which is then checked when compiling statements and prepended to the proper objects. This allows for writing queries that talk to multiple catalogs at the same time.

This change was largely implemented by @AlexandreOuellet in the PyHive repository, with some minor edits by @VinceDPM and myself. However, since that repository is no longer well-maintained, and since Trino seems to have better support currently than Presto on the whole, I have switched my own focus to using Trino, and thus wanted to get this functionality working here as well.

As this is the first time I've attempted to contribute here, please let me know if I am missing anything in this PR! It would be great if this could be merged in an efficient manner.

@cla-bot
Copy link

cla-bot bot commented Jun 23, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@laserkaplan laserkaplan changed the title Add ability to specify catalog as a table argument Add ability to specify catalog as a SQLALchemy table argument Jun 23, 2022
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you submit CLA if you haven't yet sent it?

@@ -44,3 +44,16 @@ def test_offset(dialect):
statement = select(table).offset(0)
query = statement.compile(dialect=dialect)
assert str(query) == 'SELECT "table".id, "table".name \nFROM "table"\nOFFSET :param_1'


def test_multiple_catalogs(dialect):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "multiple catalogs" mean. Could you rename the test method or leave a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the test name to test_catalog_argument.

trino/sqlalchemy/compiler.py Outdated Show resolved Hide resolved
@cla-bot
Copy link

cla-bot bot commented Jun 24, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@laserkaplan
Copy link
Member Author

Could you submit CLA if you haven't yet sent it?

I sent my signed CLA form yesterday. Please let me know if it hasn't been received!

@ebyhr
Copy link
Member

ebyhr commented Jul 4, 2022

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Jul 4, 2022
@cla-bot
Copy link

cla-bot bot commented Jul 4, 2022

The cla-bot has been summoned, and re-checked this pull request!

@laserkaplan
Copy link
Member Author

I have added an additional bit to this PR since it hadn't been approved/merged yet. The visitors worked when compiling SQL statements, but another method was needed to do the same functionality for DDL statements (like CREATE TABLE).

@laserkaplan laserkaplan requested a review from ebyhr July 6, 2022 17:28
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase on upstream.

'\n'\
'CREATE TABLE "system".information_schema."table" (\n'\
'\tid INTEGER NOT NULL, \n'\
'\tPRIMARY KEY (id)\n'\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This DDL looks invalid as Trino query.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the table to use a fake other catalog so that this test doesn't potentially interfere with the system catalog.

column, add_to_result_map, include_table, **kwargs
)
table = column.table
return self.add_catalog(sql, table)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understood the benefit for table, but not sure about column. Could you add a test case to show the usecase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that it's not needed. Removed the column visitor.

@laserkaplan
Copy link
Member Author

Hi all, is there anything else blocking this PR from completion?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. @mdesmet Can you please take a look as well?

@laserkaplan Please squash the commits now (since it's one logical change).

table_with_catalog = Table(
'table',
metadata,
Column('id', Integer, primary_key=True),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the primary_key=True required for the test? I don't think so. If so please remove it since it distracts attention from what the test and this change is actually about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for the DDL which is used below - the PRIMARY KEY seems unrelated to the feature being added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the primary_key=True just above this one is pre-existing but that should also be removed. Unfortunate that it didn't get caught when it was introduced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and removed this from both temp tables, as well as in this test.

Comment on lines +21 to +23
CTE = type(None)
Subquery = type(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment here why this is ok to do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is because the CTE and Subquery classes weren't introduced until SQLAlchemy 1.4. Added a comment.

Copy link
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comments

trino/sqlalchemy/compiler.py Outdated Show resolved Hide resolved
trino/sqlalchemy/compiler.py Outdated Show resolved Hide resolved
trino/sqlalchemy/compiler.py Outdated Show resolved Hide resolved
@laserkaplan
Copy link
Member Author

Please squash the commits now (since it's one logical change).

Is "squash and merge" enabled on the repo for when this is ready to merge?

@mdesmet
Copy link
Contributor

mdesmet commented Aug 11, 2022

Is "squash and merge" enabled on the repo for when this is ready to merge?

AFAIK we don't do squash and merge, only rebase and merge.

Copy link
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % commit squash

def test_catalogs_argument(dialect):
statement = select(table_with_catalog)
query = statement.compile(dialect=dialect)
assert str(query) == 'SELECT default."table".id \nFROM "other".default."table"'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pre-existing but any idea why the schema name doesn't get quoted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked into this a little. visit_table quotes table names by default, but it checks whether schema names "need" to be quoted (e.g. if they are a reserved word). The PyHive package got around this by considering everything to be a reserved word, which would trigger this quoting. I don't think it's necessary here, but it definitely is a little weird to see schema names not quoted while table names are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this and explaining. I think we can leave as is for now since from the few queries I tested with special schema names it was able to handle them correctly. I didn't try too hard to break it though to be honest. 😄

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reworded the commit message. LGTM.

@hashhar hashhar merged commit aee6064 into trinodb:master Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants