Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use EXECUTE IMMEDIATE for parameterized statement executions #375

Merged
merged 1 commit into from
May 25, 2023

Conversation

aalbu
Copy link
Member

@aalbu aalbu commented May 17, 2023

Description

Make parametrized statement executions more efficient by taking advantage of trinodb/trino#17341.

When running against older versions of Trino that don't support the EXECUTE IMMEDIATE statement, the old behavior can be restored by passing legacy_prepared_statements=True when creating a connection.

Draft PR until the back-end change is released.

Non-technical explanation

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* Fix some things. ({issue}`issuenumber`)

@hashhar
Copy link
Member

hashhar commented May 17, 2023

When running against older versions of Trino that don't support the EXECUTE IMMEDIATE statement, the old behavior can be restored by passing legacy_prepared_statements=True when creating a connection.

Since client knows server version can we do this automatically instead of user having to opt-into it? We should only have the toggle so that if someone wants to manually revert to old PREPARE + EXECUTE + DEALLOCATE they can do so.
The default behaviour should be to auto-cofigured.

cc: @findepi is this reasonable?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have existing tests with query text which needs escaping? Since EXECUTE IMMEDIATE accepts query text as a string literal and hence quotes need to be escaped.

trino/dbapi.py Outdated Show resolved Hide resolved
@hovaesco
Copy link
Member

Since client knows server version can we do this automatically instead of user having to opt-into it?

How client knows server version? I assume client need to execute SELECT version() to know it.

@aalbu
Copy link
Member Author

aalbu commented May 17, 2023

do we have existing tests with query text which needs escaping? Since EXECUTE IMMEDIATE accepts query text as a string literal and hence quotes need to be escaped.

We do: https://github.com/trinodb/trino-python-client/blob/master/tests/integration/test_dbapi_integration.py#L107-L120

@aalbu aalbu force-pushed the aa/execute-immediate branch 3 times, most recently from 7b5b6e1 to 67563ba Compare May 18, 2023 03:50
@hovaesco
Copy link
Member

hovaesco commented May 18, 2023

Some checks are failing, please run pre-commit install to run these checks locally before committing.

@aalbu aalbu force-pushed the aa/execute-immediate branch 2 times, most recently from 1c75a46 to f404520 Compare May 18, 2023 11:54
@aalbu aalbu marked this pull request as ready for review May 18, 2023 12:05
@aalbu aalbu requested a review from hashhar May 18, 2023 12:06
@findepi
Copy link
Member

findepi commented May 18, 2023

Since client knows server version can we do this automatically instead of user having to opt-into it? We should only have the toggle so that if someone wants to manually revert to old PREPARE + EXECUTE + DEALLOCATE they can do so.
The default behaviour should be to auto-cofigured.

I agree that -- if this feature is important -- it should auto-configure.
People should typically be expected not to know to turn a thing on.

Alternatively, we enable it by default after some time, once we believe only compliant server versions are still in use.
(Even with auto-configure, we should remove the auto-configuration code after some time).

Since client knows server version can we do this automatically instead of user having to opt-into it?

How client knows server version? I assume client need to execute SELECT version() to know it.

This would work. Of course, server can return any version string, so we need to guess whether this looks like an official Trino number and not fail if it doesn't.

Also note that it would be good to avoid additional round-trip to the server to configure the feature. Can we piggy-back on some pre-existing call? Can we auto-enable the feature eg starting from second query on a connection?

@aalbu
Copy link
Member Author

aalbu commented May 18, 2023

I agree that -- if this feature is important -- it should auto-configure. People should typically be expected not to know to turn a thing on.

Alternatively, we enable it by default after some time, once we believe only compliant server versions are still in use. (Even with auto-configure, we should remove the auto-configuration code after some time).

Perhaps my approach of enabling by default is overly aggressive. I was thinking that by causing a failure, we encourage people to upgrade to a new version of Trino, though I realize it would be a frustrating user experience, because it would not be clear from the exception. I was thinking our release notes could document the behavior, but nobody RTFM.

How client knows server version? I assume client need to execute SELECT version() to know it.

This would work. Of course, server can return any version string, so we need to guess whether this looks like an official Trino number and not fail if it doesn't.

Also note that it would be good to avoid additional round-trip to the server to configure the feature. Can we piggy-back on some pre-existing call? Can we auto-enable the feature eg starting from second query on a connection?

Right, the issue we've been trying to address with this change is reducing the number of round-trips. I am not sure how we could piggy-back on an existing call. We could get the version before executing the first statement on a connection and cache the result. Is there a smarter way to do it? Note that we've seen some usage patterns that don't reuse connections (e.g. because they execute statements with different roles or on behalf of different users). Such situations would not benefit from caching the version.

But to summarize, is the recommendation to use an auto-detection mechanism and a toggle to allow users to configure behavior?

@aalbu aalbu force-pushed the aa/execute-immediate branch 5 times, most recently from 5e189ce to 18ccb30 Compare May 20, 2023 04:03
trino/dbapi.py Outdated Show resolved Hide resolved
@aalbu aalbu force-pushed the aa/execute-immediate branch 3 times, most recently from f71a830 to f4fb6d2 Compare May 22, 2023 00:58
@aalbu
Copy link
Member Author

aalbu commented May 22, 2023

@findepi, @hashhar, @mdesmet PTAL.

trino/dbapi.py Show resolved Hide resolved
trino/dbapi.py Outdated Show resolved Hide resolved
trino/dbapi.py Outdated Show resolved Hide resolved
@aalbu aalbu force-pushed the aa/execute-immediate branch 3 times, most recently from 80545e4 to dc54d46 Compare May 22, 2023 11:40
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % one last comment

trino/dbapi.py Outdated
Comment on lines 292 to 290
# not updating the cache
value = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of failure with the query we'd not cache the failure and keep retrying on every new connection creation.

Is the failure expected to be transient? IMO we should cache failure as well and revert to legacy statements in such cases - there's no downside to doing so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking the failure would be transient, but your suggestion makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this, what do you think about changing the query used for checking the capability. Instead of relying on the version check, just see if the new statement works. Basically run EXECUTE IMMEDIATE 'SELECT 1' and see if it succeeds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me.

I also see that SELECT can be constrained via resource groups/access control, I'm not sure if EXECUTE has such restrictions as well.

Copy link
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except small style remark.

trino/dbapi.py Outdated Show resolved Hide resolved
@hashhar
Copy link
Member

hashhar commented May 24, 2023

Please squash after applying #375 (comment) (if you want to add that) or squash otherwise too.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

cc: @electrum on similar caching approach we can follow in JDBC driver when switching prepared statements to EXECUTE IMMEDIATE.

@hashhar hashhar merged commit 784fff3 into trinodb:master May 25, 2023
12 checks passed
@hashhar
Copy link
Member

hashhar commented May 25, 2023

@aalbu please check if the release notes at #377 (comment) properly capture the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants