Jdbc merge support #20532

Closed
wants to merge 10,000 commits into from

chenjian2664
Contributor

chenjian2664 commented Feb 1, 2024

Description

This is a follow-up to #16693.

It mainly implements the connector interfaces described in https://trino.io/docs/current/develop/supporting-merge.html#connector-support-for-merge:

  • ConnectorMergeSink API. In storeMergedRows, the rows of each page are classified into three categories (INSERT, UPDATE, and DELETE), and each category is passed to the corresponding sink, which applies it to the underlying database.
    INSERT rows are written through the existing JdbcPageSink.
    UPDATE and DELETE rows are applied by matching on the primary key. The implementation reuses the JdbcPageSink code and changes only the SQL statement it executes. Because the syntax for matching rows on primary key columns can differ between databases, the sink obtains the mergeRowIdConjuncts from BaseJdbcClient (BaseJdbcClient#buildMergeRowIdConjuncts) when it is created.

  • getMergeRowIdColumnHandle API. For connectors that support MERGE, it returns a RowType containing all primary key columns; otherwise, it keeps the original behavior.

  • beginMerge API. Returns a JdbcMergeTableHandle, which contains all columns that need to be scanned (computed in BaseJdbcClient#updatedScanColumnsForMerge) and the table handle used for inserts.

  • The third commit adds support for fault-tolerant execution (FTE), which also requires implementing the finishMerge API: all inserted and deleted rows are first written to a temporary table, and the target table is then updated from it.
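The row classification and primary-key matching described in the first bullet can be sketched roughly as follows. This is a simplified model, not the Trino SPI or the PR's code: Operation, MergeRow, and rowIdConjuncts are hypothetical stand-ins (rowIdConjuncts plays the role of the PR's mergeRowIdConjuncts), and ANSI double-quote identifier quoting is an assumption, since quoting and syntax vary by database.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified model of the storeMergedRows flow; not the Trino SPI.
final class MergeRowClassifier {
    // Hypothetical operation kinds; a real sink reads the operation from each page.
    enum Operation { INSERT, UPDATE, DELETE }

    // Hypothetical row: its operation plus column values keyed by column name.
    record MergeRow(Operation operation, Map<String, Object> values) {}

    // Split a batch into one list per operation, preserving input order within each list.
    static Map<Operation, List<MergeRow>> classify(List<MergeRow> rows) {
        Map<Operation, List<MergeRow>> byOperation = new EnumMap<>(Operation.class);
        for (Operation operation : Operation.values()) {
            byOperation.put(operation, new ArrayList<>());
        }
        for (MergeRow row : rows) {
            byOperation.get(row.operation()).add(row);
        }
        return byOperation;
    }

    // Assumed ANSI-style quoting: wrap in double quotes and double any embedded quote.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Hypothetical stand-in for mergeRowIdConjuncts: "pk1" = ? AND "pk2" = ?
    static String rowIdConjuncts(List<String> primaryKeys) {
        return primaryKeys.stream()
                .map(column -> quote(column) + " = ?")
                .collect(Collectors.joining(" AND "));
    }

    // DELETE rows are matched by primary key only.
    static String deleteSql(String table, List<String> primaryKeys) {
        return "DELETE FROM " + quote(table) + " WHERE " + rowIdConjuncts(primaryKeys);
    }

    // UPDATE rows set the changed columns and are matched by primary key.
    static String updateSql(String table, List<String> updatedColumns, List<String> primaryKeys) {
        String assignments = updatedColumns.stream()
                .map(column -> quote(column) + " = ?")
                .collect(Collectors.joining(", "));
        return "UPDATE " + quote(table) + " SET " + assignments + " WHERE " + rowIdConjuncts(primaryKeys);
    }

    public static void main(String[] args) {
        List<MergeRow> rows = List.of(
                new MergeRow(Operation.INSERT, Map.of("id", 1, "amount", 10)),
                new MergeRow(Operation.UPDATE, Map.of("id", 2, "amount", 20)),
                new MergeRow(Operation.DELETE, Map.of("id", 3)));
        Map<Operation, List<MergeRow>> split = classify(rows);
        System.out.println(split.get(Operation.UPDATE).size()); // prints 1
        System.out.println(deleteSql("orders", List.of("id")));
        // prints DELETE FROM "orders" WHERE "id" = ?
        System.out.println(updateSql("orders", List.of("amount"), List.of("id")));
        // prints UPDATE "orders" SET "amount" = ? WHERE "id" = ?
    }
}
```

Since UPDATE and DELETE differ from INSERT only in the SQL text and the bound parameters, a batching sink like JdbcPageSink can plausibly execute all three once the statement is swapped.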

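The FTE approach in the last bullet (write rows to a temporary table first, then update the target table in finishMerge) can be illustrated with a rough sketch. Everything here is an assumption for illustration, not the PR's actual statements: two hypothetical staging tables (one holding the keys of rows to delete, one holding rows to insert), an UPDATE represented as a delete plus an insert, and generic SQL with double-quoted identifiers. The idea it shows is that tasks, which may be retried under FTE, only write to staging tables, and the target table is modified once, at finish time.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a finishMerge step that applies staged rows to the target table.
final class FinishMergeSketch {
    // Assumed ANSI-style quoting; real connectors use their own dialect's quoting.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Statements run once, after all (possibly retried) tasks have finished writing the staging tables.
    static List<String> finishMergeStatements(
            String target, String deleteStaging, String insertStaging,
            List<String> primaryKeys, List<String> columns) {
        String keyMatch = primaryKeys.stream()
                .map(column -> "t." + quote(column) + " = d." + quote(column))
                .collect(Collectors.joining(" AND "));
        String columnList = columns.stream()
                .map(FinishMergeSketch::quote)
                .collect(Collectors.joining(", "));
        return List.of(
                // Deletes run first, so rows re-inserted by an UPDATE (modeled as delete + insert) survive.
                "DELETE FROM " + quote(target) + " t WHERE EXISTS (SELECT 1 FROM "
                        + quote(deleteStaging) + " d WHERE " + keyMatch + ")",
                "INSERT INTO " + quote(target) + " (" + columnList + ") SELECT "
                        + columnList + " FROM " + quote(insertStaging),
                // Clean up the hypothetical staging tables.
                "DROP TABLE " + quote(deleteStaging),
                "DROP TABLE " + quote(insertStaging));
    }

    public static void main(String[] args) {
        finishMergeStatements("orders", "tmp_delete", "tmp_insert", List.of("id"), List.of("id", "amount"))
                .forEach(System.out::println);
    }
}
```

The delete-before-insert ordering matters in this model: reversing it would delete the new version of every updated row.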
Additional context and related issues

Further details are in #16944.

Release notes

(x) Release notes are required, with the following suggested text:

# Base JDBC
* Add support for SQL `MERGE` in JDBC-based connectors. ({issue}`16709`)
* Reimplement Phoenix `MERGE` support on top of the base JDBC `MERGE` implementation.
* Add support for `UPDATE` and `MERGE` in the Ignite connector.
* Add support for `UPDATE` and `MERGE` in the PostgreSQL connector.

@chenjian2664
Contributor Author

chenjian2664 commented Feb 2, 2024

@kokosing I split #16944 into these three commits; each commit mainly adds MERGE support to one connector (Phoenix, Ignite, PostgreSQL). The Oracle connector will be added later, since it requires modifying some base module tests. I hope this makes the PR easier to review.

@kokosing
Member

kokosing commented Feb 5, 2024

@wendigo @vlad-lyutenko @hashhar would you like to review this PR?

@chenjian2664
Contributor Author

@electrum Would you mind having a look when you have some time? Thanks.

hashhar self-requested a review February 21, 2024 13:18
Member

hashhar left a comment

I looked at the implementation pieces, only skimmed the test code.

I have left some initial comments. The overall mechanism appears "correct" to me, modulo details like column quoting and having to fetch metadata multiple times within a single query (which raises the chance of inconsistency due to concurrent changes in the underlying DB).

hashhar mentioned this pull request Mar 4, 2024
chenjian2664 marked this pull request as draft March 11, 2024 11:01
@chenjian2664
Contributor Author

PostgreSQL test failures due to OOM will be resolved by #21127.

chenjian2664 marked this pull request as ready for review March 18, 2024 07:25
@chenjian2664
Contributor Author

@hashhar @kokosing Gentle reminder.

@chenjian2664
Contributor Author

@hashhar @kokosing Would you mind reviewing it now?

@findepi
Member

findepi commented Apr 25, 2024

Can you please expand the description section of the PR with a high-level overview of the approach taken, i.e. how it works?

@chenjian2664
Contributor Author

chenjian2664 commented Apr 30, 2024

@findepi @hashhar @kokosing PTAL

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions bot added the stale label May 27, 2024
mosabua added the stale-ignore label and removed the stale label May 27, 2024
@nizarhejazi

Hey @chenjian2664, @mosabua, @findepi, any update on this PR? We would love to use this feature with the dbt-trino adapter.

@chenjian2664
Contributor Author

Waiting for the review

@findepi
Member

findepi commented Jun 3, 2024

cc @djsstarburst @electrum for more MERGE support

@nizarhejazi

Hey team, any update on the status of this PR?

@LovAsawa-Draup

LovAsawa-Draup commented Jul 12, 2024

Would love to have MERGE support for the PostgreSQL connector, as UPDATE with a FROM clause is not directly supported.
Is there any timeline for when this feature can go live?

@mosabua
Member

mosabua commented Jul 12, 2024

We are trying to give this priority in the review and merge queue at the moment. We definitely plan on getting this in. Timing is to be determined though.

Member

kokosing left a comment

Some incremental comments

cla-bot bot commented Aug 13, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Jason.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email email@example.com
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot bot added cla-signed and removed cla-signed labels Aug 13, 2024
github-actions bot added the jdbc label Aug 14, 2024
chenjian2664 closed this by deleting the head repository Aug 14, 2024
@shohamyamin

@chenjian2664 Any update on this?
Do you know whether the Oracle connector will also benefit from it? MERGE in the Oracle connector would be very useful for me.

@chenjian2664
Contributor Author

@shohamyamin For a number of reasons, the work has now moved to #23034.
The workaround also applies to the Oracle connector.

Labels
cla-signed, docs, jdbc, stale-ignore