-
Notifications
You must be signed in to change notification settings - Fork 3.4k
[WIP] Add delta sharing support in delta lake #27158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[WIP] Add delta sharing support in delta lake #27158
Conversation
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Reviewer's GuideThis PR adds Delta Sharing support to the Trino Delta Lake plugin by introducing a new REST client to implement the Delta Sharing protocol, a profile class to load and manage sharing credentials, and updating the plugin’s POM to include the required HTTP client dependency. Sequence diagram for listing tables via Delta Sharing REST clientsequenceDiagram
participant Client
participant DeltaSharingRestClient
participant HttpClient
Client->>DeltaSharingRestClient: listTables(shareName, schemaName, ...)
DeltaSharingRestClient->>HttpClient: HTTP GET /shares/{share}/schemas/{schema}/tables
HttpClient-->>DeltaSharingRestClient: JSON response
DeltaSharingRestClient-->>Client: TablesResponse
Sequence diagram for retrieving table metadata via Delta Sharing REST clientsequenceDiagram
participant Client
participant DeltaSharingRestClient
participant HttpClient
Client->>DeltaSharingRestClient: getTableMetadata(shareName, schemaName, tableName)
DeltaSharingRestClient->>HttpClient: HTTP POST /shares/{share}/schemas/{schema}/tables/{table}/metadata
HttpClient-->>DeltaSharingRestClient: NDJSON response
DeltaSharingRestClient-->>Client: TableMetadata
Class diagram for DeltaSharingRestClient and related typesclassDiagram
class DeltaSharingRestClient {
-HttpClient httpClient
-DeltaSharingProfile profile
+listShares(maxResults, pageToken) SharesResponse
+listSchemas(shareName, maxResults, pageToken) SchemasResponse
+listTables(shareName, schemaName, maxResults, pageToken) TablesResponse
+listAllTables(shareName, maxResults, pageToken) TablesResponse
+queryTableVersion(shareName, schemaName, tableName, startingTimestamp) long
+getTableMetadata(shareName, schemaName, tableName) TableMetadata
}
class DeltaSharingProfile {
-int shareCredentialsVersion
-String endpoint
-Optional<String> bearerToken
-Optional<String> expirationTime
+isExpired() boolean
+fromFile(profilePath) DeltaSharingProfile
}
class SharesResponse {
-List<Share> items
-Optional<String> nextPageToken
+getItems() List<Share>
+getNextPageToken() Optional<String>
}
class SchemasResponse {
-List<Schema> items
-Optional<String> nextPageToken
+getItems() List<Schema>
+getNextPageToken() Optional<String>
}
class TablesResponse {
-List<Table> items
-Optional<String> nextPageToken
+getItems() List<Table>
+getNextPageToken() Optional<String>
}
class Share {
-String name
-String id
+getName() String
+getId() Optional<String>
}
class Schema {
-String name
-String share
+getName() String
+getShare() String
}
class Table {
-String name
-String share
-String schema
+getName() String
+getShare() String
+getSchema() String
}
class TableMetadata {
-String deltaPath
-Protocol protocol
-Metadata metadata
+getDeltaPath() String
+getProtocol() Protocol
+getMetadata() Metadata
}
class Protocol {
-int minReaderVersion
-int minWriterVersion
+getMinReaderVersion() int
+getMinWriterVersion() int
}
class Metadata {
-String id
-String name
-String description
-String schemaString
+getId() String
+getName() Optional<String>
+getDescription() Optional<String>
+getSchemaString() String
}
DeltaSharingRestClient --> DeltaSharingProfile
DeltaSharingRestClient --> SharesResponse
DeltaSharingRestClient --> SchemasResponse
DeltaSharingRestClient --> TablesResponse
DeltaSharingRestClient --> TableMetadata
SharesResponse --> Share
SchemasResponse --> Schema
TablesResponse --> Table
TableMetadata --> Protocol
TableMetadata --> Metadata
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In parseTableMetadata you’re constructing deltaPath with the literal strings “share”, “schema”, and “table” instead of using the actual shareName, schemaName, and tableName—consider passing those values into the parser or building the path before calling it.
- Remove the System.out.println debug statements in parseTableMetadata and replace them with proper logging or include debugging context in the thrown exception.
- Avoid manual concatenation of path segments (e.g. "/shares/" + shareName + "/schemas/" + schemaName) and instead use the URI builder’s path‐segment methods to ensure correct encoding and avoid double slashes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In parseTableMetadata you’re constructing deltaPath with the literal strings “share”, “schema”, and “table” instead of using the actual shareName, schemaName, and tableName—consider passing those values into the parser or building the path before calling it.
- Remove the System.out.println debug statements in parseTableMetadata and replace them with proper logging or include debugging context in the thrown exception.
- Avoid manual concatenation of path segments (e.g. "/shares/" + shareName + "/schemas/" + schemaName) and instead use the URI builder’s path‐segment methods to ensure correct encoding and avoid double slashes.
## Individual Comments
### Comment 1
<location> `plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/metastore/deltasharing/DeltaSharingRestClient.java:335` </location>
<code_context>
+ }
+
+ // Delta path is constructed from share/schema/table
+ String deltaPath = String.format("%s#%s.%s.%s", profile.getEndpoint(), "share", "schema", "table");
+
+ return new TableMetadata(deltaPath, protocol, metadata);
</code_context>
<issue_to_address>
**issue (bug_risk):** Hardcoded share, schema, and table names in deltaPath construction.
Use shareName, schemaName, and tableName variables in deltaPath instead of hardcoded strings to ensure correct path construction.
</issue_to_address>
### Comment 2
<location> `plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/metastore/deltasharing/DeltaSharingRestClient.java:328-330` </location>
<code_context>
+
+ if (protocol == null || metadata == null) {
+ // Add debugging information
+ System.out.println("Debug: Response JSON: " + json);
+ System.out.println("Debug: Protocol found: " + (protocol != null));
+ System.out.println("Debug: Metadata found: " + (metadata != null));
+ throw new IOException("Invalid table metadata response: missing protocol or metadata");
+ }
</code_context>
<issue_to_address>
**issue:** Debugging output left in production code.
Please remove these System.out.println statements or replace them with proper logging to avoid cluttering production logs and exposing sensitive data.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| } | ||
|
|
||
| // Delta path is constructed from share/schema/table | ||
| String deltaPath = String.format("%s#%s.%s.%s", profile.getEndpoint(), "share", "schema", "table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Hardcoded share, schema, and table names in deltaPath construction.
Use shareName, schemaName, and tableName variables in deltaPath instead of hardcoded strings to ensure correct path construction.
| System.out.println("Debug: Response JSON: " + json); | ||
| System.out.println("Debug: Protocol found: " + (protocol != null)); | ||
| System.out.println("Debug: Metadata found: " + (metadata != null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Debugging output left in production code.
Please remove these System.out.println statements or replace them with proper logging to avoid cluttering production logs and exposing sensitive data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend starting a discussion in Trino Slack first. https://trino.io/slack.html
A separate connector might be a better approach than modifying the Delta Lake connector.
06e9ee7 to
cec650b
Compare
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
I had started a thread two months ago: |
Description
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Summary by Sourcery
Implement Delta Sharing protocol support in the Delta Lake plugin by adding a REST client and profile loader to list shares, schemas, tables, query table versions, and fetch table metadata.
New Features:
Build: