Add support for HTTP transport in thrift metastore client #20371

Merged
merged 2 commits into from Feb 20, 2024


vihangk1
Member

@vihangk1 vihangk1 commented Jan 14, 2024

Description

This PR adds support for using HTTP as the transport for the thrift metastore client. In this mode, hive.metastore.uri can be an http(s) URL pointing to the metastore service. Since each thrift API call is a POST request to the metastore endpoint, this PR also adds support for an HTTP bearer token, which must be configured to authenticate the metastore client to the metastore server. The token can be configured using the hive.metastore.http.client.bearer-token configuration property. To support the HMS API interface of Databricks Unity Catalog, the configuration property hive.metastore.http.client.additional-headers must be set to X-Databricks-Unity-Catalog-Name=<catalog_name>, where catalog_name is the name of the catalog object in Databricks Unity Catalog.
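To make the request shape concrete, here is a minimal standard-library Java sketch of the HTTP request a client would send for a single Thrift call. It is not the PR's implementation, which uses Thrift's THttpClient. Assumptions: the class and method names are hypothetical, multiple additional headers are written as comma-separated Name=Value pairs (the description above only shows a single entry), and the application/x-thrift content type is assumed to mirror what THttpClient sends.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetastoreHttpRequestSketch
{
    // Parses a value such as "X-Databricks-Unity-Catalog-Name=main" into header name/value pairs.
    // ASSUMPTION: several headers would be comma-separated; the PR text only shows one entry.
    static Map<String, String> parseAdditionalHeaders(String value)
    {
        Map<String, String> headers = new LinkedHashMap<>();
        if (value == null || value.isBlank()) {
            return headers;
        }
        for (String entry : value.split(",")) {
            int separator = entry.indexOf('=');
            if (separator <= 0) {
                throw new IllegalArgumentException("Invalid header entry: " + entry);
            }
            headers.put(entry.substring(0, separator).trim(), entry.substring(separator + 1).trim());
        }
        return headers;
    }

    // Builds (but does not send) one POST request carrying a serialized Thrift call,
    // authenticated with a bearer token plus any additional configured headers.
    static HttpRequest buildThriftPost(URI metastoreUri, String bearerToken, String additionalHeaders, byte[] thriftPayload)
    {
        HttpRequest.Builder builder = HttpRequest.newBuilder(metastoreUri)
                .header("Content-Type", "application/x-thrift")
                .header("Authorization", "Bearer " + bearerToken)
                .POST(HttpRequest.BodyPublishers.ofByteArray(thriftPayload));
        parseAdditionalHeaders(additionalHeaders).forEach(builder::header);
        return builder.build();
    }

    public static void main(String[] args)
    {
        HttpRequest request = buildThriftPost(
                URI.create("https://example.com/metastore"),
                "my-token",
                "X-Databricks-Unity-Catalog-Name=main",
                new byte[0]);
        System.out.println(request.method()); // POST
        System.out.println(request.headers().firstValue("Authorization").orElseThrow()); // Bearer my-token
        System.out.println(request.headers().firstValue("X-Databricks-Unity-Catalog-Name").orElseThrow()); // main
    }
}
```

Every Thrift call goes to the same endpoint, so the token and the catalog header are attached per request rather than negotiated once per connection.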

Additional context and related issues

Thrift has supported an HTTP-based transport for a long time. Ref https://github.com/apache/thrift/blob/master/lib/javame/src/org/apache/thrift/transport/THttpClient.java. We have been using Thrift's HTTP mode for a while now in services in other projects, such as HiveServer2 in Apache Hive and Apache Spark.

More recently, Apache Hive also added support for running the Hive Metastore server in HTTP mode.
https://issues.apache.org/jira/browse/HIVE-21456

This PR modifies the logic in HttpThriftMetastoreClientFactory to instantiate THttpClient for the transport when creating the thrift metastore client.
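The split between the existing thrift client factory and the new HTTP client factory can be pictured as choosing one code path per catalog from the URI scheme. The Java sketch below is illustrative only: TransportKind and selectTransport are hypothetical names, and rejecting a mix of thrift and http(s) URIs is an assumption, not behavior described in this PR.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

public class MetastoreTransportSelectionSketch
{
    // Hypothetical labels for the two code paths; these are not Trino class names.
    enum TransportKind
    {
        THRIFT_SOCKET, // thrift://host:port, served by the socket-based client factory
        THRIFT_HTTP    // http(s)://..., served by an HTTP client factory backed by THttpClient
    }

    // Picks exactly one code path from the configured metastore URIs.
    // ASSUMPTION: a catalog may not mix thrift and http(s) URIs.
    static TransportKind selectTransport(List<URI> metastoreUris)
    {
        if (metastoreUris.isEmpty()) {
            throw new IllegalArgumentException("at least one metastore URI is required");
        }
        TransportKind selected = null;
        for (URI uri : metastoreUris) {
            String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            TransportKind kind = switch (scheme) {
                case "thrift" -> TransportKind.THRIFT_SOCKET;
                case "http", "https" -> TransportKind.THRIFT_HTTP;
                default -> throw new IllegalArgumentException("metastoreUri scheme must be thrift, http or https: " + uri);
            };
            if (selected != null && selected != kind) {
                throw new IllegalArgumentException("cannot mix thrift and http(s) metastore URIs: " + metastoreUris);
            }
            selected = kind;
        }
        return selected;
    }

    public static void main(String[] args)
    {
        System.out.println(selectTransport(List.of(URI.create("thrift://metastore:9083")))); // THRIFT_SOCKET
        System.out.println(selectTransport(List.of(URI.create("https://example.com/metastore")))); // THRIFT_HTTP
    }
}
```

Deciding the path once, up front, is what lets each path bind only the configuration it actually needs.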

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

Add HTTP support for the thrift metastore client.

@cla-bot cla-bot bot added the cla-signed label Jan 14, 2024
@github-actions github-actions bot added docs tests:hive hudi Hudi connector iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Jan 14, 2024
@ebyhr
Member

ebyhr commented Jan 14, 2024

What's the difference from #17925?

@vihangk1
Member Author

What's the difference from #17925?

This PR follows the recommendation from @findinpath to completely separate the code paths for the thrift metastore client and the http metastore client. This avoids the concern you raised about having to validate, in HivePlugin, configs like impersonation that do not work with the http metastore client.

This is done by creating an http client factory instead of overloading the default thrift-based client factory. I feel this could be a cleaner approach, but I would love to hear your opinion on it before spending more time taking it further. Let me know if you think this is a better approach than #17925. If so, I can discard this PR and update the original one with these changes.

Contributor

@findinpath findinpath left a comment

I think your submission is shaping up to what we talked about two months ago.
Let's see what your next iteration brings.

I think this covers what @ebyhr was concerned about in #17925.

@vihangk1
Member Author

I think your submission is shaping up to what we talked about two months ago.

I am happy to address these comments, but with all due respect, I would like to hear from @ebyhr whether this approach is strictly better and would make this PR worthy of being merged eventually. Unfortunately, this work has taken more than 6 months of back and forth (including holidays), and I would like to make sure that the time and energy I spend on it will eventually lead to this PR being merged by @ebyhr or anyone else who can help merge it.

Also, should we abandon this PR and update the original PR or do you want me to use this PR from now?

@ebyhr
Member

ebyhr commented Jan 17, 2024

@vihangk1 This approach looks better than the previous one. Please feel free to close the old PR.

Contributor

@findinpath findinpath left a comment

Only nits

The only question is whether the following binding in the ThriftMetastoreModule is actually justified when dealing with an http/https metastore:

configBinder(binder).bindConfig(ThriftMetastoreConfig.class);

I hope not.

@findinpath
Contributor

@vihangk1 pls also rebase on trino/master to address the code conflicts.

@vihangk1
Member Author

vihangk1 commented Feb 3, 2024

What is the exception you're getting?

If I comment out the ThriftMetastoreConfig binding, I get the following error when I run TestHivePlugin:

[ERROR] Errors:
[ERROR] TestHivePlugin.testHttpMetastoreConfigs:236 » Creation Unable to create injector, see the following errors:

  1. [Guice/JitDisabled]: Explicit bindings are required and ThriftMetastoreConfig is not explicitly bound.
    at ThriftMetastoreModule.createWriteStatisticsExecutor(ThriftMetastoreModule.java:114)
    _ for 1st parameter hiveConfig
    at ThriftMetastoreModule.createWriteStatisticsExecutor(ThriftMetastoreModule.java:114)
    _ installed by: HiveMetastoreModule -> ConditionalModule -> ThriftMetastoreModule

@findinpath
Contributor

@vihangk1 here is a patch to solve the binding problem that was keeping you from removing the dependency on ThriftMetastoreConfig

Subject: [PATCH] hive bindings
---
Index: plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java b/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java
--- a/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java	
+++ b/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java	
@@ -14,10 +14,11 @@
 package io.trino.plugin.hive.metastore.thrift;
 
 import com.google.inject.Binder;
+import com.google.inject.Inject;
 import com.google.inject.Key;
-import com.google.inject.Provides;
+import com.google.inject.Provider;
 import com.google.inject.Scopes;
-import com.google.inject.Singleton;
+import com.google.inject.TypeLiteral;
 import com.google.inject.multibindings.OptionalBinder;
 import io.airlift.configuration.AbstractConfigurationAwareModule;
 import io.trino.plugin.base.security.UserNameProvider;
@@ -55,9 +56,9 @@
                     .setDefault().to(HttpThriftMetastoreClientFactory.class).in(Scopes.SINGLETON);
             binder.bind(IdentityAwareMetastoreClientFactory.class).to(StaticTokenAwareHttpMetastoreClientFactory.class).in(Scopes.SINGLETON);
             configBinder(binder).bindConfig(StaticMetastoreConfig.class);
-            configBinder(binder).bindConfig(ThriftHttpMetastoreConfig.class);
-            configBinder(binder).bindConfig(ThriftMetastoreConfig.class);
             binder.bind(ThriftMetastoreFactory.class).to(ThriftHttpMetastoreFactory.class).in(Scopes.SINGLETON);
+            newOptionalBinder(binder, Key.get(new TypeLiteral<ExecutorService>() {}, ThriftHiveWriteStatisticsExecutor.class))
+                    .setDefault().toInstance(newFixedThreadPool(20, threadsNamed("http-thrift-statistics-write-%s")));
         }
         else {
             OptionalBinder.newOptionalBinder(binder, ThriftMetastoreClientFactory.class)
@@ -65,6 +66,8 @@
             binder.bind(TokenAwareMetastoreClientFactory.class).to(StaticTokenAwareMetastoreClientFactory.class).in(Scopes.SINGLETON);
             configBinder(binder).bindConfig(StaticMetastoreConfig.class);
             configBinder(binder).bindConfig(ThriftMetastoreConfig.class);
+            newOptionalBinder(binder, Key.get(new TypeLiteral<ExecutorService>() {}, ThriftHiveWriteStatisticsExecutor.class))
+                    .setDefault().toProvider(ThriftHiveMetastoreStatisticExecutorProvider.class).in(Scopes.SINGLETON);
             install(new ThriftMetastoreAuthenticationModule());
             binder.bind(ThriftMetastoreFactory.class).to(ThriftHiveMetastoreFactory.class).in(Scopes.SINGLETON);
         }
@@ -108,17 +111,27 @@
         }
     }
 
-    @Provides
-    @Singleton
-    @ThriftHiveWriteStatisticsExecutor
-    public ExecutorService createWriteStatisticsExecutor(ThriftMetastoreConfig hiveConfig)
-    {
-        return newFixedThreadPool(hiveConfig.getWriteStatisticsThreads(), threadsNamed("hive-thrift-statistics-write-%s"));
-    }
-
     @PreDestroy
     public void shutdownsWriteStatisticExecutor(@ThriftHiveWriteStatisticsExecutor ExecutorService executor)
     {
         executor.shutdownNow();
     }
+
+    private static class ThriftHiveMetastoreStatisticExecutorProvider
+            implements Provider<ExecutorService>
+    {
+        private final ThriftMetastoreConfig thriftMetastoreConfig;
+
+        @Inject
+        public ThriftHiveMetastoreStatisticExecutorProvider(ThriftMetastoreConfig thriftMetastoreConfig)
+        {
+            this.thriftMetastoreConfig = thriftMetastoreConfig;
+        }
+
+        @Override
+        public ExecutorService get()
+        {
+            return newFixedThreadPool(thriftMetastoreConfig.getWriteStatisticsThreads(), threadsNamed("hive-thrift-statistics-write-%s"));
+        }
+    }
 }
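For readers without airlift at hand, the executor wiring in this patch reduces to the plain-Java sketch below. Here threadsNamed is a rough stand-in for airlift's helper of the same name (not its actual implementation), and writeStatisticsExecutor is a hypothetical method that folds the two module branches into one place: the thrift branch sizes the pool from configuration, while the http/https branch uses a fixed 20 threads, so building it no longer requires ThriftMetastoreConfig.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

public class StatisticsExecutorSketch
{
    // Approximation of airlift's threadsNamed(nameFormat): every new thread gets the
    // format filled in with an increasing counter starting at 0.
    static ThreadFactory threadsNamed(String nameFormat)
    {
        AtomicLong counter = new AtomicLong();
        ThreadFactory delegate = Executors.defaultThreadFactory();
        return runnable -> {
            Thread thread = delegate.newThread(runnable);
            thread.setName(String.format(nameFormat, counter.getAndIncrement()));
            return thread;
        };
    }

    // Hypothetical merge of the two branches in the patch:
    // http/https uses a fixed pool of 20, thrift reads the size from configuration.
    static ExecutorService writeStatisticsExecutor(boolean httpMetastore, int configuredThreads)
    {
        if (httpMetastore) {
            return Executors.newFixedThreadPool(20, threadsNamed("http-thrift-statistics-write-%s"));
        }
        return Executors.newFixedThreadPool(configuredThreads, threadsNamed("hive-thrift-statistics-write-%s"));
    }

    public static void main(String[] args)
            throws Exception
    {
        ExecutorService executor = writeStatisticsExecutor(true, 0);
        try {
            String name = executor.submit(() -> Thread.currentThread().getName()).get();
            System.out.println(name); // http-thrift-statistics-write-0
        }
        finally {
            executor.shutdownNow(); // same cleanup as the module's @PreDestroy method
        }
    }
}
```

In the patch itself, the thrift-only sizing lives in a Guice Provider so that ThriftMetastoreConfig is injected only on the branch that binds it.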

@vihangk1 vihangk1 force-pushed the master_thrift-http-client-factory branch from e623f6d to 74b63f6 Compare February 5, 2024 01:43
@vihangk1 vihangk1 changed the title [DO-NOT-MERGE] Master thrift http client factory Add support for HTTP transport in thrift metastore client Feb 5, 2024
@findinpath
Contributor

Build is 🔴

Error:  Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.12.1:compile (default-compile) on project trino-hive: Compilation failure
Error:  /home/runner/work/trino/trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastoreModule.java:[107,49] [UnnecessaryParentheses] These grouping parentheses are unnecessary; it is unlikely the code will be misinterpreted without them
Error:      (see https://errorprone.info/bugpattern/UnnecessaryParentheses)
Error:    Did you mean 'throw new IllegalStateException( "'hive.metastore.http.client.authentication.type' must be set while using http/https metastore URIs in 'hive.metastore.uri'");'?
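The failing line enforces a configuration rule: http/https metastore URIs require hive.metastore.http.client.authentication.type to be set. A minimal Java sketch of such a check, with the throw written the way error-prone suggests (no grouping parentheses around the new exception). The class, enum, and method names are hypothetical; only the property names and the error message come from the build log.

```java
import java.net.URI;
import java.util.Optional;

public class HttpMetastoreAuthCheckSketch
{
    // Hypothetical stand-in for the authentication type setting; the review thread shows a BEARER value.
    enum AuthenticationType
    {
        BEARER
    }

    // Rejects http/https metastore URIs when no authentication type is configured.
    static void validate(URI metastoreUri, Optional<AuthenticationType> authenticationType)
    {
        String scheme = metastoreUri.getScheme();
        boolean httpScheme = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        if (httpScheme && authenticationType.isEmpty()) {
            // Writing this as throw (new IllegalStateException(...)) is what UnnecessaryParentheses flags.
            throw new IllegalStateException("'hive.metastore.http.client.authentication.type' must be set while using http/https metastore URIs in 'hive.metastore.uri'");
        }
    }

    public static void main(String[] args)
    {
        validate(URI.create("thrift://metastore:9083"), Optional.empty()); // passes: no HTTP auth involved
        validate(URI.create("https://example.com/metastore"), Optional.of(AuthenticationType.BEARER)); // passes
        try {
            validate(URI.create("https://example.com/metastore"), Optional.empty());
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```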

@vihangk1 vihangk1 force-pushed the master_thrift-http-client-factory branch 2 times, most recently from 0e5df58 to 8e52e2e Compare February 6, 2024 02:13
Contributor

@findinpath findinpath left a comment

LGTM % comments

@ebyhr
Member

ebyhr commented Feb 13, 2024

Could you confirm product test failure? https://github.com/trinodb/trino/actions/runs/7881376966/job/21505082153?pr=20371

@findinpath
Contributor

Could you confirm product test failure? https://github.com/trinodb/trino/actions/runs/7881376966/job/21505082153?pr=20371

This is related to an internal hiccup in the Databricks account used for testing Trino OSS. Apparently someone accidentally removed the Unity cluster in the meantime. Recreating the cluster.

@findinpath
Contributor

ci/test (plugin/trino-kudu) unrelated failure #20697

@ebyhr ebyhr force-pushed the master_thrift-http-client-factory branch from 27758b7 to 2b5f643 Compare February 14, 2024 23:55
@vihangk1 vihangk1 force-pushed the master_thrift-http-client-factory branch 3 times, most recently from 1762171 to 20cd909 Compare February 15, 2024 18:19
@@ -144,9 +143,9 @@ private static URI checkMetastoreUri(URI uri)
requireNonNull(uri, "uri is null");
String scheme = uri.getScheme();
checkArgument(!isNullOrEmpty(scheme), "metastoreUri scheme is missing: %s", uri);
checkArgument(scheme.equals("thrift"), "metastoreUri scheme must be thrift: %s", uri);
checkArgument(scheme.equals("thrift") || scheme.equals("https") || scheme.equals("http"), "metastoreUri scheme must be thrift, http or https: %s", uri);
Member Author

Removing this check for https and http because this class only works with thrift now.

BEARER
}

private Duration readTimeout = new Duration(10, TimeUnit.SECONDS);
Member Author

Updating this to 60 seconds, which is the timeout value used by Databricks.

@chenyair

It is a game changer for us! Thank you for contributing this feature

@chenyair

Regarding token expiration: how will Trino handle refreshing the token, or fetching a new one when needed?
Is it possible to add an option to pass credentials (client_id, client_secret) to the connector so that Trino implements the refresh logic, or will I have to take care of it manually?

@ebyhr ebyhr force-pushed the master_thrift-http-client-factory branch from 20cd909 to db1c627 Compare February 20, 2024 01:25
@ebyhr ebyhr force-pushed the master_thrift-http-client-factory branch from db1c627 to 9241b75 Compare February 20, 2024 06:54
@ebyhr ebyhr force-pushed the master_thrift-http-client-factory branch from 9241b75 to d7cb320 Compare February 20, 2024 08:18
@ebyhr ebyhr merged commit 0da647c into trinodb:master Feb 20, 2024
95 checks passed
@github-actions github-actions bot added this to the 440 milestone Feb 20, 2024
@mosabua
Member

mosabua commented Feb 21, 2024

Semi-related question: are there good reasons why this does not use the airlift HTTP client and its configuration, like other HTTP client use cases in Trino? See https://trino.io/docs/current/admin/properties-http-client.html

cc @colebow

@ebyhr
Member

ebyhr commented Feb 21, 2024

@mosabua #17925 (comment)

Labels
cla-signed delta-lake Delta Lake connector docs hive Hive connector hudi Hudi connector iceberg Iceberg connector

5 participants