Add Hudi connector #10228
Conversation
I want to know: is the "codope:hudi-plugin" branch running OK on hive-rt tables?
@feiwei8586 Thanks for your interest in the new connector. It is not ready for hive-rt tables yet. The plan in the first phase is to get copy-on-write tables ready.
Thanks for your answer. Our difficulty is querying the hive-rt table. Please consider supporting it.
I debugged in IDEA and got an error with etc/catalog/hudi.properties: 2021-12-10T17:55:08.296+0800 INFO main io.trino.metadata.StaticCatalogStore -- Loading catalog etc/catalog/hudi.properties --
@super-sponge Can you pull the latest changes and try again? You will have to place the hbase-common (version 1.2.3) jar in the classpath (inside cc @caneGuy
Thanks!
@findepi This PR is ready for review. Could you please approve the workflows?
I get an error when selecting from a COW Hudi table with the newest PR, but it runs well with the older version. Do you have any idea about it? 2022-02-15T17:16:43.942+0800 INFO pool-60-thread-1 org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based view for basePath hdfs://xxxxx 2022-02-15T17:16:44.470+0800 ERROR query-execution-0 io.trino.execution.scheduler.SqlQueryScheduler Failure in distributed stage for query 20220215_091643_00002_y7wg5
@feiwei8586 We will add support for _rt tables soon. The goal is to get this PR landed and keep building incrementally. We are tracking _rt support in https://issues.apache.org/jira/browse/HUDI-2687
@mxdzs0612 Thanks for trying it out. I did query COW and MOR _ro tables before pushing the changes. The last commit was mostly refactoring rather than any logic change. I'll debug your error and get back to you by Wednesday.
Initial set of comments based on skimming a subset of the classes.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/util/HiveUtil.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/InternalHudiConnectorFactory.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
@losipiuk Thanks for reviewing the PR. I'll address the comments by the end of this week, as we're code freezing Hudi this week in preparation for the next release. A few things have changed since I last rebased. Thanks for the tip in the comments.
@losipiuk I've addressed all but one comment. I need a clarification on this comment in
For this, I will have to use an
Yeah, you can follow the pattern with PTAL at
Thanks for addressing. Some more comments.
Ultimately we need a review from @electrum, I think.
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConnector.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiErrorCode.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiSplitSource.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiTableHandle.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiMetadata.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiMetadata.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiMetadata.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiMetadata.java
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiPageSource.java
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.
@electrum @findinpath thanks for testing and going over the PR again. Regarding the reflective access issue, we plan to fix it in the next patch release of Hudi (0.12.1). It is being tracked in https://issues.apache.org/jira/browse/HUDI-4687
@findinpath Regarding #10228 (comment) and #10228 (comment), I have fixed the issue. Regarding the issue in the MOR table #10228 (comment), it is expected, as the connector does not yet support snapshot queries for MOR tables. Once compaction/cleaning kicks in, new base files will be produced and the results will show the latest records. There is already a ticket to track support for snapshot queries.
Please just update the Hudi client to not trigger the reflection exceptions, and do a patch release. I don't think we should add stuff that needs these kinds of exceptions.
private static final Splitter COMMA_SPLITTER = Splitter.on(",").omitEmptyStrings().trimResults();
private String baseFileFormat = PARQUET.name();
private List<String> columnsToHide = ImmutableList.of();
Ya, that is not how this is done in Trino. These should be implemented like the other connectors.
import java.util.List;

public abstract class HudiPageSourceCreator
This abstract class only has one implementation. Please remove it (and any other base class like this). If we need the abstraction in the future, it can be added then, tuned to the actual usage. Also, this seems to be really just an interface, and the Trino code base does not use this interface-as-an-abstract-class pattern; instead, simply add these 4 fields to any implementation, and if you need an interface, add a plain old interface.
@codope please include the code from HudiParquetPageSourceCreator directly in HudiPageSourceProvider.
tableName.getSchemaName(),
tableName.getTableName(),
table.get().getStorage().getLocation(),
HoodieTableType.COPY_ON_WRITE,
When you need this value, you can simply add the field to the Handle. It is better to wait until something is actually needed before adding it.
HiveMetastore metastore = metastoreProvider.apply(session.getIdentity(), (HiveTransactionHandle) transaction);
Table table = metastore.getTable(hudiTableHandle.getSchemaName(), hudiTableHandle.getTableName())
        .orElseThrow(() -> new TableNotFoundException(schemaTableName(hudiTableHandle.getSchemaName(), hudiTableHandle.getTableName())));
final FileSystem fs;
Don't use final on local variables.
Also please don't use abbreviations like this, instead just use filesystem
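To illustrate both conventions above, here is a minimal sketch (not code from this PR; the class and method names are invented for illustration): the local variable gets a spelled-out name and no final modifier.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;

// Illustrative sketch of the review convention: a spelled-out local name
// ("fileSystem", not "fs") and no 'final' modifier on the local variable.
public class NamingStyle
{
    static String defaultSeparator()
    {
        FileSystem fileSystem = FileSystems.getDefault();
        return fileSystem.getSeparator();
    }
}
```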
public HudiBackgroundSplitLoader(
        ConnectorSession session,
        FileSystem fs,
no abbreviations
private final int minPartitionBatchSize;
private final int maxPartitionBatchSize;
private final Deque<HudiPartitionInfo> partitionQueue;
private int currBatchSize;
No abbreviations. Can you scan through this PR and fix any places that have abbreviations like this?
@Override
public String getHivePartitionName()
{
    throw new HoodieException(
No need to wrap. Please fix all of the excessive wrapping in this class
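For illustration (hypothetical class and message, not code from this PR), the unwrapped form the reviewer asks for keeps a short throw statement on a single line:

```java
// Illustrative sketch: a short throw statement stays on one line instead of
// wrapping the exception message onto a separate line.
public class WrappingStyle
{
    static String hivePartitionName()
    {
        throw new UnsupportedOperationException("Hive partition name is not supported here");
    }
}
```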
final boolean useMetastoreForPartitions;
final HoodieTableFileSystemView fileSystemView;
final TupleDomain<String> partitionKeysFilter;
final List<Column> partitionColumns;
Mark all of these private
.filter(partitionInfo -> partitionInfo.getHivePartitionKeys().isEmpty() || partitionInfo.doesMatchPredicates())
.collect(Collectors.toList());

log.debug("Get partitions to scan in %d ms (useMetastoreForPartitions: %s): %s", timer.endTimer(), useMetastoreForPartitions, filteredPartitionInfoList);
This seems like excessive logging. If you want to collect stats, please add some @Managed stats.
        hiveMetastore);
String relativePartitionPath = partitionHiveInfo.getRelativePartitionPath();
List<String> partitionValues = partitionHiveInfo.getHivePartitionKeys().stream()
        .map(HivePartitionKey::getValue).collect(Collectors.toList());
Put collect on a separate line. Generally, if you chop down a statement like this, do it for each step. If it is a short one-liner then it is fine to keep everything on one line.
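A minimal sketch of the chopped-statement style described above (illustrative data, not code from this PR): once the chain is split, every step, including the terminal collect, gets its own line.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: each step of a chopped-down stream chain on its own
// line, with collect on a separate line.
public class ChoppedStreamStyle
{
    static List<String> upperCased(List<String> values)
    {
        return values.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }
}
```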
package io.trino.plugin.hudi.query;

public enum HudiQueryMode {
It looks like this is not used anywhere (the only usage is hard-coded to READ_OPTIMIZED), so please remove it for now. We can add it back later if needed.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Please remove the blank line between the license header and the package statement, in order to match the convention used in all other files in Trino.
: TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline());
allPartitionInfoList = relativePartitionPathList.stream()
        .map(relativePartitionPath ->
                buildHudiPartitionInfo(
It would be better to inline this method, since the useMetastoreForPartitions field is a constant here.
    }
}

if (isNull(allPartitionInfoList)) {
Use a normal value == null check. This method should be reserved for usage in streams, etc.
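A short sketch of the distinction (illustrative names, not code from this PR): a plain equality check in ordinary control flow, with Objects::isNull reserved for method references in streams.

```java
import java.util.Objects;
import java.util.stream.Stream;

// Illustrative sketch: value == null in ordinary code, Objects::isNull
// only as a method reference inside streams.
public class NullCheckStyle
{
    static String describe(Object value)
    {
        if (value == null) {
            return "absent";
        }
        return value.toString();
    }

    static long countNulls(Stream<?> values)
    {
        return values.filter(Objects::isNull).count();
    }
}
```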
}

@Config("hudi.skip-metastore-for-partition")
@ConfigDescription("By default, partition info is fetched from the metastore. " +
How would this custom extractor be used in Trino? We don't want users to have to add custom code.
This config property needs to be covered by integration tests, so that we test both code paths. If it's too hard to test, then let's remove it for now.
return this.useMetastoreForPartitions;
}

@Config("hudi.parquet.use-column-names")
We need an integration test covering this option.
return this;
}

@Config("hudi.metadata-enabled")
Why is this disabled by default? When would it be safe (or not be safe) to enable?
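For context, a hedged sketch of how such a flag would appear in a catalog file. The hudi.metadata-enabled name comes from the review excerpt above; the other property values (host, port) are illustrative assumptions, not taken from this PR.

```properties
# etc/catalog/hudi.properties (illustrative sketch)
connector.name=hudi
hive.metastore.uri=thrift://metastore.example.com:9083
# Opt in to reading file listings from the Hudi metadata table;
# per the discussion above, this is disabled by default.
hudi.metadata-enabled=true
```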
private void loadPartitionInfoFromHiveMetastore()
{
    Optional<Partition> partition = hiveMetastore.getPartition(table, HiveUtil.toPartitionValues(hivePartitionName));
No need for this local variable
{
    private HudiPartitionInfoFactory() {}

    public static HudiPartitionInfo buildHudiPartitionInfo(
Inline this method into the callers. The way it mixes the boolean useMetastoreForPartitions with Optional parameters is confusing.
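A hypothetical sketch of the shape such an inlining can take (all names invented for illustration, not the PR's actual classes): each call site constructs the concrete variant it needs, so no factory mixing a boolean flag with Optional parameters is required.

```java
// Hypothetical sketch: two concrete partition-info variants constructed
// directly at the call sites, replacing a boolean-plus-Optional factory.
public class PartitionInfoSketch
{
    interface PartitionInfo
    {
        String source();
    }

    static class MetastorePartitionInfo
            implements PartitionInfo
    {
        private final String hivePartitionName;

        MetastorePartitionInfo(String hivePartitionName)
        {
            this.hivePartitionName = hivePartitionName;
        }

        @Override
        public String source()
        {
            return "metastore:" + hivePartitionName;
        }
    }

    static class TimelinePartitionInfo
            implements PartitionInfo
    {
        private final String relativePartitionPath;

        TimelinePartitionInfo(String relativePartitionPath)
        {
            this.relativePartitionPath = relativePartitionPath;
        }

        @Override
        public String source()
        {
            return "timeline:" + relativePartitionPath;
        }
    }
}
```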
@Override
public String toString()
{
    StringBuilder stringBuilder = new StringBuilder();
Use ToStringHelper instead.
import static java.lang.String.format;

public final class HudiFileListerFactory
Remove this for now, since it's only used in one place and is just forwarding to a single implementation.
{
    private static final Logger log = Logger.get(HudiReadOptimizedFileLister.class);

    final HoodieMetadataConfig metadataConfig;
Make all these fields private
@electrum @dain @findinpath I've addressed your comments. There are just two open items right now:
For the first item, I would suggest going with the current workaround of
EDIT: Adding the corresponding tickets from Hudi JIRAs: HUDI-4687 and HUDI-4789
At Robinhood, we have been waiting for this Hudi connector to land for more than 6 months now. It will help us use Trino better with Hudi tables. Can we kindly prioritize landing this PR?
Hi, I am eager to use snapshot queries on MOR tables. Could you please point me to the code needed to support this feature?
@@ -72,6 +73,8 @@
public class HudiMetadata
        implements ConnectorMetadata
{
    public static final Logger log = Logger.get(HudiMetadata.class);
How is the removal of the warning related to removing the dependency towards hudi-hive-sync?
public abstract class HudiPageSourceCreator
{
    protected final HudiConfig hudiConfig;
Note that the hudiConfig is nowhere used. In any case, config classes shouldn't be saved; instead, only the required configuration values should be retrieved from them and saved.
The fixup commits look good. Can you squash them into the Add Hudi Connector commit?
I believe the last remaining question, and my biggest concern, is around integration tests against Spark. We do these as product tests for the Iceberg connector. Do you have plans to work on this as a follow up?
ConnectorPageSource dataPageSource = pageSourceBuilderMap.get(hudiFileFormat).createPageSource(configuration, session, regularColumns, split);
TrinoFileSystem fileSystem = fileSystemFactory.create(session);
TrinoInputFile inputFile = fileSystem.newInputFile(path.toString(), split.getFileSize());
ConnectorPageSource dataPageSource = pageSourceBuilderMap.get(hudiFileFormat).createPageSource(configuration, session, regularColumns, split, inputFile);
Why does this still need Configuration?
        HdfsEnvironment hdfsEnvironment,
        FileFormatDataSourceStats fileFormatDataSourceStats,
        ParquetReaderConfig parquetReaderConfig,
        HudiConfig hudiConfig)
{
    this.fileSystemFactory = requireNonNull(fileSystemFactory, "fileSystemFactory is null");
    this.hdfsEnvironment = requireNonNull(hdfsEnvironment, "hdfsEnvironment is null");
Do we still need to pass HdfsEnvironment to HudiParquetPageSourceCreator, or can that be removed now?
Both Configuration and HdfsEnvironment can be removed now.
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com> Co-authored-by: Todd Gao <gaoguantao@bytedance.com>
Done. All other comments are addressed too. Squashed them into the main commit.
Absolutely! I did start on this some time back. I have a PR for the Hudi Spark3 image. I will pick up the product tests in a follow-up to this PR.
Also, the PR is not dependent on FileSystem anymore. Replaced all usages of FileSystem with
@codope any issue with bumping to 0.12 like prestodb/presto@9b36723?
@tooptoop4 We're going to release 0.12.1 very soon. So, I'll directly upgrade to 0.12.1.
The fix is in
This PR adds a new connector for Hudi (Issue #9877). For new Hudi features within the existing Hive connector, please take a look at #9641. In order to avoid making core changes in the Hive connector, we wrote a new connector as suggested in #9641. The design of the connector can be viewed here.