
Add Hudi connector #10228

Merged · 4 commits into trinodb:master · Sep 27, 2022

Conversation

codope
Contributor

@codope codope commented Dec 8, 2021

This PR adds a new connector for Hudi (issue #9877). For new Hudi features within the existing Hive connector, please take a look at #9641. To avoid making core changes in the Hive connector, we wrote a new connector, as suggested in #9641. The design of the connector can be viewed here.

@cla-bot cla-bot bot added the cla-signed label Dec 8, 2021
@feiwei8586

I want to know: is the "codope:hudi-plugin" branch running OK on hive-rt tables?

@codope
Contributor Author

codope commented Dec 9, 2021

I want to know: is the "codope:hudi-plugin" branch running OK on hive-rt tables?

@feiwei8586 Thanks for your interest in the new connector. It is not ready for hive-rt tables yet. The plan in the first phase is to get copy-on-write tables ready.

@feiwei8586

Thanks for your answer. Our difficulty is querying the hive-rt table. Please consider supporting it.

@super-sponge

I debugged this in IDEA and got an error:

etc/catalog/hudi.properties
connector.name=hudi

2021-12-10T17:55:08.296+0800 INFO main io.trino.metadata.StaticCatalogStore -- Loading catalog etc/catalog/hudi.properties --
2021-12-10T17:55:08.332+0800 ERROR main io.trino.server.Server java.lang.NoSuchMethodException: io.trino.plugin.hudi.InternalHudiConnectorFactory.createConnector(java.lang.String, java.util.Map, io.trino.spi.connector.ConnectorContext, com.google.inject.Module)
java.lang.RuntimeException: java.lang.NoSuchMethodException: io.trino.plugin.hudi.InternalHudiConnectorFactory.createConnector(java.lang.String, java.util.Map, io.trino.spi.connector.ConnectorContext, com.google.inject.Module)
at io.trino.plugin.hudi.HudiConnectorFactory.create(HudiConnectorFactory.java:79)
at io.trino.connector.ConnectorManager.createConnector(ConnectorManager.java:375)
at io.trino.connector.ConnectorManager.createCatalog(ConnectorManager.java:219)
at io.trino.connector.ConnectorManager.createCatalog(ConnectorManager.java:211)
at io.trino.connector.ConnectorManager.createCatalog(ConnectorManager.java:197)
at io.trino.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:88)
at io.trino.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at io.trino.server.Server.doStart(Server.java:125)
at io.trino.server.Server.lambda$start$0(Server.java:78)
at io.trino.$gen.Trino_dev____20211210_095357_1.run(Unknown Source)
at io.trino.server.Server.start(Server.java:78)
at io.trino.server.DevelopmentServer.main(DevelopmentServer.java:41)
Caused by: java.lang.NoSuchMethodException: io.trino.plugin.hudi.InternalHudiConnectorFactory.createConnector(java.lang.String, java.util.Map, io.trino.spi.connector.ConnectorContext, com.google.inject.Module)
at java.base/java.lang.Class.getMethod(Class.java:2108)
at io.trino.plugin.hudi.HudiConnectorFactory.create(HudiConnectorFactory.java:70)
... 11 more

@codope
Contributor Author

codope commented Dec 14, 2021

@super-sponge Can you pull the latest changes and try again? You will have to place the hbase-common (version 1.2.3) jar on the classpath (inside <trino_install>/plugin/hudi/). I will remove this manual step as well, since we now have a separate hudi-trino-bundle.

cc @caneGuy

@super-sponge

@super-sponge Can you pull the latest changes and try again? You will have to place the hbase-common (version 1.2.3) jar on the classpath (inside <trino_install>/plugin/hudi/). I will remove this manual step as well, since we now have a separate hudi-trino-bundle.

cc @caneGuy

thanks!

@yihua
Member

yihua commented Dec 17, 2021

@alexeykudinkin

@codope codope changed the title [WIP] Add Hudi plugin Add Hudi connector Jan 27, 2022
@codope codope marked this pull request as ready for review January 27, 2022 14:34
@codope
Contributor Author

codope commented Jan 28, 2022

@findepi This PR is ready for review. Could you please approve the workflows?
cc @vinothchandar

@mxdzs0612

I hit an error when selecting from a COW Hudi table with the newest version of this PR, but it runs well with the older version. Do you have any idea about it?

2022-02-15T17:16:43.942+0800 INFO pool-60-thread-1 org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based view for basePath hdfs://xxxxx
2022-02-15T17:16:43.942+0800 INFO pool-60-thread-1 org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 1 ms to read 0 instants, 0 replaced file groups
2022-02-15T17:16:44.461+0800 WARN dispatcher-query-1 io.trino.execution.scheduler.SqlQueryScheduler Error closing split source
java.lang.NullPointerException
at io.trino.plugin.hudi.query.HudiFileListing.close(HudiFileListing.java:81)
at io.trino.plugin.hudi.HudiSplitSource.close(HudiSplitSource.java:130)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.close(ClassLoaderSafeConnectorSplitSource.java:61)
at io.trino.split.ConnectorAwareSplitSource.close(ConnectorAwareSplitSource.java:68)
at io.trino.split.BufferingSplitSource.close(BufferingSplitSource.java:60)
at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler.closeSplitSources(SqlQueryScheduler.java:1448)
at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler$1.stateChanged(SqlQueryScheduler.java:1329)
at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler$1.stateChanged(SqlQueryScheduler.java:1319)
at io.trino.execution.StateMachine.fireStateChangedListener(StateMachine.java:241)
at io.trino.execution.StateMachine.lambda$fireStateChanged$0(StateMachine.java:233)
at io.trino.$gen.Trino_359_2247_g872830c____20220215_091452_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

2022-02-15T17:16:44.470+0800 ERROR query-execution-0 io.trino.execution.scheduler.SqlQueryScheduler Failure in distributed stage for query 20220215_091643_00002_y7wg5
java.lang.NullPointerException
at io.trino.plugin.hudi.query.HudiFileListing.close(HudiFileListing.java:81)
at io.trino.plugin.hudi.HudiSplitSource.close(HudiSplitSource.java:130)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.close(ClassLoaderSafeConnectorSplitSource.java:61)
at io.trino.split.ConnectorAwareSplitSource.close(ConnectorAwareSplitSource.java:68)
at io.trino.split.BufferingSplitSource.close(BufferingSplitSource.java:60)
at io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:390)
at io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:182)
at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler.schedule(SqlQueryScheduler.java:1553)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

@codope
Contributor Author

codope commented Feb 18, 2022

Thanks for your answer. Our difficulty is querying the hive-rt table. Please consider supporting it.

@feiwei8586 We will add support for _rt tables soon. The goal is to get this PR landed and keep building incrementally. We are tracking _rt support in https://issues.apache.org/jira/browse/HUDI-2687

@codope
Contributor Author

codope commented Feb 18, 2022

I hit an error when selecting from a COW Hudi table with the newest version of this PR, but it runs well with the older version. Do you have any idea about it?

@mxdzs0612 Thanks for trying it out. I did query COW and MOR _ro tables before pushing the changes. The last commit was mostly refactoring rather than any logic change. I'll debug your error and get back to you by Wednesday.

@codope
Contributor Author

codope commented Feb 18, 2022

@martint @findepi would love to get some feedback on this diff.
cc @caneGuy @hashhar

Member

@losipiuk losipiuk left a comment

Initial set of comments, based on skimming a subset of the classes.

@codope
Contributor Author

codope commented Mar 28, 2022

@losipiuk Thanks for reviewing the PR. I'll address the comments by the end of this week, as we're code-freezing Hudi this week in preparation for the next release. A few things have changed since I last rebased. Thanks for the tips in the comments.

@codope
Contributor Author

codope commented Apr 2, 2022

@losipiuk I've addressed all but one comment. I need clarification on this comment in HudiSplitSource: #10228 (comment)

instead of sleeping here, you should return an uncompleted future and fill it in with data later on, when you have enough splits available

For this, I will have to use an AsyncQueue, as is done in HiveSplitSource, and return toCompletableFuture(), right? Even with that, won't we need to provide some timeout, or is that handled internally?

@losipiuk
Member

losipiuk commented Apr 4, 2022

@losipiuk I've addressed all but one comment. I need clarification on this comment in HudiSplitSource: #10228 (comment)

instead of sleeping here, you should return an uncompleted future and fill it in with data later on, when you have enough splits available

For this, I will have to use an AsyncQueue, as is done in HiveSplitSource, and return toCompletableFuture(), right? Even with that, won't we need to provide some timeout, or is that handled internally?

Yeah - you can follow the pattern with AsyncQueue. You still need to handle error situations: when the thread that is filling up the queue notices an erroneous situation (e.g. a timeout), it should mark it somewhere (in a field in HudiSplitSource). When you mark an error situation, you have to call queue.finish() to unblock the future returned from getNextBatch. getNextBatch should throw if there is an error mark.

PTAL at DeltaLakeSplitSource for a good and still simple application of the pattern.
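
To make the suggested pattern concrete, here is a minimal sketch modeled loosely on DeltaLakeSplitSource. The class shape, field names, and the simplified noMoreSplits logic are illustrative assumptions, not the code that was ultimately merged:

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import io.trino.plugin.hive.util.AsyncQueue;
import io.trino.spi.TrinoException;
import io.trino.spi.connector.ConnectorSplit;
import io.trino.spi.connector.ConnectorSplitSource.ConnectorSplitBatch;

import java.util.List;
import java.util.concurrent.CompletableFuture;

import static com.google.common.util.concurrent.MoreExecutors.directExecutor;
import static io.airlift.concurrent.MoreFutures.toCompletableFuture;

class HudiSplitSourceSketch
{
    private final AsyncQueue<ConnectorSplit> queue;
    // Set by the background split-loading thread when it hits an error (e.g. a timeout)
    private volatile TrinoException error;

    HudiSplitSourceSketch(AsyncQueue<ConnectorSplit> queue)
    {
        this.queue = queue;
    }

    // Consumer side: return an uncompleted future; AsyncQueue completes it once
    // splits are available, or immediately once finish() has been called
    public CompletableFuture<ConnectorSplitBatch> getNextBatch(int maxSize)
    {
        ListenableFuture<List<ConnectorSplit>> batch = queue.getBatchAsync(maxSize);
        return toCompletableFuture(Futures.transform(batch, splits -> {
            if (error != null) {
                throw error; // surface the producer-side failure to the engine
            }
            // noMoreSplits is simplified here; real code tracks loader completion
            return new ConnectorSplitBatch(splits, queue.isFinished() && splits.isEmpty());
        }, directExecutor()));
    }

    // Producer side: on failure, record the error mark first, then finish the
    // queue to unblock any future already handed out by getNextBatch
    void fail(TrinoException exception)
    {
        error = exception;
        queue.finish();
    }
}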

Member

@losipiuk losipiuk left a comment

Thanks for addressing them; some more comments.
Ultimately, we need a review from @electrum, I think.

@cla-bot

cla-bot bot commented Apr 19, 2022

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@codope
Contributor Author

codope commented Aug 29, 2022

@electrum @findinpath thanks for testing and going over the PR again. Regarding the reflective access issue, we plan to fix it in the next patch release of Hudi (0.12.1). It is being tracked in https://issues.apache.org/jira/browse/HUDI-4687
I will address all other comments this week.

@codope
Contributor Author

codope commented Aug 30, 2022

@findinpath Regarding #10228 (comment) and #10228 (comment), I have fixed the issue. Regarding the issue with the MOR table (#10228 (comment)), it is expected, as the connector does not have snapshot query support for MOR tables yet. Once compaction/cleaning kicks in, new base files will be produced and the results will show the latest records. There is already a ticket to track support for snapshot queries.

@dain
Member

dain commented Aug 30, 2022

Regarding the above, I added the following to the trino-hudi module only.

Please just update the Hudi client so it does not trigger the reflection exceptions, and do a patch release. I don't think we should add stuff that needs these kinds of exceptions.

private static final Splitter COMMA_SPLITTER = Splitter.on(",").omitEmptyStrings().trimResults();

private String baseFileFormat = PARQUET.name();
private List<String> columnsToHide = ImmutableList.of();
Member

Ya, that is not how this is done in Trino. These should be implemented like the other connectors.
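
For reference, the airlift config pattern used across Trino connectors pairs a bean-style getter with a @Config-annotated setter. A minimal sketch (the baseFileFormat default comes from the PR excerpt above; the property name and description are assumptions):

import io.airlift.configuration.Config;
import io.airlift.configuration.ConfigDescription;
import org.apache.hudi.common.model.HoodieFileFormat;

import javax.validation.constraints.NotNull;

public class HudiConfig
{
    private HoodieFileFormat baseFileFormat = HoodieFileFormat.PARQUET;

    @NotNull
    public HoodieFileFormat getBaseFileFormat()
    {
        return baseFileFormat;
    }

    @Config("hudi.base-file-format")
    @ConfigDescription("Base file format of the Hudi table")
    public HudiConfig setBaseFileFormat(HoodieFileFormat baseFileFormat)
    {
        this.baseFileFormat = baseFileFormat;
        return this;
    }
}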


import java.util.List;

public abstract class HudiPageSourceCreator
Member

This abstract class only has one implementation. Please remove it (and any other base class like this). If we need the abstraction in the future, it can be added then and tuned to the actual usage. Also, this seems to really just be an interface, and the Trino code base does not use this kind of interface-as-abstract-class pattern... instead, simply add these 4 fields to any implementation, and if you need an interface, add a plain old interface.
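
In other words, something shaped like the following plain interface. The parameter types are guessed from the excerpts elsewhere in this review, so treat it as a hypothetical sketch:

import io.trino.plugin.hive.HiveColumnHandle;
import io.trino.spi.connector.ConnectorPageSource;
import io.trino.spi.connector.ConnectorSession;

import java.util.List;

// A plain interface instead of an abstract base class; shared fields live in
// each implementation (or in the single call site, if only one exists)
public interface HudiPageSourceCreator
{
    ConnectorPageSource createPageSource(
            ConnectorSession session,
            List<HiveColumnHandle> regularColumns,
            HudiSplit split); // HudiSplit is the connector's split type from this PR
}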

Contributor

@codope please include the code from HudiParquetPageSourceCreator directly in HudiPageSourceProvider

tableName.getSchemaName(),
tableName.getTableName(),
table.get().getStorage().getLocation(),
HoodieTableType.COPY_ON_WRITE,
Member

When you need this value, you can simply add the field to the Handle. It is better to wait until something is actually needed before adding it.

HiveMetastore metastore = metastoreProvider.apply(session.getIdentity(), (HiveTransactionHandle) transaction);
Table table = metastore.getTable(hudiTableHandle.getSchemaName(), hudiTableHandle.getTableName())
.orElseThrow(() -> new TableNotFoundException(schemaTableName(hudiTableHandle.getSchemaName(), hudiTableHandle.getTableName())));
final FileSystem fs;
Member

Don't use final on local variables

Member

Also, please don't use abbreviations like this; instead just use filesystem


public HudiBackgroundSplitLoader(
ConnectorSession session,
FileSystem fs,
Member

no abbreviations

private final int minPartitionBatchSize;
private final int maxPartitionBatchSize;
private final Deque<HudiPartitionInfo> partitionQueue;
private int currBatchSize;
Member

No abbreviations. Can you scan through this PR and fix any places that have abbreviations like this?

@Override
public String getHivePartitionName()
{
throw new HoodieException(
Member

No need to wrap. Please fix all of the excessive wrapping in this class

final boolean useMetastoreForPartitions;
final HoodieTableFileSystemView fileSystemView;
final TupleDomain<String> partitionKeysFilter;
final List<Column> partitionColumns;
Member

Mark all of these private

.filter(partitionInfo -> partitionInfo.getHivePartitionKeys().isEmpty() || partitionInfo.doesMatchPredicates())
.collect(Collectors.toList());

log.debug("Get partitions to scan in %d ms (useMetastoreForPartitions: %s): %s", timer.endTimer(), useMetastoreForPartitions, filteredPartitionInfoList);
Member

This seems like excessive logging. If you want to collect stats, please add some @Managed stats
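
For context, a sketch of what @Managed stats typically look like in connector code, using airlift's TimeStat. The class and metric names here are invented for illustration:

import io.airlift.stats.TimeStat;
import org.weakref.jmx.Managed;
import org.weakref.jmx.Nested;

import java.util.List;
import java.util.function.Supplier;

import static java.util.concurrent.TimeUnit.MILLISECONDS;

public class HudiPartitionScanStats
{
    private final TimeStat partitionScanTime = new TimeStat(MILLISECONDS);

    @Managed
    @Nested
    public TimeStat getPartitionScanTime()
    {
        return partitionScanTime;
    }

    // Wrap the scan so its duration feeds the JMX stat instead of a debug log
    public <T> List<T> recordScan(Supplier<List<T>> scan)
    {
        try (TimeStat.BlockTimer ignored = partitionScanTime.time()) {
            return scan.get();
        }
    }
}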

hiveMetastore);
String relativePartitionPath = partitionHiveInfo.getRelativePartitionPath();
List<String> partitionValues = partitionHiveInfo.getHivePartitionKeys().stream()
.map(HivePartitionKey::getValue).collect(Collectors.toList());
Member

Put collect on a separate line. Generally, if you chop down a statement like this, do it for each step. If it is a short one-liner, then it is fine to keep everything on one line.
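
A toy illustration of the chopping convention (not the PR's actual code):

import java.util.List;
import java.util.Locale;

import static java.util.stream.Collectors.toList;

class ChoppingExample
{
    static List<String> shortOneLiner(List<String> keys)
    {
        // Short enough: keeping everything on one line is fine
        return keys.stream().map(String::trim).collect(toList());
    }

    static List<String> choppedDown(List<String> keys)
    {
        // Once chopped, every step (including collect) gets its own line
        return keys.stream()
                .map(String::trim)
                .map(key -> key.toLowerCase(Locale.ROOT))
                .collect(toList());
    }
}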


package io.trino.plugin.hudi.query;

public enum HudiQueryMode {
Member

It looks like this is not used anywhere (the only usage is hard-coded to READ_OPTIMIZED), so please remove it for now. We can add it back later if needed.

* See the License for the specific language governing permissions and
* limitations under the License.
*/

Member

Please remove the blank line between the license header and the package statement, in order to match the convention used in all other files in Trino.

: TimelineUtils.getPartitionsWritten(metaClient.getActiveTimeline());
allPartitionInfoList = relativePartitionPathList.stream()
.map(relativePartitionPath ->
buildHudiPartitionInfo(
Member

It would be better to inline this method, since the useMetastoreForPartitions field is a constant here.

}
}

if (isNull(allPartitionInfoList)) {
Member

Use a normal value == null check. This method should be reserved for usage in streams, etc.
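
A small illustration of the distinction (hypothetical values):

import java.util.Objects;
import java.util.stream.Stream;

class NullCheckExample
{
    static void example(Object allPartitionInfoList)
    {
        // Plain conditional: prefer the == null comparison
        if (allPartitionInfoList == null) {
            System.out.println("partitions not loaded yet");
        }

        // isNull/nonNull shine as method references in streams
        long present = Stream.of("a", null, "b")
                .filter(Objects::nonNull)
                .count();
        System.out.println(present); // prints 2
    }
}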

}

@Config("hudi.skip-metastore-for-partition")
@ConfigDescription("By default, partition info is fetched from the metastore. " +
Member

How would this custom extractor be used in Trino? We don't want users to have to add custom code.

This config property needs to be covered by integration tests, so that we test both code paths. If it's too hard to test, then let's remove it for now.

return this.useMetastoreForPartitions;
}

@Config("hudi.parquet.use-column-names")
Member

We need an integration test covering this option.

return this;
}

@Config("hudi.metadata-enabled")
Member

Why is this disabled by default? When would it be safe (or not be safe) to enable?


private void loadPartitionInfoFromHiveMetastore()
{
Optional<Partition> partition = hiveMetastore.getPartition(table, HiveUtil.toPartitionValues(hivePartitionName));
Member

No need for this local variable

{
private HudiPartitionInfoFactory() {}

public static HudiPartitionInfo buildHudiPartitionInfo(
Member

Inline this method into the callers. The way it mixes the boolean useMetastoreForPartitions with Optional parameters is confusing.

@Override
public String toString()
{
StringBuilder stringBuilder = new StringBuilder();
Member

ToStringHelper
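
That is, Guava's MoreObjects.toStringHelper; a sketch with field names borrowed from the excerpts above:

import static com.google.common.base.MoreObjects.toStringHelper;

@Override
public String toString()
{
    return toStringHelper(this)
            .add("hivePartitionName", hivePartitionName)
            .add("relativePartitionPath", relativePartitionPath)
            .toString();
}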


import static java.lang.String.format;

public final class HudiFileListerFactory
Member

Remove this for now, since it's only used in one place and is just forwarding to a single implementation.

{
private static final Logger log = Logger.get(HudiReadOptimizedFileLister.class);

final HoodieMetadataConfig metadataConfig;
Member

Make all these fields private

@codope
Contributor Author

codope commented Sep 5, 2022

@electrum @dain @findinpath I've addressed your comments. There are just two open items right now:

  1. Add Hudi connector #10228 (comment) - blocked reflective access since JDK 17.
  2. Add Hudi connector #10228 (comment) - convert filesystem usage in hudi connector to use the TrinoFileSystem interface.

For the first item, I would suggest going with the current workaround of --add-opens for the time being. We will fix this in the Hudi 0.12.1 patch release (expected in late September or early October). I'll then upgrade the Hudi version and remove the workaround code from Trino.
For the second item, I looked at the corresponding changes for the Iceberg and Delta Lake connectors. It requires a deeper dive. Since the connector is a pressing ask in the community, I'd suggest the following way forward:

  1. Create a separate Github issue to track this and other open items in future.
  2. Merge the current PR after other minor comments are addressed.
  3. Do not announce GA of the Hudi connector until this issue is resolved.

Thoughts?

EDIT: Adding the corresponding tickets from Hudi's JIRA: HUDI-4687 and HUDI-4789.

@tooptoop4 tooptoop4 mentioned this pull request Sep 13, 2022
@bvaradar

At Robinhood, we have been waiting for this Hudi connector to land for more than 6 months now. It will help us use Trino better with Hudi tables. Can we kindly prioritize landing this PR?

@duanyongvictory

Hi, I am eager to use snapshot queries for MOR tables. Could you please point me to the code that supports this feature?
I could not find the snapshot code in this pull request right now, so I can test it myself.
Thanks a lot.

@@ -72,6 +73,8 @@
public class HudiMetadata
implements ConnectorMetadata
{
public static final Logger log = Logger.get(HudiMetadata.class);
Contributor

How is the removal of the warning related to removing the dependency on hudi-hive-sync?


public abstract class HudiPageSourceCreator
{
protected final HudiConfig hudiConfig;
Contributor

Note that hudiConfig is not used anywhere.
In any case, config classes shouldn't be stored; instead, only the required configuration values should be retrieved and stored.

Member

@electrum electrum left a comment

The fixup commits look good. Can you squash them into the Add Hudi Connector commit?

I believe the last remaining question, and my biggest concern, is around integration tests against Spark. We do these as product tests for the Iceberg connector. Do you have plans to work on this as a follow up?

ConnectorPageSource dataPageSource = pageSourceBuilderMap.get(hudiFileFormat).createPageSource(configuration, session, regularColumns, split);
TrinoFileSystem fileSystem = fileSystemFactory.create(session);
TrinoInputFile inputFile = fileSystem.newInputFile(path.toString(), split.getFileSize());
ConnectorPageSource dataPageSource = pageSourceBuilderMap.get(hudiFileFormat).createPageSource(configuration, session, regularColumns, split, inputFile);
Member

Why does this still need Configuration?

HdfsEnvironment hdfsEnvironment,
FileFormatDataSourceStats fileFormatDataSourceStats,
ParquetReaderConfig parquetReaderConfig,
HudiConfig hudiConfig)
{
this.fileSystemFactory = requireNonNull(fileSystemFactory, "fileSystemFactory is null");
this.hdfsEnvironment = requireNonNull(hdfsEnvironment, "hdfsEnvironment is null");
Member

Do we still need to pass HdfsEnvironment to HudiParquetPageSourceCreator, or can that be removed now?

Contributor Author

Both Configuration and HdfsEnvironment can be removed now.

Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Todd Gao <gaoguantao@bytedance.com>
@codope
Contributor Author

codope commented Sep 26, 2022

The fixup commits look good. Can you squash them into the Add Hudi Connector commit?

Done. All other comments are addressed too. Squashed them into the main commit.

I believe the last remaining question, and my biggest concern, is around integration tests against Spark. We do these as product tests for the Iceberg connector. Do you have plans to work on this as a follow up?

Absolutely! I did start on this some time back. I have a PR for a Hudi Spark3 image. I will pick up the product tests in a follow-up to this PR.

@codope
Contributor Author

codope commented Sep 26, 2022

Also, the PR is not dependent on FileSystem anymore. Replaced all usages of FileSystem with TrinoFileSystem.

@tooptoop4
Contributor

@codope any issue with bumping to 0.12 like prestodb/presto@9b36723?

@codope
Contributor Author

codope commented Sep 26, 2022

@tooptoop4 We're going to release 0.12.1 very soon. So, I'll directly upgrade to 0.12.1.

@electrum electrum merged commit 97882fb into trinodb:master Sep 27, 2022
@github-actions github-actions bot added this to the 398 milestone Sep 27, 2022
@codope
Contributor Author

codope commented Sep 29, 2022

Would you mind telling me which part of the code solves this bug? I backported this PR late in July and hit this problem. I want to post a hotfix for it; could you help with it?

The fix is in HudiPageSourceProvider#convertPartitionValues
