Hudi Connector #9877

Open
7 of 14 tasks
caneGuy opened this issue Nov 5, 2021 · 17 comments
Labels
enhancement New feature or request hudi Hudi connector roadmap Top level issues for major efforts in the project

Comments

@caneGuy
Contributor

caneGuy commented Nov 5, 2021

As discussed on Slack and in #9641, we need a new, separate connector for Hudi. This is the parent issue; we can add discussion and TODOs here.

cc @vinothchandar @findepi @hashhar @bvaradar @codope @martint @mxdzs0612

@caneGuy caneGuy added the enhancement New feature or request label Nov 5, 2021
@codope
Contributor

codope commented Nov 5, 2021

@caneGuy I have opened an umbrella JIRA to track several tasks related to the new connector. Feel free to add more subtasks. Once we have concretely scoped out the design, let's update this issue.

Right now I am working on a few higher-priority tasks related to the next Hudi release. I'll pick up the connector work again in the last week of November. Meanwhile, I suggest joining the Hudi Slack if you're not already there. I am "sagar sumit" on Slack. We can discuss more over there.

@caneGuy
Contributor Author

caneGuy commented Nov 5, 2021

Great, @codope, I will join Slack now!

@caneGuy
Contributor Author

caneGuy commented Jan 5, 2022

@codope, do you have a POC for MOR tables that uses the common API instead of the InputFormat?

@vinothchandar

We don't yet, but it will come shortly after we get the MOR/RO support landed. Are you interested in taking a swing at that? We expect to open the official PR early next week.

@ebyhr ebyhr added the roadmap Top level issues for major efforts in the project label Aug 18, 2022
@duanyongvictory

duanyongvictory commented Sep 13, 2022

As discussed on Slack and in #9641, we need a new, separate connector for Hudi. This is the parent issue; we can add discussion and TODOs here.

  • add Hudi connector

  • testing

    • typical unit tests like TestHudiPlugin
    • BaseConnectorTest
    • product tests against supported Hive version(s)
  • documentation: type mapping, supported Hive versions, Glue?

  • Support querying COW tables

  • Support querying MOR tables

  • Support time travel queries

  • Glue support?

cc @vinothchandar @findepi @hashhar @bvaradar @codope @martint @mxdzs0612

Hi, I am eager to use snapshot queries on MOR tables. Could you please point me to the code that supports this feature so that I can test it myself?
Thanks a lot.

@tooptoop4
Contributor

@duanyongvictory see #10228

some other todo items are:

@duanyongvictory

@duanyongvictory see #10228

some other todo items are:

Thanks for your reply.
I have already checked #10228 and its code, but I found no snapshot query support. Could you please give me a pointer to the relevant code so that I can read it myself?

As for the DDL/DML you mentioned: in my case only DQL is used; we do DML/DDL through Spark.

Thanks a lot. It would be a great help if you could point me to the source file for snapshot queries.

@duanyongvictory

@duanyongvictory see #10228
some other todo items are:

Thanks for your reply. I have already checked #10228 and its code, but I found no snapshot query support. Could you please give me a pointer to the relevant code so that I can read it myself?

As for the DDL/DML you mentioned: in my case only DQL is used; we do DML/DDL through Spark.

Thanks a lot. It would be a great help if you could point me to the source file for snapshot queries.

@tooptoop4
Sorry to bother you again.
Do you have any pointers for the question above? Thanks.

@codope
Contributor

codope commented Sep 14, 2022

@duanyongvictory the MOR snapshot query support is planned in the next quarter once the base PR merges. https://issues.apache.org/jira/browse/HUDI-2740

Would you like to contribute to this feature? I can share details of what changes would be required.

@duanyongvictory

@duanyongvictory the MOR snapshot query support is planned in the next quarter once the base PR merges. https://issues.apache.org/jira/browse/HUDI-2740

Would you like to contribute to this feature? I can share details of what changes would be required.

I would be happy to.
What changes do I need to know about? I also want to contribute.
These days I have been studying snapshot queries in PrestoDB; why is PrestoDB ahead here?

@codope
Contributor

codope commented Sep 22, 2022

@duanyongvictory Great to hear that! Your contribution will help a lot in accelerating the write support. We're discussing this in the Hudi Slack at https://apache-hudi.slack.com/archives/C02L9R88RJP/p1663729133329139
Let's take the discussion there.

@duanyongvictory

duanyongvictory commented Sep 26, 2022

@duanyongvictory Great to hear that! Your contribution will help a lot in accelerating the write support. We're discussing in Hudi slack in https://apache-hudi.slack.com/archives/C02L9R88RJP/p1663729133329139 Let's take the discussion there.

Could you invite me to this workspace?
My Google account is not associated with this workspace.
My Google account is duanyong0079@gmail.com.
Thanks.

@codope
Contributor

codope commented Sep 26, 2022

@duanyongvictory I've sent an invite to your email. Please check.

@electrum electrum added the hudi Hudi connector label Sep 27, 2022
@zorrofox
Contributor

Do we have any updates for Glue support?

@ebyhr
Member

ebyhr commented Jan 18, 2023

I don't think we have.

@MateusCastello

Do we have any updates for Glue support?
@zorrofox

I'm actually able to fetch data from Glue-managed tables using "hive.metastore=glue" and passing the correct AWS role using "hive.metastore.glue.iam-role", but only when I set retry-policy to NONE, which disables fault-tolerant query execution.

When retry-policy is set to "TASK", I get a Java error saying that this connector does not provide memory accounting capabilities for ConnectorSplit:

java.lang.UnsupportedOperationException: This connector does not provide memory accounting capabilities for ConnectorSplit

at io.trino.spi.connector.ConnectorSplit.getRetainedSizeInBytes(ConnectorSplit.java:43)
at io.trino.metadata.Split.getRetainedSizeInBytes(Split.java:91)
at io.airlift.slice.SizeOf.estimatedSizeOf(SizeOf.java:194)
at io.trino.execution.scheduler.TaskDescriptor.lambda$getRetainedSizeInBytes$0(TaskDescriptor.java:99)
at io.airlift.slice.SizeOf.estimatedSizeOf(SizeOf.java:222)
at io.trino.execution.scheduler.TaskDescriptor.getRetainedSizeInBytes(TaskDescriptor.java:99)
at io.trino.execution.scheduler.TaskDescriptorStorage$TaskDescriptors.put(TaskDescriptorStorage.java:187)
at io.trino.execution.scheduler.TaskDescriptorStorage.put(TaskDescriptorStorage.java:89)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$StagePartition.seal(EventDrivenFaultTolerantQueryScheduler.java:1635)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$StageExecution.sealPartition(EventDrivenFaultTolerantQueryScheduler.java:1244)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$Scheduler.lambda$onPartitionsSealed$12(EventDrivenFaultTolerantQueryScheduler.java:1050)
at com.google.common.primitives.ImmutableIntArray.forEach(ImmutableIntArray.java:414)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$Scheduler.onPartitionsSealed(EventDrivenFaultTolerantQueryScheduler.java:1049)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$PartitionsSealedEvent.accept(EventDrivenFaultTolerantQueryScheduler.java:2066)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$Scheduler.processEvents(EventDrivenFaultTolerantQueryScheduler.java:624)
at io.trino.execution.scheduler.EventDrivenFaultTolerantQueryScheduler$Scheduler.run(EventDrivenFaultTolerantQueryScheduler.java:555)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)

@codope
Contributor

codope commented Jun 20, 2023

@caneGuy @ebyhr Can you please edit the description of this issue and add the following TODOs?
#17976
#17977

9 participants