[SUPPORT] Speeding up S3 Multipart Uploads using Hudi on AWS EMR with EMRFS #59
Comments
Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds.
Yes, it is possible to speed up S3 Multipart Uploads when using Hudi on AWS EMR. With EMRFS, you may notice that Multipart Upload parts are written serially by default, which can become a performance bottleneck. One option is to override the Hudi jars on your EMR cluster, which may improve write speeds. To do so, follow these steps:
Following these steps may improve write performance for S3 Multipart Uploads with Hudi on AWS EMR. (Source: Apache Hudi - FAQ Integrations)
Please configure the GitHub URL to enable the creation of GitHub issues.
Ticket created successfully. Here is the link to the GitHub issue: #59
Tips before filing an issue
Have you gone through our FAQs?
Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
If you have triaged this as a bug, then file an issue directly.
Describe the problem you faced
Is there a way to speed up S3 Multipart Uploads using Hudi on AWS EMR with EMRFS? Currently, the Multipart Upload parts seem to be written in serial.
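To make the serial-versus-parallel distinction concrete, here is a minimal, purely illustrative sketch. It does not call S3, EMRFS, or Hudi: `upload_part` is a hypothetical stand-in that only sleeps, and the per-part latency is an assumed value. It shows why uploading parts one at a time takes roughly the sum of the part latencies, while uploading them concurrently takes roughly the latency of the slowest part.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed per-part latency, for illustration only (not a measured S3 value).
PART_LATENCY_S = 0.05


def upload_part(part_number: int) -> int:
    """Hypothetical stand-in for one part upload; sleeps instead of calling S3."""
    time.sleep(PART_LATENCY_S)
    return part_number


def upload_serial(num_parts: int) -> float:
    """Upload parts one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for n in range(1, num_parts + 1):
        upload_part(n)
    return time.perf_counter() - start


def upload_parallel(num_parts: int, workers: int) -> float:
    """Upload parts concurrently with a thread pool and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every simulated upload finishes before timing stops.
        list(pool.map(upload_part, range(1, num_parts + 1)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"serial:   {upload_serial(8):.2f}s")
    print(f"parallel: {upload_parallel(8, workers=8):.2f}s")
```

Whether EMRFS can be configured to upload parts concurrently in this way is exactly the open question of this issue; the sketch only illustrates the behavior being asked about.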
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.