Add spark3-hudi image #136
@codope Is this PR ready for review?
Not yet. I'll update it by tomorrow.
@ebyhr @findinpath I have updated the PR. When I try to build locally using
I also tried building the dependent centos image first using
@@ -20,6 +20,7 @@ jobs:
test: spark3-iceberg
- image: spark3-delta
test: spark3-delta
- image: spark3-hudi
Can you enable tests for this image? We just recently added them to almost all images. See bin/test.sh. The test is just a simple smoke test to see if a container using this image will start.
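For readers unfamiliar with this kind of check, here is a minimal sketch of a container smoke test. It is not the project's actual bin/test.sh; the function name `smoke_test` and the grace period are assumptions made for illustration. It only uses `docker run -d`, `docker inspect -f`, and `docker rm -f`.

```shell
# Hypothetical smoke test: start a container from an image and verify that
# it is still running after a short grace period.
smoke_test() {
    local image=$1 grace_seconds=${2:-10}
    local container_id running
    # Start the container detached; docker prints the new container ID.
    container_id=$(docker run -d "$image") || return 1
    sleep "$grace_seconds"
    # Prints "true" while the main process is alive, "false" once it exited.
    running=$(docker inspect -f '{{.State.Running}}' "$container_id" 2>/dev/null || echo false)
    # Remove the container whether or not the check passed.
    docker rm -f "$container_id" >/dev/null 2>&1 || true
    [ "$running" = "true" ]
}
```

It could be called as `smoke_test testing/spark3-hudi:unlabelled`, where the image reference is an assumption based on the base image naming used in this PR.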
done
@codope I can't reproduce the issue, can you paste the full output from
@codope I just tried to build the image locally and the building process went just fine.
FROM testing/centos7-oj11:unlabelled

ARG SPARK_VERSION=3.2.1
From https://hudi.apache.org/docs/quick-start-guide/

| Hudi | Supported Spark 3 version |
|---|---|
| 0.12.x | 3.3.x (default build), 3.2.x, 3.1.x |
| 0.11.x | 3.2.x (default build, Spark bundle only), 3.1.x |

Given that Hudi is a fresh connector in the Trino ecosystem, let's use the latest Hudi 0.12.x, which supports the latest Spark version, 3.3.0, which in turn runs on Java 8/11/17.
Let's therefore build on top of the testing/centos7-oj17:unlabelled base image.
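The compatibility table quoted above can be encoded as a small check. This is only a sketch: the function name `spark_supported_by_hudi` is hypothetical, and it covers just the two Hudi release lines listed in the table.

```shell
# Succeeds when the given Spark 3 version is listed as supported for the
# given Hudi version in the quick-start table quoted in this review comment.
spark_supported_by_hudi() {
    local hudi=$1 spark=$2
    case "$hudi" in
        0.12.*)
            case "$spark" in 3.3.*|3.2.*|3.1.*) return 0 ;; esac ;;
        0.11.*)
            case "$spark" in 3.2.*|3.1.*) return 0 ;; esac ;;
    esac
    # Unknown Hudi release line or unsupported Spark version.
    return 1
}
```

For example, `spark_supported_by_hudi 0.12.1 3.3.0` succeeds, while `spark_supported_by_hudi 0.11.1 3.3.0` fails.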
I have kept the same version as in trino-hudi. The plan is to upgrade to Hudi 0.12.1, which will be out very soon. Then I'll make the changes here as well in a follow-up PR.
@codope Doesn't Hudi support different versions between server and client?
@nineinchnick @findinpath I still get the error after rebase. Here's a gist with the full output of the make command: https://gist.github.com/codope/921469ef910e3a76314d251b35060a6e
Thanks, can you also include the output of
Can you also include the output of
@nineinchnick Thanks for the pointers. Looks like I'm running out of space. I have just 0.5 GB left. I'll clean up and retry in some time. How much space is typically required?
These images are pretty heavy, so, unfortunately, multiple gigabytes. You probably need at least 5 GB.
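A quick way to check free space before starting a build is sketched below. The function name `has_free_gb` is hypothetical, and the sketch assumes Docker stores its data on the filesystem of the directory you pass in (on Linux this is commonly /var/lib/docker; with Docker Desktop the data lives inside a VM, so the check is less meaningful there).

```shell
# Succeeds when the filesystem holding the given directory has at least
# the required number of gigabytes available.
has_free_gb() {
    local dir=$1 required_gb=$2 available_kb
    # POSIX "df -Pk" prints a header line, then one line per filesystem with
    # the available space in 1024-byte blocks in the fourth column.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb /var/lib/docker 5 || echo "less than 5 GB free"`. `docker system df` additionally shows how much space images, containers, and the build cache currently occupy.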
I pruned all older images, removed the build cache, and reclaimed enough space. This time it's a different error: make testing/spark3-hudi
The warning there looks related. Your docker server version looks like the latest one, but your buildx plugin is not. Can you check if you could update it?
I am still facing the same issue after upgrading buildx. Let me check with others. Maybe it has got something to do with my local environment. Just to ensure I'm doing it right, there is no step other than running
Also, I see that the spark3-hudi image was built and smoke test ran successfully: https://github.com/trinodb/docker-images/actions/runs/3235524333/jobs/5300075791
Yes, but we have to wait for a maintainer to do so
That's right, there are no other steps required. I saw some issues reported in Docker where buildx does not load the image into the local repo after building it, but that is supposed to be fixed. If you have the latest version, as we do, you should not be affected. Building the image locally is only required if you want to use it to run product tests in Trino, which I assume you do, since you're working on adding new tests. You could try building the images without using make; it prints all the docker commands it executes.
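To tell whether a finished build actually landed in the local image store, something like the following sketch could help. The function names are hypothetical; the only Docker call used is `docker image inspect`, which exits non-zero when the image is not present locally.

```shell
# Succeeds when the image reference exists in the local image store.
image_is_loaded() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Prints a short status line for the image and fails when it is missing.
report_image() {
    if image_is_loaded "$1"; then
        echo "$1 is available locally"
    else
        echo "$1 is missing from the local image store" >&2
        return 1
    fi
}
```

After the build, `report_image testing/spark3-hudi:unlabelled` would show whether the image was loaded; the tag is an assumption based on the base image naming in this PR. `docker buildx version` prints the plugin version for comparison.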
Thanks for the help @nineinchnick.
@codope can you please add a basic product test that creates a table via Spark Hudi and reads it via Trino in trinodb/trino#14365, using the newly created image (it is OK to reference the image you have locally in the code; I just want to test it myself as well), so that we can verify that the image basically works? Otherwise, we may find that something is missing and need to do yet another release.
@codope trinodb/trino#14669 is a working draft PR which you can use as a sample template for building the tests required for trinodb/trino#14365. I based the tests (same as for Delta Lake OSS) on a MinIO data lake because I think (@codope correct me if I'm wrong) this kind of setup is close to the integration with AWS S3 which we want to test with Trino for Hudi. @ebyhr please do test the draft PR on your machine as well, and if you're OK with the results, I think we can release a new version of the docker images so that @codope can continue the work on the product tests.
@codope please squash the commits and add a relevant commit message.
I'm testing the draft PR based on this. Let me merge once it's complete.
@findinpath I got the below exception in my environment. Is the command the same as the one you executed?
@ebyhr it was a copy-paste issue in
Merged, thanks!
Add spark3-hudi image for product tests