Modify spark image to include azure and gcs dependency jars #198

Merged (1 commit) on May 24, 2024

Conversation

anusudarsan
Member

Modify the Spark testing image to include the Azure and Google Cloud Storage Hadoop dependency jars so that their object stores can be queried.

anusudarsan force-pushed the anu/modify-spark-for-azure-gc branch from 94441e1 to a25e72f May 22, 2024 13:41
@anusudarsan
Member Author

Any ideas on the failures? I was able to build locally by adding a trusted-host flag for PyPI, but I'm not sure whether that is what we want for CI or whether we have an expired SSL cert.
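
For readers unfamiliar with the workaround mentioned above, here is a minimal sketch of what a trusted-host install might look like. The exact hosts and packages used in the local build are not stated in this thread, so `pypi.org`, `files.pythonhosted.org`, and the helper name `pip_install_trusted` are assumptions for illustration only.

```shell
# Hypothetical helper (not from this PR): wraps `pip install` with
# pip's --trusted-host option for the two usual PyPI hosts.
# Assumption: these are the hosts that failed certificate checks.
pip_install_trusted() {
    # PIP can be overridden (e.g. PIP=echo) to preview the command
    # without installing anything.
    ${PIP:-pip} install \
        --trusted-host pypi.org \
        --trusted-host files.pythonhosted.org \
        "$@"
}

# Example (placeholder package name):
#   pip_install_trusted some-package
```

Because `--trusted-host` tells pip to accept a host even without valid HTTPS, it hides certificate problems instead of fixing them, which is presumably why it is questionable as a CI setting.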

@nineinchnick
Member

Looks like it started failing on the builds 2 weeks ago: https://github.com/trinodb/docker-images/actions/runs/9024908079

It's probably caused by using CentOS 7 as the base image, which might not be getting any updates anymore. We should update pip, but I'm not yet sure what the best way to do that is.

@anusudarsan
Member Author

I added a new commit to trinodb/trino#21958 to test GCS/Azure compatibility for Spark.

ebyhr changed the base branch from master to ebi/fix May 24, 2024 02:31
ebyhr changed the base branch from ebi/fix to master May 24, 2024 02:32
ebyhr force-pushed the anu/modify-spark-for-azure-gc branch from a25e72f to ba3828c May 24, 2024 06:39
@ebyhr
Member

ebyhr commented May 24, 2024

Just rebased on master to run with #199

ebyhr merged commit a4f6a1f into trinodb:master May 24, 2024
12 checks passed
@ebyhr
Member

ebyhr commented May 24, 2024

Started release job: https://github.com/trinodb/docker-images/actions/runs/9220333116

RUN wget -nv https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure-datalake/3.3.6/hadoop-azure-datalake-3.3.6.jar
RUN wget -nv https://repo1.maven.org/maven2/com/microsoft/azure/azure-storage/8.6.6/azure-storage-8.6.6.jar
# install Google Hadoop connector so we can access gcs
RUN wget -nv https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar
Contributor

Can't we use a pinned version of this dependency?
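
One possible way to pin the download, sketched under assumptions: the two Azure `wget` lines above already use versioned Maven Central URLs, and the same layout could be used for the GCS connector if a suitable release is published there. The helper name `maven_url` is hypothetical, and the coordinates and version for the connector are deliberately left open because this thread does not name them.

```shell
# Hypothetical helper: builds a Maven Central jar URL from
# group id, artifact id, and an explicit version, following the
# layout of the existing hadoop-azure-datalake and azure-storage lines.
maven_url() {
    # Dots in the group id become path separators.
    group_path=$(printf '%s' "$1" | tr '.' '/')
    printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
        "$group_path" "$2" "$3" "$2" "$3"
}

# Prints the URL already used in the image for the Azure Data Lake jar.
maven_url org.apache.hadoop hadoop-azure-datalake 3.3.6
```

With an explicit version in the URL, rebuilding the image fetches the same jar each time, whereas a `-latest` URL can change underneath the build.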

anusudarsan deleted the anu/modify-spark-for-azure-gc branch May 24, 2024 13:21
4 participants