Improved ARM64 support #14569

Open
joeybenamy opened this issue May 7, 2025 · 6 comments

@joeybenamy
joeybenamy commented May 7, 2025

We are interested in running Spark NLP for Healthcare models on ARM64-based Graviton instances in Amazon EMR. With the following dependencies, we can build our Docker image, start an Amazon EMR cluster, and submit the Spark job, but the job fails with runtime errors. We understand that ARM64 support is experimental, so we are using this ticket to track progress toward better ARM64 support.

Relevant application versions:

  • Amazon EMR emr-7.5.0
  • Hadoop 3.4.0
  • Hive 3.1.3
  • JupyterEnterpriseGateway 2.6.0
  • Livy 0.8.0
  • Spark 3.5.2
  • TensorFlow 2.16.1

pyproject.toml:

```toml
[project]
name = "jsl"
version = "0.1.0"
readme = "README.md"
dependencies = [
    "boto3",
    "cryptography",
    "johnsnowlabs==5.5.5", # 5.5.2 release was removed/unpublished
    "spark-nlp-jsl==5.5.3",
    "spark-nlp-display",
    "setuptools>=76.1.0",
]
# Python 3.9 (set in .python-version, pyproject.toml) is the common, minimum version in both EMR on EC2 and JohnSnowLabs
# https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-app-versions-7.x.html
# https://nlp.johnsnowlabs.com/licensed/api/python/getting_started/index.html#requirements
requires-python = ">= 3.9, < 3.10"

[[tool.uv.index]]
name = "johnsnowlabs"
url = "https://pypi.johnsnowlabs.com/5.5.3-SECRET"

[dependency-groups]
dev = [
    "venv-pack>=0.2.0",
]
```
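Before comparing logs, it may help to confirm what the packed environment actually sees on a Graviton node. The following is a minimal diagnostic sketch using only the standard library. The function names are hypothetical, and the version bounds are copied from `requires-python` in the config above.

```python
import platform
import sys

# Bounds copied from requires-python = ">= 3.9, < 3.10" in pyproject.toml.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 10)


def python_in_range(version: tuple) -> bool:
    """Return True if (major, minor) satisfies >=3.9,<3.10."""
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def is_arm64(machine: str) -> bool:
    """Linux on Graviton reports 'aarch64'; macOS on Apple Silicon reports 'arm64'."""
    return machine.lower() in {"aarch64", "arm64"}


def describe_environment() -> dict:
    """Collect architecture and interpreter details for the current node."""
    machine = platform.machine()
    return {
        "machine": machine,
        "arm64": is_arm64(machine),
        "python": platform.python_version(),
        "python_ok": python_in_range(sys.version_info),
    }


if __name__ == "__main__":
    # Run this with the same interpreter the Spark driver/executors use.
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running it once on the driver and once inside an executor task (for example, from a tiny RDD `map`) would show whether both sides agree on architecture and Python version.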

Some logs below:
stderr.txt
controller.txt
stdout.txt

@maziyarpanahi
Member

Hi @joeybenamy
Thanks for creating the issue. I tested AWS Graviton with Databricks Runtime a while back and it was working, so we may need an upgrade, or something may have changed.

Let me tag @DevinTDHa, who built the ARM support, to test this. We will, of course, remove the licensed parts of that config and focus only on the open-source Spark NLP, just in case.

@maziyarpanahi
Member

Which version of EMR are you using? We need to know because some older and newer EMR versions are not compatible, and we need to make sure things work on those versions on other hardware first.

@joeybenamy
Copy link
Author

joeybenamy commented May 7, 2025

> Which version of EMR are you using?

My apologies, I've added the Amazon EMR application versions to the issue. We were originally using EMR 7.4.0 and discovered that it did not support ARM64, so we upgraded to 7.5.0. That got us further, but we then hit runtime errors. On x86_64, we run Spark NLP on EMR 7.4.0 with no issues.
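Since the same workload runs on x86_64 but fails on Graviton, one thing worth checking while reproducing is that the JVM package passed to Spark matches the node architecture. The following is a minimal sketch, assuming the open-source artifact IDs `spark-nlp_2.12` (x86_64) and `spark-nlp-aarch64_2.12` (Linux ARM64) from the Spark NLP installation docs. `spark_nlp_package` is a hypothetical helper, and the mapping should be verified against the release you deploy.

```python
# Assumed artifact IDs, following the naming in the open-source Spark NLP
# installation docs; confirm them for the release you deploy.
ARTIFACTS = {
    "x86_64": "spark-nlp_2.12",
    "aarch64": "spark-nlp-aarch64_2.12",
}


def spark_nlp_package(machine: str, version: str) -> str:
    """Return a Maven coordinate for Spark's --packages, chosen by CPU architecture.

    `machine` is expected to be a Linux value such as platform.machine() returns.
    """
    arch = machine.lower()
    if arch not in ARTIFACTS:
        raise ValueError(f"no Spark NLP artifact mapped for {machine!r}")
    return f"com.johnsnowlabs.nlp:{ARTIFACTS[arch]}:{version}"
```

For example, `spark_nlp_package(platform.machine(), version)` on a Graviton node would return the `aarch64` coordinate, which can then be passed to `spark-submit --packages` or set as `spark.jars.packages`. This covers only the open-source jar; the licensed healthcare jar is configured separately.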

@maziyarpanahi
Member

Thanks @joeybenamy, we will take a look and get back to you.

@joeybenamy
Author

Appreciate it!

@DevinTDHa
Member

Hi @joeybenamy, thanks for reporting.

Would it be possible for you to share more information? In particular:

  1. What are the steps for reproduction?
  2. What kind of runtime error are you getting?

I'm afraid the attached logs are not very descriptive. There is only an indication of an error in a user application, but no specifics.
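One way to make the failure more descriptive is to log the full Python traceback from the driver before the job exits, so the specific error lands in the step's stderr instead of only a generic "error in user application". The following is a minimal sketch, assuming the job's entry point is a Python function. `run_with_traceback` and the logger name are hypothetical, not part of Spark NLP or EMR.

```python
import logging
import sys
import traceback

logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("jsl-job")  # hypothetical logger name


def run_with_traceback(job, *args, **kwargs):
    """Run job(*args, **kwargs); on failure, log the full traceback to stderr.

    The exception is re-raised afterwards so the job still exits with a failure.
    """
    try:
        return job(*args, **kwargs)
    except Exception:
        log.error("Job failed:\n%s", traceback.format_exc())
        raise
```

For example, calling `run_with_traceback(main)`, where `main` builds and runs the pipeline, would print the underlying exception (including any error raised while loading a model) before the driver exits.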
