These examples show how to run Arc against a Kubernetes cluster:
- `spark-on-k8s-operator` for running Arc jobs on Kubernetes via the spark-on-k8s-operator.
- `kubernetes-argo` for running on Kubernetes and deploying jobs using Argo Workflows. Argo allows defining dependencies between jobs, which is useful for complex workflows or for jobs that require different service-account permissions to execute (see the submission sketch after this list).
- `jupyterhub-for-kubernetes` for running JupyterHub on Kubernetes so that end users can start their own JupyterLab instances to build Arc jobs.
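
As a rough sketch of how workloads are typically submitted with these examples (the manifest filenames are hypothetical placeholders, and the `kubectl` and `argo` CLIs are assumed to be installed and configured against the cluster):

```bash
# Submit an Arc job to the spark-on-k8s-operator by applying a
# SparkApplication manifest (hypothetical filename).
kubectl apply -f arc-spark-application.yaml

# Watch the status of the submitted application.
kubectl get sparkapplications --watch

# Submit a workflow of dependent Arc jobs to Argo and follow its progress
# (hypothetical filename).
argo submit --watch arc-workflow.yaml
```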
These are example Terraform scripts demonstrating how to execute Arc against a remote cluster (a usage sketch follows the list):
- `aws-single-master` for a single instance.
- `aws-cluster` for a multi-instance cluster.
- `aws-fargate-single` for a serverless option.
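
All three examples follow the standard Terraform workflow. A minimal sketch, assuming Terraform is installed and AWS credentials are configured:

```bash
# Work inside the example to deploy (aws-single-master shown here).
cd aws-single-master

# Initialise the working directory and download providers.
terraform init

# Review the planned changes, then create the infrastructure.
terraform plan
terraform apply
```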
The `aws-single-master` and `aws-cluster` examples both assume the default security group allows SSH access to your EC2 instances. There are sample `user-data-*` scripts in the `./templates` directory which include commands for mounting the local SSD of certain instance types to `/data`. If used, the following flags should be added to the `docker run` command:
```bash
-v /data/local:/local \
-e "SPARK_LOCAL_DIRS=/local" \
-e "SPARK_WORKER_DIR=/local" \
```