Apache Spark with XtreemFS

To use Spark with XtreemFS, you need a Hadoop 2.x installation configured with XtreemFS as its file system (see "Hadoop integration" in the XtreemFS user guide) and a Spark installation with YARN support (see https://spark.apache.org/docs/latest/running-on-yarn.html).

Use bin/spark-submit with the following options to run a job:

  1. --jars <Path to XtreemFSHadoopClient.jar>
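  2. --master yarn-client or --master yarn-cluster (deprecated since Spark 2.0; there, use --master yarn with --deploy-mode client or --deploy-mode cluster)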

You can test your installation by running one of the Spark examples, e.g.:

./bin/spark-submit --jars lib/XtreemFSHadoopClient.jar --master yarn-client examples/src/main/python/wordcount.py /test.txt
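
The same invocation can also be scripted, which may be handy when several jobs share the same options. The following is only a minimal sketch in Python: the helper names build_submit_command and run_job and their parameters are hypothetical (they are not part of Spark or XtreemFS), and the paths are the ones from the example above, relative to the Spark installation directory.

```python
import subprocess


def build_submit_command(client_jar, app, app_args,
                         master="yarn-client",
                         spark_submit="./bin/spark-submit"):
    """Assemble the spark-submit argument list (hypothetical helper).

    client_jar -- path to XtreemFSHadoopClient.jar, passed via --jars
    app        -- the application to run, e.g. a Python script
    app_args   -- arguments handed to the application itself
    """
    return [spark_submit, "--jars", client_jar, "--master", master,
            app, *app_args]


def run_job(client_jar, app, app_args, **kwargs):
    """Run the job; raises CalledProcessError on a non-zero exit code."""
    command = build_submit_command(client_jar, app, app_args, **kwargs)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_job(...) to actually submit it.
    print(" ".join(build_submit_command(
        "lib/XtreemFSHadoopClient.jar",
        "examples/src/main/python/wordcount.py",
        ["/test.txt"])))
```

Passing the arguments as a list rather than as one shell string avoids quoting problems when a path contains spaces.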

For more details on configuring Spark on YARN, see https://spark.apache.org/docs/latest/running-on-yarn.html