Apache Spark with XtreemFS

Christoph Kleineweber edited this page Feb 3, 2015 · 4 revisions

In order to use Spark with XtreemFS, you have to configure a Hadoop 2.x installation with XtreemFS as its file system (see "Hadoop integration" in the XtreemFS user guide) and a Spark installation with YARN support (see https://spark.apache.org/docs/latest/running-on-yarn.html).
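As a rough sketch, the Hadoop core-site.xml could register XtreemFS along the following lines. The property names, the implementation class, and the DIR address/port shown here are assumptions; verify them against the "Hadoop integration" section of the XtreemFS user guide:

```xml
<configuration>
  <!-- Register the XtreemFS file system implementation
       (class name assumed; check the user guide). -->
  <property>
    <name>fs.xtreemfs.impl</name>
    <value>org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem</value>
  </property>
  <!-- Use XtreemFS as the default file system; replace localhost:32638
       with the address of your DIR service. -->
  <property>
    <name>fs.defaultFS</name>
    <value>xtreemfs://localhost:32638</value>
  </property>
</configuration>
```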

Use bin/spark-submit with the following options to run a job:

  1. --jars <Path to XtreemFSHadoopClient.jar>
  2. --master yarn-client or --master yarn-cluster

You can test your installation by running a Spark example, e.g.:

./bin/spark-submit --jars lib/XtreemFSHadoopClient.jar --master yarn-client examples/src/main/python/wordcount.py /test.txt
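The example above expects an input file /test.txt to already exist on the XtreemFS volume. One way to prepare it is sketched below; the hadoop binary location is an assumption, and the cluster-dependent copy step is left as a comment since it needs the configured Hadoop installation:

```shell
# Create a small local input file for the word count example.
printf 'hello world\nhello xtreemfs\n' > test.txt

# Copy it onto the XtreemFS-backed file system via the configured
# Hadoop installation (requires a running setup, hence shown as a comment):
#   bin/hadoop fs -put test.txt /test.txt
```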

For further Spark-on-YARN configuration details, see https://spark.apache.org/docs/latest/running-on-yarn.html