Apache Spark with XtreemFS

Christoph Kleineweber edited this page Feb 3, 2015 · 4 revisions

To use Spark with XtreemFS, configure a Hadoop 2.x installation with XtreemFS as its file system (see "Hadoop integration" in the XtreemFS user guide) and a Spark installation with YARN support (see https://spark.apache.org/docs/latest/running-on-yarn.html).
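The Hadoop integration step boils down to pointing Hadoop's core-site.xml at the XtreemFS Hadoop client. A minimal sketch is shown below; the DIR server address, port, and volume name are placeholders, and the exact property names should be checked against the "Hadoop integration" section of the XtreemFS user guide:

```xml
<!-- Sketch of core-site.xml entries for XtreemFS (values are placeholders) -->
<configuration>
  <!-- Register the XtreemFS Hadoop client as the implementation
       behind the xtreemfs:// URI scheme -->
  <property>
    <name>fs.xtreemfs.impl</name>
    <value>org.xtreemfs.common.clients.hadoop.XtreemFSFileSystem</value>
  </property>
  <!-- Make XtreemFS the default file system; replace host/port with
       your DIR service address -->
  <property>
    <name>fs.defaultFS</name>
    <value>xtreemfs://dir-server.example.com:32638</value>
  </property>
  <!-- Volume to mount; replace with the name of your XtreemFS volume -->
  <property>
    <name>xtreemfs.defaultVolumeName</name>
    <value>myVolume</value>
  </property>
</configuration>
```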

Use bin/spark-submit with the following options to run a job:

  1. --jars <Path to XtreemFSHadoopClient.jar>
  2. --master yarn-client or --master yarn-cluster

You can test your installation by running a Spark example, e.g.:

./bin/spark-submit --jars lib/XtreemFSHadoopClient.jar --master yarn-client examples/src/main/python/wordcount.py /test.txt
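The wordcount example prints each word of /test.txt together with its count. As a rough local illustration of what to expect (no Spark or XtreemFS required; the sample file, path, and "word: count" output format are assumptions for this sketch), the same counting can be reproduced in plain shell:

```shell
# Create a small sample input file (a local stand-in for /test.txt)
printf 'hello world\nhello spark\n' > /tmp/sample.txt

# Count word occurrences, mimicking the "word: count" lines
# printed by examples/src/main/python/wordcount.py
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (w in count) print w ": " count[w] }' /tmp/sample.txt | sort
```

Running the spark-submit command above should produce equivalent counts for the words in /test.txt, interleaved with YARN and Spark log output.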

For more configuration details for Spark on YARN, see https://spark.apache.org/docs/latest/running-on-yarn.html
