Apache Flink with XtreemFS

Robert Schmidtke edited this page May 11, 2015 · 4 revisions

Flink running on YARN

Requires:

  • Hadoop 2.x installation with XtreemFS as file system (see "Hadoop integration" in XtreemFS user guide)
  • Flink 0.8 or higher

Follow these steps to use Flink with XtreemFS:

  1. Copy the XtreemFSHadoopClient.jar to $FLINK_HOME/lib/
  2. Start the Hadoop YARN servers: $HADOOP_HOME/sbin/start-yarn.sh
  3. Start a Flink YARN session: $FLINK_HOME/bin/yarn-session.sh -n <number of YARN containers>
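The three steps above can be sketched as a shell session. The install locations (/opt/flink, /opt/hadoop) and the container count of 4 are assumptions; adjust them to your environment:

```shell
#!/bin/sh
# Assumed install locations; override via the environment if yours differ.
FLINK_HOME=${FLINK_HOME:-/opt/flink}
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}

# 1. Make the XtreemFS Hadoop client visible to Flink.
if [ -f XtreemFSHadoopClient.jar ]; then
  cp XtreemFSHadoopClient.jar "$FLINK_HOME/lib/"
fi

# 2. Start the YARN resource manager and node managers.
if [ -x "$HADOOP_HOME/sbin/start-yarn.sh" ]; then
  "$HADOOP_HOME/sbin/start-yarn.sh"
fi

# 3. Bring up a Flink session with 4 YARN containers (example value).
if [ -x "$FLINK_HOME/bin/yarn-session.sh" ]; then
  "$FLINK_HOME/bin/yarn-session.sh" -n 4
fi
```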

Now you can run a Flink example to check if everything is running correctly:

$FLINK_HOME/bin/flink run examples/flink-java-examples-0.8.0-WordCount.jar \
   xtreemfs://localhost:32638/test.txt \
   xtreemfs://localhost:32638/result.txt
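The example reads test.txt from the XtreemFS volume, so the input has to be staged first. One way to do this is via the Hadoop CLI, a sketch assuming the Hadoop installation from the requirements above is configured for XtreemFS:

```shell
#!/bin/sh
HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}   # assumed install location
INPUT=xtreemfs://localhost:32638/test.txt

# Create a small sample input locally...
printf 'hello world\nhello flink\n' > /tmp/test.txt

# ...and copy it onto the XtreemFS volume via the Hadoop client.
if [ -x "$HADOOP_HOME/bin/hadoop" ]; then
  "$HADOOP_HOME/bin/hadoop" fs -copyFromLocal /tmp/test.txt "$INPUT"
fi
```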

Flink Standalone

Requires:

  • Flink 0.8 or higher

  1. Copy the XtreemFSHadoopClient.jar to $FLINK_HOME/lib/
  2. Create a core-site.xml file as described in the "Hadoop integration" section of the XtreemFS user guide (the fs.default.name key is not needed in this setup)
  3. Point the fs.hdfs.hadoopconf key in $FLINK_HOME/conf/flink-conf.yaml to the directory containing core-site.xml (note this path has to be absolute)
  4. Start a Flink Standalone session, e.g. $FLINK_HOME/bin/start-local.sh
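The standalone setup can be sketched as follows. The install location /opt/flink and the configuration directory /etc/hadoop-xtreemfs are assumptions; step 2 (writing core-site.xml itself) is covered by the XtreemFS user guide and omitted here:

```shell
#!/bin/sh
FLINK_HOME=${FLINK_HOME:-/opt/flink}        # assumed install location
CONF_DIR=/etc/hadoop-xtreemfs               # assumed directory holding core-site.xml

# 1. Make the XtreemFS Hadoop client visible to Flink.
if [ -f XtreemFSHadoopClient.jar ]; then
  cp XtreemFSHadoopClient.jar "$FLINK_HOME/lib/"
fi

# 3. Point Flink at the Hadoop configuration directory; the path must be absolute.
if [ -w "$FLINK_HOME/conf/flink-conf.yaml" ]; then
  echo "fs.hdfs.hadoopconf: $CONF_DIR" >> "$FLINK_HOME/conf/flink-conf.yaml"
fi

# 4. Start a local standalone session.
if [ -x "$FLINK_HOME/bin/start-local.sh" ]; then
  "$FLINK_HOME/bin/start-local.sh"
fi
```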

Now you can run a Flink example to check if everything is running correctly:

$FLINK_HOME/bin/flink run examples/flink-java-examples-0.8.0-WordCount.jar \ 
   xtreemfs://localhost:32638/test.txt \
   xtreemfs://localhost:32638/result.txt

When running a Flink job on its own (i.e. executing its main method directly), Flink needs two things in order to handle xtreemfs://DIR:PORT/... URIs in your source code: XtreemFSHadoopClient.jar must be on the classpath, and the flink-conf.yaml file must be loaded manually via GlobalConfiguration.loadConfiguration("path/to/flink-conf.yaml").
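Such a standalone launch could look like the following sketch, where myjob.jar and the class name my.jobs.WordCount are hypothetical; the main method is then expected to call GlobalConfiguration.loadConfiguration(...) before resolving the xtreemfs:// URIs:

```shell
#!/bin/sh
FLINK_HOME=${FLINK_HOME:-/opt/flink}   # assumed install location

# Put the Flink jars (including the copied XtreemFSHadoopClient.jar) on the
# classpath together with the job jar, and run the job's main class directly.
if [ -d "$FLINK_HOME/lib" ]; then
  java -cp "$FLINK_HOME/lib/*:myjob.jar" my.jobs.WordCount \
      xtreemfs://localhost:32638/test.txt \
      xtreemfs://localhost:32638/result.txt
fi
```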