
TensorFlow on Hadoop

This document describes how to run TensorFlow on Hadoop using HDFS. It assumes you are familiar with reading data in TensorFlow.


To use HDFS with TensorFlow, use HDFS paths when reading and writing data. For example:

filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])

To use the namenode specified in your HDFS configuration files, change the file prefix to hdfs://default/.
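As an illustration, HDFS URIs in both forms (explicit namenode, or the default namenode from the HDFS configuration files) can be built with a small helper. This is only a sketch: the helper name and the `namenode:8020` address are made up for the example, not part of TensorFlow.

```python
def hdfs_path(path, namenode=None):
    """Return an HDFS URI for `path` (illustrative helper, not a TensorFlow API).

    If `namenode` is None, use the special "default" authority, which tells
    TensorFlow's HDFS support to pick the namenode from the HDFS
    configuration files.
    """
    authority = namenode if namenode is not None else "default"
    return "hdfs://%s/%s" % (authority, path.lstrip("/"))

# Explicit namenode vs. the configured default:
files = [
    hdfs_path("path/to/file1.csv", namenode="namenode:8020"),
    hdfs_path("path/to/file2.csv"),
]
```

Either form can then be passed wherever TensorFlow accepts a file path, such as the input producer shown above.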

Set the following environment variables:

  • JAVA_HOME — Location of the Java installation.

  • HADOOP_HDFS_HOME — Location of the HDFS installation. This variable is optional if libhdfs.so is available in LD_LIBRARY_PATH. It can also be set by running:

    source ${HADOOP_HOME}/libexec/hadoop-config.sh
  • LD_LIBRARY_PATH — Include the path to libjvm.so and, optionally, the path to libhdfs.so if your Hadoop distribution did not install it in ${HADOOP_HDFS_HOME}/lib/native. On Linux:

    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server
  • CLASSPATH — The Hadoop jars must be added to the class path before running TensorFlow. Setting CLASSPATH with ${HADOOP_HOME}/libexec/hadoop-config.sh alone is not enough: globs must be expanded, as described in the libhdfs documentation:

    CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath --glob) python your_script.py
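Since a missing variable typically surfaces only as an obscure JNI or libhdfs error at runtime, a quick environment sanity check before launching can save debugging time. The sketch below is illustrative only: the required/optional split and the helper function are this example's own, not part of TensorFlow or Hadoop.

```python
import os

# Variables from the list above. JAVA_HOME and CLASSPATH are treated as
# required here; the others may be unnecessary depending on the distribution
# (this split is an assumption of the sketch, not a TensorFlow requirement).
REQUIRED = ["JAVA_HOME", "CLASSPATH"]
OPTIONAL = ["HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "KRB5CCNAME"]

def missing_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

# Example with a hypothetical environment missing CLASSPATH:
missing = missing_vars({"JAVA_HOME": "/usr/lib/jvm/java"})
# `missing` is ["CLASSPATH"]; print a hint rather than failing mysteriously later.
if missing:
    print("Set these before running TensorFlow on HDFS: " + ", ".join(missing))
```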

If the Hadoop cluster is in secure mode, set the following environment variable:

  • KRB5CCNAME — Path of the Kerberos ticket cache file. For example:

    export KRB5CCNAME=/tmp/krb5cc_10002

If using Distributed TensorFlow, all workers must have Hadoop installed and the environment variables set.
