Hadoop MapReduce InputFormat/OutputFormat for TFRecords

This directory contains an Apache Hadoop MapReduce InputFormat/OutputFormat implementation for TensorFlow's TFRecords format. It can also be used with Apache Spark.
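
For readers unfamiliar with the format, here is a minimal sketch of TFRecord framing as TensorFlow documents it: each record is a little-endian 64-bit length, a masked CRC32C of the length bytes, the payload, and a masked CRC32C of the payload. The class `TFRecordFramingSketch` and its methods are hypothetical names used only for illustration. They are not part of tensorflow-hadoop, and the library's own reader and writer may be structured differently. The sketch assumes Java 11 or later (for `java.util.zip.CRC32C` and `InputStream.readNBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical helper for illustration only; not part of tensorflow-hadoop.
final class TFRecordFramingSketch {
    private static final int MASK_DELTA = 0xa282ead8;

    // Masked CRC32C: rotate the checksum right by 15 bits, then add a constant.
    static int maskedCrc32c(byte[] bytes) {
        CRC32C crc = new CRC32C();
        crc.update(bytes, 0, bytes.length);
        return Integer.rotateRight((int) crc.getValue(), 15) + MASK_DELTA;
    }

    // Writes one record: length, checksum of length, payload, checksum of payload.
    static void writeRecord(OutputStream out, byte[] data) throws IOException {
        byte[] length = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(data.length).array();
        out.write(length);
        out.write(littleEndianInt(maskedCrc32c(length)));
        out.write(data);
        out.write(littleEndianInt(maskedCrc32c(data)));
    }

    // Returns the next record's payload, or null at a clean end of stream.
    static byte[] readRecord(InputStream in) throws IOException {
        byte[] length = in.readNBytes(8);
        if (length.length == 0) {
            return null;
        }
        if (length.length < 8) {
            throw new EOFException("truncated length header");
        }
        checkCrc(in, length, "length");
        long n = ByteBuffer.wrap(length).order(ByteOrder.LITTLE_ENDIAN).getLong();
        if (n < 0 || n > Integer.MAX_VALUE) {
            throw new IOException("unsupported record length " + n);
        }
        byte[] data = in.readNBytes((int) n);
        if (data.length < n) {
            throw new EOFException("truncated record");
        }
        checkCrc(in, data, "data");
        return data;
    }

    // Reads the 4-byte stored checksum and compares it with the computed one.
    private static void checkCrc(InputStream in, byte[] covered, String what)
            throws IOException {
        byte[] stored = in.readNBytes(4);
        if (stored.length < 4) {
            throw new EOFException("truncated " + what + " checksum");
        }
        int expected = ByteBuffer.wrap(stored).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (expected != maskedCrc32c(covered)) {
            throw new IOException("corrupt " + what + " checksum");
        }
    }

    private static byte[] littleEndianInt(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecord(out, "first".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "second".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(out.toByteArray());
        byte[] record;
        while ((record = readRecord(in)) != null) {
            System.out.println(new String(record, StandardCharsets.UTF_8));
        }
    }
}
```

In a real job the payload would typically be a serialized `tf.train.Example` protocol buffer rather than plain text. The sketch only covers the byte-level framing around it.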

Prerequisites

  1. Apache Maven

  2. Apache Hadoop. The code has been tested with Hadoop 2.6.0; patches are welcome if it is incompatible with your Hadoop version.

Breaking changes

  • 08/20/2018 - Reverted artifactId back to org.tensorflow.tensorflow-hadoop
  • 05/29/2018 - Changed the artifactId from org.tensorflow.tensorflow-hadoop to org.tensorflow.hadoop

Build and install

  1. Compile the code

    mvn clean package

    Alternatively, to build jars for a different version of TensorFlow (e.g., 1.5.0), set the project version before packaging:

    mvn versions:set -DnewVersion=1.5.0
    mvn clean package

  2. Optionally install (or deploy) the jars

    mvn install

    After installation (or deployment), the package can be used with the following dependency:

    <dependency>
      <groupId>org.tensorflow</groupId>
      <artifactId>tensorflow-hadoop</artifactId>
      <version>1.10.0</version>
    </dependency>

Use with MapReduce

The Hadoop MapReduce example can be found here.

Use with Apache Spark

The Spark-TensorFlow-Connector uses TensorFlow Hadoop to load and save TensorFlow's TFRecords format using Spark DataFrames.
