Hadoop MapReduce InputFormat/OutputFormat for TFRecords

This directory contains an Apache Hadoop MapReduce InputFormat/OutputFormat implementation for TensorFlow's TFRecords format. It can also be used with Apache Spark.

Prerequisites

Apache Maven
Tested with Hadoop 2.6.0. Patches are welcome if there are incompatibilities with your Hadoop version.

Breaking changes

08/20/2018 - Reverted artifactId back to org.tensorflow.tensorflow-hadoop
05/29/2018 - Changed the artifactId from org.tensorflow.tensorflow-hadoop to org.tensorflow.hadoop

Build and install

Compile the code
```
mvn clean package
```
Alternatively, if you would like to build jars for a different version of TensorFlow, e.g., 1.5.0:
```
mvn versions:set -DnewVersion=1.5.0
mvn clean package
```

Optionally install (or deploy) the jars

mvn install

After installation (or deployment), the package can be used with the following dependency:

<dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow-hadoop</artifactId>
  <version>1.10.0</version>
</dependency>

Use with MapReduce

The Hadoop MapReduce example can be found here.
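
For orientation when working with these formats, the following is a minimal Java sketch of how a single record is framed in a TFRecords file: a little-endian 64-bit length, a masked CRC32C of that length, the payload bytes, and a masked CRC32C of the payload. The class `TfRecordFraming` and its methods are hypothetical names used only for illustration and are not part of tensorflow-hadoop. The sketch assumes Java 9 or later for `java.util.zip.CRC32C`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32C;

// Hypothetical helper for illustration only; not part of tensorflow-hadoop.
class TfRecordFraming {
    private static final int MASK_DELTA = 0xa282ead8;

    // TFRecords store a "masked" CRC32C: rotate right by 15 bits, then add a constant.
    static int maskedCrc32c(byte[] data, int offset, int length) {
        CRC32C crc = new CRC32C();
        crc.update(data, offset, length);
        int value = (int) crc.getValue();
        return ((value >>> 15) | (value << 17)) + MASK_DELTA;
    }

    // Layout: uint64 length | uint32 masked CRC of length | payload | uint32 masked CRC of payload
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 4 + payload.length + 4)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(payload.length);
        buf.putInt(maskedCrc32c(buf.array(), 0, 8));
        buf.put(payload);
        buf.putInt(maskedCrc32c(payload, 0, payload.length));
        return buf.array();
    }

    // Parses one framed record and verifies both checksums before returning the payload.
    static byte[] decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        long length = buf.getLong();
        if (buf.getInt() != maskedCrc32c(record, 0, 8)) {
            throw new IllegalArgumentException("length checksum mismatch");
        }
        if (length < 0 || length > buf.remaining() - 4) {
            throw new IllegalArgumentException("truncated record");
        }
        byte[] payload = new byte[(int) length];
        buf.get(payload);
        if (buf.getInt() != maskedCrc32c(payload, 0, payload.length)) {
            throw new IllegalArgumentException("payload checksum mismatch");
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] record = encode(payload);
        // Round trip: decoding the framed bytes should give back the payload.
        System.out.println(Arrays.equals(decode(record), payload));
    }
}
```

In a real job, the InputFormat/OutputFormat classes in this directory handle this framing, so application code would normally only see the payload bytes (typically a serialized `Example` protocol buffer).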

Use with Apache Spark

The Spark-TensorFlow-Connector uses TensorFlow Hadoop to load and save data in TensorFlow's TFRecords format through Spark DataFrames.
