
Dataset c++ extend documentation is outdated for tf 2.0 & DatasetV2 #27355

Closed
vrince opened this issue Apr 1, 2019 · 5 comments
Labels
comp:data tf.data related issues stat:awaiting tensorflower Status - Awaiting response from tensorflower type:docs-bug Document issues

Comments

@vrince

vrince commented Apr 1, 2019

System information

Describe the documentation issue

The C++ code for extending Dataset (especially since DatasetV2) is outdated.

Here is a working version of the files needed in the documentation: https://github.com/vrince/tensorflow_addons/tree/master/tensorflow_addons/dataset

NOTE: the only part I am not really sure about is this one: https://github.com/vrince/tensorflow_addons/blob/master/tensorflow_addons/dataset/cc/my_dataset.cpp#L76 ... basically I left it as it was, but I don't see the point of it.

There is also an external test that runs from Python, plus the Bazel files.

I'm not sure where, or even whether, I can open a pull request to change the doc.

@achandraa achandraa self-assigned this Apr 3, 2019
@achandraa achandraa added TF 2.0 Issues relating to TensorFlow 2.0 type:docs-bug Document issues comp:ops OPs related issues labels Apr 3, 2019
@achandraa achandraa assigned jvishnuvardhan and unassigned achandraa Apr 5, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Apr 5, 2019
@jvishnuvardhan
Contributor

Thanks @vrince. Could you be a little more specific and explain which part of the doc needs to be changed? Thanks!

@jvishnuvardhan jvishnuvardhan added comp:data tf.data related issues and removed comp:ops OPs related issues labels Apr 5, 2019
@vrince
Author

vrince commented Apr 15, 2019

Hi! Sorry for the delay... It's a little hard for me to provide a meaningful diff in an issue comment. Can you point me to the source of the doc so I can patch it and send you the difference?

@jvishnuvardhan
Contributor

@vrince Do you want to modify the text in this webpage, or change code like dataset_ops.py, etc.? Thanks!

@vrince
Author

vrince commented Apr 16, 2019

Basically, what needs to be changed is the webpage itself. Here is what I changed in the dataset.cpp file:

 #include "tensorflow/core/framework/op.h"
 #include "tensorflow/core/framework/shape_inference.h"
 
-namespace myproject
-{
-namespace
-{
-
 using ::tensorflow::DT_STRING;
 using ::tensorflow::PartialTensorShape;
 using ::tensorflow::Status;
 
-class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
+class MyReaderDatasetOp : public tensorflow::data::DatasetOpKernel
 {
   public:
-    MyReaderDatasetOp(tensorflow::OpKernelConstruction *ctx)
+    explicit MyReaderDatasetOp(tensorflow::OpKernelConstruction *ctx)
         : DatasetOpKernel(ctx)
     {
         // Parse and validate any attrs that define the dataset using
@@ -23,7 +18,7 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
     }
 
     void MakeDataset(tensorflow::OpKernelContext *ctx,
-                     tensorflow::DatasetBase **output) override
+                     tensorflow::data::DatasetBase **output) override
     {
         // Parse and validate any input tensors that define the dataset using
         // `ctx->input()` or the utility function
@@ -35,13 +30,13 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
     }
 
   private:
-    class Dataset : public tensorflow::GraphDatasetBase
+    class Dataset : public tensorflow::DatasetBase
     {
       public:
-        Dataset(tensorflow::OpKernelContext *ctx) : GraphDatasetBase(ctx) {}
+        Dataset(tensorflow::OpKernelContext *ctx) : tensorflow::data::DatasetBase(tensorflow::data::DatasetContext(ctx)) {}
 
         std::unique_ptr<tensorflow::IteratorBase> MakeIteratorInternal(
-            const string &prefix) const override
+            const std::string &prefix) const
         {
             return std::unique_ptr<tensorflow::IteratorBase>(new Iterator(
                 {this, tensorflow::strings::StrCat(prefix, "::MyReader")}));
@@ -57,6 +52,7 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
             static auto *const dtypes = new tensorflow::DataTypeVector({DT_STRING});
             return *dtypes;
         }
+
         const std::vector<PartialTensorShape> &output_shapes() const override
         {
             static std::vector<PartialTensorShape> *shapes =
@@ -64,15 +60,16 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
             return *shapes;
         }
 
-        string DebugString() const override { return "MyReaderDatasetOp::Dataset"; }
+        std::string DebugString() const override { return "MyReaderDatasetOp::Dataset"; }
 
       protected:
         // Optional: Implementation of `GraphDef` serialization for this dataset.
         //
         // Implement this method if you want to be able to save and restore
         // instances of this dataset (and any iterators over it).
-        Status AsGraphDefInternal(DatasetGraphDefBuilder *b,
-                                  tensorflow::Node **output) const override
+        Status AsGraphDefInternal(tensorflow::SerializationContext *ctx,
+                                  DatasetGraphDefBuilder *b,
+                                  tensorflow::Node **output) const
         {
             // Construct nodes to represent any of the input tensors from this
             // object's member variables using `b->AddScalar()` and `b->AddVector()`.
@@ -85,8 +82,8 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
         class Iterator : public tensorflow::DatasetIterator<Dataset>
         {
           public:
-            explicit Iterator(const Params &params)
-                : DatasetIterator<Dataset>(params), i_(0) {}
+            explicit Iterator(const Params &params) : DatasetIterator<Dataset>(params),
+                                                      i_(0) {}
 
             // Implementation of the reading logic.
             //
@@ -111,7 +108,7 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
                 {
                     // Create a scalar string tensor and add it to the output.
                     tensorflow::Tensor record_tensor(ctx->allocator({}), DT_STRING, {});
-                    record_tensor.scalar<string>()() = "MyReader!";
+                    record_tensor.scalar<std::string>()() = "MyReader!";
                     out_tensors->emplace_back(std::move(record_tensor));
                     ++i_;
                     *end_of_sequence = false;
@@ -145,7 +142,7 @@ class MyReaderDatasetOp : public tensorflow::DatasetOpKernel
 
           private:
             tensorflow::mutex mu_;
-            int64 i_ GUARDED_BY(mu_);
+            tensorflow::int64 i_ GUARDED_BY(mu_);
         };
     };
 };
@@ -164,6 +161,3 @@ REGISTER_OP("MyReaderDataset")
 // Register the kernel implementation for MyReaderDataset.
 REGISTER_KERNEL_BUILDER(Name("MyReaderDataset").Device(tensorflow::DEVICE_CPU),
                         MyReaderDatasetOp);
-
-} // namespace
-} // namespace myproject

Here is the dataset_ops.py:

@@ -1,46 +1,25 @@
-import tensorflow as tf
 
-# Assumes the file is in the current working directory.
-my_reader_dataset_module = tf.load_op_library("./my_reader_dataset_op.so")
+"""Dataset ops."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
 
+import tensorflow as tf
+from tensorflow.python.platform import resource_loader
+from tensorflow.python.data.ops import dataset_ops
+from tensorflow.python.data.util import structure
+from tensorflow.python.framework import dtypes
 
-class MyReaderDataset(tf.data.Dataset):
+my_reader_dataset_module = tf.load_op_library(
+    resource_loader.get_path_to_datafile("_dataset_ops.so"))
 
-    def __init__(self):
-        super(MyReaderDataset, self).__init__()
-        # Create any input attrs or tensors as members of this class.
 
-    def _as_variant_tensor(self):
-        # Actually construct the graph node for the dataset op.
-        #
-        # This method will be invoked when you create an iterator on this dataset
-        # or a dataset derived from it.
-        return my_reader_dataset_module.my_reader_dataset()
-
-    # The following properties define the structure of each element: a scalar
-    # <a href="../../api_docs/python/tf#string"><code>tf.string</code></a> tensor. Change these properties to match the `output_dtypes()`
-    # and `output_shapes()` methods of `MyReaderDataset::Dataset` if you modify
-    # the structure of each element.
-    @property
-    def output_types(self):
-        return tf.string
+class MyReaderDataset(dataset_ops.DatasetSource):
 
-    @property
-    def output_shapes(self):
-        return tf.TensorShape([])
+    def __init__(self):
+        super(MyReaderDataset, self).__init__(
+            my_reader_dataset_module.my_reader_dataset())
 
     @property
-    def output_classes(self):
-        return tf.Tensor
-
-
-if __name__ == "__main__":
-    # Create a MyReaderDataset and print its elements.
-    with tf.Session() as sess:
-        iterator = MyReaderDataset().make_one_shot_iterator()
-        next_element = iterator.get_next()
-        try:
-            while True:
-                print(sess.run(next_element))  # Prints "MyReader!" ten times.
-        except tf.errors.OutOfRangeError:
-            pass
+    def _element_structure(self):
+        return structure.TensorStructure(dtypes.string, [])

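For completeness, here is a minimal usage sketch (assuming the module above is built and importable as tensorflow_addons.dataset.dataset_ops, under TF 2.x eager execution): a dataset_ops.DatasetSource subclass can be iterated directly, which replaces the Session / one-shot-iterator loop from the old example.

import tensorflow as tf

# Hypothetical import path, matching the tensorflow_addons layout linked above.
from tensorflow_addons.dataset import dataset_ops

dataset = dataset_ops.MyReaderDataset()
for element in dataset:
    # Each element is a scalar tf.string tensor; prints b'MyReader!' each time.
    print(element.numpy())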
To follow the rest of the documentation I created two more files, one to test it from Python and one to build it with Bazel:

dataset_ops_test.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf

from tensorflow_addons.utils.python import test_utils

from tensorflow_addons.dataset import dataset_ops


class DatasetOpsTest(tf.test.TestCase):
    def test_dataset(self):
        dataset = dataset_ops.MyReaderDataset()
        i = 0
        for d in dataset:
            self.assertAllEqual(d, tf.constant("MyReader!"))
            i += 1
        self.assertEquals(i, 10)


if __name__ == "__main__":
    tf.test.main()

BUILD file

licenses(["notice"])  # Apache 2.0

package(default_visibility = ["//visibility:public"])

cc_binary(
    name = "_dataset_ops.so",
    srcs = [
        "cc/my_dataset.cpp"
    ],
    linkshared = 1,
    deps = [
        "@local_config_tf//:libtensorflow_framework",
        "@local_config_tf//:tf_header_lib",
    ],
    # see why -DNDEBUG https://github.com/tensorflow/tensorflow/issues/17316
    copts = ["-pthread", "-std=c++11", "-D_GLIBCXX_USE_CXX11_ABI=0", "-DNDEBUG"]
)

py_library(
    name = "dataset_ops_py",
    srcs = ([
        "__init__.py",
        "dataset_ops.py",
    ]),
    data = [
        ":_dataset_ops.so",
        "//tensorflow_addons/utils:utils_py",
    ],
    srcs_version = "PY2AND3",
)

py_test(
    name = "dataset_ops_test",
    size = "small",
    srcs = [
        "dataset_ops_test.py",
    ],
    main = "dataset_ops_test.py",
    deps = [
        ":dataset_ops_py",
    ],
    srcs_version = "PY2AND3"
)
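The py_library above also lists an __init__.py that is not shown here; a minimal, hypothetical sketch of what it could contain (just a re-export so the package is importable) would be:

# Hypothetical __init__.py for the tensorflow_addons/dataset package,
# re-exporting the wrapper class defined in dataset_ops.py.
from tensorflow_addons.dataset.dataset_ops import MyReaderDataset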

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Apr 16, 2019
@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 16, 2019
@kumariko kumariko removed the TF 2.0 Issues relating to TensorFlow 2.0 label Dec 24, 2021
@MarkDaoust
Member

This doc doesn't exist anymore.
