Hi,
I have been using tfio.experimental.serialization.decode_json along with tf.data.TextLineDataset to train on newline-delimited JSON files. It feels to me like a slightly less efficient, but human-readable, version of tf.data.TFRecordDataset. This has been a huge boon to my workflow, so first off, thank you for this contribution! I have, however, run into a minor issue recently: my data has values that are variable-length lists. For example:
{"foo": [1, 2, 3, 4]}
{"foo": [1, 2, 3, 4, 5]}
In order to parse these records, I would expect to be able to do something like the following:
import json
import tensorflow as tf
import tensorflow_io as tfio
r = json.dumps({"foo": [1, 2, 3, 4, 5]})
def parse_json(json_text):
    specs = {
        "foo": tf.TensorSpec(tf.TensorShape([None]), tf.int32)
    }
    parsed = tfio.experimental.serialization.decode_json(json_text, specs)
    return parsed["foo"]
parse_json(r)
However, I receive the following error:
2020-04-24 15:56:41.529651: W tensorflow/core/framework/op_kernel.cc:1632] OP_REQUIRES failed at serialization_kernels.cc:36 : Invalid argument: Shape [?] is not fully defined
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in parse_json
File "/opt/miniconda/miniconda3/envs/insurance/lib/python3.6/site-packages/tensorflow_io/core/python/experimental/serialization_ops.py", line 74, in decode_json
values = core_ops.io_decode_json(data, names, shapes, dtypes, name=name)
File "<string>", line 6397, in io_decode_json
File "<string>", line 6460, in io_decode_json_eager_fallback
File "/opt/miniconda/miniconda3/envs/insurance/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [?] is not fully defined [Op:IO>DecodeJSON]
If I specify the full length of the list element, parsing will work as expected:
import json
import tensorflow as tf
import tensorflow_io as tfio
r = json.dumps({"foo": [1, 2, 3, 4, 5]})
def parse_json(json_text):
    specs = {
        "foo": tf.TensorSpec(tf.TensorShape([5]), tf.int32)
    }
    parsed = tfio.experimental.serialization.decode_json(json_text, specs)
    return parsed["foo"]
parse_json(r)
This results in:
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([1, 2, 3, 4, 5], dtype=int32)>
I can currently hack around this by preprocessing my data ahead of time: padding everything to the same length and then masking the padding elements. Still, having decode_json handle undefined shapes would save me time and just generally be much nicer :)
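For reference, the padding workaround looks roughly like the sketch below. It is only an illustration in plain Python: the helper name pad_records, the extra "foo_mask" key, and the padding value 0 are my own choices for this example and are not part of tensorflow-io.

```python
import json

# Assumed padding value; the mask, not this value, marks which entries are real.
PAD_VALUE = 0


def pad_records(lines, key="foo", pad_value=PAD_VALUE):
    """Pad the list stored under `key` in each JSON line to a common length.

    Returns (padded_lines, max_len). Each padded record also gets a
    `<key>_mask` list with 1 for original elements and 0 for padding.
    """
    # Skip blank lines so a trailing newline in the file does not break parsing.
    records = [json.loads(line) for line in lines if line.strip()]
    max_len = max((len(r[key]) for r in records), default=0)

    padded_lines = []
    for record in records:
        values = record[key]
        n_pad = max_len - len(values)
        out = dict(record)
        out[key] = values + [pad_value] * n_pad
        out[key + "_mask"] = [1] * len(values) + [0] * n_pad
        padded_lines.append(json.dumps(out))
    return padded_lines, max_len


lines = ['{"foo": [1, 2, 3, 4]}', '{"foo": [1, 2, 3, 4, 5]}']
padded_lines, max_len = pad_records(lines)
```

After this step every record has the same length, so the specs can use a fully defined shape such as tf.TensorShape([max_len]) for both "foo" and "foo_mask", as in the working example above. The cost is an extra pass over the data and larger files, which is what native support for undefined shapes would avoid.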
Environment information (in case it matters/helps):
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] on linux
---
Name: tensorflow
Version: 2.1.0
---
Name: tensorflow-io
Version: 0.12.0
Thanks again!