support partially-known and unknown shape specification in decode_json

Hi,

I have been using `tfio.experimental.serialization.decode_json` together with `tf.data.TextLineDataset` to train directly from newline-delimited JSON files. It feels like a slightly less efficient, but human-readable, version of `tf.data.TFRecordDataset`. This has been a huge boon to my workflow, so first off, thank you for this contribution! I have, however, recently run into a minor issue: my data now contains values whose length varies from record to record. For example:

```
{"foo": [1, 2, 3, 4]}
{"foo": [1, 2, 3, 4, 5]}
```

To parse these records, I would expect to be able to leave the variable dimension as `None` in the spec, like this:

```python
import json

import tensorflow as tf
import tensorflow_io as tfio

r = json.dumps({"foo": [1, 2, 3, 4, 5]})


def parse_json(json_text):
    specs = {
        # Length of "foo" is left unknown (None) because it varies per record.
        "foo": tf.TensorSpec(tf.TensorShape([None]), tf.int32)
    }
    parsed = tfio.experimental.serialization.decode_json(json_text, specs)
    return parsed["foo"]

parse_json(r)
```

However, I receive the following error:

```
2020-04-24 15:56:41.529651: W tensorflow/core/framework/op_kernel.cc:1632] OP_REQUIRES failed at serialization_kernels.cc:36 : Invalid argument: Shape [?] is not fully defined
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in parse_json
  File "/opt/miniconda/miniconda3/envs/insurance/lib/python3.6/site-packages/tensorflow_io/core/python/experimental/serialization_ops.py", line 74, in decode_json
    values = core_ops.io_decode_json(data, names, shapes, dtypes, name=name)
  File "<string>", line 6397, in io_decode_json
  File "<string>", line 6460, in io_decode_json_eager_fallback
  File "/opt/miniconda/miniconda3/envs/insurance/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Shape [?] is not fully defined [Op:IO>DecodeJSON]
```

If I specify the full length of the list element, parsing will work as expected:

```python
import json

import tensorflow as tf
import tensorflow_io as tfio

r = json.dumps({"foo": [1, 2, 3, 4, 5]})


def parse_json(json_text):
    specs = {
        # Fully defined shape: exactly 5 elements, matching the record above.
        "foo": tf.TensorSpec(tf.TensorShape([5]), tf.int32)
    }
    parsed = tfio.experimental.serialization.decode_json(json_text, specs)
    return parsed["foo"]

parse_json(r)
```
This results in:
```
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([1, 2, 3, 4, 5], dtype=int32)>
```
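To make the request a bit more concrete, here is a rough sketch in plain Python of the matching rule I have in mind for partially-known and unknown shapes. It is not how `decode_json` works internally; `infer_shape` and `is_compatible` are hypothetical helper names used only for illustration, and `None` stands for an unknown dimension (or, when the whole shape is `None`, an unknown rank).

```python
import json


def infer_shape(value):
    """Return the shape of a nested list as a tuple of ints.

    Assumes the nesting is rectangular; ragged input is not validated here.
    """
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def is_compatible(spec_shape, actual_shape):
    """True if actual_shape satisfies spec_shape, where None means 'any size'."""
    if spec_shape is None:  # fully unknown shape: accept any rank
        return True
    if len(spec_shape) != len(actual_shape):
        return False
    return all(s is None or s == a for s, a in zip(spec_shape, actual_shape))


record = json.loads('{"foo": [1, 2, 3, 4, 5]}')
actual = infer_shape(record["foo"])

accepts_unknown_length = is_compatible((None,), actual)  # spec like [None]
accepts_wrong_length = is_compatible((4,), actual)       # spec like [4]
```

Under this rule a spec of `[None]` would accept both example records above, while a fully defined spec would keep its current strict behavior.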

I can currently work around this by preprocessing my data ahead of time: padding everything to the same length and then masking the padded elements. Having `decode_json` handle partially defined or undefined shapes would save me time and generally be much nicer :)
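For reference, the padding-and-masking preprocessing looks roughly like the sketch below. It uses only the standard library; `pad_records`, the `foo_mask` key, and the padding value of `0` are my own illustrative choices, not anything provided by `tensorflow-io`.

```python
import json

PAD_VALUE = 0  # assumed padding value; pick one that cannot clash with real data


def pad_records(lines, key="foo", pad_value=PAD_VALUE):
    """Pad the list under `key` in every JSON line to the longest length seen.

    Returns the rewritten JSON lines (each with an extra 0/1 mask stored
    under `<key>_mask`) together with the padded length.
    """
    records = [json.loads(line) for line in lines]
    max_len = max(len(rec[key]) for rec in records)
    padded = []
    for rec in records:
        values = rec[key]
        n_pad = max_len - len(values)
        out = dict(rec)
        out[key] = values + [pad_value] * n_pad
        out[key + "_mask"] = [1] * len(values) + [0] * n_pad
        padded.append(json.dumps(out))
    return padded, max_len


lines = ['{"foo": [1, 2, 3, 4]}', '{"foo": [1, 2, 3, 4, 5]}']
padded_lines, max_len = pad_records(lines)
```

After this step every record has the same length, so both `foo` and `foo_mask` can be decoded with a fully defined spec such as `tf.TensorSpec(tf.TensorShape([max_len]), tf.int32)`. The cost is an extra pass over the data and a mask that has to be carried through the rest of the pipeline.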

Environment information (in case it matters/helps):

```
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54)
[GCC 7.3.0] on linux
---
Name: tensorflow
Version: 2.1.0
---
Name: tensorflow-io
Version: 0.12.0
```

Thanks again!

support partially-known and unknown shape specification in decode_json #918
