
Enable parsing arrays of arbitrary dimensions from json files#1125

Merged
yongtang merged 3 commits into tensorflow:master from djl11:master
Sep 19, 2020

@djl11 djl11 commented Sep 17, 2020

Continues on from this PR, where this comment suggested adding support for higher dimensions at a later stage. This PR was originally made following this comment.

I'm not sure how much speed the recursive design sacrifices through the extra function calls, but I think readability and simplicity are improved. I also think it's safe to assume that JSON files are unlikely to contain very large arrays, since JSON would be an inefficient storage format for such data.
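For readers who want to see the idea in isolation, here is a minimal Python sketch of the recursive design described above. It is not the PR's C++ code: `get_tensor_shape` and `flatten` are hypothetical helper names, and the sketch assumes the nested lists are rectangular, so descending into the first element at each level is enough to read the shape.

```python
import json


def get_tensor_shape(entry, shape=None):
    """Record the length of each nesting level by following the first element."""
    if shape is None:
        shape = []
    if isinstance(entry, list):
        shape.append(len(entry))
        if entry:  # an empty list has no first element to descend into
            get_tensor_shape(entry[0], shape)
    return shape


def flatten(entry, out=None):
    """Collect the leaf values in row-major order."""
    if out is None:
        out = []
    if isinstance(entry, list):
        for item in entry:
            flatten(item, out)
    else:
        out.append(entry)
    return out


doc = json.loads('{"x": [[[1.0], [2.0]]]}')
shape = get_tensor_shape(doc["x"])   # [1, 2, 1]
values = flatten(doc["x"])           # [1.0, 2.0]
print(shape, values)
```

The recursion depth equals the number of array dimensions, not the number of elements, so the extra function calls for the shape walk stay small even for large arrays.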

Author

djl11 commented Sep 17, 2020

It's worth noting that this doesn't work for nested variable-length lists, which would need to be converted to either padded Tensors or RaggedTensors. I think this use case is rare, though.
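To make the limitation concrete, the following Python sketch shows one way a nested list could be checked for variable-length rows before it is treated as a dense tensor. This is an illustration under assumptions, not something the PR implements: `strict_shape` is a hypothetical helper, and raising `ValueError` is just one possible way to reject ragged input.

```python
import json


def strict_shape(entry):
    """Return the shape of a nested list, or raise ValueError if it is ragged."""
    if not isinstance(entry, list):
        return []  # a scalar leaf contributes no dimension in this sketch
    child_shapes = [strict_shape(item) for item in entry]
    # Every sibling must have the same shape for the data to be rectangular.
    if any(s != child_shapes[0] for s in child_shapes):
        raise ValueError("ragged nested list: sibling shapes differ")
    return [len(entry)] + (child_shapes[0] if child_shapes else [])


rect = json.loads("[[1, 2], [3, 4]]")
ragged = json.loads("[[1, 2], [3]]")

print(strict_shape(rect))
try:
    strict_shape(ragged)
except ValueError as err:
    print("rejected:", err)
```

A shape walk that only follows the first element at each level would report `[2, 2]` for the ragged input as well, which is why such lists need separate handling (padding or a ragged representation).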

Member

@kvignesh1420 kvignesh1420 left a comment

@djl11 Thanks for the PR. Can you please fix the lint issues? You can run:
bazel run //tools/lint:lint
from the root directory of the repository.

Author

djl11 commented Sep 17, 2020

@kvignesh1420 no problem. Sorry, I was not familiar with lint. Should be good now.

@djl11 djl11 changed the title from "Enabled parsing arrays from json files of arbitrary dimensions" to "Enable parsing arrays from json files of arbitrary dimensions" Sep 17, 2020
Comment on lines +99 to +105
static void getTensorShape(rapidjson::Value* entry,
                           std::vector<int64>& tensor_shape_vector) {
  if (entry->IsArray()) {
    tensor_shape_vector.push_back(entry->Size());
    getTensorShape(&(*entry)[0], tensor_shape_vector);
  }
}
Member

@djl11 if the entry is not an array, shouldn't we assign it a constant shape?

Author

Yes, I missed this. Originally the shape was just set to 1 in this case. I will make the change now.
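The fix agreed on in this thread (a size-1 shape for entries that are not arrays) could look roughly like the Python sketch below. It only illustrates the behaviour discussed here and is not the merged C++ code; `shape_with_scalar_fallback` is a hypothetical name, and the guard for empty arrays is an addition of this sketch rather than something stated in the review.

```python
def shape_with_scalar_fallback(entry):
    """Shape of a JSON entry, using [1] for entries that are not arrays."""
    if not isinstance(entry, list):
        # Assumption: mirrors the "size 1" shape mentioned in the thread.
        return [1]
    shape = []
    node = entry
    while isinstance(node, list):
        shape.append(len(node))
        if not node:  # stop at an empty list instead of indexing element 0
            break
        node = node[0]
    return shape


print(shape_with_scalar_fallback(0.5))
print(shape_with_scalar_fallback([[1.0], [2.0]]))
```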

Comment thread tests/test_serialization_eager.py Outdated
Comment on lines +211 to +220
 r = '{"x": [[[1.0], [2.0]]]}'

 @tf.function(autograph=False)
 def parse_json(json_text):
-    specs = {"x": tf.TensorSpec(tf.TensorShape([1, 1]), tf.float32)}
+    specs = {"x": tf.TensorSpec(tf.TensorShape([1, 2, 1]), tf.float32)}
     parsed = tfio.experimental.serialization.decode_json(json_text, specs)
     return parsed["x"]

 v = parse_json(r)
-assert np.array_equal(v, [[1.0]])
+assert np.array_equal(v, [[[1.0], [2.0]]])
Member

Can you add additional test cases if possible? Maybe a combination of nested tensors with different data types within a single JSON document. For example:

r = '{"x": [[[1.0], [2.0]]], "y": ["index", "count"], "z": 0.5}'

Author

Sure! Will push a new commit shortly.
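As a rough picture of what such a mixed test input contains, the Python sketch below parses a document of that form with the standard `json` module and records the nesting shape and leaf type of each key. It does not call `tfio` and is not the test added to the PR; `describe` is a hypothetical helper, and the `"y"` strings are written with double quotes so that the text is valid JSON.

```python
import json


def describe(entry):
    """Return (shape, leaf type name) for a rectangular nested JSON value."""
    shape = []
    node = entry
    while isinstance(node, list):
        shape.append(len(node))
        if not node:
            return shape, "empty"
        node = node[0]
    return shape, type(node).__name__


r = '{"x": [[[1.0], [2.0]]], "y": ["index", "count"], "z": 0.5}'
summary = {key: describe(value) for key, value in json.loads(r).items()}

for key, (shape, leaf) in summary.items():
    print(key, shape, leaf)
```

Here the scalar `"z"` reports an empty shape, because the sketch describes the raw JSON nesting only; how a decoder maps a scalar onto a tensor shape is a separate choice.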

Member

@kvignesh1420 kvignesh1420 left a comment

LGTM. Others PTAL.

@djl11 djl11 changed the title from "Enable parsing arrays from json files of arbitrary dimensions" to "Enable parsing arrays of arbitrary dimensions from json files" Sep 18, 2020
Member

@yongtang yongtang left a comment

LGTM, Thanks for the PR!

@yongtang yongtang merged commit c0db161 into tensorflow:master Sep 19, 2020
i-ony pushed a commit to i-ony/io that referenced this pull request Feb 8, 2021
…flow#1125)

* enabled parsing arrays from json files of arbitrary dimensions.

* fixed lint issues.

* increased scope of multi-dim tensor test, and allocated size 1 tensors for non-array json entries.
zheolong pushed a commit to zheolong/io-1 that referenced this pull request Jul 24, 2025
3 participants