Enable parsing arrays of arbitrary dimensions from json files #1125
yongtang merged 3 commits into tensorflow:master from
Conversation
It's worth noting that this doesn't work for nested variable-length lists, which would need to be converted to either padded Tensors or RaggedTensors. But I think this use case is rare.
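The limitation mentioned above can be illustrated with a small sketch (plain Python, not the PR's C++ code; `infer_shape` is a hypothetical name): a first-element probe of a ragged nested list reports a shape that the other elements don't satisfy, so no single dense shape exists.

```python
import json

def infer_shape(value):
    # Probe the first element of each nesting level, mirroring the
    # recursive approach discussed in this PR (illustrative only).
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return shape

ragged = json.loads('{"x": [[1.0], [2.0, 3.0]]}')["x"]
print(infer_shape(ragged))  # first-element probe reports [2, 1]
# ...but ragged[1] has length 2, so no fixed dense shape fits
# without padding or a RaggedTensor.
```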
@djl11 Thanks for the PR. Can you please fix the lint issues? You can use `bazel run //tools/lint:lint` from the root of the directory.
@kvignesh1420 no problem. Sorry, I was not familiar with lint. Should be good now.
```cpp
static void getTensorShape(rapidjson::Value* entry,
                           std::vector<int64>& tensor_shape_vector) {
  if (entry->IsArray()) {
    tensor_shape_vector.push_back(entry->Size());
    getTensorShape(&(*entry)[0], tensor_shape_vector);
  }
}
```
@djl11 if the entry is not an array, shouldn't we assign it a constant shape?
Yes, I missed this. Originally this was just set as 1. I will make the change now.
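The agreed fix — non-array entries get a constant size-1 shape — can be sketched as a Python analogue of the C++ `getTensorShape` helper (names and structure here are illustrative, not the merged implementation):

```python
def get_tensor_shape(entry, shape=None):
    # Recurse through nested lists, collecting one dimension per level.
    # A top-level scalar (non-array) entry is allocated a size-1 tensor,
    # matching the change discussed in this review thread.
    if shape is None:
        shape = []
    if isinstance(entry, list):
        shape.append(len(entry))
        get_tensor_shape(entry[0], shape)
    elif not shape:
        shape.append(1)  # constant shape for non-array json entries
    return shape

print(get_tensor_shape([[[1.0], [2.0]]]))  # [1, 2, 1]
print(get_tensor_shape(0.5))               # [1]
```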
```diff
 r = '{"x": [[[1.0], [2.0]]]}'

 @tf.function(autograph=False)
 def parse_json(json_text):
-    specs = {"x": tf.TensorSpec(tf.TensorShape([1, 1]), tf.float32)}
+    specs = {"x": tf.TensorSpec(tf.TensorShape([1, 2, 1]), tf.float32)}
     parsed = tfio.experimental.serialization.decode_json(json_text, specs)
     return parsed["x"]

 v = parse_json(r)
-assert np.array_equal(v, [[1.0]])
+assert np.array_equal(v, [[[1.0], [2.0]]])
```
Can you add additional test cases if possible? Maybe a combination of nested tensors with different data types within a single json. For example:

r = '{"x": [[[1.0], [2.0]]], "y": ["index", "count"], "z": 0.5}'
Sure! Will push a new commit shortly.
…s for non-array json entries.
…flow#1125) * enabled parsing arrays from json files of arbitrary dimensions. * fixed lint issues. * increased scope of multi-dim tensor test, and allocated size 1 tensors for non-array json entries.
Continues on from this PR, where a comment suggested adding support for higher dimensions at a later stage. This PR was originally made following that comment.
I'm not sure how much speed the recursive design sacrifices to the extra function calls, but I think the readability and simplicity are improved, and it's safe to assume that json files are unlikely to contain very large arrays, since json would be an inefficient storage format for such data.