Fix remaining tfds datasets bugs on windows #1914
Thank you!
@@ -106,7 +106,8 @@ def _info(self):
'probe': tfds.features.Tensor(shape=(), dtype=tf.string),
'scanner': tfds.features.Tensor(shape=(), dtype=tf.string),
'target': tfds.features.Tensor(shape=(), dtype=tf.string),
'timestamp_id': tfds.features.Tensor(shape=(), dtype=tf.uint32),
# Use tf.uint64 to prevent possible overflow on windows `sys.maxsize`
Could you provide more context on this one? uint32 should be system independent.
I got the following stack trace (from the pytest results):
ERROR: test_download_and_prepare_as_dataset (__main__.DukeUltrasoundTest)
test_download_and_prepare_as_dataset (__main__.DukeUltrasoundTest)
Run the decorated test method.
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\testing\test_utils.py", line 198, in decorated
f(self, *args, **kwargs)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\testing\dataset_builder_testing.py", line 298, in test_download_and_prepare_as_dataset
self._download_and_prepare_as_dataset(self.builder)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\testing\dataset_builder_testing.py", line 359, in _download_and_prepare_as_dataset
builder.download_and_prepare(download_config=download_config)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\api_utils.py", line 69, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\dataset_builder.py", line 363, in download_and_prepare
download_config=download_config)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\dataset_builder.py", line 996, in _download_and_prepare
max_examples_per_split=download_config.max_examples_per_split,
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\dataset_builder.py", line 928, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\dataset_builder.py", line 1012, in _prepare_split
example = self.info.features.encode_example(record)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\features\features_dict.py", line 170, in encode_example
in utils.zip_dict(self._feature_dict, example_dict)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\features\features_dict.py", line 169, in <dictcomp>
for k, (feature, example_value)
File "C:\Users\VIJAY\Desktop\GitHub_Repos\datasets\tensorflow_datasets\core\features\feature.py", line 541, in encode_example
example_data = np.array(example_data, dtype=np_dtype)
OverflowError: Python int too large to convert to C long
----------------------------------------------------------------------
Ran 5 tests in 1.021s
I am using 64-bit Windows 10 with Python 3.7.7.
The dataset was prepared successfully when I replaced tf.uint32 with tf.uint64.
This doesn't seem to be the right place to fix this.
The error is raised in feature.py, line 541, in encode_example, so it might be an issue with np_dtype or similar. What is the int value?
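To help answer the question about the int value, here is a minimal diagnostic sketch. It rests on an assumption that this thread does not confirm: that the failing conversion goes through a C `long`, which is 32 bits wide on 64-bit Windows, so a value above 2**31 - 1 could overflow there even though it fits in uint32. The helper names (`int_range`, `fits`, `diagnose`) and the sample value are hypothetical and not part of tfds. Note also that `sys.maxsize` is 2**63 - 1 on 64-bit CPython, so it does not reflect the C `long` limit.

```python
import ctypes
import sys


def int_range(bits, signed):
    """Return (min, max) representable by an integer of the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fits(value, bits, signed):
    """Check whether a Python int fits in an integer of the given width."""
    lo, hi = int_range(bits, signed)
    return lo <= value <= hi


# Width of the platform's C long: 32 bits on Windows, 64 on most 64-bit Linux.
C_LONG_BITS = 8 * ctypes.sizeof(ctypes.c_long)


def diagnose(value):
    """Report which integer types can hold `value` on this platform."""
    return {
        "fits_uint32": fits(value, 32, signed=False),
        "fits_uint64": fits(value, 64, signed=False),
        "fits_c_long": fits(value, C_LONG_BITS, signed=True),
    }


if __name__ == "__main__":
    print("C long bits:", C_LONG_BITS, "sys.maxsize:", sys.maxsize)
    # Hypothetical value; substitute the real timestamp_id from the failing record.
    print(diagnose(3_000_000_000))
```

If the real `timestamp_id` reports `fits_uint32` as true but `fits_c_long` as false on the Windows machine, that would point at the conversion step rather than at the declared feature dtype.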
Thanks for fixing this. I'm going to do a partial merge of this to fix the OS issues. The uint issue seems to be a different problem, so I would prefer to better understand it before fixing it.
Fixed tfds\image, tfds\obj_dec, tfds\structured, tfds\text, tfds\translate.
Fix #1901
See comments of #1911 for old results.