Issue with file reading when using TPU enabled colab machine #64117

Closed
arnava13 opened this issue Mar 21, 2024 · 3 comments
Labels: comp:tpus, stat:awaiting response, TF 2.15, type:bug

Comments

@arnava13

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15.0

Custom code

Yes

OS platform and distribution

Google Colab

Mobile device

No response

Python version

Python 3.10.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'm encountering an UnimplementedError related to the file system scheme '[local]' not being implemented when attempting to read files from the local filesystem in a Google Colab environment while using TPUs. The error occurs when trying to execute tf.io.read_file(file_path) within a data generator class, even when explicitly setting tf.device('/cpu:0') around the file reading operation.

This issue arises despite attempts to ensure file operations are directed to execute on the CPU within a TPU environment. The expected behavior is for TensorFlow to successfully read the local file when specified to do so on the CPU, even in a TPU context, without encountering a file system scheme error.

Full repository: https://github.com/arnava13/SeniorHonoursProject
(Training run in training_colab.ipynb on a colab machine)

Standalone code to reproduce the issue

import tensorflow as tf

class DataGenerator(tf.compat.v2.keras.utils.Sequence):
    def __init__(self, data_root, norm_data_name, column_indices=None, dtype=tf.float32):
        self.data_root = data_root
        self.norm_data_path = tf.io.gfile.join(self.data_root, norm_data_name)

        # Read and parse the file (this call triggers the failing ReadFile op)
        self.all_ks = self.read_file(self.norm_data_path, column_indices=column_indices, dtype=dtype)
        self.all_ks = tf.cast(self.all_ks, dtype=tf.float32)
        self.original_k_len = tf.cast(tf.size(self.all_ks).numpy(), tf.int32)
    
    @tf.function
    def read_file(self, file_path, *, column_indices=None, dtype=tf.float32):
        with tf.device('/cpu:0'):
            file_content = tf.io.read_file(file_path)

        # Process file content
        file_content = tf.strings.regex_replace(file_content, "\r\n", "\n")
        file_content = tf.strings.regex_replace(file_content, "\r", "\n")
        lines = tf.strings.split([file_content], '\n').values
        lines = tf.cond(tf.equal(lines[-1], ""), lambda: lines[:-1], lambda: lines)

        def extract_columns(line):
            # Normalize spaces
            line = tf.strings.regex_replace(line, r"\s+", " ")
            line = tf.strings.strip(line)  # Strip whitespace
            columns = tf.strings.split([line], ' ').values
            if column_indices is not None:
                selected_columns = tf.gather(columns, column_indices)
            else:
                selected_columns = columns
            return tf.strings.to_number(selected_columns, out_type=dtype)

        columns_values = tf.map_fn(extract_columns, lines, fn_output_signature=dtype)
        if isinstance(columns_values, tf.RaggedTensor):
            columns_values = columns_values.to_tensor()
        return columns_values


tpu_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(tpu_resolver)
tf.tpu.experimental.initialize_tpu_system(tpu_resolver) 
strategy = tf.distribute.TPUStrategy(tpu_resolver)
use_tpu = True  # stands in for FLAGS.TPU in the full project

if use_tpu:
    with tf.device('/cpu:0'):
        training_generator = DataGenerator('/data', 'planck.txt')

Relevant log output

err: File "/content/SeniorHonoursProject/BaCoN-II/data_generator.py", line 824, in create_generators
err: training_generator = DataGenerator(partition['train'], labels, labels_dict, data_root = FLAGS.DIR, save_indexes=False, seed = seed, strategy=strategy, **params)
err: File "/content/SeniorHonoursProject/BaCoN-II/data_generator.py", line 170, in __init__
err: self.original_k_len = tf.cast(tf.size(self.all_ks).numpy(), tf.int32)
err: File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
err: raise e.with_traceback(filtered_tb) from None
err: File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/array_ops.py", line 861, in size_internal
err: num_elements = np.prod(input._shape_tuple(), dtype=np_out_type)  # pylint: disable=protected-access
err: tensorflow.python.framework.errors_impl.UnimplementedError: {{function_node __inference_read_file_213}} File system scheme '[local]' not implemented (file: 'data/ds_ee2_train_1k_equal_examples_rands/planck_ee2.txt')
err: [[{{node ReadFile}}]]
err: Exception ignored in atexit callback: <function async_wait at 0x7de5be890670>
err: Traceback (most recent call last):
err: File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/context.py", line 2833, in async_wait
err: context().sync_executors()
err: File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/context.py", line 749, in sync_executors
err: pywrap_tfe.TFE_ContextSyncExecutors(self._context_handle)
err: tensorflow.python.framework.errors_impl.UnimplementedError: {{function_node __inference_read_file_213}} File system scheme '[local]' not implemented (file: 'data/ds_ee2_train_1k_equal_examples_rands/planck_ee2.txt')
err: [[{{node ReadFile}}]]

google-ml-butler (bot) added the type:bug label on Mar 21, 2024
tilakrayal added the TF 2.15 and comp:tpus labels on Mar 21, 2024

@tilakrayal
Contributor

@arnava13,
Could you please try reading the local file with plain Python I/O (a normal file.read()) instead of tf.io when using the TPU?

Also, please have a look at this similar issue: https://stackoverflow.com/questions/62870656/file-system-scheme-local-not-implemented-in-google-colab-tpu

tensorflow/models#8265 (comment)

Thank you!
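
A minimal sketch of the suggestion above, assuming the file is a whitespace-delimited numeric table like the one the reproduction code parses. The helper name read_columns and its parameters are hypothetical (they are not part of the project or of TensorFlow). It reads the file on the Colab host with ordinary Python I/O, so no TensorFlow file system op is involved.

```python
def read_columns(file_path, column_indices=None):
    """Parse a whitespace-delimited numeric text file into a list of float rows.

    Hypothetical helper for illustration; not part of TensorFlow or the project.
    """
    # In text mode with the default newline=None, "\r\n" and "\r" are
    # translated to "\n" on read, so no manual normalisation is needed.
    with open(file_path, "r") as f:
        text = f.read()

    rows = []
    for line in text.split("\n"):
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # skip blank lines, including a trailing one
        if column_indices is not None:
            fields = [fields[i] for i in column_indices]
        rows.append([float(x) for x in fields])
    return rows
```

If a tensor is needed afterwards, the resulting rectangular list can be passed once to tf.constant(rows, dtype=tf.float32), outside any tf.function.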

tilakrayal added the stat:awaiting response label on Mar 22, 2024

@arnava13
Author

Thank you. Reverting to numpy I/O.
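
For readers landing here, one way the NumPy route could look. This is only a sketch under assumptions: the thread does not show the author's final code, the helper name load_columns is hypothetical, and the file is assumed to be a rectangular, whitespace-delimited numeric table.

```python
import numpy as np


def load_columns(file_path, column_indices=None, dtype=np.float32):
    """Load a whitespace-delimited numeric text file as a 2-D NumPy array.

    Hypothetical helper for illustration; not the author's actual code.
    """
    # np.loadtxt splits on whitespace by default; usecols selects columns
    # (None means all columns). ndmin=2 keeps the result 2-D even when the
    # file has a single row or a single column.
    return np.loadtxt(file_path, usecols=column_indices, dtype=dtype, ndmin=2)
```

With this approach, the value stored as original_k_len in the reproduction could be taken from all_ks.size on the NumPy array, which avoids calling .numpy() on a tensor produced by a ReadFile op.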
