
XLA: Cannot get IR on TPU #49702

Open
hjmus opened this issue May 25, 2021 · 3 comments
Labels: comp:tpus (tpu, tpuestimator), comp:xla (XLA), stat:awaiting tensorflower (Status - Awaiting response from tensorflower), TF 2.5 (Issues related to TF 2.5), type:bug (Bug)

Comments


hjmus commented May 25, 2021

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab environment
  • TensorFlow installed from (source or binary): Google Colab environment
  • TensorFlow version: 2.5.0
  • Python version: 3.7

Describe the current behavior
Code is in https://github.com/hjmus/mock/blob/master/TPUs_in_Colab.ipynb. I was trying to get the HLO IR when running a tf.function on TPU by using experimental_get_compiler_ir(), but ran into the following error:

ValueError: No matching device found for '/job:worker/replica:0/task:0/device:TPU:1'
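
For reference, a minimal sketch of the call pattern being attempted (the function add_fn, the jit_compile flag, the sample inputs, and the stage='hlo' argument are illustrative assumptions, not copied from the linked notebook):

import tensorflow as tf

# Connect to the Colab TPU cluster before tracing any functions.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

@tf.function(jit_compile=True)  # request XLA compilation of the function
def add_fn(a, b):
  return a + b

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])

# Ask for the HLO text of the compiled function; on the Colab TPU runtime,
# a call of this form is what triggers the ValueError above.
print(add_fn.experimental_get_compiler_ir(x, y)(stage='hlo'))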

Describe the expected behavior
Be able to get IR when running on TPU.

Standalone code to reproduce the issue
https://github.com/hjmus/mock/blob/master/TPUs_in_Colab.ipynb

Other info / logs
The purpose of this exercise is to check out the IR generated for GSPMD (https://arxiv.org/pdf/2105.04663.pdf).

#comp:xla #XLA

@hjmus hjmus added the type:bug Bug label May 25, 2021
@hjmus hjmus changed the title Cannot get IR on TPU XLA: Cannot get IR on TPU May 25, 2021
@tilakrayal tilakrayal added comp:xla XLA comp:tpus tpu, tpuestimator TF 2.5 Issues related to TF 2.5 labels May 26, 2021
tilakrayal (Contributor)

I was able to reproduce the issue in TF v2.5 and tf-nightly; in v2.4 I am facing a different error. Please find the gist of it here.

@tilakrayal tilakrayal assigned Saduf2019 and unassigned tilakrayal May 26, 2021
Saduf2019 (Contributor)

@hjmus
Could you please refer to this comment and let us know if it helps?

@Saduf2019 Saduf2019 added the stat:awaiting response Status - Awaiting response from author label May 26, 2021

hjmus commented May 26, 2021

@Saduf2019

Hi Saduf2019, I was already using the suggested API:

import tensorflow as tf

try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

# Register the remote TPU workers with the local runtime, then initialize the TPU system.
tf.config.experimental_connect_to_cluster(tpu)  # <<= here
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

@Saduf2019 Saduf2019 removed the stat:awaiting response Status - Awaiting response from author label May 26, 2021
@Saduf2019 Saduf2019 assigned rmothukuru and unassigned Saduf2019 May 26, 2021
@rmothukuru rmothukuru assigned saeta and unassigned rmothukuru May 27, 2021
@rmothukuru rmothukuru added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 27, 2021