TF2 OD Training on AI Platform - TPU Distribution Strategy Failure. #9110

@Agiledom

Prerequisites

Please answer the following question for yourself before submitting an issue.

  • I checked to make sure that this issue has not been filed already.

1. The entire URL of the documentation with the issue

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md

2. Describe the issue(s)

  1. Under "Google Cloud AI Platform", the documentation passes "--python-version 3.6" as a flag in the training and evaluation commands. Python 3.6 is not compatible with runtime version 2.1 (https://cloud.google.com/ai-platform/training/docs/runtime-version-list), so running these commands throws a Python compatibility error.

  2. The following was highlighted, but not solved, in issue "TPU distribution strategy fail: NodeDef expected inputs 'string' do not match 0 inputs specified" #8457. I have combined it with this documentation issue for brevity, as both issues relate to the same command. Upon running the Google AI Platform TPU training command with Python 3.7 and TensorFlow 2.2+, one receives the following error: "InvalidArgumentError: NodeDef expected inputs 'string' do not match 0 inputs specified", and the TPU distribution strategy fails. The full stack trace can be found in "NotFoundError: models/research/object_detection/configs/tf2/centernet_hg104_512x512_kpts_coco17_tpu-32.config; No such file or directory" #8839.
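
For reference, here is a rough sketch of the training command from item 1 with the Python version changed to 3.7, built as an argument list in Python so it can be inspected before running. The helper name `build_training_command`, the job name, and the `gs://` paths are placeholders I made up. The flags are the ones I believe the documented `gcloud ai-platform jobs submit training` command uses, so please check them against the docs. This only addresses the Python version error from item 1; it does not fix the TPU failure from item 2.

```python
import shlex


def build_training_command(job_name, model_dir, pipeline_config_path,
                           runtime_version="2.1", python_version="3.7"):
    """Return the gcloud argument list for a TPU training job.

    Hypothetical helper: the defaults reflect the assumption that
    runtime 2.1 must be paired with Python 3.7 rather than 3.6.
    """
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        f"--job-dir={model_dir}",
        "--package-path", "./object_detection",
        "--module-name", "object_detection.model_main_tf2",
        "--region", "us-central1",
        "--scale-tier", "BASIC_TPU",
        "--runtime-version", runtime_version,
        "--python-version", python_version,
        "--",  # everything after this is passed to the training module
        "--use_tpu", "true",
        f"--model_dir={model_dir}",
        f"--pipeline_config_path={pipeline_config_path}",
    ]


# Placeholder job name and bucket paths, for illustration only.
cmd = build_training_command(
    "od_tpu_training_job",
    "gs://my-bucket/model_dir",
    "gs://my-bucket/pipeline.config",
)

# Print a shell-safe version of the command instead of executing it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing rather than executing keeps the sketch safe to run locally; the printed line can be pasted into a shell once the paths are real.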

Anyone any ideas? :D

Update: 15/08

From what I can infer from the errors, this is an incompatibility: the TPUs use runtime 2.1, which ships TF 2.1, and no runtime is available for TF 2.2. Given that the OD API requires TF 2.2 or later, it seems to be incompatible with running on AI Platform, which contradicts what the documentation describes.
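
To make the suspected mismatch concrete, here is a small sketch of the version comparison I have in mind. The function names are hypothetical, and the "2.2" minimum for the OD API is my inference from the errors, not something I have confirmed in the docs.

```python
def parse_version(version):
    """Turn a string like '2.1.0' into a tuple like (2, 1, 0).

    Any non-numeric suffix on a component (e.g. '0-rc1') is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def runtime_supports(runtime_tf_version, required_tf_version):
    """Return True if the runtime's TF version meets the required minimum."""
    return parse_version(runtime_tf_version) >= parse_version(required_tf_version)


# Runtime 2.1 ships TF 2.1; the OD API is assumed to need TF 2.2 or later.
print(runtime_supports("2.1", "2.2"))  # False
```

If this reasoning is right, any job submitted with runtime 2.1 would fail the check regardless of the Python version chosen.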

Am I barking up the wrong tree? If not, maybe this issue should be tagged as a bug.
