Error while running distributed inception tensorflow on 2 machines #803

@deepali-c

Please let us know which model this issue is about (specify the top-level directory)

inception

I ran inception on a distributed TF cluster with 1 ps and 2 workers, all on localhost.
Next, I tried to run the same sample on a distributed TF cluster with the ps on one machine and the 2 workers on another machine. In this setup I get errors.
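
For reference, the two-machine layout described above can be sketched as a plain job-to-address mapping, which is the shape `tf.train.ClusterSpec` accepts. The hostnames and ports here are hypothetical placeholders (the real ones are not stated in this report), and the helper `hosts_flag` is only an illustration. The two workers share a machine, so they need distinct ports.

```python
# Hypothetical addresses; the real hostnames and ports are not given in this issue.
cluster = {
    "ps": ["machine-a.example.com:2222"],
    # Both workers run on the same machine, so each needs its own port.
    "worker": ["machine-b.example.com:2222", "machine-b.example.com:2223"],
}


def hosts_flag(job):
    """Join one job's addresses into a comma-separated host string."""
    return ",".join(cluster[job])


ps_hosts = hosts_flag("ps")
worker_hosts = hosts_flag("worker")
print("ps hosts:    ", ps_hosts)
print("worker hosts:", worker_hosts)
```

Every process in the cluster would need to be started with the same mapping, differing only in its own job name and task index.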

The following error is seen on worker 0:
tensorflow.python.framework.errors.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/imagenet_train/model.ckpt-1362

The errors on worker 1 are:

W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/imagenet_train/model.ckpt-1362
W tensorflow/core/framework/op_kernel.cc:975] Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /tmp/imagenet_train/model.ckpt-1362
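
One possible explanation, which is an assumption and not something the logs confirm, is that the files for this checkpoint exist only on the local disk of a different machine from the one trying to restore them. A minimal sketch for checking this is to run the following on each machine and compare the results. The helper names `checkpoint_files` and `report` are made up for illustration; only the path comes from the error message.

```python
import glob


def checkpoint_files(prefix):
    """Return local files whose paths start with the checkpoint prefix."""
    # glob.escape keeps any special characters in the prefix literal.
    return sorted(glob.glob(glob.escape(prefix) + "*"))


def report(prefix):
    """Describe whether any file for this checkpoint prefix exists locally."""
    files = checkpoint_files(prefix)
    if files:
        return "found %d file(s) for %s" % (len(files), prefix)
    return "no files for %s on this machine" % prefix


if __name__ == "__main__":
    # Path taken from the error message above.
    print(report("/tmp/imagenet_train/model.ckpt-1362"))
```

If the files show up on one machine but not the other, pointing the training directory at storage that both machines can read would be one thing to try.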
