Distributed FIFOQueue with shared_name is not shared #17050
Comments
Whew. How silly.
It's simple to change this to work with MonitoredTrainingSession using FinalOpsHook, although it certainly isn't the prettiest way to go about it.
@illeatmyhat (excellent username, BTW) have you resolved your own bug, or is there a remaining issue that you need help with?
Yes, the issue is resolved. It can be closed now.
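System information
Shared Cluster
OS Platform and Distribution: RHEL Server 7.2
TensorFlow installed from: pip install tensorflow-gpu
TensorFlow version: v1.5.0-0-g37aa430d84 1.5.0
Python version: 3.6
CUDA/cuDNN version: 9.0/7.0
GPU model and memory: N/A (GPU not allocated)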
I am attempting to use a `FIFOQueue` to signal the parameter servers to shut down on a multi-machine shared cluster, based on this example. After some testing, I believe that `shared_name` has no effect: even after removing the `dequeue()` operations, the number of elements in the `FIFOQueue` doesn't correlate with the number of workers.

Minimum Reproducible Code
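The shutdown pattern described in this issue relies on counting: each worker enqueues one token into a queue that every process shares, and the parameter server blocks until it has dequeued one token per worker. The sketch below is a pure-Python analogy using the standard library's `queue.Queue` and threads, not TensorFlow code and not the reproduction code linked in the issue. The function `run_shutdown_protocol` and its parameters are hypothetical. It contrasts a genuinely shared queue with the symptom reported here, where each worker ends up writing to its own private queue and the parameter server's count never matches the number of workers.

```python
import queue
import threading


def run_shutdown_protocol(num_workers, share_queue, timeout=1.0):
    """Hypothetical analogy: return how many 'done' tokens the parameter server counts."""
    ps_queue = queue.Queue()  # the queue the parameter server waits on

    def worker(worker_id):
        # Shared case: the worker enqueues into the parameter server's queue.
        # Unshared case: the worker gets a private queue nobody else reads.
        q = ps_queue if share_queue else queue.Queue()
        q.put(worker_id)  # signal "this worker is done"

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The parameter server dequeues one token per worker before shutting down;
    # a timeout stands in for the hang you would otherwise see.
    received = 0
    for _ in range(num_workers):
        try:
            ps_queue.get(timeout=timeout)
        except queue.Empty:
            break
        received += 1
    return received


print(run_shutdown_protocol(3, share_queue=True))               # 3
print(run_shutdown_protocol(3, share_queue=False, timeout=0.1))  # 0
```

In TensorFlow 1.x, getting the shared case generally requires every process to build the queue with the same `shared_name`, place it on the same device (for example, one specific parameter-server task), and run it through sessions connected to the cluster's servers rather than purely local sessions.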