Update the example for spark-tensorflow-distributor #166

Merged (2 commits) on Jul 15, 2020

liangz1 (Contributor) commented Jul 15, 2020

This PR fixes a data-downloading issue in the example code.

Reproduce: On a cluster with multiple GPUs per worker node and spark.task.resource.gpu.amount set to 1, running the original example triggers an error during data downloading.

Cause: Several tasks run concurrently on the same worker node, and each one writes the downloaded data to the same path. The concurrent writes corrupt the data.

Fix: Randomize the file path so that each task downloads the data to its own file.
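A minimal sketch of the idea, assuming the only requirement is that each task writes to a path no other task on the same node uses. The helper names (unique_data_path, fake_download, run_tasks) are hypothetical and not part of the example code. fake_download stands in for the real dataset download, and threads stand in for Spark tasks, which in practice run in separate processes. If the example loads MNIST through tf.keras.datasets.mnist.load_data, the analogous change would presumably be passing a randomized file name as its path argument.

```python
import os
import tempfile
import threading
import uuid


def unique_data_path(base_dir: str, filename: str = "mnist.npz") -> str:
    """Return a fresh file path on every call (hypothetical helper).

    A random UUID4 prefix makes it practically impossible for two tasks
    on the same worker node to pick the same target file.
    """
    return os.path.join(base_dir, f"{uuid.uuid4().hex}_{filename}")


def fake_download(path: str, payload: bytes) -> None:
    """Hypothetical stand-in for a dataset download: write bytes to `path`."""
    with open(path, "wb") as f:
        f.write(payload)


def run_tasks(base_dir: str, num_tasks: int = 4) -> list:
    """Simulate several tasks on one node, each writing its own copy."""
    paths = [unique_data_path(base_dir) for _ in range(num_tasks)]
    threads = [
        threading.Thread(target=fake_download, args=(p, f"task-{i}".encode()))
        for i, p in enumerate(paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in run_tasks(d):
            print(os.path.basename(p))
```

Randomizing the path needs no coordination between tasks, at the cost of one download and one copy on disk per task. An alternative would be downloading once per node behind a file lock, which saves bandwidth but adds locking logic to the example.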

jhseu merged commit 8d96a9f into tensorflow:master on Jul 15, 2020.