Add TF example #98

Merged
merged 2 commits into volcano-sh:master
Apr 26, 2019

Conversation

TommyLike
Contributor

See above

@TommyLike
Contributor Author

@asifdxtreme The related image volcanosh/example-tf has already been uploaded to Docker Hub; please give it a try.
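
For anyone wanting to try it, a Volcano Job wrapping this image would look roughly like the sketch below. This is not the manifest from this PR: the job and task names and the port are inferred from the log later in this thread, while the replica counts and the svc plugin are assumptions.

apiVersion: batch.volcano.sh/v1alpha1   # Volcano's Job API group at the time of this PR
kind: Job
metadata:
  name: tensorflow-benchmark            # matches the pod/service names in the log below
spec:
  minAvailable: 2                       # gang-schedule the ps and worker pods together
  schedulerName: volcano
  plugins:
    env: []                             # injects the per-task index env var (see discussion below)
    svc: []                             # assumed: provides the headless service behind the
                                        # tensorflow-benchmark-ps-0.tensorflow-benchmark DNS name
  tasks:
    - name: ps
      replicas: 1                       # replica counts are an assumption
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: tensorflow
              image: volcanosh/example-tf
              ports:
                - containerPort: 2222   # gRPC port seen in the log
    - name: worker
      replicas: 1
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: tensorflow
              image: volcanosh/example-tf
              ports:
                - containerPort: 2222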

@asifdxtreme
Contributor

@TommyLike Thanks, I will check it now. Do we need to take the latest Volcano, or should the 0.1.0 version of Volcano be fine?

@TommyLike
Contributor Author

@asifdxtreme The version is not strictly limited; the example only needs basic Volcano features and the env plugin.
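
For context, the env plugin injects each pod's task index into its containers (as VK_TASK_INDEX in Volcano releases of this era; later releases also export VC_TASK_INDEX), so a script can pick its role-specific address from the cluster spec. A hypothetical worker command consuming it might look like the fragment below; the script name and flags follow the public tf_cnn_benchmarks tool that the log output matches, but the exact command baked into volcanosh/example-tf is an assumption.

# Fragment of a task's pod template; everything here is illustrative.
containers:
  - name: tensorflow
    image: volcanosh/example-tf
    command:
      - sh
      - -c
      - |
        python tf_cnn_benchmarks.py \
          --job_name=worker \
          --task_index=${VK_TASK_INDEX} \
          --ps_hosts=tensorflow-benchmark-ps-0.tensorflow-benchmark:2222 \
          --worker_hosts=tensorflow-benchmark-worker-0.tensorflow-benchmark:2222 \
          --variable_update=parameter_server \
          --model=resnet50 \
          --batch_size=32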

@asifdxtreme
Contributor

/lgtm

@asifdxtreme
Contributor

asifdxtreme commented Apr 25, 2019

I tried this example and it works fine:

k log tensorflow-benchmark-worker-0 --tail=20 -f
log is DEPRECATED and will be removed in a future version. Use logs instead.
2019-04-25 06:23:26.245134: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
E0425 06:23:26.245643820      15 ev_epoll1_linux.c:1051]     grpc epoll fd: 3
2019-04-25 06:23:26.252983: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> tensorflow-benchmark-ps-0.tensorflow-benchmark:2222}
2019-04-25 06:23:26.253090: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:2222}
2019-04-25 06:23:26.253464: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:2222
TensorFlow:  1.5
Model:       resnet50
Mode:        training
SingleSess:  False
Batch size:  32 global
             32 per device
Devices:     ['/job:worker/task:0/cpu:0']
Data format: NHWC
Optimizer:   sgd
Variables:   parameter_server
Sync:        True
==========
Generating model
2019-04-25 06:23:40.545398: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:23:50.545612: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:00.545830: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:10.546048: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:20.546275: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:30.546500: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:40.546731: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:24:50.546958: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:00.547178: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:10.547410: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:20.547638: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:30.547854: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:40.548082: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:25:50.548296: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:26:00.548519: I tensorflow/core/distributed_runtime/master.cc:221] CreateSession still waiting for response from worker: /job:ps/replica:0/task:0
2019-04-25 06:26:10.381635: I tensorflow/core/distributed_runtime/master_session.cc:1008] Start master session cdcdb85bfcadfc7e with config: intra_op_parallelism_threads: 1 gpu_options { force_gpu_compatible: true } allow_soft_placement: true
Running warm up
Done warm up
Step	Img/sec	loss
1	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	10.224
10	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.759
20	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.227
30	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.199
40	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	9.441
50	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.111
60	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.722
70	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	7.964
80	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	7.839
90	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	7.953
100	images/sec: 0.6 +/- 0.0 (jitter = 0.0)	8.004
----------------------------------------------------------------
total images/sec: 0.57
----------------------------------------------------------------

@k82cn
Member

k82cn commented Apr 25, 2019

/lgtm

@k82cn k82cn merged commit dd2bb60 into volcano-sh:master Apr 26, 2019
kevin-wangzefeng pushed a commit to kevin-wangzefeng/volcano that referenced this pull request Apr 30, 2019