Closed
Description
Environment:
- Python version: 3.6
- Spark version: 2.3.1
- TensorFlow version: 1.7.0
- TensorFlowOnSpark version: 1.3.2
- Cluster type: Standalone
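Describe the bug:
The TF_CONFIG seen on the executors has a "cluster" entry but no "chief" node. As a result, constructing the estimator's RunConfig inside build_estimator fails with a ValueError (see the logs below).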
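For reference, TF_CONFIG is an environment variable holding a JSON string. Its "cluster" map goes from job name to a list of "host:port" addresses, and its "task" entry gives this process's "type" and "index". The sketch below builds a value that includes exactly one "chief" job. The helper name `make_tf_config` and the host addresses are hypothetical placeholders. They are not taken from this cluster or from TensorFlowOnSpark.

```python
import json

def make_tf_config(cluster, task_type, task_index):
    """Hypothetical helper: serialize a TF_CONFIG value for one process."""
    if task_type not in cluster:
        raise ValueError("task type %r is not a job in the cluster" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(
            "task index %d out of range for job %r" % (task_index, task_type))
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })

# Placeholder addresses; a real cluster would use the hosts Spark assigned.
cluster = {
    "chief": ["host0:2222"],                 # exactly one chief node
    "worker": ["host1:2222", "host2:2222"],
    "ps": ["host3:2222"],
}
print(make_tf_config(cluster, "worker", 1))
```

In the failing run, the "cluster" map evidently had no "chief" key, and that is the case RunConfig rejects.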
Logs:
[2019-01-09 17:07:15.496] [ERROR] [Executor task launch worker for task 7] [org.apache.spark.executor.Executor] >>> [spark-] msg=Exception in task 7.0 in stage 0.0 (TID 7)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 234, in main
process()
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 229, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
return func(split, prev_func(split, iterator))
File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
return func(split, prev_func(split, iterator))
File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
return func(split, prev_func(split, iterator))
File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 370, in func
return f(iterator)
File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 819, in func
r = f(it)
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 353, in _mapfn
wrapper_fn(tf_args, ctx)
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 310, in wrapper_fn
fn(args, context)
File "<ipython-input-2-858eadfad2b6>", line 51, in train
File "<ipython-input-2-858eadfad2b6>", line 9, in build_estimator
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 465, in __init__
self._init_distributed_setting_from_environment_var(tf_config)
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 481, in _init_distributed_setting_from_environment_var
self._cluster_spec, task_env, TaskType.CHIEF)
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 154, in _validate_task_type_and_task_id
chief_task_type)
ValueError: If "cluster" is set in TF_CONFIG, it must have one "chief" node.
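The error message states the rule directly. You can run a simplified check of the same condition on an executor before building the estimator, which shows what TF_CONFIG actually contains there. This is a hypothetical diagnostic: `check_tf_config` is not part of TensorFlow or TensorFlowOnSpark. It covers only the "chief" condition and none of the other validation RunConfig performs.

```python
import json
import os

def check_tf_config(raw):
    """Hypothetical diagnostic: return a description of the problem, or None
    if the "one chief node" requirement from the error message is met."""
    config = json.loads(raw) if raw else {}
    cluster = config.get("cluster") or {}
    if not cluster:
        return None  # no cluster set: not distributed, so no chief is needed
    chiefs = cluster.get("chief", [])
    if len(chiefs) != 1:
        return "expected exactly one 'chief' node, found %d (jobs: %s)" % (
            len(chiefs), sorted(cluster))
    return None

if __name__ == "__main__":
    # Inspect the value TensorFlow would read in this process.
    problem = check_tf_config(os.environ.get("TF_CONFIG", ""))
    print(problem or "TF_CONFIG has a single chief node (or no cluster)")
```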