ValueError: If "cluster" is set in TF_CONFIG, it must have one "chief" node. #386

@manuzhang

Environment:

  • Python version 3.6
  • Spark version 2.3.1
  • TensorFlow version 1.7.0
  • TensorFlowOnSpark version 1.3.2
  • Cluster version Standalone

Describe the bug:
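Creating an Estimator RunConfig inside the training function raises a ValueError because the cluster spec in TF_CONFIG has no "chief" node.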

Logs:

[2019-01-09 17:07:15.496] [ERROR] [Executor task launch worker for task 7] [org.apache.spark.executor.Executor] >>> [spark-] msg=Exception in task 7.0 in stage 0.0 (TID 7)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 234, in main
    process()
  File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 229, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 2457, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 370, in func
    return f(iterator)
  File "/home/vipshop/platform/spark/python/pyspark/rdd.py", line 819, in func
    r = f(it)
  File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 353, in _mapfn
    wrapper_fn(tf_args, ctx)
  File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 310, in wrapper_fn
    fn(args, context)
  File "<ipython-input-2-858eadfad2b6>", line 51, in train
  File "<ipython-input-2-858eadfad2b6>", line 9, in build_estimator
  File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 465, in __init__
    self._init_distributed_setting_from_environment_var(tf_config)
  File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 481, in _init_distributed_setting_from_environment_var
    self._cluster_spec, task_env, TaskType.CHIEF)
  File "/home/mlp/.local/lib/python3.6/site-packages/tensorflow/python/estimator/run_config.py", line 154, in _validate_task_type_and_task_id
    chief_task_type)
ValueError: If "cluster" is set in TF_CONFIG, it must have one "chief" node.
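
The traceback suggests that `RunConfig` reads `TF_CONFIG` and rejects a cluster spec that has no "chief" job. A minimal sketch for checking which jobs the cluster spec actually contains, for example at the top of the training function. The helper name `describe_tf_config` and the host names are hypothetical and not part of TensorFlow or TensorFlowOnSpark; the example spec is made up for illustration, not taken from this cluster.

```python
import json
import os


def describe_tf_config(raw=None):
    """Return (job_names, has_single_chief) for a TF_CONFIG JSON string.

    Hypothetical diagnostic helper: it only parses the JSON and does not
    call into TensorFlow.
    """
    if raw is None:
        # Fall back to the environment variable, or an empty config.
        raw = os.environ.get("TF_CONFIG", "{}")
    cluster = json.loads(raw).get("cluster", {})
    job_names = sorted(cluster)
    # True only when the spec lists exactly one "chief" address.
    has_single_chief = len(cluster.get("chief", [])) == 1
    return job_names, has_single_chief


# Made-up cluster spec with only "ps" and "worker" jobs.
example = json.dumps({
    "cluster": {"ps": ["host0:2222"], "worker": ["host1:2222", "host2:2222"]},
    "task": {"type": "worker", "index": 0},
})
jobs, ok = describe_tf_config(example)
print(jobs, ok)  # ['ps', 'worker'] False
```

Calling `describe_tf_config()` with no argument on an executor would show whether the spec seen by that task includes a "chief" entry.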
