
Problem with InfraValidator: RPC Error StatusCode.UNAVAILABLE #2914

Closed
dbustosp opened this issue Dec 3, 2020 · 7 comments

Comments

@dbustosp

dbustosp commented Dec 3, 2020

I have been trying to use the InfraValidator component in a local environment. I have tried three different ways: Interactive, Airflow, and Beam. For all of them I get the same error in the logs.

It is strange because the container is triggered and the model is loaded successfully, yet I get a confusing error. In the end the model is blessed by the component.

How I am using it:

from tfx.components import InfraValidator
from tfx.proto import infra_validator_pb2

infra_validator = InfraValidator(
    model=trainer.outputs['model'],
    serving_spec=infra_validator_pb2.ServingSpec(
        tensorflow_serving=infra_validator_pb2.TensorFlowServing(
            tags=['latest']
        ),
        local_docker=infra_validator_pb2.LocalDockerConfig()
    ),
    validation_spec=infra_validator_pb2.ValidationSpec(
        max_loading_time_seconds=60,
        num_tries=2
    ),
    request_spec=infra_validator_pb2.RequestSpec(
        tensorflow_serving=infra_validator_pb2.TensorFlowServingRequestSpec(
            signature_names=['classification']
        ),
        num_examples=10  # How many requests to make.
    )
)

The logs, with the error ("<_InactiveRpcError..") in between:

INFO:absl:Starting infra validation (attempt 1/2).
INFO:absl:Starting LocalDockerRunner(image: tensorflow/serving:latest).
INFO:absl:Running container with parameter {'auto_remove': True, 'detach': True, 'publish_all_ports': True, 'image': 'tensorflow/serving:latest', 'environment': {'MODEL_NAME': 'infra-validation-model', 'MODEL_BASE_PATH': '/model'}, 'mounts': [{'Target': '/model/infra-validation-model/1', 'Source': '/Users/home/.temp/466/infra-validation-model/1606971601', 'Type': 'bind', 'ReadOnly': True}]}
INFO:absl:Error while obtaining model status:
<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1606971601.677635000","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4166,"referenced_errors":[{"created":"@1606971601.677632000","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":398,"grpc_status":14}]}"
>
INFO:absl:Waiting for model to be loaded...
INFO:absl:Model is successfully loaded.
INFO:absl:Stopping LocalDockerRunner(image: tensorflow/serving:latest).
INFO:absl:Stopping container.
INFO:absl:Running publisher for InfraValidator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component InfraValidator is finished.

Could you please tell me why I am seeing this behaviour? The model is still blessed even though this error appears.

Thanks!

@Arghya999

Please check the "docker service logs" output as well as "docker info" to see the status of the container.
It seems to be a memory-related issue.
Restarting the Docker daemon after flushing the changes might help.

@dbustosp
Author

dbustosp commented Dec 3, 2020

Please check the "docker service logs" output as well as "docker info" to see the status of the container.
It seems to be a memory-related issue.
Restarting the Docker daemon after flushing the changes might help.

I already did and everything looks normal.

@chongkong
Contributor

Hi, this is an INFO log (not WARNING or ERROR) that was intentionally enabled for verbosity. When InfraValidator checks whether the model has loaded successfully, it polls the model server periodically (about once per second), and the server may respond UNAVAILABLE until the model is actually loaded; the log shows that particular response. Polling continues until the model becomes available or a timeout is reached; in the former case the model is blessed, in the latter it is not.

Source code:

logging.info('Error while obtaining model status:\n%s', e)
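The behavior described above can be sketched as a generic poll-until-ready loop. This is only a minimal illustration of the pattern, not TFX's actual implementation; the function names and the `Unavailable` stand-in exception are made up:

```python
import time


class Unavailable(Exception):
    """Stand-in for a gRPC StatusCode.UNAVAILABLE error."""


def wait_until_loaded(get_model_status, timeout_s=60.0, poll_interval_s=1.0):
    """Poll get_model_status() until it succeeds or the deadline passes.

    UNAVAILABLE responses are expected while the server is still starting,
    so they are logged and retried rather than treated as failures.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            return get_model_status()
        except Unavailable as e:
            # Same situation as the "Error while obtaining model status"
            # INFO line in the thread: the container is up, but the model
            # server is not serving yet.
            print(f"INFO: model not available yet: {e}")
            time.sleep(poll_interval_s)
    raise TimeoutError("model was not loaded before the deadline")
```

With a fake status function that fails twice before succeeding, the loop logs two UNAVAILABLE responses and then returns normally, mirroring the logs in this issue.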

@rmothukuru
Contributor

@selknamintech,
Can you please respond to @chongkong's comment above? Thanks!

@dbustosp
Author

dbustosp commented Dec 10, 2020

Hi @chongkong,

Sorry for the delay in my response.
That absolutely makes sense. I appreciate your answer and the clarification.

The message is confusing, though. It would be great if at least the word "Error" could be changed, since it can lead to misunderstanding.

Marking this as resolved ✅


copybara-service bot pushed a commit that referenced this issue Jan 5, 2021
@chongkong
Contributor

Will update the INFO log message format in a follow-up PR.
