I have noticed #385, and have read https://ai.googleblog.com/2017/11/latest-innovations-in-tensorflow-serving.html.
Even though new models are now loaded in an isolated thread pool, most of the loading work is done by RestoreOpsV2, which runs in the thread pool used by session run. This means the load operation does not run entirely in a separate thread pool, so serving queries may be blocked by a load.
In our environment, the model files are stored on HDFS. While a new model is loading, serving latency rises from a few milliseconds to thousands of milliseconds. I have confirmed that this does not include model warm-up time.
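
To illustrate the kind of contention I mean, here is a minimal Python sketch. It is not TensorFlow Serving code: `slow_restore`, `serve_query`, `query_latency` and `compare` are hypothetical names, and a pool with a single worker is an assumption that exaggerates the effect. It only shows that a cheap query queued behind a slow restore in the same pool waits for it, while a query in its own pool does not.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_restore(seconds):
    # Hypothetical stand-in for a restore op reading a checkpoint from slow storage.
    time.sleep(seconds)


def serve_query():
    # Hypothetical stand-in for a cheap inference request.
    return "ok"


def query_latency(query_pool, load_pool, restore_seconds=0.5):
    # Start a "load", then time a query submitted shortly afterwards.
    load_future = load_pool.submit(slow_restore, restore_seconds)
    time.sleep(0.05)  # let the load occupy its worker thread first
    start = time.perf_counter()
    query_pool.submit(serve_query).result()
    latency = time.perf_counter() - start
    load_future.result()  # wait for the load so the pool can shut down cleanly
    return latency


def compare():
    # Case 1: load and query share one pool (single worker, assumed for clarity).
    with ThreadPoolExecutor(max_workers=1) as shared:
        shared_latency = query_latency(shared, shared)
    # Case 2: load and query each get their own pool.
    with ThreadPoolExecutor(max_workers=1) as queries, \
            ThreadPoolExecutor(max_workers=1) as loads:
        isolated_latency = query_latency(queries, loads)
    return shared_latency, isolated_latency


if __name__ == "__main__":
    shared_latency, isolated_latency = compare()
    print(f"shared pool:   {shared_latency * 1000:.0f} ms")
    print(f"isolated pool: {isolated_latency * 1000:.0f} ms")
```

In the shared case the query waits for the remainder of the simulated restore; in the isolated case it returns almost immediately. A real session thread pool has more workers, so the effect would depend on how many of them the restore ops occupy.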