Triton Information
Running Triton 24.04-py3 with a mix of Python backend, TensorFlow, and ONNX Runtime backend models.
The Triton container is pulled as-is from NGC.
To Reproduce
We are unable to reproduce the failures on demand or identify a root cause: a different subset of models fails to load at random on each start-up, even though the same models load and execute fine when the model control mode is set to `explicit`.
We have not identified any pattern among the failing models (for example, a particular model type or backend). The majority of the affected models use the ONNX Runtime backend, but the failures appear completely random: every time the server restarts, a different set of models fails.
Because we cannot reproduce the issue consistently on our side, we are unable to provide a minimal reproducible example.
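For comparison, the explicit-mode workflow that does work for us looks roughly like this (repository path, port, and model name are placeholders; this is a sketch, not our exact deployment command):

```shell
# Start the server without loading anything (explicit model control)
tritonserver --model-repository=/models --model-control-mode=explicit

# Then load each model on demand via the model repository API
curl -X POST localhost:8000/v2/repository/models/MYMODEL/load
```

In this mode the same models that randomly fail at start-up load and serve without error.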
Expected behavior
All models should load successfully on server start-up if they have been tested and validated.
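One way we verify this expectation after a restart is via the standard readiness endpoints (default HTTP port assumed; `MYMODEL` is a placeholder name):

```shell
# Server-wide readiness: returns 200 only once start-up has completed
curl -sf localhost:8000/v2/health/ready

# Per-model readiness for a specific model
curl -sf localhost:8000/v2/models/MYMODEL/ready
```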
Traceback
```
I0705 14:49:55.104296 1 python_be.cc:2383] TRITONBACKEND_ModelFinalize: delete model state
I0705 14:49:55.104363 1 model_lifecycle.cc:620] successfully unloaded 'MYMODEL' version 1
I0705 14:49:55.718565 1 server.cc:347] Timeout 26: Found 0 live models and 0 in-flight non-inference requests
I0705 14:49:55.778384 1 backend_manager.cc:138] unloading backend 'onnxruntime'
I0705 14:49:55.779157 1 backend_manager.cc:138] unloading backend 'tensorflow'
I0705 14:49:55.779181 1 backend_manager.cc:138] unloading backend 'python'
I0705 14:49:55.779189 1 python_be.cc:2340] TRITONBACKEND_Finalize: Start
I0705 14:50:01.208390 1 python_be.cc:2345] TRITONBACKEND_Finalize: End
error: creating server: Internal - failed to load all models
```
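To gather more detail on the random failures, one option is to restart with verbose logging and keep the server alive even when some models fail to load (flags from `tritonserver --help`; a sketch, not our exact deployment command):

```shell
tritonserver --model-repository=/models \
    --log-verbose=1 \
    --exit-on-error=false
```

With `--exit-on-error=false` the server stays up instead of aborting with "failed to load all models", so the failing models can be inspected and retried individually.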
Failures at timestamp 0
![image](https://private-user-images.githubusercontent.com/17963295/346163069-0c627a43-28a0-44fd-b686-aed891cfee56.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE5NTA2NTksIm5iZiI6MTcyMTk1MDM1OSwicGF0aCI6Ii8xNzk2MzI5NS8zNDYxNjMwNjktMGM2MjdhNDMtMjhhMC00NGZkLWI2ODYtYWVkODkxY2ZlZTU2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzI1VDIzMzIzOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFmMWY3NDRjYTI2MTM2OWNiMjdjOWVhMjlkOGIyMzEwMzEyZmM4MTdkZjJhYzI5NzRiZDY1MzkwZDYzYjUzOGImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.mDw8TCSL_GUdRJKLonX0dTLQt9F_WIKkzpakJi9HqUA)
Failures at timestamp 1
![image](https://private-user-images.githubusercontent.com/17963295/346163400-3bc4b42d-371d-418e-8d7f-3fa215e1b50b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE5NTA2NTksIm5iZiI6MTcyMTk1MDM1OSwicGF0aCI6Ii8xNzk2MzI5NS8zNDYxNjM0MDAtM2JjNGI0MmQtMzcxZC00MThlLThkN2YtM2ZhMjE1ZTFiNTBiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzI1VDIzMzIzOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ0MjU4NTM3NGI5YmRhZjlmNjU3NGE2ZGRmOGQ1MDE5ZmE2MTUyYzdmZTRhMzI0YjUwOTljNDY0YTE0YjBhODUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.AZFLGFMg7HD49c81DVMD1WFANa4TxAb1M9dyoqSfF5U)
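Since a different set of models fails on each restart, a small script to diff the failing models between two runs may help spot a pattern. This is a minimal sketch, assuming you capture the server log from each restart and that load failures appear as `model_lifecycle.cc` messages of the form `failed to load '<model>' ...` (the exact message text is an assumption):

```python
import re

# Assumed failure pattern, e.g.:
#   E0705 ... model_lifecycle.cc:... failed to load 'MYMODEL' version 1: ...
FAILED = re.compile(r"failed to load '([^']+)'")

def failed_models(log_text):
    """Return the set of model names that failed to load in one server log."""
    return set(FAILED.findall(log_text))

def diff_restarts(log_a, log_b):
    """Compare two restarts: which models failed in one run, the other, or both."""
    a, b = failed_models(log_a), failed_models(log_b)
    return {"only_first": a - b, "only_second": b - a, "both": a & b}
```

Running this over the logs behind the two screenshots above would show whether any model fails in both restarts or the failing sets are fully disjoint.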