Support for post_init lifecycle/lifespan hook. #10810
Unanswered
msteiner-google asked this question in Questions
Replies: 1 comment
For those interested, the workaround I use for achieving this can be described as follows.
In code:

```python
# Imports reconstructed for completeness; the original snippet elided them.
# create_app, types, _HOST, and _AIP_HTTP_PORT come from the author's own code.
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any

import tensorflow as tf
import uvicorn
from fastapi import APIRouter, FastAPI, Response, status
from injector import Injector

# Server creation.
...
server = create_app(...)

# Start loading the model asynchronously and only attach the router to the
# server once the load has completed.
e = ThreadPoolExecutor(max_workers=1)
f = e.submit(load_model, injector)
f.add_done_callback(attach_routes(injector, server))
uvicorn.run(server, host=_HOST.value, port=_AIP_HTTP_PORT.value)
e.shutdown()
```

and

```python
# Router and endpoint creation; attach them to the already-running server.
def get_health_route() -> Any:
    async def _healthz() -> Response:
        return Response(status_code=status.HTTP_200_OK)

    return _healthz


def get_predict_route(model: tf.keras.Model) -> Any:
    async def _predict() -> Response:
        return Response(status_code=status.HTTP_200_OK)

    return _predict


def get_router(
    healthz_path: str, predict_path: str, model: tf.keras.Model
) -> APIRouter:
    router = APIRouter(on_startup=[])
    router.add_api_route(
        path=healthz_path, endpoint=get_health_route(), methods={"GET"}
    )
    router.add_api_route(
        path=predict_path, endpoint=get_predict_route(model=model), methods={"POST"}
    )
    return router


def load_model(injector: Injector) -> tf.keras.Model:
    model_uri = injector.get(types.ModelStorageURI)
    ...
    logging.info("Loading model.")
    ...  # the elided code downloads the model and sets `dir`
    model = tf.keras.models.load_model(dir)
    logging.info(f"Loaded model: {model}")
    return model


def attach_routes(injector: Injector, server: FastAPI) -> Any:
    def _callback(future: Future[tf.keras.Model]) -> None:
        healthz = injector.get(types.HealthZRoute)
        predict = injector.get(types.PredictRoute)
        router = get_router(
            healthz_path=healthz, predict_path=predict, model=future.result()
        )
        server.include_router(router)

    return _callback
```

It works, but it feels like I am working against the framework to achieve what I want.
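For comparison, here is a minimal sketch (not from the original post) of a more framework-native variant: FastAPI's lifespan handler kicks off the model load in a background task, the server answers probes immediately, and the health endpoint returns 503 until the load finishes. `load_model` and the model URI are placeholders.

```python
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI, Response, status


def load_model(uri: str):
    """Placeholder for a blocking load, e.g. tf.keras.models.load_model(uri)."""
    ...


@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.model = None
    loop = asyncio.get_running_loop()

    async def _load() -> None:
        # Run the blocking load in the default executor so the event loop
        # (and the probe endpoints) stays responsive.
        app.state.model = await loop.run_in_executor(None, load_model, "gs://...")

    task = asyncio.create_task(_load())
    yield  # the server starts serving requests here, before the load finishes
    task.cancel()


app = FastAPI(lifespan=lifespan)


@app.get("/healthz")
async def healthz() -> Response:
    # Not ready until the model is loaded; the prober keeps retrying until 200.
    if app.state.model is None:
        return Response(status_code=status.HTTP_503_SERVICE_UNAVAILABLE)
    return Response(status_code=status.HTTP_200_OK)
```

The readiness signal lives in `app.state` instead of being expressed by attaching routes late, so there is no executor bookkeeping around `uvicorn.run`.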
Hello all!
First of all, thank you for the work you folks are putting in.
Going back to the FR: as per the title, this discussion is about supporting some post_init hook. In this state the server is already running and serving requests, but some resources are still being initialized, and we should have a way to make this information available from within the endpoints' methods.

Why would that be helpful?
If we look at Google's Vertex AI support for custom containers (documentation here), we see that two different checks happen. The liveness one only checks that the server is running; if it fails 5 times in a row (~50 sec), the container is restarted. The health check, instead, can fail for as long as we want, and only once the models are loaded should it start returning 200s, so that the load balancer knows the instance is ready to receive predict requests.

So, summarizing: FastAPI can't be used to serve models whose load takes longer than 50 seconds, since the container will be restarted.
Therefore, I suggest enabling a state in which the server starts, so it can answer the liveness probes, while providing a mechanism to check from within the endpoint methods whether a resource is fully loaded. I am not sure this should be called post_init, but I don't have much imagination :). Happy to discuss this further.
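For concreteness, a minimal sketch of the desired behavior (illustrative names; the `model_ready` flag stands in for whatever signal such a hook would provide):

```python
from fastapi import FastAPI, Response, status

app = FastAPI()
model_ready = False  # flipped to True by whatever loads the model


@app.get("/liveness")
async def liveness() -> Response:
    # Always 200 once the server is up, so the container is not restarted
    # while the model is still loading.
    return Response(status_code=status.HTTP_200_OK)


@app.get("/healthz")
async def healthz() -> Response:
    # 503 until the model is loaded; the load balancer holds traffic until 200.
    if not model_ready:
        return Response(status_code=status.HTTP_503_SERVICE_UNAVAILABLE)
    return Response(status_code=status.HTTP_200_OK)
```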