fix: defer service registration until app is serving requests #12
Merged
Registration previously happened during FastAPI lifespan startup, before uvicorn began accepting connections. When the orchestrator called back to fetch configs, the service could not respond. Now registration runs in a background task that polls the local /health endpoint first, ensuring the app is fully ready before announcing itself to the orchestrator.
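The polling step can be sketched as a small deadline loop. This is a minimal illustration, not the servicekit implementation: the name wait_until_ready, the injected probe callable, and the default timeout and interval are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    probe: Callable[[], Awaitable[bool]],
    timeout: float = 30.0,   # assumed default, not taken from the PR
    interval: float = 0.5,   # assumed default, not taken from the PR
) -> bool:
    """Poll `probe` until it reports ready or `timeout` seconds elapse."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            if await probe():
                return True
        except OSError:
            # Connection refused and similar errors mean "not serving yet".
            pass
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)
```

Injecting the probe keeps the loop independent of how readiness is actually measured (HTTP health request or plain TCP connect).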
- fail_on_error=True registers inline (blocking startup) to preserve existing fail-fast semantics
- fail_on_error=False (default) defers registration to background task
- readiness check uses the actual health endpoint path from the builder instead of hardcoded /health; falls back to TCP connect check when no health endpoint is configured
- registration info stored on app.state instead of module global, preventing stale state across sequential app instances
- bump version to 0.10.0
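The choice between a health request and a TCP connect check might look roughly like the sketch below. The function names (tcp_ready, is_ready), the health_path parameter, and the use of urllib in a worker thread are assumptions for illustration; the actual servicekit code may use a different HTTP client and signature.

```python
import asyncio
import http.client
import urllib.request
from typing import Optional


async def tcp_ready(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        _, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True


def _http_ok(url: str, timeout: float) -> bool:
    """Blocking GET; True only for a 200 response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


async def is_ready(
    host: str, port: int, health_path: Optional[str], timeout: float = 1.0
) -> bool:
    """Use the configured health path if there is one, else a TCP connect."""
    if health_path is None:
        return await tcp_ready(host, port, timeout)
    url = f"http://{host}:{port}{health_path}"
    # Run the blocking request in a thread so the event loop stays free.
    return await asyncio.to_thread(_http_ok, url, timeout)
```

A TCP connect only proves the socket is listening, so it is a weaker signal than a 200 from the health endpoint; it is used here only when no health path is known.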
Do not register with the orchestrator if the app never became reachable. Previously the timeout was logged as a warning but registration proceeded anyway, defeating the purpose of the readiness gate.
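In outline, the gate described in this commit amounts to returning early when the readiness wait fails. The sketch below uses hypothetical names (register_if_ready, wait_ready, register) and is not the servicekit function itself.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def register_if_ready(
    wait_ready: Callable[[], Awaitable[bool]],
    register: Callable[[], Awaitable[dict]],
) -> Optional[dict]:
    """Register only when the readiness wait succeeded."""
    if not await wait_ready():
        # Announcing an unreachable service would defeat the readiness gate.
        logger.error("App never became reachable; skipping registration")
        return None
    return await register()
```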
- fail_on_error=True no longer registers inline during lifespan startup; registration always waits for the app to be serving. The background task is awaited during shutdown so exceptions still propagate.
- Shield _register_and_start_keepalive from task cancellation so that a successful POST always writes app.state.registration_info, ensuring shutdown can deregister properly instead of leaking the registration.
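The effect of the shield can be shown with a small self-contained demo. Everything here is illustrative: register_and_store stands in for the real POST, the sleep simulates network latency, and the registration payload is made up.

```python
import asyncio
from types import SimpleNamespace


async def register_and_store(state: SimpleNamespace, started: asyncio.Event) -> None:
    """Stand-in for the registration POST followed by the state write."""
    started.set()
    await asyncio.sleep(0.05)  # simulated network round trip
    state.registration_info = {"id": "svc-1"}  # hypothetical payload


async def background(state: SimpleNamespace, started: asyncio.Event) -> None:
    # Keep a reference to the inner task so it can be awaited later.
    state.inner = asyncio.ensure_future(register_and_store(state, started))
    # Cancelling `background` cancels this await, not the inner task.
    await asyncio.shield(state.inner)


async def main():
    state = SimpleNamespace(registration_info=None, inner=None)
    started = asyncio.Event()
    task = asyncio.create_task(background(state, started))
    await started.wait()   # registration is now in flight
    task.cancel()          # e.g. shutdown begins mid-request
    try:
        await task
    except asyncio.CancelledError:
        pass
    await state.inner      # the shielded work still runs to completion
    return state, task.cancelled()
```

Without the shield, cancelling the outer task would interrupt the POST between "request sent" and "state written", leaving a registration on the orchestrator that shutdown has no record of.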
Cover readiness check (health path, TCP fallback, timeout), registration abort on readiness failure, app.state storage, custom health prefix, and cancellation-safe registration via asyncio.shield.
Add tests for _resolve_port (env, default, invalid), _register_and_start_keepalive (success, keepalive, failure), and lifespan integration (deferred registration + deregistration on shutdown, skip registration when readiness fails).
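For orientation, the port resolution cases covered by these tests could be modeled as below. This is a guess at the shape only: the environment variable name, the default port, and the choice to fall back to the default on an invalid value are all assumptions, and the real _resolve_port may behave differently (for example by raising).

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8000      # assumed default, not taken from the PR
PORT_ENV_VAR = "PORT"    # assumed variable name, not taken from the PR


def resolve_port(
    explicit: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick a port: explicit option first, then environment, then default."""
    if explicit is not None:
        return explicit
    env = os.environ if env is None else env
    raw = env.get(PORT_ENV_VAR)
    if raw is None:
        return DEFAULT_PORT
    try:
        port = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_PORT
    return port if 0 < port < 65536 else DEFAULT_PORT
```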
Summary
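- Registration always waits until the app is serving requests (both fail_on_error=True and False); the background task is awaited during shutdown so exceptions still propagate
- Readiness check uses the health endpoint path from the builder (instead of hardcoded /health); falls back to TCP connect check when no health endpoint is configured
- Registration is shielded from task cancellation so a successful POST always writes app.state.registration_info for proper deregistration
- Registration info is stored on app.state instead of module globals, safe for sequential app instances

Context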
with_registration() previously called register_service() inside FastAPI's lifespan startup phase (before yield), which runs before uvicorn begins accepting connections. When chap-core received the registration and immediately called back to /api/v1/configs, the service wasn't ready yet, so the callback failed and the configured model row was not created.

Workaround in chap-core: retry config fetch up to 3x with 2s gaps (PR #278). This PR is the proper fix on the servicekit side.
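The ordering issue and the deferred approach can be illustrated without FastAPI using a plain async context manager. This is a sketch under assumptions: deferred_lifespan and register_when_ready are hypothetical names, and the real lifespan receives only the app.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, Awaitable, Callable


@asynccontextmanager
async def deferred_lifespan(
    app: Any,
    register_when_ready: Callable[[Any], Awaitable[None]],
):
    # Startup: schedule registration but do not wait for it, so the
    # server can finish starting and begin accepting connections.
    task = asyncio.create_task(register_when_ready(app))
    try:
        yield
    finally:
        # Shutdown: awaiting the task surfaces any registration error.
        await task
```

Anything awaited before the yield delays the moment the server starts accepting connections, which is why an inline registration call there triggers a callback the service cannot answer yet.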
Test plan
- make lint passes
- make test passes (391 passed, 21 skipped)
- _wait_until_ready: health 200, TCP fallback, timeout, custom path
- _register_after_ready: skip when not ready, app.state storage, cancellation safety
- _resolve_port: from options, from env, default, invalid env
- _register_and_start_keepalive: success, with keepalive, failure
- examples/registration/ against a mock orchestrator