refactor(agentkit): engineering-standards uplift (logging / exceptions / timeouts)#104
Merged
Merged
Conversation
…/exception/timeout standard Establishes one standard for each of three engineering dimensions on the inner-loop HTTP/credential/SDK layer, and applies it. Good building blocks already existed but were unused (get_logger had 0 callers; redact()/mask() were auth-only). Foundations (new, dependency-light leaves): - agentkit/errors.py: single stdlib-only root AgentKitError. toolkit.errors .AgentKitError and auth.errors.AuthError both reparent onto it, so a caller can `except agentkit.errors.AgentKitError` to catch any AgentKit failure while agentkit.auth stays free of any agentkit.toolkit import. - agentkit/utils/http_defaults.py: single source of truth for timeout/retry defaults + env vars (AGENTKIT_HTTP_TIMEOUT/RETRIES/STREAM_TIMEOUT), clamped. - agentkit/utils/redact.py: redact()/mask() moved here; auth/_redact.py is now a re-export shim. RedactionFilter installed on every log handler. Logging: swap logging.getLogger()/idioms for get_logger(__name__) in the scoped core modules; library stays silent-by-default. Exceptions: wrap transport errors into NetworkError at the HTTP-owning layer; typed ApiError (with optional error_code) for backend failures; always chain with `from e`; `raise e` -> bare `raise`; runtime-validation asserts -> raises. Secret-leak fixes: drop raw response/secret interpolation from identity/auth (API key), utils/request, cr.py debug log, ve_sign check_error. request() now re-raises instead of silently returning None on failure. Timeouts: base_service_client reads http_timeout()/http_retries() instead of hardcoded 30/30; cli_mount urlopen bounded via http_timeout(). Tests: +22 offline tests (hierarchy, http_defaults clamping, RedactionFilter, _signed_request/_invoke_api error paths). Full suite: only the 1 pre-existing failure on main (test_cli_add_harness env leak) remains; 0 new failures. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Targets the two highest-risk under-tested areas found by a coverage audit: the deployable app server surfaces (apps/, ~9.9% -> 76.3%) and the deploy pipeline logic (toolkit runners/executors/strategies). Overall line coverage 45.9% -> 50.9%. +258 tests, all offline (no network/docker/uvicorn). apps/ (Starlette TestClient + asyncio.run, telemetry singletons spied): - simple_app: health/readiness/liveness + /ping wiring + ping() arg guard - simple_app_handlers: InvokeHandler signature dispatch + response matrix, _convert_to_sse, AsyncTaskHandler, PingHandler - a2a_app / mcp_app: decorator guards, ping endpoint, env-detect routes/tools - agent_server: AgentKitAgentLoader + ctor guards; telemetry latency gating and error_type formatting; ASGI middleware header-exclusion + finish gating pipeline (fakes injected via the existing DI seams): - hybrid_strategy: _should_push_to_cr / _validate_cr_image_url error mapping, build/deploy config-updates assembly, invoke/status/destroy passthrough - base_executor: _classify_error / _handle_exception (pins the ErrorCode mapping introduced by the engineering-standards work), preflight handling - init_executor: render context, template/name validation, project scaffold - runners/base + ve_agentkit: pure payload/detection helpers, _needs_runtime_ update env diffing, deploy guard/dispatch, destroy/status mapping Pins (NOT fixes — this branch is tests-only) three latent source bugs to their current behavior with # NOTE comments, for a follow-up fix: - mcp_app tool()/agent_as_a_tool() wrappers: on a tool exception the finally passes func_result=result while result is unbound -> UnboundLocalError masks the real error and skips telemetry (all 4 wrapper branches). - simple_app_handlers PingHandler._format_ping_status: self.func.__name__ raises AttributeError when no ping func is registered. - simple_app_handlers AsyncTaskHandler.handle: signature omits request, violating the BaseHandler.handle(request) contract. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…st surface)
The prior batch raised the coverage number but left the product's core
orchestration ~0%. This batch tests that spine directly with proper fakes,
asserting behavior (error codes, response shapes, collaborator calls),
lifting the critical functions from ~0% to tested:
ve_pipeline.build (build->push->deploy spine) 1.9% -> 100%
ve_pipeline._upload_to_tos (bucket-ownership gate) 0% -> 70%
ve_agentkit._create_new_runtime 3% -> 91%
ve_agentkit._update_existing_runtime 1.8% -> 88%
ve_agentkit._wait_for_runtime_status 9.5% -> 100%
runners/base._http_post_invoke (invoke transport) 1% -> 91%
builders/local_docker.build 0.7% -> 86%
agent_server_app._invoke_compat (/invoke) 0% -> 89%
Overall coverage 45.9% -> 53.7%. +81 tests, all offline (no network/docker/
uvicorn; polling loops patched). No source modified.
Pins three more latent source bugs to current behavior (# NOTE, for follow-up):
- ve_pipeline._upload_to_tos: the documented ValueError for an unrendered
{{template}} bucket name is caught by the broad `except Exception` tail and
re-raised as a generic Exception, so callers can never catch ValueError.
- ve_agentkit._update_existing_runtime: client_token=generate_client_token()
is passed to UpdateRuntimeRequest, which has no such field and extra="ignore",
so the idempotency token is silently dropped (Create keeps it, Update sends none).
- builders/local_docker.build: BuildResult.build_logs is typed List[str] but
the raw str from DockerManager.build_image is assigned directly on both the
success and failure paths.
Also hardens test_ve_pipeline_upload_tos's AccountDisable assertion to be
cloud-provider-independent (console host varies volcengine vs byteplus and
provider global state leaks across the full suite).
Remaining gap: _execute_build's poll loop is still thin (mocked out to isolate
the build spine) — a follow-up target.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
fedc9c8 to
ad13eaf
Compare
…rface + core logic) (#105)
cuericlee
approved these changes
Jul 1, 2026
cuericlee
left a comment
Collaborator
There was a problem hiding this comment.
/lgtm
Overall: LGTM ✅
This is a well-structured refactoring PR. Key strengths:
Category | Grade
-- | --
assert → raise | ✅ Eliminates -O-dependent behavior across the SDK
Exception hierarchy | ✅ AgentKitError root enables uniform error handling
Redact extraction | ✅ Clean module relocation with backward-compatible shim
HTTP defaults consolidation | ✅ DRY improvement
Logging fixes | ✅ Exc_info, lazy formatting, and critical raise bugfix in request.py
Test coverage | ✅ New dedicated test files; latent-bug-pinning pattern
Backward compatibility | ✅ All public APIs preserved
One note: The raise addition in agentkit/utils/request.py (Theme 6) changes the function's contract from "may return None on error" to "always raises on error." This is the right fix, but verify that all call sites handle the exception rather than relying on a None-check pattern.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Raises the core HTTP / credential / SDK inner-loop to one consistent engineering standard across three dimensions, plus the underlying transient-retry fix.
Foundations (new leaves):
agentkit/errors.py(single stdlib-only rootAgentKitError),utils/http_defaults.py(one source of truth for timeout/retry env+defaults),utils/redact.py+ aRedactionFilteron every log handler.get_logger(__name__)now actually used in the scoped modules (previously 0 callers); library stays silent-by-default.NetworkError; typedApiError(witherror_code);from echaining;raise e→raise; runtime-validation asserts→raises.identity/auth), full response (request.py,cr.pydebug,ve_sign.check_error).print(1)/print(2)removed.request()now re-raises instead of silently returningNone.base_service_clientreadshttp_timeout()/http_retries()instead of hardcoded 30/30;cli_mounturlopenbounded.Retry-After, never retries read-timeouts — safe for non-idempotentCreate*).Tests
+22 offline unit tests (hierarchy,
http_defaultsclamping,RedactionFilter,_signed_request/_invoke_apierror paths). Backward-compatible:except AuthErrorstill catches the same set; no code didexcept AgentKitErrorbefore.Stack
Base of the stack. Followed by the test PR (
test/agentkit-apps-pipeline-ut) then the fix PR (fix/agentkit-latent-bugs).🤖 Generated with Claude Code