investigate(vcon): log caller stack + coerce str arg in Vcon.__init__#186
Merged
Conversation
A production conserver has been emitting a steady stream of
`conserver.link.count{link_name=tag, outcome=error}` with this exception:
TypeError: string indices must be integers, not 'str'
File "conserver/links/tag/__init__.py", in run
vCon.add_tag(tag_name=tag, tag_value=tag)
File "common/vcon.py", in add_tag
tags_attachment = self.find_attachment_by_purpose("tags")
File "common/vcon.py", in find_attachment_by_purpose
for a in self.vcon_dict["attachments"]
TypeError: string indices must be integers, not 'str'
`self.vcon_dict` is a `str` at failure time, yet every sampled vCon in
Redis is `JSON.TYPE = object` and a local repro with the same dict
shape succeeds. The deployed code is current with #182 and #184.
So a caller is reaching `Vcon.__init__` with a JSON-encoded string
instead of a parsed dict; the static call graph doesn't show one.
`Vcon.__init__` does `json.loads(json.dumps(vcon_dict))`, which
silently round-trips a `str` — leaving `self.vcon_dict` as a `str` so
the crash only surfaces on the first dict-style access (typically
`find_attachment_by_purpose` from the tag link).
This change:
- Detects `isinstance(vcon_dict, str)` at the boundary and logs an
ERROR with `stack_info=True` so the originating call site is visible
in the next failure that lands. The message echoes a head of the
payload so we can see what flavor of bad input it is.
- Defensively `json.loads`'s the string when it's valid JSON, so the
link doesn't crash and the vCon doesn't get parked in the DLQ for
the duration of the investigation. Non-JSON input falls back to `{}`
rather than poisoning downstream access.
The defensive coercion is intended to stay only as long as it takes to
find and fix the upstream caller. Two regression tests cover both
branches and assert that the caller stack is captured.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
9cdd299 to
daa4c4b
Compare
CI failed the two new __init__ tests:
tests/core/test_vcon.py::test_init_coerces_json_string_arg_and_logs_caller FAILED
tests/core/test_vcon.py::test_init_bails_to_empty_dict_for_non_json_string FAILED
Captured stdout showed the ERROR record being emitted by the project's
JSON stdout handler, but caplog.records was empty so the assertion
`any("received a str" in rec.message for rec in caplog.records)` was
False.
Root cause: `common/logging.conf` (the default when LOGGING_CONFIG_FILE
is unset, loaded transitively whenever any module imports
lib.logging_utils) sets `[logger_vcon] propagate = 0`. With propagation
off, records from the "vcon" logger flow only to that logger's own
handlers and never reach the root logger that pytest's caplog attaches
to — so caplog.records stays empty even though the record is real.
A local pytest run that imports only `from vcon import Vcon` never
loads `lib.logging_utils`, so the config never applies, propagate
stays True (default), and caplog sees the record — which is why the
tests passed locally but failed in CI.
Fix: a small fixture flips `propagate = True` (and `disabled = False`)
on the vcon logger for the duration of each test and restores them
after. The two __init__ tests now use the fixture and pass under both
local and CI configurations (verified by pre-importing
lib.logging_utils to force the CI shape).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
A production conserver has been emitting a continuous stream of
conserver.link.count{link_name=tag, outcome=error}with the same exception:self.vcon_dictis astrat failure time. But:JSON.TYPE) isobject, with.attachmentsanarray.So a caller is reaching
Vcon.__init__with a JSON-encoded string instead of the parsed dict. The current__init__(self.vcon_dict = json.loads(json.dumps(vcon_dict))) silently round-trips astr, leavingself.vcon_dictas astr. The crash only surfaces on the first dict-style access — typicallyfind_attachment_by_purposefrom the tag link.Static analysis didn't find the bad caller. This PR adds the instrumentation that will.
What this change does
isinstance(vcon_dict, str)at theVcon.__init__boundary and log anERRORwithstack_info=True. The next failure to land will show the originating call site.{}rather than poisoning downstream access.The defensive coercion is meant to live only as long as it takes to identify the caller and fix it at source.
Test plan
pytest tests/core/test_vcon.py— 36/36 pass, including two new regression tests:test_init_coerces_json_string_arg_and_logs_callertest_init_bails_to_empty_dict_for_non_json_stringVcon.__init__ received a strlines + their stack — should identify the bad caller within minutes.conserver.link.count{link_name=tag, outcome=error}rate drops to ~0 after deploy (defensive coercion takes over).🤖 Generated with Claude Code