feat(shared/a2a): graph entry partial state sanity (#73) by hagyutae · Pull Request #74 · vonkernel/dev-team

hagyutae · 2026-05-08T07:00:15Z

Summary

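When a LangGraph graph fails to finish its response (cancel, external proxy timeout, etc.), the dangling remnants at the tail of messages are cleaned up with markers at the start of the next turn. This is applied in the handler layer of shared/a2a, so every agent is protected immediately.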

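Three dangling patterns

Found while decoding the messages channel of the #39 verification thread:

| Pattern | Risk | Marker |
| --- | --- | --- |
| A: user-user adjacency | The LLM merges or ignores the first user intent, which feels "tangled" | `SystemMessage("[직전 turn 응답 중단됨, 함께 종합해 답하라]")` (roughly: "previous turn's response was interrupted; answer both together") |
| B: trailing ToolMessage | No closing AIMessage, so the shape of the LLM context is broken | `AIMessage("[직전 turn 응답이 도구 결과 후 중단됨]")` (roughly: "previous turn's response was interrupted after the tool result") |
| C: trailing AIMessage(tool_calls) without matching ToolMessage | The Anthropic API strictly requires tool_use ↔ tool_result pairing; a violation makes the LLM call itself fail | One `ToolMessage("[tool call interrupted before completion]", tool_call_id=...)` per unanswered tool_call |

C is the most critical (a hard failure at the API level). A and B are tolerable for the LLM, but the markers make the meaning explicit.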
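To make Pattern C concrete, here is a minimal sketch of how the unanswered tool calls at the tail could be found and turned into placeholders. It uses a small hypothetical `Msg` dataclass instead of the real LangChain message classes so that it runs with only the standard library; the helper names `unanswered_tool_call_ids` and `placeholders_for` are illustrative and do not come from the PR. Handling of a partially answered tail is an assumption, since the PR only describes the fully unanswered case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message, for illustration only."""
    role: str                                   # "human" | "ai" | "tool" | "system"
    content: str = ""
    tool_call_ids: List[str] = field(default_factory=list)  # ids an AI message requested
    tool_call_id: Optional[str] = None          # id a tool message answers


def unanswered_tool_call_ids(msgs: List[Msg]) -> List[str]:
    """Ids requested by the tail AI message that have no tool result after it."""
    i = len(msgs)
    answered = set()
    # Walk back over any tool results that sit at the very end.
    while i > 0 and msgs[i - 1].role == "tool":
        if msgs[i - 1].tool_call_id is not None:
            answered.add(msgs[i - 1].tool_call_id)
        i -= 1
    # The message before the trailing tool results must be the requesting AI message.
    if i == 0 or msgs[i - 1].role != "ai":
        return []
    return [cid for cid in msgs[i - 1].tool_call_ids if cid not in answered]


def placeholders_for(msgs: List[Msg]) -> List[Msg]:
    """One placeholder tool result per unanswered call, as in Pattern C."""
    return [
        Msg("tool", "[tool call interrupted before completion]", tool_call_id=cid)
        for cid in unanswered_tool_call_ids(msgs)
    ]
```

Appending the placeholders restores the tool_use ↔ tool_result pairing that the table names as the hard requirement, without touching any earlier message.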

Design: tail-only / append-only / handler layer

New module shared/src/dev_team_shared/a2a/server/graph_handlers/sanity.py:

detect_tail_markers(msgs) -> list[AnyMessage]: a pure function that inspects the tail. Returns the markers for each dangling pattern (or []).
apply_tail_sanity(graph, config) -> int: appends the markers to the latest checkpoint of the graph state. Returns the number added (0 = clean).
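A rough sketch of what a pure tail detector in the spirit of `detect_tail_markers` might look like follows. It is not the PR's implementation: messages are modeled with a hypothetical `Msg` dataclass rather than `langchain_core` classes, and the exact detection conditions are assumptions (in particular, Pattern A is taken literally as "the last two messages are both user messages", which the PR does not spell out). The marker strings are the ones quoted in the PR description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message, for illustration only."""
    role: str                                   # "human" | "ai" | "tool" | "system"
    content: str = ""
    tool_call_ids: List[str] = field(default_factory=list)
    tool_call_id: Optional[str] = None


def detect_tail_markers(msgs: List[Msg]) -> List[Msg]:
    """Inspect only the tail and return the markers to append (or [])."""
    if not msgs:
        return []
    last = msgs[-1]
    # Pattern C: trailing AI message whose tool calls were never answered.
    if last.role == "ai" and last.tool_call_ids:
        return [
            Msg("tool", "[tool call interrupted before completion]", tool_call_id=cid)
            for cid in last.tool_call_ids
        ]
    # Pattern B: tool result with no closing AI message.
    if last.role == "tool":
        return [Msg("ai", "[직전 turn 응답이 도구 결과 후 중단됨]")]
    # Pattern A (assumed condition): two user messages in a row at the tail.
    if len(msgs) >= 2 and last.role == "human" and msgs[-2].role == "human":
        return [Msg("system", "[직전 turn 응답 중단됨, 함께 종합해 답하라]")]
    return []
```

Because the function only reads its argument and returns new objects, it can be unit-tested without a graph or checkpointer, which matches the split between the pure detector and the state-mutating `apply_tail_sanity` described above.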

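Tail-only: because it is called at the start of every turn, dangling remnants cannot accumulate into mid-history. No mid-history inspection or complex RemoveMessage handling is needed.

Append-only: there is no conflict with the natural behavior of the add_messages reducer. The marker lands before the new user input, which normalizes the LLM context.

Handler layer: applied in the handler right before the graph call, not as a graph node, so every agent (Primary / Librarian / future Architect / Engineer / QA) gets it for free. Zero changes to graph code.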
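The append-only idea can be sketched as follows, assuming the graph exposes `get_state(config)` and `update_state(config, values)` the way a compiled LangGraph graph does. `FakeGraph` is a hypothetical in-memory stand-in so the example runs without LangGraph, the detector is reduced to Pattern B only to keep it short, and whether the real `apply_tail_sanity` is synchronous is not stated in the PR.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import List


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message."""
    role: str
    content: str = ""


class FakeGraph:
    """Hypothetical stand-in offering get_state / update_state like a compiled graph."""

    def __init__(self, messages: List[Msg]):
        self._messages = list(messages)

    def get_state(self, config):
        return SimpleNamespace(values={"messages": list(self._messages)})

    def update_state(self, config, values):
        # Mimics an append-style reducer: new messages go to the end.
        self._messages.extend(values["messages"])


def detect_tail_markers(msgs: List[Msg]) -> List[Msg]:
    # Reduced to Pattern B (trailing tool result) for brevity.
    if msgs and msgs[-1].role == "tool":
        return [Msg("ai", "[직전 turn 응답이 도구 결과 후 중단됨]")]
    return []


def apply_tail_sanity(graph, config) -> int:
    """Append markers for a dangling tail; return how many were added (0 = clean)."""
    msgs = graph.get_state(config).values.get("messages", [])
    markers = detect_tail_markers(msgs)
    if markers:
        graph.update_state(config, {"messages": markers})
    return len(markers)
```

Since nothing is removed or rewritten, a second call on the same thread sees a clean tail and adds nothing, which is the idempotence that the test list mentions.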

Integration points

send_streaming.py: right before the stream_artifact_events call
send_message.py: right before graph.ainvoke
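The call order at these integration points might look like the sketch below. The handler name `handle_send_message`, the async signature of `apply_tail_sanity`, and the `FakeGraph` class are all assumptions made for illustration; only the ordering (sanity first, then `graph.ainvoke`) comes from the PR.

```python
import asyncio


class FakeGraph:
    """Hypothetical stand-in that records the order of calls."""

    def __init__(self):
        self.calls = []

    async def ainvoke(self, inputs, config):
        self.calls.append("ainvoke")
        return {"messages": inputs["messages"]}


async def apply_tail_sanity(graph, config) -> int:
    # Stub: the real function would inspect and repair the checkpoint tail.
    graph.calls.append("sanity")
    return 0


async def handle_send_message(graph, config, user_text: str):
    """Repair a dangling tail first, then run the turn with the new user input."""
    added = await apply_tail_sanity(graph, config)
    if added:
        print(f"tail sanity appended {added} marker(s)")
    return await graph.ainvoke({"messages": [("user", user_text)]}, config)
```

Keeping the call in the handler rather than in a graph node is what lets every agent share it without changes to its own graph code.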

Verification

Unit + integration (14 tests)

shared/tests/test_a2a_graph_sanity.py:

detect_tail_markers (8): the 3 patterns + normal tail / empty history / single user / clean AI
apply_tail_sanity InMemorySaver integration (6): no-state / clean state / each pattern / idempotent

Regression (existing tests)

shared: 69 passed (sanity 14 + existing 55)
primary: 23 passed
librarian: 20 passed

Real stack boot

docker compose --profile agents up -d primary librarian: boots normally
Primary lifespan: primary tools wired: doc_store=on, issue_tracker=on, wiki=on, librarian=on, total=23
/healthz returns 200 on both

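Out of scope

Mid-history dangling cleanup: the tail-only policy is sufficient (sanity runs on every turn, so remnants cannot accumulate). If mid-history remnants accumulated before this PR lands, only the tail portion is cleaned at the start of the next turn; the LLM handles earlier remnants based on the persona guide.
LangGraph pre_model_hook (sanity right before every LLM call): applying it in the handler layer is sufficient. It can be added later if dangling state is found inside the ReAct loop.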

Closes #73

🤖 Generated with Claude Code

When LangGraph fails to finish its response because of a cancel, an external proxy timeout, or similar, the dangling remnants at the tail of messages are cleaned up with markers at the start of the next turn.

Detects 3 patterns, tail-only and append-only:
- Pattern A, user-user adjacency: consecutive user messages with the assistant response missing. Adds a SystemMessage marker ("이전 turn 응답 중단됨, 함께 종합해 답하라", roughly "previous turn's response was interrupted; answer both together").
- Pattern B, trailing ToolMessage: no closing AIMessage after the tool result. Adds an AIMessage marker (normalizes the LLM context).
- Pattern C, trailing AIMessage(tool_calls) without matching ToolMessage: hard-fails the Anthropic API's tool_use ↔ tool_result format check. Adds a placeholder ToolMessage per unanswered tool_call (critical fix).

Design:
- New module sanity.py in shared/a2a: `detect_tail_markers` (pure function) + `apply_tail_sanity(graph, config)` (graph state mutation).
- The handlers (send_message / send_streaming) call it right before invoking the graph.
- Tail-only inspection: because it runs at the start of every turn, remnants cannot accumulate into mid-history. Each call cleans up only new dangling state.

Tests:
- 14 unit/integration tests: the 3 patterns + normal tail / empty history for `detect_tail_markers`, and InMemorySaver integration for `apply_tail_sanity` (each pattern + idempotent).
- shared 69 / primary 23 / librarian 20 pass (no regressions).
- Real docker compose boot confirmed: Primary / Librarian start normally.

Closes #73

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): separate UG↔P/A chat tier + redefine schema (#75)

The root cause of the awkwardness found while verifying PR #68 (#39) was confirmed to be a fundamental mismatch between A2A and chat (#75), so the two tiers are explicitly separated:
- Chat tier (UG↔P/A): REST POST + persistent SSE per session, own chat protocol
- A2A tier (between agents): the existing A2A spec as-is, intended for delegation / negotiation

Key changes:
- New docs/proposal/architecture-chat-protocol.md: chat protocol spec
- knowledge-model.md: Episodic Layer schema redefined as 8 tables (sessions / chats / assignments / a2a_contexts / a2a_messages / a2a_tasks / a2a_task_status_updates / a2a_task_artifacts). Atlas's Task node is renamed to Assignment; task_id references become assignment_id
- shared/a2a/messaging.md: states that A2A is limited to inter-agent use, reinforces the Message ↔ Task relationship (response alternative + Task.history accumulation), and states that automatic Task wrapping is not required by the spec
- architecture-event-pipeline.md: Chronicler event types redefined as 3 layers (chat / assignment / A2A)
- architecture-user-gateway.md: states the chat protocol routing role
- architecture-agent-internals.md: shows the chat endpoint (P/A only) and the A2A endpoint separately
- agents-roles.md: interactions separated by chat / A2A; P/A delegates over A2A after chat
- proposal-main.md: the two tiers shown in mermaid; vocabulary / collection tables updated
- Root CLAUDE.md: chat protocol row added to the communication protocol priority table

Migration: discard existing data and cut over to the new schema (#75 §migration). This PR is docs only; code and migration follow in later implementation PRs.

Order of work after approval:
1. Merge this docs PR
2. Implement this redesign (schema migration + UG protocol + handler split + Chronicler split)
3. Resume #73 / PR #74 (graph entry sanity)
4. Rewrite the bodies of #69 ~ #72 and implement each

Refs #75 (umbrella issue)
Paused: feature/73 / PR #74 (graph entry sanity)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): correct the wording "별 ..." → "별도 ..." (#75)

Fixed 4 places where `별` was used on its own as shorthand for "별도" ("separate"). Suffix forms (`역할별`, `단계별`, etc.) are correct and left as-is.
- architecture-chat-protocol.md: "별 프로토콜", "별 이벤트" → "별도 ..."
- knowledge-model.md: "별 개념 / 별 영속 컬렉션" → "별도 ..."
- proposal-main.md: "별 컨테이너" → "별도 컨테이너"

Refs #75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: further corrections of the "별 ..." pattern (architecture-external-research / CLAUDE.md)

The same pattern corrected in chat-protocol.md / knowledge-model.md / proposal-main.md for #75 also appears in files outside this PR's scope, so the fix is applied consistently:
- architecture-external-research.md: "별 작업", "별 이슈", "별 docs" → "별도 ..."
- CLAUDE.md (SOLID I): "사용자 별 좁은" → "사용자별 좁은" (suffix spacing)

Refs #75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): correct the framing of why the chat protocol is separated (#75)

Corrects the earlier framing ("cramming A2A vocabulary into chat is the source / mismatch"). The A2A spec allows a trivial response to stay a Message (official guide: "Messages for Trivial Interactions, Tasks for Stateful Interactions"), so A2A is not unsuitable for chat. The direct cause is that our implementation unconditionally auto-wrapped everything in a Task.

The reason for keeping a separate chat protocol is now framed as two points taken together:
1. The automatic Task wrap needs fixing (direct cause)
2. User ↔ agent is a different domain from agent ↔ agent, so defining it separately with its own vocabulary is natural (semantically cleaner)

Changed in:
- architecture-chat-protocol.md §1: the most detailed explanation of this reframing
- architecture-user-gateway.md: one-line framing
- messaging.md (shared/a2a): warning box framing
- proposal-main.md: framing in two places
- Root CLAUDE.md: communication protocol priority table + decision guide
- The #75 issue body is also corrected to the same framing (separate update)

Refs #75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): change the chat protocol flow to a Mermaid sequence (#75)

The previous form violated the CLAUDE.md rule "no ASCII diagrams, use Mermaid". The §3 communication flow in architecture-chat-protocol.md is converted from plain text to a mermaid sequenceDiagram, so the persistent SSE + POST flow among UG / agent / FE is now expressed explicitly.

Refs #75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): remove the "sidebar" UI decision, state only the chat list (#75)

Baking a UI implementation decision (sidebar) into the protocol spec was a mistake. The UI form belongs to FE implementation, so this spec should define only the data flow / localStorage cache structure.

Changes:
- architecture-chat-protocol.md: "sidebar chat list" → "chat list"; the UI implementation area is split out as an explicit note at the end of §4
- architecture-user-gateway.md: same pattern corrected

Refs #75

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(m3): assign message queue responsibility to both Primary and Architect (#75)

The §5 framing was limited to Primary only. From M4+ Architect can also chat directly with the user (architecture-chat-protocol §1 / agents-roles.md §3.2), so the same queue policy applies to it; both are now stated as "Primary / Architect".

Changes:
- §5 title / body / persona guide framing made common to P / A
- §9 related item framing corrected

The title / body of issue #72 are also corrected to the same framing (separate update: A2A handler → chat handler, Primary → P/A).

Refs #75

* docs(m3): remove arbitrary duration wording for Assignment / A2A Task (#75)

Deleted arbitrary estimated-duration wording such as "a few days to a few weeks" and "short". Only the structural relationship is stated: an Assignment consists of one or more A2A Tasks.

Changes:
- knowledge-model.md ("commonly confused points" box)
- architecture-event-pipeline.md (commonly confused points)
- messaging.md (Message ↔ Task relationship section)

Refs #75

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hagyutae · 2026-05-11T13:00:21Z

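The #75 redesign (PRs 1~4 merged) changed the environment substantially, so the base of this PR's changes is stale. The sanity node for #73 will be rewritten against the latest main. The intent of this PR is reflected in the (updated) #73 body.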

This was referenced May 8, 2026

M3 redesign: UG↔P/A chat tier separation + schema redefinition + vocabulary alignment #75

Closed

docs(m3): separate UG↔P/A chat tier + redefine schema (#75) #76

Merged

hagyutae mentioned this pull request May 8, 2026

feat(doc-store): #75 PR 1, schema redefinition (8 collections + assignment_id rename) #77

Merged

hagyutae closed this May 11, 2026

hagyutae deleted the feat/73-graph-entry-sanity branch May 11, 2026 13:01

hagyutae wants to merge 1 commit into main from feat/73-graph-entry-sanity
