
[FEAT] Add stateful responses layer with history rehydration and DB persistence #21

Closed
maralbahari wants to merge 16 commits into vllm-project:main from EmbeddedLLM:responses-flow

Conversation


@maralbahari maralbahari commented Apr 21, 2026

Summary

Implements the stateful responses layer for agentic-api: a full request orchestration pipeline that adds conversation history, protocol translation, and a three-table persistence store on top of the existing vLLM proxy gateway.

What's in this PR

  • Stateful POST /v1/responses — previous_response_id rehydration chains turns from prior responses; conversation_id rehydration loads full multi-turn sessions from the DB
  • Three-table SQLAlchemy schema — Item, Response, and Conversation tables (SQLite default, PostgreSQL for multi-worker); SchemaManager handles the DDL lifecycle
  • pydantic_ai–backed orchestration — Engine drives the full request lifecycle; Pipeline runs the pydantic_ai agent against vLLM's Responses Model per request
  • Protocol translation pipeline — PydanticAINormalizer converts pydantic_ai events → internal NormalizedEvents; ResponseComposer converts them → OpenAI Responses API SSE events
  • Input translation — RequestInputTranslator converts InputItem/OutputItem → pydantic_ai ModelMessages; StoreInputTranslator normalizes history items before persistence
  • Response store (--response-store-enabled, default on) — ResponseStore saves/loads response checkpoints; ConversationStore manages multi-turn session state
  • Dual-mode proxy — response_store_enabled=false falls back to raw HTTP passthrough (the original proxy); response_store_enabled=true routes through the full managed pipeline
  • --conversation-store-enabled flag — opt-in multi-turn conversation tracking (default off)
  • VCR cassette-based E2E tests — test_responses_api.py and test_conversation_api.py replay responses recorded against the OpenAI API; no GPU or real vLLM needed

Implementation Details

Layer 3 — Core Orchestration (ADR-01 §4)

  • core/engine.py: Engine orchestrates the full request: rehydration → translation → agent run → normalization → composition → persistence
  • core/pipeline.py: Pipeline wraps the pydantic_ai agent run for a single turn, yielding NormalizedEvent stream
  • core/normalizer.py: PydanticAINormalizer: maps pydantic_ai StreamEvent subtypes → MessageStarted, MessageDelta, ReasoningDelta, FunctionCallStarted, FunctionCallDelta, FunctionCallDone, MessageDone.
  • core/composer.py: ResponseComposer: maps NormalizedEvent → OpenAI Responses API SSE frames (response.created, response.output_item.added, response.output_text.delta, response.output_item.done, response.completed, etc.)
  • core/translator.py: RequestInputTranslator: converts InputMessage/OutputMessage (including tool calls and results) → pydantic_ai ModelMessages
  • core/normalized_events.py: internal dataclass hierarchy for the normalizer ↔ composer contract
  • core/sse.py: SSE frame encoder
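core/sse.py is summarized in one line above; as a rough sketch of the contract it implies (the encoder's name and signature are assumptions here, not the PR's actual code — only the DONE_MARKER constant appears verbatim later in the review), an SSE frame encoder can be as small as:

```python
import json
from typing import Any


def encode_sse_frame(event_type: str, data: dict[str, Any]) -> str:
    """Encode one event as a Server-Sent Events frame: an `event:` line,
    a `data:` line with the JSON payload, and a blank-line terminator."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


# Marker emitted after the terminal event to end the stream.
DONE_MARKER = "data: [DONE]\n\n"
```

The composer would feed its OpenAI-style events (response.created, response.output_text.delta, …) through such an encoder, with DONE_MARKER sent once a terminal event is reached.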

Layer 4 — Persistence / Store (ADR-02)

  • store/response.py — ResponseStore: saves completed responses, rehydrates history from the previous_response_id chain
  • store/conversation.py — ConversationStore: saves/loads conversation state, appends new items per turn
  • store/translator.py — StoreInputTranslator: normalizes raw input items from the DB before passing them to the translator

Layer 5 — Database (ADR-02)

  • database/db_engine.py — async SQLAlchemy engine; PostgreSQL advisory lock helpers for future multi-worker support
  • database/schema.py — SchemaManager: creates/drops all tables; called during FastAPI lifespan
  • database/session.py — async session factory; @session_transaction and @run_in_session decorators
  • database/item.py / database/response.py / database/conversation.py — ORM models for the three-table schema

Test Plan

uv run pytest

Tests cover stateful multi-turn rehydration (previous_response_id and conversation_id), streaming vs non-streaming, protocol normalizer/composer unit tests, store persistence, and the full pipeline — all in-process via VCR cassettes recorded against the OpenAI API; no GPU or real vLLM needed.

Test Results

============================= test session starts ==============================
platform linux -- Python 3.12.11, pytest-9.0.2, pluggy-1.6.0 --
cachedir: .pytest_cache
configfile: pyproject.toml
plugins: anyio-4.13.0
collecting ... collected 100 items

tests/core/test_composer.py::test_start_emits_created_and_in_progress PASSED
tests/core/test_composer.py::test_feed_before_start_raises PASSED
tests/core/test_composer.py::test_message_started_emits_output_item_added_and_content_part_added PASSED
tests/core/test_composer.py::test_message_done_emits_text_done_content_part_done_output_item_done PASSED
tests/core/test_composer.py::test_message_output_index_and_item_id_stable PASSED
tests/core/test_composer.py::test_function_call_started_emits_output_item_added PASSED
tests/core/test_composer.py::test_function_call_done_emits_arguments_done_and_output_item_done PASSED
tests/core/test_composer.py::test_function_call_arguments_deltas_attributed_to_function_item PASSED
tests/core/test_composer.py::test_reasoning_events_emit_nothing PASSED
tests/core/test_composer.py::test_usage_final_emits_response_completed PASSED
tests/core/test_composer.py::test_completed_response_omits_reasoning_item_when_no_thinking_part PASSED
tests/core/test_composer.py::test_incomplete_response_sets_incomplete_details_for_max_output_tokens PASSED
tests/core/test_composer.py::test_make_error_events_emits_error_and_failed PASSED
tests/core/test_composer.py::test_sequence_numbers_are_monotonically_increasing PASSED
tests/core/test_normalizer.py::test_text_part_start_emits_message_started PASSED
tests/core/test_normalizer.py::test_text_part_start_with_content_emits_started_and_delta PASSED
tests/core/test_normalizer.py::test_text_part_delta_emits_message_delta PASSED
tests/core/test_normalizer.py::test_text_part_delta_empty_content_emits_nothing PASSED
tests/core/test_normalizer.py::test_text_part_end_emits_message_done PASSED
tests/core/test_normalizer.py::test_thinking_part_start_emits_reasoning_started PASSED
tests/core/test_normalizer.py::test_thinking_part_start_with_content_emits_started_and_delta PASSED
tests/core/test_normalizer.py::test_thinking_part_delta_emits_reasoning_delta PASSED
tests/core/test_normalizer.py::test_thinking_part_end_emits_reasoning_done PASSED
tests/core/test_normalizer.py::test_tool_call_part_start_emits_function_call_started PASSED
tests/core/test_normalizer.py::test_tool_call_part_start_with_args_emits_started_and_delta PASSED
tests/core/test_normalizer.py::test_tool_call_part_delta_emits_arguments_delta PASSED
tests/core/test_normalizer.py::test_tool_call_part_end_emits_function_call_done PASSED
tests/core/test_normalizer.py::test_agent_run_result_event_emits_usage_final PASSED
tests/core/test_normalizer.py::test_unknown_event_emits_nothing PASSED
tests/core/test_normalizer.py::test_delta_for_unknown_index_emits_nothing PASSED
tests/core/test_normalizer.py::test_multiple_parts_get_distinct_item_keys PASSED
tests/core/test_pipeline.py::test_pipeline_emits_completed_event_for_text_output PASSED
tests/core/test_pipeline.py::test_pipeline_response_status_completed_after_run PASSED
tests/core/test_pipeline.py::test_pipeline_output_contains_text_content PASSED
tests/core/test_pipeline.py::test_pipeline_emits_function_call_events PASSED
tests/core/test_pipeline.py::test_pipeline_handles_empty_stream PASSED
tests/core/test_pipeline.py::test_pipeline_sequence_numbers_monotonically_increasing PASSED
tests/core/test_translator.py::test_user_message_becomes_model_request_with_user_prompt_part PASSED
tests/core/test_translator.py::test_system_message_becomes_model_request_with_system_prompt_part PASSED
tests/core/test_translator.py::test_developer_message_becomes_system_prompt_part PASSED
tests/core/test_translator.py::test_assistant_input_message_becomes_model_response_with_text_part PASSED
tests/core/test_translator.py::test_input_message_with_content_list_joins_text PASSED
tests/core/test_translator.py::test_tool_result_becomes_model_request_with_tool_return_part PASSED
tests/core/test_translator.py::test_output_message_becomes_model_response_with_text_part PASSED
tests/core/test_translator.py::test_output_message_with_multiple_content_parts_joins_text PASSED
tests/core/test_translator.py::test_function_tool_call_becomes_model_response_with_tool_call_part PASSED
tests/core/test_translator.py::test_translate_preserves_order_of_mixed_items PASSED
tests/core/test_translator.py::test_empty_list_returns_empty PASSED
tests/core/test_translator.py::test_translator_is_singleton PASSED
tests/store/test_conversation_store.py::test_get_returns_none_for_missing PASSED
tests/store/test_conversation_store.py::test_get_or_create_creates_new_conversation PASSED
tests/store/test_conversation_store.py::test_get_or_create_returns_existing PASSED
tests/store/test_conversation_store.py::test_put_turn_appends_items PASSED
tests/store/test_conversation_store.py::test_put_turn_accumulates_across_turns PASSED
tests/store/test_conversation_store.py::test_put_turn_raises_for_missing_conversation PASSED
tests/store/test_conversation_store.py::test_put_turn_raises_for_duplicate_response_id PASSED
tests/store/test_conversation_store.py::test_rehydrate_raises_for_missing_conversation PASSED
tests/store/test_conversation_store.py::test_rehydrate_empty_conversation PASSED
tests/store/test_conversation_store.py::test_rehydrate_restores_items_in_order PASSED
tests/store/test_conversation_store.py::test_rehydrate_multi_turn_order PASSED
tests/store/test_response_store.py::test_get_returns_none_for_missing PASSED
tests/store/test_response_store.py::test_get_or_raise_raises_for_missing PASSED
tests/store/test_response_store.py::test_put_and_get_round_trip PASSED
tests/store/test_response_store.py::test_put_skipped_when_store_false PASSED
tests/store/test_response_store.py::test_put_skipped_when_status_not_persistable PASSED
tests/store/test_response_store.py::test_duplicate_response_id_raises_bad_input PASSED
tests/store/test_response_store.py::test_previous_response_id_stored PASSED
tests/store/test_response_store.py::test_rehydrate_restores_items_in_order PASSED
tests/store/test_response_store.py::test_rehydrate_multi_turn_accumulates_history PASSED
tests/store/test_translator.py::test_normalize_input_str_wraps_in_user_message PASSED
tests/store/test_translator.py::test_normalize_input_list_returned_unchanged PASSED
tests/store/test_translator.py::test_normalize_input_empty_list_returned_unchanged PASSED
tests/store/test_translator.py::test_resolve_tools_returns_request_tools_when_explicitly_set PASSED
tests/store/test_translator.py::test_resolve_tools_returns_stored_tools_when_not_explicitly_set PASSED
tests/store/test_translator.py::test_resolve_tools_returns_none_when_effective_is_none PASSED
tests/store/test_translator.py::test_resolve_tools_request_none_falls_back_to_stored PASSED
tests/store/test_translator.py::test_resolve_tools_stored_none_explicitly_set_returns_none PASSED
tests/store/test_translator.py::test_resolve_tool_choice_returns_request_when_explicitly_set PASSED
tests/store/test_translator.py::test_resolve_tool_choice_returns_stored_when_not_explicitly_set PASSED
tests/store/test_translator.py::test_resolve_tool_choice_function_choice_preserved PASSED
tests/store/test_translator.py::test_store_translator_is_singleton PASSED
tests/test_conversation_api.py::test_two_turn_nonstreaming PASSED
tests/test_conversation_api.py::test_two_turn_streaming PASSED
tests/test_conversation_api.py::test_conversation_isolation PASSED
tests/test_conversation_api.py::test_branch_off_turn_1 PASSED
tests/test_conversation_api.py::test_multi_branch PASSED
tests/test_proxy.py::test_proxy_responses_non_stream_passthrough PASSED
tests/test_proxy.py::test_proxy_responses_stream_passthrough PASSED
tests/test_proxy.py::test_proxy_hop_by_hop_headers_stripped PASSED
tests/test_proxy.py::test_proxy_authorization_env_key_injected PASSED
tests/test_proxy.py::test_proxy_authorization_client_header_takes_precedence PASSED
tests/test_proxy.py::test_proxy_upstream_http_error_passthrough PASSED
tests/test_proxy.py::test_proxy_stream_mid_stream_failure_closes_without_synthetic_events PASSED
tests/test_proxy.py::test_proxy_connect_error_maps_to_502 PASSED
tests/test_proxy.py::test_proxy_timeout_maps_to_504 PASSED
tests/test_responses_api.py::test_single_turn_nonstreaming PASSED
tests/test_responses_api.py::test_single_turn_streaming PASSED
tests/test_responses_api.py::test_two_turn_nonstreaming_previous_response_id PASSED
tests/test_responses_api.py::test_two_turn_streaming_previous_response_id PASSED
tests/test_responses_api.py::test_store_disabled_not_reusable PASSED

=============================== warnings summary ===============================
.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1428
 /agentic-api/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1428: PytestConfigWarning: Unknown config option: asyncio_mode

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 100 passed, 1 warning in 1.43s ========================

Signed-off-by: maral <maralbahari.98@gmail.com>
Co-authored-by: Tan Jia Huei <tanjiahuei@gmail.com>
Co-authored-by: noobHappylife <aratar1991@hotmail.com>
Co-authored-by: Claude
Signed-off-by: maralbahari <maralbahari.98@gmail.com>
Collaborator

@noobHappylife noobHappylife left a comment

For the test cassettes, should we also use a model with reasoning too?

Comment thread src/agentic_api/config/runtime.py Outdated
default="sqlite+aiosqlite:///./agentic_api.db",
description="SQLAlchemy async database URL.",
)
db_dialect: str = Field(
Collaborator

Do we need this? We should already be able to tell whether it's SQLite or Postgres from the db_url, right?

Collaborator Author

This is for the case where the user has created their own hosted Postgres database and passed in its URL.

Collaborator

Yeah, so I meant that db_dialect can be derived from db_url directly, so we don't need to set it manually?

Collaborator Author

Yeah, but we need to make sure they pass the postgres scheme along as well, because a URL might contain only host and port. So I guess the url field needs a validation check that the scheme is postgresql://.
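The scheme check discussed here can be done with the standard library alone. A hedged sketch (the helper name is made up, not part of the PR) that mirrors how SQLAlchemy derives the dialect from the URL's scheme, including driver suffixes like +asyncpg:

```python
from urllib.parse import urlsplit


def dialect_from_url(db_url: str) -> str:
    """Derive the dialect ("sqlite", "postgresql", ...) from the URL scheme,
    stripping any "+driver" suffix such as "+aiosqlite" or "+asyncpg"."""
    scheme = urlsplit(db_url).scheme
    if not scheme:
        raise ValueError(f"db_url has no scheme: {db_url!r}")
    return scheme.split("+", 1)[0]
```

With this, a Pydantic validator on db_url could reject anything whose derived dialect is not in {"sqlite", "postgresql"}, making the separate db_dialect field unnecessary.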

elif isinstance(event, MessageDone):
yield from self._message_done(event)
elif isinstance(event, ReasoningStarted):
pass # Reasoning items are not emitted as output events in this implementation
Collaborator

What does this mean, "not emitted as output events"?

Collaborator Author

Yes, for now. This PR doesn't focus on that; reasoning events will be added in another PR.

effective_tool_choice=hydrated_body.tool_choice,
effective_instructions=hydrated_body.instructions,
)
await self._conversation_store.put_turn( # type: ignore[union-attr]
Collaborator

If a response failed halfway, does the emitted events still write into DB? and the full history list will be "hanging" with a failed event?

Collaborator Author

It is all handled by the ConversationStore conversation CRUD: if something goes wrong, an error is surfaced and nothing is stored. Since these are async functions, the CRUD transaction session rolls back and nothing is persisted.
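The rollback behavior described here can be illustrated with stdlib sqlite3 standing in for the async SQLAlchemy session (a sketch, not the PR's code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, data TEXT)")
conn.commit()

# Simulate a turn whose second write fails: the whole transaction
# rolls back, so the first write is not persisted either.
try:
    with conn:  # begins a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO items VALUES ('item_1', 'ok')")
        conn.execute("INSERT INTO items VALUES ('item_1', 'dup')")  # IntegrityError
except sqlite3.IntegrityError:
    pass

rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
# rows is 0: nothing from the failed turn was stored
```

The async SQLAlchemy session wrapper in database/session.py plays the role of the `with conn:` block here: one transaction per turn, all-or-nothing.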

Collaborator

@noobHappylife noobHappylife Apr 23, 2026

So, say the upstream got an error. We should treat it as an error event, right (https://www.openresponses.org/specification#errors)? From the agentic-api PoV there shouldn't be a raw "error"; the user just sees an error event followed by response.failed.

In this case, is the response stored?

Collaborator Author

@maralbahari maralbahari Apr 23, 2026

If something goes wrong upstream, that becomes an error event. The _persist function is only called to store responses on successful terminal events, so in the failure case no data is stored.

from sqlalchemy.orm import DeclarativeBase


class Base(DeclarativeBase):
Collaborator

should we define the fields that exists in all tables here? Also should we consider using sqlmodel?

Collaborator Author

This is the base table, with no fields beyond the built-in defaults; the Response, Item, and Conversation tables each define their own columns. SQLModel is built on top of SQLAlchemy's declarative base and is actually slower for queries, because its Pydantic layer adds overhead we don't want in CRUD hot paths. Since we use dataclasses and query the tables directly, it is much faster than SQLModel with Pydantic.

proxy_client_manager: ProxyClientManager = (
request.app.state.proxy_client_manager
)
return await proxy_responses(
Collaborator

Should we also use the engine even if disabled response_store? so the behavior is consistent (since it goes through the same compose/normalize steps.

Collaborator Author

Yes, we can use the engine too.

Collaborator Author

@noobHappylife added a TODO for this to handle properly in another PR.


Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

Thanks for the thorough work here — the architecture is clean and the test coverage is great. Left inline comments on the items worth addressing, grouped roughly by severity.

conversation_id=row.id,
history_item_ids=[],
created_at=row.created_at,
)
Collaborator

Critical: TOCTOU race in get_or_create

get and create_conversation run in separate sessions. Under concurrent requests with the same conversation_id, two coroutines can both observe None and both attempt to create — causing an unhandled IntegrityError (500).

A proven pattern for this is to use an atomic upsert (INSERT ... ON CONFLICT DO NOTHING + RETURNING, or catch IntegrityError and retry with a get). Other implementations of the Responses API use dialect-specific insert().on_conflict_do_nothing() for exactly this reason — it makes the create-if-not-exists operation atomic at the DB level without needing application-level locking.
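In raw SQL terms, the atomic create-if-not-exists looks like the following sketch (table and column names are illustrative, and stdlib sqlite3 stands in for the PR's async SQLAlchemy stack; SQLite supports ON CONFLICT DO NOTHING since 3.24, and PostgreSQL has the equivalent clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (id TEXT PRIMARY KEY, created_at REAL)")


def get_or_create(conn: sqlite3.Connection, conv_id: str) -> str:
    """Atomic create-if-not-exists: the ON CONFLICT clause makes the race
    between concurrent creators harmless (the losing insert is a no-op)."""
    conn.execute(
        "INSERT INTO conversations (id, created_at) "
        "VALUES (?, strftime('%s','now')) "
        "ON CONFLICT (id) DO NOTHING",
        (conv_id,),
    )
    conn.commit()
    return conn.execute(
        "SELECT id FROM conversations WHERE id = ?", (conv_id,)
    ).fetchone()[0]
```

Two concurrent callers racing on the same conv_id both succeed, and exactly one row exists afterwards, with no IntegrityError surfacing to either.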

Collaborator Author

@franciscojavierarceo implemented dialect-specific insert().on_conflict_do_nothing().

metadata=metadata_,
)
except IntegrityError as e:
raise BadInputError(f"Response id already exists: {response_id}") from e
Collaborator

Critical: Lost-update race in put_turn

put_turn reads history_item_ids via self.get() (session 1), appends new IDs in Python, then writes the full list via _persist_conversation_turn (session 2). Two concurrent turns for the same conversation will both read the same history_item_ids and each will append their own items — the second write silently overwrites the first's items.

Two approaches that work well here:

  1. Single session with SELECT ... FOR UPDATE — serialize concurrent writes to the same conversation at the row level.
  2. Append-only item table — instead of maintaining a mutable history_item_ids list on the Conversation row, store each item with a conversation_id FK and an ordering column (e.g., created_at + sequence). Rehydration becomes a simple SELECT ... WHERE conversation_id = ? ORDER BY seq. This eliminates the read-modify-write cycle entirely and is the pattern other Responses API implementations use successfully.
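A minimal stdlib-sqlite3 sketch of approach 2 (schema and helper names are illustrative, not the PR's models): items are append-only rows keyed by (conversation_id, seq), and rehydration is a single ordered SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id TEXT PRIMARY KEY,
        conversation_id TEXT NOT NULL,
        seq INTEGER NOT NULL,          -- per-conversation ordering column
        data TEXT NOT NULL
    );
    CREATE UNIQUE INDEX ix_conv_seq ON items (conversation_id, seq);
""")


def put_turn(conn: sqlite3.Connection, conv_id: str, payloads: list[str]) -> None:
    """Append-only write: no mutable history_item_ids list to read-modify-write,
    so one turn can never silently overwrite another's items. The unique index
    turns a residual seq race into a detectable (retryable) conflict."""
    with conn:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), -1) + 1 FROM items WHERE conversation_id = ?",
            (conv_id,),
        ).fetchone()
        for offset, payload in enumerate(payloads):
            conn.execute(
                "INSERT INTO items (id, conversation_id, seq, data) VALUES (?, ?, ?, ?)",
                (f"{conv_id}:{next_seq + offset}", conv_id, next_seq + offset, payload),
            )


def rehydrate(conn: sqlite3.Connection, conv_id: str) -> list[str]:
    """History is just an ordered SELECT over the item table."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT data FROM items WHERE conversation_id = ? ORDER BY seq",
            (conv_id,),
        )
    ]
```

Successive turns accumulate rows, and rehydrate returns them in insertion order without any application-level list maintenance.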

Collaborator Author

@franciscojavierarceo I have applied your second suggestion: history is now retrieved from the Item table for the conversation API.

Comment thread src/agentic_api/store/conversation.py Outdated
ItemPayload.model_validate(items_by_id[item_id].data).item
for item_id in stored.history_item_ids
if item_id in items_by_id
]
Collaborator

High: Silent data loss during rehydration

if item_id in items_by_id

Missing Item rows (due to data corruption, partial cleanup, or eventual consistency) are silently skipped. The conversation history will be truncated without any error or log warning, which could lead to incorrect LLM context and confusing downstream behavior.

At minimum, log a warning for each missing item. Ideally, raise if len(items_by_id) != len(stored.history_item_ids) — a count mismatch indicates data integrity issues that shouldn't be silently swallowed.

Collaborator Author

@franciscojavierarceo This wouldn't be necessary here anymore, after changing to the second suggestion above.

"""
global _session_factory
_session_factory = async_sessionmaker(
engine, class_=AsyncSession, expire_on_commit=False
Collaborator

Critical: configure_session_factory is not idempotent

This unconditionally replaces the module-level _session_factory global every time it's called. Both ResponseStore.__init__ and ConversationStore.__init__ call it. The comment on the ConversationStore side claims idempotency, but the implementation doesn't enforce it.

Suggestion — add a guard:

if _session_factory is not None:
    return

Or do an identity check on the engine to catch misuse early.

Comment thread src/agentic_api/core/sse.py Outdated


DONE_MARKER = "data: [DONE]\n\n"
TERMINAL_EVENT_TYPES = {"response.completed", "response.failed"}
Collaborator

High: response.incomplete missing from TERMINAL_EVENT_TYPES

The composer can emit "response.incomplete" (see composer.py around line 277), and the engine treats it as terminal. But this set only contains completed and failed, so the data: [DONE] marker won't be emitted immediately after an incomplete event.

TERMINAL_EVENT_TYPES = {"response.completed", "response.failed", "response.incomplete"}

tool_choice: ToolChoice = Field(default_factory=AutoToolChoice)
stream: bool = False
response_store_enabled: bool = True
conversation_store_enabled: bool = False
Collaborator

Medium: conversation_store_enabled not gated server-side

This is a per-request field — any client can enable conversation tracking by setting it in their request body. There's no server-level config to disallow it.

Consider adding a RuntimeConfig.conversation_store_enabled flag that gates whether the per-request field is honored. When the server flag is off, ignore the client field (or return 400). This gives operators control over whether the feature is available.

Collaborator Author

@franciscojavierarceo We're planning to split conversation handling into its own dedicated router in another PR. Once that separation is in place, the per-request flag will be removed and server-side control will live naturally in the conversation router's config. Added a TODO for this.

Comment thread src/agentic_api/database/db_engine.py Outdated
_cached_text_clause("SELECT pg_advisory_unlock(:k)"), {"k": key}
)
except Exception:
return
Collaborator

Medium: Advisory lock unlock failure silently swallowed

except Exception:
    return

If the unlock fails (e.g., connection dropped), the PostgreSQL advisory lock leaks with no log output. At minimum, log a warning here so operators can detect lock leaks.

return ModelRequest(
parts=[
ToolReturnPart(
tool_name="", content=item.output, tool_call_id=item.call_id
Collaborator

Medium: Empty tool_name passed to ToolReturnPart

ToolReturnPart(tool_name="", content=item.output, tool_call_id=item.call_id)

This is an inherent limitation of the OpenAI Responses API contract (function_call_output doesn't include the tool name). Worth adding a short comment explaining why, so future readers don't try to "fix" it. If the history contains a prior FunctionToolCall item with the matching call_id, the tool name could be looked up from there.
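A hedged sketch of the suggested lookup, with stand-in dataclasses rather than the PR's actual item models: scan history for the function_call whose call_id matches the output, falling back to the empty name when it isn't present.

```python
from dataclasses import dataclass


@dataclass
class FunctionToolCall:
    call_id: str
    name: str
    arguments: str


@dataclass
class FunctionCallOutput:
    call_id: str
    output: str


def tool_name_for_output(history: list, output: FunctionCallOutput) -> str:
    """The Responses API's function_call_output item carries no tool name,
    but the matching function_call earlier in history does. Look it up by
    call_id; fall back to "" when the call isn't in history."""
    for item in history:
        if isinstance(item, FunctionToolCall) and item.call_id == output.call_id:
            return item.name
    return ""
```

The fallback to "" preserves today's behavior for histories that don't contain the originating call.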

ItemPayload.model_validate(items_by_id[item_id].data).item
for item_id in stored.history_item_ids
if item_id in items_by_id
]
Collaborator

High: Silent data loss during rehydration (same issue as conversation store)

if item_id in items_by_id

Same concern as ConversationStore.rehydrate — missing items are silently skipped. Should log a warning or raise on count mismatch.

Collaborator Author

@franciscojavierarceo Fixed: added a warning log.

effective_tool_choice=hydrated_body.tool_choice,
effective_instructions=hydrated_body.instructions,
)
await self._conversation_store.put_turn( # type: ignore[union-attr]
Collaborator

Medium: Consider upsert-based streaming persistence

Currently, the response is persisted only on completion via _persist. If the process crashes mid-stream, the response is lost entirely.

A pattern that works well for streaming is checkpoint-based upsert persistence: INSERT the response with in_progress status when streaming begins, then UPDATE (upsert) at key events (output_item.done, response.completed). This lets clients poll GET /v1/responses/{id} to see partial progress, and ensures incomplete responses are at least partially recoverable after crashes.

Not necessarily required for this PR, but worth considering for a follow-up — especially if background/async response execution is planned.
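A stdlib-sqlite3 sketch of the checkpoint upsert (illustrative schema, not the PR's models): the first checkpoint INSERTs the row as in_progress, and later checkpoints UPDATE it in place via the conflict clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id TEXT PRIMARY KEY, status TEXT, output TEXT)")


def checkpoint(conn: sqlite3.Connection, resp_id: str, status: str, output: str) -> None:
    """Upsert a response row at each checkpoint: the first call inserts it as
    in_progress, later calls update status/output in place. A crash mid-stream
    then leaves a recoverable in_progress row instead of nothing."""
    with conn:
        conn.execute(
            "INSERT INTO responses (id, status, output) VALUES (?, ?, ?) "
            "ON CONFLICT (id) DO UPDATE SET "
            "status = excluded.status, output = excluded.output",
            (resp_id, status, output),
        )
```

Calling checkpoint at stream start, at each output_item.done, and at response.completed gives clients polling GET /v1/responses/{id} a consistent view of partial progress.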

Collaborator Author

@franciscojavierarceo added a #TODO for this.

Collaborator

@franciscojavierarceo franciscojavierarceo left a comment

Two additional follow-up comments on streaming resilience and future-proofing.

Comment thread src/agentic_api/core/engine.py Outdated
async for event in self._iter_events(run_settings, pipeline, stream=True):
if event.type in {"response.completed", "response.incomplete"}:
await self._persist(
hydrated_body=hydrated_body,
Collaborator

Medium: Persistence failure kills the SSE stream

await self._persist(...) is in the hot path of the streaming response. If the DB write throws (connection drop, constraint violation, etc.), the exception propagates and kills the SSE stream — the client gets an abrupt disconnect instead of the response they were already receiving.

A more resilient pattern is best-effort persistence: wrap _persist in a try/except, log the failure as a warning, and let the stream complete. The client still gets their full response even if the DB hiccups. The response just won't be rehydratable for future turns.

try:
    await self._persist(...)
except Exception:
    logger.warning("Failed to persist response %s", pipeline.composer.response.id, exc_info=True)

request_tool_choice=self._body.tool_choice,
stored_tool_choice=stored.metadata.effective_tool_choice,
tool_choice_explicitly_set="tool_choice" in fields_set,
),
Collaborator

Low (future-proofing): No status validation on previous_response_id

_rehydrate fetches the stored response via get_or_raise but doesn't check whether the referenced response has a terminal status (completed, incomplete, failed). Today this is fine because responses are only persisted on completion. But once streaming persistence lands (where responses are INSERT'd as in_progress), a client could reference an in-progress response and get partial/inconsistent history.

Worth adding a status check here when that work arrives:

if stored.status not in {"completed", "incomplete", "failed"}:
    raise BadInputError(f"Cannot chain from response with status '{stored.status}'")

Collaborator Author

@franciscojavierarceo Added a #TODO for this.

Collaborator

@leseb leseb left a comment

Given our recent direction to move to Rust, I believe this should be closed.

@maralbahari
Collaborator Author

Given our recent direction to move to Rust, I believe this should be closed.

@leseb Sure, will open a PR converting this from Python to Rust.
