Skip to content

added a nice chat interface component to play with#3

Merged
Infrared1029 merged 1 commit into
mainfrom
start_chat
May 5, 2025
Merged

added a nice chat interface component to play with#3
Infrared1029 merged 1 commit into
mainfrom
start_chat

Conversation

@Infrared1029
Copy link
Copy Markdown
Contributor

No description provided.

@Infrared1029 Infrared1029 merged commit cd23824 into main May 5, 2025
@djl11 djl11 deleted the start_chat branch December 21, 2025 12:51
djl11 added a commit that referenced this pull request May 28, 2026
…k failures

Four E2E spending tests have been failing in CI:
  - test_assistant_limit_check (DID NOT RAISE SpendingLimitExceededError)
  - test_inflight_cancellation_on_limit_exceeded (timing wrong)
  - test_limit_check_callback_allows_under_limit (allowed=False, cap=0.0, spend=$10)
  - test_limit_exceeded_blocks_llm_call (DID NOT RAISE)

All four share a single root cause: state leaking through a
SHARED "SpendingTest Assistant" record reused across every test
in the file. The old e2e_config fixture did a "find-by-name then
reuse, else create" lookup. Every test in TestE2ESpendingLimits
got the same agent_id, so:

  1. Cumulative spend (current_spend) is NEVER reset by Orchestra
     once an LLM call lands on it. Once any test makes a real LLM
     call, the assistant carries that spend for the rest of the
     session. test_limit_check_callback_allows_under_limit fails
     when it sees current_spend=$10 from earlier tests, even
     though it asserts the assistant "starts fresh".

  2. The PATCH-based cap restore in test bodies
     (test_limit_exceeded_blocks_llm_call etc.) reads the
     *current* cap then restores it. If a previous test leaked
     cap=0, that becomes the "original" for the next test,
     making the leak permanent.

  3. The fixture-level cap=None reset is best-effort with
     bare-except and silently fails on any Orchestra hiccup,
     leaving the cap unreset.

The previous "await the reset PATCH" fix (c583ab2) addressed
fragility #3 but couldn't address #1 (spend accumulation) or #2
(test-body restore racing the reset).

Fix: each test gets its OWN freshly-created assistant with a
unique surname (test-node-slug + 8-char UUID). The fixture:
  - Always POSTs a new assistant at setup (no find-by-name reuse)
  - Raises loudly on create failure (was: silently leaving
    test_agent_id=None then propagating to SESSION_DETAILS)
  - DELETEs the assistant at teardown via /assistant/{id}

No state survives between tests:
  - Fresh agent_id per test → spend starts at 0
  - Fresh cap=25 per test → no cap-leak between tests
  - Delete in teardown → no residual rows accumulate

The non-E2E tests in the file (TestAtomicUpsert,
TestUpdateCumulativeSpend, …) don't use e2e_config — they mock
SESSION_DETAILS and are unaffected.

Side effects:
  - Each test creates + deletes an assistant: ~2 extra HTTP
    round-trips per test. Acceptable cost given the
    correctness win.
  - Local DB rows accumulate transiently if a teardown DELETE
    fails (bare-except), but local.sh's docker-volume rebuild
    on restart clears them; CI runs are fresh per matrix job
    anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant