added a nice chat interface component to play with by Infrared1029 · Pull Request #3 · unifyai/unity

Infrared1029 · 2025-05-05T19:58:17Z

No description provided.

…k failures

Four E2E spending tests have been failing in CI:
- test_assistant_limit_check (DID NOT RAISE SpendingLimitExceededError)
- test_inflight_cancellation_on_limit_exceeded (timing wrong)
- test_limit_check_callback_allows_under_limit (allowed=False, cap=0.0, spend=$10)
- test_limit_exceeded_blocks_llm_call (DID NOT RAISE)

All four share a single root cause: state leaking through a SHARED "SpendingTest Assistant" record reused across every test in the file.

The old e2e_config fixture did a "find-by-name then reuse, else create" lookup. Every test in TestE2ESpendingLimits got the same agent_id, so:

1. Cumulative spend (current_spend) is NEVER reset by Orchestra once an LLM call lands on it. Once any test makes a real LLM call, the assistant carries that spend for the rest of the session. test_limit_check_callback_allows_under_limit fails when it sees current_spend=$10 from earlier tests, even though it asserts the assistant "starts fresh".
2. The PATCH-based cap restore in test bodies (test_limit_exceeded_blocks_llm_call etc.) reads the *current* cap then restores it. If a previous test leaked cap=0, that becomes the "original" for the next test, making the leak permanent.
3. The fixture-level cap=None reset is best-effort with bare-except and silently fails on any Orchestra hiccup, leaving the cap unreset.

The previous "await the reset PATCH" fix (c583ab2) addressed fragility #3 but couldn't address #1 (spend accumulation) or #2 (test-body restore racing the reset).

Fix: each test gets its OWN freshly-created assistant with a unique surname (test-node-slug + 8-char UUID).

The fixture:
- Always POSTs a new assistant at setup (no find-by-name reuse)
- Raises loudly on create failure (was: silently leaving test_agent_id=None then propagating to SESSION_DETAILS)
- DELETEs the assistant at teardown via /assistant/{id}

No state survives between tests:
- Fresh agent_id per test → spend starts at 0
- Fresh cap=25 per test → no cap-leak between tests
- Delete in teardown → no residual rows accumulate

The non-E2E tests in the file (TestAtomicUpsert, TestUpdateCumulativeSpend, …) don't use e2e_config; they mock SESSION_DETAILS and are unaffected.

Side effects:
- Each test creates + deletes an assistant: ~2 extra HTTP round-trips per test. Acceptable cost given the correctness win.
- Local DB rows accumulate transiently if a teardown DELETE fails (bare-except), but local.sh's docker-volume rebuild on restart clears them; CI runs are fresh per matrix job anyway.
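The per-test isolation described above could look roughly like the sketch below. It is not the repository's actual fixture: `FakeOrchestra`, `unique_surname`, and `fresh_assistant` are hypothetical names, and an in-memory stand-in replaces the real POST and DELETE calls so the example runs on its own. It only illustrates the pattern of creating a uniquely named assistant at setup, failing loudly if creation fails, and deleting it best-effort at teardown.

```python
import re
import uuid
from contextlib import contextmanager


def unique_surname(node_name: str) -> str:
    """Slug of the test node name plus an 8-char UUID suffix."""
    slug = re.sub(r"[^a-z0-9]+", "-", node_name.lower()).strip("-")
    return f"{slug}-{uuid.uuid4().hex[:8]}"


class FakeOrchestra:
    """Hypothetical in-memory stand-in for the backend (illustration only)."""

    def __init__(self):
        self.assistants = {}
        self._next_id = 1

    def create_assistant(self, surname, cap):
        # Stands in for the POST that creates a new assistant.
        agent_id = self._next_id
        self._next_id += 1
        self.assistants[agent_id] = {
            "surname": surname,
            "cap": cap,
            "current_spend": 0.0,
        }
        return agent_id

    def delete_assistant(self, agent_id):
        # Stands in for the DELETE on /assistant/{id}.
        self.assistants.pop(agent_id, None)


@contextmanager
def fresh_assistant(client, node_name, cap=25.0):
    """Create a brand-new assistant for one test and remove it afterwards."""
    agent_id = client.create_assistant(unique_surname(node_name), cap)
    if agent_id is None:
        # Fail loudly instead of letting a None id propagate into the test.
        raise RuntimeError("assistant creation failed")
    try:
        yield agent_id
    finally:
        try:
            client.delete_assistant(agent_id)
        except Exception:
            pass  # best-effort teardown, as the commit message describes
```

In a pytest suite, the same body would typically sit inside a fixture that takes `request` and passes `request.node.name` as `node_name`, so every test receives its own `agent_id` with zero spend and the default cap.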

added a nice chat interface component to play with

95467da

Infrared1029 merged commit cd23824 into main May 5, 2025

djl11 deleted the start_chat branch December 21, 2025 12:51

Infrared1029 merged 1 commit into main from start_chat
