Custom planner v0 by YushaArif99 · Pull Request #2 · unifyai/unity

YushaArif99 · 2025-05-03T06:25:59Z

No description provided.

…ducing a dispatcher for primitive and source code verification. Replace legacy fingerprinting and heuristic methods with streamlined checks for navigation and interactive primitives. Update payload structure for LLM verification to include detailed primitive information and DOM change summaries.

…tricter assertions

FAILING TEST - requires cancel_running fix to pass.

Changes:
- Use 2s simulated LLM time (realistic ratio vs 0.1s utterance interval)
- Assert cancelled_count == 0 (no running tasks should be cancelled)
- Remove dependency on actual LLM calls for deterministic timing
- Improve docstring to clearly document the bug and required fixes

This test now properly validates that BOTH fixes are required:
1. Debouncer asyncio.shield() fix (already committed)
2. cancel_running=False for voice mode (NOT YET COMMITTED)

Without fix #2, test fails with 4 cancellations. With both fixes, test passes with 0 cancellations.
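For readers unfamiliar with the shielding pattern the first fix relies on, here is a minimal sketch of what `asyncio.shield()` does in general. It is not the repository's debouncer: `slow_llm_call`, `debounce_wait`, and the timings are hypothetical stand-ins, and the sketch only assumes that the debouncer awaits a task that is already in flight.

```python
import asyncio


async def slow_llm_call(results: list) -> None:
    # Hypothetical stand-in for an in-flight LLM request.
    await asyncio.sleep(0.05)
    results.append("done")


async def debounce_wait(inner: asyncio.Task) -> None:
    # Cancelling this coroutine raises CancelledError at the await below,
    # but shield() keeps the cancellation from reaching `inner`.
    await asyncio.shield(inner)


async def demo():
    results = []
    inner = asyncio.create_task(slow_llm_call(results))
    waiter = asyncio.create_task(debounce_wait(inner))
    await asyncio.sleep(0.01)  # let both tasks start
    waiter.cancel()            # e.g. a new utterance arrives
    try:
        await waiter
    except asyncio.CancelledError:
        pass
    await inner                # the shielded work still finishes
    return waiter.cancelled(), inner.cancelled(), results


waiter_cancelled, inner_cancelled, results = asyncio.run(demo())
```

In this sketch the waiting coroutine is cancelled while the shielded task runs to completion, which is the property a `cancelled_count == 0` style assertion on running tasks would depend on.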

…k failures

Four E2E spending tests have been failing in CI:
- test_assistant_limit_check (DID NOT RAISE SpendingLimitExceededError)
- test_inflight_cancellation_on_limit_exceeded (timing wrong)
- test_limit_check_callback_allows_under_limit (allowed=False, cap=0.0, spend=$10)
- test_limit_exceeded_blocks_llm_call (DID NOT RAISE)

All four share a single root cause: state leaking through a SHARED "SpendingTest Assistant" record reused across every test in the file.

The old e2e_config fixture did a "find-by-name then reuse, else create" lookup. Every test in TestE2ESpendingLimits got the same agent_id, so:

1. Cumulative spend (current_spend) is NEVER reset by Orchestra once an LLM call lands on it. Once any test makes a real LLM call, the assistant carries that spend for the rest of the session. test_limit_check_callback_allows_under_limit fails when it sees current_spend=$10 from earlier tests, even though it asserts the assistant "starts fresh".
2. The PATCH-based cap restore in test bodies (test_limit_exceeded_blocks_llm_call etc.) reads the *current* cap then restores it. If a previous test leaked cap=0, that becomes the "original" for the next test, making the leak permanent.
3. The fixture-level cap=None reset is best-effort with bare-except and silently fails on any Orchestra hiccup, leaving the cap unreset.

The previous "await the reset PATCH" fix (c583ab2) addressed fragility #3 but couldn't address #1 (spend accumulation) or #2 (test-body restore racing the reset).

Fix: each test gets its OWN freshly-created assistant with a unique surname (test-node-slug + 8-char UUID). The fixture:
- Always POSTs a new assistant at setup (no find-by-name reuse)
- Raises loudly on create failure (was: silently leaving test_agent_id=None then propagating to SESSION_DETAILS)
- DELETEs the assistant at teardown via /assistant/{id}

No state survives between tests:
- Fresh agent_id per test → spend starts at 0
- Fresh cap=25 per test → no cap-leak between tests
- Delete in teardown → no residual rows accumulate

The non-E2E tests in the file (TestAtomicUpsert, TestUpdateCumulativeSpend, …) don't use e2e_config; they mock SESSION_DETAILS and are unaffected.

Side effects:
- Each test creates + deletes an assistant: ~2 extra HTTP round-trips per test. Acceptable cost given the correctness win.
- Local DB rows accumulate transiently if a teardown DELETE fails (bare-except), but local.sh's docker-volume rebuild on restart clears them; CI runs are fresh per matrix job anyway.
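The create-per-test, delete-at-teardown shape described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the repository's fixture: `FakeOrchestra`, `unique_surname`, and `fresh_assistant` are hypothetical names, an in-memory dictionary stands in for the real POST and DELETE calls, and a context manager stands in for what would be a yield-style pytest fixture.

```python
import re
import uuid
from contextlib import contextmanager


class FakeOrchestra:
    """In-memory stand-in for the backend (hypothetical, no HTTP involved)."""

    def __init__(self):
        self.assistants = {}
        self._next_id = 1

    def create(self, surname, cap):
        agent_id = self._next_id
        self._next_id += 1
        self.assistants[agent_id] = {"surname": surname, "cap": cap, "spend": 0.0}
        return agent_id

    def delete(self, agent_id):
        self.assistants.pop(agent_id, None)


def unique_surname(node_name):
    # Slugified test name plus an 8-character UUID fragment.
    slug = re.sub(r"[^a-z0-9]+", "-", node_name.lower()).strip("-")
    return f"{slug}-{uuid.uuid4().hex[:8]}"


@contextmanager
def fresh_assistant(client, node_name, cap=25.0):
    # Setup: always create a new record; any error here propagates loudly.
    agent_id = client.create(unique_surname(node_name), cap)
    try:
        yield agent_id
    finally:
        # Teardown: remove the record so nothing survives into the next test.
        client.delete(agent_id)


client = FakeOrchestra()
with fresh_assistant(client, "test_limit_check") as first_id:
    client.assistants[first_id]["spend"] += 10.0  # simulate a real LLM call
with fresh_assistant(client, "test_limit_check") as second_id:
    second_spend = client.assistants[second_id]["spend"]
remaining = dict(client.assistants)
```

Because the second block receives a different identifier, the spend recorded by the first never reaches it, and the teardown leaves no rows behind in the stand-in store.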

YushaArif99 force-pushed the custom-planner branch 3 times, most recently from d2dab4e to 6e1941e Compare May 6, 2025 11:07

YushaArif99 added 27 commits May 7, 2025 23:12

Remove Planner class implementation from planner_proposal.py in favor of the custom planner implementation

ea2a6d9

Add helper functions for creating Primitive actions in planner/primitives.py

aabd09d

Add model module for planner with core data structures: Primitive, FunctionNode, and Plan

618358f

Add verifier module to planner with BubbleUp exception and verification logic

cc12071

Add PlannerContext singleton to manage shared state across planner components

971d218

Add Unify client wrapper for planner to centralize LLM calls

f99f14f

Add code rewriting functionality and sandbox execution environment for generated plans

a435571

Add update handler to process user updates for exploration and modification in planner

5fa8176

Add zero-shot planning functionality to generate and execute Python plans based on natural language tasks

04173d3

Made a start on the Planner class with task management and execution flow, including support for pausing and resuming plans, handling task events, and integrating browser state broadcasting. **NOTE**: still a WIP...

689ce47

Add browser state broadcasting queue and new navigation commands

39e93c2

Add mock tests for planner primitives, sandbox execution, and verification logic

c9bb326

black formatting

4e89536

Implement line caching for dynamic module source code in sandbox execution

8471765

Refactor Planner class replacing the AST logic in favor of the dynamic module loading logic

1c8c29d

Add a mock unit test for planner execution flow, including a dummy planner and threading for command acknowledgment

3091776

Refactor function handling in Planner and Verifier to support dynamic function retrieval from the reimplement queue

3220535

Refactor test_verify to simplify rewrite tracking and enhance function assertions in the verification process

7a04fee

Enhance rewrite_function to accept optional source code

041bc4f

Add runtime control mechanisms for pausing and scheduling callables in planner primitives

f90806f

Refactor update handling in planner to support dynamic exploration and modification functions.

aab9beb

Add pause and resume functionality in Planner class to control execution flow

683a417

Refactor test_verify to utilize monkeypatching for mocking dependencies

ac61f0c

Add mock unit tests for update handling in Planner, covering modification and exploration scenarios

acdeb3c

removing obsolete code and doing some cleanups

876ccdd

Add DOM state tracking in BrowserState and compute focused element XPath in BrowserWorker

ca5022a

Add additional state tracking in PlannerContext for DOM and focused element XPath

aa948d5

YushaArif99 added 22 commits May 7, 2025 23:12

Add unit test for wait_for_user_signal primitive, ensuring it blocks until the event is set and returns the correct Primitive object.

fc0ae3c

Remove redundant unit test for exploration scheduling and resuming in Planner, streamlining test suite.

d57b65c

Fix button handling in _list_valid_actions by removing unused variable on_hvr

ebfa8d8

Add diff payload generation in update_handler to include function modification context, enhancing error handling and bridge function execution.

aaa7f4f

Remove obsolete unit test for update handling in Planner, cleaning up the test suite.

5625313

Add public aliases for primitives in planner module and update module exports.

c0ff571

Fix exploration completion handling in Planner by ensuring tab selection only occurs if a valid main tab is returned, improving robustness of exploration flow.

4408e7a

Add unit test for diff payload generation in update_handler, verifying the structure and content of the payload produced for function modifications.

50a4104

Add broadcast queue support in BusManager, Controller, and BrowserWorker to enhance browser state communication.

c6ec2f3

Add unit test for context.get_snapshot method in planner module, verifying snapshot processing and data retrieval from the broadcast queue.

765d327

Add DOM diffing and hashing utilities in verifier_utils module to facilitate efficient comparison of DOM structures and generate stable hashes for DOM subtrees.

46123b9

Add context variable for tracking current primitive execution in planner module, along with a helper function to retrieve it.

f96384d

Add summarise_snapshot method to PlannerContext for creating summarized versions of snapshots, including truncation of elements and optional DOM hashing.

a5d3961

Add unit tests for DOM utilities and Verifier class functionality, including stability checks for _hash_dom, diff summary generation, and tiered verification logic for navigation, scrolling, and button interactions.

e777a3b

Implement course correction functionality in update handler to sync browser state during plan execution.

ef20207

Add integration test for modify_resume functionality in Planner, verifying plan execution and course correction handling through mocked dependencies and queues.

ca23074

refactor planner tests

9f1a9b3

Update primitive function names in course correction prompt and enforce stubbing of subsequent helper calls in zero_shot.py

aac4db8

Add system messages and prompts for planning module

85857c8

Update Unify client to use caching in addition to tracing

7b8bc60

Refactor prompts and system messages in planner module to utilize centralized definitions from sys_msg

2f4868c

YushaArif99 force-pushed the custom-planner branch from 2e5b9de to 2f4868c Compare May 7, 2025 18:13

minor improvements to the planner

27b9753

djl11 force-pushed the main branch 2 times, most recently from 0915ca5 to 85d10e2 Compare May 14, 2025 18:12

YushaArif99 closed this May 29, 2025

YushaArif99 deleted the custom-planner branch July 3, 2025 05:02

YushaArif99 wants to merge 65 commits into main from custom-planner
