feat(core): expose agent.aiLongPress() and agent.aiClearInput() #2387
Both actions already exist in the action space but were previously only reachable via callActionInActionSpace(). Add first-class ai* methods on Agent, consistent with aiTap / aiPinch, and wire the LongPress/ClearInput interfaceAlias into the cross-platform definitions. Android, iOS, and HarmonyOS devices are refactored to use defineActionLongPress(), removing hand-rolled schemas.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be38e0f5c4
{
  duration?: number;
  locate: LocateResultElement;
defineActionLongPress(async (param) => {
Preserve platform long-press defaults in schema-driven flows
Switching this action to defineActionLongPress changes the schema from a mobile-specific duration?: number (no default) to the shared schema that carries a duration default of 500. In schema-driven callers (e.g. Playground/Visualizer), defaults are extracted from Zod and prefilled (packages/visualizer/src/types.ts, extractDefaultValue), then sent as explicit params (packages/playground/src/common.ts, executeAction), so Android/iOS long-press now sends duration=500 when users leave the field untouched instead of using device defaults (2000 on Android, 1000 on iOS). This is a behavior regression for mobile long-press timing.
Removing the hand-rolled LongPress schemas on Android and iOS accidentally routed their calls through ActionLongPressParamSchema, which declared `duration: z.number().default(500).optional()`. Zod applies the default before `.optional()`, so an omitted duration silently became 500 ms — replacing Android's 2000 ms and iOS's 1000 ms device-side defaults. Drop the schema-level default so parsed params preserve `undefined` when duration is omitted, letting each device's longPress(...) pick its own default (Android 2000, iOS 1000, Web 500 via base-page). Add a regression test that locks this behaviour, and update the EN/ZH aiLongPress docs to describe the real per-platform defaults.
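The regression and the fix can be illustrated with a small, library-free sketch. It is not the Midscene implementation: `buildParams`, `resolveHoldTime`, and `DEVICE_DEFAULT_MS` are hypothetical names, and the form-prefill logic is a simplified stand-in for what the review describes in Playground/Visualizer. Only the millisecond values (Android 2000, iOS 1000, Web 500) come from the commit message above.

```typescript
// Hypothetical names throughout; only the millisecond values come from the commit message.
type Platform = 'android' | 'ios' | 'web';

const DEVICE_DEFAULT_MS: Record<Platform, number> = {
  android: 2000,
  ios: 1000,
  web: 500,
};

// Simplified stand-in for a schema-driven caller: the form is prefilled with the
// schema default (if any), and whatever the form holds is sent as an explicit param.
function buildParams(
  schemaDefault: number | undefined,
  userInput?: number,
): { duration?: number } {
  const duration = userInput ?? schemaDefault;
  return duration === undefined ? {} : { duration };
}

// Device-side resolution: an explicit duration wins, otherwise the platform default applies.
function resolveHoldTime(platform: Platform, param: { duration?: number }): number {
  return param.duration ?? DEVICE_DEFAULT_MS[platform];
}

// Before the fix: the shared schema carried a default of 500, so it overrode Android's 2000.
console.log(resolveHoldTime('android', buildParams(500))); // 500

// After the fix: no schema-level default, so each device falls back to its own value.
console.log(resolveHoldTime('android', buildParams(undefined))); // 2000
console.log(resolveHoldTime('ios', buildParams(undefined))); // 1000

// A value the user actually typed is still honoured.
console.log(resolveHoldTime('ios', buildParams(undefined, 750))); // 750
```

The point of the sketch is that the default has to live in exactly one layer: once the schema supplies a value, the device can no longer tell an untouched field from an explicit request.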
Summary
- Add first-class `ai*` methods on `Agent` for actions that were previously only reachable through `agent.callActionInActionSpace()`:
  - `agent.aiLongPress(locate, { duration? })`
  - `agent.aiClearInput(locate)`
- Wire `interfaceAlias: 'aiLongPress'` on the shared `defineActionLongPress()` helper so cross-platform action metadata is consistent (`aiClearInput` was already wired).
- Refactor the Android, iOS, and HarmonyOS devices to use `defineActionLongPress()` instead of hand-rolling the schema, keeping the three platforms aligned with the core definition. HarmonyOS still ignores `duration` because the underlying `uitest` API does not expose a custom hold time.

Motivation
Users have asked us to open up dedicated interfaces for the remaining instant actions instead of driving them through the generic `callActionInActionSpace('LongPress' | 'ClearInput', ...)` API:
- `aiLongPress`: long press is a common gesture on mobile (context menus, selection mode, etc.) and was effectively hidden.
- `aiClearInput` is useful as an independent step (e.g. asserting empty-state validation) or when you want to decouple clearing from typing. It complements `aiInput({ mode: 'replace' })` rather than replacing it.

The `aiAct` planner path is still available for natural-language flows; this PR is about giving users an equally ergonomic entry point for the deterministic path.

API
Both methods accept the usual `LocateOption` (`deepLocate`, `xpath`, `cacheable`, image prompts).

Docs
- `apps/site/docs/en/api.mdx` and `apps/site/docs/zh/api.mdx` gain dedicated `agent.aiLongPress()` and `agent.aiClearInput()` sections, and the Instant Action intro list now mentions both methods.
- The docs also cover `aiClearInput` vs. the default clearing behaviour of `aiInput({ mode: 'replace' })`.

Test plan
- `pnpm run lint`
- `npx nx test core`: 806 passed (including the two new suites)
- `pnpm exec vitest run tests/unit-test/action-long-press.test.ts tests/unit-test/action-clear-input.test.ts` (inside `packages/core`): 12 passed
- `npx nx test android`: 264 passed
- `npx nx test ios`: 110 passed
- `npx nx test harmony`: 117 passed

Pre-existing failures outside the scope of this PR (verified reproducible on `main`):
- `packages/web-integration/tests/unit-test/playground-server.test.ts`: `EADDRINUSE` on port 5800 in the local sandbox.
- `packages/web-integration/tests/unit-test/yaml/player.test.ts > flush output even if assertion failed`: e2e test timing out on network idle.
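
For reviewers who want to see the two entry points side by side, here is a rough usage sketch. It does not import Midscene: `InstantActionAgent` is a hypothetical structural type, and only the method names and the `{ duration? }` option are taken from this PR. The string locator, the `Promise<void>` return type, and the recording stub are assumptions made so the example runs without a device or a model.

```typescript
// Hypothetical structural type: only the method names and the `{ duration? }` option
// come from the PR text; the string locator and Promise<void> return are assumptions.
interface InstantActionAgent {
  aiLongPress(locate: string, opt?: { duration?: number }): Promise<void>;
  aiClearInput(locate: string): Promise<void>;
}

// Deterministic flow: clear a field as its own step, then long-press an item.
async function clearThenLongPress(agent: InstantActionAgent): Promise<void> {
  await agent.aiClearInput('the search box');
  // No duration: the device picks its own hold time.
  await agent.aiLongPress('the first chat message');
  // Explicit duration in milliseconds.
  await agent.aiLongPress('the first chat message', { duration: 1500 });
}

// Recording stub (an assumption for illustration) that logs calls instead of driving a device.
function createRecordingAgent(): { agent: InstantActionAgent; calls: string[] } {
  const calls: string[] = [];
  const agent: InstantActionAgent = {
    async aiLongPress(locate, opt) {
      calls.push(`LongPress(${locate}, ${opt?.duration ?? 'device default'})`);
    },
    async aiClearInput(locate) {
      calls.push(`ClearInput(${locate})`);
    },
  };
  return { agent, calls };
}
```

With a real agent, the same `clearThenLongPress` flow would replace two `callActionInActionSpace('ClearInput' | 'LongPress', ...)` calls, assuming the real method signatures are compatible with the structural type above.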