feat(core): expose agent.aiLongPress() and agent.aiClearInput() #2387

Merged
quanru merged 2 commits into main from feat/expose-ai-long-press-and-clear-input
Apr 23, 2026

quanru (Collaborator) commented Apr 22, 2026

Summary

  • Add two first-class ai* methods on Agent for actions that were previously only reachable through agent.callActionInActionSpace():
    • agent.aiLongPress(locate, { duration? })
    • agent.aiClearInput(locate)
  • Wire interfaceAlias: 'aiLongPress' on the shared defineActionLongPress() helper so cross-platform action metadata is consistent (aiClearInput was already wired).
  • Refactor the Android / iOS / HarmonyOS device implementations to use defineActionLongPress() instead of hand-rolling the schema, keeping the three platforms aligned with the core definition. HarmonyOS still ignores duration because the underlying uitest API does not expose a custom hold time.

Motivation

Users have asked us to open up dedicated interfaces for the remaining instant actions instead of driving them through the generic callActionInActionSpace('LongPress' | 'ClearInput', ...) API:

  • aiLongPress is a common gesture on mobile (context menus, selection mode, etc.) and was effectively hidden.
  • aiClearInput is useful as an independent step (e.g. asserting empty-state validation) or when you want to decouple clearing from typing. It complements aiInput({ mode: 'replace' }) rather than replacing it.

The aiAct planner path is still available for natural-language flows; this PR is about giving users an equally ergonomic entry point for the deterministic path.

API

// New
await agent.aiLongPress('the first article on the homepage');
await agent.aiLongPress('the message bubble', { duration: 2000 });

await agent.aiClearInput('the search input field');
await agent.aiClearInput('the email input', { deepLocate: true });

Both methods accept the usual LocateOption (deepLocate, xpath, cacheable, image prompts).
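
One way to picture the relationship between the new methods and the generic entry point is as a thin mapping from `ai*` arguments onto an `(action, param)` pair. The sketch below is not the Midscene implementation: `LocateOptionSketch`, `ActionCall`, `longPressCall`, `clearInputCall`, and the `{ prompt, ... }` locate shape are hypothetical names introduced only for illustration. Only the action names `LongPress` and `ClearInput` and the `duration` option come from this PR.

```typescript
// Hypothetical shapes for illustration; not Midscene's real types.
type LocateOptionSketch = { deepLocate?: boolean; xpath?: string; cacheable?: boolean };
type ActionCall = { action: 'LongPress' | 'ClearInput'; param: Record<string, unknown> };

// Maps aiLongPress(locate, { duration?, ...locateOptions }) onto a generic action call.
function longPressCall(
  locate: string,
  opt: LocateOptionSketch & { duration?: number } = {},
): ActionCall {
  const { duration, ...locateOpt } = opt;
  const param: Record<string, unknown> = { locate: { prompt: locate, ...locateOpt } };
  // Forward duration only when the caller supplied one, so an omitted value stays omitted.
  if (duration !== undefined) param['duration'] = duration;
  return { action: 'LongPress', param };
}

// Maps aiClearInput(locate, locateOptions) onto a generic action call.
function clearInputCall(locate: string, opt: LocateOptionSketch = {}): ActionCall {
  return { action: 'ClearInput', param: { locate: { prompt: locate, ...opt } } };
}
```

Under these assumptions, `longPressCall('the message bubble', { duration: 2000 })` yields a `LongPress` call carrying `duration: 2000`, while a call without options carries no `duration` key at all.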

Docs

  • apps/site/docs/en/api.mdx and apps/site/docs/zh/api.mdx gain dedicated agent.aiLongPress() and agent.aiClearInput() sections, and the Instant Action intro list now mentions both methods.
  • A short tip clarifies when to reach for aiClearInput vs. the default clearing behaviour of aiInput({ mode: 'replace' }).

Test plan

  • pnpm run lint
  • npx nx test core — 806 passed (including the two new suites)
  • pnpm exec vitest run tests/unit-test/action-long-press.test.ts tests/unit-test/action-clear-input.test.ts (inside packages/core) — 12 passed
  • npx nx test android — 264 passed
  • npx nx test ios — 110 passed
  • npx nx test harmony — 117 passed

Pre-existing failures outside the scope of this PR (verified reproducible on main):

  • packages/web-integration/tests/unit-test/playground-server.test.ts: EADDRINUSE on port 5800 in the local sandbox.
  • packages/web-integration/tests/unit-test/yaml/player.test.ts > flush output even if assertion failed — e2e test timing out on network idle.

Both actions already exist in the action space but were previously
only reachable via callActionInActionSpace(). Add first-class ai*
methods on Agent, consistent with aiTap / aiPinch, and wire the
LongPress/ClearInput interfaceAlias into the cross-platform
definitions. Android, iOS, and HarmonyOS devices are refactored to
use defineActionLongPress(), removing hand-rolled schemas.

cloudflare-workers-and-pages Bot commented Apr 22, 2026

Deploying midscene with Cloudflare Pages

Latest commit: 088542e
Status: ✅  Deploy successful!
Preview URL: https://29fc889c.midscene.pages.dev
Branch Preview URL: https://feat-expose-ai-long-press-an.midscene.pages.dev

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be38e0f5c4

{
duration?: number;
locate: LocateResultElement;
defineActionLongPress(async (param) => {

P2: Preserve platform long-press defaults in schema-driven flows

Switching this action to defineActionLongPress changes the schema from a mobile-specific duration?: number (no default) to the shared schema that carries a duration default of 500. In schema-driven callers (e.g. Playground/Visualizer), defaults are extracted from Zod and prefilled (packages/visualizer/src/types.ts, extractDefaultValue), then sent as explicit params (packages/playground/src/common.ts, executeAction), so Android/iOS long-press now sends duration=500 when users leave the field untouched instead of using device defaults (2000 on Android, 1000 on iOS). This is a behavior regression for mobile long-press timing.
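
The regression described above can be modeled without Zod or the Visualizer. In this minimal sketch, `FieldSketch` and `prefillParams` are hypothetical stand-ins for a schema entry and for the prefill-then-submit flow; they are not the real `extractDefaultValue` or `executeAction`. The point is only that once a default is declared on the schema, a form that prefills it always sends an explicit value.

```typescript
// Hypothetical field descriptor standing in for a schema entry; not the Zod or Visualizer API.
type FieldSketch = { name: string; defaultValue?: number };

// Models a form that prefills every field declaring a default and then
// submits all prefilled values as explicit params.
function prefillParams(fields: FieldSketch[]): Record<string, number> {
  const params: Record<string, number> = {};
  for (const field of fields) {
    if (field.defaultValue !== undefined) params[field.name] = field.defaultValue;
  }
  return params;
}

// Mobile-specific schema: duration is optional and declares no default.
const mobileSpecific: FieldSketch[] = [{ name: 'duration' }];
// Shared schema: duration declares a default of 500.
const sharedWithDefault: FieldSketch[] = [{ name: 'duration', defaultValue: 500 }];

const before = prefillParams(mobileSpecific); // {} : the device default applies
const after = prefillParams(sharedWithDefault); // { duration: 500 } : sent explicitly
```

With the shared schema, the untouched field arrives at the device as `duration: 500`, so a device-side default such as 2000 or 1000 never gets a chance to apply.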

Removing the hand-rolled LongPress schemas on Android and iOS
accidentally routed their calls through ActionLongPressParamSchema,
which declared `duration: z.number().default(500).optional()`. Zod
applies the default before `.optional()`, so an omitted duration
silently became 500 ms — replacing Android's 2000 ms and iOS's
1000 ms device-side defaults.

Drop the schema-level default so parsed params preserve `undefined`
when duration is omitted, letting each device's longPress(...) pick
its own default (Android 2000, iOS 1000, Web 500 via base-page).
Add a regression test that locks this behaviour, and update the EN/ZH
aiLongPress docs to describe the real per-platform defaults.
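
The intended behaviour after this fix can be sketched as a device-side fallback. The helper name `resolveLongPressDuration` and the lookup table are hypothetical and only illustrate the rule; the millisecond values are the ones quoted in the commit message above.

```typescript
type Platform = 'android' | 'ios' | 'web';

// Defaults quoted in this PR: Android 2000 ms, iOS 1000 ms, Web 500 ms.
const DEVICE_DEFAULT_MS: Record<Platform, number> = { android: 2000, ios: 1000, web: 500 };

// Hypothetical helper: an omitted (undefined) duration falls back to the
// platform default, while any explicit value is used as given.
function resolveLongPressDuration(platform: Platform, duration?: number): number {
  return duration ?? DEVICE_DEFAULT_MS[platform];
}
```

Using `??` rather than `||` means only `undefined` (or `null`) triggers the fallback, which is why the parsed params must preserve `undefined` when the caller omits `duration`.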
quanru merged commit 0e56495 into main Apr 23, 2026
9 checks passed
quanru deleted the feat/expose-ai-long-press-and-clear-input branch April 23, 2026 07:32