
feat: Phase 2 & 3 — Memory, API, MCP, Takeover, Skills, Menu Bar, SDK Ecosystem #1

Merged
terryso merged 50 commits into master from feature/phase3-vision-features
May 16, 2026

Conversation

Owner

@terryso terryso commented May 16, 2026

Summary

This PR merges Phase 2 (Growth Features) and Phase 3 (Vision Features) into master, bringing Axion from an MVP CLI tool to a full desktop automation platform. 50 commits, 569 files changed, 71k+ lines added.

Phase 2 — Growth Features (Epic 4–7)

  • Epic 4: Cross-run Memory — Auto-extract app operation patterns after each run; Planner injects historical experience for more accurate plans (axion memory list/clear, --no-memory)
  • Epic 5: HTTP API Server — REST API + SSE event stream for external integrations (axion server, task submission, real-time progress, auth, concurrency)
  • Epic 6: MCP Server Mode — Act as MCP stdio server for external agents like Claude Code (axion mcp)
  • Epic 7: Takeover & Fast Mode — Pause/resume when automation gets stuck; --fast mode reduces LLM calls for simple tasks

Phase 3 — Vision Features (Epic 8–11)

  • Epic 8: Multi-window Workflows — Cross-app coordination, window layout management (arrange_windows with tile/cascade), blocking dialog detection
  • Epic 9: Record → Compile → Skill Reuse — Record desktop operations and compile into reusable skills with zero LLM cost (axion record, axion skill compile/run/list/delete)
  • Epic 10: Menu Bar App (AxionBar) — Native macOS menu bar app with task panel, SSE real-time progress, global hotkeys, skill quick trigger
  • Epic 11: Third-party SDK Ecosystem — Agent project template, plugin tool registration (@Tool macro), developer documentation and examples

Other Changes

  • Test migration to Swift Testing framework (all test targets)
  • README rewritten in English and Chinese with accurate tool list (21 tools)
  • Playwright MCP server integration for web automation
  • SIGINT handler for graceful Helper process cleanup
  • SDK dependency bumped to 0.3.2

Test plan

  • All unit tests pass (swift test --filter for Tools/Models/MCP/Services/Core/CLI)
  • Integration tests verified on macOS with AX permissions
  • CI pipeline green
  • Manual acceptance tests passed for Epic 4–9
  • Smoke test: axion run "Open Calculator" on clean master merge
  • Verify axion server, axion mcp, axion record commands work end-to-end

🤖 Generated with Claude Code

terryso and others added 30 commits May 13, 2026 10:09
…Story 4.1)

Add AppMemoryExtractor to extract operation summaries from SDK message streams,
MemoryCleanupService for 30-day expiry cleanup, and Memory status check in
axion doctor. RunCommand now collects tool pairs during execution and persists
App knowledge entries organized by bundle identifier domain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…assed)

Verified Memory extraction, domain organization, expiry cleanup, corruption
resilience, and doctor status reporting with real CLI commands against live API.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…amiliarity tracking (Story 4.2)

Implement cross-run learning system that extracts AX tree structure features,
identifies high-frequency operation patterns, marks failure experiences, and
auto-marks familiar apps after 3+ successful runs. Fix tool name mismatch
(get_ax_tree → get_accessibility_tree) found during acceptance testing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…management commands (Story 4.3)

Inject accumulated App Memory (profiles, patterns, failures, familiarity) into Planner system prompt
for more accurate plans. Add axion memory list/clear commands and the --no-memory flag. All 6 ACs were
verified via manual acceptance testing (11/11 pass), and 578 unit tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add Hummingbird-based HTTP API server with REST endpoints for submitting
and querying desktop automation tasks. Includes server subcommand, async
task execution via AgentRunner, actor-based RunTracker, and comprehensive
unit tests (624 tests, 0 failures). Manual acceptance: 10/10 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The open-agent-sdk-swift on GitHub now requires swift-mcp 2.0.0.
Update axion to match and adapt to breaking API changes:
- Remove ParameterValue conformance (replaced by @Schemable macro)
- Change Tool.Content to ContentBlock (renamed in swift-mcp 2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implement Server-Sent Events (SSE) endpoint for real-time monitoring of
agent task execution. Includes EventBroadcaster actor for multi-client
pub/sub, SSE event models, replay buffer for completed runs, and
integration with AgentRunner for step-level event emission.
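
The pub/sub design described above can be sketched roughly as follows. This is a minimal illustration, not Axion's actual EventBroadcaster: the method names (`subscribe`, `publish`, `finish`), the use of plain `String` events, and replaying buffered events to late subscribers are all assumptions made for the example. It relies on `AsyncStream.makeStream`, available from Swift 5.9.

```swift
import Foundation

// Hypothetical sketch of an actor-based broadcaster with a replay buffer.
actor EventBroadcaster {
    private var subscribers: [UUID: AsyncStream<String>.Continuation] = [:]
    private var replayBuffer: [String] = []
    private var finished = false

    /// Returns a stream that first replays buffered events, then receives live ones.
    func subscribe() -> AsyncStream<String> {
        let id = UUID()
        let (stream, continuation) = AsyncStream.makeStream(of: String.self)
        for event in replayBuffer { continuation.yield(event) }
        if finished {
            // The run is already complete: replay only, then close the stream.
            continuation.finish()
        } else {
            subscribers[id] = continuation
            // Drop the subscriber when the client stops listening.
            continuation.onTermination = { [weak self] _ in
                Task { await self?.unsubscribe(id) }
            }
        }
        return stream
    }

    /// Records the event and fans it out to every current subscriber.
    func publish(_ event: String) {
        guard !finished else { return }
        replayBuffer.append(event)
        for continuation in subscribers.values { continuation.yield(event) }
    }

    /// Marks the run as complete and closes all live streams.
    func finish() {
        finished = true
        for continuation in subscribers.values { continuation.finish() }
        subscribers.removeAll()
    }

    private func unsubscribe(_ id: UUID) { subscribers[id] = nil }
}
```

Keeping the subscriber table inside an actor means concurrent SSE clients can subscribe and disconnect without explicit locks; a real server would likely carry typed events and cap the buffer size.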

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add hands-on acceptance testing document covering Epic 4 (Memory),
Epic 5 (HTTP API), Epic 6 (MCP Server), and Epic 7 (Takeover/Fast Mode)
with real commands for verification. Includes review patches from
Story 4.1-4.3 (code fixes, test improvements, security hardening),
epic retrospectives, and sprint status updates.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
- Add 30 new QA automate tests across Stories 8.1/8.2/8.3
  (CrossAppWorkflowTests, PlannerPromptMultiWindowTests, TraceWindowContextTests,
  WindowManagementToolTests additions)
- Add Epic 8 retrospective document
- Update README tool count from 16 to 22
- Update sprint-status: epic-8-retrospective done

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Fix SkillRunCommand --allow-foreground flag defaulting to true (ArgumentParser validation error)
- Add Epic 9 manual acceptance test document with real command verification
- Update manual-e2e-test-checklist.md with recording/skill test cases
- Update README with Record and Replay Skills section
- Update sprint-status, project-context, and epics for Epic 9 completion

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…tool registration

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
StatusBarController.sendNotification used UNUserNotificationCenter directly,
which crashes in test environment without an app bundle. Extract
NotificationSending protocol with injectable mock to enable isolated testing.
Also includes Epic 10/11 retrospective updates and sprint-status sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… Testing

Migrate 24 test files from XCTest to Swift Testing framework:
- TM-1: AxionCoreTests (11 files)
- TM-2: AxionHelperTests Models (4), Services (7), MCP (1)

Key changes: import XCTest → import Testing, class → @Suite struct,
func test_xxx → @Test("xxx") func xxx(), XCTAssert* → #expect(),
XCTUnwrap → try #require(), XCTAssertThrowsError → #expect(throws:).
ServiceContainerTests uses .serialized to prevent parallel race conditions.
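
As a rough illustration of the mapping listed above, here is a small Swift Testing suite. The `Point` type, `parse` function, and test names are hypothetical and not taken from Axion's test targets; the file is assumed to live in a test target built with a toolchain that ships Swift Testing.

```swift
import Testing

// Hypothetical type under test, for illustration only.
struct Point: Equatable {
    var x: Int
    var y: Int

    init?(_ text: String) {
        let parts = text.split(separator: ",").compactMap { Int($0) }
        guard parts.count == 2 else { return nil }
        x = parts[0]
        y = parts[1]
    }
}

enum ParseError: Error { case malformed }

func parse(_ text: String) throws -> Point {
    guard let point = Point(text) else { throw ParseError.malformed }
    return point
}

// XCTest equivalent: final class PointTests: XCTestCase { func test_parsesPair() ... }
@Suite(.serialized)  // .serialized runs this suite's tests one at a time
struct PointTests {
    @Test("parses pair") func parsesPair() throws {
        let point = try #require(Point("3,4"))   // replaces XCTUnwrap
        #expect(point.x == 3)                    // replaces XCTAssertEqual
        #expect(point.y == 4)
    }

    @Test("rejects malformed input") func rejectsMalformed() {
        #expect(throws: ParseError.self) {       // replaces XCTAssertThrowsError
            _ = try parse("oops")
        }
    }
}
```

Because suites are value types created fresh for each test, per-test setup moves into `init`, which is why the migration replaces `setUp` with an initializer.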

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
terryso and others added 20 commits May 15, 2026 19:49
Migrate HelperMCPServerTests and HelperProcessSmokeTests from XCTest to
Swift Testing (@Suite/@Test/#expect). Includes prior TM-2/TM-3 changes for
ServiceContainerTests and all Tool test files.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Migrate Config, Planner, Executor, Engine, Verifier, IO, Helper, and
Trace test files from XCTest to Swift Testing framework. setUp/tearDown
converted to init/deinit with ~Copyable structs. Env-var-dependent tests
use EnvGate actor to isolate global state. Add --no-parallel --quiet to
both Makefile and CI to prevent parallel test races.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add --quiet, --skip AxionCLIIntegrationTests, --skip AxionE2ETests
to match make test exactly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…t Testing

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Migrate 10 final XCTest files to Swift Testing:
- AxionE2ETests (5): CorePipeline, HelperLifecycle, MockLLM, RealLLM, Helpers
- AxionCLITests/Memory (5): AppMemoryExtractor, AppProfileAnalyzer,
  FamiliarityTracker, MemoryCleanupService, MemoryContextProvider

Fix CI failure: DocumentationTests now skips when SDK repo is unavailable.

`import XCTest` fully eliminated from Tests/. All 1561 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These tests validated docs in a sibling open-agent-sdk-swift repo,
not Axion's own code. Not relevant for unit test suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The arrange_windows implementation uses NSScreen.visibleFrame whose
origin.y includes the menu bar height (62px on CI). Changed absolute
y==0 assertions to relative offsets (y differences) so tests pass
regardless of screen configuration.
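
The difference between absolute and relative assertions can be sketched as below. The `Rect` type stands in for CGRect so the example has no platform dependency, and `tileSideBySide` is a hypothetical helper, not Axion's arrange_windows implementation.

```swift
// Minimal stand-in for CGRect (assumption made to keep the sketch portable).
struct Rect: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

// Hypothetical tiling helper: split the visible frame into left and right halves.
func tileSideBySide(in visible: Rect) -> (left: Rect, right: Rect) {
    let half = visible.width / 2
    let left = Rect(x: visible.x, y: visible.y, width: half, height: visible.height)
    let right = Rect(x: visible.x + half, y: visible.y, width: half, height: visible.height)
    return (left, right)
}

// Checks offsets relative to the visible frame instead of asserting y == 0,
// so the check holds whatever the frame's origin happens to be.
func tilingIsConsistent(visible: Rect) -> Bool {
    let (left, right) = tileSideBySide(in: visible)
    return left.y - visible.y == 0
        && right.y == left.y
        && right.x - left.x == left.width
        && left.width + right.width == visible.width
}
```

The same check passes for a frame whose origin.y is 62 and for one whose origin.y is 0, which is the property the fixed tests depend on.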

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…TaskCancellation

withTaskCancellationHandler does not fire on SIGINT — it only responds
to cooperative Swift Task cancellation. Replace with DispatchSource
signal handler so Ctrl-C properly stops recording and saves the file.
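
A DispatchSource-based SIGINT handler generally looks like the sketch below. The function name `installSIGINTHandler` and its parameters are hypothetical; only `signal`, `DispatchSource.makeSignalSource`, `setEventHandler`, and `resume` are real APIs. The pattern is written with macOS in mind, where the default disposition has to be disabled before the source can observe the signal.

```swift
import Dispatch
import Foundation

/// Installs a SIGINT handler backed by a dispatch signal source.
/// The caller must keep the returned source alive for as long as the handler is needed.
func installSIGINTHandler(
    queue: DispatchQueue = .main,
    onInterrupt: @escaping @Sendable () -> Void
) -> DispatchSourceSignal {
    // Ignore the default action (terminate) so Ctrl-C no longer kills the process directly.
    signal(SIGINT, SIG_IGN)

    let source = DispatchSource.makeSignalSource(signal: SIGINT, queue: queue)
    source.setEventHandler {
        // Runs on `queue`, outside signal-handler context, so ordinary code is safe here.
        onInterrupt()
    }
    source.resume()
    return source
}
```

A command would typically call this before starting its work, perform cleanup such as stopping the recording and saving the file inside `onInterrupt`, and then exit explicitly, since the default termination has been disabled.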

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nner

Document-based apps (TextEdit, Pages) show an Open panel on launch that
blocks all automation. Instead of auto-dismissing (which would interfere
with tasks that need to open a specific file), detect the blocking dialog
via window title keywords and include it in the launch_app result as
`blocking_dialog: { window_id, title }`. The Planner then decides whether
to dismiss (Cmd+N, Escape) or interact with the dialog based on the task.

Follows OpenClick's detection approach: title keyword matching (open/save/
import/export + Chinese equivalents) with minimum window size filter.
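
The detection rule described above can be sketched as follows. The type names, the exact keyword list (including the Chinese terms), and the minimum size thresholds are assumptions chosen for illustration; the commit only states that title keywords and a minimum window size are used.

```swift
import Foundation

// Hypothetical window description; the real helper reads this from the AX tree.
struct WindowInfo {
    let windowID: Int
    let title: String
    let width: Double
    let height: Double
}

// Mirrors the `blocking_dialog: { window_id, title }` shape from the commit message.
struct BlockingDialog: Equatable {
    let windowID: Int
    let title: String
}

// Assumed keywords and thresholds, not the values used by Axion.
let dialogKeywords = ["open", "save", "import", "export", "打开", "保存", "导入", "导出"]
let minDialogWidth = 200.0
let minDialogHeight = 100.0

/// Returns the first window that is large enough and whose title contains a dialog keyword.
func detectBlockingDialog(in windows: [WindowInfo]) -> BlockingDialog? {
    for window in windows {
        // Skip tiny windows such as tooltips or popovers.
        guard window.width >= minDialogWidth, window.height >= minDialogHeight else { continue }
        let title = window.title.lowercased()
        if dialogKeywords.contains(where: { title.contains($0) }) {
            return BlockingDialog(windowID: window.windowID, title: window.title)
        }
    }
    return nil
}
```

Substring matching like this can produce false positives (a document titled "Export notes" would match), which fits the design in which the result is only reported and the Planner decides what to do with it.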

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When recording on non-English macOS, app_switch events captured localized
names (e.g. "计算器") which launch_app couldn't resolve since the actual
file is Calculator.app. Now records bundle_id alongside localized name,
and skill compiler prefers bundle_id for launch_app arguments. Also adds
localized display name fallback matching in AppLauncherService.
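
The preference order can be sketched as below. `AppSwitchEvent`, `launchAppArguments`, and the argument keys `bundle_id` and `app_name` are hypothetical names used for illustration; the real launch_app argument schema is not shown in this PR.

```swift
// Hypothetical recorded event: localized name plus an optional bundle identifier.
struct AppSwitchEvent {
    let localizedName: String
    let bundleID: String?
}

/// Builds launch_app arguments, preferring the locale-independent bundle identifier.
func launchAppArguments(for event: AppSwitchEvent) -> [String: String] {
    if let bundleID = event.bundleID, !bundleID.isEmpty {
        return ["bundle_id": bundleID]          // assumed argument key
    }
    // Older recordings without a bundle identifier fall back to the display name.
    return ["app_name": event.localizedName]    // assumed argument key
}
```

A recording made on a Chinese system then compiles to the same launch step as one made on an English system, because the identifier does not depend on the display language.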

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UNUserNotificationCenter.current() crashes with NSException when the
process has no valid bundle proxy (e.g. running from swift build debug
directory). Add a guard to skip notification setup in non-bundle mode.

Also add scripts/build-bar-bundle.sh to create a proper .app bundle
for development and testing of the AxionBar macOS menu bar app.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AppMemoryExtractor relied solely on toolResult.isError to classify
runs as success/failure, but AxionHelper tools (e.g. launch_app) catch
errors and return structured JSON with "error" and "message" fields
instead of throwing. This caused the MCP framework to leave isError
false, so failures were recorded as successes in App Memory.

Add contentContainsErrorPayload() to also detect error payloads in
result JSON content, fixing failure tagging in memory entries and
profiles.
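
One plausible shape for this check is sketched below. The function name comes from the commit message, but the body, the rule that both "error" and "message" keys must be present, and the `isFailure` helper are assumptions, not the actual implementation.

```swift
import Foundation

/// Sketch: true when `content` is a JSON object carrying both "error" and "message" keys.
func contentContainsErrorPayload(_ content: String) -> Bool {
    guard let data = content.data(using: .utf8),
          let object = try? JSONSerialization.jsonObject(with: data),
          let dictionary = object as? [String: Any] else {
        return false  // not JSON, or not a JSON object
    }
    return dictionary["error"] != nil && dictionary["message"] != nil
}

/// Hypothetical combined classification: the protocol-level flag or a structured error payload.
func isFailure(isError: Bool, content: String) -> Bool {
    isError || contentContainsErrorPayload(content)
}
```

Checking the payload in addition to isError covers tools that report failures as ordinary structured results instead of throwing.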

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…LM timeout

- Pass the user's actual input (e.g. credentials) as resume context instead of the
  fixed "用户已完成手动操作" ("the user has completed the manual operation") string
- Improve takeover prompt to explain both manual-desktop and text-input options
- Add 90s resume watchdog: if LLM doesn't respond after takeover resume,
  interrupt and suggest running a follow-up task
- Fix AxionHelper crash when AX elements return NaN/Infinity bounds
- Fix AppBundleTests bundle ID expectation mismatch
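
For the NaN/Infinity bounds fix in the list above, a guard of the following kind is one common approach. The `Bounds` type and `sanitizedBounds` function are hypothetical; the commit does not show how AxionHelper handles these values. The underlying hazard is real in Swift: converting a non-finite Double with `Int(_:)` traps at runtime.

```swift
// Hypothetical container for an element's frame.
struct Bounds: Equatable {
    var x: Double
    var y: Double
    var width: Double
    var height: Double
}

/// Returns nil when any component is NaN or infinite, so callers can skip the element
/// instead of passing non-finite values into integer conversion or JSON encoding.
func sanitizedBounds(x: Double, y: Double, width: Double, height: Double) -> Bounds? {
    let components = [x, y, width, height]
    guard components.allSatisfy({ $0.isFinite }) else { return nil }
    return Bounds(x: x, y: y, width: width, height: height)
}
```

Returning an optional keeps the decision about malformed elements (drop them, or report them without bounds) with the caller.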

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uming old agent

After takeover, close the current agent and create a new one with a minimal
context (original task + takeover summary + current screen state). This avoids
the 25K+ token context that caused LLM API timeouts with the old resume approach.

Inspired by OpenClick's replan-after-takeover pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bundle a built-in Playwright MCP server alongside axion-helper so agents can use
DOM-level web interactions (form filling, clicking, navigation) instead of relying
on the AX tree, which doesn't expose web form elements.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… for web tasks

Revert the replan-with-fresh-context approach back to agent.resume() since
Playwright MCP keeps context small. Update planner prompt to instruct agent
to prefer Playwright for any web/URL/browser task, which eliminates the AX
tree limitation that caused takeovers in the first place.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Rewrite English and Chinese READMEs with Phase 2 (Memory, HTTP API,
  MCP Server, Takeover, Fast Mode) and Phase 3 (multi-window, record
  & skills, menu bar app, SDK ecosystem) features
- Correct MCP tool count from 22 to 21, replace non-existent tools
  with actual tools (start_recording, stop_recording)
- Add SIGINT handler in RunCommand for graceful Helper cleanup
- Bump open-agent-sdk-swift to 0.3.2, add .playwright-mcp to gitignore

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@terryso terryso merged commit 14e6208 into master May 16, 2026
2 checks passed
