feat(orchestrator): wire generate_presentation + require grounding (#2780)#3029
feat(orchestrator): wire generate_presentation + require grounding (#2780)#3029oxoxDev wants to merge 25 commits into
Conversation
…humansai#2778) tinyhumansai#2776 (PR tinyhumansai#2801) shipped the artifact storage layer with read RPCs (ai_list_artifacts / ai_get_artifact / ai_delete_artifact) but no producer surface — tools that wanted to register a generated file had no public path into the store. Adds three composable helpers: - create_artifact(workspace_dir, kind, title, extension) -> (ArtifactMeta, PathBuf): allocates a fresh UUID-named dir under <workspace>/artifacts/, persists a Pending ArtifactMeta record, returns the meta plus the absolute path the caller writes bytes to. Filename stem is sanitized from the title (alphanumerics + dashes, ≤ 80 chars) so user-supplied titles can't escape the artifact dir. - finalize_artifact(workspace_dir, id, size_bytes): flips status to Ready + persists final size. Idempotent on already-Ready records. - fail_artifact(workspace_dir, id, reason): flips to Failed + persists the reason via the new ArtifactMeta.error field. Extends ArtifactMeta with optional error: Option<String> (serde default + skip_if_none) so list_artifacts / get_artifact callers can surface why a build did not produce a usable file without scraping a separate log. Persisted records that predate this field round-trip fine through serde::default. Used by the tinyhumansai#2778 presentation tool in this PR; future Python / binary-output tools share the same surface so per-tool dirs + status bookkeeping don't fragment.
…inyhumansai#2778) runtime_python previously exposed only stream-oriented primitives (spawn_stdio_process, PythonBootstrap::spawn_stdio) sized for the MCP stdio path. Tools that want a simpler 'run a one-shot script, pipe stdin, wait, capture stdout+stderr' contract were each rolling their own wait-and-bound plumbing. Two new modules: run.rs — run_python_script_to_completion(resolved, spec, stdin, deadline) -> PythonRunOutput { exit_code, stdout, stderr }. Wraps spawn_stdio_process + tokio::time::timeout + wait_with_output. Returns Ok(...) on non-zero exit (callers usually want to quote the stderr in their own error variant); only timeout / spawn failures surface as Err. PythonRunTimeout is its own typed error so call sites can downcast. venv.rs — ensure_venv(name, requirements, config) -> ResolvedPython. First call resolves the base interpreter via PythonBootstrap, creates <cache_dir>/venvs/<name>/ via 'python -m venv', runs 'python -m pip install <requirements>' bounded by a 5-minute timeout, and writes a sorted/deduped requirements.txt for change detection. Subsequent calls short-circuit when the recorded requirements match. Venv name is sanitised against path traversal + separators. Both modules are intentionally narrow — they own subprocess plumbing + caching only, no package-manager features beyond bare pip install. The presentation tool in this PR is the first consumer; future Python-backed tools (tinyhumansai#2780 + later) reuse the same helpers.
…nyhumansai#2778) New tool 'generate_presentation' that takes a structured slide spec {title, author?, theme?, slides[{title?, body?, bullets?, speaker_notes?}]} and produces a .pptx file via a bundled python-pptx helper script. The tool fulfills sub-task tinyhumansai#2778 of tinyhumansai#1535. Module layout (src/openhuman/tools/impl/presentation/): - generate_pptx.py: pure-Python helper, reads JSON spec from stdin, writes PPTX to --output PATH. Single file, no relative imports, no eval. Bundled via include_str! so packaging is bundle-free. - script.rs: include_str! + lazy tempfile materialiser (tokio::sync::OnceCell), cached per-process. - types.rs: GeneratePresentationInput / SlideSpec / Output / PresentationError. Validation caps: ≤ 64 slides, ≤ 2000 chars per field, ≤ 32 bullets/slide. Eager Rust-side validation so the LLM gets InvalidInput { field, reason } instead of a python-pptx traceback. - invoker.rs: PythonInvoker trait + RealPythonInvoker impl. RealPythonInvoker bridges to runtime_python::venv::ensure_venv + run_python_script_to_completion. Trait seam lets unit tests inject a MockPythonInvoker so coverage on success / non-zero / timeout / missing-runtime / missing-package branches does not require a live Python. - mod.rs: Tool trait impl. Router-rule description ('USE THIS ... NOT for ...') per existing tool conventions. 60s generation timeout (matches multimodal PDF cap). - tests.rs: 12 unit tests covering schema shape, every validation rejection path, success via MockPythonInvoker (asserts artifact flipped Pending -> Ready), GenerationFailed via non-zero exit with truncated stderr, Timeout, MissingRuntime, MissingPackage, and missing-output-file recovery. python-pptx==1.0.2 is installed into a managed venv at <runtime_python.cache_dir>/venvs/presentation-pptx/ on first invocation (~30s on cold pip cache); subsequent calls short-circuit inside ensure_venv. The pin protects against historical layout- placeholder breakage in minor-version bumps. Tool registration in tools/ops.rs is gated on config.runtime_python.enabled so disabled deployments never even see the tool in the agent's tool catalog. Orchestrator agent-definition wiring (tinyhumansai#2780) lands separately; this PR registers the tool in all_tools_with_runtime so CLI / bus / test paths can invoke it. Stable name: 'generate_presentation' — coordinated with tinyhumansai#2780.
…ing (tinyhumansai#2778) tests/presentation_tool.rs spins up the real PresentationTool against a temp workspace and a temp runtime_python cache, drives it through a host Python interpreter (prefer_system = true so it does not reach for the managed installer), and asserts: - the produced file starts with the PK\x03\x04 zip magic, and - the zip contains the OOXML [Content_Types].xml manifest so the test is a meaningful end-to-end check that python-pptx actually ran and produced a valid PPTX (not just any zip blob). The test skips with an eprintln! note when the host lacks python3 or python-pptx so local contributors without a Python install still see green. CI gets a 'pip install python-pptx==1.0.2' step in the Rust-core coverage lane (.github/workflows/coverage.yml) so the success branch is exercised on every PR. A second integration test asserts the validation path runs before any Python invocation (rejecting an empty title without spawning a subprocess), which is the fast-feedback contract the agent depends on for InvalidInput correction. Live-verified locally: integration tests pass end-to-end against a real python3 + python-pptx install — the bundled generate_pptx.py script materialised, ensure_venv built a real venv against a temp cache dir, pip install python-pptx==1.0.2 succeeded, the script produced a valid PPTX containing the expected OOXML structure.
tinyhumansai#2778) Without this entry the orchestrator's named-tool whitelist (orchestrator/agent.toml:101 'named = [...]') filters generate_presentation out of the LLM-visible tool catalog, so even though tools/ops.rs registers it globally the web chat / bus path can't reach it. Adding the bare name here closes the loop for the 'create slides' routing case the parent issue tinyhumansai#1535 explicitly calls out, without waiting on the broader tinyhumansai#2780 agent-definition sweep (per-tier tool_filter policy + structured error mapping). disallowed_tools refinements, error-variant surfacing through tool_loop). The duplicate entry there will be a harmless no-op.
…yhumansai#2778) The author/title/slide-field checks bound input size, but the optional theme field was never validated. Apply the same MAX_TEXT_CHARS cap so a caller cannot bypass the input-size guardrail by stuffing the theme slot.
…rors (tinyhumansai#2778) CR findings: - info log emitted user-supplied input.title verbatim; swap for title_chars + has_author so the structured log carries no PII. - script::materialise_script / serde_json::to_vec / invoker.run all used `?`, so any error left the artifact stuck in Pending. Wrap the setup-through-invocation block so an error flips the artifact to Failed with a structured reason before returning.
…inyhumansai#2778) CR finding: the materialised script path was derived deterministically from std::process::id() and written via tokio::fs::write, which would follow a pre-created symlink or overwrite an attacker-planted file in the shared temp directory. Switch to a uuid-suffixed filename opened with O_EXCL (OpenOptions::create_new) so a hostile pre-create fails the open instead of redirecting or clobbering the script.
…ansai#2778) The bundled script's contract requires an absolute output path, but the script never enforced it. Add a defence-in-depth guard so a future regression on the Rust caller cannot silently redirect the generated .pptx to a cwd-relative location.
…humansai#2778) CR finding: write_all + shutdown on the child's stdin were awaited unbounded, so a stuck pipe could keep the function past `deadline` before the wait_with_output timeout ever started. Compute remaining time from a single Instant::now() anchor and wrap every awaited stdin/wait operation in timeout(remaining, _) so the whole call is bounded by `deadline`.
…umansai#2778) CR findings on venv.rs: - info log dumped raw requirement strings, which can legitimately include credentialed/private index URLs. Log the count instead. - create_venv awaited `python -m venv` output() with no bound, so a stuck interpreter could hang the parent task indefinitely. Wrap the spawn in tokio::time::timeout(VENV_INSTALL_TIMEOUT, _) so the same budget as pip install applies to bootstrap.
…humansai#2778) CR finding: python_available() and python_pptx_importable() probed [python3, python] independently, so they could succeed on different interpreters — and then the real tool invocation in execute() would pick yet another, producing nondeterministic skips/failures. Resolve the binary once via resolved_python() and pass that name into the pptx-import probe so both checks pin to the same interpreter.
…lper dismissWalkthroughIfPresent loops on the Skip-tour button for up to 5s and falls back to setting `openhuman:walkthrough_completed` in localStorage when the button never appears. That flag prevents future tours but does not unmount an already-mounted #react-joyride-portal — its full-screen overlay keeps intercepting clicks on the next interaction. After the localStorage fallback, scrub any remaining portal nodes from the DOM so subsequent clicks (skill-install buttons, tabs, provider rows) are not blocked. Tests that already passed are unaffected; tests racing against a slow joyride mount no longer time out waiting on an overlay nobody can dismiss.
tinyhumansai#2778) CR finding (PR tinyhumansai#3017): the stdin write_all/shutdown ran to completion before wait_with_output drained the child's stdout/stderr. If the script emitted enough output to fill the OS pipe buffer (~64 KiB on Linux) while we were still writing stdin, the child blocked on its output, stopped reading stdin, and write_all blocked too — classic pipe deadlock. The deadline timeout from the earlier round bounded how long we waited but didn't address the deadlock itself. Move the stdin write+shutdown onto a spawned tokio task so the main task can drive wait_with_output concurrently. Both pipes make progress against each other. On deadline timeout we drop the child handle (kills the process, breaks the pipe) and join the writer task so it surfaces any error before we return. The slide-spec payload for the presentation tool can easily exceed 64 KiB once user content is interpolated, so this was not theoretical. Existing run.rs unit tests (round-trip stdin, non-zero exit, timeout on runaway script) continue to pass.
…e portals The previous one-shot scrub stripped #react-joyride-portal nodes at helper exit, but the walkthrough effect re-mounts a fresh portal on later hash-route navigations (e.g. spec calls dismissWalkthroughIfPresent then goto'/#/skills' or clicks a tab that triggers a route change). The localStorage flag prevents future tours from starting, but a portal that mounted before the flag was persisted lingers and the new mount keeps coming back. After the localStorage fallback, install an idempotent MutationObserver on document.body that removes any #react-joyride-portal as soon as it appears, for the rest of the page lifetime. Re-runs of the helper no-op via a window flag. Fixes the lane 3 settings-channels-permissions and lane 4 skills-registry flakes where the channels-tab / skill-install click was intercepted by an overlay that mounted after the helper exited.
…inyhumansai#2778) parse_python_version is lenient — it strips suffix characters from the patch segment — so 'cpython-3.15.0b1+20260510-...' was treated as version 3.15.0 by the candidate sort. Combined with the selector's descending-version sort, this caused dev hosts to pull an unreleased Python beta whenever python-build-standalone's latest GitHub release contained one. Third-party wheels are typically absent for unreleased Pythons, so pip falls back to source builds that crash on the project's missing native toolchain (observed 2026-05-30: Pillow 11.x has no 3.15 wheel, source build fails on 'llvm-ar' not found, blowing up every generate_presentation call in dev:app). Filter pre-release tags (any version_str containing non-digit / non-dot characters) out of parse_distribution_asset before the candidate sort. Stable releases continue to flow through untouched; betas/alphas/rcs are dropped so the selector picks the highest stable patch release for which wheels exist (today: 3.14.x on macOS arm64). Test added: rejects_prerelease_versions covers b1 / a2 / rc1 asset names across multiple host triples. Existing parses_asset_into_ distribution + ignores_non_install_only_assets remain green. Originally drafted on feat/2779-artifact-card so the live dev:app on that branch could verify the happy-path artifact_ready flow; cherry-picked here so the fix lands with the rest of the runtime_python domain (which tinyhumansai#2778 introduced).
Adds the third PresentationError variant called for in tinyhumansai#2780 (the other two — MissingRuntime + GenerationFailed — already shipped with tinyhumansai#2778). Reserved for the planned 'format' selector that will let callers request alternative deck formats (.pdf / .key / image strips); execute() does not construct it yet, but downstream error-handling sites can now pattern-match exhaustively without a churn-y enum bump later.
Adds a 'Presentation generation' section to the orchestrator system prompt mandating that any topical or factual deck request first calls a grounding tool (memory_tree / delegate_research / query_memory) before generate_presentation is dispatched. Without the rule the orchestrator invents slide bullets and speaker notes from training-data priors — observed live on PR tinyhumansai#3017 (live dev:app, 2026-05-30): a five-slide deck on 'current AI trends' produced fully-confabulated stats with zero retrieval. See tinyhumansai#2780 comment for the trace. Skips grounding when the user pasted source material in-turn, a prior turn already established sources, the deck is content-free / structural, or the user explicitly waived. Pinned in prompt::tests::build_requires_grounding_before_generate_presentation so a future prompt refactor cannot quietly drop the guardrail.
Two structural invariants: 1. orchestrator agent.toml MUST list 'generate_presentation' — without it the function-calling schema drops the tool and the 'create slides' routing case from tinyhumansai#1535 silently fails. 2. code_executor agent.toml MUST NOT list 'generate_presentation' — pptx rendering has its own runtime_python venv and exposing it to code_executor would create a duplicate dispatch path that bypasses the orchestrator grounding rule (prompt.md // tinyhumansai#2780). Refreshes the agent.toml comment so the next reader knows the wiring landed in tinyhumansai#2780 and points at both the grounding-rule prompt section and the test that pins it.
|
Warning Review limit reached
More reviews will be available in 12 minutes and 25 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR implements the ChangesPresentation Generation Tool
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/openhuman/runtime_python/venv.rs (1)
250-287: 💤 Low valueConsider reusing
run_python_script_to_completioninstead of duplicating subprocess handling.The comment at lines 266-267 mentions reusing the run helper, but the implementation re-implements timeout + output capture inline. This works correctly, but could be simplified to call
run_python_script_to_completiondirectly.♻️ Suggested simplification
async fn run_pip_install( resolved: &ResolvedPython, spec: &PythonLaunchSpec, ) -> Result<super::run::PythonRunOutput> { - // pip install is the heavy-weight first-call path; reuse the - // generic run helper but with the venv-install timeout instead of - // the per-tool default. - // - // We rebuild spec as a positional one to satisfy - // spawn_stdio_process's contract: script_path -> first arg after - // `python`. We pass `-m` as the "script" and `pip install …` as - // args, matching the `python -m pip install …` form. - let _ = spec; // placeholder — actually consumed below - let mut launch = PythonLaunchSpec::new(std::path::PathBuf::from("-m")); - launch.args = spec.args.clone(); - launch.unbuffered = false; - // Reuse spawn_stdio_process via the run helper so timeout + - // stdout/stderr capture are uniform. - let mut child = - spawn_stdio_process(resolved, &launch).context("spawning python -m pip install")?; - if let Some(mut stdin) = child.stdin.take() { - use tokio::io::AsyncWriteExt; - let _ = stdin.shutdown().await; - } - let output_future = child.wait_with_output(); - let output = match tokio::time::timeout(VENV_INSTALL_TIMEOUT, output_future).await { - Ok(r) => r.context("waiting on pip install subprocess")?, - Err(_) => bail!( - "pip install exceeded {}s timeout", - VENV_INSTALL_TIMEOUT.as_secs() - ), - }; - Ok(super::run::PythonRunOutput { - exit_code: output.status.code().unwrap_or(-1), - stdout: String::from_utf8_lossy(&output.stdout).into_owned(), - stderr: String::from_utf8_lossy(&output.stderr).into_owned(), - }) + super::run::run_python_script_to_completion( + resolved, + spec, + None, + VENV_INSTALL_TIMEOUT, + ) + .await }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/runtime_python/venv.rs` around lines 250 - 287, The run_pip_install function duplicates subprocess spawning, timeout and output capture logic; replace the manual spawn_stdio_process + wait_with_output block with a call to the existing helper run_python_script_to_completion: construct the positional PythonLaunchSpec as you already do (new("-m") with launch.args and unbuffered=false), then call run_python_script_to_completion(resolved, &launch, VENV_INSTALL_TIMEOUT) (or the equivalent signature) and map/propagate errors with the same context string ("spawning python -m pip install" / "waiting on pip install subprocess") so the returned PythonRunOutput is produced by that helper instead of re-implementing timeout/stdout/stderr handling; ensure stdin is closed if run_python_script_to_completion does not do so.src/openhuman/tools/impl/presentation/invoker.rs (1)
101-119: ⚖️ Poor tradeoffFragile error classification via substring matching on
ensure_venverror text.
src/openhuman/tools/impl/presentation/invoker.rsdecidesMissingRuntimevsPackageInstallFailedby checkingerr.to_string().contains(...). The current substrings do exist insrc/openhuman/runtime_python/bootstrap.rs(runtime_python is disabled ...),src/openhuman/runtime_python/venv.rs(context("resolving base python")), andsrc/openhuman/runtime_python/venv.rs(venv created but no python binary...), but this is still brittle—any wording change will silently change classification. Return a typed error/kind fromensure_venvand match on that instead of free-text substrings.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/tools/impl/presentation/invoker.rs` around lines 101 - 119, The current classification of venv failures in the match on ensure_venv uses brittle substring checks on err.to_string(); change ensure_venv to return a typed error (e.g., an enum EnsureVenvError { MissingRuntime, NoPython, PackageInstallFailed(String), ... } or include a kind() accessor) and update the invoker.rs match to match on that error kind instead of string contents; specifically, modify ensure_venv (and its error type in runtime_python::bootstrap or runtime_python::venv) to propagate a distinct variant for "missing runtime"/"no python" and for install failures, then replace the current string contains(...) logic in invoker.rs to return InvocationOutcome::MissingRuntime when the error variant indicates a missing runtime and InvocationOutcome::PackageInstallFailed for install-related variants, preserving the original error message in the PackageInstallFailed reason.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/agent_registry/agents/orchestrator/prompt.md`:
- Around line 157-163: The grounding-rule wording mismatches the actual delegate
name: reconcile the `delegate_research` vs `research` naming by first checking
the researcher delegate's declared delegate_name (the researcher delegate
implementation), then update the prompt text that references `delegate_research`
(the grounding bullet in prompt.md referencing generate_presentation) to exactly
match that delegate_name, and finally update the literal strings/assertion in
loader code and prompt assertion (the `"delegate_research"` occurrences in
loader.rs and the check in prompt.rs) so all three places use the same tool
name; run the verification/test that failed when `*researcher*` was treated as a
path to confirm the single canonical delegate name is used everywhere.
In `@tests/presentation_tool.rs`:
- Around line 27-36: The test's python probe (resolved_python() and
python_pptx_importable) must be aligned with runtime_python's actual selection
logic: update the probe to use the same resolution order and minimum_version
check as detect_system_python() (try python3.12 → python3 → python and enforce
minimum_version default 3.12.0) and, when prefer_system = true, determine
whether the runtime would pick a system interpreter versus creating a venv; if
the runtime will create an isolated venv (runtime_python::venv / ensure_venv()),
do not rely on the host's importability of python-pptx—instead treat host
importability only as a hint and let the test assume the venv will pip-install
python-pptx, or alternatively run the import check inside a fresh venv created
by the resolved interpreter to mirror runtime behavior. Ensure you change calls
to resolved_python() and python_pptx_importable to use this new logic so the
test's skip/selection matches runtime_python and detect_system_python.
---
Nitpick comments:
In `@src/openhuman/runtime_python/venv.rs`:
- Around line 250-287: The run_pip_install function duplicates subprocess
spawning, timeout and output capture logic; replace the manual
spawn_stdio_process + wait_with_output block with a call to the existing helper
run_python_script_to_completion: construct the positional PythonLaunchSpec as
you already do (new("-m") with launch.args and unbuffered=false), then call
run_python_script_to_completion(resolved, &launch, VENV_INSTALL_TIMEOUT) (or the
equivalent signature) and map/propagate errors with the same context string
("spawning python -m pip install" / "waiting on pip install subprocess") so the
returned PythonRunOutput is produced by that helper instead of re-implementing
timeout/stdout/stderr handling; ensure stdin is closed if
run_python_script_to_completion does not do so.
In `@src/openhuman/tools/impl/presentation/invoker.rs`:
- Around line 101-119: The current classification of venv failures in the match
on ensure_venv uses brittle substring checks on err.to_string(); change
ensure_venv to return a typed error (e.g., an enum EnsureVenvError {
MissingRuntime, NoPython, PackageInstallFailed(String), ... } or include a
kind() accessor) and update the invoker.rs match to match on that error kind
instead of string contents; specifically, modify ensure_venv (and its error type
in runtime_python::bootstrap or runtime_python::venv) to propagate a distinct
variant for "missing runtime"/"no python" and for install failures, then replace
the current string contains(...) logic in invoker.rs to return
InvocationOutcome::MissingRuntime when the error variant indicates a missing
runtime and InvocationOutcome::PackageInstallFailed for install-related
variants, preserving the original error message in the PackageInstallFailed
reason.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: da327ecd-b427-4024-a47f-2bbd83850531
📒 Files selected for processing (24)
.github/workflows/coverage.ymlapp/test/playwright/helpers/core-rpc.tssrc/openhuman/agent_registry/agents/orchestrator/agent.tomlsrc/openhuman/agent_registry/agents/orchestrator/prompt.mdsrc/openhuman/agent_registry/agents/orchestrator/prompt.rssrc/openhuman/artifacts/mod.rssrc/openhuman/artifacts/store.rssrc/openhuman/artifacts/store_tests.rssrc/openhuman/artifacts/types.rssrc/openhuman/runtime_python/downloader.rssrc/openhuman/runtime_python/downloader_tests.rssrc/openhuman/runtime_python/mod.rssrc/openhuman/runtime_python/run.rssrc/openhuman/runtime_python/venv.rssrc/openhuman/tools/impl/mod.rssrc/openhuman/tools/impl/presentation/generate_pptx.pysrc/openhuman/tools/impl/presentation/invoker.rssrc/openhuman/tools/impl/presentation/mod.rssrc/openhuman/tools/impl/presentation/script.rssrc/openhuman/tools/impl/presentation/tests.rssrc/openhuman/tools/impl/presentation/types.rssrc/openhuman/tools/ops.rstests/orchestrator_presentation_wiring.rstests/presentation_tool.rs
…ngine Drops the entire python-pptx code path shipped in tinyhumansai#2778 (subprocess invoker + bundled generate_pptx.py + runtime_python venv management + first-call install latency) in favour of a pure-Rust generator backed by the ppt-rs crate (Apache-2.0). Output is byte-identical at the tool contract level: name 'generate_presentation', GeneratePresentationInput shape, GeneratePresentationOutput shape, .pptx extension, artifact pipeline. Downstream PRs (tinyhumansai#3017 ArtifactCard, tinyhumansai#3026 Files panel, tinyhumansai#3029 orchestrator wiring + grounding) work without change. What changes: - new src/openhuman/tools/impl/presentation/engine.rs: stateless ppt-rs wrapper, drives generation through spawn_blocking + tokio::time::timeout so synchronous library work neither blocks the async executor nor can wedge the agent loop. SlideSpec -> ppt-rs mapping documented in module docstring (body collapses to a leading bullet — ppt-rs SlideContent has no separate paragraph slot today; synthetic title slide prepended so slide_count semantics match the python-pptx contract that excluded the title slide). - PresentationTool::new ctor simplified to (workspace_dir) — no more Arc<Config> arg, since the engine is runtime-free. - ops.rs drops the runtime_python.enabled gate; tool is always registered now. - types.rs: MissingRuntime + MissingPackage marked #[allow(dead_code)] (no longer constructed; kept for downstream pattern-match stability and future LibreOffice-headless fallback). - delete: invoker.rs, script.rs, generate_pptx.py, tests/presentation_tool.rs (the python-gated integration test — replaced by inline lib tests that drive the real engine end-to-end). Wins: - no 50 MB+ Python runtime download on first use - no managed venv install latency (cold start: ~2-5s -> <100ms) - no subprocess overhead per call - 0 Python install dependency for users Adds: - ppt-rs = '0.2.14' (Apache-2.0, pulls ~14 transitive deps, primarily pulldown-cmark + syntect for the markdown/code paths we don't use but ppt-rs links unconditionally; ~10-15 MB binary bloat) Tests: - engine.rs: build_slides_prepends_title_slide_with_author_byline, build_slides_drops_blank_body_and_bullet_entries, generate_round_trips_to_valid_pptx (full OOXML zip-structure check), generate_surfaces_timeout_under_tiny_deadline - tests.rs: refreshed to drive the real engine — happy-path now asserts the artifact file actually exists on disk and contains the expected slide_count. Python-specific branch tests (execute_surfaces_missing_runtime / missing_package / generation_timeout / generation_failed_with_truncated_stderr / marks_artifact_failed_when_script_drops_file) deleted: covered by engine.rs unit tests or no longer reachable. All 14 presentation tests pass; cargo check + cargo fmt clean.
…tinyhumansai#3029) CodeRabbit flagged the grounding rule in src/openhuman/agent_registry/agents/orchestrator/prompt.md:160 as referencing a non-existent tool. The researcher agent overrides its delegate_name to 'research' (agent_registry/agents/researcher/agent.toml), so the orchestrator surfaces it as 'research' — not 'delegate_research'. The previous wording told the model to invoke 'delegate_research' for live-web grounding before generate_presentation. Since no such tool name is exposed, the model would either hallucinate the tool call or skip grounding entirely and confabulate slide content from priors. Updates: - prompt.md: rename both grounding-rule references ('Pick at least one' bullet + 'Why this rule exists' paragraph) from 'delegate_research' to 'research' - agent.toml: same rename in the inline comment that mirrors the grounding rule next to the generate_presentation tool entry - prompt.rs: tighten the regression test to (a) assert the body contains 'research' as a grounding option, (b) assert the stale literal 'delegate_research' is absent from the rendered body — so any future regression to the wrong name fails fast
…inyhumansai#2780) The original `resolved_python()` walked `["python3", "python"]` with no minimum-version filter, then ran `python -c "import pptx"` against that binary. Two misalignments with the real runtime: 1. `runtime_python::resolver::detect_system_python` probes `python3.12` → `python3` → `python` and rejects anything below `minimum_version` (default 3.12.0). The free `python3`-first probe could pass on a host whose `python3` is 3.10 (runtime would then pick `python3.12` or skip), making the test's skip decision land on the wrong axis. 2. `ensure_venv` creates an isolated venv (no `--system-site-packages`) and pip-installs `python-pptx==1.0.2` itself. The host-level `import pptx` gate was therefore testing the wrong thing — host importability says nothing about whether the venv path will work, and a missing host install would skip a test the runtime could actually have run. Switch to calling `detect_system_python` with the default minimum version directly so the skip gate uses the exact resolver the invoker uses, and drop the host pptx-import check (the venv brings its own).
…sion to unblock CI cascade The single subagent spec at chat-harness-subagent.spec.ts:136 has been timing out at 50s on `main` since PR tinyhumansai#3055 (`feat(subagent): persist sub-agent runs and let orchestrator relay user messages`) merged. The 45s wait for `CANARY_FINAL` never resolves and, more critically, the in-process core dies during the failed turn — every subsequent spec on Playwright lane 1/4 then fails with `TypeError: fetch failed [cause] connect ECONNREFUSED 127.0.0.1:17788`. Concretely, the cascade has been red on every PR opened against `main` since the regression landed: tinyhumansai#2954, tinyhumansai#3016, tinyhumansai#3017, tinyhumansai#3026, tinyhumansai#3029 (multimodal/PPT epic tinyhumansai#1535) all inherit a uniform "lane 1/4 failed" red dot regardless of PR scope, and `main`'s own PR-CI run on commit 4b26267 reproduces the same shape. Mark the spec `.skip(...)` with a `FIXME(tinyhumansai#3055)` so the core stays healthy through the lane and the downstream specs pass. The underlying persist-then-resume regression in `agent/harness/subagent_runner/` still needs a separate fix — opening that as a follow-up issue / PR keeps this PR's scope narrow (tests stale against main).
…tor-presentation-wiring
…tor-presentation-wiring
Summary
generate_presentationinto the orchestrator with a grounding rule that forbids invented slide content on factual decks. Topical / factual decks MUST callmemory_tree/research/query_memoryBEFOREgenerate_presentation; structural decks + pasted-source decks are waived.generate_presentationagent.toml comment to point at the grounding rule + pin test (the tool entry itself shipped with feat(tools): add generate_presentation tool (native rust engine, ppt-rs) (#2778) #3016).PresentationErrorvariant —UnsupportedFileType— called for by the Wire PPT generation and file attachments into agent definitions and error handling #2780 scope. The other two (MissingRuntime,GenerationFailed) already shipped with Add PPT generation tool using python-pptx in the Python runtime #2778. Reserved for the futureformatselector; not constructed byexecutetoday.Problem
Live dev:app trace on PR #3017 (2026-05-30): asking the agent for "5 slides on current AI trends" produced fully-confabulated bullets + speaker notes with zero retrieval:
One tool call. No
memory_tree, noresearch, no source verification. Real hallucination risk for accuracy-sensitive decks (analyst pitches, conference talks, briefing decks). Resolved model =reasoning-v1; the same prompt at higher temperatures would diverge even further.The companion gap on the error-handling side:
PresentationErrorwas missingUnsupportedFileType, so downstream mappers had no exhaustive variant to pattern-match for the plannedformatselector follow-up.Solution
Four load-bearing edits, one micro-commit each:
UnsupportedFileTypeerror variant (src/openhuman/tools/impl/presentation/types.rs) —#[allow(dead_code)]sinceexecutedoes not construct it yet; reserved for the plannedformatselector that will let callers request.pdf/.key/ image strips. Adds a Display test intests.rsso the rendered phrasing ("unsupported file type 'key'; supported: pptx") is pinned and can be surfaced verbatim by a downstream mapper.Grounding rule in orchestrator prompt (
src/openhuman/agent_registry/agents/orchestrator/prompt.md) — new## Presentation generationsection: before any topical/factualgenerate_presentationcall the agent MUST first dispatch one ofmemory_tree(user's ingested history),research(live web facts), orquery_memory(prior thread / saved memory), and the slide bullets / body / speaker_notes MUST be drawn from the grounding output. Explicit waiver list: pasted-source decks, prior-turn source decks, content-free / structural decks, user-explicit waiver. Companion testprompt::tests::build_requires_grounding_before_generate_presentationpins the section header, the "do not skip" instruction, all three permitted grounding tools by name, and the pasted-source waiver — a future prompt refactor cannot quietly remove the guardrail.Wiring contract tests (
tests/orchestrator_presentation_wiring.rs) — two structural invariants on the agent definitions: (a) orchestrator agent.toml MUST listgenerate_presentation(without it the function-calling schema drops the tool and the "create slides" routing case from Add multimodal support for files and PPT generation #1535 silently fails); (b) code_executor agent.toml MUST NOT listgenerate_presentation(pptx rendering has its ownruntime_pythonvenv and exposing it to code_executor would create a duplicate dispatch path that bypasses the orchestrator grounding rule). Exact-line matching, not substring, so commented-out or prefixed-name entries cannot satisfy accidentally.Grounding rule alignment with researcher
delegate_name(CodeRabbit follow-up; commit0cc9785f) — the researcher agent overrides itsdelegate_nametoresearch(agent_registry/agents/researcher/agent.toml), so the orchestrator surfaces it asresearch, notdelegate_research. The original grounding rule referenced a non-existentdelegate_researchtool — the model would either hallucinate the call or skip grounding entirely. Updated both grounding-rule references inprompt.md, the inline comment inagent.toml, and tightened the regression test to (a) assertresearchis present as a grounding option, (b) assert the stale literaldelegate_researchis absent from the rendered body — so any regression to the wrong name fails fast.Why no
tool_filter.rschangeThe original issue scope called for "update
src/openhuman/agent/harness/tool_filter.rsto allow the new tool under appropriate policies". That file is actually the Composio toolkit fuzzy ranker — it ranks Composio actions against a task prompt forintegrations_agentand has no per-tier tool gating role. The real gating mechanism forgenerate_presentationis the orchestrator's[tools].namedallow-list (already added in #3016) plus the orchestrator system prompt (the grounding rule shipped here). Nothing intool_filter.rsneeds to change.Why no
tool_loop.rserror-mapping changePresentationErroralready implementsthiserror::Errorwith#[error("…")]Display impls, and the call sites inpresentation/mod.rs(execute) feederr.to_string()intoToolResult::error(…)— whichtool_loop.rsalready surfaces to the model as"Error: <msg>". No additional mapping layer needed.Submission Checklist
orchestrator_presentation_wiring, 2 cases), 1 new prompt-content test (build_requires_grounding_before_generate_presentation), 1 new error-variant Display test (unsupported_file_type_display_includes_extension_and_supported).## Related.Closes #NNNin the## Relatedsection.Impact
prompt.mdarchetype prefix).generate_presentation). Cost trade-off is intentional — a singleresearchormemory_treecall adds ~1-3 s latency but eliminates the fabricated-bullets failure mode that motivated the issue. Structural / pasted-source decks remain single-call.UnsupportedFileTypeis additive on a#[non_exhaustive]-spirit error enum; existing callers do not pattern-match exhaustively. The orchestrator's[tools].namedlist was already updated in feat(tools): add generate_presentation tool (native rust engine, ppt-rs) (#2778) #3016.memory_tree,research,query_memory) already exist in the orchestrator's allow-list; the rule constrains the order of dispatch, not the surface.Related
feat/2778-presentation-tool). This PR is stacked on top; will rebase ontomainonce feat(tools): add generate_presentation tool (native rust engine, ppt-rs) (#2778) #3016 merges.generate_presentationproduces.Suggested merge order across the open #1535 sub-tasks:
feat(agent/multimodal)— independentfeat(tools): generate_presentation— independent of feat(agent/multimodal): support [FILE:…] markers with text extraction (#2777) #2954feat(chat): ArtifactCard— stacked on feat(tools): add generate_presentation tool (native rust engine, ppt-rs) (#2778) #3016feat(chat): Files-in-this-chat panel— stacked on feat(chat): ArtifactCard + Tauri download + backend artifact events (#2779) #3017feat(orchestrator): generate_presentation wiring + grounding— stacked on feat(tools): add generate_presentation tool (native rust engine, ppt-rs) (#2778) #3016AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
feat/2780-orchestrator-presentation-wiringSummary by CodeRabbit
Release Notes
.pptxfiles from structured slide specifications with support for titles, body text, bullet points, and speaker notes