diff --git a/docs/developer/ray-integration.md b/docs/developer/ray-integration.md
index d850966c..7e05199f 100644
--- a/docs/developer/ray-integration.md
+++ b/docs/developer/ray-integration.md
@@ -4,7 +4,7 @@
 
 When `roar run` executes a Ray workload, the driver process patches Ray startup/shutdown and injects worker instrumentation through Ray `runtime_env`. That worker instrumentation emits per-task I/O fragments (local file + S3 touches with task metadata) to a detached in-cluster collector actor.
 
-At shutdown, the driver gathers collected fragments/events, merges optional proxy/node-agent logs, deduplicates artifacts, reconstructs task ordering from read/write dependencies, and writes Ray task lineage into `ROAR_PROJECT_DIR/.roar/roar.db`.
+At shutdown, the driver flushes the collector actor's fragment streamer. After `ray job submit` returns, `FragmentReconstituter` fetches/decrypts fragment batches from GLaaS and writes Ray task lineage into `ROAR_PROJECT_DIR/.roar/roar.db`.
 
 ## 2. Architecture overview
 
@@ -37,12 +37,9 @@ flowchart TD
 
 - Patching gate: `tracking_import` patches Ray only when `ROAR_WRAP=1` and `ray` is imported.
 - `ray.init` patch (`_patch_ray_init`):
-  - Loads `[ray]` config (`enabled`, `log_dir`, explicit `pip_install`).
+  - Loads `[ray]` config (`enabled`, explicit `pip_install`).
   - Builds `runtime_env` and `env_vars`.
   - Injects worker env vars:
-    - `ROAR_WORKER=1`
-    - `ROAR_LOG_DIR=<configured ray.log_dir or ROAR_LOG_DIR>`
-    - `ROAR_LOG_BACKEND=actor`
     - `ROAR_JOB_ID=<generated or supplied>`
     - `ROAR_DRIVER_JOB_UID=<driver ROAR_JOB_ID>`
     - selected AWS vars passed through when present.
@@ -57,8 +54,8 @@ flowchart TD
     - Sets `runtime_env["worker_process_setup_hook"] = "roar.ray.roar_worker._startup"` and mirrors it via internal env var.
   - Sanitizes reserved setup-hook env key for Ray versions that reject manual export (`_sanitize_worker_runtime_env_for_ray`).
   - After real `ray.init`, registers pre-shutdown collection and ensures `RoarLogCollectorActor` exists.
-- `ray.shutdown` patch (`_patch_ray_shutdown`): collects Ray I/O first (`_collect_ray_io`), then calls real shutdown.
-- `_collect_ray_io`: calls `roar.ray.collector.collect(project_dir, log_dir, proxy_logs)` when `ROAR_WRAP=1`.
+- `ray.shutdown` patch (`_patch_ray_shutdown`): flushes collector actor fragments first (`_collect_ray_io`), then calls real shutdown.
+- `_collect_ray_io`: looks up `roar-log-collector-<job_id>`, calls `flush_to_glaas()`, then kills the detached actor.
 
 ### b. `roar-worker` entrypoint (`roar/ray/roar_worker.py`)
 
@@ -76,30 +73,22 @@ flowchart TD
 - Fragment emission:
   - `TaskFragment` includes Ray IDs, function name, timing, exit code, and `reads`/`writes` of `ArtifactRef`.
   - Local write hashing is streaming (`blake3` if installed, otherwise `sha256`) via `_TrackedWriteFile`.
-  - Local path capture is restricted to `/shared/...` (`_should_track_local_path`).
+- Local path capture excludes pseudo-filesystems (`/proc`, `/sys`, `/dev`).
   - S3 refs use `hash_algorithm="etag"` and size where available.
   - `_emit_fragment()` sends snapshots to `RoarLogCollectorActor.append_fragment.remote(fragment.to_dict())`.
 
 ### c. `RoarLogCollectorActor` (`roar/ray/actor.py`)
 
 - Detached, named actor (`roar-log-collector-<ROAR_JOB_ID>`, namespace `roar`).
-- Aggregation point for worker payloads:
-  - `append_fragment` / `get_all_fragments` for fragment snapshots.
-  - `append_batch` / `get_all` for event batches.
+- Thin pass-through to `GlaasFragmentStreamer`:
+  - `append_fragment` forwards fragments into the encrypted GLaaS batch stream.
+  - `flush_to_glaas` flushes any buffered batches.
 
 ### d. `collector.py`
 
-- Driver-side shutdown collector.
-- Collection order:
-  - `_collect_actor_payload()` first (events + fragments from named actor).
-  - If only fragments are present, can synthesize events (`_events_from_fragments`) and/or write fragments directly (`collect_fragments`).
-  - Falls back to filesystem logs (`*.jsonl` under `ROAR_LOG_DIR`) when actor data is unavailable.
-  - Merges optional node proxy logs (`_merge_proxy_logs`).
-- Dedup/normalization:
-  - Event path rollup (`_aggregate_paths`) deduplicates by path and tracks read/write direction.
-  - Capture method preference is `python < proxy < tracer`.
-  - Keeps max observed size and best available hash.
-  - Fragment artifact upsert prefers `artifact_hashes (algorithm,digest)`; otherwise latest artifact by path.
+- Fragments-only merge path.
+- `collect_fragments(...)` inserts Ray task child jobs + artifact inputs/outputs from decrypted fragment batches.
+- Artifact identity prefers `artifact_hashes (algorithm,digest)` and falls back to latest artifact by path when no digest exists.
 - Step-number topology (`_assign_step_numbers`):
   - Collapses incremental snapshots by `job_uid`.
   - Builds DAG from artifact hash dependencies (producer writes hash, consumer reads hash).
@@ -115,7 +104,7 @@ flowchart TD
 
 ### f. `worker.py`
 
-- Complementary worker setup hook (`setup()`) for event-style logging backends (`actor` or `filesystem`).
+- Legacy compatibility worker setup hook (`setup()`) that forwards event payloads to the collector actor only.
 - Extends coverage beyond `open()` with optional SDK/data patches:
   - boto3 S3 ops
   - pandas parquet writes
@@ -165,18 +154,13 @@ sequenceDiagram
 | `ROAR_WRAP` | env var | driver (`roar run`) | Enables Ray monkey-patching in `sitecustomize.py` when set to `1`. |
 | `ROAR_JOB_ID` | env var | driver + worker env | Ray integration job ID; used in actor naming and task UID derivation. |
 | `ROAR_PROJECT_DIR` | env var | driver | Determines where collector writes (`<project>/.roar/roar.db`) and config lookup start dir. |
-| `ROAR_LOG_DIR` | env var | driver + worker env | Log directory for filesystem fallback; default `/shared/.roar-logs`. |
-| `ROAR_LOG_BACKEND` | env var | worker env | Backend hint (`actor`/`filesystem`); Ray patch sets `actor` for workers. |
-| `ROAR_WORKER` | env var | worker env | Marker that process is a Ray worker under roar instrumentation. |
 | `ROAR_DRIVER_JOB_UID` | env var | worker env | Parent driver job UID stored in Ray task fragments/jobs. |
 | `ray.enabled` | `roar.toml` | `[ray]` | Turns Ray runtime_env injection on/off. |
-| `ray.log_dir` | `roar.toml` | `[ray]` | Default worker log directory for Ray collection fallback. |
 | `ray.pip_install` | `roar.toml` | `[ray]` | When explicitly enabled, injects current `roar` requirement into `runtime_env.pip`. |
 | `ray.actor_attribution` | `roar.toml` | `[ray]` | Fragment boundary mode in worker: `per_call` or `per_actor`. |
 
 ## 6. Known limitations / caveats
 
-- `roar_worker` local file capture is limited to `/shared/...` paths.
 - Local read events from `open()` do not include content hashes by default.
 - S3 hash identity uses ETag; multipart/object semantics can make ETag differ from full-content digest.
 - Fragment emission to actor is best-effort; failures are intentionally swallowed to avoid breaking user workloads.
diff --git a/docs/end-user/ray-integration.md b/docs/end-user/ray-integration.md
index 198fb72e..05b9b87c 100644
--- a/docs/end-user/ray-integration.md
+++ b/docs/end-user/ray-integration.md
@@ -52,8 +52,6 @@ User-facing Ray options in `[ray]`:
 
 - `ray.enabled`
   - Turn Ray tracing on or off.
-- `ray.log_dir`
-  - Set the worker log directory used for fallback collection.
 - `ray.actor_attribution`
   - `per_call` (default): attribute by actor method call.
   - `per_actor`: group attribution by actor.
@@ -64,8 +62,6 @@ Helpful environment variables:
   - Enables Ray wrapping (normally set automatically by `roar run`).
 - `ROAR_PROJECT_DIR`
   - Controls where `.roar/roar.db` is created/read.
-- `ROAR_LOG_DIR`
-  - Overrides worker log directory.
 
 ## 7. Viewing results
 
@@ -98,7 +94,6 @@ ORDER BY j.step_number, j.timestamp, io.kind;
 
 ## 8. Known limitations
 
-- Local file capture is strongest for worker-visible shared paths (commonly `/shared`).
 - Some read events may not include full content hashes.
 - S3 identity is ETag-based, which is not always a full-content digest.
 - If cluster/runtime policies block required `runtime_env` behavior, tracing may be partial.
diff --git a/pyproject.toml b/pyproject.toml
index d7863872..22770c01 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "roar-cli"
-version = "0.2.8"
+version = "0.2.9"
 description = "Reproducibility and provenance tracker for ML training pipelines"
 authors = [
     { name="TReqs Team", email="info@treqs.ai" }
@@ -81,6 +81,7 @@ include = [
     { path = "LICENSE", format = "wheel" },
     { path = "roar/bin/*", format = "sdist" },
     { path = "roar/bin/*", format = "wheel" },
+    { path = "roar_inject.pth", format = "wheel" },
 ]
 
 [tool.pytest.ini_options]
@@ -97,6 +98,8 @@ markers = [
     "cloud: Tests for cloud storage operations",
     "happy_path: Happy path tests for core functionality",
     "ray_e2e: Ray end-to-end tests requiring a running Docker cluster",
+    "ray_contract: User-facing Ray contract tests using `roar run ray job submit ...`",
+    "ray_diagnostic: Diagnostic Ray tests that intentionally inspect internal runtime details",
 ]
 addopts = "-v --strict-markers -n auto --dist loadfile --ignore=tests/ebpf --ignore=tests/live_glaas --ignore=tests/benchmarks --ignore=tests/integration --ignore=tests/e2e"
 timeout = 60
diff --git a/roar/bin/libroar_tracer_preload.so b/roar/bin/libroar_tracer_preload.so
deleted file mode 100755
index 6eebc055..00000000
Binary files a/roar/bin/libroar_tracer_preload.so and /dev/null differ
diff --git a/roar/cli/commands/_execution.py b/roar/cli/commands/_execution.py
index e1d9e9fa..2a411804 100644
--- a/roar/cli/commands/_execution.py
+++ b/roar/cli/commands/_execution.py
@@ -48,7 +48,7 @@ def validate_git_clean() -> str:
     except (subprocess.CalledProcessError, FileNotFoundError):
         # Ray jobs run from extracted working dirs that are not git repos.
         # Allow execution there while preserving git checks everywhere else.
-        if "RAY_JOB_CONFIG_JSON_ENV_VAR" in os.environ:
+        if "RAY_JOB_ID" in os.environ:
             return cwd
         raise click.ClickException(
             "roar requires the working directory to be inside a git repository."
diff --git a/roar/cli/commands/_ray_job_submit.py b/roar/cli/commands/_ray_job_submit.py
index 79ddeddd..c96d8983 100644
--- a/roar/cli/commands/_ray_job_submit.py
+++ b/roar/cli/commands/_ray_job_submit.py
@@ -10,6 +10,13 @@
 from ...glaas_client import GlaasClient
 from ...ray.fragment_key import generate_fragment_key, save_key
 
+_ROAR_WORKER_PY_EXECUTABLE = "roar-worker"
+_ROAR_WORKER_SETUP_HOOK = "roar.ray.roar_worker._startup"
+_ROAR_DRIVER_ENTRYPOINT_MODULE = "roar.ray.driver_entrypoint"
+_ROAR_JOB_INSTRUMENTED_ENV_VAR = "ROAR_JOB_INSTRUMENTED"
+_ROAR_CLUSTER_GLAAS_URL_ENV = "ROAR_CLUSTER_GLAAS_URL"
+_ROAR_CLUSTER_AWS_ENDPOINT_URL_ENV = "ROAR_CLUSTER_AWS_ENDPOINT_URL"
+
 
 @dataclass(frozen=True)
 class RayJobSubmitRewrite:
@@ -30,6 +37,7 @@ def maybe_rewrite_ray_job_submit(command: list[str]) -> RayJobSubmitRewrite:
     entrypoint = list(command[separator_index + 1 :])
     if not entrypoint:
         return RayJobSubmitRewrite(command=command)
+    entrypoint = _wrap_entrypoint_for_driver_proxy(entrypoint)
 
     runtime_env_json_arg = _find_runtime_env_json(before_separator)
     runtime_env = _load_runtime_env(before_separator, runtime_env_json_arg)
@@ -38,16 +46,54 @@ def maybe_rewrite_ray_job_submit(command: list[str]) -> RayJobSubmitRewrite:
         return RayJobSubmitRewrite(command=command)
 
     merged_pip = _merge_roar_runtime_env_pip(runtime_env.get("pip"))
-    if merged_pip or ("pip" in runtime_env and merged_pip is not None):
+    if merged_pip:
         runtime_env["pip"] = merged_pip
+    else:
+        runtime_env.pop("pip", None)
+
+    # py_executable is intentionally NOT set at job level — it would apply to the
+    # JobSupervisor/driver process which doesn't have roar installed yet (pip runs after
+    # the supervisor starts). worker_process_setup_hook is sufficient: it runs inside
+    # each worker process after the runtime env (and pip) is ready.
+    runtime_env["worker_process_setup_hook"] = _ROAR_WORKER_SETUP_HOOK
 
     env_vars = dict(runtime_env.get("env_vars", {}) or {})
+    env_vars[_ROAR_JOB_INSTRUMENTED_ENV_VAR] = "1"
+    env_vars["ROAR_WRAP"] = "1"
+    env_vars["ROAR_RAY_NODE_AGENTS"] = "1"
+    # Stable job_id shared by driver + workers for node agent name resolution.
+    import uuid as _uuid
+
+    env_vars["ROAR_JOB_ID"] = _uuid.uuid4().hex[:8]
+
+    # Tell the job driver where roar.db lives (CWD inside the Ray job is the
+    # extracted working_dir, not the original project directory).
+    roar_dir = os.environ.get("ROAR_PROJECT_DIR", "")
+    if not roar_dir:
+        # Use CWD — `roar run` is invoked from the project root.
+        roar_dir = os.getcwd()
+    if roar_dir and os.path.isfile(os.path.join(roar_dir, ".roar", "roar.db")):
+        env_vars["ROAR_PROJECT_DIR"] = roar_dir
+
+    # Route S3 traffic through the per-node proxy (port 19191) for I/O capture.
+    # Save the REAL upstream endpoint (not the roar-run local proxy) so the
+    # cluster-side proxy can forward to the actual S3 service.
+    original_endpoint = (
+        os.environ.get("ROAR_UPSTREAM_S3_ENDPOINT") or os.environ.get("AWS_ENDPOINT_URL") or ""
+    )
+    cluster_upstream_endpoint = _resolve_cluster_upstream_s3_endpoint(original_endpoint)
+    if cluster_upstream_endpoint:
+        env_vars["ROAR_UPSTREAM_S3_ENDPOINT"] = cluster_upstream_endpoint
+    env_vars["ROAR_PROXY_PORT"] = "19191"
+    env_vars["AWS_ENDPOINT_URL"] = "http://127.0.0.1:19191"
+
     fragment_session_id: str | None = None
 
     glaas_url = _resolve_glaas_url()
     if glaas_url:
-        env_vars["GLAAS_URL"] = glaas_url
-        env_vars["GLAAS_API_URL"] = glaas_url
+        cluster_glaas_url = _resolve_cluster_glaas_url(glaas_url)
+        if cluster_glaas_url:
+            env_vars["GLAAS_URL"] = cluster_glaas_url
 
         key = generate_fragment_key()
         try:
@@ -67,7 +113,6 @@ def maybe_rewrite_ray_job_submit(command: list[str]) -> RayJobSubmitRewrite:
         runtime_env.pop("env_vars", None)
 
     before_separator = _store_runtime_env(before_separator, runtime_env, runtime_env_json_arg)
-    entrypoint = _wrap_entrypoint(entrypoint)
     return RayJobSubmitRewrite(
         command=[*before_separator, "--", *entrypoint],
         session_id=fragment_session_id,
@@ -84,6 +129,17 @@ def _is_ray_job_submit(command: list[str]) -> bool:
     return binary == "ray" and noun in {"job", "jobs"} and verb == "submit"
 
 
+def _wrap_entrypoint_for_driver_proxy(entrypoint: list[str]) -> list[str]:
+    if (
+        len(entrypoint) >= 3
+        and entrypoint[1] == "-m"
+        and entrypoint[2] == _ROAR_DRIVER_ENTRYPOINT_MODULE
+    ):
+        return entrypoint
+
+    return ["python", "-m", _ROAR_DRIVER_ENTRYPOINT_MODULE, "--", *entrypoint]
+
+
 def _find_runtime_env_json(command: list[str]) -> tuple[int, int | None] | None:
     for index, arg in enumerate(command):
         if arg == "--runtime-env-json":
@@ -137,27 +193,23 @@ def _store_runtime_env(
     return command_out
 
 
-def _wrap_entrypoint(entrypoint: list[str]) -> list[str]:
-    if len(entrypoint) >= 2 and Path(entrypoint[0]).name == "roar" and entrypoint[1] == "run":
-        return entrypoint
-    return ["roar", "run", *entrypoint]
-
-
 def _merge_roar_runtime_env_pip(existing_pip: object) -> list[str] | None:
     roar_req = _resolve_roar_requirement()
-    if roar_req is None:
-        # Local dev mode: vendor wheel present means cluster has roar pre-installed.
-        # Skip pip injection — preserve existing pip list unchanged.
-        existing = _coerce_runtime_env_pip(existing_pip)
-        return existing if existing else None
     dependencies = _coerce_runtime_env_pip(existing_pip)
     dependencies = [
         dependency
         for dependency in dependencies
         if _requirement_name(dependency) not in {"roar-cli", "roar"}
+        # Also deduplicate URL-based requirements (e.g. presigned S3 URLs).
+        # _requirement_name() returns the full URL for these, so the name-based
+        # filter above never matches them — without this check the URL would
+        # survive the filter and get appended again, producing duplicates.
+        and dependency.strip() != roar_req.strip()
     ]
-    dependencies.append(roar_req)
-    return dependencies
+    # "skip" means roar is already installed on workers (e.g. Docker image with editable install).
+    if roar_req != "skip":
+        dependencies.append(roar_req)
+    return dependencies if dependencies else None
 
 
 def _coerce_runtime_env_pip(value: object) -> list[str]:
@@ -183,24 +235,23 @@ def _requirement_name(requirement: str) -> str:
     return text.strip().lower()
 
 
-def _resolve_roar_requirement() -> str | None:
-    import os
+def _resolve_roar_requirement() -> str:
+    import importlib.metadata as importlib_metadata
 
-    wheel_path = Path(os.getcwd()) / "vendor" / "roar-cli.whl"
-    if wheel_path.exists():
-        # Local dev mode: vendor wheel exists, cluster has roar pre-installed.
-        # Signal to skip pip injection entirely.
-        return None
+    # Allow overriding the pip requirement — useful for testing unreleased wheels via S3 URL
+    # without a PyPI publish. Set ROAR_CLUSTER_PIP_REQ=https://... in the environment.
+    override = os.environ.get("ROAR_CLUSTER_PIP_REQ", "").strip()
+    if override:
+        return override
 
-    import importlib.metadata as importlib_metadata
+    try:
+        version = importlib_metadata.version("roar-cli")
+        return f"roar-cli=={version}"
+    except importlib_metadata.PackageNotFoundError:
+        pass
+    except Exception:
+        pass
 
-    for package_name in ("roar-cli", "roar"):
-        try:
-            return f"{package_name}=={importlib_metadata.version(package_name)}"
-        except importlib_metadata.PackageNotFoundError:
-            continue
-        except Exception:
-            break
     return "roar-cli"
 
 
@@ -209,10 +260,28 @@ def _resolve_glaas_url() -> str | None:
 
     url = get_glaas_url()
     if not url:
-        return None
+        return "https://api.glaas.ai"
     return str(url)
 
 
+def _resolve_cluster_glaas_url(host_glaas_url: str | None) -> str | None:
+    override = os.environ.get(_ROAR_CLUSTER_GLAAS_URL_ENV, "").strip()
+    if override:
+        return override
+    if not host_glaas_url:
+        return None
+    return str(host_glaas_url)
+
+
+def _resolve_cluster_upstream_s3_endpoint(host_endpoint: str | None) -> str | None:
+    override = os.environ.get(_ROAR_CLUSTER_AWS_ENDPOINT_URL_ENV, "").strip()
+    if override:
+        return override
+    if not host_endpoint:
+        return None
+    return str(host_endpoint)
+
+
 def _register_fragment_session(
     glaas_url: str,
     session_id: str,
diff --git a/roar/cli/commands/init.py b/roar/cli/commands/init.py
index 9e42f226..2927b433 100644
--- a/roar/cli/commands/init.py
+++ b/roar/cli/commands/init.py
@@ -119,8 +119,6 @@
 enabled = true
 # Inject roar-cli into runtime_env.pip for remote workers
 pip_install = true
-# Shared log directory for Ray worker I/O capture
-log_dir = "/shared/.roar-logs"
 # Actor attribution mode for Ray actor methods (per_call | per_actor)
 actor_attribution = "per_call"
 
diff --git a/roar/cli/commands/register.py b/roar/cli/commands/register.py
index fbe4716b..e7e5fb28 100644
--- a/roar/cli/commands/register.py
+++ b/roar/cli/commands/register.py
@@ -1,9 +1,9 @@
 """
 Native Click implementation of the register command.
 
-Usage: roar register [options] <artifact_path>
+Usage: roar register [options] <target>
 
-Registers an artifact and its complete lineage with GLaaS.
+Registers artifact, step, or session lineage with GLaaS.
 """
 
 import click
@@ -25,7 +25,7 @@ def _confirm_secrets(detected_secrets: list[str]) -> bool:
 
 
 @click.command("register")
-@click.argument("artifact_path", type=click.STRING)
+@click.argument("target", type=click.STRING)
 @click.option(
     "--dry-run",
     is_flag=True,
@@ -44,15 +44,15 @@ def _confirm_secrets(detected_secrets: list[str]) -> bool:
 )
 @click.pass_obj
 @require_init
-def register(
-    ctx: RoarContext, artifact_path: str, dry_run: bool, yes: bool, as_blake3: bool
-) -> None:
-    """Register artifact lineage with GLaaS.
+def register(ctx: RoarContext, target: str, dry_run: bool, yes: bool, as_blake3: bool) -> None:
+    """Register lineage with GLaaS.
 
-    Submits the complete lineage of an artifact to the GLaaS server,
-    including all jobs and artifacts in the dependency chain.
+    Submits lineage to the GLaaS server, starting from one of:
+    - an artifact path
+    - a DAG step reference like ``@4``
+    - a local session hash/prefix previously shown by roar
 
-    The ARTIFACT_PATH must be a file that has been tracked by roar run.
+    Artifact paths must refer to files tracked by roar.
 
     If secrets are detected in the data (API keys, tokens, passwords, etc.),
     you will be prompted to confirm. Use --yes to skip the prompt and
@@ -67,6 +67,10 @@ def register(
 
         roar register -y model.pt           # Skip confirmation prompt
 
+        roar register @4                    # Register the lineage for DAG step 4
+
+        roar register 8d7a1f2c...           # Register a whole local session
+
         roar register --as-blake3 model.pt  # Upgrade S3 etag hashes
 
         roar register outputs/metrics.json  # Register from subdirectory
@@ -74,9 +78,9 @@ def register(
     # Create service
     service = RegisterService()
 
-    # Register the artifact lineage
-    result = service.register_artifact_lineage(
-        artifact_path=artifact_path,
+    # Register the requested lineage target
+    result = service.register_lineage_target(
+        target=target,
         roar_dir=ctx.roar_dir,
         cwd=ctx.cwd,
         dry_run=dry_run,
@@ -105,9 +109,10 @@ def register(
         click.echo("")
         click.echo("View on GLaaS:")
         click.echo(f"  Session:  {web_url}/dag/{result.session_hash}")
-        click.echo(f"  Artifact: {web_url}/artifact/{result.artifact_hash}")
+        if result.artifact_hash:
+            click.echo(f"  Artifact: {web_url}/artifact/{result.artifact_hash}")
     else:
-        click.echo(f"Registered lineage for: {artifact_path}")
+        click.echo(f"Registered lineage for: {target}")
         click.echo(f"  Session: {result.session_hash[:12]}...")
         click.echo(f"  Jobs: {result.jobs_registered}")
         click.echo(f"  Artifacts: {result.artifacts_registered}")
@@ -122,12 +127,13 @@ def register(
             for error in result.error.split("; "):
                 click.echo(f"  - {error}", err=True)
 
-        # Print reproduce command
-        click.echo("")
-        click.echo("To reproduce this artifact:")
-        click.echo(f"  roar reproduce {result.artifact_hash}")
+        if result.artifact_hash:
+            click.echo("")
+            click.echo("To reproduce this artifact:")
+            click.echo(f"  roar reproduce {result.artifact_hash}")
 
         click.echo("")
         click.echo("View on GLaaS:")
         click.echo(f"  Session:  {web_url}/dag/{result.session_hash}")
-        click.echo(f"  Artifact: {web_url}/artifact/{result.artifact_hash}")
+        if result.artifact_hash:
+            click.echo(f"  Artifact: {web_url}/artifact/{result.artifact_hash}")
diff --git a/roar/cli/context.py b/roar/cli/context.py
index 4918c254..46847120 100644
--- a/roar/cli/context.py
+++ b/roar/cli/context.py
@@ -7,6 +7,7 @@
 
 from __future__ import annotations
 
+import os
 import sys
 from dataclasses import dataclass, field
 from pathlib import Path
@@ -137,5 +138,12 @@ def is_initialized(self) -> bool:
 
     @property
     def has_repo(self) -> bool:
-        """Check if we're in a git repository."""
-        return self.repo_root is not None
+        """Check if we're in a git repository.
+
+        Returns True when ROAR_JOB_INSTRUMENTED=1 (Ray job driver running
+        from an extracted working_dir that is not a git repo).
+        """
+        if self.repo_root is not None:
+            return True
+        # Ray job drivers run from extracted working dirs without .git
+        return os.environ.get("ROAR_JOB_INSTRUMENTED") == "1"
diff --git a/roar/cli/decorators.py b/roar/cli/decorators.py
index 85ef07cc..642ae41d 100644
--- a/roar/cli/decorators.py
+++ b/roar/cli/decorators.py
@@ -79,7 +79,9 @@ def wrapper(*args: Any, **kwargs: Any) -> Any:
 
         initialized = bool(ctx.is_initialized) or _has_roar_dir(ctx)
 
-        if not initialized and "RAY_JOB_CONFIG_JSON_ENV_VAR" in os.environ:
+        if not initialized and (
+            "RAY_JOB_ID" in os.environ or os.environ.get("ROAR_JOB_INSTRUMENTED") == "1"
+        ):
             try:
                 _auto_init_for_ray_job(ctx)
             except Exception as e:
diff --git a/roar/config.py b/roar/config.py
index d9a8bbb6..644ea721 100644
--- a/roar/config.py
+++ b/roar/config.py
@@ -138,11 +138,6 @@
         "default": True,
         "description": "Inject roar-cli into Ray runtime_env.pip for remote workers",
     },
-    "ray.log_dir": {
-        "type": str,
-        "default": "/shared/.roar-logs",
-        "description": "Shared Ray worker log directory for roar collection",
-    },
     "ray.actor_attribution": {
         "type": str,
         "default": "per_call",
diff --git a/roar/core/interfaces/repositories.py b/roar/core/interfaces/repositories.py
index 751cdc32..fdf68c9b 100644
--- a/roar/core/interfaces/repositories.py
+++ b/roar/core/interfaces/repositories.py
@@ -149,6 +149,10 @@ def get(self, session_id: int) -> dict[str, Any] | None:
         """Get session by ID."""
         ...
 
+    def get_all(self) -> list[dict[str, Any]]:
+        """Get all sessions."""
+        ...
+
     def get_by_hash_prefix(self, hash_prefix: str) -> dict[str, Any] | None:
         """Get first session matching a hash prefix."""
         ...
diff --git a/roar/core/models/config.py b/roar/core/models/config.py
index 32f3fc49..08f53960 100644
--- a/roar/core/models/config.py
+++ b/roar/core/models/config.py
@@ -184,7 +184,6 @@ class RayConfig(ConfigBaseModel):
 
     enabled: bool = True
     pip_install: bool = True
-    log_dir: str = "/shared/.roar-logs"
     actor_attribution: Literal["per_call", "per_actor"] = "per_call"
 
 
diff --git a/roar/db/repositories/session.py b/roar/db/repositories/session.py
index 0295e690..c3bc732b 100644
--- a/roar/db/repositories/session.py
+++ b/roar/db/repositories/session.py
@@ -11,12 +11,20 @@
 from typing import Any
 
 import blake3
-from sqlalchemy import delete, func, select, update
+from sqlalchemy import case, delete, func, select, update
 from sqlalchemy.orm import Session as SASession
 
 from ...core.interfaces.repositories import SessionRepository
 from ..models import Job, Session
 
+_STEP_NOISE_COMMANDS = (
+    "ray_task:unknown",
+    "ray_task:__init__",
+    "ray_task:s3_proxy",
+    "ray_task:s3_driver_proxy",
+    "ray_task:RoarNodeAgent.__init__",
+)
+
 
 class SQLAlchemySessionRepository(SessionRepository):
     """
@@ -212,6 +220,17 @@ def get(self, session_id: int) -> dict[str, Any] | None:
         session = self._session.get(Session, session_id)
         return self._session_to_dict(session) if session else None
 
+    def get_all(self) -> list[dict[str, Any]]:
+        """Get all sessions ordered by most recent first."""
+        sessions = (
+            self._session.execute(
+                select(Session).order_by(Session.created_at.desc(), Session.id.desc())
+            )
+            .scalars()
+            .all()
+        )
+        return [self._session_to_dict(session) for session in sessions]
+
     def get_by_hash(self, session_hash: str) -> dict[str, Any] | None:
         """
         Get a session by its content hash.
@@ -313,9 +332,30 @@ def get_step_by_number(
                 .where(
                     Job.session_id == session_id,
                     Job.step_number == step_number,
-                    Job.job_type.is_(None) | (Job.job_type == "run"),
+                    Job.job_type.is_(None) | (Job.job_type != "build"),
+                )
+                .order_by(
+                    case(
+                        (Job.job_type.is_(None), 6),
+                        (Job.job_type == "run", 6),
+                        (
+                            Job.command.is_not(None)
+                            & (~Job.command.in_(_STEP_NOISE_COMMANDS))
+                            & Job.parent_job_uid.is_not(None)
+                            & Job.script.is_not(None)
+                            & (~Job.script.like("%.%")),
+                            5,
+                        ),
+                        (
+                            Job.command.is_not(None) & (~Job.command.in_(_STEP_NOISE_COMMANDS)),
+                            4,
+                        ),
+                        (Job.command.in_(_STEP_NOISE_COMMANDS), 1),
+                        else_=2,
+                    ).desc()
                 )
                 .order_by(Job.timestamp.desc())
+                .order_by(Job.id.desc())
                 .limit(1)
             )
         job = self._session.execute(query).scalar_one_or_none()
@@ -605,6 +645,7 @@ def _job_to_dict(self, job: Job) -> dict[str, Any]:
         return {
             "id": job.id,
             "job_uid": job.job_uid,
+            "parent_job_uid": job.parent_job_uid,
             "timestamp": job.timestamp,
             "command": job.command,
             "script": job.script,
diff --git a/roar/glaas/auth.py b/roar/glaas/auth.py
index 43232612..71271123 100644
--- a/roar/glaas/auth.py
+++ b/roar/glaas/auth.py
@@ -20,9 +20,9 @@ def get_glaas_url() -> str | None:
     """Get GLaaS server URL from config or environment."""
     from ..config import config_get
 
-    url = config_get("glaas.url")
+    url = os.environ.get("GLAAS_URL")
     if not url:
-        url = os.environ.get("GLAAS_URL")
+        url = config_get("glaas.url")
     return url
 
 
diff --git a/roar/presenters/dag_data_builder.py b/roar/presenters/dag_data_builder.py
index 0ffc6c82..fa21c5bd 100644
--- a/roar/presenters/dag_data_builder.py
+++ b/roar/presenters/dag_data_builder.py
@@ -4,6 +4,14 @@
 
 from typing import Any
 
+_STEP_NOISE_COMMANDS = {
+    "ray_task:unknown",
+    "ray_task:__init__",
+    "ray_task:s3_proxy",
+    "ray_task:s3_driver_proxy",
+    "ray_task:RoarNodeAgent.__init__",
+}
+
 
 class DagDataBuilder:
     """Build DAG visualization data from session."""
@@ -115,6 +123,41 @@ def _group_by_step_number(self, steps: list[dict]) -> dict[int, list[dict]]:
 
         return steps_by_number
 
+    def _step_sort_key(self, step: dict) -> tuple[int, float, int]:
+        job_type = step.get("job_type")
+        command = str(step.get("command") or "")
+        script = str(step.get("script") or "")
+        parent_job_uid = str(step.get("parent_job_uid") or "")
+        is_phase_wrapper = (
+            command.startswith("ray_task:")
+            and command not in _STEP_NOISE_COMMANDS
+            and bool(parent_job_uid)
+            and bool(script)
+            and "." not in script
+        )
+        if job_type in (None, "run"):
+            priority = 6
+        elif is_phase_wrapper:
+            priority = 5
+        elif command and command not in _STEP_NOISE_COMMANDS:
+            priority = 4
+        elif command in _STEP_NOISE_COMMANDS:
+            priority = 1
+        else:
+            priority = 2
+        return (
+            priority,
+            float(step.get("timestamp") or 0.0),
+            int(step.get("id") or 0),
+        )
+
+    def _representative_steps(self, steps_by_number: dict[int, list[dict]]) -> dict[int, dict]:
+        return {
+            num: max(group, key=self._step_sort_key)
+            for num, group in steps_by_number.items()
+            if group
+        }
+
     def _select_steps(
         self,
         steps_by_number: dict[int, list[dict]],
@@ -124,13 +167,14 @@ def _select_steps(
         """Pick which steps to show based on expanded mode."""
         if expanded:
             return steps
-        return [steps_by_number[num][-1] for num in sorted(steps_by_number.keys())]
+        representative_by_step = self._representative_steps(steps_by_number)
+        return [representative_by_step[num] for num in sorted(representative_by_step.keys())]
 
     def _collect_artifacts(
         self, steps_by_number: dict[int, list[dict]]
     ) -> tuple[dict[str, dict], dict[str, list[str]]]:
         """Build all_artifacts and artifacts_by_path from latest step outputs."""
-        latest_by_step: dict[int, dict] = {num: group[-1] for num, group in steps_by_number.items()}
+        latest_by_step = self._representative_steps(steps_by_number)
 
         all_artifacts: dict[str, dict] = {}
         artifacts_by_path: dict[str, list[str]] = {}
@@ -179,7 +223,7 @@ def _collect_consumers(
         all_artifacts: dict[str, dict],
     ) -> dict[str, list[int]]:
         """Build consumer relationships from step inputs."""
-        latest_by_step: dict[int, dict] = {num: group[-1] for num, group in steps_by_number.items()}
+        latest_by_step = self._representative_steps(steps_by_number)
 
         artifact_consumers: dict[str, list[int]] = {}
 
@@ -206,7 +250,7 @@ def _build_nodes(
         expanded: bool,
     ) -> list[dict]:
         """Build node data for each step to show."""
-        latest_by_step: dict[int, dict] = {num: group[-1] for num, group in steps_by_number.items()}
+        latest_by_step = self._representative_steps(steps_by_number)
         nodes = []
 
         for step in steps_to_show:
diff --git a/roar/ray/actor.py b/roar/ray/actor.py
index af2b3ebf..7e808bf5 100644
--- a/roar/ray/actor.py
+++ b/roar/ray/actor.py
@@ -13,10 +13,7 @@ def __init__(
         token: str | None = None,
         glaas_url: str | None = None,
     ) -> None:
-        self._events: list[dict] = []
-        self._fragments: list[dict] = []
         self._streamer: GlaasFragmentStreamer | None = None
-
         if session_id and token and glaas_url:
             self._streamer = GlaasFragmentStreamer(
                 session_id=session_id,
@@ -24,20 +21,17 @@ def __init__(
                 glaas_url=glaas_url,
             )
 
-    def append_batch(self, events: list[dict]) -> None:
-        self._events.extend(events)
+    def ping(self) -> bool:
+        return True
 
-    def get_all(self) -> list[dict]:
-        return list(self._events)
+    def append_batch(self, _events: list[dict]) -> None:
+        # Deprecated compatibility shim for legacy worker hooks.
+        return None
 
     def append_fragment(self, fragment: dict) -> None:
-        self._fragments.append(fragment)
         if self._streamer is not None:
             self._streamer.append_fragment(fragment)
 
-    def get_all_fragments(self) -> list[dict]:
-        return list(self._fragments)
-
     def flush_to_glaas(self) -> bool:
         if self._streamer is None:
             return True
diff --git a/roar/ray/collector.py b/roar/ray/collector.py
index 8b675608..e8419677 100644
--- a/roar/ray/collector.py
+++ b/roar/ray/collector.py
@@ -2,9 +2,9 @@
 roar Ray log collector.
 
 Called from the driver process atexit handler (when ROAR_WRAP=1).
-Collects worker events from a Ray actor aggregator when available, with
-filesystem JSONL logs as a fallback. De-duplicates paths and inserts
-artifact + job_input/output rows into ROAR_PROJECT_DIR/.roar/roar.db.
+In Phase 2, Ray lineage is fragments-only:
+FragmentReconstituter fetches encrypted fragment batches from GLaaS and this
+module merges those fragments into ROAR_PROJECT_DIR/.roar/roar.db.
 """
 
 from __future__ import annotations
@@ -12,19 +12,14 @@
 import json
 import os
 import sqlite3
+import sys
 import time
 import uuid
 from collections import deque
-from contextlib import suppress
 from pathlib import Path
 from typing import Any
 
 from roar.ray.fragment import ArtifactRef, TaskFragment
-from roar.services.execution.proxy import parse_log_line
-
-_READ_OPS = frozenset({"GetObject"})
-_WRITE_OPS = frozenset({"PutObject", "CompleteMultipartUpload"})
-_CAPTURE_PRIORITY = {"python": 0, "proxy": 1, "tracer": 2}
 
 
 def _get_logger():
@@ -33,134 +28,172 @@ def _get_logger():
     return get_logger()
 
 
-def collect(
-    project_dir: str | None = None,
-    log_dir: str | None = None,
-    proxy_logs: dict[str, dict[str, Any]] | None = None,
-) -> None:
-    """
-    Collect Ray worker I/O logs and write to the roar SQLite database.
+def _apply_reconstitution_filters(
+    fragments: list[TaskFragment],
+    *,
+    project_dir: str,
+) -> list[TaskFragment]:
+    try:
+        from roar.config import load_config
+        from roar.services.execution.provenance.file_filter import (
+            FileFilterService,
+            _get_editable_install_dirs,
+        )
+    except Exception:
+        return [fragment for fragment in fragments if _should_keep_fragment(fragment)]
+
+    config = load_config(start_dir=project_dir)
+    filters_config = config.get("filters", {}) if isinstance(config, dict) else {}
+    cleanup_config = config.get("cleanup", {}) if isinstance(config, dict) else {}
+    ignore_system_reads = bool(filters_config.get("ignore_system_reads", True))
+    ignore_package_reads = bool(filters_config.get("ignore_package_reads", True))
+    ignore_torch_cache = bool(filters_config.get("ignore_torch_cache", True))
+    ignore_tmp_files = bool(filters_config.get("ignore_tmp_files", True))
+    if bool(cleanup_config.get("delete_tmp_writes", False)):
+        ignore_tmp_files = False
+
+    filter_service = FileFilterService()
+    editable_dirs = _get_editable_install_dirs()
+    sys_prefix = sys.prefix
+    sys_base_prefix = sys.base_prefix
+
+    filtered: list[TaskFragment] = []
+    for fragment in fragments:
+        for ref in [*fragment.reads, *fragment.writes]:
+            ref.path = _normalize_reconstituted_path(str(ref.path or ""), project_dir=project_dir)
+        fragment.reads = [
+            ref
+            for ref in fragment.reads
+            if _should_include_ref(
+                ref,
+                kind="read",
+                filter_service=filter_service,
+                ignore_system_reads=ignore_system_reads,
+                ignore_package_reads=ignore_package_reads,
+                ignore_torch_cache=ignore_torch_cache,
+                ignore_tmp_files=ignore_tmp_files,
+                sys_prefix=sys_prefix,
+                sys_base_prefix=sys_base_prefix,
+                editable_dirs=editable_dirs,
+            )
+        ]
+        fragment.writes = [
+            ref
+            for ref in fragment.writes
+            if _should_include_ref(
+                ref,
+                kind="write",
+                filter_service=filter_service,
+                ignore_system_reads=ignore_system_reads,
+                ignore_package_reads=ignore_package_reads,
+                ignore_torch_cache=ignore_torch_cache,
+                ignore_tmp_files=ignore_tmp_files,
+                sys_prefix=sys_prefix,
+                sys_base_prefix=sys_base_prefix,
+                editable_dirs=editable_dirs,
+            )
+        ]
+        if _should_keep_fragment(fragment):
+            filtered.append(fragment)
+    return filtered
 
-    Args:
-        project_dir: Directory containing the .roar/ subdirectory.
-                     Defaults to ROAR_PROJECT_DIR env var, then "/app".
-        log_dir:     Directory where worker JSONL logs were written.
-                     Defaults to ROAR_LOG_DIR env var, then "/shared/.roar-logs".
-    """
-    if project_dir is None:
-        project_dir = os.environ.get("ROAR_PROJECT_DIR", "/app")
-    if log_dir is None:
-        log_dir = os.environ.get("ROAR_LOG_DIR", "/shared/.roar-logs")
 
-    db_path = os.path.join(project_dir, ".roar", "roar.db")
-    log_path = Path(log_dir)
+def _should_keep_fragment(fragment: TaskFragment) -> bool:
+    if fragment.reads or fragment.writes:
+        return True
+    try:
+        return float(fragment.ended_at) > float(fragment.started_at)
+    except (TypeError, ValueError):
+        return False
 
-    if not os.path.exists(db_path):
-        return  # roar not initialised; nothing to do
 
-    actor_events: list[dict[str, Any]] | None = None
-    actor_fragments: list[dict[str, Any]] = []
-    actor_payload = _collect_actor_payload()
-    if actor_payload is not None:
-        actor_events, actor_fragments = actor_payload
+def _normalize_reconstituted_path(path: str, *, project_dir: str) -> str:
+    if not path or path.startswith("s3://"):
+        return path
 
-    if actor_fragments:
-        session_id, base_step = _resolve_active_session_context(db_path)
-        collect_fragments(
-            actor_fragments,
-            project_dir=project_dir,
-            driver_job_uid=os.environ.get("ROAR_JOB_ID"),
-            session_id=session_id,
-            step_number=base_step,
-        )
-        _consume_filesystem_logs(log_path)
-        return
+    normalized = os.path.normpath(path)
+    marker = f"{os.sep}runtime_resources{os.sep}working_dir_files{os.sep}"
+    if marker not in normalized:
+        return path
 
-    task_events = _collect_events(log_path, actor_events=actor_events)
-    _merge_proxy_logs(task_events, proxy_logs or {})
-    if not task_events:
-        return
+    packaged_suffix = normalized.split(marker, 1)[1]
+    packaged_parts = Path(packaged_suffix).parts
+    if len(packaged_parts) < 2 or not packaged_parts[0].startswith("_ray_pkg_"):
+        return path
 
-    path_info = _aggregate_paths(task_events)
-    if not path_info:
-        return
+    restored = Path(project_dir).joinpath(*packaged_parts[1:])
+    return str(restored.resolve(strict=False))
 
-    conn = sqlite3.connect(db_path)
-    conn.row_factory = sqlite3.Row
-    conn.execute("PRAGMA foreign_keys = ON")
-    conn.execute("PRAGMA journal_mode = WAL")
 
-    try:
-        now = time.time()
-        job_id = _create_ray_job(conn, now)
-        artifact_columns = {
-            row["name"] for row in conn.execute("PRAGMA table_info(artifacts)").fetchall()
-        }
+def _should_include_ref(
+    ref: ArtifactRef,
+    *,
+    kind: str,
+    filter_service,
+    ignore_system_reads: bool,
+    ignore_package_reads: bool,
+    ignore_torch_cache: bool,
+    ignore_tmp_files: bool,
+    sys_prefix: str,
+    sys_base_prefix: str,
+    editable_dirs,
+) -> bool:
+    path = str(ref.path or "")
+    if not path or path.startswith("s3://"):
+        return bool(path)
+
+    if kind == "read":
+        if filter_service._is_roar_internal(path) or filter_service._is_git_metadata(path):
+            return False
+        if ignore_system_reads and filter_service._is_system_read(path):
+            return False
+        if ignore_torch_cache and filter_service._is_torch_cache(path):
+            return False
+        if ignore_package_reads and filter_service._is_package_file(
+            path,
+            sys_prefix,
+            sys_base_prefix,
+            editable_dirs=editable_dirs,
+        ):
+            return False
+        return not (ignore_tmp_files and filter_service._is_tmp_path(path))
+
+    if filter_service._is_write_noise(path):
+        return False
+    if ignore_torch_cache and filter_service._is_torch_cache(path):
+        return False
+    return not (ignore_tmp_files and filter_service._is_tmp_path(path))
 
-        for path, info in path_info.items():
-            task_id = info["writer_task_id"] or info["task_id"]
-            node_id = info["writer_node_id"] or info["node_id"]
-            metadata_payload = {"ray_task_id": task_id}
-            if node_id:
-                metadata_payload["ray_node_id"] = node_id
-            metadata = json.dumps(metadata_payload)
-
-            _insert_artifact(
-                conn,
-                columns=artifact_columns,
-                artifact_id=str(uuid.uuid4()),
-                now=now,
-                path=path,
-                source_type=info["source_type"],
-                capture_method=info["capture_method"],
-                hash_value=info["hash"],
-                size=info["size"],
-                metadata=metadata,
-            )
 
-            row = conn.execute(
-                "SELECT id FROM artifacts WHERE first_seen_path = ? ORDER BY first_seen_at DESC LIMIT 1",
-                (path,),
-            ).fetchone()
-            if row is None:
-                continue
-            actual_artifact_id = row["id"]
+def collect(
+    project_dir: str | None = None,
+    log_dir: str | None = None,
+    proxy_logs: dict[str, dict[str, Any]] | None = None,
+    fragments: list[dict] | None = None,
+) -> None:
+    """
+    Compatibility entrypoint that now only accepts explicit fragments.
+    """
+    del log_dir
 
-            if info["hash"] and info["hash_algorithm"]:
-                conn.execute(
-                    """
-                    INSERT OR IGNORE INTO artifact_hashes
-                        (artifact_id, algorithm, digest)
-                    VALUES (?, ?, ?)
-                    """,
-                    (actual_artifact_id, info["hash_algorithm"], info["hash"]),
-                )
+    if not fragments:
+        return
 
-            if info["saw_write"]:
-                conn.execute(
-                    """
-                    INSERT OR IGNORE INTO job_outputs
-                        (job_id, artifact_id, path)
-                    VALUES (?, ?, ?)
-                    """,
-                    (job_id, actual_artifact_id, path),
-                )
-            if info["saw_read"]:
-                conn.execute(
-                    """
-                    INSERT OR IGNORE INTO job_inputs
-                        (job_id, artifact_id, path)
-                    VALUES (?, ?, ?)
-                    """,
-                    (job_id, actual_artifact_id, path),
-                )
+    if project_dir is None:
+        project_dir = os.environ.get("ROAR_PROJECT_DIR", "/app")
 
-        conn.commit()
-        # Consume logs once to keep collection idempotent if collect() is called
-        # multiple times during shutdown.
-        _consume_filesystem_logs(log_path)
-    finally:
-        conn.close()
+    db_path = os.path.join(project_dir, ".roar", "roar.db")
+    if not os.path.exists(db_path):
+        return  # roar not initialised; nothing to do
+
+    session_id, base_step = _resolve_active_session_context(db_path)
+    collect_fragments(
+        fragments=fragments,
+        project_dir=project_dir,
+        driver_job_uid=os.environ.get("ROAR_JOB_ID"),
+        session_id=session_id,
+        step_number=base_step,
+    )
 
 
 def collect_fragments(
@@ -200,6 +233,14 @@ def collect_fragments(
                 continue
             parsed_fragments.append(fragment)
 
+        parsed_fragments = _apply_reconstitution_filters(
+            parsed_fragments,
+            project_dir=project_dir,
+        )
+        if not parsed_fragments:
+            conn.commit()
+            return
+
         step_by_job_uid = _assign_step_numbers(parsed_fragments, base_step=step_number)
 
         for fragment in parsed_fragments:
@@ -223,32 +264,37 @@ def collect_fragments(
                 continue
             job_id = int(row["id"])
 
-            for ref in fragment.reads:
+            written_artifact_ids: dict[str, str] = {}
+
+            for ref in fragment.writes:
                 artifact_id = _upsert_artifact_for_ref(
                     conn,
                     columns=artifact_columns,
                     ref=ref,
                     now=now,
                 )
+                written_artifact_ids[str(ref.path)] = artifact_id
                 conn.execute(
                     """
-                    INSERT OR IGNORE INTO job_inputs
+                    INSERT OR IGNORE INTO job_outputs
                         (job_id, artifact_id, path)
                     VALUES (?, ?, ?)
                     """,
                     (job_id, artifact_id, ref.path),
                 )
 
-            for ref in fragment.writes:
-                artifact_id = _upsert_artifact_for_ref(
-                    conn,
-                    columns=artifact_columns,
-                    ref=ref,
-                    now=now,
-                )
+            for ref in fragment.reads:
+                artifact_id = written_artifact_ids.get(str(ref.path), "")
+                if not artifact_id:
+                    artifact_id = _upsert_artifact_for_ref(
+                        conn,
+                        columns=artifact_columns,
+                        ref=ref,
+                        now=now,
+                    )
                 conn.execute(
                     """
-                    INSERT OR IGNORE INTO job_outputs
+                    INSERT OR IGNORE INTO job_inputs
                         (job_id, artifact_id, path)
                     VALUES (?, ?, ?)
                     """,
@@ -260,70 +306,6 @@ def collect_fragments(
         conn.close()
 
 
-def _events_from_fragments(fragments: list[dict[str, Any]]) -> list[dict[str, Any]]:
-    events: list[dict[str, Any]] = []
-
-    for payload in fragments:
-        if not isinstance(payload, dict):
-            continue
-
-        try:
-            fragment = TaskFragment.from_dict(payload)
-        except Exception:
-            continue
-
-        task_id = _to_text(fragment.ray_task_id) or _to_text(fragment.job_uid) or "unknown"
-        node_id = _to_text(fragment.ray_node_id)
-
-        for ref in fragment.reads:
-            event: dict[str, Any] = {
-                "path": ref.path,
-                "mode": "r",
-                "task_id": task_id,
-                "capture_method": _normalize_capture_method(_to_text(ref.capture_method)),
-                "size": _normalize_size(ref.size),
-            }
-            if node_id:
-                event["node_id"] = node_id
-            hash_value = _normalize_hash(_to_text(ref.hash))
-            hash_algorithm = _normalize_hash_algorithm(
-                _to_text(ref.hash_algorithm),
-                hash_value,
-                _normalize_source_type(None, ref.path),
-                ref.path,
-            )
-            if hash_value:
-                event["hash"] = hash_value
-            if hash_algorithm:
-                event["hash_algorithm"] = hash_algorithm
-            events.append(event)
-
-        for ref in fragment.writes:
-            event = {
-                "path": ref.path,
-                "mode": "w",
-                "task_id": task_id,
-                "capture_method": _normalize_capture_method(_to_text(ref.capture_method)),
-                "size": _normalize_size(ref.size),
-            }
-            if node_id:
-                event["node_id"] = node_id
-            hash_value = _normalize_hash(_to_text(ref.hash))
-            hash_algorithm = _normalize_hash_algorithm(
-                _to_text(ref.hash_algorithm),
-                hash_value,
-                _normalize_source_type(None, ref.path),
-                ref.path,
-            )
-            if hash_value:
-                event["hash"] = hash_value
-            if hash_algorithm:
-                event["hash_algorithm"] = hash_algorithm
-            events.append(event)
-
-    return events
-
-
 def _assign_step_numbers(
     fragments: list[TaskFragment],
     base_step: int = 1,
@@ -357,14 +339,10 @@ def _assign_step_numbers(
             writes_by_job.append(set())
 
         for ref in fragment.reads:
-            hash_value = _normalize_hash(_to_text(ref.hash))
-            if hash_value:
-                reads_by_job[job_index].add(hash_value)
+            reads_by_job[job_index].update(_dependency_tokens_for_ref(ref))
 
         for ref in fragment.writes:
-            hash_value = _normalize_hash(_to_text(ref.hash))
-            if hash_value:
-                writes_by_job[job_index].add(hash_value)
+            writes_by_job[job_index].update(_dependency_tokens_for_ref(ref))
 
     # hash -> producer job indices
     producers_by_hash: dict[str, set[int]] = {}
@@ -416,7 +394,8 @@ def _insert_fragment_job(
     step_number: int,
     now: float,
 ) -> None:
-    command = f"ray_task:{fragment.function_name}" if fragment.function_name else "ray_task"
+    command_name = _task_command_name(fragment.function_name)
+    command = f"ray_task:{command_name}" if command_name else "ray_task"
     timestamp = fragment.started_at or now
     duration = max(0.0, float(fragment.ended_at - fragment.started_at))
     parent_job_uid = fragment.parent_job_uid or driver_job_uid
@@ -463,10 +442,104 @@ def _insert_fragment_job(
         values.append(json.dumps(metadata))
 
     placeholders = ", ".join("?" for _ in fields)
+    field_list = ", ".join(fields)
     conn.execute(
-        f"INSERT OR IGNORE INTO jobs ({', '.join(fields)}) VALUES ({placeholders})",
+        f"INSERT OR IGNORE INTO jobs ({field_list}) VALUES ({placeholders})",
         values,
     )
+    _update_fragment_job(
+        conn=conn,
+        job_columns=job_columns,
+        fragment=fragment,
+        parent_job_uid=parent_job_uid,
+        session_id=session_id,
+        step_number=step_number,
+        timestamp=timestamp,
+        duration=duration,
+        command=command,
+    )
+
+
+def _update_fragment_job(
+    conn: sqlite3.Connection,
+    *,
+    job_columns: set[str],
+    fragment: TaskFragment,
+    parent_job_uid: str | None,
+    session_id: int | None,
+    step_number: int,
+    timestamp: float,
+    duration: float,
+    command: str,
+) -> None:
+    metadata_json = ""
+    if "metadata" in job_columns:
+        metadata: dict[str, Any] = {
+            "ray_task_id": fragment.ray_task_id,
+            "ray_worker_id": fragment.ray_worker_id,
+            "ray_node_id": fragment.ray_node_id,
+        }
+        if fragment.ray_actor_id:
+            metadata["ray_actor_id"] = fragment.ray_actor_id
+        metadata_json = json.dumps(metadata)
+
+    updates = [
+        "timestamp = CASE WHEN timestamp > ? THEN ? ELSE timestamp END",
+        "duration_seconds = CASE WHEN duration_seconds IS NULL OR duration_seconds < ? THEN ? ELSE duration_seconds END",
+        "exit_code = CASE WHEN ? != 0 THEN ? ELSE COALESCE(exit_code, 0) END",
+        "command = CASE WHEN command IS NULL OR command = '' THEN ? ELSE command END",
+        "script = CASE WHEN script IS NULL OR script = '' THEN ? ELSE script END",
+        "status = COALESCE(status, 'completed')",
+        "job_type = COALESCE(job_type, 'ray_task')",
+    ]
+    params: list[Any] = [
+        timestamp,
+        timestamp,
+        duration,
+        duration,
+        fragment.exit_code,
+        fragment.exit_code,
+        command,
+        fragment.function_name,
+    ]
+
+    if "parent_job_uid" in job_columns:
+        updates.append(
+            "parent_job_uid = CASE WHEN (parent_job_uid IS NULL OR parent_job_uid = '') AND ? IS NOT NULL AND ? != '' THEN ? ELSE parent_job_uid END"
+        )
+        params.extend([parent_job_uid, parent_job_uid, parent_job_uid])
+    if "session_id" in job_columns and session_id is not None:
+        updates.append("session_id = COALESCE(session_id, ?)")
+        params.append(session_id)
+    if "step_number" in job_columns:
+        updates.append(
+            "step_number = CASE WHEN step_number IS NULL OR step_number < ? THEN ? ELSE step_number END"
+        )
+        params.extend([step_number, step_number])
+    if "metadata" in job_columns:
+        updates.append(
+            "metadata = CASE WHEN (metadata IS NULL OR metadata = '') AND ? != '' THEN ? ELSE metadata END"
+        )
+        params.extend([metadata_json, metadata_json])
+
+    params.append(fragment.job_uid)
+    conn.execute(
+        f"UPDATE jobs SET {', '.join(updates)} WHERE job_uid = ?",
+        params,
+    )
+
+
+def _task_command_name(function_name: str) -> str:
+    text = str(function_name or "").strip()
+    if not text:
+        return ""
+
+    parts = [part for part in text.split(".") if part and part != "<locals>"]
+    if not parts:
+        return text
+    if len(parts) >= 2 and parts[-2][:1].isupper():
+        return ".".join(parts[-2:])
+    return parts[-1]
 
 
 def _upsert_artifact_for_ref(
@@ -478,9 +551,13 @@ def _upsert_artifact_for_ref(
 ) -> str:
     digest = _normalize_hash(_to_text(ref.hash))
     algorithm = _to_text(ref.hash_algorithm)
+    existing_by_path = conn.execute(
+        "SELECT id, hash FROM artifacts WHERE first_seen_path = ? ORDER BY first_seen_at DESC LIMIT 1",
+        (ref.path,),
+    ).fetchone()
 
     if digest and algorithm:
-        row = conn.execute(
+        existing_by_hash = conn.execute(
             """
             SELECT artifact_id
             FROM artifact_hashes
@@ -489,16 +566,39 @@ def _upsert_artifact_for_ref(
             """,
             (algorithm, digest),
         ).fetchone()
-        if row is not None:
-            return str(row["artifact_id"])
+        if existing_by_path is not None:
+            path_artifact_id = str(existing_by_path["id"])
+            path_digest_row = conn.execute(
+                """
+                SELECT digest
+                FROM artifact_hashes
+                WHERE artifact_id = ? AND algorithm = ?
+                LIMIT 1
+                """,
+                (path_artifact_id, algorithm),
+            ).fetchone()
+            path_digest = _normalize_hash(
+                _to_text(path_digest_row["digest"])
+                if path_digest_row is not None
+                else existing_by_path["hash"]
+            )
 
-    if not digest:
-        existing = conn.execute(
-            "SELECT id FROM artifacts WHERE first_seen_path = ? ORDER BY first_seen_at DESC LIMIT 1",
-            (ref.path,),
-        ).fetchone()
-        if existing is not None:
-            return str(existing["id"])
+            if path_digest in (None, digest):
+                _backfill_artifact_for_ref(
+                    conn,
+                    columns=columns,
+                    artifact_id=path_artifact_id,
+                    ref=ref,
+                    digest=digest,
+                    algorithm=algorithm,
+                )
+                return path_artifact_id
+
+        if existing_by_hash is not None:
+            return str(existing_by_hash["artifact_id"])
+
+    if not digest and existing_by_path is not None:
+        return str(existing_by_path["id"])
 
     artifact_id = str(uuid.uuid4())
     metadata_payload = {"capture_method": ref.capture_method}
@@ -539,97 +639,68 @@ def _upsert_artifact_for_ref(
     return artifact_id
 
 
-def _collect_actor_payload() -> tuple[list[dict[str, Any]] | None, list[dict[str, Any]]] | None:
-    try:
-        import ray
-    except Exception:
-        return None
-
-    is_initialized = getattr(ray, "is_initialized", None)
-    if callable(is_initialized) and not is_initialized():
-        return None
-
-    job_id = os.environ.get("ROAR_JOB_ID", "default")
-    actor_name = f"roar-log-collector-{job_id}"
-
-    try:
-        actor = ray.get_actor(actor_name, namespace="roar")
-    except Exception:
-        return None
+def _backfill_artifact_for_ref(
+    conn: sqlite3.Connection,
+    *,
+    columns: set[str],
+    artifact_id: str,
+    ref: ArtifactRef,
+    digest: str,
+    algorithm: str,
+) -> None:
+    updates: list[str] = []
+    params: list[Any] = []
 
-    try:
-        events: list[dict[str, Any]] | None = None
-        get_all = getattr(actor, "get_all", None)
-        get_all_remote = getattr(get_all, "remote", None) if get_all is not None else None
-        if callable(get_all_remote):
-            raw_events = ray.get(get_all_remote(), timeout=30)
-            if isinstance(raw_events, list):
-                events = [event for event in raw_events if isinstance(event, dict)]
-            else:
-                events = []
-
-        fragments: list[dict[str, Any]] = []
-        get_all_fragments = getattr(actor, "get_all_fragments", None)
-        get_fragments_remote = (
-            getattr(get_all_fragments, "remote", None) if get_all_fragments is not None else None
+    if "hash" in columns:
+        updates.append("hash = COALESCE(NULLIF(hash, ''), ?)")
+        params.append(digest)
+    if "capture_method" in columns:
+        updates.append("capture_method = COALESCE(NULLIF(capture_method, ''), ?)")
+        params.append(_normalize_capture_method(ref.capture_method))
+    if ref.size > 0:
+        updates.append("size = CASE WHEN size <= 0 THEN ? ELSE size END")
+        params.append(ref.size)
+
+    if updates:
+        params.append(artifact_id)
+        conn.execute(
+            f"UPDATE artifacts SET {', '.join(updates)} WHERE id = ?",
+            params,
         )
-        if callable(get_fragments_remote):
-            raw_fragments = ray.get(get_fragments_remote(), timeout=30)
-            if isinstance(raw_fragments, list):
-                fragments = [fragment for fragment in raw_fragments if isinstance(fragment, dict)]
 
-        return events, fragments
-    except Exception:
-        return None
-    finally:
-        _flush_actor_stream(ray, actor)
-        with suppress(Exception):
-            ray.kill(actor)
+    conn.execute(
+        """
+        INSERT OR IGNORE INTO artifact_hashes
+            (artifact_id, algorithm, digest)
+        VALUES (?, ?, ?)
+        """,
+        (artifact_id, algorithm, digest),
+    )
 
 
-def _collect_events(
-    log_path: Path,
-    actor_events: list[dict[str, Any]] | None = None,
-) -> dict[str, list[dict[str, Any]]]:
-    if actor_events is not None:
-        return _group_events_by_task(actor_events)
+def _resolve_active_session_context(db_path: str) -> tuple[int | None, int]:
+    if not db_path or not os.path.exists(db_path):
+        return None, 1
 
     try:
-        import ray
+        conn = sqlite3.connect(db_path)
+    except sqlite3.Error:
+        return None, 1
 
-        if ray.is_initialized():
-            actor_events = _collect_from_actor()
-            if actor_events is not None:
-                return _group_events_by_task(actor_events)
-    except Exception:
-        pass
-
-    if not log_path.exists():
-        return {}
-    return _read_events(log_path)
-
-
-def _consume_filesystem_logs(log_path: Path) -> None:
-    if not log_path.exists():
-        return
-    for log_file in log_path.glob("*.jsonl"):
-        with suppress(OSError):
-            log_file.unlink()
-
-
-def _resolve_active_session_context(db_path: str) -> tuple[int | None, int]:
-    conn = sqlite3.connect(db_path)
     conn.row_factory = sqlite3.Row
     try:
-        row = conn.execute(
-            """
-            SELECT id, current_step
-            FROM sessions
-            WHERE is_active = 1
-            ORDER BY id DESC
-            LIMIT 1
-            """
-        ).fetchone()
+        try:
+            row = conn.execute(
+                """
+                SELECT id, current_step
+                FROM sessions
+                WHERE is_active = 1
+                ORDER BY id DESC
+                LIMIT 1
+                """
+            ).fetchone()
+        except sqlite3.Error:
+            return None, 1
         if row is None:
             return None, 1
         current_step = int(row["current_step"] or 1)
@@ -638,212 +709,6 @@ def _resolve_active_session_context(db_path: str) -> tuple[int | None, int]:
         conn.close()
 
 
-def _collect_from_actor() -> list[dict[str, Any]] | None:
-    try:
-        import ray
-    except Exception:
-        return None
-
-    job_id = os.environ.get("ROAR_JOB_ID", "default")
-    actor_name = f"roar-log-collector-{job_id}"
-
-    try:
-        actor = ray.get_actor(actor_name, namespace="roar")
-    except Exception:
-        return None
-
-    try:
-        events = ray.get(actor.get_all.remote(), timeout=30)
-        if not isinstance(events, list):
-            return []
-        return [event for event in events if isinstance(event, dict)]
-    except Exception:
-        return None
-    finally:
-        _flush_actor_stream(ray, actor)
-        with suppress(Exception):
-            ray.kill(actor)
-
-
-def _flush_actor_stream(ray_module: Any, actor: Any) -> None:
-    get_fn = getattr(ray_module, "get", None)
-    if not callable(get_fn):
-        return
-
-    flush_to_glaas = getattr(actor, "flush_to_glaas", None)
-    flush_remote = getattr(flush_to_glaas, "remote", None) if flush_to_glaas is not None else None
-    if not callable(flush_remote):
-        return
-
-    with suppress(Exception):
-        get_fn(flush_remote(), timeout=5)
-
-
-def _group_events_by_task(events: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
-    task_events: dict[str, list[dict[str, Any]]] = {}
-    for event in events:
-        task_id = _to_text(event.get("task_id")) or "unknown"
-        task_events.setdefault(task_id, []).append(event)
-    return task_events
-
-
-def _read_events(log_path: Path) -> dict[str, list[dict[str, Any]]]:
-    # task_id -> list of event dicts
-    task_events: dict[str, list[dict[str, Any]]] = {}
-    logger = _get_logger()
-
-    for log_file in sorted(log_path.glob("*.jsonl")):
-        task_id = log_file.stem
-        events: list[dict[str, Any]] = []
-        try:
-            for line_number, line in enumerate(
-                log_file.read_text(encoding="utf-8").splitlines(), start=1
-            ):
-                stripped = line.strip()
-                if not stripped:
-                    continue
-                try:
-                    payload = json.loads(stripped)
-                    if isinstance(payload, dict):
-                        events.append(payload)
-                    else:
-                        logger.warning(
-                            "Skipping non-object JSON payload in Ray log %s line %d",
-                            log_file,
-                            line_number,
-                        )
-                except json.JSONDecodeError:
-                    logger.warning(
-                        "Skipping malformed JSON line in Ray log %s line %d",
-                        log_file,
-                        line_number,
-                    )
-        except OSError as exc:
-            logger.warning("Skipping unreadable Ray log %s: %s", log_file, exc)
-
-        if events:
-            task_events[task_id] = task_events.get(task_id, []) + events
-
-    return task_events
-
-
-def _merge_proxy_logs(
-    task_events: dict[str, list[dict[str, Any]]],
-    proxy_logs: dict[str, dict[str, Any]],
-) -> None:
-    for fallback_node_id, payload in proxy_logs.items():
-        if not isinstance(payload, dict):
-            continue
-
-        node_id = _to_text(payload.get("node_id")) or _to_text(fallback_node_id)
-        lines = payload.get("proxy_log_lines")
-        if not isinstance(lines, list):
-            continue
-
-        task_id = f"proxy-{node_id or 'unknown'}"
-        events = task_events.setdefault(task_id, [])
-        for line in lines:
-            if not isinstance(line, str):
-                continue
-
-            parsed = parse_log_line(line)
-            if parsed is None:
-                continue
-
-            mode = "r" if parsed.operation in _READ_OPS else "w"
-            event: dict[str, Any] = {
-                "path": f"s3://{parsed.bucket}/{parsed.key}",
-                "mode": mode,
-                "task_id": task_id,
-                "source_type": "s3",
-                "capture_method": "proxy",
-                "operation": parsed.operation,
-            }
-            if node_id:
-                event["node_id"] = node_id
-            if parsed.etag:
-                event["hash"] = parsed.etag
-                event["hash_algorithm"] = "etag"
-            if parsed.size_bytes is not None:
-                event["size"] = _normalize_size(parsed.size_bytes)
-
-            events.append(event)
-
-
-def _aggregate_paths(task_events: dict[str, list[dict[str, Any]]]) -> dict[str, dict[str, Any]]:
-    # path -> rollup info
-    path_info: dict[str, dict[str, Any]] = {}
-
-    for fallback_task_id, events in task_events.items():
-        for event in events:
-            raw_path = event.get("path")
-            path = _to_text(raw_path)
-            if not path:
-                continue
-
-            event_task_id = _to_text(event.get("task_id")) or fallback_task_id
-            event_node_id = _to_text(event.get("node_id"))
-            operation = _to_text(event.get("operation"))
-            mode = _to_text(event.get("mode")) or "r"
-
-            is_read, is_write = _infer_direction(mode, operation)
-            source_type = _normalize_source_type(_to_text(event.get("source_type")), path)
-            capture_method = _normalize_capture_method(_to_text(event.get("capture_method")))
-            hash_value = _normalize_hash(_to_text(event.get("hash")))
-            hash_algorithm = _normalize_hash_algorithm(
-                _to_text(event.get("hash_algorithm")),
-                hash_value,
-                source_type,
-                path,
-            )
-            event_size = _normalize_size(event.get("size"))
-
-            if path not in path_info:
-                path_info[path] = {
-                    "task_id": event_task_id,
-                    "node_id": event_node_id,
-                    "writer_task_id": None,
-                    "writer_node_id": None,
-                    "saw_read": False,
-                    "saw_write": False,
-                    "source_type": source_type,
-                    "capture_method": capture_method,
-                    "hash": hash_value,
-                    "hash_algorithm": hash_algorithm,
-                    "size": event_size,
-                }
-
-            info = path_info[path]
-
-            if event_node_id and not info["node_id"]:
-                info["node_id"] = event_node_id
-
-            if source_type and not info["source_type"]:
-                info["source_type"] = source_type
-
-            info["capture_method"] = _choose_capture_method(info["capture_method"], capture_method)
-
-            if hash_value and not info["hash"]:
-                info["hash"] = hash_value
-                if hash_algorithm:
-                    info["hash_algorithm"] = hash_algorithm
-            elif hash_value and hash_algorithm and not info["hash_algorithm"]:
-                info["hash_algorithm"] = hash_algorithm
-            if event_size > info["size"]:
-                info["size"] = event_size
-
-            if is_write:
-                info["saw_write"] = True
-                if info["writer_task_id"] is None:
-                    info["writer_task_id"] = event_task_id
-                    info["writer_node_id"] = event_node_id
-
-            if is_read:
-                info["saw_read"] = True
-
-    return path_info
-
-
 def _create_ray_job(conn: sqlite3.Connection, now: float) -> int:
     roar_job_id = os.environ.get("ROAR_JOB_ID")
     if roar_job_id:
@@ -933,17 +798,6 @@ def _insert_artifact(
     )
 
 
-def _infer_direction(mode: str, operation: str | None) -> tuple[bool, bool]:
-    if operation in _WRITE_OPS:
-        return False, True
-    if operation in _READ_OPS:
-        return True, False
-
-    is_write = any(flag in mode for flag in ("w", "a", "x", "+"))
-    is_read = "r" in mode or "+" in mode or not is_write
-    return is_read, is_write
-
-
 def _normalize_source_type(source_type: str | None, path: str) -> str | None:
     if source_type:
         lowered = source_type.strip().lower()
@@ -960,14 +814,18 @@ def _normalize_capture_method(capture_method: str | None) -> str | None:
     return method or None
 
 
-def _choose_capture_method(existing: str | None, incoming: str | None) -> str | None:
-    if not incoming:
-        return existing
-    if not existing:
-        return incoming
-    existing_rank = _CAPTURE_PRIORITY.get(existing, -1)
-    incoming_rank = _CAPTURE_PRIORITY.get(incoming, -1)
-    return incoming if incoming_rank >= existing_rank else existing
+def _dependency_tokens_for_ref(ref: ArtifactRef) -> set[str]:
+    tokens: set[str] = set()
+
+    hash_value = _normalize_hash(_to_text(ref.hash))
+    if hash_value:
+        tokens.add(f"hash:{hash_value}")
+
+    path = _to_text(ref.path)
+    if path:
+        tokens.add(f"path:{path}")
+
+    return tokens
 
 
 def _normalize_hash(value: str | None) -> str | None:
@@ -979,30 +837,6 @@ def _normalize_hash(value: str | None) -> str | None:
     return text or None
 
 
-def _normalize_hash_algorithm(
-    algorithm: str | None,
-    hash_value: str | None,
-    source_type: str | None,
-    path: str,
-) -> str | None:
-    if not hash_value:
-        return None
-    if algorithm:
-        normalized = algorithm.strip().lower()
-        if normalized:
-            return normalized
-    if source_type == "s3" or path.startswith("s3://"):
-        return "etag"
-    return None
-
-
-def _normalize_size(value: Any) -> int:
-    try:
-        return max(0, int(value))
-    except (TypeError, ValueError):
-        return 0
-
-
 def _to_text(value: Any) -> str | None:
     if value is None:
         return None
diff --git a/roar/ray/driver_entrypoint.py b/roar/ray/driver_entrypoint.py
new file mode 100644
index 00000000..02b76574
--- /dev/null
+++ b/roar/ray/driver_entrypoint.py
@@ -0,0 +1,143 @@
+from __future__ import annotations
+
+import contextlib
+import os
+import subprocess
+import sys
+import time
+from collections.abc import Sequence
+
+from roar.ray.collector import collect_fragments
+from roar.ray.fragment import TaskFragment
+from roar.ray.glaas_fragment_streamer import GlaasFragmentStreamer
+from roar.ray.proxy_fragments import build_proxy_fragment
+from roar.services.execution.proxy import ProxyHandle, ProxyService, S3LogEntry
+
+_DEFAULT_LOCAL_PROXY_PORT = 19191
+
+
+def _warn(message: str) -> None:
+    with contextlib.suppress(Exception):
+        sys.stderr.write(message + "\n")
+
+
+def _build_driver_proxy_fragment(
+    entries: Sequence[S3LogEntry],
+    *,
+    started_at: float,
+    ended_at: float,
+    exit_code: int,
+) -> TaskFragment | None:
+    return build_proxy_fragment(
+        entries,
+        function_name="s3_driver_proxy",
+        task_id="proxy:driver",
+        parent_job_uid=os.environ.get("ROAR_JOB_ID"),
+        started_at=started_at,
+        ended_at=ended_at,
+        exit_code=exit_code,
+    )
+
+
+def _emit_driver_proxy_fragment(fragment: TaskFragment) -> None:
+    session_id = str(os.environ.get("ROAR_SESSION_ID", "")).strip()
+    token = str(os.environ.get("ROAR_FRAGMENT_TOKEN", "")).strip()
+    glaas_url = str(os.environ.get("GLAAS_URL") or "").strip()
+
+    if session_id and token and glaas_url:
+        streamer = GlaasFragmentStreamer(
+            session_id=session_id,
+            token=token,
+            glaas_url=glaas_url,
+        )
+        streamer.append_fragment(fragment.to_dict())
+        streamer.close()
+        return
+
+    project_dir = str(os.environ.get("ROAR_PROJECT_DIR", "")).strip()
+    if not project_dir:
+        return
+
+    collect_fragments(
+        fragments=[fragment.to_dict()],
+        project_dir=project_dir,
+        driver_job_uid=os.environ.get("ROAR_JOB_ID"),
+    )
+
+
+def _local_proxy_port() -> int:
+    raw_value = str(os.environ.get("ROAR_PROXY_PORT", "")).strip()
+    if raw_value.isdigit():
+        port = int(raw_value)
+        if 1024 < port <= 65535:
+            return port
+    return _DEFAULT_LOCAL_PROXY_PORT
+
+
+def _start_driver_proxy() -> tuple[ProxyService | None, ProxyHandle | None]:
+    endpoint = str(os.environ.get("AWS_ENDPOINT_URL", "")).strip().lower()
+    if endpoint and not (
+        endpoint.startswith("http://127.0.0.1:") or endpoint.startswith("http://localhost:")
+    ):
+        return None, None
+
+    service = ProxyService()
+    handle = service.start_for_run(
+        session_id=str(os.environ.get("ROAR_SESSION_ID", "")).strip() or None,
+        job_id=str(os.environ.get("ROAR_JOB_ID", "")).strip() or None,
+        upstream_url=str(os.environ.get("ROAR_UPSTREAM_S3_ENDPOINT", "")).strip() or None,
+        port=_local_proxy_port(),
+    )
+    return service, handle
+
+
+def _run_child(argv: Sequence[str], env: dict[str, str]) -> int:
+    env.setdefault("ROAR_DRIVER_PHASE_CAPTURE", "1")
+    process = subprocess.Popen(list(argv), env=env)
+    return int(process.wait())
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    args = list(argv if argv is not None else sys.argv[1:])
+    if args and args[0] == "--":
+        args = args[1:]
+    if not args:
+        _warn("roar ray driver entrypoint requires a command after --")
+        return 2
+
+    service: ProxyService | None = None
+    handle: ProxyHandle | None = None
+    child_env = os.environ.copy()
+    started_at = time.time()
+
+    try:
+        try:
+            service, handle = _start_driver_proxy()
+        except Exception as exc:
+            _warn(f"[roar-driver] failed to start local S3 proxy: {exc}")
+        else:
+            if handle is not None:
+                child_env["ROAR_PROXY_PORT"] = str(handle.port)
+
+        exit_code = _run_child(args, env=child_env)
+    finally:
+        ended_at = time.time()
+        if service is not None and handle is not None:
+            try:
+                entries = service.stop_for_run(handle)
+                fragment = _build_driver_proxy_fragment(
+                    entries,
+                    started_at=started_at,
+                    ended_at=ended_at,
+                    exit_code=locals().get("exit_code", 1),
+                )
+                if fragment is not None:
+                    _emit_driver_proxy_fragment(fragment)
+            except Exception as exc:
+                _warn(f"[roar-driver] failed to collect local S3 proxy lineage: {exc}")
+
+    return exit_code
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/roar/ray/fragment.py b/roar/ray/fragment.py
index 35028cc7..439d5593 100644
--- a/roar/ray/fragment.py
+++ b/roar/ray/fragment.py
@@ -26,6 +26,7 @@ class TaskFragment:
     started_at: float
     ended_at: float
     exit_code: int
+    recorded_at: float | None = None
     reads: list[ArtifactRef] = field(default_factory=list)
     writes: list[ArtifactRef] = field(default_factory=list)
     worker_packages: dict[str, str] | None = None
diff --git a/roar/ray/fragment_reconstituter.py b/roar/ray/fragment_reconstituter.py
index e118a2d2..fb2f2fb3 100644
--- a/roar/ray/fragment_reconstituter.py
+++ b/roar/ray/fragment_reconstituter.py
@@ -2,6 +2,8 @@
 
 import base64
 import json
+import mimetypes
+import os
 import sqlite3
 import urllib.request
 from dataclasses import dataclass
@@ -10,7 +12,10 @@
 
 from cryptography.hazmat.primitives.ciphers.aead import AESGCM
 
-from .collector import collect_fragments
+from roar.db.context import create_database_context
+
+from .collector import _resolve_active_session_context, collect_fragments
+from .s3_key_paths import parse_s3_key_placeholder, s3_object_key
 
 
 def _get_logger():
@@ -26,6 +31,15 @@ class ReconstitutionResult:
     fragments_processed: int = 0
 
 
+@dataclass(frozen=True)
+class _CompositeOutputLeaf:
+    path: str
+    artifact_id: str
+    digest: str
+    algorithm: str
+    size: int
+
+
 class FragmentReconstituter:
     def __init__(
         self,
@@ -61,12 +75,20 @@ def reconstitute(self) -> ReconstitutionResult:
         if not fragments:
             return ReconstitutionResult()
 
+        fragments = self._resolve_s3_key_placeholders(fragments)
+        fragments = self._drop_proxy_fallback_duplicates(fragments)
+        fragments = self._deduplicate_fragments(fragments)
         jobs_before, artifacts_before = self._count_local_rows()
+        session_id, step_number = _resolve_active_session_context(str(self._roar_db_path))
+        driver_job_uid = str(os.environ.get("ROAR_JOB_ID", "")).strip() or None
 
         try:
             collect_fragments(
                 fragments=fragments,
                 project_dir=str(self._project_dir()),
+                driver_job_uid=driver_job_uid,
+                session_id=session_id,
+                step_number=step_number,
             )
         except Exception as exc:
             _get_logger().warning(
@@ -76,6 +98,7 @@ def reconstitute(self) -> ReconstitutionResult:
             )
             return ReconstitutionResult(fragments_processed=len(fragments))
 
+        self._materialize_reconstituted_composites(fragments)
         jobs_after, artifacts_after = self._count_local_rows()
         return ReconstitutionResult(
             jobs_merged=max(0, jobs_after - jobs_before),
@@ -100,7 +123,7 @@ def _fetch_batches(self) -> list[dict[str, Any]]:
             )
             return []
 
-        rows = payload.get("fragments")
+        rows = payload.get("data", {}).get("fragments", payload.get("fragments"))
         if not isinstance(rows, list):
             _get_logger().warning(
                 "Invalid fragment response for session %s: missing fragments list",
@@ -143,6 +166,181 @@ def _decrypt_batch(self, batch: dict[str, Any], key: bytes) -> list[dict[str, An
 
         return [item for item in decoded if isinstance(item, dict)]
 
+    @staticmethod
+    def _resolve_s3_key_placeholders(fragments: list[dict[str, Any]]) -> list[dict[str, Any]]:
+        concrete_paths_by_key: dict[str, set[str]] = {}
+
+        for fragment in fragments:
+            for list_key in ("reads", "writes"):
+                items = fragment.get(list_key, [])
+                if not isinstance(items, list):
+                    continue
+                for item in items:
+                    if not isinstance(item, dict):
+                        continue
+                    path = item.get("path", "")
+                    if not isinstance(path, str) or not path.startswith("s3://"):
+                        continue
+                    object_key = s3_object_key(path)
+                    if object_key:
+                        concrete_paths_by_key.setdefault(object_key, set()).add(path)
+
+        if not concrete_paths_by_key:
+            return fragments
+
+        for fragment in fragments:
+            for list_key in ("reads", "writes"):
+                items = fragment.get(list_key, [])
+                if not isinstance(items, list):
+                    continue
+                for item in items:
+                    if not isinstance(item, dict):
+                        continue
+                    path = item.get("path", "")
+                    if not isinstance(path, str):
+                        continue
+                    placeholder = parse_s3_key_placeholder(path)
+                    if placeholder is None:
+                        continue
+                    _bucket_hint, object_key = placeholder
+                    matches = concrete_paths_by_key.get(object_key, set())
+                    if len(matches) == 1:
+                        item["path"] = next(iter(matches))
+
+        return fragments
+
+    @staticmethod
+    def _deduplicate_fragments(fragments: list[dict[str, Any]]) -> list[dict[str, Any]]:
+        """Deduplicate artifacts across capture methods within each task.
+
+        Priority: proxy > native > python.
+        For the same (task, path, kind) tuple, keep the highest-priority capture,
+        but merge richer metadata from lower-priority duplicates when the winner
+        does not carry it.
+        """
+        priority = {"proxy": 3, "native": 2, "python": 1, "tracer": 1}
+        winners: dict[tuple[str, str, str], tuple[int, dict[str, Any]]] = {}
+
+        for fragment_index, fragment in enumerate(fragments):
+            task_key = str(
+                fragment.get("job_uid")
+                or fragment.get("ray_task_id")
+                or f"fragment:{fragment_index}"
+            )
+            for list_key in ("reads", "writes"):
+                items = fragment.get(list_key, [])
+                if not isinstance(items, list):
+                    continue
+                for item in items:
+                    if not isinstance(item, dict):
+                        continue
+                    path = item.get("path", "")
+                    if not isinstance(path, str) or not path:
+                        continue
+                    method = str(item.get("capture_method", "python"))
+                    current_priority = priority.get(method, 0)
+                    dedup_key = (task_key, list_key, path)
+                    existing = winners.get(dedup_key)
+                    if existing is None:
+                        winners[dedup_key] = (current_priority, item)
+                    elif current_priority > existing[0]:
+                        FragmentReconstituter._merge_ref_metadata(item, existing[1])
+                        winners[dedup_key] = (current_priority, item)
+                    else:
+                        FragmentReconstituter._merge_ref_metadata(existing[1], item)
+
+        for fragment_index, fragment in enumerate(fragments):
+            task_key = str(
+                fragment.get("job_uid")
+                or fragment.get("ray_task_id")
+                or f"fragment:{fragment_index}"
+            )
+            for list_key in ("reads", "writes"):
+                if list_key not in fragment:
+                    continue
+                items = fragment[list_key]
+                if not isinstance(items, list):
+                    continue
+
+                deduplicated_items: list[dict[str, Any]] = []
+                for item in items:
+                    if not isinstance(item, dict):
+                        continue
+                    path = item.get("path", "")
+                    if not isinstance(path, str) or not path:
+                        deduplicated_items.append(item)
+                        continue
+                    winner = winners.get((task_key, list_key, path))
+                    if winner is not None and item is winner[1]:
+                        deduplicated_items.append(item)
+                fragment[list_key] = deduplicated_items
+
+        return fragments
+
+    @staticmethod
+    def _drop_proxy_fallback_duplicates(fragments: list[dict[str, Any]]) -> list[dict[str, Any]]:
+        proxy_function_names = {"s3_proxy", "s3_driver_proxy"}
+        task_scoped_s3_refs: set[tuple[str, str]] = set()
+
+        for fragment in fragments:
+            function_name = str(fragment.get("function_name") or "")
+            if function_name in proxy_function_names:
+                continue
+            for list_key in ("reads", "writes"):
+                items = fragment.get(list_key, [])
+                if not isinstance(items, list):
+                    continue
+                for item in items:
+                    if not isinstance(item, dict):
+                        continue
+                    path = item.get("path", "")
+                    if isinstance(path, str) and path.startswith("s3://"):
+                        task_scoped_s3_refs.add((list_key, path))
+
+        if not task_scoped_s3_refs:
+            return fragments
+
+        for fragment in fragments:
+            if str(fragment.get("function_name") or "") not in proxy_function_names:
+                continue
+            for list_key in ("reads", "writes"):
+                items = fragment.get(list_key, [])
+                if not isinstance(items, list):
+                    continue
+                fragment[list_key] = [
+                    item
+                    for item in items
+                    if not (
+                        isinstance(item, dict)
+                        and isinstance(item.get("path"), str)
+                        and item["path"].startswith("s3://")
+                        and (list_key, item["path"]) in task_scoped_s3_refs
+                    )
+                ]
+
+        return fragments
+
+    @staticmethod
+    def _merge_ref_metadata(winner: dict[str, Any], candidate: dict[str, Any]) -> None:
+        winner_hash = str(winner.get("hash") or "").strip()
+        candidate_hash = str(candidate.get("hash") or "").strip()
+        if not winner_hash and candidate_hash:
+            winner["hash"] = candidate_hash
+
+        winner_algorithm = str(winner.get("hash_algorithm") or "").strip()
+        candidate_algorithm = str(candidate.get("hash_algorithm") or "").strip()
+        if not winner_algorithm and candidate_algorithm:
+            winner["hash_algorithm"] = candidate_algorithm
+
+        winner_size = winner.get("size")
+        candidate_size = candidate.get("size")
+        if (
+            (not isinstance(winner_size, int) or winner_size <= 0)
+            and isinstance(candidate_size, int)
+            and candidate_size > 0
+        ):
+            winner["size"] = candidate_size
+
     @staticmethod
     def _sequence_key(batch: dict[str, Any]) -> int:
         sequence = batch.get("sequence")
@@ -159,6 +357,299 @@ def _project_dir(self) -> Path:
             return parent.parent
         return parent
 
+    def _materialize_reconstituted_composites(self, fragments: list[dict[str, Any]]) -> None:
+        job_uids = sorted(
+            {
+                str(fragment.get("job_uid") or "").strip()
+                for fragment in fragments
+                if isinstance(fragment, dict)
+            }
+            - {""}
+        )
+        if not job_uids:
+            return
+
+        try:
+            from roar.services.execution.dataset_identifier import DatasetIdentifierInferer
+            from roar.services.execution.job_recording import (
+                ExecutionJobRecorder,
+                RunCompositeMaterializationConfig,
+            )
+            from roar.services.put.composite_builder import CompositeArtifactBuilder, CompositeLeaf
+        except Exception:
+            return
+
+        config = RunCompositeMaterializationConfig.from_repo_root(str(self._project_dir()))
+        if not config.enabled:
+            return
+
+        inferer = DatasetIdentifierInferer()
+        builder = CompositeArtifactBuilder()
+
+        try:
+            with create_database_context(self._roar_db_path.parent) as db_ctx:
+                for job_uid in job_uids:
+                    job = db_ctx.jobs.get_by_uid(job_uid)
+                    if not isinstance(job, dict):
+                        continue
+
+                    job_id = int(job["id"])
+                    outputs = db_ctx.jobs.get_outputs(job_id)
+                    leaves = self._extract_composite_output_leaves(outputs)
+                    if len(leaves) < config.min_components:
+                        continue
+
+                    dataset_identifiers = inferer.infer(
+                        [leaf.path for leaf in leaves],
+                        repo_root=str(self._project_dir()),
+                        min_confidence=0.0,
+                    )
+                    grouped_roots = self._group_composite_roots(
+                        leaves=leaves,
+                        dataset_identifiers=dataset_identifiers,
+                        config=config,
+                    )
+                    if not grouped_roots:
+                        continue
+
+                    materialized: list[dict[str, Any]] = []
+                    for root_path, members in grouped_roots.items():
+                        artifact_id_by_path = {
+                            member.path: member.artifact_id for member in members
+                        }
+                        composite_leaves = [
+                            CompositeLeaf(
+                                relative_path=self._relative_path_for_root(member.path, root_path),
+                                digest=member.digest,
+                                size=member.size,
+                                component_type=mimetypes.guess_type(member.path)[0],
+                                leaf_kind="file",
+                                component_algorithm=member.algorithm,
+                            )
+                            for member in sorted(members, key=lambda item: item.path)
+                        ]
+                        if len(composite_leaves) < config.min_components:
+                            continue
+
+                        composite = builder.build_for_leaves(
+                            root_path=root_path,
+                            leaves=composite_leaves,
+                            session_hash="",
+                            source_type=self._source_type_for_root(root_path),
+                        )
+                        if composite is None:
+                            continue
+
+                        metadata = json.dumps(
+                            {
+                                "composite": {
+                                    "root_path": composite.root_path,
+                                    "component_count_total": composite.component_count_total,
+                                    "component_count_stored": composite.component_count_stored,
+                                }
+                            }
+                        )
+                        artifact_id, _created = db_ctx.artifacts.register(
+                            hashes={"composite-blake3": composite.digest},
+                            size=int(composite.payload.get("size") or 0),
+                            path=composite.root_path,
+                            source_type=composite.payload.get("source_type"),
+                            metadata=metadata,
+                        )
+                        component_payload = []
+                        for component in list(composite.payload.get("components") or []):
+                            relative_path = str(component.get("relative_path") or "")
+                            component_path = self._path_for_relative_member(
+                                root_path, relative_path
+                            )
+                            component_payload.append(
+                                {
+                                    **component,
+                                    "artifact_id": artifact_id_by_path.get(component_path),
+                                }
+                            )
+                        db_ctx.composites.upsert_details(
+                            artifact_id=artifact_id,
+                            components=component_payload,
+                            component_count_total=composite.component_count_total,
+                            membership_index=composite.payload.get("membership_index"),
+                        )
+                        db_ctx.jobs.add_output(job_id, artifact_id, composite.root_path)
+                        materialized.append(
+                            {
+                                "local_artifact_id": artifact_id,
+                                "root_path": composite.root_path,
+                                "hash": composite.digest,
+                                "component_count_total": composite.component_count_total,
+                                "component_count_stored": composite.component_count_stored,
+                            }
+                        )
+
+                    if materialized:
+                        metadata_json = ExecutionJobRecorder._merge_composites_into_metadata_json(
+                            job.get("metadata"),
+                            materialized,
+                        )
+                        db_ctx.jobs.update_metadata(job_id, metadata_json)
+        except Exception as exc:
+            _get_logger().warning(
+                "Failed to materialize composite artifacts during fragment reconstitution for session %s: %s",
+                self._session_id,
+                exc,
+            )
+
+    @staticmethod
+    def _extract_composite_output_leaves(
+        outputs: list[dict[str, Any]],
+    ) -> list[_CompositeOutputLeaf]:
+        leaves: list[_CompositeOutputLeaf] = []
+        seen_paths: set[str] = set()
+        for output in outputs:
+            if not isinstance(output, dict):
+                continue
+            if str(output.get("kind") or "").strip().lower() == "composite":
+                continue
+
+            path = str(output.get("path") or output.get("first_seen_path") or "").strip()
+            artifact_id = str(output.get("artifact_id") or "").strip()
+            if not path or not artifact_id or path in seen_paths:
+                continue
+
+            algorithm = ""
+            digest = ""
+            hashes = output.get("hashes")
+            if isinstance(hashes, list):
+                for item in hashes:
+                    if not isinstance(item, dict):
+                        continue
+                    raw_algorithm = str(item.get("algorithm") or "").strip().lower()
+                    raw_digest = str(item.get("digest") or "").strip().lower()
+                    if not raw_algorithm or not FragmentReconstituter._is_hex_digest(raw_digest):
+                        continue
+                    algorithm = raw_algorithm
+                    digest = raw_digest
+                    break
+            if not algorithm or not digest:
+                continue
+
+            try:
+                size = max(0, int(output.get("size") or 0))
+            except (TypeError, ValueError):
+                size = 0
+
+            leaves.append(
+                _CompositeOutputLeaf(
+                    path=path,
+                    artifact_id=artifact_id,
+                    digest=digest,
+                    algorithm=algorithm,
+                    size=size,
+                )
+            )
+            seen_paths.add(path)
+
+        return leaves
+
+    @staticmethod
+    def _group_composite_roots(
+        *,
+        leaves: list[_CompositeOutputLeaf],
+        dataset_identifiers: list[dict[str, Any]],
+        config: Any,
+    ) -> dict[str, list[_CompositeOutputLeaf]]:
+        grouped: dict[str, list[_CompositeOutputLeaf]] = {}
+        assigned_paths: set[str] = set()
+        ranked_candidates = sorted(
+            dataset_identifiers,
+            key=lambda item: (
+                float(item.get("confidence", 0.0))
+                if isinstance(item.get("confidence"), (int, float))
+                else 0.0
+            ),
+            reverse=True,
+        )
+        for candidate in ranked_candidates:
+            confidence = candidate.get("confidence")
+            if not isinstance(confidence, (int, float)):
+                continue
+
+            dataset_id = str(candidate.get("dataset_id") or "").strip()
+            if not dataset_id:
+                continue
+
+            confidence_floor = FragmentReconstituter._composite_confidence_floor(candidate, config)
+            if float(confidence) < confidence_floor:
+                continue
+
+            matches = [
+                leaf
+                for leaf in leaves
+                if leaf.path not in assigned_paths
+                and FragmentReconstituter._is_path_under_root(leaf.path, dataset_id)
+            ]
+            if len(matches) < config.min_components:
+                continue
+
+            grouped[dataset_id] = matches
+            assigned_paths.update(leaf.path for leaf in matches)
+            if len(grouped) >= int(config.max_roots_per_job):
+                break
+
+        return grouped
+
+    @staticmethod
+    def _is_path_under_root(path: str, root: str) -> bool:
+        normalized_root = root.rstrip("/")
+        return path == normalized_root or path.startswith(normalized_root + "/")
+
+    @staticmethod
+    def _relative_path_for_root(path: str, root: str) -> str:
+        normalized_root = root.rstrip("/")
+        prefix = normalized_root + "/"
+        if path.startswith(prefix):
+            return path[len(prefix) :]
+        return Path(path).name
+
+    @staticmethod
+    def _path_for_relative_member(root: str, relative_path: str) -> str:
+        normalized_root = root.rstrip("/")
+        if not relative_path:
+            return normalized_root
+        return f"{normalized_root}/{relative_path.lstrip('/')}"
+
+    @staticmethod
+    def _source_type_for_root(root: str) -> str | None:
+        if root.startswith("s3://"):
+            return "s3"
+        if root.startswith("gs://"):
+            return "gs"
+        return None
+
+    @staticmethod
+    def _composite_confidence_floor(candidate: dict[str, Any], config: Any) -> float:
+        try:
+            configured_floor = float(config.min_confidence)
+        except (TypeError, ValueError):
+            configured_floor = 0.8
+
+        evidence = candidate.get("evidence")
+        if isinstance(evidence, list):
+            evidence_set = {str(item) for item in evidence}
+            if {"high_cardinality", "shard_cluster"}.issubset(evidence_set):
+                return min(configured_floor, 0.5)
+
+        return configured_floor
+
+    @staticmethod
+    def _is_hex_digest(value: str) -> bool:
+        if not value or len(value) % 2 != 0:
+            return False
+        try:
+            bytes.fromhex(value)
+        except ValueError:
+            return False
+        return True
+
     def _count_local_rows(self) -> tuple[int, int]:
         if not self._roar_db_path.exists():
             return 0, 0
diff --git a/roar/ray/glaas_fragment_streamer.py b/roar/ray/glaas_fragment_streamer.py
index a1ae41c4..f6af0c13 100644
--- a/roar/ray/glaas_fragment_streamer.py
+++ b/roar/ray/glaas_fragment_streamer.py
@@ -1,8 +1,10 @@
 from __future__ import annotations
 
 import base64
+import copy
 import json
 import os
+import urllib.error
 import urllib.request
 from typing import Any
 
@@ -39,8 +41,55 @@ def flush(self) -> bool:
         if not self._buffer:
             return True
 
+        while self._buffer:
+            remaining = len(self._buffer)
+            chunk_size = remaining
+
+            while chunk_size >= 1:
+                ok, too_large = self._post_chunk(self._buffer[:chunk_size])
+                if ok:
+                    del self._buffer[:chunk_size]
+                    self._next_sequence += 1
+                    break
+                if (
+                    too_large
+                    and chunk_size == 1
+                    and self._split_oversized_fragment(self._buffer[0])
+                ):
+                    break
+                if too_large and chunk_size > 1:
+                    chunk_size = max(1, chunk_size // 2)
+                    continue
+                return False
+
+        return True
+
+    def _split_oversized_fragment(self, fragment: dict[str, Any]) -> bool:
+        reads = fragment.get("reads")
+        writes = fragment.get("writes")
+        if not isinstance(reads, list) or not isinstance(writes, list):
+            return False
+
+        refs: list[tuple[str, dict[str, Any]]] = []
+        refs.extend(("reads", copy.deepcopy(ref)) for ref in reads if isinstance(ref, dict))
+        refs.extend(("writes", copy.deepcopy(ref)) for ref in writes if isinstance(ref, dict))
+        if len(refs) <= 1:
+            return False
+
+        midpoint = max(1, len(refs) // 2)
+        replacement: list[dict[str, Any]] = []
+        for subset in (refs[:midpoint], refs[midpoint:]):
+            part = copy.deepcopy(fragment)
+            part["reads"] = [ref for kind, ref in subset if kind == "reads"]
+            part["writes"] = [ref for kind, ref in subset if kind == "writes"]
+            replacement.append(part)
+
+        self._buffer[:1] = replacement
+        return True
+
+    def _post_chunk(self, chunk: list[dict[str, Any]]) -> tuple[bool, bool]:
         try:
-            plaintext = json.dumps(self._buffer, separators=(",", ":")).encode("utf-8")
+            plaintext = json.dumps(chunk, separators=(",", ":")).encode("utf-8")
             key = bytes.fromhex(self._token)
             nonce = os.urandom(12)
             ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
@@ -63,6 +112,14 @@ def flush(self) -> bool:
             )
             with urllib.request.urlopen(request, timeout=5):
                 pass
+        except urllib.error.HTTPError as exc:
+            _get_logger().warning(
+                "Failed to stream Ray fragments for session %s sequence %d: %s",
+                self._session_id,
+                self._next_sequence,
+                exc,
+            )
+            return False, exc.code == 413
         except Exception as exc:
             _get_logger().warning(
                 "Failed to stream Ray fragments for session %s sequence %d: %s",
@@ -70,11 +127,9 @@ def flush(self) -> bool:
                 self._next_sequence,
                 exc,
             )
-            return False
+            return False, False
 
-        self._buffer.clear()
-        self._next_sequence += 1
-        return True
+        return True, False
 
     def close(self) -> None:
         self.flush()
diff --git a/roar/ray/node_agent.py b/roar/ray/node_agent.py
index 36e53794..aac4e77c 100644
--- a/roar/ray/node_agent.py
+++ b/roar/ray/node_agent.py
@@ -1,5 +1,7 @@
 from __future__ import annotations
 
+import contextlib
+import os
 import socket
 import subprocess
 import threading
@@ -17,17 +19,19 @@
 __all__ = ["RoarNodeAgent", "build_node_agent_name"]
 
 
-def _find_free_port() -> int:
-    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
-        sock.bind(("127.0.0.1", 0))
-        return int(sock.getsockname()[1])
+_ROAR_PROXY_PORT = 19191
+
+
+def _can_connect_to_local_proxy(port: int) -> bool:
+    with contextlib.suppress(OSError), socket.create_connection(("127.0.0.1", port), timeout=0.25):
+        return True
+    return False
 
 
 @ray.remote(num_cpus=0)
 class RoarNodeAgent:
-    def __init__(self, job_id: str, log_dir: str) -> None:
+    def __init__(self, job_id: str) -> None:
         self._job_id = str(job_id)
-        self._log_dir = str(log_dir)
         self._proxy_process: subprocess.Popen | None = None
         self._proxy_port: int | None = None
         self._proxy_log_lines: list[str] = []
@@ -61,11 +65,30 @@ def _start_proxy(self) -> None:
         package_path = Path(__file__).resolve().parents[1]
         proxy_binary = tracer_backends.find_proxy_binary(package_path)
         if not proxy_binary:
+            print(f"[roar-agent] roar-proxy binary not found in {package_path}")
+            return
+        print(f"[roar-agent] found proxy binary: {proxy_binary}")
+
+        port = _ROAR_PROXY_PORT
+        if _can_connect_to_local_proxy(port):
+            self._proxy_port = port
+            print(f"[roar-agent] reusing existing local proxy on port {port}")
             return
 
-        port = _find_free_port()
         cmd = [proxy_binary, "--port", str(port), "--job-id", self._job_id]
 
+        # Only use ROAR_UPSTREAM_S3_ENDPOINT — never fall back to AWS_ENDPOINT_URL.
+        # By the time the node agent runs on a worker, AWS_ENDPOINT_URL has been
+        # overwritten to http://127.0.0.1:19191 (the proxy itself) by _ray_job_submit.py.
+        # Using it as --upstream would make the proxy forward to itself → 502.
+        upstream = os.environ.get("ROAR_UPSTREAM_S3_ENDPOINT")
+        if upstream:
+            cmd.extend(["--upstream", upstream])
+            print(f"[roar-agent] upstream: {upstream}")
+        else:
+            print("[roar-agent] no upstream set, proxy will use default AWS")
+
+        print(f"[roar-agent] starting proxy: {' '.join(cmd)}")
         process = subprocess.Popen(
             cmd,
             stdout=subprocess.PIPE,
@@ -99,6 +122,11 @@ def _reader() -> None:
                 return
 
             if process.poll() is not None:
+                with self._log_lock:
+                    output = "\n".join(self._proxy_log_lines[-20:])
+                print(f"[roar-agent] proxy process exited early (rc={process.returncode})")
+                print(f"[roar-agent] proxy cmd: {' '.join(cmd)}")
+                print(f"[roar-agent] proxy output:\n{output}")
                 return
 
             time.sleep(0.05)
@@ -135,5 +163,17 @@ def collect_logs(self) -> dict[str, Any]:
             "proxy_log_lines": log_lines,
         }
 
+    def get_log_entries_since(self, since_index: int) -> dict[str, Any]:
+        """Return proxy log entries added after since_index."""
+        with self._log_lock:
+            new_lines = self._proxy_log_lines[since_index:]
+            current_index = len(self._proxy_log_lines)
+        return {
+            "entries": new_lines,
+            "current_index": current_index,
+            "node_id": self._node_id,
+            "proxy_port": self._proxy_port,
+        }
+
     def shutdown(self) -> None:
         self._terminate_proxy()
diff --git a/roar/ray/proxy_fragments.py b/roar/ray/proxy_fragments.py
new file mode 100644
index 00000000..a04babaa
--- /dev/null
+++ b/roar/ray/proxy_fragments.py
@@ -0,0 +1,99 @@
+from __future__ import annotations
+
+import os
+from collections.abc import Sequence
+
+from roar.ray.collector import collect_fragments
+from roar.ray.fragment import ArtifactRef, TaskFragment, derive_task_uid
+from roar.ray.glaas_fragment_streamer import GlaasFragmentStreamer
+from roar.services.execution.proxy import S3LogEntry
+
+_S3_WRITE_OPS = frozenset({"PutObject", "UploadPart", "CompleteMultipartUpload", "DeleteObject"})
+
+
+def entry_to_ref(entry: S3LogEntry) -> tuple[str, ArtifactRef] | None:
+    operation = str(entry.operation or "").strip()
+    if not operation:
+        return None
+
+    kind = "write" if operation in _S3_WRITE_OPS else "read"
+    path = f"s3://{entry.bucket}/{entry.key}"
+    etag = str(entry.etag or "").strip() or None
+    return (
+        kind,
+        ArtifactRef(
+            path=path,
+            hash=etag,
+            hash_algorithm="etag" if etag else "",
+            size=int(entry.size_bytes or 0),
+            capture_method="proxy",
+        ),
+    )
+
+
+def build_proxy_fragment(
+    entries: Sequence[S3LogEntry],
+    *,
+    function_name: str,
+    task_id: str,
+    parent_job_uid: str | None,
+    started_at: float,
+    ended_at: float,
+    exit_code: int,
+    node_id: str = "driver",
+    recorded_at: float | None = None,
+) -> TaskFragment | None:
+    roar_job_id = str(parent_job_uid or os.environ.get("ROAR_JOB_ID", "default"))
+    fragment = TaskFragment(
+        job_uid=derive_task_uid(roar_job_id, task_id),
+        parent_job_uid=roar_job_id,
+        ray_task_id=task_id,
+        ray_worker_id="",
+        ray_node_id=node_id,
+        ray_actor_id=None,
+        function_name=function_name,
+        started_at=started_at,
+        ended_at=ended_at,
+        exit_code=exit_code,
+        recorded_at=recorded_at,
+    )
+
+    for entry in entries:
+        parsed = entry_to_ref(entry)
+        if parsed is None:
+            continue
+        kind, ref = parsed
+        if kind == "write":
+            fragment.writes.append(ref)
+        else:
+            fragment.reads.append(ref)
+
+    if not fragment.reads and not fragment.writes:
+        return None
+    return fragment
+
+
+def emit_fragment(fragment: TaskFragment) -> None:
+    session_id = str(os.environ.get("ROAR_SESSION_ID", "")).strip()
+    token = str(os.environ.get("ROAR_FRAGMENT_TOKEN", "")).strip()
+    glaas_url = str(os.environ.get("GLAAS_URL") or "").strip()
+
+    if session_id and token and glaas_url:
+        streamer = GlaasFragmentStreamer(
+            session_id=session_id,
+            token=token,
+            glaas_url=glaas_url,
+        )
+        streamer.append_fragment(fragment.to_dict())
+        streamer.close()
+        return
+
+    project_dir = str(os.environ.get("ROAR_PROJECT_DIR", "")).strip()
+    if not project_dir:
+        return
+
+    collect_fragments(
+        fragments=[fragment.to_dict()],
+        project_dir=project_dir,
+        driver_job_uid=os.environ.get("ROAR_JOB_ID"),
+    )
diff --git a/roar/ray/roar_worker.py b/roar/ray/roar_worker.py
index 9ed8088d..13335e64 100644
--- a/roar/ray/roar_worker.py
+++ b/roar/ray/roar_worker.py
@@ -2,14 +2,83 @@
 
 import atexit
 import builtins
+import collections
+import contextlib
+import functools
 import hashlib
 import os
+import queue
+import re
+import socket
+import struct
+import subprocess
 import sys
+import tempfile
+import threading
 import time
 from collections.abc import Callable
-from typing import Any
+from pathlib import Path
+from typing import Any, cast
 
+from roar.ray.collector import collect_fragments
 from roar.ray.fragment import ArtifactRef, TaskFragment, derive_task_uid
+from roar.ray.glaas_fragment_streamer import GlaasFragmentStreamer
+
+IOEvent = collections.namedtuple(
+    "IOEvent",
+    [
+        "kind",
+        "task_id",
+        "function_name",
+        "path",
+        "hash_value",
+        "hash_algorithm",
+        "size",
+        "capture_method",
+    ],
+)
+
+_event_queue: queue.Queue[IOEvent | None] = queue.Queue()
+_collector_thread: threading.Thread | None = None
+_shutdown_event = threading.Event()
+_native_events_buffer: list[tuple[str, str, ArtifactRef]] = []
+_native_lock = threading.Lock()
+_native_child_task_ids: dict[int, str] = {}
+_native_child_task_lock = threading.Lock()
+_native_thread_task_ids: dict[int, str] = {}
+_recent_native_thread_task_ids: dict[int, tuple[str, float]] = {}
+_native_thread_task_lock = threading.Lock()
+_native_task_launch_context = threading.local()
+_native_threading_patch_lock = threading.Lock()
+_native_threading_patch_refcount = 0
+_task_timing_state: dict[str, dict[str, Any]] = {}
+_task_timing_lock = threading.Lock()
+_direct_streamer: GlaasFragmentStreamer | None = None
+_direct_streamer_lock = threading.Lock()
+_s3_tracking_scope = threading.local()
+
+_FLUSH_INTERVAL_SECONDS = float(os.environ.get("ROAR_FRAGMENT_FLUSH_INTERVAL", "2.0"))
+_IDLE_FLUSH_INTERVAL_SECONDS = float(os.environ.get("ROAR_FRAGMENT_IDLE_FLUSH_INTERVAL", "0.25"))
+_FLUSH_THRESHOLD_EVENTS = int(os.environ.get("ROAR_FRAGMENT_FLUSH_THRESHOLD", "200"))
+_TASK_BOUNDARY_NATIVE_FLUSH_WAIT_SECONDS = float(
+    os.environ.get("ROAR_RAY_TASK_NATIVE_FLUSH_WAIT", "0.2")
+)
+_TASK_BOUNDARY_NATIVE_QUIET_PERIOD_SECONDS = float(
+    os.environ.get("ROAR_RAY_TASK_NATIVE_FLUSH_QUIET", "0.02")
+)
+_TASK_BOUNDARY_NATIVE_POLL_INTERVAL_SECONDS = float(
+    os.environ.get("ROAR_RAY_TASK_NATIVE_FLUSH_POLL", "0.01")
+)
+_RECENT_NATIVE_THREAD_BINDING_LINGER_SECONDS = float(
+    os.environ.get("ROAR_RAY_NATIVE_THREAD_BINDING_LINGER", "1.0")
+)
+
+_PROXY_LOG_RE = re.compile(
+    r"^\[S3:(\w+)\]\s+(s3://[^\s]+)"
+    r"(?:\s+\((\d+)\s+bytes\))?"
+    r"(?:\s+etag=(\S+))?"
+)
+_S3_WRITE_OPS = frozenset({"PutObject", "UploadPart", "CompleteMultipartUpload", "DeleteObject"})
 
 _real_open = builtins.open
 
@@ -22,11 +91,18 @@
 else:
     _blake3_constructor = _blake3_import
 
-_current_task_id: str | None = None
-_current_fragment: TaskFragment | None = None
-_collector_actor: Any = None
 _startup_complete = False
 _actor_attribution_mode = "per_call"
+_proxy_configured = False
+_DEFAULT_LOCAL_PROXY_PORT = 19191
+_real_subprocess_popen = subprocess.Popen
+_real_thread_start = threading.Thread.start
+
+
+def _get_logger():
+    from roar.core.logging import get_logger
+
+    return get_logger()
 
 
 def _active_hash_algorithm() -> str:
@@ -51,16 +127,28 @@ def _to_text(value: Any) -> str | None:
     return text or None
 
 
-def _get_task_id() -> str | None:
+def _get_current_task_id() -> str:
     try:
         ray = sys.modules.get("ray")
         if ray is None:
-            return None
+            return ""
         ctx = ray.get_runtime_context()
         task_id = ctx.get_task_id()
-        return _to_text(task_id)
+        return _to_text(task_id) or ""
     except Exception:
-        return None
+        return ""
+
+
+def _resolved_task_id() -> str:
+    task_id = _get_current_task_id()
+    if task_id:
+        return task_id
+
+    bound_task_id = _bound_native_task_id_for_event(os.getpid(), threading.get_native_id())
+    if bound_task_id:
+        return bound_task_id
+
+    return _current_native_launch_task_id()
 
 
 def _get_actor_id() -> str | None:
@@ -119,16 +207,29 @@ def _get_actor_attribution() -> str:
 
 
 def _get_task_function_name() -> str:
+    def _resolve_name(candidate: Any) -> str:
+        try:
+            value = candidate() if callable(candidate) else candidate
+        except Exception:
+            return ""
+        text = _to_text(value) or ""
+        return "" if text == "unknown" else text
+
     try:
         ray = sys.modules.get("ray")
         if ray is None:
             return "unknown"
         ctx = ray.get_runtime_context()
         for attr in ("get_task_function_name", "get_task_name"):
-            getter = getattr(ctx, attr, None)
-            if not callable(getter):
-                continue
-            name = _to_text(getter())
+            name = _resolve_name(getattr(ctx, attr, None))
+            if name:
+                return name
+
+        worker = getattr(ray, "_private", None)
+        worker = getattr(worker, "worker", None)
+        global_worker = getattr(worker, "global_worker", None)
+        for attr in ("current_task_function_name", "current_task_name"):
+            name = _resolve_name(getattr(global_worker, attr, None))
             if name:
                 return name
     except Exception:
@@ -136,9 +237,13 @@ def _get_task_function_name() -> str:
     return "unknown"
 
 
-def _start_fragment(task_id: str) -> TaskFragment:
+def _start_fragment(task_id: str, function_name: str = "") -> TaskFragment:
     now = time.time()
     roar_job_id = str(os.environ.get("ROAR_JOB_ID", "default"))
+    started_at = _task_started_at(task_id) or now
+    resolved_function_name = (
+        function_name or _task_function_name(task_id) or _get_task_function_name()
+    )
     return TaskFragment(
         job_uid=derive_task_uid(roar_job_id, task_id),
         parent_job_uid=str(os.environ.get("ROAR_DRIVER_JOB_UID", "")),
@@ -146,50 +251,839 @@ def _start_fragment(task_id: str) -> TaskFragment:
         ray_worker_id=_get_worker_id() or "",
         ray_node_id=_get_node_id() or "",
         ray_actor_id=_get_actor_id(),
-        function_name=_get_task_function_name(),
-        started_at=now,
+        function_name=resolved_function_name,
+        started_at=started_at,
         ended_at=now,
         exit_code=0,
     )
 
 
-def _finalise_fragment(fragment: TaskFragment) -> None:
+def _register_task_timing(task_id: str, function_name: str) -> None:
+    if not task_id:
+        return
+    with _task_timing_lock:
+        _task_timing_state[task_id] = {
+            "started_at": time.time(),
+            "function_name": function_name or "",
+            "lineage_observed": False,
+        }
+
+
+def _mark_task_lineage_observed(task_id: str, function_name: str = "") -> None:
+    if not task_id:
+        return
+    with _task_timing_lock:
+        state = _task_timing_state.get(task_id)
+        if state is None:
+            state = {
+                "started_at": time.time(),
+                "function_name": function_name or "",
+                "lineage_observed": True,
+            }
+            _task_timing_state[task_id] = state
+            return
+        state["lineage_observed"] = True
+        if function_name and not state.get("function_name"):
+            state["function_name"] = function_name
+
+
+def _task_started_at(task_id: str) -> float | None:
+    if not task_id:
+        return None
+    with _task_timing_lock:
+        state = _task_timing_state.get(task_id)
+    if not isinstance(state, dict):
+        return None
+    started_at = state.get("started_at")
+    if isinstance(started_at, (int, float)):
+        return float(started_at)
+    return None
+
+
+def _task_function_name(task_id: str) -> str:
+    if not task_id:
+        return ""
+    with _task_timing_lock:
+        state = _task_timing_state.get(task_id)
+    if not isinstance(state, dict):
+        return ""
+    function_name = state.get("function_name")
+    return function_name if isinstance(function_name, str) else ""
+
+
+def _emit_fragment(fragment: TaskFragment) -> None:
+    session_id = str(os.environ.get("ROAR_SESSION_ID", "")).strip()
+    token = str(os.environ.get("ROAR_FRAGMENT_TOKEN", "")).strip()
+    glaas_url = str(os.environ.get("GLAAS_URL") or "").strip()
+
+    if session_id and token and glaas_url:
+        streamer = GlaasFragmentStreamer(
+            session_id=session_id,
+            token=token,
+            glaas_url=glaas_url,
+        )
+        streamer.append_fragment(fragment.to_dict())
+        streamer.close()
+        return
+
+    project_dir = str(os.environ.get("ROAR_PROJECT_DIR", "")).strip()
+    if not project_dir:
+        return
+
+    collect_fragments(
+        fragments=[fragment.to_dict()],
+        project_dir=project_dir,
+        driver_job_uid=os.environ.get("ROAR_JOB_ID"),
+    )
+
+
+def _emit_task_timing_fragment(task_id: str, *, function_name: str, exit_code: int) -> None:
+    if not task_id:
+        return
+    with _task_timing_lock:
+        state = _task_timing_state.pop(task_id, None)
+    if not isinstance(state, dict) or not state.get("lineage_observed"):
+        return
+
+    started_at = state.get("started_at")
+    if not isinstance(started_at, (int, float)):
+        return
+
+    fragment = _start_fragment(task_id, function_name or str(state.get("function_name") or ""))
+    fragment.started_at = float(started_at)
     fragment.ended_at = time.time()
+    fragment.exit_code = exit_code
     _emit_fragment(fragment)
 
 
-def _flush_current_fragment() -> None:
-    global _current_fragment, _current_task_id
+def _append_fragment_ref(fragment: TaskFragment, kind: str, ref: ArtifactRef) -> None:
+    if kind == "write":
+        fragment.writes.append(ref)
+        return
+    fragment.reads.append(ref)
+
 
-    fragment = _current_fragment
-    if fragment is None:
+def _emit_local_event_immediately(event: IOEvent) -> None:
+    streamer_instance = _ensure_direct_streamer()
+    if streamer_instance is None:
         return
 
-    _current_fragment = None
-    _current_task_id = None
-    _finalise_fragment(fragment)
+    fragment = _start_fragment(event.task_id, event.function_name)
+    ref = ArtifactRef(
+        path=event.path,
+        hash=event.hash_value,
+        hash_algorithm=event.hash_algorithm,
+        size=event.size,
+        capture_method=event.capture_method,
+    )
+    _append_fragment_ref(fragment, event.kind, ref)
+    fragment.ended_at = time.time()
 
+    with _direct_streamer_lock:
+        try:
+            streamer_instance.append_fragment(fragment.to_dict())
+            if not streamer_instance.flush():
+                _get_logger().warning(
+                    "Failed to eagerly flush Ray local event for task %s",
+                    fragment.ray_task_id,
+                )
+        except Exception as exc:
+            _get_logger().warning("Failed to eagerly append Ray local event: %s", exc)
 
-def _check_task_boundary() -> None:
-    """Called before each I/O event; rotates fragment if task changed."""
-    global _current_fragment, _current_task_id
 
-    task_id = _get_task_id()
-    actor_id = _get_actor_id()
-    attribution = _actor_attribution_mode
+def _emit_native_entries_immediately(task_id: str, entries: list[tuple[str, ArtifactRef]]) -> None:
+    if not task_id or not entries:
+        return
 
-    boundary_id = task_id
-    if attribution == "per_actor" and actor_id:
-        boundary_id = actor_id
+    _mark_task_lineage_observed(task_id)
+    streamer_instance = _ensure_direct_streamer()
+    if streamer_instance is None:
+        return
+
+    fragment = _start_fragment(task_id)
+    for kind, ref in entries:
+        _append_fragment_ref(fragment, kind, ref)
+    fragment.ended_at = time.time()
+
+    with _direct_streamer_lock:
+        try:
+            streamer_instance.append_fragment(fragment.to_dict())
+            if not streamer_instance.flush():
+                _get_logger().warning(
+                    "Failed to eagerly flush Ray native events for task %s",
+                    fragment.ray_task_id,
+                )
+        except Exception as exc:
+            _get_logger().warning("Failed to eagerly append Ray native events: %s", exc)
+
+
+def _drain_native_tracer_events() -> list[tuple[str, str, ArtifactRef]]:
+    """Drain buffered native tracer events. Called by collector thread."""
+    with _native_lock:
+        events = list(_native_events_buffer)
+        _native_events_buffer.clear()
+    return events
+
+
+def _register_native_child_pid(pid: int | None, task_id: str) -> None:
+    if not task_id or not isinstance(pid, int) or pid <= 0:
+        return
+    with _native_child_task_lock:
+        _native_child_task_ids[pid] = task_id
+
+
+def _unregister_native_child_pid(pid: int | None) -> None:
+    if not isinstance(pid, int) or pid <= 0:
+        return
+    with _native_child_task_lock:
+        _native_child_task_ids.pop(pid, None)
+
+
+def _register_native_thread_task(thread_id: int | None, task_id: str) -> None:
+    if not task_id or not isinstance(thread_id, int) or thread_id <= 0:
+        return
+    with _native_thread_task_lock:
+        _native_thread_task_ids[thread_id] = task_id
+        _recent_native_thread_task_ids.pop(thread_id, None)
+
+
+def _unregister_native_thread_task(thread_id: int | None, task_id: str | None = None) -> None:
+    if not isinstance(thread_id, int) or thread_id <= 0:
+        return
+    with _native_thread_task_lock:
+        current = _native_thread_task_ids.get(thread_id)
+        if current is None:
+            return
+        if task_id is not None and current != task_id:
+            return
+        _native_thread_task_ids.pop(thread_id, None)
+        _recent_native_thread_task_ids[thread_id] = (
+            current,
+            time.monotonic() + _RECENT_NATIVE_THREAD_BINDING_LINGER_SECONDS,
+        )
+
+
+def _recent_native_thread_task_id(thread_id: int) -> str:
+    with _native_thread_task_lock:
+        recent = _recent_native_thread_task_ids.get(thread_id)
+        if recent is None:
+            return ""
+        task_id, expires_at = recent
+        if time.monotonic() <= expires_at:
+            return task_id
+        _recent_native_thread_task_ids.pop(thread_id, None)
+        return ""
+
+
+def _bound_native_task_id_for_event(pid: int | None, thread_id: int | None) -> str:
+    if not isinstance(pid, int) or pid <= 0:
+        pid = None
+    if pid is not None and pid != os.getpid():
+        with _native_child_task_lock:
+            return _native_child_task_ids.get(pid, "")
+    if isinstance(thread_id, int) and thread_id > 0:
+        with _native_thread_task_lock:
+            current = _native_thread_task_ids.get(thread_id, "")
+        if current:
+            return current
+        return _recent_native_thread_task_id(thread_id)
+    with _native_child_task_lock:
+        return _native_child_task_ids.get(pid, "") if pid is not None else ""
+
+
+def _current_native_launch_task_id() -> str:
+    task_id = getattr(_native_task_launch_context, "task_id", "")
+    return task_id if isinstance(task_id, str) else ""
+
+
+def _flush_current_task_native_events_immediately() -> None:
+    task_id = _get_current_task_id()
+    if not task_id:
+        return
+
+    deadline = time.monotonic() + _TASK_BOUNDARY_NATIVE_FLUSH_WAIT_SECONDS
+    quiet_deadline: float | None = None
+    pending_by_task: dict[str, list[tuple[str, ArtifactRef]]] = {}
+
+    while True:
+        batch = _drain_native_tracer_events()
+        if batch:
+            for bound_task_id, kind, ref in batch:
+                resolved_task_id = bound_task_id or task_id
+                if not resolved_task_id:
+                    continue
+                pending_by_task.setdefault(resolved_task_id, []).append((kind, ref))
+            quiet_deadline = time.monotonic() + _TASK_BOUNDARY_NATIVE_QUIET_PERIOD_SECONDS
+
+        now = time.monotonic()
+        if quiet_deadline is not None:
+            if now >= quiet_deadline or now >= deadline:
+                break
+        elif now >= deadline:
+            break
+
+        remaining = deadline - now
+        if remaining <= 0:
+            break
+        time.sleep(min(_TASK_BOUNDARY_NATIVE_POLL_INTERVAL_SECONDS, remaining))
+
+    for resolved_task_id, entries in pending_by_task.items():
+        _emit_native_entries_immediately(resolved_task_id, entries)
+
+
+def _patch_subprocess_for_native_task_attribution() -> None:
+    current_popen = getattr(subprocess, "Popen", None)
+    if not isinstance(current_popen, type):
+        return
+    if getattr(current_popen, "_roar_patched", False):
+        return
+
+    class _TrackedPopen(_real_subprocess_popen):  # type: ignore[misc, valid-type]
+        _roar_patched = True
+
+        def __init__(self, *args, **kwargs):
+            super().__init__(*args, **kwargs)
+            _register_native_child_pid(self.pid, _current_native_launch_task_id())
+
+        def _roar_maybe_unregister(self, result: int | None = None) -> None:
+            if result is not None or self.returncode is not None:
+                _unregister_native_child_pid(self.pid)
+
+        def poll(self):
+            result = super().poll()
+            self._roar_maybe_unregister(result)
+            return result
+
+        def wait(self, timeout=None):
+            result = super().wait(timeout=timeout)
+            self._roar_maybe_unregister(result)
+            return result
+
+        def communicate(self, input=None, timeout=None):
+            result = super().communicate(input=input, timeout=timeout)
+            self._roar_maybe_unregister(0 if self.returncode is not None else None)
+            return result
+
+        def __exit__(self, exc_type, exc_val, exc_tb):
+            try:
+                return super().__exit__(exc_type, exc_val, exc_tb)
+            finally:
+                self._roar_maybe_unregister(0 if self.returncode is not None else None)
+
+    subprocess.Popen = _TrackedPopen  # type: ignore[misc]
+
+
+def _roar_thread_start(self, *args, **kwargs):
+    if not getattr(self, "_roar_native_thread_wrapped", False):
+        task_id = _current_native_launch_task_id()
+        if task_id:
+            original_run = self.run
+
+            @functools.wraps(original_run)
+            def _roar_run(*run_args, **run_kwargs):
+                thread_id = threading.get_native_id()
+                _register_native_thread_task(thread_id, task_id)
+                try:
+                    return original_run(*run_args, **run_kwargs)
+                finally:
+                    _unregister_native_thread_task(thread_id, task_id)
+
+            self.run = _roar_run  # type: ignore[method-assign]
+            self._roar_native_thread_wrapped = True  # type: ignore[attr-defined]
+
+    return _real_thread_start(self, *args, **kwargs)
+
+
+def _activate_threading_patch_for_native_task_attribution() -> None:
+    global _native_threading_patch_refcount
+
+    with _native_threading_patch_lock:
+        if _native_threading_patch_refcount == 0:
+            threading.Thread.start = _roar_thread_start  # type: ignore[method-assign]
+        _native_threading_patch_refcount += 1
+
+
+def _deactivate_threading_patch_for_native_task_attribution() -> None:
+    global _native_threading_patch_refcount
+
+    with _native_threading_patch_lock:
+        if _native_threading_patch_refcount <= 0:
+            _native_threading_patch_refcount = 0
+            threading.Thread.start = _real_thread_start  # type: ignore[method-assign]
+            return
+
+        _native_threading_patch_refcount -= 1
+        if _native_threading_patch_refcount == 0:
+            threading.Thread.start = _real_thread_start  # type: ignore[method-assign]
+
+
+def _wrap_task_executor_for_native_flush(
+    function: Callable[..., Any],
+    *,
+    function_name: str = "",
+) -> Callable[..., Any]:
+    if getattr(function, "_roar_native_flush_wrapped", False):
+        return function
+
+    @functools.wraps(function)
+    def _wrapped(*args, **kwargs):
+        task_id = _get_current_task_id()
+        resolved_function_name = function_name or _get_task_function_name()
+        thread_id = threading.get_native_id()
+        previous_launch_task_id = _current_native_launch_task_id()
+        _native_task_launch_context.task_id = task_id
+        _activate_threading_patch_for_native_task_attribution()
+        _register_native_thread_task(thread_id, task_id)
+        _register_task_timing(task_id, resolved_function_name)
+        exit_code = 0
+        try:
+            return function(*args, **kwargs)
+        except Exception:
+            exit_code = 1
+            raise
+        finally:
+            _deactivate_threading_patch_for_native_task_attribution()
+            if previous_launch_task_id:
+                _native_task_launch_context.task_id = previous_launch_task_id
+            else:
+                with contextlib.suppress(AttributeError):
+                    delattr(_native_task_launch_context, "task_id")
+            with contextlib.suppress(Exception):
+                _flush_current_task_native_events_immediately()
+            with contextlib.suppress(Exception):
+                _emit_task_timing_fragment(
+                    task_id,
+                    function_name=resolved_function_name,
+                    exit_code=exit_code,
+                )
+            _unregister_native_thread_task(thread_id, task_id)
+
+    cast(Any, _wrapped)._roar_native_flush_wrapped = True
+    for attr in ("name", "method"):
+        if hasattr(function, attr):
+            with contextlib.suppress(Exception):
+                setattr(_wrapped, attr, getattr(function, attr))
+    return _wrapped
+
+
+def _patch_ray_task_execution_for_native_flush() -> None:
+    try:
+        from ray._private.function_manager import FunctionActorManager, FunctionExecutionInfo
+    except Exception:
+        return
+
+    current_get_execution_info = getattr(FunctionActorManager, "get_execution_info", None)
+    if callable(current_get_execution_info) and not getattr(
+        current_get_execution_info, "_roar_patched", False
+    ):
+
+        def _roar_get_execution_info(self, job_id, function_descriptor):
+            info = current_get_execution_info(self, job_id, function_descriptor)
+            wrapped_function = _wrap_task_executor_for_native_flush(
+                info.function,
+                function_name=str(info.function_name or ""),
+            )
+            if wrapped_function is info.function:
+                return info
+
+            wrapped_info = FunctionExecutionInfo(
+                function=wrapped_function,
+                function_name=info.function_name,
+                max_calls=info.max_calls,
+            )
+
+            function_id = getattr(function_descriptor, "function_id", None)
+            if function_id is not None:
+                with contextlib.suppress(Exception):
+                    self._function_execution_info[function_id] = wrapped_info
+            return wrapped_info
+
+        _roar_get_execution_info._roar_patched = True  # type: ignore[attr-defined]
+        FunctionActorManager.get_execution_info = _roar_get_execution_info  # type: ignore[method-assign]
+
+    current_make_actor_method_executor = getattr(
+        FunctionActorManager,
+        "_make_actor_method_executor",
+        None,
+    )
+    if callable(current_make_actor_method_executor) and not getattr(
+        current_make_actor_method_executor, "_roar_patched", False
+    ):
+
+        def _roar_make_actor_method_executor(self, method_name, method):
+            wrapped_method = _wrap_task_executor_for_native_flush(
+                method,
+                function_name=str(method_name or ""),
+            )
+            return current_make_actor_method_executor(self, method_name, wrapped_method)
+
+        _roar_make_actor_method_executor._roar_patched = True  # type: ignore[attr-defined]
+        FunctionActorManager._make_actor_method_executor = _roar_make_actor_method_executor  # type: ignore[method-assign]
+
+
+def _start_native_tracer_socket() -> None:
+    """Bind to the Unix socket path set by roar_worker_wrapper.sh.
+
+    The wrapper script creates a temp directory and sets
+    ROAR_PRELOAD_TRACE_SOCK *before* exec-ing Python so that the
+    LD_PRELOAD .so caches the path on its first libc interposition.
+    We bind to that same path here — the .so will connect (or reconnect)
+    on its next I/O call.
+
+    If the env var isn't set (e.g. no wrapper), we create our own path,
+    but the .so won't find it unless it hasn't cached yet.
+    """
+    sock_path = os.environ.get("ROAR_PRELOAD_TRACE_SOCK")
+    if sock_path:
+        sock_dir = os.path.dirname(sock_path)
+    else:
+        sock_dir = tempfile.mkdtemp(prefix="roar-trace-")
+        sock_path = os.path.join(sock_dir, "trace.sock")
+        os.environ["ROAR_PRELOAD_TRACE_SOCK"] = sock_path
+
+    # Remove stale socket file if it exists (e.g. from a previous run)
+    with contextlib.suppress(FileNotFoundError):
+        os.unlink(sock_path)
+
+    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+    server.bind(sock_path)
+    server.listen(8)
+    server.settimeout(1.0)
+
+    with contextlib.suppress(Exception):
+        server.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
+
+    def _listener() -> None:
+        while not _shutdown_event.is_set():
+            try:
+                conn, _ = server.accept()
+                threading.Thread(
+                    target=_handle_preload_connection,
+                    args=(conn,),
+                    name="roar-preload-conn",
+                    daemon=True,
+                ).start()
+            except TimeoutError:
+                continue
+            except OSError:
+                break
+        server.close()
+        try:
+            os.unlink(sock_path)
+            os.rmdir(sock_dir)
+        except OSError:
+            pass
+
+    threading.Thread(
+        target=_listener,
+        name="roar-preload-listener",
+        daemon=True,
+    ).start()
+
+
+def _handle_preload_connection(conn: socket.socket) -> None:
+    """Read length-prefixed msgpack TraceEvent frames from one .so connection."""
+    conn.settimeout(1.0)
+    buf = bytearray()
+    try:
+        while not _shutdown_event.is_set():
+            try:
+                data = conn.recv(65536)
+                if not data:
+                    break
+                buf.extend(data)
+                _parse_and_buffer_frames(buf)
+            except TimeoutError:
+                continue
+            except OSError:
+                break
+    finally:
+        _parse_and_buffer_frames(buf)
+        conn.close()
+
+
+def _parse_and_buffer_frames(buf: bytearray) -> None:
+    """Extract complete frames from buffer, convert to ArtifactRef, buffer them."""
+    try:
+        import msgpack
+    except ImportError:
+        return
+
+    while len(buf) >= 4:
+        length = struct.unpack_from("<I", buf, 0)[0]
+        if len(buf) < 4 + length:
+            break
+        payload = buf[4 : 4 + length]
+        del buf[: 4 + length]
+        try:
+            event = msgpack.unpackb(payload, raw=False)
+        except Exception:
+            continue
+        if not isinstance(event, dict):
+            continue
+
+        kind_str = event.get("kind", "")
+        path = event.get("path", "")
+        if not isinstance(path, str) or not path or not _should_track_local_path(path):
+            continue
+        pid_value = event.get("pid")
+        pid = pid_value if isinstance(pid_value, int) else None
+        thread_id_value = event.get("thread_id")
+        thread_id = thread_id_value if isinstance(thread_id_value, int) else None
+
+        kind = "write" if kind_str == "write" else "read"
+        ref = ArtifactRef(
+            path=path,
+            hash=None,
+            hash_algorithm="",
+            size=0,
+            capture_method="native",
+        )
+        with _native_lock:
+            _native_events_buffer.append(
+                (_bound_native_task_id_for_event(pid, thread_id), kind, ref)
+            )
+
+
+def _parse_proxy_log_lines(lines: list[str]) -> list[tuple[str, ArtifactRef]]:
+    """Parse proxy log lines into (kind, ArtifactRef) pairs."""
+    results: list[tuple[str, ArtifactRef]] = []
+    for line in lines:
+        match = _PROXY_LOG_RE.match(line)
+        if not match:
+            continue
+        op, s3_uri, size_str, etag = match.groups()
+        if op in ("CreateMultipartUpload", "Other"):
+            continue
+        kind = "write" if op in _S3_WRITE_OPS else "read"
+        ref = ArtifactRef(
+            path=s3_uri,
+            hash=etag.strip('"') if etag else None,
+            hash_algorithm="etag" if etag else "",
+            size=int(size_str) if size_str else 0,
+            capture_method="proxy",
+        )
+        results.append((kind, ref))
+    return results
+
+
+def _is_loopback_proxy_endpoint(url: str) -> bool:
+    text = str(url).strip().lower()
+    return text.startswith("http://127.0.0.1:") or text.startswith("http://localhost:")
 
-    if boundary_id != _current_task_id:
-        if _current_fragment is not None:
-            _finalise_fragment(_current_fragment)
-        _current_task_id = boundary_id
-        if boundary_id:
-            _current_fragment = _start_fragment(boundary_id)
-        else:
-            _current_fragment = None
+
+def _local_proxy_port() -> int:
+    raw_value = str(os.environ.get("ROAR_PROXY_PORT", "")).strip()
+    if raw_value.isdigit():
+        port = int(raw_value)
+        if 1024 < port <= 65535:
+            return port
+    return _DEFAULT_LOCAL_PROXY_PORT
+
+
+def _ensure_direct_streamer() -> GlaasFragmentStreamer | None:
+    global _direct_streamer
+
+    if _direct_streamer is not None:
+        return _direct_streamer
+
+    session_id = os.environ.get("ROAR_SESSION_ID")
+    token = os.environ.get("ROAR_FRAGMENT_TOKEN")
+    glaas_url = os.environ.get("GLAAS_URL")
+    if not (session_id and token and glaas_url):
+        return None
+
+    with _direct_streamer_lock:
+        if _direct_streamer is not None:
+            return _direct_streamer
+
+        try:
+            _direct_streamer = GlaasFragmentStreamer(
+                session_id=session_id,
+                token=token,
+                glaas_url=glaas_url,
+            )
+        except Exception as exc:
+            _get_logger().warning(
+                "Failed to initialize direct Ray fragment streamer for session %s: %s",
+                session_id,
+                exc,
+            )
+            return None
+
+    return _direct_streamer
+
+
+def _start_collector() -> None:
+    global _collector_thread
+
+    if _collector_thread is not None and _collector_thread.is_alive():
+        return
+
+    _shutdown_event.clear()
+
+    def _collector_loop() -> None:
+        fragment: TaskFragment | None = None
+        events_since_flush = 0
+        last_flush = time.monotonic()
+        last_activity = last_flush
+
+        def _ensure_streamer() -> GlaasFragmentStreamer | None:
+            return _ensure_direct_streamer()
+
+        def _flush_fragment_batch(*, continuation: bool) -> None:
+            nonlocal events_since_flush, fragment, last_flush
+
+            if fragment is None or not (fragment.reads or fragment.writes):
+                if continuation and fragment is not None:
+                    fragment = _start_fragment(fragment.ray_task_id)
+                    last_flush = time.monotonic()
+                return
+
+            current_task_id = fragment.ray_task_id
+            fragment.ended_at = time.time()
+            streamer_instance = _ensure_streamer()
+            if streamer_instance is not None:
+                with _direct_streamer_lock:
+                    try:
+                        streamer_instance.append_fragment(fragment.to_dict())
+                        if not streamer_instance.flush():
+                            _get_logger().warning(
+                                "Failed to flush Ray fragment batch for task %s",
+                                fragment.ray_task_id,
+                            )
+                    except Exception as exc:
+                        _get_logger().warning("Failed to append Ray fragment: %s", exc)
+
+            fragment = _start_fragment(current_task_id) if continuation else None
+            events_since_flush = 0
+            last_flush = time.monotonic()
+
+        def _ensure_fragment(task_id: str, function_name: str = "") -> TaskFragment:
+            nonlocal fragment, events_since_flush, last_flush
+
+            normalized_task_id = task_id or ""
+            if fragment is None:
+                fragment = _start_fragment(normalized_task_id, function_name)
+                events_since_flush = 0
+                last_flush = time.monotonic()
+            elif normalized_task_id != fragment.ray_task_id:
+                _flush_fragment_batch(continuation=False)
+                fragment = _start_fragment(normalized_task_id, function_name)
+                events_since_flush = 0
+                last_flush = time.monotonic()
+            return fragment
+
+        def _process_event(event: IOEvent) -> None:
+            nonlocal events_since_flush, last_activity
+
+            _mark_task_lineage_observed(event.task_id, event.function_name)
+            current_fragment = _ensure_fragment(event.task_id, event.function_name)
+            ref = ArtifactRef(
+                path=event.path,
+                hash=event.hash_value,
+                hash_algorithm=event.hash_algorithm,
+                size=event.size,
+                capture_method=event.capture_method,
+            )
+            _append_fragment_ref(current_fragment, event.kind, ref)
+            events_since_flush += 1
+            last_activity = time.monotonic()
+
+        def _process_native_entries(entries: list[tuple[str, str, ArtifactRef]]) -> None:
+            nonlocal events_since_flush, last_activity
+
+            if not entries:
+                return
+
+            current_task_id = _get_current_task_id()
+            for bound_task_id, kind, ref in entries:
+                resolved_task_id = bound_task_id or current_task_id
+                _mark_task_lineage_observed(resolved_task_id)
+                current_fragment = _ensure_fragment(resolved_task_id)
+                _append_fragment_ref(current_fragment, kind, ref)
+                events_since_flush += 1
+            last_activity = time.monotonic()
+
+        while True:
+            if _shutdown_event.is_set():
+                try:
+                    event = _event_queue.get_nowait()
+                except queue.Empty:
+                    break
+            else:
+                try:
+                    event = _event_queue.get(timeout=0.1)
+                except queue.Empty:
+                    event = None
+
+            if event is not None:
+                _process_event(event)
+                while True:
+                    try:
+                        queued_event = _event_queue.get_nowait()
+                    except queue.Empty:
+                        break
+                    if queued_event is None:
+                        _shutdown_event.set()
+                        break
+                    _process_event(queued_event)
+            elif _shutdown_event.is_set():
+                break
+
+            native_entries = _drain_native_tracer_events()
+            if native_entries:
+                _process_native_entries(native_entries)
+
+            now = time.monotonic()
+            if events_since_flush > 0 and (
+                events_since_flush >= _FLUSH_THRESHOLD_EVENTS
+                or now - last_flush >= _FLUSH_INTERVAL_SECONDS
+                or now - last_activity >= _IDLE_FLUSH_INTERVAL_SECONDS
+            ):
+                _flush_fragment_batch(continuation=True)
+
+        while True:
+            try:
+                queued_event = _event_queue.get_nowait()
+            except queue.Empty:
+                break
+            if queued_event is None:
+                continue
+            _process_event(queued_event)
+
+        _process_native_entries(_drain_native_tracer_events())
+        _flush_fragment_batch(continuation=False)
+        _close_direct_streamer()
+
+    _collector_thread = threading.Thread(
+        target=_collector_loop,
+        name="roar-fragment-collector",
+        daemon=True,
+    )
+    _collector_thread.start()
+
+
+def _shutdown_collector() -> None:
+    _shutdown_event.set()
+    _event_queue.put(None)
+    if _collector_thread and _collector_thread.is_alive():
+        _collector_thread.join(timeout=10)
+    _close_direct_streamer()
+
+
+def _close_direct_streamer() -> None:
+    global _direct_streamer
+
+    with _direct_streamer_lock:
+        if _direct_streamer is None:
+            return
+        try:
+            _direct_streamer.close()
+        except Exception as exc:
+            _get_logger().warning("Failed to close direct Ray fragment streamer: %s", exc)
+        finally:
+            _direct_streamer = None
 
 
 def _is_write_mode(mode: str) -> bool:
@@ -197,55 +1091,66 @@ def _is_write_mode(mode: str) -> bool:
 
 
 def _should_track_local_path(path: str) -> bool:
-    return path.startswith("/shared/")
+    normalized = os.path.abspath(path)
+    return not normalized.startswith(("/proc/", "/sys/", "/dev/"))
 
 
-def _log_write(
+def _log_read(
     *,
     path: str,
     hash_value: str | None,
     hash_algorithm: str,
     size: int,
     capture_method: str,
+    task_id: str | None = None,
+    function_name: str | None = None,
 ) -> None:
-    if _current_fragment is None:
-        return
-
-    _current_fragment.writes.append(
-        ArtifactRef(
-            path=path,
-            hash=hash_value,
-            hash_algorithm=hash_algorithm,
-            size=size,
-            capture_method=capture_method,
-        )
+    resolved_task_id = task_id if task_id is not None else _resolved_task_id()
+    resolved_function_name = (
+        function_name if function_name is not None else _get_task_function_name()
     )
-    _current_fragment.ended_at = time.time()
-    _emit_fragment(_current_fragment)
+    event = IOEvent(
+        "read",
+        resolved_task_id,
+        resolved_function_name,
+        path,
+        hash_value,
+        hash_algorithm,
+        size,
+        capture_method,
+    )
+    _mark_task_lineage_observed(resolved_task_id, resolved_function_name)
+    _event_queue.put(event)
+    _emit_local_event_immediately(event)
 
 
-def _log_read(
+def _log_write(
     *,
     path: str,
     hash_value: str | None,
     hash_algorithm: str,
     size: int,
     capture_method: str,
+    task_id: str | None = None,
+    function_name: str | None = None,
 ) -> None:
-    if _current_fragment is None:
-        return
-
-    _current_fragment.reads.append(
-        ArtifactRef(
-            path=path,
-            hash=hash_value,
-            hash_algorithm=hash_algorithm,
-            size=size,
-            capture_method=capture_method,
-        )
+    resolved_task_id = task_id if task_id is not None else _resolved_task_id()
+    resolved_function_name = (
+        function_name if function_name is not None else _get_task_function_name()
     )
-    _current_fragment.ended_at = time.time()
-    _emit_fragment(_current_fragment)
+    event = IOEvent(
+        "write",
+        resolved_task_id,
+        resolved_function_name,
+        path,
+        hash_value,
+        hash_algorithm,
+        size,
+        capture_method,
+    )
+    _mark_task_lineage_observed(resolved_task_id, resolved_function_name)
+    _event_queue.put(event)
+    _emit_local_event_immediately(event)
 
 
 class _TrackedWriteFile:
@@ -320,7 +1225,9 @@ def __getattr__(self, name: str):
 
 
 def _tracking_open(*args, **kwargs):
-    _check_task_boundary()
+    if _startup_complete and not _proxy_configured:
+        _configure_local_proxy_endpoint()
+
     handle = _real_open(*args, **kwargs)
 
     raw_path = args[0] if args else kwargs.get("file")
@@ -344,6 +1251,59 @@ def _tracking_open(*args, **kwargs):
     return handle
 
 
+def _patch_pandas_parquet() -> None:
+    try:
+        import pandas as pd
+    except Exception:
+        return
+
+    original_to_parquet = getattr(pd.DataFrame, "to_parquet", None)
+    if not callable(original_to_parquet):
+        return
+    if getattr(original_to_parquet, "_roar_worker_patched", False):
+        return
+
+    def _tracked_to_parquet(self, path, *args, **kwargs):
+        result = original_to_parquet(self, path, *args, **kwargs)
+        try:
+            if isinstance(path, (str, bytes, os.PathLike)):
+                resolved = os.path.abspath(os.fspath(path))
+                if _should_track_local_path(resolved):
+                    _log_write(
+                        path=resolved,
+                        hash_value=None,
+                        hash_algorithm=_active_hash_algorithm(),
+                        size=0,
+                        capture_method="tracer",
+                    )
+        except Exception:
+            pass
+        return result
+
+    _tracked_to_parquet._roar_worker_patched = True  # type: ignore[attr-defined]
+    pd.DataFrame.to_parquet = _tracked_to_parquet
+
+
+def _patch_tempfile() -> None:
+    if getattr(tempfile, "_roar_worker_tempfile_patched", False):
+        return
+
+    real_named_temporary_file = tempfile.NamedTemporaryFile
+
+    def _tracked_named_temporary_file(*args, **kwargs):
+        handle = real_named_temporary_file(*args, **kwargs)
+        try:
+            path = os.path.abspath(os.fspath(handle.name))
+            if _should_track_local_path(path) and _is_write_mode(str(getattr(handle, "mode", ""))):
+                return _TrackedWriteFile(handle, path=path, capture_method="python")
+        except Exception:
+            pass
+        return handle
+
+    tempfile.NamedTemporaryFile = _tracked_named_temporary_file
+    tempfile._roar_worker_tempfile_patched = True  # type: ignore[attr-defined]
+
+
 def _normalize_etag(value: Any) -> str | None:
     text = _to_text(value)
     if not text:
@@ -353,7 +1313,9 @@ def _normalize_etag(value: Any) -> str | None:
     return text or None
 
 
-def _extract_bucket_key(args, kwargs) -> tuple[str | None, str | None]:
+def _extract_bucket_key(
+    args: tuple[Any, ...], kwargs: dict[str, Any]
+) -> tuple[str | None, str | None]:
     bucket = kwargs.get("Bucket")
     key = kwargs.get("Key")
     if bucket and key:
@@ -363,22 +1325,7 @@ def _extract_bucket_key(args, kwargs) -> tuple[str | None, str | None]:
     return None, None
 
 
-def _extract_upload_file_params(args, kwargs) -> tuple[str | None, str | None, str | None]:
-    filename = kwargs.get("Filename")
-    bucket = kwargs.get("Bucket")
-    key = kwargs.get("Key")
-
-    if filename is None and len(args) >= 1:
-        filename = args[0]
-    if bucket is None and len(args) >= 2:
-        bucket = args[1]
-    if key is None and len(args) >= 3:
-        key = args[2]
-
-    return _to_text(filename), _to_text(bucket), _to_text(key)
-
-
-def _body_size_bytes(body: Any) -> int:
+def _payload_size_bytes(body: Any) -> int:
     if body is None:
         return 0
     if isinstance(body, str):
@@ -386,194 +1333,316 @@ def _body_size_bytes(body: Any) -> int:
     if isinstance(body, (bytes, bytearray, memoryview)):
         return len(body)
 
-    seek = getattr(body, "seek", None)
+    try:
+        length = len(body)  # type: ignore[arg-type]
+    except Exception:
+        length = None
+    if isinstance(length, int) and length >= 0:
+        return length
+
     tell = getattr(body, "tell", None)
-    if callable(seek) and callable(tell):
+    seek = getattr(body, "seek", None)
+    if callable(tell) and callable(seek):
         try:
+            current = int(tell())
             seek(0, os.SEEK_END)
-            size_value = tell()
-            seek(0)
-            if isinstance(size_value, int):
-                return max(0, size_value)
-            return max(0, int(size_value))
+            end = int(tell())
+            seek(current, os.SEEK_SET)
+            return max(0, end)
         except Exception:
             return 0
 
     return 0
 
 
-def _wrap_s3_client(client):
-    if getattr(client, "_roar_s3_wrapped", False):
-        return client
+def _response_size_bytes(response: Any) -> int:
+    if not isinstance(response, dict):
+        return 0
+    for field in ("ContentLength", "Size", "content_length", "size"):
+        value = response.get(field)
+        if isinstance(value, int) and value >= 0:
+            return value
+        if isinstance(value, str) and value.isdigit():
+            return int(value)
+    return 0
 
-    real_put_object = getattr(client, "put_object", None)
-    if callable(real_put_object):
 
-        def _tracked_put_object(*args, **kwargs):
-            _check_task_boundary()
-            response = real_put_object(*args, **kwargs)
-            bucket, key = _extract_bucket_key(args, kwargs)
-            if bucket and key and _current_fragment is not None:
-                body = kwargs.get("Body")
-                size = _body_size_bytes(body)
-
-                _log_write(
-                    path=f"s3://{bucket}/{key}",
-                    hash_value=_normalize_etag(
-                        response.get("ETag") if isinstance(response, dict) else None
-                    ),
-                    hash_algorithm="etag",
-                    size=size,
-                    capture_method="proxy",
-                )
-            return response
+def _log_s3_write(*args, **kwargs) -> None:
+    task_id = str(kwargs.pop("_roar_task_id", "") or "")
+    function_name = str(kwargs.pop("_roar_function_name", "") or "")
+    if not task_id:
+        task_id = _resolved_task_id()
+    if not function_name:
+        function_name = _get_task_function_name()
+    if not task_id:
+        return
 
-        client.put_object = _tracked_put_object
+    bucket, key = _extract_bucket_key(args, kwargs)
+    if not bucket or not key:
+        return
 
-    real_upload_file = getattr(client, "upload_file", None)
-    if callable(real_upload_file):
+    response = kwargs.pop("_roar_response", None)
+    size = _payload_size_bytes(kwargs.get("Body"))
+    etag = _normalize_etag(response.get("ETag")) if isinstance(response, dict) else None
+    _log_write(
+        path=f"s3://{bucket}/{key}",
+        hash_value=etag,
+        hash_algorithm="etag" if etag else "",
+        size=size,
+        capture_method="proxy",
+        task_id=task_id,
+        function_name=function_name,
+    )
 
-        def _tracked_upload_file(*args, **kwargs):
-            _check_task_boundary()
-            response = real_upload_file(*args, **kwargs)
-            filename, bucket, key = _extract_upload_file_params(args, kwargs)
-            if bucket and key and _current_fragment is not None:
-                size = 0
-                if filename:
-                    try:
-                        size = max(0, int(os.path.getsize(filename)))
-                    except (OSError, ValueError, TypeError):
-                        size = 0
-
-                _log_write(
-                    path=f"s3://{bucket}/{key}",
-                    hash_value=None,
-                    hash_algorithm="etag",
-                    size=size,
-                    capture_method="proxy",
-                )
-            return response
 
-        client.upload_file = _tracked_upload_file
+def _log_s3_read(*args, **kwargs) -> None:
+    task_id = str(kwargs.pop("_roar_task_id", "") or "")
+    function_name = str(kwargs.pop("_roar_function_name", "") or "")
+    if not task_id:
+        task_id = _resolved_task_id()
+    if not function_name:
+        function_name = _get_task_function_name()
+    if not task_id:
+        return
 
-    real_get_object = getattr(client, "get_object", None)
-    if callable(real_get_object):
+    bucket, key = _extract_bucket_key(args, kwargs)
+    if not bucket or not key:
+        return
 
-        def _tracked_get_object(*args, **kwargs):
-            _check_task_boundary()
-            response = real_get_object(*args, **kwargs)
-            bucket, key = _extract_bucket_key(args, kwargs)
-            if bucket and key and _current_fragment is not None:
-                size_value = response.get("ContentLength") if isinstance(response, dict) else None
-                try:
-                    size = int(size_value) if size_value is not None else 0
-                except (TypeError, ValueError):
-                    size = 0
-
-                _log_read(
-                    path=f"s3://{bucket}/{key}",
-                    hash_value=_normalize_etag(
-                        response.get("ETag") if isinstance(response, dict) else None
-                    ),
-                    hash_algorithm="etag",
-                    size=size,
-                    capture_method="proxy",
-                )
-            return response
+    response = kwargs.pop("_roar_response", None)
+    etag = _normalize_etag(response.get("ETag")) if isinstance(response, dict) else None
+    _log_read(
+        path=f"s3://{bucket}/{key}",
+        hash_value=etag,
+        hash_algorithm="etag" if etag else "",
+        size=_response_size_bytes(response),
+        capture_method="proxy",
+        task_id=task_id,
+        function_name=function_name,
+    )
 
-        client.get_object = _tracked_get_object
 
-    client._roar_s3_wrapped = True
-    return client
+def _track_s3_api_call(
+    operation_name: str,
+    api_params: dict[str, Any],
+    response: Any,
+    *,
+    task_id: str | None = None,
+    function_name: str | None = None,
+) -> None:
+    resolved_task_id = task_id if task_id is not None else _resolved_task_id()
+    resolved_function_name = (
+        function_name if function_name is not None else _get_task_function_name()
+    )
+    if not resolved_task_id:
+        return
 
+    bucket = _to_text(api_params.get("Bucket"))
+    key = _to_text(api_params.get("Key"))
+    if not bucket or not key:
+        return
 
-def _get_collector_actor():
-    global _collector_actor
+    if operation_name in _S3_WRITE_OPS:
+        _log_s3_write(
+            Bucket=bucket,
+            Key=key,
+            Body=api_params.get("Body"),
+            _roar_response=response,
+            _roar_task_id=resolved_task_id,
+            _roar_function_name=resolved_function_name,
+        )
+        return
 
-    if _collector_actor is not None:
-        return _collector_actor
+    _log_s3_read(
+        Bucket=bucket,
+        Key=key,
+        _roar_response=response,
+        _roar_task_id=resolved_task_id,
+        _roar_function_name=resolved_function_name,
+    )
 
-    try:
-        import ray
-    except Exception:
-        _collector_actor = None
-        return None
 
-    actor_name = f"roar-log-collector-{os.environ.get('ROAR_JOB_ID', 'default')}"
-    try:
-        _collector_actor = ray.get_actor(actor_name, namespace="roar")
-    except Exception:
-        _collector_actor = None
+def _wrap_s3_client_method(client: Any, method_name: str, operation_name: str) -> None:
+    real_method = getattr(client, method_name, None)
+    if not callable(real_method) or getattr(real_method, "_roar_patched", False):
+        return
 
-    return _collector_actor
+    @functools.wraps(real_method)
+    def _wrapped(*args, **kwargs):
+        task_id = _resolved_task_id()
+        function_name = _get_task_function_name()
+        _s3_tracking_scope.active = True
+        try:
+            response = real_method(*args, **kwargs)
+        finally:
+            _s3_tracking_scope.active = False
 
+        api_params = dict(kwargs) if isinstance(kwargs, dict) else {}
+        if not api_params:
+            bucket, key = _extract_bucket_key(args, kwargs)
+            if bucket:
+                api_params["Bucket"] = bucket
+            if key:
+                api_params["Key"] = key
+        with contextlib.suppress(Exception):
+            _track_s3_api_call(
+                operation_name,
+                api_params,
+                response,
+                task_id=task_id,
+                function_name=function_name,
+            )
+        return response
+
+    _wrapped._roar_patched = True  # type: ignore[attr-defined]
+    setattr(client, method_name, _wrapped)
+
+
+def _wrap_s3_client(client: Any) -> Any:
+    if getattr(client, "_roar_s3_wrapped", False):
+        return client
 
-def _emit_fragment(fragment: TaskFragment) -> None:
-    actor = _get_collector_actor()
-    if actor is None:
-        return
+    method_operations = {
+        "put_object": "PutObject",
+        "get_object": "GetObject",
+        "delete_object": "DeleteObject",
+        "upload_part": "UploadPart",
+        "complete_multipart_upload": "CompleteMultipartUpload",
+    }
+    for method_name, operation_name in method_operations.items():
+        _wrap_s3_client_method(client, method_name, operation_name)
 
-    try:
-        append_fragment = getattr(actor, "append_fragment", None)
-        remote = getattr(append_fragment, "remote", None) if append_fragment is not None else None
-        if callable(remote):
-            remote(fragment.to_dict())
-    except Exception:
-        pass
+    client._roar_s3_wrapped = True
+    return client
 
 
 def _patch_boto3() -> None:
     try:
         import boto3
+        from botocore.client import BaseClient
     except Exception:
         return
 
-    if getattr(boto3, "_roar_worker_boto3_patched", False):
+    real_make_api_call = getattr(BaseClient, "_make_api_call", None)
+    if not callable(real_make_api_call) or getattr(real_make_api_call, "_roar_patched", False):
         return
 
-    real_client = boto3.client
+    def _tracking_make_api_call(self, operation_name, api_params):
+        response = real_make_api_call(self, operation_name, api_params)
+        if getattr(_s3_tracking_scope, "active", False):
+            return response
+        with contextlib.suppress(Exception):
+            meta = getattr(self, "meta", None)
+            service_model = getattr(meta, "service_model", None)
+            service_name = str(getattr(service_model, "service_name", "") or "").lower()
+            if service_name == "s3":
+                _track_s3_api_call(
+                    str(operation_name or ""),
+                    api_params if isinstance(api_params, dict) else {},
+                    response,
+                )
+        return response
+
+    _tracking_make_api_call._roar_patched = True  # type: ignore[attr-defined]
+    BaseClient._make_api_call = _tracking_make_api_call  # type: ignore[method-assign]
+
+    real_boto3_client = getattr(boto3, "client", None)
+    if callable(real_boto3_client) and not getattr(real_boto3_client, "_roar_patched", False):
+
+        @functools.wraps(real_boto3_client)
+        def _tracking_boto3_client(*args, **kwargs):
+            client = real_boto3_client(*args, **kwargs)
+            service_name = ""
+            if args:
+                service_name = str(args[0] or "").lower()
+            elif "service_name" in kwargs:
+                service_name = str(kwargs.get("service_name") or "").lower()
+            if service_name == "s3":
+                return _wrap_s3_client(client)
+            return client
+
+        _tracking_boto3_client._roar_patched = True  # type: ignore[attr-defined]
+        boto3.client = _tracking_boto3_client  # type: ignore[assignment]
+
+    session_client = getattr(boto3.session.Session, "client", None)
+    if callable(session_client) and not getattr(session_client, "_roar_patched", False):
 
-    def _tracking_client(service_name, *args, **kwargs):
-        client = real_client(service_name, *args, **kwargs)
-        if str(service_name).lower() != "s3":
+        @functools.wraps(session_client)
+        def _tracking_session_client(self, service_name, *args, **kwargs):
+            client = session_client(self, service_name, *args, **kwargs)
+            if str(service_name or "").lower() == "s3":
+                return _wrap_s3_client(client)
             return client
-        return _wrap_s3_client(client)
 
-    boto3.client = _tracking_client
-    boto3._roar_worker_boto3_patched = True
+        _tracking_session_client._roar_patched = True  # type: ignore[attr-defined]
+        boto3.session.Session.client = _tracking_session_client  # type: ignore[assignment]
 
 
-def _patch_pandas_parquet() -> None:
+def _configure_local_proxy_endpoint() -> None:
+    """Point S3 traffic at the local node proxy on the well-known loopback port."""
+    global _proxy_configured
+
+    if _proxy_configured:
+        return
+    port = _local_proxy_port()
+
+    # Configure the proxy endpoint.
+    upstream = str(os.environ.get("ROAR_UPSTREAM_S3_ENDPOINT", "")).strip()
+    original = str(os.environ.get("AWS_ENDPOINT_URL", "")).strip()
+    if not upstream and original and not _is_loopback_proxy_endpoint(original):
+        os.environ["ROAR_UPSTREAM_S3_ENDPOINT"] = original
+    os.environ["AWS_ENDPOINT_URL"] = f"http://127.0.0.1:{port}"
+    print(f"[roar-worker] set AWS_ENDPOINT_URL=http://127.0.0.1:{port}")
+
+    os.environ["ROAR_PROXY_PORT"] = str(port)
+    _proxy_configured = True
+    print("[roar-worker] proxy endpoint configured")
+
+
+def _configure_proxy_in_background() -> None:
+    """Compatibility wrapper for older call sites."""
+    _configure_local_proxy_endpoint()
+
+
+def _resolve_preload_library_for_worker_exec() -> str | None:
+    explicit = str(os.environ.get("ROAR_PRELOAD_LIB", "")).strip()
+    if explicit and os.path.exists(explicit):
+        return explicit
+
+    for candidate_name in ("libroar_tracer_preload.so", "libroar-tracer-preload.so"):
+        candidate = Path.cwd() / candidate_name
+        if candidate.exists():
+            return str(candidate.resolve())
+
     try:
-        import pandas as pd
+        import roar
+        from roar.services.execution.tracer_backends import find_preload_library
+
+        roar_package_dir = Path(roar.__file__).resolve().parent
+        library_path = find_preload_library(roar_package_dir)
     except Exception:
-        return
+        return None
 
-    original_to_parquet = getattr(pd.DataFrame, "to_parquet", None)
-    if not callable(original_to_parquet):
-        return
-    if getattr(original_to_parquet, "_roar_worker_patched", False):
+    if library_path and os.path.exists(library_path):
+        return str(Path(library_path).resolve())
+    return None
+
+
+def _prepare_preload_env_for_worker_exec() -> None:
+    library_path = _resolve_preload_library_for_worker_exec()
+    if not library_path:
         return
 
-    def _tracked_to_parquet(self, path, *args, **kwargs):
-        result = original_to_parquet(self, path, *args, **kwargs)
-        try:
-            _check_task_boundary()
-            if isinstance(path, (str, bytes, os.PathLike)):
-                resolved = os.path.abspath(os.fspath(path))
-                _log_write(
-                    path=resolved,
-                    hash_value=None,
-                    hash_algorithm=_active_hash_algorithm(),
-                    size=0,
-                    capture_method="tracer",
-                )
-        except Exception:
-            pass
-        return result
+    current_ld_preload = str(os.environ.get("LD_PRELOAD", "")).strip()
+    current_entries = [entry for entry in re.split(r"[\s:]+", current_ld_preload) if entry]
+    if library_path not in current_entries:
+        current_entries.insert(0, library_path)
+        os.environ["LD_PRELOAD"] = " ".join(current_entries)
 
-    _tracked_to_parquet._roar_worker_patched = True  # type: ignore[attr-defined]
-    pd.DataFrame.to_parquet = _tracked_to_parquet
+    sock_path = str(os.environ.get("ROAR_PRELOAD_TRACE_SOCK", "")).strip()
+    if not sock_path:
+        sock_dir = tempfile.mkdtemp(prefix="roar-trace-")
+        os.environ["ROAR_PRELOAD_TRACE_SOCK"] = os.path.join(sock_dir, "trace.sock")
 
 
 def _startup() -> None:
@@ -583,17 +1652,28 @@ def _startup() -> None:
         return
 
     _actor_attribution_mode = _get_actor_attribution()
+    if "libroar_tracer_preload" in os.environ.get("LD_PRELOAD", ""):
+        _start_native_tracer_socket()
     builtins.open = _tracking_open
+    _patch_subprocess_for_native_task_attribution()
     _patch_boto3()
     _patch_pandas_parquet()
-    atexit.register(_flush_current_fragment)
+    _patch_tempfile()
+    _patch_ray_task_execution_for_native_flush()
+    _start_collector()
+    atexit.register(_shutdown_collector)
     _startup_complete = True
 
+    _configure_local_proxy_endpoint()
+
 
 def _run_worker_entrypoint(argv: list[str]) -> None:
     if not argv:
         return
 
+    # Set preload vars immediately before exec so the final Python worker
+    # process, not this bootstrap helper, starts with the interposer active.
+    _prepare_preload_env_for_worker_exec()
     os.execvp("python3", ["python3", *argv])
 
 
diff --git a/roar/ray/s3_key_paths.py b/roar/ray/s3_key_paths.py
new file mode 100644
index 00000000..910a12e7
--- /dev/null
+++ b/roar/ray/s3_key_paths.py
@@ -0,0 +1,57 @@
+from __future__ import annotations
+
+_S3_KEY_PLACEHOLDER_PREFIX = "roar+s3key://"
+
+
+def build_s3_path_or_placeholder(
+    raw_value: str | None,
+    *,
+    bucket_name: str | None = None,
+    bucket_hint: str = "",
+) -> str | None:
+    text = str(raw_value or "").strip()
+    if not text:
+        return None
+    if text.startswith("s3://"):
+        return text
+
+    key = text.lstrip("/")
+    if not key:
+        return None
+
+    bucket = str(bucket_name or "").strip()
+    if bucket:
+        return f"s3://{bucket}/{key}"
+    if bucket_hint:
+        return f"{_S3_KEY_PLACEHOLDER_PREFIX}{bucket_hint}/{key}"
+    return f"{_S3_KEY_PLACEHOLDER_PREFIX}_{key}"
+
+
+def parse_s3_key_placeholder(path: str) -> tuple[str, str] | None:
+    text = str(path or "").strip()
+    if not text.startswith(_S3_KEY_PLACEHOLDER_PREFIX):
+        return None
+
+    remainder = text[len(_S3_KEY_PLACEHOLDER_PREFIX) :]
+    bucket_hint, separator, key = remainder.partition("/")
+    if not separator or not key:
+        return None
+    return bucket_hint, key
+
+
+def s3_object_key(path: str) -> str | None:
+    text = str(path or "").strip()
+    if not text:
+        return None
+
+    placeholder = parse_s3_key_placeholder(text)
+    if placeholder is not None:
+        return placeholder[1]
+
+    if not text.startswith("s3://"):
+        return None
+    remainder = text[len("s3://") :]
+    _bucket, separator, key = remainder.partition("/")
+    if not separator or not key:
+        return None
+    return key
diff --git a/roar/ray/worker.py b/roar/ray/worker.py
deleted file mode 100644
index 1839175d..00000000
--- a/roar/ray/worker.py
+++ /dev/null
@@ -1,550 +0,0 @@
-"""
-roar Ray worker setup hook.
-
-Installed via runtime_env.worker_process_setup_hook when ROAR_WRAP=1.
-Patches builtins.open to capture per-task file I/O and write events through
-either a Ray actor aggregator (default on real clusters) or filesystem logs
-when a shared volume is available.
-"""
-
-from __future__ import annotations
-
-import atexit
-import builtins
-import json
-import os
-import tempfile
-import threading
-import time
-import uuid
-from typing import Any
-
-# Captured at module import time so the hook doesn't recursively call itself.
-_real_open = builtins.open
-
-_LOG_DIR: str = ""
-_SKIP_PREFIXES: tuple[str, ...] = ()
-_BACKEND: str = "filesystem"
-_actor: Any = None
-_event_buffer: list[dict[str, Any]] = []
-_buffer_lock = threading.Lock()
-_FLUSH_THRESHOLD = 50
-_SETUP_COMPLETE = False
-
-
-def setup() -> None:
-    """
-    Called once by Ray when a new worker process starts.
-
-    Sets up the file I/O tracking shim. Writes are non-blocking:
-    each open() call appends a JSON line to the shared log dir.
-    """
-    global _BACKEND, _LOG_DIR, _SETUP_COMPLETE, _SKIP_PREFIXES
-
-    if _SETUP_COMPLETE:
-        return
-
-    _LOG_DIR = os.environ.get("ROAR_LOG_DIR", "/shared/.roar-logs")
-    _BACKEND = _choose_backend()
-    if _BACKEND == "filesystem":
-        os.makedirs(_LOG_DIR, exist_ok=True)
-
-    # Paths we must never recurse into (the log dir itself, /proc, /sys ...)
-    _SKIP_PREFIXES = (
-        _LOG_DIR,
-        "/proc/",
-        "/sys/",
-        "/dev/",
-    )
-
-    builtins.open = _tracking_open
-    _patch_tempfile()
-    if os.environ.get("ROAR_WORKER_PATCH_SDKS", "0").strip() in {"1", "true", "True"}:
-        _patch_boto3()
-        _patch_pandas()
-        _patch_pyarrow_filesystem()
-        _patch_ray_data()
-    if os.environ.get("ROAR_WORKER_CONFIGURE_PROXY", "0").strip() in {"1", "true", "True"}:
-        _configure_local_proxy_endpoint()
-    atexit.register(_flush_worker_buffer)
-    _SETUP_COMPLETE = True
-
-
-def _choose_backend() -> str:
-    configured = os.environ.get("ROAR_LOG_BACKEND")
-    if configured:
-        selected = configured.strip().lower()
-        if selected in {"actor", "filesystem"}:
-            return selected
-
-    if _shared_fs_available(_LOG_DIR):
-        return "filesystem"
-    return "actor"
-
-
-def _shared_fs_available(log_dir: str) -> bool:
-    sentinel_path = os.path.join(log_dir, f".roar-sentinel-{uuid.uuid4().hex}")
-    try:
-        os.makedirs(log_dir, exist_ok=True)
-        with _real_open(sentinel_path, "w", encoding="utf-8") as handle:
-            handle.write("ok")
-        os.unlink(sentinel_path)
-        return True
-    except OSError:
-        return False
-
-
-def _init_actor() -> None:
-    global _actor
-
-    try:
-        import ray
-    except Exception:
-        _actor = None
-        return
-
-    job_id = os.environ.get("ROAR_JOB_ID", "default")
-    actor_name = f"roar-log-collector-{job_id}"
-
-    try:
-        _actor = ray.get_actor(actor_name, namespace="roar")
-        return
-    except Exception:
-        _actor = None
-
-
-def _tracking_open(*args, **kwargs):
-    """Replacement for builtins.open that logs file access with task context."""
-    result = _real_open(*args, **kwargs)
-
-    try:
-        raw_path = args[0] if args else kwargs.get("file", "")
-        if isinstance(raw_path, (str, bytes, os.PathLike)):
-            path = os.path.abspath(_path_to_str(raw_path))
-            mode = args[1] if len(args) > 1 else kwargs.get("mode", "r")
-
-            # Skip our own log files and pseudo-filesystems.
-            if not any(path.startswith(prefix) for prefix in _SKIP_PREFIXES):
-                _log_access(path, str(mode), capture_method="python")
-    except Exception:
-        pass  # Never let tracking errors break user code
-
-    return result
-
-
-def _runtime_context_ids() -> tuple[str | None, str | None]:
-    try:
-        import ray
-
-        is_initialized = getattr(ray, "is_initialized", None)
-        if callable(is_initialized) and not is_initialized():
-            return None, None
-
-        ctx = ray.get_runtime_context()
-        task_id = _to_text(ctx.get_task_id())
-        node_id = _to_text(ctx.get_node_id())
-        return task_id, node_id
-    except Exception:
-        return None, None
-
-
-def _to_text(value: Any) -> str | None:
-    if value is None:
-        return None
-    if isinstance(value, bytes):
-        try:
-            return value.hex()
-        except Exception:
-            return value.decode("utf-8", errors="ignore")
-    text = str(value)
-    return text or None
-
-
-def _path_to_str(path: str | bytes | os.PathLike[Any]) -> str:
-    value = os.fspath(path)
-    if isinstance(value, bytes):
-        return value.decode("utf-8", errors="ignore")
-    return str(value)
-
-
-def _normalize_etag(value: Any) -> str | None:
-    text = _to_text(value)
-    if not text:
-        return None
-    if len(text) >= 2 and text[0] == text[-1] and text[0] in {'"', "'"}:
-        text = text[1:-1]
-    return text or None
-
-
-def _log_access(
-    path: str,
-    mode: str,
-    *,
-    source_type: str | None = None,
-    capture_method: str | None = None,
-    operation: str | None = None,
-    hash_value: str | None = None,
-    byte_range: str | None = None,
-) -> None:
-    """Record one file access event for later collection."""
-    task_id, node_id = _runtime_context_ids()
-    if not task_id:
-        return
-
-    payload: dict[str, Any] = {
-        "path": path,
-        "mode": mode,
-        "task_id": task_id,
-        "ts": time.time(),
-    }
-    if node_id:
-        payload["node_id"] = node_id
-    if source_type:
-        payload["source_type"] = source_type
-    if capture_method:
-        payload["capture_method"] = capture_method
-    if operation:
-        payload["operation"] = operation
-    if hash_value:
-        payload["hash"] = hash_value
-    if byte_range:
-        payload["byte_range"] = byte_range
-
-    if _BACKEND == "actor":
-        if _actor is None:
-            _init_actor()
-        if _actor is not None:
-            with _buffer_lock:
-                _event_buffer.append(payload)
-                should_flush = len(_event_buffer) >= _FLUSH_THRESHOLD
-            if should_flush or any(flag in mode for flag in ("w", "a", "x", "+")):
-                _flush_to_actor()
-            return
-
-    _write_to_file(task_id, payload)
-
-
-def _write_to_file(task_id: str, payload: dict[str, Any]) -> None:
-    log_file = os.path.join(_LOG_DIR, f"{task_id}.jsonl")
-    entry = json.dumps(payload)
-    os.makedirs(_LOG_DIR, exist_ok=True)
-    # Use _real_open so we don't recurse through our own hook.
-    with _real_open(log_file, "a", encoding="utf-8") as fh:
-        fh.write(entry + "\n")
-
-
-def _flush_to_actor() -> None:
-    if _BACKEND != "actor" or _actor is None:
-        return
-
-    with _buffer_lock:
-        if not _event_buffer:
-            return
-        batch = list(_event_buffer)
-        _event_buffer.clear()
-
-    try:
-        _actor.append_batch.remote(batch)
-    except Exception:
-        _write_batch_to_filesystem(batch)
-
-
-def _write_batch_to_filesystem(batch: list[dict[str, Any]]) -> None:
-    for payload in batch:
-        task_id = _to_text(payload.get("task_id"))
-        if task_id:
-            _write_to_file(task_id, payload)
-
-
-def _flush_worker_buffer() -> None:
-    _flush_to_actor()
-
-
-def _patch_tempfile() -> None:
-    if getattr(tempfile, "_roar_worker_tempfile_patched", False):
-        return
-
-    real_named_temporary_file = tempfile.NamedTemporaryFile
-
-    def _tracked_named_temporary_file(*args, **kwargs):
-        handle = real_named_temporary_file(*args, **kwargs)
-        try:
-            path = os.path.abspath(_path_to_str(handle.name))
-            _log_access(path, "w", capture_method="python")
-        except Exception:
-            pass
-        return handle
-
-    tempfile.NamedTemporaryFile = _tracked_named_temporary_file
-    tempfile._roar_worker_tempfile_patched = True  # type: ignore[attr-defined]
-
-
-def _configure_local_proxy_endpoint() -> None:
-    if os.environ.get("AWS_ENDPOINT_URL"):
-        return
-
-    job_id = os.environ.get("ROAR_JOB_ID")
-    if not job_id:
-        return
-
-    try:
-        import ray
-    except Exception:
-        return
-
-    try:
-        node_id = _to_text(ray.get_runtime_context().get_node_id())
-    except Exception:
-        return
-    if not node_id:
-        return
-
-    try:
-        from roar.ray._agent_names import build_node_agent_name
-
-        actor_name = build_node_agent_name(job_id, node_id)
-        agent = ray.get_actor(actor_name, namespace="roar")
-        port = ray.get(agent.get_proxy_port.remote(), timeout=5)
-    except Exception:
-        return
-
-    if isinstance(port, int) and port > 0:
-        os.environ.setdefault("AWS_ENDPOINT_URL", f"http://127.0.0.1:{port}")
-
-
-def _patch_boto3() -> None:
-    try:
-        import boto3
-    except Exception:
-        return
-
-    if getattr(boto3, "_roar_worker_boto3_patched", False):
-        return
-
-    real_client = boto3.client
-
-    def _tracking_client(service_name, *args, **kwargs):
-        client = real_client(service_name, *args, **kwargs)
-        if str(service_name).lower() != "s3":
-            return client
-        return _wrap_s3_client(client)
-
-    boto3.client = _tracking_client
-    boto3._roar_worker_boto3_patched = True
-
-
-def _wrap_s3_client(client):
-    if getattr(client, "_roar_worker_s3_wrapped", False):
-        return client
-
-    real_put_object = getattr(client, "put_object", None)
-    if callable(real_put_object):
-
-        def _tracked_put_object(*args, **kwargs):
-            response = real_put_object(*args, **kwargs)
-            bucket, key = _extract_bucket_key(args, kwargs)
-            if bucket and key:
-                _log_access(
-                    f"s3://{bucket}/{key}",
-                    "w",
-                    source_type="s3",
-                    capture_method="proxy",
-                    operation="PutObject",
-                    hash_value=_normalize_etag(
-                        response.get("ETag") if isinstance(response, dict) else None
-                    ),
-                )
-            return response
-
-        client.put_object = _tracked_put_object
-
-    real_get_object = getattr(client, "get_object", None)
-    if callable(real_get_object):
-
-        def _tracked_get_object(*args, **kwargs):
-            response = real_get_object(*args, **kwargs)
-            bucket, key = _extract_bucket_key(args, kwargs)
-            if bucket and key:
-                _log_access(
-                    f"s3://{bucket}/{key}",
-                    "r",
-                    source_type="s3",
-                    capture_method="proxy",
-                    operation="GetObject",
-                    hash_value=_normalize_etag(
-                        response.get("ETag") if isinstance(response, dict) else None
-                    ),
-                    byte_range=_to_text(kwargs.get("Range")),
-                )
-            return response
-
-        client.get_object = _tracked_get_object
-
-    client._roar_worker_s3_wrapped = True
-    return client
-
-
-def _extract_bucket_key(args, kwargs) -> tuple[str | None, str | None]:
-    bucket = kwargs.get("Bucket")
-    key = kwargs.get("Key")
-    if bucket and key:
-        return _to_text(bucket), _to_text(key)
-    if len(args) >= 2:
-        return _to_text(args[0]), _to_text(args[1])
-    return None, None
-
-
-def _patch_pandas() -> None:
-    try:
-        import pandas as pd
-    except Exception:
-        return
-
-    original_to_parquet = getattr(pd.DataFrame, "to_parquet", None)
-    if not callable(original_to_parquet):
-        return
-    if getattr(original_to_parquet, "_roar_worker_patched", False):
-        return
-
-    def _tracked_to_parquet(self, path, *args, **kwargs):
-        result = original_to_parquet(self, path, *args, **kwargs)
-        try:
-            if isinstance(path, (str, bytes, os.PathLike)):
-                resolved = os.path.abspath(_path_to_str(path))
-                if not any(resolved.startswith(prefix) for prefix in _SKIP_PREFIXES):
-                    # Treat parquet capture as tracer-level for Ray Data/Arrow parity.
-                    _log_access(resolved, "w", capture_method="tracer")
-        except Exception:
-            pass
-        return result
-
-    _tracked_to_parquet._roar_worker_patched = True  # type: ignore[attr-defined]
-    pd.DataFrame.to_parquet = _tracked_to_parquet
-
-
-def _patch_pyarrow_filesystem() -> None:
-    """
-    Capture Arrow filesystem reads/writes done by Ray Data worker internals.
-
-    Ray Data relies heavily on pyarrow C++ file IO paths that bypass Python's
-    builtins.open. Wrapping filesystem stream open methods closes that gap.
-    """
-    try:
-        import pyarrow.fs as pafs
-    except Exception:
-        return
-
-    if getattr(pafs, "_roar_worker_fs_patched", False):
-        return
-
-    wrappers = {
-        "open_input_file": "r",
-        "open_input_stream": "r",
-        "open_output_stream": "w",
-        "open_append_stream": "a",
-    }
-
-    for method_name, mode in wrappers.items():
-        original_method = getattr(pafs.FileSystem, method_name, None)
-        if not callable(original_method):
-            continue
-
-        def _make_wrapper(original, mode_value):
-            def _wrapped(self, path, *args, **kwargs):
-                result = original(self, path, *args, **kwargs)
-                _log_arrow_access(path, mode_value)
-                return result
-
-            _wrapped._roar_worker_patched = True
-            return _wrapped
-
-        try:
-            setattr(pafs.FileSystem, method_name, _make_wrapper(original_method, mode))
-        except Exception:
-            continue
-
-    pafs._roar_worker_fs_patched = True
-
-
-def _patch_ray_data() -> None:
-    """
-    Fallback capture for Ray Data APIs when Arrow filesystem monkeypatching
-    is unavailable in the worker runtime.
-    """
-    try:
-        import ray.data as ray_data
-    except Exception:
-        return
-
-    if getattr(ray_data, "_roar_worker_ray_data_patched", False):
-        return
-
-    for method_name, mode in (
-        ("read_csv", "r"),
-        ("read_parquet", "r"),
-        ("read_json", "r"),
-        ("read_text", "r"),
-    ):
-        original_method = getattr(ray_data, method_name, None)
-        if not callable(original_method) or getattr(original_method, "_roar_worker_patched", False):
-            continue
-
-        def _make_read_wrapper(original, mode_value):
-            def _wrapped(paths, *args, **kwargs):
-                result = original(paths, *args, **kwargs)
-                for path in _iter_data_paths(paths):
-                    _log_arrow_access(path, mode_value)
-                return result
-
-            _wrapped._roar_worker_patched = True
-            return _wrapped
-
-        setattr(ray_data, method_name, _make_read_wrapper(original_method, mode))
-
-    try:
-        from ray.data.dataset import Dataset
-    except Exception:
-        ray_data._roar_worker_ray_data_patched = True  # type: ignore[attr-defined]
-        return
-
-    for method_name, mode in (
-        ("write_parquet", "w"),
-        ("write_csv", "w"),
-        ("write_json", "w"),
-    ):
-        original_method = getattr(Dataset, method_name, None)
-        if not callable(original_method) or getattr(original_method, "_roar_worker_patched", False):
-            continue
-
-        def _make_write_wrapper(original, mode_value):
-            def _wrapped(self, path, *args, **kwargs):
-                result = original(self, path, *args, **kwargs)
-                _log_arrow_access(path, mode_value)
-                return result
-
-            _wrapped._roar_worker_patched = True
-            return _wrapped
-
-        setattr(Dataset, method_name, _make_write_wrapper(original_method, mode))
-
-    ray_data._roar_worker_ray_data_patched = True  # type: ignore[attr-defined]
-
-
-def _iter_data_paths(paths: Any) -> list[str]:
-    if isinstance(paths, (str, bytes, os.PathLike)):
-        return [_path_to_str(paths)]
-    if isinstance(paths, (list, tuple, set)):
-        return [_path_to_str(path) for path in paths if isinstance(path, (str, bytes, os.PathLike))]
-    return []
-
-
-def _log_arrow_access(path: Any, mode: str) -> None:
-    if not isinstance(path, (str, bytes, os.PathLike)):
-        return
-    try:
-        resolved = os.path.abspath(_path_to_str(path))
-    except Exception:
-        return
-    if any(resolved.startswith(prefix) for prefix in _SKIP_PREFIXES):
-        return
-    _log_access(resolved, mode, capture_method="tracer")
diff --git a/roar/services/execution/__init__.py b/roar/services/execution/__init__.py
index 4e239e4d..2dd0c9a7 100644
--- a/roar/services/execution/__init__.py
+++ b/roar/services/execution/__init__.py
@@ -1,12 +1,7 @@
 """Execution services for roar run/build commands."""
 
-from .args import RunArgumentParser
-from .coordinator import RunCoordinator
-from .dag_resolver import DAGReferenceResolver
-from .execution_service import ExecutionRequest, ExecutionService, GitValidationResult
-from .proxy import ProxyService
-from .signal_handler import ProcessSignalHandler
-from .tracer import TracerService
+from importlib import import_module
+from typing import Any
 
 __all__ = [
     "DAGReferenceResolver",
@@ -19,3 +14,31 @@
     "RunCoordinator",
     "TracerService",
 ]
+
+_LAZY_IMPORTS = {
+    "DAGReferenceResolver": (".dag_resolver", "DAGReferenceResolver"),
+    "ExecutionRequest": (".execution_service", "ExecutionRequest"),
+    "ExecutionService": (".execution_service", "ExecutionService"),
+    "GitValidationResult": (".execution_service", "GitValidationResult"),
+    "ProcessSignalHandler": (".signal_handler", "ProcessSignalHandler"),
+    "ProxyService": (".proxy", "ProxyService"),
+    "RunArgumentParser": (".args", "RunArgumentParser"),
+    "RunCoordinator": (".coordinator", "RunCoordinator"),
+    "TracerService": (".tracer", "TracerService"),
+}
+
+
+def __getattr__(name: str) -> Any:
+    try:
+        module_name, attr_name = _LAZY_IMPORTS[name]
+    except KeyError as exc:
+        raise AttributeError(f"module {__name__!r} has no attribute {name!r}") from exc
+
+    module = import_module(module_name, __name__)
+    value = getattr(module, attr_name)
+    globals()[name] = value
+    return value
+
+
+def __dir__() -> list[str]:
+    return sorted(set(globals()) | set(__all__))
diff --git a/roar/services/execution/coordinator.py b/roar/services/execution/coordinator.py
index a2e983e9..3beb4338 100644
--- a/roar/services/execution/coordinator.py
+++ b/roar/services/execution/coordinator.py
@@ -120,6 +120,9 @@ def execute(self, ctx: RunContext) -> RunResult:
                     upstream_url=existing_endpoint,
                 )
                 extra_env = {"AWS_ENDPOINT_URL": f"http://127.0.0.1:{proxy_handle.port}"}
+                # Preserve the real upstream for cluster-side proxies (ray job submit).
+                if existing_endpoint:
+                    extra_env["ROAR_UPSTREAM_S3_ENDPOINT"] = existing_endpoint
                 self.logger.debug("Proxy started on port %d", proxy_handle.port)
             except Exception as e:
                 self.logger.warning("Failed to start proxy: %s", e)
diff --git a/roar/services/execution/inject/sitecustomize.py b/roar/services/execution/inject/sitecustomize.py
index 4d7e152c..0b94690a 100644
--- a/roar/services/execution/inject/sitecustomize.py
+++ b/roar/services/execution/inject/sitecustomize.py
@@ -3,6 +3,7 @@
 import contextlib
 import json
 import os
+import subprocess
 import sys
 import tempfile
 import threading
@@ -69,12 +70,28 @@ def tracking_open(*args, **kwargs):
 # ------------------------------------------------------------------------------
 _real_import = builtins.__import__
 _ray_patched = False
+_driver_phase_s3_clients_patched = False
 
 
 def tracking_import(name, globals=None, locals=None, fromlist=(), level=0):
-    global _ray_patched
+    global _driver_phase_s3_clients_patched, _ray_patched
     imported_modules.add(name)
     module = _real_import(name, globals, locals, fromlist, level)
+    if (
+        not _driver_phase_s3_clients_patched
+        and os.environ.get("ROAR_DRIVER_PHASE_PROXY_URL")
+        and (
+            name == "boto3"
+            or name.startswith("boto3.")
+            or name == "botocore"
+            or name.startswith("botocore.")
+        )
+    ):
+        try:
+            _patch_driver_phase_s3_clients()
+            _driver_phase_s3_clients_patched = True
+        except Exception:
+            pass
     if (
         not _ray_patched
         and os.environ.get("ROAR_WRAP") == "1"
@@ -239,17 +256,21 @@ def _write_log():
 # Ray integration (active only when ROAR_WRAP=1)
 # ------------------------------------------------------------------------------
 
-_DEFAULT_RAY_LOG_DIR = "/shared/.roar-logs"
 _DEFAULT_RAY_NODE_POLL_INTERVAL_SECONDS = 5.0
 _NODE_AGENT_RESOURCE_FRACTION = 0.0001
 _WORKER_SETUP_HOOK_ENV_VAR = "__RAY_WORKER_PROCESS_SETUP_HOOK_ENV_VAR"
 _WORKER_SETUP_HOOK = "roar.ray.roar_worker._startup"
+_WORKER_PY_EXECUTABLE = "roar-worker"
 _ray_node_poller_lock = threading.Lock()
 _ray_node_poller_stop = threading.Event()
 _ray_node_poller_thread = None
 _ray_node_agents_lock = threading.Lock()
 _ray_node_agents = {}
 _ray_collect_pre_shutdown_registered = False
+_real_subprocess_popen = subprocess.Popen
+_driver_phase_subprocess_patched = False
+_driver_phase_counter = 0
+_driver_phase_counter_lock = threading.Lock()
 
 
 def _patch_ray_init(ray_module) -> None:
@@ -264,6 +285,35 @@ def _roar_ray_init(*args, **kwargs):
         if not ray_config["enabled"]:
             return _real_ray_init(*args, **kwargs)
 
+        if os.environ.get("ROAR_JOB_INSTRUMENTED") == "1":
+            runtime_env = dict(kwargs.pop("runtime_env", None) or {})
+            submitted_job_id = (
+                os.environ.get("ROAR_JOB_ID")
+                or dict(runtime_env.get("env_vars", {}) or {}).get("ROAR_JOB_ID")
+                or uuid.uuid4().hex[:8]
+            )
+            kwargs["runtime_env"] = _sanitize_worker_runtime_env_for_ray(
+                ray_module,
+                _prepare_instrumented_job_worker_runtime_env(runtime_env, str(submitted_job_id)),
+            )
+            result = _real_ray_init(*args, **kwargs)
+            _register_pre_shutdown_ray_collection()
+
+            # ROAR_JOB_ID is injected by _ray_job_submit.py into runtime_env env_vars,
+            # so both the driver and all workers see the same value.
+            if _node_agents_enabled():
+                # Spawn node agents synchronously so proxies are ready before
+                # any tasks execute. Workers connect to the proxy via the
+                # well-known port injected in AWS_ENDPOINT_URL.
+                try:
+                    print(f"[roar] spawning node agents for job {submitted_job_id}")
+                    _spawn_node_agents(ray_module, str(submitted_job_id))
+                    print(f"[roar] node agents spawned (count={len(_ray_node_agents)})")
+                except Exception as exc:
+                    print(f"[roar] WARNING: _spawn_node_agents failed: {exc}")
+                _start_ray_node_poller(ray_module)
+            return result
+
         runtime_env = dict(kwargs.pop("runtime_env", None) or {})
         env_vars = dict(runtime_env.get("env_vars", {}) or {})
         if ray_config["pip_install"]:
@@ -271,41 +321,30 @@ def _roar_ray_init(*args, **kwargs):
         else:
             runtime_env.pop("pip", None)
 
-        job_id = os.environ.get("ROAR_JOB_ID") or env_vars.get("ROAR_JOB_ID")
+        job_id = (
+            os.environ.get("ROAR_JOB_ID")
+            or env_vars.get("ROAR_JOB_ID")
+            or os.environ.get("RAY_JOB_ID")
+        )
         if not job_id:
             job_id = uuid.uuid4().hex[:8]
         job_id = str(job_id)
         driver_job_uid = str(os.environ.get("ROAR_JOB_ID", ""))
-
-        env_vars["ROAR_WORKER"] = "1"
-        env_vars["ROAR_LOG_DIR"] = ray_config["log_dir"]
-        env_vars["ROAR_LOG_BACKEND"] = "actor"
-        env_vars["ROAR_JOB_ID"] = job_id
-        env_vars["ROAR_DRIVER_JOB_UID"] = driver_job_uid
-        os.environ.setdefault("ROAR_LOG_DIR", ray_config["log_dir"])
         os.environ.setdefault("ROAR_JOB_ID", job_id)
-        for key in (
-            "AWS_ENDPOINT_URL",
-            "AWS_ACCESS_KEY_ID",
-            "AWS_SECRET_ACCESS_KEY",
-            "AWS_SESSION_TOKEN",
-            "AWS_DEFAULT_REGION",
-            "AWS_REGION",
-        ):
-            value = os.environ.get(key)
-            if value:
-                env_vars.setdefault(key, value)
-        runtime_env["env_vars"] = env_vars
+        runtime_env["env_vars"] = _prepare_worker_env_vars(
+            runtime_env.get("env_vars", {}),
+            job_id=job_id,
+            driver_job_uid=driver_job_uid,
+        )
         runtime_env = _prepare_worker_runtime_env(runtime_env, job_id)
         runtime_env = _sanitize_worker_runtime_env_for_ray(ray_module, runtime_env)
         kwargs["runtime_env"] = runtime_env
         result = _real_ray_init(*args, **kwargs)
         _register_pre_shutdown_ray_collection()
-        _ensure_collector_actor(ray_module, job_id)
         if _node_agents_enabled():
             threading.Thread(
                 target=_spawn_node_agents,
-                args=(ray_module, job_id, str(ray_config["log_dir"])),
+                args=(ray_module, job_id),
                 name="roar-ray-node-agent-bootstrap",
                 daemon=True,
             ).start()
@@ -315,59 +354,420 @@ def _roar_ray_init(*args, **kwargs):
     ray_module.init = _roar_ray_init
 
 
+def _prepare_instrumented_job_worker_runtime_env(runtime_env, job_id: str):
+    """Augment worker runtime_env inside a pre-instrumented Ray submit job.
+
+    Keep submit-time `pip` and `working_dir` untouched to avoid runtime-env
+    merge conflicts with the job-level runtime_env supplied by `ray job submit`.
+    The submit rewrite already injects env_vars and the setup hook at the
+    job level. Nested `ray.init()` only needs the roar worker executable so
+    preload can activate before the final Python exec.
+    """
+    del job_id
+    runtime_env = dict(runtime_env or {})
+    runtime_env["py_executable"] = _WORKER_PY_EXECUTABLE
+    return runtime_env
+
+
 def _patch_ray_shutdown(ray_module) -> None:
     real_ray_shutdown = getattr(ray_module, "shutdown", None)
     if not callable(real_ray_shutdown):
         return
+    if getattr(real_ray_shutdown, "_roar_patched", False):
+        return
 
     def _roar_ray_shutdown(*args, **kwargs):
         proxy_logs = _collect_node_agent_logs(ray_module) if _node_agents_enabled() else {}
         _collect_ray_io(proxy_logs=proxy_logs)
         return real_ray_shutdown(*args, **kwargs)
 
+    _roar_ray_shutdown._roar_patched = True  # type: ignore[attr-defined]
     ray_module.shutdown = _roar_ray_shutdown
 
 
-def _ensure_collector_actor(ray_module, job_id: str) -> None:
-    actor_name = f"roar-log-collector-{job_id}"
+def _phase_capture_enabled() -> bool:
+    raw = os.environ.get("ROAR_DRIVER_PHASE_CAPTURE", "").strip().lower()
+    return raw in {"1", "true", "yes", "on"}
+
 
-    try:
-        ray_module.get_actor(actor_name, namespace="roar")
+def _patch_driver_phase_s3_clients() -> None:
+    import boto3
+
+    phase_proxy_url = str(os.environ.get("ROAR_DRIVER_PHASE_PROXY_URL", "")).strip()
+    if not phase_proxy_url:
         return
-    except Exception:
-        pass
 
+    current_endpoint = str(os.environ.get("AWS_ENDPOINT_URL", "")).strip()
+    session_client = getattr(boto3.session.Session, "client", None)
+    if callable(session_client) and not getattr(
+        session_client, "_roar_driver_phase_patched", False
+    ):
+        real_session_client = session_client
+
+        def _session_client(self, service_name, *args, **kwargs):
+            if service_name == "s3":
+                requested_endpoint = str(kwargs.get("endpoint_url") or "").strip()
+                if not requested_endpoint or requested_endpoint == current_endpoint:
+                    kwargs["endpoint_url"] = phase_proxy_url
+            return real_session_client(self, service_name, *args, **kwargs)
+
+        _session_client._roar_driver_phase_patched = True  # type: ignore[attr-defined]
+        boto3.session.Session.client = _session_client
+
+    root_client = getattr(boto3, "client", None)
+    if callable(root_client) and not getattr(root_client, "_roar_driver_phase_patched", False):
+        real_root_client = root_client
+
+        def _client(service_name, *args, **kwargs):
+            if service_name == "s3":
+                requested_endpoint = str(kwargs.get("endpoint_url") or "").strip()
+                if not requested_endpoint or requested_endpoint == current_endpoint:
+                    kwargs["endpoint_url"] = phase_proxy_url
+            return real_root_client(service_name, *args, **kwargs)
+
+        _client._roar_driver_phase_patched = True  # type: ignore[attr-defined]
+        boto3.client = _client
+
+
+def _next_driver_phase_counter() -> int:
+    global _driver_phase_counter
+    with _driver_phase_counter_lock:
+        _driver_phase_counter += 1
+        return _driver_phase_counter
+
+
+def _coerce_subprocess_argv(args) -> list[str] | None:
+    if not isinstance(args, (list, tuple)):
+        return None
+    return [str(item) for item in args if item is not None]
+
+
+def _extract_state_file_arg(argv: list[str]) -> str | None:
+    for index, value in enumerate(argv):
+        if value == "--state-file" and index + 1 < len(argv):
+            return argv[index + 1]
+        if value.startswith("--state-file="):
+            return value.split("=", 1)[1]
+    return None
+
+
+def _extract_phase_label(argv: list[str]) -> str | None:
+    candidate = ""
+    if len(argv) >= 3 and os.path.basename(argv[0]).startswith("python") and argv[1] == "-m":
+        candidate = argv[2].split(".")[-1]
+    elif len(argv) >= 2 and os.path.basename(argv[0]).startswith("python"):
+        candidate = os.path.basename(argv[1])
+    elif argv:
+        candidate = os.path.basename(argv[0])
+
+    candidate = os.path.splitext(candidate)[0]
+    if candidate.startswith("run_"):
+        candidate = candidate[len("run_") :]
+    candidate = candidate.strip().lower()
+    return candidate or None
+
+
+def _load_json_file(path: str | None) -> dict:
+    if not path:
+        return {}
     try:
-        from roar.ray.actor import RoarLogCollectorActor
+        with _real_open(path, "r", encoding="utf-8") as handle:
+            payload = json.load(handle)
+    except Exception:
+        return {}
+    return payload if isinstance(payload, dict) else {}
+
+
+def _coerce_phase_timestamp(value: object) -> float | None:
+    if isinstance(value, bool):
+        return None
+    if isinstance(value, (int, float)):
+        return float(value)
+    if isinstance(value, str):
+        text = value.strip()
+        if not text:
+            return None
+        try:
+            return float(text)
+        except ValueError:
+            return None
+    return None
+
+
+def _resolve_driver_phase_timestamp(
+    phase_label: str,
+    state: dict,
+    *,
+    suffix: str,
+) -> float | None:
+    normalized_phase = str(phase_label or "").strip().lower()
+    for key in (
+        f"{normalized_phase}_{suffix}" if normalized_phase else "",
+        f"phase_{suffix}",
+    ):
+        if not key:
+            continue
+        resolved = _coerce_phase_timestamp(state.get(key))
+        if resolved is not None:
+            return resolved
+    return None
+
 
-        session_id = os.environ.get("ROAR_SESSION_ID")
-        fragment_token = os.environ.get("ROAR_FRAGMENT_TOKEN")
-        glaas_url = os.environ.get("GLAAS_URL") or os.environ.get("GLAAS_API_URL")
+def _append_driver_phase_state_refs(
+    fragment, phase_label: str, pre_state: dict, post_state: dict, env
+) -> None:
+    from roar.ray.fragment import ArtifactRef
+    from roar.ray.s3_key_paths import build_s3_path_or_placeholder
 
-        actor_options = RoarLogCollectorActor.options(
-            name=actor_name,
-            namespace="roar",
-            lifetime="detached",
-            num_cpus=0,
+    seen_reads = {str(ref.path) for ref in fragment.reads}
+    seen_writes = {str(ref.path) for ref in fragment.writes}
+
+    def _bucket_path(bucket_key: str, key: str | None) -> str | None:
+        bucket = str(env.get(bucket_key) or os.environ.get(bucket_key) or "").strip()
+        return build_s3_path_or_placeholder(
+            key,
+            bucket_name=bucket,
+            bucket_hint=bucket_key,
         )
 
-        if session_id and fragment_token and glaas_url:
-            actor = actor_options.remote(
-                session_id=session_id,
-                token=fragment_token,
-                glaas_url=glaas_url,
+    def _append_read(path: str | None) -> None:
+        if not path or path in seen_reads:
+            return
+        fragment.reads.append(
+            ArtifactRef(
+                path=path,
+                hash=None,
+                hash_algorithm="",
+                size=0,
+                capture_method="python",
             )
-        else:
-            actor = actor_options.remote()
-
-        get_fn = getattr(ray_module, "get", None)
-        if callable(get_fn):
-            get_all = getattr(actor, "get_all", None)
-            remote = getattr(get_all, "remote", None) if get_all is not None else None
-            if callable(remote):
-                get_fn(remote(), timeout=10)
-    except Exception:
-        pass
+        )
+        seen_reads.add(path)
+
+    def _append_write(path: str | None) -> None:
+        if not path or path in seen_writes:
+            return
+        fragment.writes.append(
+            ArtifactRef(
+                path=path,
+                hash=None,
+                hash_algorithm="",
+                size=0,
+                capture_method="python",
+            )
+        )
+        seen_writes.add(path)
+
+    pre_shards = [str(item) for item in pre_state.get("shard_keys", []) if item]
+    post_shards = [str(item) for item in post_state.get("shard_keys", []) if item]
+    pre_processed_key = str(pre_state.get("processed_key") or "").strip()
+    post_processed_key = str(post_state.get("processed_key") or "").strip()
+    pre_model_key = str(pre_state.get("model_key") or "").strip()
+    post_model_key = str(post_state.get("model_key") or "").strip()
+    pre_metrics_key = str(pre_state.get("metrics_key") or "").strip()
+    post_metrics_key = str(post_state.get("metrics_key") or "").strip()
+    pre_report_key = str(pre_state.get("report_key") or "").strip()
+    post_report_key = str(post_state.get("report_key") or "").strip()
+
+    if phase_label in {"training", "evaluation"}:
+        for shard_key in pre_shards:
+            _append_read(_bucket_path("S3_DATA_BUCKET", shard_key))
+
+    if phase_label == "training" and pre_processed_key:
+        _append_read(_bucket_path("S3_DATA_BUCKET", pre_processed_key))
+
+    if phase_label == "evaluation" and pre_model_key:
+        _append_read(_bucket_path("S3_MODELS_BUCKET", pre_model_key))
+
+    if post_shards and post_shards != pre_shards:
+        for shard_key in post_shards:
+            _append_write(_bucket_path("S3_DATA_BUCKET", shard_key))
+
+    if post_processed_key and post_processed_key != pre_processed_key:
+        _append_write(_bucket_path("S3_DATA_BUCKET", post_processed_key))
+
+    if post_model_key and post_model_key != pre_model_key:
+        _append_write(_bucket_path("S3_MODELS_BUCKET", post_model_key))
+
+    if post_metrics_key and post_metrics_key != pre_metrics_key:
+        _append_write(_bucket_path("S3_RESULTS_BUCKET", post_metrics_key))
+
+    if post_report_key and post_report_key != pre_report_key:
+        _append_write(_bucket_path("S3_RESULTS_BUCKET", post_report_key))
+
+
+def _build_driver_phase_capture(args, kwargs):
+    if not _phase_capture_enabled() or kwargs.get("shell"):
+        return None
+
+    argv = _coerce_subprocess_argv(args)
+    if not argv:
+        return None
+
+    state_file = _extract_state_file_arg(argv)
+    if not state_file:
+        return None
+
+    phase_label = _extract_phase_label(argv)
+    if not phase_label:
+        return None
+
+    return {
+        "phase_label": phase_label,
+        "state_file": state_file,
+        "phase_index": _next_driver_phase_counter(),
+        "pre_state": _load_json_file(state_file),
+        "service": None,
+        "handle": None,
+        "env": {},
+        "finalized": False,
+    }
+
+
+def _emit_driver_phase_fragment(
+    capture: dict, *, exit_code: int, started_at: float, ended_at: float
+) -> None:
+    from roar.ray.fragment import TaskFragment, derive_task_uid
+    from roar.ray.proxy_fragments import build_proxy_fragment, emit_fragment
+
+    service = capture.get("service")
+    handle = capture.get("handle")
+    entries = []
+    if service is not None and handle is not None:
+        entries = service.stop_for_run(handle)
+
+    phase_label = str(capture.get("phase_label") or "")
+    phase_index = int(capture.get("phase_index") or 0)
+    task_id = f"driver_phase:{phase_label}:{phase_index}"
+    roar_job_id = str(os.environ.get("ROAR_JOB_ID", "default"))
+    post_state = _load_json_file(str(capture.get("state_file") or ""))
+    resolved_started_at = (
+        _resolve_driver_phase_timestamp(phase_label, post_state, suffix="started_at") or started_at
+    )
+    resolved_ended_at = (
+        _resolve_driver_phase_timestamp(phase_label, post_state, suffix="ended_at") or ended_at
+    )
+    if resolved_ended_at < resolved_started_at:
+        resolved_ended_at = max(float(ended_at), resolved_started_at)
+
+    fragment = build_proxy_fragment(
+        entries,
+        function_name=phase_label,
+        task_id=task_id,
+        parent_job_uid=roar_job_id,
+        started_at=resolved_started_at,
+        ended_at=resolved_ended_at,
+        exit_code=exit_code,
+        recorded_at=resolved_ended_at,
+    )
+    if fragment is None:
+        fragment = TaskFragment(
+            job_uid=derive_task_uid(roar_job_id, task_id),
+            parent_job_uid=roar_job_id,
+            ray_task_id=task_id,
+            ray_worker_id="",
+            ray_node_id="driver",
+            ray_actor_id=None,
+            function_name=phase_label,
+            started_at=resolved_started_at,
+            ended_at=resolved_ended_at,
+            exit_code=exit_code,
+            recorded_at=resolved_ended_at,
+        )
+
+    _append_driver_phase_state_refs(
+        fragment,
+        phase_label=phase_label,
+        pre_state=capture.get("pre_state") or {},
+        post_state=post_state,
+        env=capture.get("env") or {},
+    )
+    if fragment.reads or fragment.writes:
+        emit_fragment(fragment)
+
+
+def _patch_driver_phase_subprocess_capture() -> None:
+    global _driver_phase_subprocess_patched
+
+    if _driver_phase_subprocess_patched:
+        return
+
+    class _TrackedDriverPhasePopen(_real_subprocess_popen):  # type: ignore[misc, valid-type]
+        def __init__(self, args, *popen_args, **popen_kwargs):
+            capture = _build_driver_phase_capture(args, popen_kwargs)
+            self._roar_phase_capture = capture
+            self._roar_phase_started_at = None
+
+            if capture is not None:
+                child_env = dict(popen_kwargs.get("env") or os.environ)
+                capture["env"] = child_env
+                try:
+                    from roar.services.execution.proxy import ProxyService
+
+                    service = ProxyService()
+                    handle = service.start_for_run(
+                        session_id=str(os.environ.get("ROAR_SESSION_ID", "")).strip() or None,
+                        job_id=str(os.environ.get("ROAR_JOB_ID", "")).strip() or None,
+                        upstream_url=str(os.environ.get("ROAR_UPSTREAM_S3_ENDPOINT", "")).strip()
+                        or None,
+                    )
+                    capture["service"] = service
+                    capture["handle"] = handle
+                    child_env["ROAR_DRIVER_PHASE_PROXY_URL"] = f"http://127.0.0.1:{handle.port}"
+                except Exception as exc:
+                    _warn_roar(
+                        "Failed to start driver phase proxy for %s: %s",
+                        capture["phase_label"],
+                        exc,
+                    )
+                popen_kwargs["env"] = child_env
+
+            super().__init__(args, *popen_args, **popen_kwargs)
+            if capture is not None:
+                self._roar_phase_started_at = __import__("time").time()
+
+        def _roar_finalize(self) -> None:
+            capture = getattr(self, "_roar_phase_capture", None)
+            if not isinstance(capture, dict) or capture.get("finalized"):
+                return
+            returncode = _real_subprocess_popen.poll(self)
+            if returncode is None:
+                return
+            capture["finalized"] = True
+            started_at = float(self._roar_phase_started_at or __import__("time").time())
+            ended_at = __import__("time").time()
+            try:
+                _emit_driver_phase_fragment(
+                    capture,
+                    exit_code=int(returncode),
+                    started_at=started_at,
+                    ended_at=ended_at,
+                )
+            except Exception as exc:
+                _warn_roar(
+                    "Failed to emit driver phase lineage for %s: %s",
+                    capture.get("phase_label"),
+                    exc,
+                )
+
+        def wait(self, *args, **kwargs):
+            result = super().wait(*args, **kwargs)
+            self._roar_finalize()
+            return result
+
+        def communicate(self, *args, **kwargs):
+            result = super().communicate(*args, **kwargs)
+            self._roar_finalize()
+            return result
+
+        def poll(self):
+            result = super().poll()
+            if result is not None:
+                self._roar_finalize()
+            return result
+
+    subprocess.Popen = _TrackedDriverPhasePopen  # type: ignore[misc]
+    _driver_phase_subprocess_patched = True
 
 
 def _node_agents_enabled() -> bool:
@@ -375,43 +775,95 @@ def _node_agents_enabled() -> bool:
     return raw in {"1", "true", "yes", "on"}
 
 
+def _prepare_worker_env_vars(
+    existing_env_vars,
+    *,
+    job_id: str,
+    driver_job_uid: str,
+):
+    env_vars = dict(existing_env_vars or {})
+    env_vars["ROAR_JOB_ID"] = job_id
+    env_vars["ROAR_DRIVER_JOB_UID"] = str(driver_job_uid)
+
+    for key in (
+        "AWS_ENDPOINT_URL",
+        "AWS_ACCESS_KEY_ID",
+        "AWS_SECRET_ACCESS_KEY",
+        "AWS_SESSION_TOKEN",
+        "AWS_DEFAULT_REGION",
+        "AWS_REGION",
+        "ROAR_PROXY_PORT",
+        "ROAR_UPSTREAM_S3_ENDPOINT",
+        "ROAR_PROJECT_DIR",
+        "ROAR_JOB_INSTRUMENTED",
+        "ROAR_RAY_NODE_AGENTS",
+    ):
+        value = os.environ.get(key)
+        if value:
+            env_vars.setdefault(key, value)
+
+    cluster_glaas_url = os.environ.get("ROAR_CLUSTER_GLAAS_URL") or os.environ.get("GLAAS_URL")
+    if cluster_glaas_url:
+        env_vars.setdefault("GLAAS_URL", cluster_glaas_url)
+
+    for key in ("ROAR_SESSION_ID", "ROAR_FRAGMENT_TOKEN"):
+        value = os.environ.get(key)
+        if value:
+            env_vars.setdefault(key, value)
+
+    return env_vars
+
+
 def _prepare_worker_runtime_env(runtime_env, job_id: str):
     import shutil
 
     runtime_env = dict(runtime_env or {})
-    tmp_dir = tempfile.mkdtemp(prefix=f"roar-worker-env-{job_id[:8]}-")
+    prepared_working_dir: str | None = None
 
     existing_working_dir = runtime_env.get("working_dir")
     if isinstance(existing_working_dir, str) and existing_working_dir.strip():
         if os.path.isdir(existing_working_dir):
+            prepared_working_dir = tempfile.mkdtemp(prefix=f"roar-worker-env-{job_id[:8]}-")
             with _SuppressTracking():
-                _merge_working_dir(existing_working_dir, tmp_dir)
+                _merge_working_dir(existing_working_dir, prepared_working_dir)
         else:
             _warn_roar(
                 "Skipping working_dir merge for non-local path %s while preparing Ray worker wrapper",
                 existing_working_dir,
             )
+            prepared_working_dir = existing_working_dir
 
-    try:
-        from pathlib import Path
+    if not prepared_working_dir:
+        prepared_working_dir = tempfile.mkdtemp(prefix=f"roar-worker-env-{job_id[:8]}-")
 
-        import roar
-        from roar.services.execution.tracer_backends import find_preload_library
+    if os.path.isdir(prepared_working_dir):
+        try:
+            from pathlib import Path
 
-        roar_package_dir = Path(roar.__file__).resolve().parent
-        with _SuppressTracking():
-            shutil.copytree(roar_package_dir, os.path.join(tmp_dir, "roar"), dirs_exist_ok=True)
+            import roar
+            from roar.services.execution.tracer_backends import find_preload_library
 
-            preload_library = find_preload_library(roar_package_dir)
-            if preload_library:
-                shutil.copy2(preload_library, os.path.join(tmp_dir, "libroar_tracer_preload.so"))
-    except Exception:
-        pass
+            roar_package_dir = Path(roar.__file__).resolve().parent
+            with _SuppressTracking():
+                shutil.copytree(
+                    roar_package_dir,
+                    os.path.join(prepared_working_dir, "roar"),
+                    dirs_exist_ok=True,
+                )
+
+                preload_library = find_preload_library(roar_package_dir)
+                if preload_library:
+                    shutil.copy2(
+                        preload_library,
+                        os.path.join(prepared_working_dir, "libroar_tracer_preload.so"),
+                    )
+        except Exception:
+            pass
 
     env_vars = dict(runtime_env.get("env_vars", {}) or {})
     env_vars[_WORKER_SETUP_HOOK_ENV_VAR] = _WORKER_SETUP_HOOK
-    runtime_env["working_dir"] = tmp_dir
-    runtime_env["py_executable"] = "roar-worker"
+    runtime_env["working_dir"] = prepared_working_dir
+    runtime_env["py_executable"] = _WORKER_PY_EXECUTABLE
     runtime_env["worker_process_setup_hook"] = _WORKER_SETUP_HOOK
     runtime_env["env_vars"] = env_vars
     return runtime_env
@@ -456,6 +908,8 @@ def _write_worker_wrapper(tmp_dir: str) -> None:
                 "#!/usr/bin/env bash\n"
                 'if [ -f "./libroar_tracer_preload.so" ]; then\n'
                 '    export LD_PRELOAD="$(pwd)/libroar_tracer_preload.so"\n'
+                "    SOCK_DIR=$(mktemp -d /tmp/roar-trace-XXXXXX)\n"
+                '    export ROAR_PRELOAD_TRACE_SOCK="$SOCK_DIR/trace.sock"\n'
                 "fi\n"
                 'exec python3 "$@"\n'
             )
@@ -493,21 +947,23 @@ def _warn_roar(message: str, *args) -> None:
         sys.stderr.write(text + "\n")
 
 
-def _spawn_node_agents(ray_module, job_id: str, log_dir: str) -> None:
+def _spawn_node_agents(ray_module, job_id: str) -> None:
     try:
         from roar.ray.node_agent import RoarNodeAgent, build_node_agent_name
-    except Exception:
+    except Exception as exc:
+        print(f"[roar] cannot import RoarNodeAgent: {exc}")
         return
 
     try:
         nodes = ray_module.nodes()
-    except Exception:
+    except Exception as exc:
+        print(f"[roar] ray.nodes() failed: {exc}")
         return
 
-    for node in nodes:
-        if not isinstance(node, dict) or not node.get("Alive"):
-            continue
+    alive_nodes = [n for n in nodes if isinstance(n, dict) and n.get("Alive")]
+    print(f"[roar] cluster has {len(alive_nodes)} alive nodes (of {len(nodes)} total)")
 
+    for node in alive_nodes:
         node_id = str(node.get("NodeID") or "")
         if not node_id:
             continue
@@ -520,6 +976,7 @@ def _spawn_node_agents(ray_module, job_id: str, log_dir: str) -> None:
         agent = None
         try:
             agent = ray_module.get_actor(actor_name, namespace="roar")
+            print(f"[roar] found existing agent {actor_name}")
         except Exception:
             remote_options = {
                 "name": actor_name,
@@ -537,15 +994,25 @@ def _spawn_node_agents(ray_module, job_id: str, log_dir: str) -> None:
             try:
                 agent = RoarNodeAgent.options(**remote_options).remote(
                     job_id=job_id,
-                    log_dir=log_dir,
                 )
-            except Exception:
+                print(f"[roar] spawned agent {actor_name} on node {node_id[:8]}")
+            except Exception as exc:
+                print(f"[roar] FAILED to spawn agent {actor_name}: {exc}")
                 agent = None
 
         if agent is not None:
             with _ray_node_agents_lock:
                 _ray_node_agents[node_id] = {"name": actor_name, "actor": agent}
 
+    # Wait for all agents to be ready (proxy listening) before returning.
+    with _ray_node_agents_lock:
+        agents_to_wait = list(_ray_node_agents.values())
+    for info in agents_to_wait:
+        agent = info.get("actor")
+        if agent is not None:
+            with contextlib.suppress(Exception):
+                ray_module.get(agent.get_proxy_port.remote(), timeout=15)
+
 
 def _collect_node_agent_logs(ray_module) -> dict[str, dict]:
     with _ray_node_agents_lock:
@@ -645,7 +1112,6 @@ def _prime_new_ray_nodes(ray_module, seen_node_ids):
         _spawn_node_agents(
             ray_module,
             job_id=str(os.environ.get("ROAR_JOB_ID", "default")),
-            log_dir=str(os.environ.get("ROAR_LOG_DIR", _DEFAULT_RAY_LOG_DIR)),
         )
 
     seen_node_ids.update(current_node_ids)
@@ -693,7 +1159,6 @@ def _node_resource_key(ray_module, node_id: str) -> str | None:
 def _load_ray_config() -> dict[str, object]:
     config_enabled = True
     config_pip_install = False
-    config_log_dir = _DEFAULT_RAY_LOG_DIR
 
     try:
         from roar.config import load_config
@@ -703,9 +1168,6 @@ def _load_ray_config() -> dict[str, object]:
         ray_section = config.get("ray", {})
         if isinstance(ray_section, dict):
             config_enabled = bool(ray_section.get("enabled", True))
-            maybe_log_dir = ray_section.get("log_dir")
-            if isinstance(maybe_log_dir, str) and maybe_log_dir.strip():
-                config_log_dir = maybe_log_dir
 
         explicit_pip_install = _load_explicit_ray_pip_install(start_dir)
         if explicit_pip_install is not None:
@@ -713,14 +1175,9 @@ def _load_ray_config() -> dict[str, object]:
     except Exception:
         pass
 
-    env_log_dir = os.environ.get("ROAR_LOG_DIR")
-    if env_log_dir:
-        config_log_dir = env_log_dir
-
     return {
         "enabled": config_enabled,
         "pip_install": config_pip_install,
-        "log_dir": config_log_dir,
     }
 
 
@@ -803,41 +1260,199 @@ def _register_pre_shutdown_ray_collection() -> None:
 
     Ray registers its own shutdown hook during init. Registering this hook
     afterwards ensures worker logs are collected before Ray tears down.
+
+    The hook explicitly invokes the patched `ray.shutdown()` path while the
+    interpreter is still healthy. That is more reliable than trying to make
+    remote Ray calls from a late atexit hook after shutdown has already begun.
     """
     global _ray_collect_pre_shutdown_registered
 
     if _ray_collect_pre_shutdown_registered:
         return
 
-    atexit.register(_collect_ray_io)
+    atexit.register(_shutdown_ray_at_exit)
     _ray_collect_pre_shutdown_registered = True
 
 
+def _shutdown_ray_at_exit() -> None:
+    ray_module = sys.modules.get("ray")
+    if ray_module is None:
+        _collect_ray_io()
+        return
+
+    shutdown = getattr(ray_module, "shutdown", None)
+    is_initialized = getattr(ray_module, "is_initialized", None)
+    if callable(is_initialized):
+        with contextlib.suppress(Exception):
+            if not is_initialized():
+                return
+
+    if callable(shutdown):
+        with contextlib.suppress(Exception):
+            shutdown()
+            return
+
+    _collect_ray_io()
+
+
 def _collect_ray_io(proxy_logs: dict[str, dict] | None = None) -> None:
-    """Atexit hook: collect worker I/O logs and write to the roar DB."""
+    """Shutdown hook: parse proxy logs from node agents and emit as GLaaS fragments."""
     if os.environ.get("ROAR_WRAP") != "1":
         return
+
+    if proxy_logs is None and _node_agents_enabled():
+        ray_module = sys.modules.get("ray")
+        if ray_module is not None:
+            with contextlib.suppress(Exception):
+                proxy_logs = _collect_node_agent_logs(ray_module)
+
+    if not proxy_logs:
+        return
+
     try:
-        log_dir = os.environ.get("ROAR_LOG_DIR")
-        if not log_dir:
-            ray_config = _load_ray_config()
-            log_dir = str(ray_config["log_dir"])
-
-        # Fast path: skip the heavy collector import if there's nothing to collect.
-        # Worker fragment files are .json files written to log_dir by Ray workers.
-        has_worker_logs = os.path.isdir(log_dir) and any(
-            f.endswith(".json") for f in os.listdir(log_dir)
+        import time as _time
+
+        from roar.ray.fragment import TaskFragment, derive_task_uid
+        from roar.ray.roar_worker import _parse_proxy_log_lines
+    except Exception:
+        return
+
+    now = _time.time()
+    roar_job_id = str(os.environ.get("ROAR_JOB_ID", "default"))
+    driver_job_uid = str(os.environ.get("ROAR_JOB_ID", ""))
+
+    fragments: list[dict[str, object]] = []
+    parsed_refs: list = []
+    for node_id, payload in proxy_logs.items():
+        if not isinstance(payload, dict):
+            continue
+        lines = payload.get("proxy_log_lines", [])
+        if not isinstance(lines, list):
+            continue
+        parsed = _parse_proxy_log_lines([str(line) for line in lines if line])
+        if not parsed:
+            continue
+
+        parsed_refs.extend(parsed)
+        runtime_node_id = str(payload.get("node_id") or node_id or "")
+        proxy_task_id = f"proxy:{runtime_node_id or 'unknown'}"
+        fragment = TaskFragment(
+            job_uid=derive_task_uid(roar_job_id, proxy_task_id),
+            parent_job_uid=driver_job_uid,
+            ray_task_id=proxy_task_id,
+            ray_worker_id="",
+            ray_node_id=runtime_node_id,
+            ray_actor_id=None,
+            function_name="s3_proxy",
+            started_at=now,
+            ended_at=now,
+            exit_code=0,
         )
-        if not has_worker_logs and not proxy_logs:
+        for kind, ref in parsed:
+            if kind == "write":
+                fragment.writes.append(ref)
+            else:
+                fragment.reads.append(ref)
+        if fragment.reads or fragment.writes:
+            fragments.append(fragment.to_dict())
+
+    if not fragments:
+        return
+
+    # Try GLaaS fragment streaming first, fall back to direct DB write.
+    session_id = os.environ.get("ROAR_SESSION_ID", "")
+    fragment_token = os.environ.get("ROAR_FRAGMENT_TOKEN", "")
+    glaas_url = os.environ.get("GLAAS_URL") or ""
+
+    if session_id and fragment_token and glaas_url:
+        try:
+            from roar.ray.glaas_fragment_streamer import GlaasFragmentStreamer
+
+            streamer = GlaasFragmentStreamer(
+                session_id=session_id,
+                token=fragment_token,
+                glaas_url=glaas_url,
+            )
+            for fragment in fragments:
+                streamer.append_fragment(fragment)
+            streamer.close()
             return
+        except Exception:
+            pass
 
-        from roar.ray.collector import collect
+    # Fallback: write directly to local roar.db (works when driver has project access).
+    _write_proxy_artifacts_to_db(parsed_refs)
 
-        collect(
-            project_dir=os.environ.get("ROAR_PROJECT_DIR"),
-            log_dir=log_dir,
-            proxy_logs=proxy_logs or {},
-        )
+
+def _write_proxy_artifacts_to_db(parsed: list) -> None:
+    """Write proxy-captured artifacts directly to roar.db."""
+    import sqlite3
+
+    project_dir = os.environ.get("ROAR_PROJECT_DIR", "")
+    if not project_dir:
+        # Try to find .roar in common locations.
+        for candidate in [os.getcwd(), os.environ.get("HOME", "")]:
+            if candidate and os.path.isfile(os.path.join(candidate, ".roar", "roar.db")):
+                project_dir = candidate
+                break
+    if not project_dir:
+        return
+
+    db_path = os.path.join(project_dir, ".roar", "roar.db")
+    if not os.path.isfile(db_path):
+        return
+
+    try:
+        import time as _time
+
+        conn = sqlite3.connect(db_path, timeout=10)
+        try:
+            # Discover available columns.
+            cursor = conn.execute("PRAGMA table_info(artifacts)")
+            columns = {row[1] for row in cursor.fetchall()}
+            now = _time.time()
+
+            for _kind, ref in parsed:
+                artifact_id = uuid.uuid4().hex
+                fields = ["id", "size", "first_seen_at", "first_seen_path", "kind", "metadata"]
+                values: list = [artifact_id, ref.size or 0, now, ref.path, "primitive", "{}"]
+
+                if "path" in columns:
+                    fields.append("path")
+                    values.append(ref.path)
+                if "hash" in columns:
+                    fields.append("hash")
+                    values.append(ref.hash)
+                if "source_type" in columns:
+                    fields.append("source_type")
+                    values.append("s3" if ref.path.startswith("s3://") else None)
+                if "capture_method" in columns:
+                    fields.append("capture_method")
+                    values.append(ref.capture_method or "proxy")
+
+                placeholders = ", ".join("?" for _ in fields)
+                field_list = ", ".join(fields)
+                conn.execute(
+                    f"INSERT OR IGNORE INTO artifacts ({field_list}) VALUES ({placeholders})",
+                    values,
+                )
+
+                # Record hash if available.
+                if ref.hash and "artifact_hashes" in {
+                    row[0]
+                    for row in conn.execute(
+                        "SELECT name FROM sqlite_master WHERE type='table'"
+                    ).fetchall()
+                }:
+                    conn.execute(
+                        "INSERT OR IGNORE INTO artifact_hashes (artifact_id, algorithm, digest) "
+                        "VALUES (?, ?, ?)",
+                        (artifact_id, ref.hash_algorithm or "etag", ref.hash),
+                    )
+
+            conn.commit()
+        finally:
+            conn.close()
     except Exception:
         pass
 
@@ -846,6 +1461,13 @@ def _stop_ray_node_poller() -> None:
     _ray_node_poller_stop.set()
 
 
+if _phase_capture_enabled():
+    _patch_driver_phase_subprocess_capture()
+
+if os.environ.get("ROAR_DRIVER_PHASE_PROXY_URL"):
+    with contextlib.suppress(Exception):
+        _patch_driver_phase_s3_clients()
+
+
 atexit.register(_write_log)
-atexit.register(_collect_ray_io)
 atexit.register(_stop_ray_node_poller)
diff --git a/roar/services/execution/provenance/data_loader.py b/roar/services/execution/provenance/data_loader.py
index fb1a8d4a..4421c336 100644
--- a/roar/services/execution/provenance/data_loader.py
+++ b/roar/services/execution/provenance/data_loader.py
@@ -82,6 +82,10 @@ def _normalize_files(self, data: dict) -> list[dict]:
                     "read": bool(record.get("read", False)),
                     "written": bool(record.get("written", False)),
                 }
+                for key in ("read_threads", "written_threads"):
+                    value = record.get(key)
+                    if isinstance(value, list):
+                        item[key] = [thread for thread in value if isinstance(thread, int)]
                 if "chunks_read" in record:
                     item["chunks_read"] = record.get("chunks_read")
                 if "chunks_written" in record:
diff --git a/roar/services/execution/proxy.py b/roar/services/execution/proxy.py
index a29b2f90..696a4267 100644
--- a/roar/services/execution/proxy.py
+++ b/roar/services/execution/proxy.py
@@ -137,6 +137,7 @@ def start_for_run(
         session_id: str | None = None,
         job_id: str | None = None,
         upstream_url: str | None = None,
+        port: int | None = None,
     ) -> ProxyHandle:
         """Start a proxy for a single `roar run`.
 
@@ -153,10 +154,11 @@ def start_for_run(
                 "  cargo build --release --manifest-path rust/Cargo.toml -p roar-proxy"
             )
 
-        # Find a free port
-        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-            s.bind(("127.0.0.1", 0))
-            port = s.getsockname()[1]
+        if port is None:
+            # Find a free port
+            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+                s.bind(("127.0.0.1", 0))
+                port = s.getsockname()[1]
 
         cmd = [proxy_path, "--port", str(port)]
         if session_id:
diff --git a/roar/services/execution/tracer_backends.py b/roar/services/execution/tracer_backends.py
index 3e32303e..d05eeced 100644
--- a/roar/services/execution/tracer_backends.py
+++ b/roar/services/execution/tracer_backends.py
@@ -11,7 +11,9 @@
 import shutil
 import subprocess
 import sys
+import tempfile
 from dataclasses import dataclass
+from functools import cache
 from pathlib import Path
 
 from ...core.tracer_modes import TRACER_BACKEND_ORDER
@@ -251,7 +253,8 @@ def preload_readiness(package_path: Path, launcher_path: str | None = None) -> T
     library_path = find_preload_library(package_path)
     if not library_path:
         return TracerReadiness(ok=False, reason="preload library not found")
-    return TracerReadiness(ok=True, reason=None)
+
+    return _probe_preload_launcher(launcher_path, library_path)
 
 
 def preload_is_ready(
@@ -320,3 +323,49 @@ def _find_binary(package_path: Path, binary_name: str) -> str | None:
 
     resolved = shutil.which(binary_name)
     return resolved if resolved else None
+
+
+@cache
+def _probe_preload_launcher(launcher_path: str, library_path: str) -> TracerReadiness:
+    env = dict(os.environ)
+    env["ROAR_PRELOAD_LIB"] = library_path
+
+    with tempfile.TemporaryDirectory(prefix="roar-preload-check-") as tmp_dir:
+        report_path = Path(tmp_dir) / "probe.json"
+        command = [launcher_path, str(report_path), sys.executable, "-c", "pass"]
+
+        try:
+            result = subprocess.run(
+                command,
+                capture_output=True,
+                text=True,
+                timeout=15,
+                check=False,
+                env=env,
+            )
+        except OSError as exc:
+            return TracerReadiness(ok=False, reason=f"preload launcher failed to exec: {exc}")
+        except subprocess.TimeoutExpired:
+            return TracerReadiness(ok=False, reason="preload launcher probe timed out")
+
+        if result.returncode != 0:
+            detail = _first_nonempty_line(result.stderr) or _first_nonempty_line(result.stdout)
+            if detail:
+                return TracerReadiness(ok=False, reason=f"preload launcher probe failed: {detail}")
+            return TracerReadiness(
+                ok=False,
+                reason=f"preload launcher probe failed with exit code {result.returncode}",
+            )
+
+        if not report_path.exists():
+            return TracerReadiness(ok=False, reason="preload launcher probe produced no report")
+
+    return TracerReadiness(ok=True, reason=None)
+
+
+def _first_nonempty_line(text: str) -> str | None:
+    for raw_line in text.splitlines():
+        line = raw_line.strip()
+        if line:
+            return line[:200]
+    return None
diff --git a/roar/services/put/composite_builder.py b/roar/services/put/composite_builder.py
index a338dfa1..4d995270 100644
--- a/roar/services/put/composite_builder.py
+++ b/roar/services/put/composite_builder.py
@@ -43,6 +43,7 @@ class CompositeLeaf:
     size: int
     component_type: str | None
     leaf_kind: str = "file"
+    component_algorithm: str = "blake3"
 
 
 @dataclass(frozen=True)
@@ -85,6 +86,22 @@ def build_for_root(
         Returns None when no hashable leaf components are available.
         """
         leaves = self._collect_leaves(root_path, resolved_sources, hashes_by_path)
+        return self.build_for_leaves(
+            root_path=str(root_path),
+            leaves=leaves,
+            session_hash=session_hash,
+            source_type=source_type,
+        )
+
+    def build_for_leaves(
+        self,
+        *,
+        root_path: str,
+        leaves: list[CompositeLeaf],
+        session_hash: str,
+        source_type: str | None,
+    ) -> CompositeBuildResult | None:
+        """Build a composite payload from already-resolved leaf components."""
         if not leaves:
             return None
 
@@ -183,6 +200,7 @@ def _collect_leaves(
                     size=component_size,
                     component_type=component_type,
                     leaf_kind=leaf_kind,
+                    component_algorithm="blake3",
                 )
             )
 
@@ -229,7 +247,7 @@ def _component_payload(leaf: CompositeLeaf) -> dict[str, Any]:
         return {
             "relative_path": leaf.relative_path,
             "leaf_kind": leaf.leaf_kind,
-            "component_algorithm": "blake3",
+            "component_algorithm": leaf.component_algorithm,
             "component_digest": leaf.digest,
             "component_size": leaf.size,
             "component_type": leaf.component_type,
diff --git a/roar/services/registration/register_service.py b/roar/services/registration/register_service.py
index a428c078..6d21a966 100644
--- a/roar/services/registration/register_service.py
+++ b/roar/services/registration/register_service.py
@@ -12,6 +12,7 @@
 """
 
 import os
+import re
 from collections.abc import Callable
 from dataclasses import dataclass, field
 from functools import cached_property
@@ -47,6 +48,9 @@
 
 boto3 = None
 
+_STEP_REFERENCE_RE = re.compile(r"^@(?:B)?\d+$", re.IGNORECASE)
+_SESSION_HASH_RE = re.compile(r"^[a-f0-9]{8,64}$")
+
 
 def _ensure_boto3():
     global boto3
@@ -151,6 +155,145 @@ def session_service(self) -> SessionRegistrationService:
         """Get or create session service."""
         return self._session_service or SessionRegistrationService()
 
+    def register_lineage_target(
+        self,
+        target: str,
+        roar_dir: Path,
+        cwd: Path,
+        dry_run: bool = False,
+        as_blake3: bool = False,
+        skip_confirmation: bool = False,
+        confirm_callback: Callable[[list[str]], bool] | None = None,
+    ) -> RegisterResult:
+        """Register lineage for an artifact path, step reference, or session hash."""
+        normalized_target = target.strip()
+        if self._is_step_reference(normalized_target):
+            return self.register_step_lineage(
+                step_reference=normalized_target,
+                roar_dir=roar_dir,
+                cwd=cwd,
+                dry_run=dry_run,
+                as_blake3=as_blake3,
+                skip_confirmation=skip_confirmation,
+                confirm_callback=confirm_callback,
+            )
+
+        resolved_path = self._resolve_path(normalized_target, cwd)
+        if resolved_path and (self._is_s3_url(normalized_target) or os.path.exists(resolved_path)):
+            return self.register_artifact_lineage(
+                artifact_path=normalized_target,
+                roar_dir=roar_dir,
+                cwd=cwd,
+                dry_run=dry_run,
+                as_blake3=as_blake3,
+                skip_confirmation=skip_confirmation,
+                confirm_callback=confirm_callback,
+            )
+
+        if self._looks_like_session_hash(normalized_target):
+            return self.register_session_lineage(
+                session_hash=normalized_target,
+                roar_dir=roar_dir,
+                cwd=cwd,
+                dry_run=dry_run,
+                as_blake3=as_blake3,
+                skip_confirmation=skip_confirmation,
+                confirm_callback=confirm_callback,
+            )
+
+        return self.register_artifact_lineage(
+            artifact_path=normalized_target,
+            roar_dir=roar_dir,
+            cwd=cwd,
+            dry_run=dry_run,
+            as_blake3=as_blake3,
+            skip_confirmation=skip_confirmation,
+            confirm_callback=confirm_callback,
+        )
+
+    def register_step_lineage(
+        self,
+        step_reference: str,
+        roar_dir: Path,
+        cwd: Path,
+        dry_run: bool = False,
+        as_blake3: bool = False,
+        skip_confirmation: bool = False,
+        confirm_callback: Callable[[list[str]], bool] | None = None,
+    ) -> RegisterResult:
+        """Register lineage for a local DAG step reference like ``@4``."""
+        parsed = self._parse_step_reference(step_reference)
+        if parsed is None:
+            return RegisterResult(success=False, error=f"Invalid DAG reference: {step_reference}")
+        step_number, is_build = parsed
+
+        with create_database_context(roar_dir) as db_ctx:
+            session = db_ctx.sessions.get_active()
+            if not session:
+                return RegisterResult(
+                    success=False,
+                    error="No active session. Run 'roar run' to create a session first.",
+                )
+            lineage = self.lineage_collector.collect_step(
+                session_id=int(session["id"]),
+                step_number=step_number,
+                roar_dir=roar_dir,
+                job_type="build" if is_build else None,
+            )
+
+        if not lineage.jobs:
+            return RegisterResult(
+                success=False,
+                error=f"No tracked jobs found for DAG reference {step_reference}.",
+            )
+
+        representative_hash = self._select_representative_hash(lineage)
+        return self._register_collected_lineage(
+            lineage=lineage,
+            roar_dir=roar_dir,
+            cwd=cwd,
+            session_id=int(lineage.pipeline["id"]) if lineage.pipeline else None,
+            artifact_hash=representative_hash,
+            dry_run=dry_run,
+            as_blake3=as_blake3,
+            skip_confirmation=skip_confirmation,
+            confirm_callback=confirm_callback,
+        )
+
+    def register_session_lineage(
+        self,
+        session_hash: str,
+        roar_dir: Path,
+        cwd: Path,
+        dry_run: bool = False,
+        as_blake3: bool = False,
+        skip_confirmation: bool = False,
+        confirm_callback: Callable[[list[str]], bool] | None = None,
+    ) -> RegisterResult:
+        """Register the complete local session identified by a GLaaS session hash or prefix."""
+        with create_database_context(roar_dir) as db_ctx:
+            session, resolved_hash, error = self._resolve_session_target(
+                db_ctx=db_ctx,
+                roar_dir=roar_dir,
+                session_hash=session_hash,
+            )
+            if session is None:
+                return RegisterResult(success=False, error=error or "Session not found.")
+            lineage = self.lineage_collector.collect_session(int(session["id"]), roar_dir)
+
+        return self._register_collected_lineage(
+            lineage=lineage,
+            roar_dir=roar_dir,
+            cwd=cwd,
+            session_id=int(session["id"]),
+            artifact_hash="",
+            dry_run=dry_run,
+            as_blake3=as_blake3,
+            skip_confirmation=skip_confirmation,
+            confirm_callback=confirm_callback,
+            session_hash_override=resolved_hash,
+        )
+
     def register_artifact_lineage(
         self,
         artifact_path: str,
@@ -237,6 +380,39 @@ def register_artifact_lineage(
                 )
 
             self._logger.debug("Active session: %d", session["id"])
+            lineage = self.lineage_collector.collect([artifact_hash], roar_dir)
+
+        return self._register_collected_lineage(
+            lineage=lineage,
+            roar_dir=roar_dir,
+            cwd=cwd,
+            session_id=int(session["id"]),
+            artifact_hash=artifact_hash,
+            dry_run=dry_run,
+            as_blake3=as_blake3,
+            skip_confirmation=skip_confirmation,
+            confirm_callback=confirm_callback,
+        )
+
+    def _register_collected_lineage(
+        self,
+        *,
+        lineage: LineageData,
+        roar_dir: Path,
+        cwd: Path,
+        session_id: int | None,
+        artifact_hash: str,
+        dry_run: bool,
+        as_blake3: bool,
+        skip_confirmation: bool,
+        confirm_callback: Callable[[list[str]], bool] | None,
+        session_hash_override: str | None = None,
+    ) -> RegisterResult:
+        self._logger.debug(
+            "Collected lineage: %d jobs, %d artifacts",
+            len(lineage.jobs),
+            len(lineage.artifacts),
+        )
 
         # Step 5: Get git context
         git_context = self._get_git_context(cwd)
@@ -248,7 +424,7 @@ def register_artifact_lineage(
         # Step 5.5: Check for uncommitted changes (required for tagging)
         tagging_enabled = config_get("registration.tagging.enabled")
         if tagging_enabled is None:
-            tagging_enabled = True  # Default to enabled
+            tagging_enabled = True
         if tagging_enabled and git_context.commit:
             vcs = GitVCSProvider()
             repo_root = vcs.get_repo_root(str(cwd))
@@ -261,31 +437,19 @@ def register_artifact_lineage(
                         error="Cannot register with uncommitted changes. Commit your changes first.",
                     )
 
-        # Step 6: Collect lineage
-        lineage: LineageData = self.lineage_collector.collect([artifact_hash], roar_dir)
-        self._logger.debug(
-            "Collected lineage: %d jobs, %d artifacts",
-            len(lineage.jobs),
-            len(lineage.artifacts),
-        )
-
-        # Step 7: Compute session hash
-        session_hash = self.session_service.compute_session_hash(
+        session_hash = session_hash_override or self.session_service.compute_session_hash(
             roar_dir=str(roar_dir),
-            session_id=session["id"],
+            session_id=session_id,
         )
         self._logger.debug("Session hash: %s", session_hash[:12])
 
-        # Step 7.5: Detect secrets in lineage data
         detected_secrets: list[str] = []
         if self.omit_filter:
             detected_secrets = self._detect_secrets_in_lineage(lineage, git_context)
             self._logger.debug("Detected %d potential secret types", len(detected_secrets))
 
             if detected_secrets and not skip_confirmation:
-                # Need confirmation from user
                 if confirm_callback is None:
-                    # No callback provided, abort
                     return RegisterResult(
                         success=False,
                         session_hash=session_hash,
@@ -294,7 +458,6 @@ def register_artifact_lineage(
                         aborted_by_user=True,
                     )
 
-                # Ask user for confirmation
                 if not confirm_callback(detected_secrets):
                     return RegisterResult(
                         success=False,
@@ -304,11 +467,9 @@ def register_artifact_lineage(
                         aborted_by_user=True,
                     )
 
-            # Filter secrets from jobs before registration
             if detected_secrets or self.omit_filter.enabled:
                 lineage = self._filter_lineage_secrets(lineage, git_context)
 
-        # Step 8: Dry-run mode - return counts without calling API
         if dry_run:
             return RegisterResult(
                 success=True,
@@ -321,18 +482,15 @@ def register_artifact_lineage(
                 secrets_redacted=bool(detected_secrets),
             )
 
-        # Step 8.5: Upgrade S3 etag-only artifact hashes to blake3 (optional)
         if as_blake3:
             self.upgrade_s3_etags_to_blake3(roar_dir=roar_dir, lineage=lineage)
 
-        # Step 9: Check GLaaS configuration
         if not self.glaas_client.is_configured():
             return RegisterResult(
                 success=False,
                 error="GLaaS not configured. Run 'roar config set glaas.url <url>' first.",
             )
 
-        # Step 10: Health check
         try:
             self.glaas_client.health_check()
         except Exception as e:
@@ -341,7 +499,6 @@ def register_artifact_lineage(
                 error=f"GLaaS health check failed: {e}",
             )
 
-        # Step 11: Register session
         session_result = self.session_service.register(session_hash, git_context)
         if not session_result.success:
             return RegisterResult(
@@ -350,15 +507,15 @@ def register_artifact_lineage(
                 error=f"Session registration failed: {session_result.error}",
             )
 
-        # Step 12: Register lineage via coordinator
         batch_result: BatchRegistrationResult = self.coordinator.register_lineage(
             session_hash=session_hash,
             git_context=git_context,
-            jobs=lineage.jobs,
+            jobs=self._order_jobs_for_registration(
+                self._normalize_jobs_for_registration(lineage.jobs)
+            ),
             artifacts=self._prepare_artifacts(lineage.artifacts, session_hash),
         )
 
-        # Step 13: Create git tag if enabled
         if tagging_enabled and git_context.commit:
             tag_name = f"roar/{git_context.commit[:8]}"
             vcs = GitVCSProvider()
@@ -368,7 +525,6 @@ def register_artifact_lineage(
                 if not success:
                     self._logger.debug("Failed to create git tag: %s", tag_error)
 
-        # Build result
         if batch_result.errors:
             self._logger.warning("Registration completed with errors: %s", batch_result.errors)
 
@@ -384,6 +540,159 @@ def register_artifact_lineage(
             secrets_redacted=bool(detected_secrets),
         )
 
+    def _resolve_session_target(
+        self,
+        *,
+        db_ctx,
+        roar_dir: Path,
+        session_hash: str,
+    ) -> tuple[dict | None, str | None, str | None]:
+        candidates: list[tuple[dict, str]] = []
+        for session in db_ctx.sessions.get_all():
+            resolved_hash = self.session_service.compute_session_hash(
+                roar_dir=str(roar_dir),
+                session_id=int(session["id"]),
+            )
+            if resolved_hash.startswith(session_hash):
+                candidates.append((session, resolved_hash))
+
+        if len(candidates) == 1:
+            return candidates[0][0], candidates[0][1], None
+        if len(candidates) > 1:
+            return (
+                None,
+                None,
+                (
+                    f"Ambiguous session hash prefix '{session_hash}'. "
+                    "Provide more characters to select a single local session."
+                ),
+            )
+
+        local_session = db_ctx.sessions.get_by_hash_prefix(session_hash)
+        if local_session:
+            resolved_hash = self.session_service.compute_session_hash(
+                roar_dir=str(roar_dir),
+                session_id=int(local_session["id"]),
+            )
+            return local_session, resolved_hash, None
+
+        return None, None, f"No local session matches '{session_hash}'."
+
+    def _select_representative_hash(self, lineage: LineageData) -> str:
+        hashes = sorted(str(hash_value) for hash_value in lineage.artifact_hashes if hash_value)
+        if len(hashes) == 1:
+            return hashes[0]
+        return ""
+
+    def _normalize_jobs_for_registration(self, jobs: list[dict]) -> list[dict]:
+        normalized = [dict(job) for job in jobs]
+        known_job_uids = {
+            str(job["job_uid"]) for job in normalized if isinstance(job.get("job_uid"), str)
+        }
+        root_candidates = [job for job in normalized if self._is_local_parent_candidate(job)]
+        if not root_candidates:
+            root_candidates = [
+                job for job in normalized if not str(job.get("command", "")).startswith("ray_task:")
+            ]
+
+        for job in normalized:
+            parent_uid = str(job.get("parent_job_uid") or "").strip()
+            if not parent_uid or parent_uid in known_job_uids:
+                continue
+
+            inferred_parent_uid = self._infer_local_parent_uid(job, root_candidates)
+            if inferred_parent_uid:
+                job["parent_job_uid"] = inferred_parent_uid
+            else:
+                job["parent_job_uid"] = None
+
+        return normalized
+
+    def _order_jobs_for_registration(self, jobs: list[dict]) -> list[dict]:
+        jobs_by_uid = {
+            str(job["job_uid"]): job for job in jobs if isinstance(job.get("job_uid"), str)
+        }
+        ordered: list[dict] = []
+        seen: set[str] = set()
+
+        def visit(job: dict) -> None:
+            parent_uid = job.get("parent_job_uid")
+            if isinstance(parent_uid, str) and parent_uid:
+                parent = jobs_by_uid.get(parent_uid)
+                if parent is not None:
+                    visit(parent)
+
+            visit_key = str(job.get("job_uid") or f"id:{job.get('id')}")
+            if visit_key in seen:
+                return
+            seen.add(visit_key)
+            ordered.append(job)
+
+        for job in sorted(
+            jobs,
+            key=lambda item: (
+                int(item.get("step_number") or 0),
+                float(item.get("timestamp") or 0.0),
+                int(item.get("id") or 0),
+            ),
+        ):
+            visit(job)
+
+        return ordered
+
+    def _infer_local_parent_uid(self, job: dict, candidates: list[dict]) -> str | None:
+        job_step = int(job.get("step_number") or 0)
+        job_timestamp = float(job.get("timestamp") or 0.0)
+
+        eligible = [
+            candidate
+            for candidate in candidates
+            if (
+                int(candidate.get("step_number") or 0) < job_step
+                or (
+                    int(candidate.get("step_number") or 0) == job_step
+                    and float(candidate.get("timestamp") or 0.0) <= job_timestamp
+                )
+            )
+        ]
+        if not eligible:
+            return None
+
+        preferred = max(eligible, key=self._parent_candidate_sort_key)
+        inferred_uid = preferred.get("job_uid")
+        return str(inferred_uid) if inferred_uid else None
+
+    def _is_local_parent_candidate(self, job: dict) -> bool:
+        command = str(job.get("command", "") or "")
+        job_type = str(job.get("job_type", "") or "")
+        return not command.startswith("ray_task:") and job_type != "build"
+
+    def _parent_candidate_sort_key(self, job: dict) -> tuple[int, int, float, int]:
+        command = str(job.get("command", "") or "")
+        return (
+            1 if "ray job submit" in command else 0,
+            int(job.get("step_number") or 0),
+            float(job.get("timestamp") or 0.0),
+            int(job.get("id") or 0),
+        )
+
+    def _is_step_reference(self, target: str) -> bool:
+        return bool(_STEP_REFERENCE_RE.match(target))
+
+    def _looks_like_session_hash(self, target: str) -> bool:
+        return bool(_SESSION_HASH_RE.match(target))
+
+    def _parse_step_reference(self, reference: str) -> tuple[int, bool] | None:
+        if not self._is_step_reference(reference):
+            return None
+        step_ref = reference[1:]
+        is_build = step_ref.upper().startswith("B")
+        if is_build:
+            step_ref = step_ref[1:]
+        if not step_ref.isdigit():
+            return None
+        return int(step_ref), is_build
+
     def _resolve_path(self, path: str, cwd: Path) -> str | None:
         """Resolve artifact path to absolute path."""
         if os.path.isabs(path):
diff --git a/roar/services/upload/lineage_collector.py b/roar/services/upload/lineage_collector.py
index 29318a9d..77889094 100644
--- a/roar/services/upload/lineage_collector.py
+++ b/roar/services/upload/lineage_collector.py
@@ -113,6 +113,95 @@ def collect(
             pipeline=pipeline,
         )
 
+    def collect_step(
+        self,
+        session_id: int,
+        step_number: int,
+        roar_dir: Path,
+        job_type: str | None = None,
+    ) -> LineageData:
+        """Collect lineage for a visible DAG step within a session."""
+        with create_database_context(roar_dir) as ctx_db:
+            session = ctx_db.sessions.get(session_id)
+            if not session:
+                return LineageData()
+
+            step_jobs = self._get_step_jobs(ctx_db, session_id, step_number, job_type=job_type)
+            if not step_jobs:
+                return LineageData(pipeline=session)
+
+            hydrated_step_jobs = [self._hydrate_job(ctx_db, job) for job in step_jobs]
+            target_hashes = sorted(
+                {
+                    digest
+                    for job in hydrated_step_jobs
+                    for digest in job.get("_output_hashes", [])
+                    if digest
+                }
+            )
+            if not target_hashes:
+                target_hashes = sorted(
+                    {
+                        digest
+                        for job in hydrated_step_jobs
+                        for digest in job.get("_input_hashes", [])
+                        if digest
+                    }
+                )
+
+            if target_hashes:
+                lineage_jobs = ctx_db.lineage.get_lineage_jobs(target_hashes)
+                if session:
+                    lineage_jobs = self._add_build_jobs(
+                        ctx_db, session, lineage_jobs, set(target_hashes)
+                    )
+                lineage_jobs = self._add_parent_jobs(ctx_db, lineage_jobs)
+                lineage_jobs = self._add_parent_linked_ray_tasks(ctx_db, lineage_jobs)
+            else:
+                lineage_jobs = []
+
+            seen_ids = {job["id"] for job in lineage_jobs}
+            for job in hydrated_step_jobs:
+                if job["id"] not in seen_ids:
+                    lineage_jobs.append(job)
+                    seen_ids.add(job["id"])
+
+            lineage_jobs.sort(key=lambda job: job["timestamp"])
+            all_hashes = self._collect_all_hashes(lineage_jobs)
+            artifacts = self._get_artifact_info(ctx_db, all_hashes)
+
+        return LineageData(
+            jobs=lineage_jobs,
+            artifacts=artifacts,
+            artifact_hashes=all_hashes,
+            pipeline=session,
+        )
+
+    def collect_session(
+        self,
+        session_id: int,
+        roar_dir: Path,
+    ) -> LineageData:
+        """Collect all jobs and artifacts recorded in a local session."""
+        with create_database_context(roar_dir) as ctx_db:
+            session = ctx_db.sessions.get(session_id)
+            if not session:
+                return LineageData()
+
+            jobs = [self._hydrate_job(ctx_db, job) for job in ctx_db.sessions.get_steps(session_id)]
+            jobs = self._add_parent_jobs(ctx_db, jobs)
+            jobs = self._add_parent_linked_ray_tasks(ctx_db, jobs)
+            jobs.sort(key=lambda job: job["timestamp"])
+            all_hashes = self._collect_all_hashes(jobs)
+            artifacts = self._get_artifact_info(ctx_db, all_hashes)
+
+        return LineageData(
+            jobs=jobs,
+            artifacts=artifacts,
+            artifact_hashes=all_hashes,
+            pipeline=session,
+        )
+
     def _add_build_jobs(
         self,
         ctx_db,
@@ -328,3 +417,55 @@ def _get_artifact_info(self, ctx_db, hashes: set[str]) -> list[dict]:
                 artifact["hash"] = h  # Add the hash we looked up
                 artifacts.append(artifact)
         return artifacts
+
+    def _get_step_jobs(
+        self,
+        ctx_db,
+        session_id: int,
+        step_number: int,
+        job_type: str | None = None,
+    ) -> list[dict]:
+        jobs = []
+        for job in ctx_db.sessions.get_steps(session_id):
+            if int(job.get("step_number") or 0) != step_number:
+                continue
+            normalized_job_type = job.get("job_type")
+            if job_type == "build":
+                if normalized_job_type != "build":
+                    continue
+            elif normalized_job_type == "build":
+                continue
+            jobs.append(job)
+        return jobs
+
+    def _hydrate_job(self, ctx_db, job: dict) -> dict:
+        job_dict = dict(job)
+        job_id = job_dict["id"]
+        inputs = ctx_db.jobs.get_inputs(job_id)
+        outputs = ctx_db.jobs.get_outputs(job_id)
+
+        job_dict["_input_hashes"] = [
+            h for h in (_extract_primary_digest(inp) for inp in inputs) if h
+        ]
+        job_dict["_output_hashes"] = [
+            h for h in (_extract_primary_digest(out) for out in outputs) if h
+        ]
+        job_dict["_inputs"] = [
+            {
+                "hash": h,
+                "path": inp.get("path") or inp.get("first_seen_path", ""),
+                "byte_ranges": inp.get("byte_ranges"),
+            }
+            for inp in inputs
+            if (h := _extract_primary_digest(inp))
+        ]
+        job_dict["_outputs"] = [
+            {
+                "hash": h,
+                "path": out.get("path") or out.get("first_seen_path", ""),
+                "byte_ranges": out.get("byte_ranges"),
+            }
+            for out in outputs
+            if (h := _extract_primary_digest(out))
+        ]
+        return job_dict
diff --git a/roar_inject.pth b/roar_inject.pth
new file mode 100644
index 00000000..3b60c778
--- /dev/null
+++ b/roar_inject.pth
@@ -0,0 +1 @@
+import os; os.environ.get("ROAR_WRAP") == "1" and __import__("importlib").import_module("roar.services.execution.inject.sitecustomize")
diff --git a/rust/crates/tracer-fd/src/lib.rs b/rust/crates/tracer-fd/src/lib.rs
index 945d29eb..e26f9071 100644
--- a/rust/crates/tracer-fd/src/lib.rs
+++ b/rust/crates/tracer-fd/src/lib.rs
@@ -9,6 +9,8 @@ pub struct FdState {
     pub cursor: u64,
     pub was_read: bool,
     pub was_written: bool,
+    pub read_threads: BTreeSet<u32>,
+    pub written_threads: BTreeSet<u32>,
     pub chunks_read: BTreeSet<u64>,
     pub chunks_written: BTreeSet<u64>,
 }
@@ -20,6 +22,8 @@ impl FdState {
             cursor: 0,
             was_read: false,
             was_written: false,
+            read_threads: BTreeSet::new(),
+            written_threads: BTreeSet::new(),
             chunks_read: BTreeSet::new(),
             chunks_written: BTreeSet::new(),
         }
@@ -49,6 +53,8 @@ pub struct FdTracker {
     pub extra_opened_paths: HashSet<String>,
     pub extra_read_paths: HashSet<String>,
     pub extra_written_paths: HashSet<String>,
+    pub extra_read_threads: HashMap<String, BTreeSet<u32>>,
+    pub extra_written_threads: HashMap<String, BTreeSet<u32>>,
 }
 
 impl FdTracker {
@@ -61,6 +67,8 @@ impl FdTracker {
             extra_opened_paths: HashSet::new(),
             extra_read_paths: HashSet::new(),
             extra_written_paths: HashSet::new(),
+            extra_read_threads: HashMap::new(),
+            extra_written_threads: HashMap::new(),
         }
     }
 
@@ -87,17 +95,47 @@ impl FdTracker {
 
     /// Mark that the fd's path was read (without cursor/chunk accounting).
     pub fn mark_read(&mut self, pid: u32, fd: i32) {
+        self.mark_read_internal(pid, fd, None);
+    }
+
+    pub fn mark_read_with_thread(&mut self, pid: u32, fd: i32, thread_id: u32) {
+        self.mark_read_internal(pid, fd, Some(thread_id));
+    }
+
+    fn mark_read_internal(&mut self, pid: u32, fd: i32, thread_id: Option<u32>) {
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_read = true;
             self.extra_read_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.read_threads.insert(thread_id);
+                self.extra_read_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
         }
     }
 
     /// Mark that the fd's path was written (without cursor/chunk accounting).
     pub fn mark_written(&mut self, pid: u32, fd: i32) {
+        self.mark_written_internal(pid, fd, None);
+    }
+
+    pub fn mark_written_with_thread(&mut self, pid: u32, fd: i32, thread_id: u32) {
+        self.mark_written_internal(pid, fd, Some(thread_id));
+    }
+
+    fn mark_written_internal(&mut self, pid: u32, fd: i32, thread_id: Option<u32>) {
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_written = true;
             self.extra_written_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.written_threads.insert(thread_id);
+                self.extra_written_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
         }
     }
 
@@ -109,25 +147,77 @@ impl FdTracker {
     }
 
     pub fn mark_path_read(&mut self, path: String) {
+        self.mark_path_read_internal(path, None);
+    }
+
+    pub fn mark_path_read_with_thread(&mut self, path: String, thread_id: u32) {
+        self.mark_path_read_internal(path, Some(thread_id));
+    }
+
+    fn mark_path_read_internal(&mut self, path: String, thread_id: Option<u32>) {
         if !path.is_empty() {
-            self.extra_read_paths.insert(path);
+            self.extra_read_paths.insert(path.clone());
+            if let Some(thread_id) = thread_id {
+                self.extra_read_threads.entry(path).or_default().insert(thread_id);
+            }
         }
     }
 
     pub fn mark_path_written(&mut self, path: String) {
+        self.mark_path_written_internal(path, None);
+    }
+
+    pub fn mark_path_written_with_thread(&mut self, path: String, thread_id: u32) {
+        self.mark_path_written_internal(path, Some(thread_id));
+    }
+
+    fn mark_path_written_internal(&mut self, path: String, thread_id: Option<u32>) {
         if !path.is_empty() {
-            self.extra_written_paths.insert(path);
+            self.extra_written_paths.insert(path.clone());
+            if let Some(thread_id) = thread_id {
+                self.extra_written_threads
+                    .entry(path)
+                    .or_default()
+                    .insert(thread_id);
+            }
         }
     }
 
     /// Handle sequential read.
     pub fn handle_read(&mut self, pid: u32, fd: i32, bytes: u64) {
+        self.handle_read_internal(pid, fd, bytes, None);
+    }
+
+    pub fn handle_read_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.handle_read_internal(pid, fd, bytes, Some(thread_id));
+    }
+
+    fn handle_read_internal(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        bytes: u64,
+        thread_id: Option<u32>,
+    ) {
         if bytes == 0 {
             return;
         }
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_read = true;
             self.extra_read_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.read_threads.insert(thread_id);
+                self.extra_read_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
             if let Some(chunk_size) = self.chunk_size {
                 mark_chunks(&mut state.chunks_read, state.cursor, bytes, chunk_size);
             }
@@ -137,12 +227,41 @@ impl FdTracker {
 
     /// Handle positional read.
     pub fn handle_pread(&mut self, pid: u32, fd: i32, offset: u64, bytes: u64) {
+        self.handle_pread_internal(pid, fd, offset, bytes, None);
+    }
+
+    pub fn handle_pread_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.handle_pread_internal(pid, fd, offset, bytes, Some(thread_id));
+    }
+
+    fn handle_pread_internal(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: Option<u32>,
+    ) {
         if bytes == 0 {
             return;
         }
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_read = true;
             self.extra_read_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.read_threads.insert(thread_id);
+                self.extra_read_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
             if let Some(chunk_size) = self.chunk_size {
                 mark_chunks(&mut state.chunks_read, offset, bytes, chunk_size);
             }
@@ -151,12 +270,39 @@ impl FdTracker {
 
     /// Handle sequential write.
     pub fn handle_write(&mut self, pid: u32, fd: i32, bytes: u64) {
+        self.handle_write_internal(pid, fd, bytes, None);
+    }
+
+    pub fn handle_write_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.handle_write_internal(pid, fd, bytes, Some(thread_id));
+    }
+
+    fn handle_write_internal(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        bytes: u64,
+        thread_id: Option<u32>,
+    ) {
         if bytes == 0 {
             return;
         }
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_written = true;
             self.extra_written_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.written_threads.insert(thread_id);
+                self.extra_written_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
             if let Some(chunk_size) = self.chunk_size {
                 mark_chunks(&mut state.chunks_written, state.cursor, bytes, chunk_size);
             }
@@ -166,12 +312,41 @@ impl FdTracker {
 
     /// Handle positional write.
     pub fn handle_pwrite(&mut self, pid: u32, fd: i32, offset: u64, bytes: u64) {
+        self.handle_pwrite_internal(pid, fd, offset, bytes, None);
+    }
+
+    pub fn handle_pwrite_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.handle_pwrite_internal(pid, fd, offset, bytes, Some(thread_id));
+    }
+
+    fn handle_pwrite_internal(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: Option<u32>,
+    ) {
         if bytes == 0 {
             return;
         }
         if let Some(state) = self.fd_state.get_mut(&(pid, fd)) {
             state.was_written = true;
             self.extra_written_paths.insert(state.path.clone());
+            if let Some(thread_id) = thread_id {
+                state.written_threads.insert(thread_id);
+                self.extra_written_threads
+                    .entry(state.path.clone())
+                    .or_default()
+                    .insert(thread_id);
+            }
             if let Some(chunk_size) = self.chunk_size {
                 mark_chunks(&mut state.chunks_written, offset, bytes, chunk_size);
             }
@@ -221,8 +396,17 @@ impl FdTracker {
 
     /// Aggregate per-fd state into deterministic per-path report data.
     pub fn build_summary(&self) -> FileSummary {
-        let mut path_map: BTreeMap<String, (bool, bool, BTreeSet<u64>, BTreeSet<u64>)> =
-            BTreeMap::new();
+        let mut path_map: BTreeMap<
+            String,
+            (
+                bool,
+                bool,
+                BTreeSet<u32>,
+                BTreeSet<u32>,
+                BTreeSet<u64>,
+                BTreeSet<u64>,
+            ),
+        > = BTreeMap::new();
 
         for state in self.fd_state.values().chain(self.closed_states.iter()) {
             if state.path.is_empty() {
@@ -230,18 +414,36 @@ impl FdTracker {
             }
             let entry = path_map
                 .entry(state.path.clone())
-                .or_insert_with(|| (false, false, BTreeSet::new(), BTreeSet::new()));
+                .or_insert_with(|| {
+                    (
+                        false,
+                        false,
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                    )
+                });
             entry.0 |= state.was_read;
             entry.1 |= state.was_written;
-            entry.2.extend(&state.chunks_read);
-            entry.3.extend(&state.chunks_written);
+            entry.2.extend(&state.read_threads);
+            entry.3.extend(&state.written_threads);
+            entry.4.extend(&state.chunks_read);
+            entry.5.extend(&state.chunks_written);
         }
 
         for path in &self.extra_opened_paths {
             if !path.is_empty() {
-                path_map
-                    .entry(path.clone())
-                    .or_insert_with(|| (false, false, BTreeSet::new(), BTreeSet::new()));
+                path_map.entry(path.clone()).or_insert_with(|| {
+                    (
+                        false,
+                        false,
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                    )
+                });
             }
         }
         for path in &self.extra_read_paths {
@@ -250,8 +452,20 @@ impl FdTracker {
             }
             let entry = path_map
                 .entry(path.clone())
-                .or_insert_with(|| (false, false, BTreeSet::new(), BTreeSet::new()));
+                .or_insert_with(|| {
+                    (
+                        false,
+                        false,
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                    )
+                });
             entry.0 = true;
+            if let Some(threads) = self.extra_read_threads.get(path) {
+                entry.2.extend(threads);
+            }
         }
         for path in &self.extra_written_paths {
             if path.is_empty() {
@@ -259,13 +473,35 @@ impl FdTracker {
             }
             let entry = path_map
                 .entry(path.clone())
-                .or_insert_with(|| (false, false, BTreeSet::new(), BTreeSet::new()));
+                .or_insert_with(|| {
+                    (
+                        false,
+                        false,
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                        BTreeSet::new(),
+                    )
+                });
             entry.1 = true;
+            if let Some(threads) = self.extra_written_threads.get(path) {
+                entry.3.extend(threads);
+            }
         }
 
         let files: Vec<FileRecord> = path_map
             .into_iter()
-            .map(|(path, (read, written, chunks_r, chunks_w))| {
+            .map(|(path, (read, written, read_threads, written_threads, chunks_r, chunks_w))| {
+                let read_threads = if read_threads.is_empty() {
+                    None
+                } else {
+                    Some(read_threads.into_iter().collect())
+                };
+                let written_threads = if written_threads.is_empty() {
+                    None
+                } else {
+                    Some(written_threads.into_iter().collect())
+                };
                 let chunks_read = if self.chunk_size.is_some() && !chunks_r.is_empty() {
                     Some(chunks_r.into_iter().collect())
                 } else {
@@ -280,6 +516,8 @@ impl FdTracker {
                     path,
                     read,
                     written,
+                    read_threads,
+                    written_threads,
                     chunks_read,
                     chunks_written,
                 }
@@ -346,6 +584,21 @@ mod tests {
         assert!(!tracker.fd_state.contains_key(&(1, 3)));
     }
 
+    #[test]
+    fn test_summary_preserves_thread_ids_per_file_direction() {
+        let mut tracker = FdTracker::new(None);
+        tracker.handle_open(7, 3, "/tmp/threaded.txt".to_string(), 0);
+        tracker.handle_read_with_thread(7, 3, 16, 101);
+        tracker.handle_write_with_thread(7, 3, 8, 202);
+        tracker.handle_close(7, 3);
+
+        let summary = tracker.build_summary();
+        assert_eq!(summary.files.len(), 1);
+        assert_eq!(summary.files[0].path, "/tmp/threaded.txt");
+        assert_eq!(summary.files[0].read_threads, Some(vec![101]));
+        assert_eq!(summary.files[0].written_threads, Some(vec![202]));
+    }
+
     #[test]
     fn test_handle_dup_untracked_source_clears_target_fd() {
         let mut tracker = FdTracker::new(None);
diff --git a/rust/crates/tracer-schema/src/lib.rs b/rust/crates/tracer-schema/src/lib.rs
index f8a3ddca..f4d077ed 100644
--- a/rust/crates/tracer-schema/src/lib.rs
+++ b/rust/crates/tracer-schema/src/lib.rs
@@ -2,6 +2,21 @@ use std::collections::HashMap;
 
 use serde::{Deserialize, Serialize};
 
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(tag = "kind", rename_all = "snake_case")]
+pub enum NativeTraceEvent {
+    Read {
+        pid: u32,
+        thread_id: u32,
+        path: String,
+    },
+    Write {
+        pid: u32,
+        thread_id: u32,
+        path: String,
+    },
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct ProcessInfo {
     pub pid: u32,
@@ -16,6 +31,10 @@ pub struct FileRecord {
     pub read: bool,
     pub written: bool,
     #[serde(skip_serializing_if = "Option::is_none")]
+    pub read_threads: Option<Vec<u32>>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub written_threads: Option<Vec<u32>>,
+    #[serde(skip_serializing_if = "Option::is_none")]
     pub chunks_read: Option<Vec<u64>>,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub chunks_written: Option<Vec<u64>>,
diff --git a/rust/tracers/ebpf/common/src/lib.rs b/rust/tracers/ebpf/common/src/lib.rs
index 667fbede..d4788752 100644
--- a/rust/tracers/ebpf/common/src/lib.rs
+++ b/rust/tracers/ebpf/common/src/lib.rs
@@ -34,6 +34,7 @@ pub enum EventType {
 #[repr(C)]
 pub struct SmallEvent {
     pub pid: u32,
+    pub thread_id: u32,
     pub event_type: u16,
     pub _pad: u16,
     pub ret_val: i64,
diff --git a/rust/tracers/ebpf/probe/src/main.rs b/rust/tracers/ebpf/probe/src/main.rs
index 4b5eaef7..817232ce 100644
--- a/rust/tracers/ebpf/probe/src/main.rs
+++ b/rust/tracers/ebpf/probe/src/main.rs
@@ -334,6 +334,7 @@ fn try_sys_exit_rw(ctx: &TracePointContext, event_type: EventType) -> Result<(),
 
     emit_small(&SmallEvent {
         pid,
+        thread_id: bpf_get_current_pid_tgid() as u32,
         event_type: event_type as u16,
         _pad: 0,
         ret_val: ret,
@@ -374,6 +375,7 @@ fn try_sys_exit_close(ctx: &TracePointContext) -> Result<(), i64> {
 
     emit_small(&SmallEvent {
         pid,
+        thread_id: bpf_get_current_pid_tgid() as u32,
         event_type: EventType::Close as u16,
         _pad: 0,
         ret_val: ret,
@@ -432,6 +434,7 @@ fn try_sys_exit_copy_file_range(ctx: &TracePointContext) -> Result<(), i64> {
     // Emit CopyFileRange event: arg0=fd_in, arg1=fd_out, ret_val=bytes
     emit_small(&SmallEvent {
         pid,
+        thread_id: bpf_get_current_pid_tgid() as u32,
         event_type: EventType::CopyFileRange as u16,
         _pad: 0,
         ret_val: ret,
@@ -500,6 +503,7 @@ fn try_sys_exit_dup(ctx: &TracePointContext) -> Result<(), i64> {
     // Emit Dup event: arg0=oldfd, ret_val=newfd
     emit_small(&SmallEvent {
         pid,
+        thread_id: bpf_get_current_pid_tgid() as u32,
         event_type: EventType::Dup as u16,
         _pad: 0,
         ret_val: ret,
diff --git a/rust/tracers/ebpf/userspace/src/daemon.rs b/rust/tracers/ebpf/userspace/src/daemon.rs
index 942b00e2..bded38d4 100644
--- a/rust/tracers/ebpf/userspace/src/daemon.rs
+++ b/rust/tracers/ebpf/userspace/src/daemon.rs
@@ -469,6 +469,7 @@ mod tests {
         // Simulate a write event for pid=100 (run 1) using raw bytes
         let event = roar_ebpf_common::SmallEvent {
             pid: 100,
+            thread_id: 100,
             event_type: roar_ebpf_common::EventType::Write as u16,
             _pad: 0,
             ret_val: 512,
diff --git a/rust/tracers/ebpf/userspace/src/events.rs b/rust/tracers/ebpf/userspace/src/events.rs
index 34148cf8..40e34d46 100644
--- a/rust/tracers/ebpf/userspace/src/events.rs
+++ b/rust/tracers/ebpf/userspace/src/events.rs
@@ -67,6 +67,7 @@ pub fn process_event(state: &mut TracerState, data: &[u8]) {
 
 fn process_small_event(state: &mut TracerState, event: &SmallEvent) {
     let pid = event.pid;
+    let thread_id = event.thread_id;
     let fd = event.arg0 as i32;
 
     let Some(etype) = event_type_from_u16(event.event_type) else {
@@ -77,22 +78,28 @@ fn process_small_event(state: &mut TracerState, event: &SmallEvent) {
     match etype {
         EventType::Read => {
             if event.ret_val > 0 {
-                state.handle_read(pid, fd, event.ret_val as u64);
+                state.handle_read_with_thread(pid, fd, event.ret_val as u64, thread_id);
             }
         }
         EventType::Write => {
             if event.ret_val > 0 {
-                state.handle_write(pid, fd, event.ret_val as u64);
+                state.handle_write_with_thread(pid, fd, event.ret_val as u64, thread_id);
             }
         }
         EventType::PRead => {
             if event.ret_val > 0 {
-                state.handle_pread(pid, fd, event.arg1, event.ret_val as u64);
+                state.handle_pread_with_thread(pid, fd, event.arg1, event.ret_val as u64, thread_id);
             }
         }
         EventType::PWrite => {
             if event.ret_val > 0 {
-                state.handle_pwrite(pid, fd, event.arg1, event.ret_val as u64);
+                state.handle_pwrite_with_thread(
+                    pid,
+                    fd,
+                    event.arg1,
+                    event.ret_val as u64,
+                    thread_id,
+                );
             }
         }
         EventType::Close => {
@@ -114,12 +121,12 @@ fn process_small_event(state: &mut TracerState, event: &SmallEvent) {
         EventType::MmapRead => {
             let length = event.ret_val as u64;
             let offset = event.arg1;
-            state.handle_pread(pid, fd, offset, length);
+            state.handle_pread_with_thread(pid, fd, offset, length, thread_id);
         }
         EventType::MmapWrite => {
             let length = event.ret_val as u64;
             let offset = event.arg1;
-            state.handle_pwrite(pid, fd, offset, length);
+            state.handle_pwrite_with_thread(pid, fd, offset, length, thread_id);
         }
         EventType::Sendfile => {
             // arg0 = in_fd, arg1 = out_fd, ret_val = bytes
@@ -127,8 +134,8 @@ fn process_small_event(state: &mut TracerState, event: &SmallEvent) {
                 let in_fd = event.arg0 as i32;
                 let out_fd = event.arg1 as i32;
                 let bytes = event.ret_val as u64;
-                state.handle_read(pid, in_fd, bytes);
-                state.handle_write(pid, out_fd, bytes);
+                state.handle_read_with_thread(pid, in_fd, bytes, thread_id);
+                state.handle_write_with_thread(pid, out_fd, bytes, thread_id);
             }
         }
         EventType::CopyFileRange => {
@@ -139,8 +146,8 @@ fn process_small_event(state: &mut TracerState, event: &SmallEvent) {
                 let in_fd = event.arg0 as i32;
                 let out_fd = event.arg1 as i32;
                 let bytes = event.ret_val as u64;
-                state.handle_read(pid, in_fd, bytes);
-                state.handle_write(pid, out_fd, bytes);
+                state.handle_read_with_thread(pid, in_fd, bytes, thread_id);
+                state.handle_write_with_thread(pid, out_fd, bytes, thread_id);
             }
         }
         _ => {
@@ -245,6 +252,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Read as u16,
             _pad: 0,
             ret_val: 100,
@@ -259,6 +267,28 @@ mod tests {
         assert_eq!(fd_state.cursor, 100);
     }
 
+    #[test]
+    fn test_process_small_event_tracks_thread_ids_in_summary() {
+        let mut state = TracerState::new(None);
+        state.handle_open(1, 3, "/tmp/test.txt".to_string(), 0);
+
+        let event = SmallEvent {
+            pid: 1,
+            thread_id: 77,
+            event_type: EventType::Write as u16,
+            _pad: 0,
+            ret_val: 12,
+            arg0: 3,
+            arg1: 0,
+        };
+
+        process_small_event(&mut state, &event);
+
+        let report = state.build_report();
+        assert_eq!(report.files.len(), 1);
+        assert_eq!(report.files[0].written_threads, Some(vec![77]));
+    }
+
     #[test]
     fn test_process_small_event_write_zero_bytes_ignored() {
         let mut state = TracerState::new(None);
@@ -266,6 +296,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Write as u16,
             _pad: 0,
             ret_val: 0, // zero bytes
@@ -286,6 +317,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Write as u16,
             _pad: 0,
             ret_val: -1, // error
@@ -355,6 +387,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Read as u16,
             _pad: 0,
             ret_val: 42,
@@ -396,6 +429,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Sendfile as u16,
             _pad: 0,
             ret_val: 1024,
@@ -416,6 +450,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Dup as u16,
             _pad: 0,
             ret_val: 7, // new_fd
@@ -438,6 +473,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Close as u16,
             _pad: 0,
             ret_val: 0,
@@ -457,6 +493,7 @@ mod tests {
 
         let event = SmallEvent {
             pid: 1,
+            thread_id: 11,
             event_type: EventType::Lseek as u16,
             _pad: 0,
             ret_val: 4096, // new offset
diff --git a/rust/tracers/ebpf/userspace/src/state.rs b/rust/tracers/ebpf/userspace/src/state.rs
index 2d8b1884..4c392b25 100644
--- a/rust/tracers/ebpf/userspace/src/state.rs
+++ b/rust/tracers/ebpf/userspace/src/state.rs
@@ -61,18 +61,50 @@ impl TracerState {
         self.fd.handle_read(pid, fd, bytes);
     }
 
+    pub fn handle_read_with_thread(&mut self, pid: u32, fd: i32, bytes: u64, thread_id: u32) {
+        self.fd.handle_read_with_thread(pid, fd, bytes, thread_id);
+    }
+
     pub fn handle_pread(&mut self, pid: u32, fd: i32, offset: u64, bytes: u64) {
         self.fd.handle_pread(pid, fd, offset, bytes);
     }
 
+    pub fn handle_pread_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.fd
+            .handle_pread_with_thread(pid, fd, offset, bytes, thread_id);
+    }
+
     pub fn handle_write(&mut self, pid: u32, fd: i32, bytes: u64) {
         self.fd.handle_write(pid, fd, bytes);
     }
 
+    pub fn handle_write_with_thread(&mut self, pid: u32, fd: i32, bytes: u64, thread_id: u32) {
+        self.fd.handle_write_with_thread(pid, fd, bytes, thread_id);
+    }
+
     pub fn handle_pwrite(&mut self, pid: u32, fd: i32, offset: u64, bytes: u64) {
         self.fd.handle_pwrite(pid, fd, offset, bytes);
     }
 
+    pub fn handle_pwrite_with_thread(
+        &mut self,
+        pid: u32,
+        fd: i32,
+        offset: u64,
+        bytes: u64,
+        thread_id: u32,
+    ) {
+        self.fd
+            .handle_pwrite_with_thread(pid, fd, offset, bytes, thread_id);
+    }
+
     pub fn handle_lseek(&mut self, pid: u32, fd: i32, new_offset: u64) {
         self.fd.handle_lseek(pid, fd, new_offset);
     }
@@ -108,6 +140,10 @@ impl TracerState {
         self.fd.mark_path_written(path);
     }
 
+    pub fn mark_path_written_with_thread(&mut self, path: String, thread_id: u32) {
+        self.fd.mark_path_written_with_thread(path, thread_id);
+    }
+
     // -- Report generation ------------------------------------------------
 
     pub fn build_report(&self) -> TracerOutput {
diff --git a/rust/tracers/preload/build.rs b/rust/tracers/preload/build.rs
index 95eb4ad1..e84847e7 100644
--- a/rust/tracers/preload/build.rs
+++ b/rust/tracers/preload/build.rs
@@ -1,9 +1,7 @@
 fn main() {
-    if std::env::var("CARGO_CFG_TARGET_OS").as_deref() == Ok("macos") {
-        println!("cargo:rerun-if-changed=src/interpose.c");
-        cc::Build::new()
-            .file("src/interpose.c")
-            .warnings(false)
-            .compile("roar_preload_interpose");
-    }
+    println!("cargo:rerun-if-changed=src/interpose.c");
+    cc::Build::new()
+        .file("src/interpose.c")
+        .warnings(false)
+        .compile("roar_preload_interpose");
 }
diff --git a/rust/tracers/preload/src/interpose.c b/rust/tracers/preload/src/interpose.c
index a9fab40c..e5d6ce25 100644
--- a/rust/tracers/preload/src/interpose.c
+++ b/rust/tracers/preload/src/interpose.c
@@ -50,3 +50,141 @@ DYLD_INTERPOSE(roar_interpose_ftruncate, ftruncate);
 int roar_preload_interpose_keep(void) { return 0; }
 
 #endif
+
+#ifndef __APPLE__
+#define _GNU_SOURCE
+
+#include <dlfcn.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stddef.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+extern void roar_preload_emit_path_flags(const char *path, int flags);
+extern void roar_preload_emit_at_path_flags(int dirfd, const char *path, int flags);
+
+static int (*resolve_open_symbol(const char *name))(const char *, int, ...) {
+  return (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, name);
+}
+
+static int (*resolve_openat_symbol(const char *name))(int, const char *, int, ...) {
+  return (int (*)(int, const char *, int, ...))dlsym(RTLD_NEXT, name);
+}
+
+int open(const char *path, int flags, ...) {
+  static int (*real_open)(const char *, int, ...) = NULL;
+  if (real_open == NULL) {
+    real_open = resolve_open_symbol("open");
+  }
+  if (real_open == NULL) {
+    return -1;
+  }
+
+  mode_t mode = 0;
+  int has_mode = (flags & O_CREAT) || (flags & O_TMPFILE);
+  if (has_mode) {
+    va_list args;
+    va_start(args, flags);
+    mode = va_arg(args, int);
+    va_end(args);
+  }
+
+  int ret = has_mode ? real_open(path, flags, mode) : real_open(path, flags);
+  if (ret >= 0) {
+    roar_preload_emit_path_flags(path, flags);
+  }
+  return ret;
+}
+
+int open64(const char *path, int flags, ...) {
+  static int (*real_open64)(const char *, int, ...) = NULL;
+  if (real_open64 == NULL) {
+    real_open64 = resolve_open_symbol("open64");
+  }
+  if (real_open64 == NULL) {
+    return open(path, flags);
+  }
+
+  mode_t mode = 0;
+  int has_mode = (flags & O_CREAT) || (flags & O_TMPFILE);
+  if (has_mode) {
+    va_list args;
+    va_start(args, flags);
+    mode = va_arg(args, int);
+    va_end(args);
+  }
+
+  int ret = has_mode ? real_open64(path, flags, mode) : real_open64(path, flags);
+  if (ret >= 0) {
+    roar_preload_emit_path_flags(path, flags);
+  }
+  return ret;
+}
+
+int openat(int dirfd, const char *path, int flags, ...) {
+  static int (*real_openat)(int, const char *, int, ...) = NULL;
+  if (real_openat == NULL) {
+    real_openat = resolve_openat_symbol("openat");
+  }
+  if (real_openat == NULL) {
+    return -1;
+  }
+
+  mode_t mode = 0;
+  int has_mode = (flags & O_CREAT) || (flags & O_TMPFILE);
+  if (has_mode) {
+    va_list args;
+    va_start(args, flags);
+    mode = va_arg(args, int);
+    va_end(args);
+  }
+
+  int ret = has_mode ? real_openat(dirfd, path, flags, mode) : real_openat(dirfd, path, flags);
+  if (ret >= 0) {
+    roar_preload_emit_at_path_flags(dirfd, path, flags);
+  }
+  return ret;
+}
+
+int openat64(int dirfd, const char *path, int flags, ...) {
+  static int (*real_openat64)(int, const char *, int, ...) = NULL;
+  if (real_openat64 == NULL) {
+    real_openat64 = resolve_openat_symbol("openat64");
+  }
+  if (real_openat64 == NULL) {
+    return openat(dirfd, path, flags);
+  }
+
+  mode_t mode = 0;
+  int has_mode = (flags & O_CREAT) || (flags & O_TMPFILE);
+  if (has_mode) {
+    va_list args;
+    va_start(args, flags);
+    mode = va_arg(args, int);
+    va_end(args);
+  }
+
+  int ret = has_mode ? real_openat64(dirfd, path, flags, mode) : real_openat64(dirfd, path, flags);
+  if (ret >= 0) {
+    roar_preload_emit_at_path_flags(dirfd, path, flags);
+  }
+  return ret;
+}
+
+int creat(const char *path, mode_t mode) {
+  static int (*real_creat)(const char *, mode_t) = NULL;
+  if (real_creat == NULL) {
+    real_creat = (int (*)(const char *, mode_t))dlsym(RTLD_NEXT, "creat");
+  }
+  if (real_creat == NULL) {
+    return -1;
+  }
+
+  int ret = real_creat(path, mode);
+  if (ret >= 0) {
+    roar_preload_emit_path_flags(path, O_WRONLY | O_CREAT | O_TRUNC);
+  }
+  return ret;
+}
+#endif
diff --git a/rust/tracers/preload/src/ipc.rs b/rust/tracers/preload/src/ipc.rs
index 895c9409..8b39deb1 100644
--- a/rust/tracers/preload/src/ipc.rs
+++ b/rust/tracers/preload/src/ipc.rs
@@ -1,8 +1,20 @@
-use serde::{Deserialize, Serialize};
+pub use tracer_schema::NativeTraceEvent as TraceEvent;
 
-#[derive(Debug, Clone, Serialize, Deserialize)]
-#[serde(tag = "kind", rename_all = "snake_case")]
-pub enum TraceEvent {
-    Read { pid: u32, path: String },
-    Write { pid: u32, path: String },
+#[cfg(test)]
+mod tests {
+    use super::TraceEvent;
+
+    #[test]
+    fn trace_event_round_trip_preserves_thread_id() {
+        let event = TraceEvent::Write {
+            pid: 42,
+            thread_id: 99,
+            path: "/tmp/native-thread.txt".to_string(),
+        };
+
+        let payload = rmp_serde::to_vec_named(&event).expect("serialize trace event");
+        let decoded: TraceEvent = rmp_serde::from_slice(&payload).expect("decode trace event");
+
+        assert_eq!(decoded, event);
+    }
 }
diff --git a/rust/tracers/preload/src/lib.rs b/rust/tracers/preload/src/lib.rs
index 1f36f049..2156631a 100644
--- a/rust/tracers/preload/src/lib.rs
+++ b/rust/tracers/preload/src/lib.rs
@@ -27,7 +27,7 @@ static TRACE_SOCK_PATH: OnceLock<Option<String>> = OnceLock::new();
 static REAL_WRITE: OnceLock<Option<WriteFn>> = OnceLock::new();
 
 #[cfg(target_os = "macos")]
-extern "C" {
+unsafe extern "C" {
     fn roar_preload_interpose_keep() -> c_int;
 }
 
@@ -466,6 +466,47 @@ fn get_real_write() -> Option<WriteFn> {
     *REAL_WRITE.get_or_init(|| unsafe { resolve_symbol::<WriteFn>(b"write\0") })
 }
 
+fn write_frame(fd: c_int, frame: &[u8]) -> bool {
+    let mut written = 0usize;
+    while written < frame.len() {
+        let rc: ssize_t;
+        unsafe {
+            #[cfg(target_os = "macos")]
+            {
+                rc = sys_write(
+                    fd,
+                    frame[written..].as_ptr() as *const c_void,
+                    frame.len() - written,
+                );
+            }
+            #[cfg(not(target_os = "macos"))]
+            {
+                let Some(real_write) = get_real_write() else {
+                    return false;
+                };
+                rc = real_write(
+                    fd,
+                    frame[written..].as_ptr() as *const c_void,
+                    frame.len() - written,
+                );
+            }
+        }
+
+        if rc < 0 {
+            let err = get_errno();
+            if err == libc::EINTR {
+                continue;
+            }
+            return false;
+        }
+        if rc == 0 {
+            return false;
+        }
+        written += rc as usize;
+    }
+    true
+}
+
 fn send_event(event: &TraceEvent) {
     let Some(fd) = get_trace_fd() else {
         return;
@@ -478,42 +519,12 @@ fn send_event(event: &TraceEvent) {
     frame.extend_from_slice(&len.to_le_bytes());
     frame.extend_from_slice(&payload);
 
-    let rc: ssize_t;
-    unsafe {
-        #[cfg(target_os = "macos")]
-        {
-            rc = sys_write(fd, frame.as_ptr() as *const c_void, frame.len());
-        }
-        #[cfg(not(target_os = "macos"))]
-        {
-            let Some(real_write) = get_real_write() else {
-                return;
-            };
-            rc = real_write(fd, frame.as_ptr() as *const c_void, frame.len());
-        }
-    }
-
-    if rc < 0 {
+    if !write_frame(fd, &frame) {
         let err = get_errno();
         if err == libc::EBADF || err == libc::EPIPE || err == libc::ENOTCONN {
             invalidate_trace_fd();
             if let Some(new_fd) = get_trace_fd() {
-                unsafe {
-                    #[cfg(target_os = "macos")]
-                    {
-                        let _ = sys_write(new_fd, frame.as_ptr() as *const c_void, frame.len());
-                    }
-                    #[cfg(not(target_os = "macos"))]
-                    {
-                        if let Some(real_write) = get_real_write() {
-                            let _ = real_write(
-                                new_fd,
-                                frame.as_ptr() as *const c_void,
-                                frame.len(),
-                            );
-                        }
-                    }
-                }
+                let _ = write_frame(new_fd, &frame);
             }
         }
     }
@@ -524,6 +535,24 @@ fn current_pid() -> u32 {
     unsafe { libc::getpid() as u32 }
 }
 
+#[cfg(target_os = "macos")]
+fn current_thread_id() -> u32 {
+    let mut thread_id = 0u64;
+    // SAFETY: pthread_threadid_np writes the current thread id into the provided pointer.
+    let rc = unsafe { libc::pthread_threadid_np(0, &mut thread_id) };
+    if rc == 0 {
+        thread_id as u32
+    } else {
+        current_pid()
+    }
+}
+
+#[cfg(not(target_os = "macos"))]
+fn current_thread_id() -> u32 {
+    // SAFETY: syscall(SYS_gettid) has no preconditions on Linux.
+    unsafe { libc::syscall(libc::SYS_gettid) as u32 }
+}
+
 fn fd_path(fd: c_int) -> Option<String> {
     #[cfg(target_os = "macos")]
     {
@@ -604,6 +633,7 @@ fn emit_fd_read(fd: c_int) {
     };
     send_event(&TraceEvent::Read {
         pid: current_pid(),
+        thread_id: current_thread_id(),
         path,
     });
 }
@@ -614,6 +644,7 @@ fn emit_fd_write(fd: c_int) {
     };
     send_event(&TraceEvent::Write {
         pid: current_pid(),
+        thread_id: current_thread_id(),
         path,
     });
 }
@@ -624,6 +655,7 @@ fn emit_path_write(path: String) {
     }
     send_event(&TraceEvent::Write {
         pid: current_pid(),
+        thread_id: current_thread_id(),
         path,
     });
 }
@@ -636,6 +668,27 @@ fn mode_implies_write(mode: &str) -> bool {
     mode.contains('w') || mode.contains('a') || mode.contains('x') || mode.contains('+')
 }
 
+#[cfg(target_os = "macos")]
+const O_TMPFILE_FLAG: c_int = 0;
+
+#[cfg(not(target_os = "macos"))]
+const O_TMPFILE_FLAG: c_int = libc::O_TMPFILE;
+
+fn flags_imply_read(flags: c_int) -> bool {
+    let access_mode = flags & libc::O_ACCMODE;
+    access_mode == libc::O_RDONLY || access_mode == libc::O_RDWR
+}
+
+fn flags_imply_write(flags: c_int) -> bool {
+    let access_mode = flags & libc::O_ACCMODE;
+    access_mode == libc::O_WRONLY
+        || access_mode == libc::O_RDWR
+        || (flags & libc::O_CREAT) != 0
+        || (flags & libc::O_TRUNC) != 0
+        || (flags & libc::O_APPEND) != 0
+        || (flags & O_TMPFILE_FLAG) != 0
+}
+
 fn emit_path_mode(path: String, mode: &str) {
     if path.is_empty() {
         return;
@@ -643,12 +696,34 @@ fn emit_path_mode(path: String, mode: &str) {
     if mode_implies_read(mode) {
         send_event(&TraceEvent::Read {
             pid: current_pid(),
+            thread_id: current_thread_id(),
             path: path.clone(),
         });
     }
     if mode_implies_write(mode) {
         send_event(&TraceEvent::Write {
             pid: current_pid(),
+            thread_id: current_thread_id(),
+            path,
+        });
+    }
+}
+
+fn emit_path_flags(path: String, flags: c_int) {
+    if path.is_empty() {
+        return;
+    }
+    if flags_imply_read(flags) {
+        send_event(&TraceEvent::Read {
+            pid: current_pid(),
+            thread_id: current_thread_id(),
+            path: path.clone(),
+        });
+    }
+    if flags_imply_write(flags) {
+        send_event(&TraceEvent::Write {
+            pid: current_pid(),
+            thread_id: current_thread_id(),
             path,
         });
     }
@@ -669,6 +744,34 @@ fn resolve_at_path(dirfd: c_int, path: *const c_char) -> Option<String> {
     Some(format!("{base}/{path_s}"))
 }
 
+#[cfg_attr(not(target_os = "macos"), no_mangle)]
+pub unsafe extern "C" fn roar_preload_emit_path_flags(path: *const c_char, flags: c_int) {
+    if in_hook() {
+        return;
+    }
+    with_hook_guard(|| {
+        if let Some(path_s) = c_str_to_owned(path) {
+            emit_path_flags(path_s, flags);
+        }
+    });
+}
+
+#[cfg_attr(not(target_os = "macos"), no_mangle)]
+pub unsafe extern "C" fn roar_preload_emit_at_path_flags(
+    dirfd: c_int,
+    path: *const c_char,
+    flags: c_int,
+) {
+    if in_hook() {
+        return;
+    }
+    with_hook_guard(|| {
+        if let Some(path_s) = resolve_at_path(dirfd, path) {
+            emit_path_flags(path_s, flags);
+        }
+    });
+}
+
 unsafe fn resolve_symbol<T: Copy>(symbol: &[u8]) -> Option<T> {
     let ptr = libc::dlsym(libc::RTLD_NEXT, symbol.as_ptr() as *const c_char);
     if ptr.is_null() {
diff --git a/rust/tracers/preload/src/main.rs b/rust/tracers/preload/src/main.rs
index 26614ba4..1db12ce0 100644
--- a/rust/tracers/preload/src/main.rs
+++ b/rust/tracers/preload/src/main.rs
@@ -54,21 +54,29 @@ impl CollectorState {
 
     fn ingest(&mut self, event: TraceEvent) {
         match event {
-            TraceEvent::Read { pid, path } => {
+            TraceEvent::Read {
+                pid,
+                thread_id,
+                path,
+            } => {
                 if path.is_empty() {
                     return;
                 }
                 self.ensure_process(pid);
                 self.fd.mark_path_open(path.clone());
-                self.fd.mark_path_read(path);
+                self.fd.mark_path_read_with_thread(path, thread_id);
             }
-            TraceEvent::Write { pid, path } => {
+            TraceEvent::Write {
+                pid,
+                thread_id,
+                path,
+            } => {
                 if path.is_empty() {
                     return;
                 }
                 self.ensure_process(pid);
                 self.fd.mark_path_open(path.clone());
-                self.fd.mark_path_written(path);
+                self.fd.mark_path_written_with_thread(path, thread_id);
             }
         }
     }
diff --git a/rust/tracers/preload/tests/comprehensive.rs b/rust/tracers/preload/tests/comprehensive.rs
index 597c8728..6bd4ed4b 100644
--- a/rust/tracers/preload/tests/comprehensive.rs
+++ b/rust/tracers/preload/tests/comprehensive.rs
@@ -56,6 +56,7 @@ fn cargo_bin(name: &str) -> String {
 }
 
 fn preload_lib() -> String {
+    let mut candidates: Vec<PathBuf> = Vec::new();
     for debug_dir in target_debug_dirs() {
         for name in [
             "libroar_tracer_preload.dylib",
@@ -65,7 +66,7 @@ fn preload_lib() -> String {
         ] {
             let path = debug_dir.join(name);
             if path.exists() {
-                return path.to_string_lossy().into_owned();
+                candidates.push(path);
             }
         }
         let deps_dir = debug_dir.join("deps");
@@ -79,11 +80,21 @@ fn preload_lib() -> String {
                     || name.starts_with("libroar-tracer-preload"))
                     && (name.ends_with(".dylib") || name.ends_with(".so"));
                 if is_match {
-                    return path.to_string_lossy().into_owned();
+                    candidates.push(path);
                 }
             }
         }
     }
+
+    candidates.sort_by_key(|path| {
+        fs::metadata(path)
+            .and_then(|meta| meta.modified())
+            .ok()
+    });
+    if let Some(path) = candidates.pop() {
+        return path.to_string_lossy().into_owned();
+    }
+
     panic!("preload interposer library not found");
 }
 
diff --git a/rust/tracers/preload/tests/standalone.rs b/rust/tracers/preload/tests/standalone.rs
index 4810b7c5..27f75d1d 100644
--- a/rust/tracers/preload/tests/standalone.rs
+++ b/rust/tracers/preload/tests/standalone.rs
@@ -92,6 +92,7 @@ fn target_debug_dirs() -> Vec<PathBuf> {
 }
 
 fn preload_lib() -> String {
+    let mut candidates: Vec<PathBuf> = Vec::new();
     for debug_dir in target_debug_dirs() {
         let direct = [
             debug_dir.join("libroar_tracer_preload.dylib"),
@@ -101,7 +102,7 @@ fn preload_lib() -> String {
         ];
         for path in direct {
             if path.exists() {
-                return path.to_string_lossy().into_owned();
+                candidates.push(path);
             }
         }
 
@@ -117,13 +118,22 @@ fn preload_lib() -> String {
                         || name.starts_with("libroar-tracer-preload"))
                         && (name.ends_with(".dylib") || name.ends_with(".so"));
                     if is_match {
-                        return path.to_string_lossy().into_owned();
+                        candidates.push(path);
                     }
                 }
             }
         }
     }
 
+    candidates.sort_by_key(|path| {
+        std::fs::metadata(path)
+            .and_then(|meta| meta.modified())
+            .ok()
+    });
+    if let Some(path) = candidates.pop() {
+        return path.to_string_lossy().into_owned();
+    }
+
     panic!("preload interposer library not found in target/debug");
 }
 
diff --git a/rust/tracers/ptrace/src/main.rs b/rust/tracers/ptrace/src/main.rs
index f68d9ef0..1008169c 100644
--- a/rust/tracers/ptrace/src/main.rs
+++ b/rust/tracers/ptrace/src/main.rs
@@ -70,7 +70,7 @@ struct TracerState {
     fd_tracker: FdTracker,
     awaiting_exit: HashSet<i32>, // PIDs waiting for syscall exit stop
     pending_opens: HashMap<i32, (String, u64)>, // pid -> (path, flags)
-    pending_writes: HashMap<i32, String>, // pid -> path (write syscalls pending confirmation)
+    pending_writes: HashMap<i32, (String, u32)>, // tid -> (path, thread_id)
     pending_closes: HashMap<i32, i32>, // pid -> fd (close syscalls pending confirmation)
     pending_chdirs: HashMap<i32, ()>, // pid -> () (chdir pending confirmation)
     pending_fchdirs: HashMap<i32, ()>, // pid -> () (fchdir pending confirmation)
@@ -217,7 +217,7 @@ fn handle_syscall_entry(
             // All read variants have fd in rdi
             let fd = regs.rdi as i32;
             if let Some(pid_u32) = pid_u32 {
-                state.fd_tracker.mark_read(pid_u32, fd);
+                state.fd_tracker.mark_read_with_thread(pid_u32, fd, pid_u32);
             }
         }
         SYS_WRITE | SYS_PWRITE64 | SYS_WRITEV | SYS_PWRITEV | SYS_PWRITEV2 => {
@@ -226,7 +226,7 @@ fn handle_syscall_entry(
             let fd = regs.rdi as i32;
             if let Some(pid_u32) = pid_u32 {
                 if let Some(path) = state.fd_tracker.path_for_fd(pid_u32, fd).cloned() {
-                    state.pending_writes.insert(pid_raw, path);
+                    state.pending_writes.insert(pid_raw, (path, pid_u32));
                 }
             }
         }
@@ -235,10 +235,10 @@ fn handle_syscall_entry(
             let out_fd = regs.rdi as i32;
             let in_fd = regs.rsi as i32;
             if let Some(pid_u32) = pid_u32 {
-                state.fd_tracker.mark_read(pid_u32, in_fd);
+                state.fd_tracker.mark_read_with_thread(pid_u32, in_fd, pid_u32);
                 // Track write as pending - confirm at exit if bytes > 0
                 if let Some(path) = state.fd_tracker.path_for_fd(pid_u32, out_fd).cloned() {
-                    state.pending_writes.insert(pid_raw, path);
+                    state.pending_writes.insert(pid_raw, (path, pid_u32));
                 }
             }
         }
@@ -247,10 +247,10 @@ fn handle_syscall_entry(
             let in_fd = regs.rdi as i32;
             let out_fd = regs.r8 as i32;
             if let Some(pid_u32) = pid_u32 {
-                state.fd_tracker.mark_read(pid_u32, in_fd);
+                state.fd_tracker.mark_read_with_thread(pid_u32, in_fd, pid_u32);
                 // Track write as pending - confirm at exit if bytes > 0
                 if let Some(path) = state.fd_tracker.path_for_fd(pid_u32, out_fd).cloned() {
-                    state.pending_writes.insert(pid_raw, path);
+                    state.pending_writes.insert(pid_raw, (path, pid_u32));
                 }
             }
         }
@@ -272,12 +272,12 @@ fn handle_syscall_entry(
 
                         // Any file-backed mmap is a read
                         if prot & 1 != 0 {
-                            state.fd_tracker.mark_path_read(path.clone());
+                            state.fd_tracker.mark_path_read_with_thread(path.clone(), pid_u32);
                         }
                         // Only MAP_SHARED + PROT_WRITE is a real write (changes go to disk)
                         // MAP_PRIVATE writes are copy-on-write and don't modify the file
                         if is_shared && (prot & 2 != 0) {
-                            state.fd_tracker.mark_path_written(path);
+                            state.fd_tracker.mark_path_written_with_thread(path, pid_u32);
                         }
                     }
                 }
@@ -288,7 +288,11 @@ fn handle_syscall_entry(
             // The destination (newpath) is effectively written
             if let Some(newpath) = read_string_from_tracee(pid, regs.rsi) {
                 let abs_path = resolve_path(&newpath, pid_raw, &mut state.cwd_cache);
-                state.fd_tracker.mark_path_written(abs_path);
+                if let Some(pid_u32) = pid_u32 {
+                    state.fd_tracker.mark_path_written_with_thread(abs_path, pid_u32);
+                } else {
+                    state.fd_tracker.mark_path_written(abs_path);
+                }
             }
         }
         SYS_RENAMEAT | SYS_RENAMEAT2 => {
@@ -296,7 +300,11 @@ fn handle_syscall_entry(
             // The destination (newpath) is effectively written
             if let Some(newpath) = read_string_from_tracee(pid, regs.r10) {
                 let abs_path = resolve_path(&newpath, pid_raw, &mut state.cwd_cache);
-                state.fd_tracker.mark_path_written(abs_path);
+                if let Some(pid_u32) = pid_u32 {
+                    state.fd_tracker.mark_path_written_with_thread(abs_path, pid_u32);
+                } else {
+                    state.fd_tracker.mark_path_written(abs_path);
+                }
             }
         }
         SYS_CHDIR => {
@@ -344,9 +352,9 @@ fn handle_syscall_exit(
         SYS_WRITE | SYS_PWRITE64 | SYS_WRITEV | SYS_PWRITEV | SYS_PWRITEV2 | SYS_SENDFILE
         | SYS_COPY_FILE_RANGE => {
             // Only count as written if bytes were actually written (ret_val > 0)
-            if let Some(path) = state.pending_writes.remove(&pid_raw) {
+            if let Some((path, thread_id)) = state.pending_writes.remove(&pid_raw) {
                 if ret_val > 0 {
-                    state.fd_tracker.mark_path_written(path);
+                    state.fd_tracker.mark_path_written_with_thread(path, thread_id);
                 }
             }
         }
diff --git a/scripts/build_wheel_with_bins.sh b/scripts/build_wheel_with_bins.sh
new file mode 100755
index 00000000..54d5531a
--- /dev/null
+++ b/scripts/build_wheel_with_bins.sh
@@ -0,0 +1,201 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ROOT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+OUT_DIR="${1:-$ROOT_DIR/dist}"
+RUST_MANIFEST="$ROOT_DIR/rust/Cargo.toml"
+BIN_DIR="$ROOT_DIR/roar/bin"
+HOST_RELEASE_DIR="$ROOT_DIR/rust/target/release"
+LINUX_PORTABLE_RELEASE_DIR="$ROOT_DIR/rust/target/x86_64-unknown-linux-gnu/release"
+LINUX_GLIBC_FLOOR="2.17"
+
+mkdir -p "$OUT_DIR" "$BIN_DIR"
+
+echo "▶ Syncing packaged Rust artifacts into roar/bin..."
+python3 "$ROOT_DIR/scripts/sync_packaged_rust_artifacts.py"
+
+declare -a packages_to_build=()
+declare -A binaries_to_sync=()
+sync_preload_lib=0
+build_output_dir="$HOST_RELEASE_DIR"
+
+queue_package() {
+  local package="$1"
+  local existing
+  for existing in "${packages_to_build[@]:-}"; do
+    if [[ "$existing" == "$package" ]]; then
+      return
+    fi
+  done
+  packages_to_build+=("$package")
+}
+
+max_glibc_version() {
+  local path="$1"
+  if [[ ! -f "$path" ]] || ! command -v objdump >/dev/null 2>&1; then
+    return 0
+  fi
+
+  objdump -p "$path" \
+    | awk '/GLIBC_/ {print $NF}' \
+    | sed -E 's/.*GLIBC_([0-9]+\.[0-9]+).*/\1/' \
+    | sort -V \
+    | tail -n 1
+}
+
+needs_portable_linux_rebuild() {
+  local path="$1"
+  if [[ "$(uname -s)" != "Linux" ]] || [[ ! -f "$path" ]]; then
+    return 1
+  fi
+
+  local max_glibc
+  max_glibc="$(max_glibc_version "$path")"
+  if [[ -z "$max_glibc" ]]; then
+    return 1
+  fi
+
+  if [[ "$(printf '%s\n%s\n' "$LINUX_GLIBC_FLOOR" "$max_glibc" | sort -V | tail -n 1)" != "$LINUX_GLIBC_FLOOR" ]]; then
+    echo "▶ Rebuilding $(basename "$path") for glibc portability (found GLIBC_${max_glibc}, need <= GLIBC_${LINUX_GLIBC_FLOOR})"
+    return 0
+  fi
+
+  return 1
+}
+
+ensure_binary() {
+  local package="$1"
+  local binary="$2"
+  local dst="$BIN_DIR/$binary"
+  if [[ ! -f "$dst" ]] || needs_portable_linux_rebuild "$dst"; then
+    queue_package "$package"
+    binaries_to_sync["$binary"]=1
+  fi
+}
+
+ensure_preload_library() {
+  local found_any=0
+  local needs_rebuild=0
+  local library
+  for library in \
+    libroar_tracer_preload.so \
+    libroar-tracer-preload.so \
+    libroar_tracer_preload.dylib \
+    libroar-tracer-preload.dylib
+  do
+    local dst="$BIN_DIR/$library"
+    if [[ -f "$dst" ]]; then
+      found_any=1
+      if needs_portable_linux_rebuild "$dst"; then
+        needs_rebuild=1
+      fi
+    fi
+  done
+
+  if [[ "$found_any" -eq 0 ]] || [[ "$needs_rebuild" -eq 1 ]]; then
+    queue_package "roar-tracer-preload"
+    sync_preload_lib=1
+  fi
+}
+
+setup_portable_linux_builder() {
+  if ! cargo zigbuild --help >/dev/null 2>&1; then
+    echo "▶ Installing cargo-zigbuild..."
+    cargo install cargo-zigbuild
+  fi
+
+  if ! command -v python-zig >/dev/null 2>&1; then
+    echo "▶ Installing ziglang tool..."
+    uv tool install ziglang
+  fi
+
+  export CARGO_ZIGBUILD_ZIG_PATH="${CARGO_ZIGBUILD_ZIG_PATH:-$(command -v python-zig)}"
+  build_output_dir="$LINUX_PORTABLE_RELEASE_DIR"
+}
+
+resolve_built_artifact() {
+  local name="$1"
+  local candidate
+  for candidate in "$build_output_dir/$name" "$HOST_RELEASE_DIR/$name"; do
+    if [[ -f "$candidate" ]]; then
+      printf '%s\n' "$candidate"
+      return 0
+    fi
+  done
+  return 1
+}
+
+ensure_binary "roar-proxy" "roar-proxy"
+ensure_binary "roar-tracer" "roar-tracer"
+ensure_binary "roar-tracer-ebpf" "roar-tracer-ebpf"
+ensure_binary "roar-tracer-ebpf" "roard"
+ensure_binary "roar-tracer-preload" "roar-tracer-preload"
+ensure_preload_library
+
+if ((${#packages_to_build[@]} > 0)); then
+  echo "▶ Building packaged Rust binaries: ${packages_to_build[*]}"
+  if [[ "$(uname -s)" == "Linux" ]]; then
+    setup_portable_linux_builder
+    build_cmd=(
+      cargo zigbuild
+      --release
+      --manifest-path "$RUST_MANIFEST"
+      --target x86_64-unknown-linux-gnu.2.17
+    )
+  else
+    build_cmd=(cargo build --release --manifest-path "$RUST_MANIFEST")
+  fi
+
+  for package in "${packages_to_build[@]}"; do
+    build_cmd+=(-p "$package")
+  done
+  "${build_cmd[@]}"
+else
+  echo "▶ Packaged Rust binaries already present in roar/bin"
+fi
+
+echo "▶ Syncing packaged binaries into roar/bin..."
+for binary in roar-proxy roar-tracer roar-tracer-ebpf roar-tracer-preload roard; do
+  if [[ -z "${binaries_to_sync[$binary]:-}" ]]; then
+    continue
+  fi
+
+  src="$(resolve_built_artifact "$binary")" || {
+    echo "error: expected binary not found for packaging: $binary" >&2
+    exit 1
+  }
+  install -m 0755 "$src" "$BIN_DIR/$binary"
+done
+
+if [[ "$sync_preload_lib" -eq 1 ]]; then
+  copied_lib=0
+  for library in \
+    libroar_tracer_preload.so \
+    libroar-tracer-preload.so \
+    libroar_tracer_preload.dylib \
+    libroar-tracer-preload.dylib
+  do
+    src="$(resolve_built_artifact "$library" || true)"
+    if [[ -n "$src" ]]; then
+      install -m 0755 "$src" "$BIN_DIR/$library"
+      copied_lib=1
+    fi
+  done
+
+  if [[ "$copied_lib" -ne 1 ]]; then
+    echo "error: no preload interposer library available for packaging" >&2
+    exit 1
+  fi
+fi
+
+echo "▶ Building roar wheel into $OUT_DIR..."
+uv build --wheel --out-dir "$OUT_DIR"
+
+echo "▶ Verifying wheel contents..."
+(
+  cd "$ROOT_DIR"
+  ROAR_WHEEL_GLOB="$OUT_DIR/roar_cli-*.whl" python3 scripts/ci/verify_wheel_contents.py
+)
+
+echo "✓ Wheel build complete"
diff --git a/scripts/ci/verify_wheel_contents.py b/scripts/ci/verify_wheel_contents.py
index 928a2b46..8dd655fe 100644
--- a/scripts/ci/verify_wheel_contents.py
+++ b/scripts/ci/verify_wheel_contents.py
@@ -4,13 +4,19 @@
 
 import glob
 import os
+import re
+import shutil
+import subprocess
+import tempfile
 import zipfile
+from pathlib import Path
 
 
 def main() -> None:
-    wheels = sorted(glob.glob("dist/*.whl"))
+    wheel_glob = os.environ.get("ROAR_WHEEL_GLOB", "dist/*.whl")
+    wheels = sorted(glob.glob(wheel_glob))
     if len(wheels) != 1:
-        raise SystemExit(f"Expected exactly one wheel, found: {wheels}")
+        raise SystemExit(f"Expected exactly one wheel matching {wheel_glob!r}, found: {wheels}")
 
     wheel = wheels[0]
     with zipfile.ZipFile(wheel) as zf:
@@ -60,8 +66,60 @@ def main() -> None:
     if not has_preload_lib:
         raise SystemExit("Missing preload interposer library in wheel (roar/bin/libroar*_preload*)")
 
+    if platform == "linux":
+        _verify_linux_glibc_floor(wheel, names, required_bins)
+
     print(f"Verified wheel contents: {wheel}")
 
 
+def _verify_linux_glibc_floor(wheel: str, names: set[str], required_bins: set[str]) -> None:
+    max_allowed = _parse_glibc_version(os.environ.get("ROAR_WHEEL_MAX_GLIBC", "2.17"))
+    members = sorted(required_bins) + sorted(
+        name
+        for name in names
+        if name.startswith("roar/bin/libroar_tracer_preload")
+        or name.startswith("roar/bin/libroar-tracer-preload")
+    )
+
+    with tempfile.TemporaryDirectory(prefix="roar-wheel-verify-") as tmp_dir:
+        tmp_path = Path(tmp_dir)
+        with zipfile.ZipFile(wheel) as zf:
+            for member in members:
+                extracted = Path(zf.extract(member, tmp_path))
+                max_found = _max_glibc_version(extracted)
+                if max_found is None:
+                    continue
+                if max_found > max_allowed:
+                    raise SystemExit(
+                        f"{member} requires GLIBC_{max_found[0]}.{max_found[1]} "
+                        f"(max allowed GLIBC_{max_allowed[0]}.{max_allowed[1]})"
+                    )
+
+
+def _max_glibc_version(path: Path) -> tuple[int, int] | None:
+    if not shutil.which("objdump"):
+        raise SystemExit("objdump is required to verify Linux wheel portability")
+
+    result = subprocess.run(
+        ["objdump", "-p", str(path)],
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+    if result.returncode != 0:
+        raise SystemExit(f"objdump failed for {path}: {result.stderr.strip()}")
+
+    versions = [
+        _parse_glibc_version(match.group(1))
+        for match in re.finditer(r"GLIBC_(\d+\.\d+)", result.stdout)
+    ]
+    return max(versions) if versions else None
+
+
+def _parse_glibc_version(raw: str) -> tuple[int, int]:
+    major, minor = raw.split(".", 1)
+    return int(major), int(minor)
+
+
 if __name__ == "__main__":
     main()
diff --git a/scripts/sync_packaged_preload.py b/scripts/sync_packaged_preload.py
new file mode 100644
index 00000000..ad7ec83c
--- /dev/null
+++ b/scripts/sync_packaged_preload.py
@@ -0,0 +1,14 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from scripts.sync_packaged_rust_artifacts import (
+    main,
+)
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/sync_packaged_rust_artifacts.py b/scripts/sync_packaged_rust_artifacts.py
new file mode 100644
index 00000000..6b2e1682
--- /dev/null
+++ b/scripts/sync_packaged_rust_artifacts.py
@@ -0,0 +1,310 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+
+import argparse
+import filecmp
+import os
+import shutil
+import subprocess
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+
+LINUX_PORTABLE_TARGET = "x86_64-unknown-linux-gnu.2.17"
+LINUX_PORTABLE_TARGET_DIR = "x86_64-unknown-linux-gnu"
+
+
+@dataclass(frozen=True)
+class ArtifactSpec:
+    package_name: str
+    source_paths: tuple[Path, ...]
+    binary_names: tuple[str, ...] = ()
+    library_names: tuple[str, ...] = ()
+
+
+@dataclass(frozen=True)
+class SyncLayout:
+    root_dir: Path
+    rust_manifest: Path
+    release_dir: Path
+    package_bin_dir: Path
+    artifacts: tuple[ArtifactSpec, ...]
+    portable_target: str | None = None
+
+
+def _common_tracer_sources(root_dir: Path) -> tuple[Path, ...]:
+    return (
+        root_dir / "rust" / "Cargo.toml",
+        root_dir / "rust" / "Cargo.lock",
+        root_dir / "rust" / "crates" / "tracer-fd",
+        root_dir / "rust" / "crates" / "tracer-runtime",
+        root_dir / "rust" / "crates" / "tracer-schema",
+    )
+
+
+def _default_layout() -> SyncLayout:
+    root_dir = Path(__file__).resolve().parents[1]
+    library_suffix = ".dylib" if sys.platform == "darwin" else ".so"
+    common_sources = _common_tracer_sources(root_dir)
+    release_dir = root_dir / "rust" / "target" / "release"
+    portable_target = None
+
+    if sys.platform.startswith("linux"):
+        release_dir = root_dir / "rust" / "target" / LINUX_PORTABLE_TARGET_DIR / "release"
+        portable_target = LINUX_PORTABLE_TARGET
+
+    artifacts = [
+        ArtifactSpec(
+            package_name="roar-proxy",
+            source_paths=(
+                root_dir / "rust" / "Cargo.toml",
+                root_dir / "rust" / "Cargo.lock",
+                root_dir / "rust" / "services" / "proxy",
+            ),
+            binary_names=("roar-proxy",),
+        ),
+        ArtifactSpec(
+            package_name="roar-tracer-preload",
+            source_paths=(
+                *common_sources,
+                root_dir / "rust" / "tracers" / "preload",
+            ),
+            binary_names=("roar-tracer-preload",),
+            library_names=(
+                f"libroar_tracer_preload{library_suffix}",
+                f"libroar-tracer-preload{library_suffix}",
+            ),
+        ),
+    ]
+
+    if sys.platform.startswith("linux"):
+        artifacts.extend(
+            [
+                ArtifactSpec(
+                    package_name="roar-tracer",
+                    source_paths=(
+                        *common_sources,
+                        root_dir / "rust" / "tracers" / "ptrace",
+                    ),
+                    binary_names=("roar-tracer",),
+                ),
+                ArtifactSpec(
+                    package_name="roar-tracer-ebpf",
+                    source_paths=(
+                        *common_sources,
+                        root_dir / "rust" / "tracers" / "ebpf" / "common",
+                        root_dir / "rust" / "tracers" / "ebpf" / "probe",
+                        root_dir / "rust" / "tracers" / "ebpf" / "userspace",
+                    ),
+                    binary_names=("roar-tracer-ebpf", "roard"),
+                ),
+            ]
+        )
+
+    return SyncLayout(
+        root_dir=root_dir,
+        rust_manifest=root_dir / "rust" / "Cargo.toml",
+        release_dir=release_dir,
+        package_bin_dir=root_dir / "roar" / "bin",
+        artifacts=tuple(artifacts),
+        portable_target=portable_target,
+    )
+
+
+def _iter_source_files(paths: tuple[Path, ...]) -> list[Path]:
+    files: list[Path] = []
+    for source_path in paths:
+        if not source_path.exists():
+            continue
+        if source_path.is_file():
+            files.append(source_path)
+            continue
+        files.extend(path for path in source_path.rglob("*") if path.is_file())
+    return files
+
+
+def _latest_mtime(paths: list[Path]) -> float:
+    if not paths:
+        return 0.0
+    return max(path.stat().st_mtime for path in paths)
+
+
+def _artifact_is_stale(path: Path, latest_source_mtime: float) -> bool:
+    return not path.exists() or path.stat().st_mtime < latest_source_mtime
+
+
+def _first_existing_path(directory: Path, names: tuple[str, ...]) -> Path | None:
+    for name in names:
+        candidate = directory / name
+        if candidate.exists():
+            return candidate
+    return None
+
+
+def _sync_reason_for_path(
+    *,
+    release_path: Path | None,
+    package_path: Path,
+    latest_source_mtime: float,
+    missing_release_reason: str,
+    stale_release_reason: str,
+    stale_package_reason: str,
+    differs_reason: str,
+) -> str | None:
+    if release_path is None:
+        return missing_release_reason
+    if _artifact_is_stale(release_path, latest_source_mtime):
+        return stale_release_reason
+    if _artifact_is_stale(package_path, latest_source_mtime):
+        return stale_package_reason
+    if not filecmp.cmp(release_path, package_path, shallow=False):
+        return differs_reason
+    return None
+
+
+def sync_reason(layout: SyncLayout) -> str | None:
+    for artifact in layout.artifacts:
+        latest_source_mtime = _latest_mtime(_iter_source_files(artifact.source_paths))
+
+        for binary_name in artifact.binary_names:
+            reason = _sync_reason_for_path(
+                release_path=layout.release_dir / binary_name,
+                package_path=layout.package_bin_dir / binary_name,
+                latest_source_mtime=latest_source_mtime,
+                missing_release_reason=f"release {binary_name} is missing",
+                stale_release_reason=f"release {binary_name} is older than its sources",
+                stale_package_reason=f"packaged {binary_name} is older than its sources",
+                differs_reason=f"packaged {binary_name} differs from release artifact",
+            )
+            if reason is not None:
+                return reason
+
+        if artifact.library_names:
+            release_library = _first_existing_path(layout.release_dir, artifact.library_names)
+            package_library = _first_existing_path(layout.package_bin_dir, artifact.library_names)
+            if release_library is None:
+                return f"release library for {artifact.package_name} is missing"
+            package_target = package_library or layout.package_bin_dir / release_library.name
+            reason = _sync_reason_for_path(
+                release_path=release_library,
+                package_path=package_target,
+                latest_source_mtime=latest_source_mtime,
+                missing_release_reason=f"release library for {artifact.package_name} is missing",
+                stale_release_reason=f"release library for {artifact.package_name} is older than its sources",
+                stale_package_reason=f"packaged library for {artifact.package_name} is older than its sources",
+                differs_reason=f"packaged library for {artifact.package_name} differs from release artifact",
+            )
+            if reason is not None:
+                return reason
+    return None
+
+
+def _packages_needing_build(layout: SyncLayout) -> list[str]:
+    packages: list[str] = []
+    for artifact in layout.artifacts:
+        latest_source_mtime = _latest_mtime(_iter_source_files(artifact.source_paths))
+        needs_build = False
+        for binary_name in artifact.binary_names:
+            if _artifact_is_stale(layout.release_dir / binary_name, latest_source_mtime):
+                needs_build = True
+                break
+        if not needs_build and artifact.library_names:
+            release_library = _first_existing_path(layout.release_dir, artifact.library_names)
+            if release_library is None or _artifact_is_stale(release_library, latest_source_mtime):
+                needs_build = True
+        if needs_build and artifact.package_name not in packages:
+            packages.append(artifact.package_name)
+    return packages
+
+
+def _build_release_artifacts(layout: SyncLayout, packages: list[str]) -> None:
+    if not packages:
+        return
+    command = ["cargo"]
+    env = os.environ.copy()
+    if layout.portable_target:
+        command.extend(
+            [
+                "zigbuild",
+                "--release",
+                "--manifest-path",
+                str(layout.rust_manifest),
+                "--target",
+                layout.portable_target,
+            ]
+        )
+        zig_path = shutil.which("python-zig") or shutil.which("zig")
+        if zig_path:
+            env.setdefault("CARGO_ZIGBUILD_ZIG_PATH", zig_path)
+    else:
+        command.extend(
+            [
+                "build",
+                "--release",
+                "--manifest-path",
+                str(layout.rust_manifest),
+            ]
+        )
+    for package in packages:
+        command.extend(["-p", package])
+    subprocess.run(command, check=True, cwd=layout.root_dir, env=env)
+
+
+def _sync_file(src: Path, dst: Path) -> None:
+    dst.parent.mkdir(parents=True, exist_ok=True)
+    shutil.copy2(src, dst)
+    dst.chmod(0o755)
+
+
+def sync_packaged_rust_artifacts(layout: SyncLayout) -> None:
+    _build_release_artifacts(layout, _packages_needing_build(layout))
+
+    for artifact in layout.artifacts:
+        for binary_name in artifact.binary_names:
+            release_path = layout.release_dir / binary_name
+            if not release_path.exists():
+                raise SystemExit(f"release {binary_name} is missing after build")
+            _sync_file(release_path, layout.package_bin_dir / binary_name)
+
+        if artifact.library_names:
+            release_library = _first_existing_path(layout.release_dir, artifact.library_names)
+            if release_library is None:
+                raise SystemExit(
+                    f"release library for {artifact.package_name} is missing after build"
+                )
+            for library_name in artifact.library_names:
+                candidate = layout.package_bin_dir / library_name
+                if candidate.exists() and candidate.name != release_library.name:
+                    candidate.unlink()
+            _sync_file(release_library, layout.package_bin_dir / release_library.name)
+
+
+# Backward-compatible alias while callers migrate from the preload-only name.
+sync_packaged_preload = sync_packaged_rust_artifacts
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Sync the packaged Rust tracer/proxy artifacts in roar/bin with the Rust source.",
+    )
+    parser.add_argument(
+        "--check",
+        action="store_true",
+        help="exit non-zero if any packaged Rust artifact needs to be rebuilt or resynced",
+    )
+    args = parser.parse_args()
+
+    layout = _default_layout()
+    reason = sync_reason(layout)
+    if args.check:
+        if reason is not None:
+            raise SystemExit(reason)
+        print("packaged Rust artifacts are up to date")
+        return
+
+    sync_packaged_rust_artifacts(layout)
+    print("synced packaged Rust artifacts")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/benchmarks/bench_ray_e2e.py b/tests/benchmarks/bench_ray_e2e.py
index e4654c7e..b06271b8 100644
--- a/tests/benchmarks/bench_ray_e2e.py
+++ b/tests/benchmarks/bench_ray_e2e.py
@@ -170,10 +170,7 @@ def _make_roar_runtime_env() -> dict:
     env_vars = dict(runtime_env.get("env_vars", {}))
     env_vars.update(
         {
-            "ROAR_WORKER": "1",
             "ROAR_JOB_ID": f"bench-{uuid.uuid4().hex[:8]}",
-            "ROAR_LOG_DIR": "/shared/.roar-logs",
-            "ROAR_LOG_BACKEND": "actor",
             "AWS_ENDPOINT_URL": MINIO_INTERNAL_ENDPOINT,
             "AWS_ACCESS_KEY_ID": MINIO_ACCESS_KEY,
             "AWS_SECRET_ACCESS_KEY": MINIO_SECRET_KEY,
@@ -181,7 +178,6 @@ def _make_roar_runtime_env() -> dict:
         }
     )
     runtime_env["env_vars"] = env_vars
-    runtime_env["worker_process_setup_hook"] = "roar.ray.worker.setup"
     return runtime_env
 
 
diff --git a/tests/e2e/ray/Dockerfile b/tests/e2e/ray/Dockerfile
index 945b584e..cf768988 100644
--- a/tests/e2e/ray/Dockerfile
+++ b/tests/e2e/ray/Dockerfile
@@ -1,4 +1,4 @@
-FROM rayproject/ray:2.44.1-py312
+FROM rayproject/ray:2.54.0-py312
 
 USER root
 
@@ -6,15 +6,14 @@ RUN apt-get update \
     && apt-get install -y --no-install-recommends curl ca-certificates build-essential \
     && rm -rf /var/lib/apt/lists/*
 
-ENV RUSTUP_HOME=/root/.rustup
-ENV CARGO_HOME=/root/.cargo
-ENV PATH=/root/.cargo/bin:${PATH}
-
-RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --profile minimal --default-toolchain stable
-
 WORKDIR /app
 
 COPY . /app
 
-RUN pip install --no-cache-dir boto3 pyarrow pandas pytest pytest-timeout minio \
-    && pip install --no-cache-dir -e .
+RUN pip install --no-cache-dir \
+    pydantic==2.12.5 \
+    pydantic-settings==2.12.0 \
+    boto3 pyarrow pandas pytest pytest-timeout minio \
+    && pip install --no-cache-dir -e . \
+    && cp /app/roar_inject.pth "$(python -c 'import site; print(site.getsitepackages()[0])')/" \
+    && ln -sf /app/roar/bin/roar-proxy /usr/local/bin/roar-proxy
diff --git a/tests/e2e/ray/conftest.py b/tests/e2e/ray/conftest.py
index 195121bf..e31280d7 100644
--- a/tests/e2e/ray/conftest.py
+++ b/tests/e2e/ray/conftest.py
@@ -2,18 +2,36 @@
 
 from __future__ import annotations
 
+import base64
 import contextlib
+import functools
 import importlib
 import json
+import os
+import shlex
+import sqlite3
 import subprocess
 import sys
+import tempfile
 import time
-from collections.abc import Mapping
+import urllib.parse
+import urllib.request
+from collections.abc import Mapping, Sequence
 from pathlib import Path
 
 import pytest
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
 
 COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
+REPO_ROOT = COMPOSE_FILE.parent.parent.parent.parent.resolve()
+ROAR_BIN = REPO_ROOT / ".venv" / "bin" / "roar"
+HOST_JOBS_DIR = COMPOSE_FILE.parent / "jobs"
+HOST_PROJECTS_DIR = REPO_ROOT.parent / ".tmp-ray-e2e"
+HOST_GLAAS_URL = "http://localhost:3001"
+CLUSTER_GLAAS_URL = "http://host.docker.internal:3001"
+HEAD_PROJECT_DIR = "/app"
+JOBS_DIR = f"{HEAD_PROJECT_DIR}/tests/e2e/ray/jobs"
+FRAGMENT_STORE_URL = CLUSTER_GLAAS_URL
 HEAD_TIMEOUT_SECONDS = 120
 WORKERS_TIMEOUT_SECONDS = 60
 POLL_INTERVAL_SECONDS = 3
@@ -64,18 +82,60 @@ def _get_ray():
 
 
 def pytest_configure(config: pytest.Config) -> None:
+    # Ensure subprocess calls in tests can find tools installed in this repo's
+    # virtualenv (for example the `ray` CLI used by infra health checks).
+    venv_bin = REPO_ROOT / ".venv" / "bin"
+    if venv_bin.exists():
+        current_path = os.environ.get("PATH", "")
+        venv_bin_text = str(venv_bin)
+        if venv_bin_text not in current_path.split(":"):
+            os.environ["PATH"] = (
+                f"{venv_bin_text}:{current_path}" if current_path else venv_bin_text
+            )
+
     config.option.importmode = "importlib"
     with contextlib.suppress(
         ModuleNotFoundError
     ):  # Ray not installed; e2e tests require a live Docker cluster
         _get_ray()
     config.addinivalue_line("markers", "ray_e2e: Ray end-to-end tests requiring Docker")
+    config.addinivalue_line(
+        "markers",
+        "ray_contract: User-facing Ray contract tests using `roar run ray job submit ...`",
+    )
+    config.addinivalue_line(
+        "markers",
+        "ray_diagnostic: Diagnostic Ray tests that inspect internal runtime details",
+    )
 
 
 def pytest_collection_modifyitems(items: list[pytest.Item]) -> None:
     marker = pytest.mark.ray_e2e
     for item in items:
         item.add_marker(marker)
+        if item.get_closest_marker("timeout") is None:
+            item.add_marker(pytest.mark.timeout(180))
+
+
+@functools.lru_cache(maxsize=1)
+def _docker_accessible() -> bool:
+    try:
+        subprocess.run(
+            ["docker", "info"],
+            check=True,
+            capture_output=True,
+            timeout=5,
+        )
+        return True
+    except (subprocess.SubprocessError, OSError):
+        return False
+
+
+def run_docker(args: Sequence[str], **kwargs):
+    command = list(args)
+    if _docker_accessible():
+        return subprocess.run(command, **kwargs)
+    return subprocess.run(["sg", "docker", "-c", shlex.join(command)], **kwargs)
 
 
 def _compose_args(compose_file: Path, *args: str) -> list[str]:
@@ -86,7 +146,7 @@ def _wait_for_ray_head(compose_file: Path) -> None:
     deadline = time.monotonic() + HEAD_TIMEOUT_SECONDS
     last_error = ""
     while time.monotonic() < deadline:
-        result = subprocess.run(
+        result = run_docker(
             _compose_args(compose_file, "exec", "-T", "ray-head", "ray", "status"),
             capture_output=True,
             text=True,
@@ -108,7 +168,7 @@ def _alive_node_count(compose_file: Path) -> int:
         "print(json.dumps({'alive': len(alive)})); "
         "ray.shutdown()"
     )
-    result = subprocess.run(
+    result = run_docker(
         _compose_args(compose_file, "exec", "-T", "ray-head", "python", "-c", script),
         capture_output=True,
         text=True,
@@ -143,7 +203,7 @@ def _ensure_roar_db(compose_file: Path) -> None:
     Ensure roar is initialised on the head node before tests run.
     Idempotent: harmless if .roar already exists.
     """
-    subprocess.run(
+    run_docker(
         _compose_args(
             compose_file,
             "exec",
@@ -158,9 +218,425 @@ def _ensure_roar_db(compose_file: Path) -> None:
     )
 
 
+def exec_on_service(
+    service: str,
+    args: Sequence[str],
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    env: Mapping[str, str] | None = None,
+    timeout: float | None = None,
+) -> subprocess.CompletedProcess[str]:
+    compose_path = Path(compose_file)
+    command = ["docker", "compose", "-f", str(compose_path), "exec", "-T"]
+    if env:
+        for key, value in env.items():
+            command.extend(["-e", f"{key}={value}"])
+    command.append(service)
+    command.extend(args)
+    return run_docker(command, capture_output=True, text=True, check=False, timeout=timeout)
+
+
+def _roar_bin() -> str:
+    if ROAR_BIN.exists():
+        return str(ROAR_BIN)
+    return "roar"
+
+
+def _run_checked_local(command: Sequence[str], *, cwd: Path) -> None:
+    subprocess.run(list(command), cwd=cwd, check=True, capture_output=True)
+
+
+def _sync_packaged_rust_artifacts_for_ray_images() -> None:
+    subprocess.run(
+        [sys.executable, "scripts/sync_packaged_rust_artifacts.py"],
+        cwd=REPO_ROOT,
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+
+
+def make_host_project_dir(prefix: str = "project") -> Path:
+    HOST_PROJECTS_DIR.mkdir(parents=True, exist_ok=True)
+    return Path(tempfile.mkdtemp(prefix=f"{prefix}-", dir=str(HOST_PROJECTS_DIR)))
+
+
+def init_host_project(
+    project_dir: Path,
+    *,
+    glaas_url: str | None = HOST_GLAAS_URL,
+    ignore_tmp_files: bool | None = None,
+) -> None:
+    project_dir.mkdir(parents=True, exist_ok=True)
+    (project_dir / "README.md").write_text("ray host-submit e2e\n", encoding="utf-8")
+    (project_dir / ".gitignore").write_text(".roar/\n", encoding="utf-8")
+    _run_checked_local(["git", "init", "-q"], cwd=project_dir)
+    _run_checked_local(["git", "config", "user.email", "test@test.com"], cwd=project_dir)
+    _run_checked_local(["git", "config", "user.name", "test"], cwd=project_dir)
+    _run_checked_local(["git", "add", "README.md", ".gitignore"], cwd=project_dir)
+    _run_checked_local(["git", "commit", "-q", "-m", "init"], cwd=project_dir)
+    _run_checked_local([_roar_bin(), "init", "--path", str(project_dir), "-n"], cwd=project_dir)
+    if glaas_url:
+        _run_checked_local([_roar_bin(), "config", "set", "glaas.url", glaas_url], cwd=project_dir)
+    if ignore_tmp_files is not None:
+        _run_checked_local(
+            [
+                _roar_bin(),
+                "config",
+                "set",
+                "filters.ignore_tmp_files",
+                "true" if ignore_tmp_files else "false",
+            ],
+            cwd=project_dir,
+        )
+
+
+def build_roar_submit_env_from_host(
+    ray_cluster: Mapping[str, str],
+    *,
+    use_fragment_store: bool,
+    extra_env: Mapping[str, str] | None = None,
+    glaas_url: str = HOST_GLAAS_URL,
+    cluster_glaas_url: str = CLUSTER_GLAAS_URL,
+) -> dict[str, str]:
+    env = dict(os.environ)
+    env.update(
+        {
+            "ROAR_CLUSTER_PIP_REQ": "skip",
+            "AWS_ACCESS_KEY_ID": "minioadmin",
+            "AWS_SECRET_ACCESS_KEY": "minioadmin",
+            "AWS_DEFAULT_REGION": "us-east-1",
+            "AWS_ENDPOINT_URL": str(ray_cluster["minio_endpoint"]),
+            "ROAR_CLUSTER_AWS_ENDPOINT_URL": str(
+                ray_cluster.get("cluster_minio_endpoint", "http://minio:9000")
+            ),
+        }
+    )
+    if use_fragment_store:
+        env["GLAAS_URL"] = glaas_url
+        env["ROAR_CLUSTER_GLAAS_URL"] = cluster_glaas_url
+    else:
+        env["GLAAS_URL"] = ""
+        env.pop("ROAR_CLUSTER_GLAAS_URL", None)
+    if extra_env:
+        env.update({str(key): str(value) for key, value in extra_env.items()})
+    return env
+
+
+def run_roar_ray_job_from_host(
+    project_dir: Path,
+    ray_cluster: Mapping[str, str],
+    script_path: str | Path,
+    *,
+    use_fragment_store: bool,
+    tracer: str | None = "ptrace",
+    extra_env: Mapping[str, str] | None = None,
+    script_args: Sequence[str] | None = None,
+    submit_args: Sequence[str] | None = None,
+    working_dir: Path = HOST_JOBS_DIR,
+    timeout: float = 180,
+) -> subprocess.CompletedProcess[str]:
+    resolved_script = Path(script_path)
+    if resolved_script.is_absolute():
+        try:
+            script_arg = str(resolved_script.relative_to(working_dir))
+        except ValueError:
+            script_arg = str(resolved_script)
+    else:
+        script_arg = str(script_path)
+
+    command = [
+        _roar_bin(),
+        "run",
+    ]
+    if tracer:
+        command.extend(["--tracer", tracer])
+    command.extend(
+        [
+            "ray",
+            "job",
+            "submit",
+            "--address",
+            str(ray_cluster["dashboard_url"]),
+            "--working-dir",
+            str(working_dir),
+        ]
+    )
+    if submit_args:
+        command.extend(submit_args)
+    command.extend(["--", "python", script_arg])
+    if script_args:
+        command.extend(script_args)
+
+    return subprocess.run(
+        command,
+        cwd=project_dir,
+        env=build_roar_submit_env_from_host(
+            ray_cluster,
+            use_fragment_store=use_fragment_store,
+            extra_env=extra_env,
+        ),
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+
+
+def run_roar_cli_from_host(
+    project_dir: Path,
+    *args: str,
+    extra_env: Mapping[str, str] | None = None,
+    timeout: float = 30,
+) -> subprocess.CompletedProcess[str]:
+    env = dict(os.environ)
+    if extra_env:
+        env.update({str(key): str(value) for key, value in extra_env.items()})
+    return subprocess.run(
+        [_roar_bin(), *args],
+        cwd=project_dir,
+        env=env,
+        capture_output=True,
+        text=True,
+        timeout=timeout,
+    )
+
+
+def query_roar_db(
+    project_dir: Path,
+    sql: str,
+    params: Sequence[object] = (),
+) -> list[dict[str, object]]:
+    db_path = project_dir / ".roar" / "roar.db"
+    conn = sqlite3.connect(db_path)
+    conn.row_factory = sqlite3.Row
+    try:
+        cur = conn.cursor()
+        cur.execute(sql, tuple(params))
+        return [dict(row) for row in cur.fetchall()]
+    finally:
+        conn.close()
+
+
+def load_fragment_key(project_dir: Path) -> dict[str, str]:
+    key_dir = project_dir / ".roar" / "fragment-sessions"
+    key_files = sorted(key_dir.glob("*.key"))
+    assert key_files, f"Expected a fragment key under {key_dir}"
+    payload = json.loads(key_files[-1].read_text(encoding="utf-8"))
+    assert isinstance(payload, dict), f"Unexpected fragment key payload: {payload!r}"
+    return {str(key): str(value) for key, value in payload.items()}
+
+
+def fetch_fragment_batches(
+    session_id: str,
+    token: str,
+    *,
+    glaas_url: str = HOST_GLAAS_URL,
+) -> list[dict[str, object]]:
+    encoded_token = urllib.parse.quote(token, safe="")
+    url = f"{glaas_url.rstrip('/')}/api/v1/fragments/sessions/{session_id}/fragments?token={encoded_token}"
+    with urllib.request.urlopen(url, timeout=5) as response:
+        payload = json.loads(response.read().decode("utf-8"))
+    fragments = payload.get("fragments")
+    if fragments is None and isinstance(payload.get("data"), dict):
+        fragments = payload["data"].get("fragments")
+    assert isinstance(fragments, list), f"Expected fragment list from {url}, got: {payload!r}"
+    return [item for item in fragments if isinstance(item, dict)]
+
+
+def decrypt_fragment_batches(
+    batches: Sequence[dict[str, object]],
+    token: str,
+) -> list[dict[str, object]]:
+    key = bytes.fromhex(token)
+    decrypted: list[dict[str, object]] = []
+    for batch in batches:
+        encrypted_batch = batch.get("encrypted_batch")
+        if not isinstance(encrypted_batch, str) or not encrypted_batch:
+            continue
+        payload = base64.b64decode(encrypted_batch)
+        nonce = payload[:12]
+        ciphertext = payload[12:]
+        plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
+        decoded = json.loads(plaintext.decode("utf-8"))
+        if isinstance(decoded, list):
+            decrypted.extend(item for item in decoded if isinstance(item, dict))
+    return decrypted
+
+
+def exec_shell_on_service(
+    service: str,
+    cmd: str,
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    env: Mapping[str, str] | None = None,
+) -> tuple[str, str, int]:
+    result = exec_on_service(
+        service,
+        ["bash", "-lc", cmd],
+        compose_file=compose_file,
+        env=env,
+    )
+    return result.stdout, result.stderr, result.returncode
+
+
+def exec_on_head(
+    args: Sequence[str],
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    env: Mapping[str, str] | None = None,
+    timeout: float | None = None,
+) -> subprocess.CompletedProcess[str]:
+    return exec_on_service(
+        "ray-head",
+        args,
+        compose_file=compose_file,
+        env=env,
+        timeout=timeout,
+    )
+
+
+def exec_shell_on_head(
+    cmd: str,
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    env: Mapping[str, str] | None = None,
+) -> tuple[str, str, int]:
+    return exec_shell_on_service("ray-head", cmd, compose_file=compose_file, env=env)
+
+
+def reset_roar_project_on_head(
+    compose_file: str | Path = COMPOSE_FILE,
+    *,
+    project_dir: str = HEAD_PROJECT_DIR,
+    glaas_url: str | None = FRAGMENT_STORE_URL,
+) -> None:
+    configure_glaas = ""
+    if glaas_url is not None:
+        configure_glaas = (
+            f" && roar config set glaas.url {shlex.quote(str(glaas_url))}"
+            f" && roar config set glaas.web_url {shlex.quote(str(glaas_url))}"
+        )
+    stdout, stderr, rc = exec_shell_on_head(
+        (
+            f"cd {shlex.quote(project_dir)}"
+            " && git config --global user.email test@test.com"
+            " && git config --global user.name test"
+            " && git init -q"
+            " && git add -A"
+            " && git commit -q -m init --allow-empty"
+            f" && rm -rf {shlex.quote(project_dir)}/.roar"
+            f" && roar init --path {shlex.quote(project_dir)} -n"
+            f"{configure_glaas}"
+        ),
+        compose_file=compose_file,
+    )
+    if rc != 0:
+        raise AssertionError(f"roar init failed on ray-head:\nstdout:\n{stdout}\nstderr:\n{stderr}")
+
+
+def build_roar_submit_env_on_head(
+    *,
+    use_fragment_store: bool,
+    extra_env: Mapping[str, str] | None = None,
+) -> dict[str, str]:
+    env = {
+        "AWS_ENDPOINT_URL": "http://minio:9000",
+        "AWS_ACCESS_KEY_ID": "minioadmin",
+        "AWS_SECRET_ACCESS_KEY": "minioadmin",
+        "ROAR_CLUSTER_PIP_REQ": "skip",
+    }
+    if use_fragment_store:
+        env["GLAAS_URL"] = FRAGMENT_STORE_URL
+    else:
+        env["GLAAS_URL"] = ""
+    if extra_env:
+        env.update({str(key): str(value) for key, value in extra_env.items()})
+    return env
+
+
+def run_roar_ray_job_on_head(
+    script_path: str,
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    use_fragment_store: bool,
+    extra_env: Mapping[str, str] | None = None,
+    script_args: Sequence[str] | None = None,
+    submit_args: Sequence[str] | None = None,
+    working_dir: str = HEAD_PROJECT_DIR,
+    timeout: float = 180,
+) -> tuple[str, str, int]:
+    command = [
+        "roar",
+        "run",
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        "http://127.0.0.1:8265",
+        "--working-dir",
+        working_dir,
+    ]
+    if submit_args:
+        command.extend(submit_args)
+    command.extend(["--", "python", script_path])
+    if script_args:
+        command.extend(script_args)
+
+    result = exec_on_head(
+        command,
+        compose_file=compose_file,
+        env=build_roar_submit_env_on_head(
+            use_fragment_store=use_fragment_store,
+            extra_env=extra_env,
+        ),
+        timeout=timeout,
+    )
+    return result.stdout, result.stderr, result.returncode
+
+
+def query_roar_db_on_head(
+    sql: str,
+    params: Sequence[object] = (),
+    *,
+    compose_file: str | Path = COMPOSE_FILE,
+    db_path: str = f"{HEAD_PROJECT_DIR}/.roar/roar.db",
+) -> list[dict[str, object]]:
+    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
+        tmp_path = tmp.name
+
+    run_docker(
+        [
+            "docker",
+            "compose",
+            "-f",
+            str(Path(compose_file)),
+            "cp",
+            f"ray-head:{db_path}",
+            tmp_path,
+        ],
+        check=True,
+        capture_output=True,
+    )
+
+    conn = sqlite3.connect(tmp_path)
+    conn.row_factory = sqlite3.Row
+    try:
+        cur = conn.cursor()
+        cur.execute(sql, tuple(params))
+        return [dict(row) for row in cur.fetchall()]
+    finally:
+        conn.close()
+        Path(tmp_path).unlink(missing_ok=True)
+
+
 @pytest.fixture(scope="session")
 def ray_cluster() -> dict[str, str]:
-    subprocess.run(
+    run_docker(
+        _compose_args(COMPOSE_FILE, "down", "-v", "--remove-orphans"),
+        check=False,
+    )
+    _sync_packaged_rust_artifacts_for_ray_images()
+    run_docker(
         _compose_args(COMPOSE_FILE, "up", "-d", "--build"),
         check=True,
     )
@@ -172,10 +648,11 @@ def ray_cluster() -> dict[str, str]:
             "head_address": "ray://localhost:10001",
             "dashboard_url": "http://localhost:8265",
             "minio_endpoint": "http://localhost:9000",
+            "cluster_minio_endpoint": "http://minio:9000",
             "compose_file": str(COMPOSE_FILE),
         }
     finally:
-        subprocess.run(
+        run_docker(
             _compose_args(COMPOSE_FILE, "down", "-v"),
             check=False,
         )
@@ -197,6 +674,7 @@ def submit_job_on_head(
     script_path: str,
     env: Mapping[str, str] | None = None,
 ) -> tuple[str, str, int]:
+    """Diagnostic helper that bypasses `roar run ray job submit`."""
     compose_path = Path(compose_file)
     merged_env = dict(env or {})
 
@@ -207,12 +685,12 @@ def submit_job_on_head(
         existing_pp = merged_env.get("PYTHONPATH", "")
         merged_env["PYTHONPATH"] = f"{inject_dir}:{existing_pp}" if existing_pp else inject_dir
         merged_env.setdefault("ROAR_PROJECT_DIR", "/app")
-        merged_env.setdefault("ROAR_LOG_DIR", "/shared/.roar-logs")
+        merged_env.setdefault("ROAR_RAY_NODE_AGENTS", "1")
 
     command = ["docker", "compose", "-f", str(compose_path), "exec", "-T"]
     for key, value in merged_env.items():
         command.extend(["-e", f"{key}={value}"])
     command.extend(["ray-head", "python", script_path])
 
-    result = subprocess.run(command, capture_output=True, text=True, check=False)
+    result = run_docker(command, capture_output=True, text=True, check=False)
     return result.stdout, result.stderr, result.returncode
diff --git a/tests/e2e/ray/docker-compose.yml b/tests/e2e/ray/docker-compose.yml
index 2d978bc6..30c9c725 100644
--- a/tests/e2e/ray/docker-compose.yml
+++ b/tests/e2e/ray/docker-compose.yml
@@ -45,6 +45,8 @@ services:
         condition: service_healthy
       minio-init:
         condition: service_completed_successfully
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     ports:
       - "6379:6379"
       - "8265:8265"
@@ -54,8 +56,6 @@ services:
       interval: 5s
       timeout: 3s
       retries: 20
-    volumes:
-      - roar-data:/shared
     networks:
       - roar-ray-test
 
@@ -72,8 +72,8 @@ services:
     depends_on:
       ray-head:
         condition: service_healthy
-    volumes:
-      - roar-data:/shared
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     networks:
       - roar-ray-test
 
@@ -90,14 +90,11 @@ services:
     depends_on:
       ray-head:
         condition: service_healthy
-    volumes:
-      - roar-data:/shared
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     networks:
       - roar-ray-test
 
-volumes:
-  roar-data:
-
 networks:
   roar-ray-test:
     driver: bridge
diff --git a/tests/e2e/ray/jobs/attributed_file_io.py b/tests/e2e/ray/jobs/attributed_file_io.py
index 7c39e224..c494bed5 100644
--- a/tests/e2e/ray/jobs/attributed_file_io.py
+++ b/tests/e2e/ray/jobs/attributed_file_io.py
@@ -10,6 +10,7 @@
 import json
 import os
 import sys
+from pathlib import Path
 
 import ray
 
@@ -29,40 +30,20 @@ def write_attributed_file(task_index: int, output_dir: str) -> dict:
         "output_path": output_path,
     }
 
-    with open(output_path, "w", encoding="utf-8") as f:
-        json.dump(payload, f)
+    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
+    with open(output_path, "w", encoding="utf-8") as handle:
+        json.dump(payload, handle)
 
     return payload
 
 
-@ray.remote
-def read_and_summarize(paths: list[str]) -> dict:
-    """Read multiple files written by other tasks and return a summary."""
-    ctx = ray.get_runtime_context()
-    records = []
-    for path in paths:
-        with open(path, encoding="utf-8") as f:
-            records.append(json.load(f))
-    return {
-        "reader_task_id": ctx.get_task_id(),
-        "reader_node_id": ctx.get_node_id(),
-        "records_read": len(records),
-        "paths": paths,
-    }
-
-
 if __name__ == "__main__":
-    output_dir = sys.argv[1] if len(sys.argv) > 1 else "/shared/attributed"
+    default_dir = Path.cwd() / "artifacts" / "attributed"
+    output_dir = sys.argv[1] if len(sys.argv) > 1 else str(default_dir)
     os.makedirs(output_dir, exist_ok=True)
 
     ray.init(address="auto")
 
-    # Distribute 6 write tasks across workers
     write_refs = [write_attributed_file.remote(i, output_dir) for i in range(6)]
     write_results = ray.get(write_refs)
-
-    # Read all outputs from a single task
-    written_paths = [r["output_path"] for r in write_results]
-    summary = ray.get(read_and_summarize.remote(written_paths))
-
-    print(json.dumps({"writes": write_results, "summary": summary}))
+    print(json.dumps({"writes": write_results}))
diff --git a/tests/e2e/ray/jobs/basic_file_io.py b/tests/e2e/ray/jobs/basic_file_io.py
index 1c52b2a8..97d4913a 100644
--- a/tests/e2e/ray/jobs/basic_file_io.py
+++ b/tests/e2e/ray/jobs/basic_file_io.py
@@ -3,25 +3,17 @@
 from __future__ import annotations
 
 import json
+from pathlib import Path
 
 import ray
 
 
 @ray.remote
-def write_file(path: str, data: str) -> str:
-    with open(path, "w", encoding="utf-8") as handle:
-        handle.write(data)
-    return path
-
-
-@ray.remote
-def read_file(path: str) -> str:
-    with open(path, encoding="utf-8") as handle:
-        return handle.read()
-
-
-@ray.remote
-def transform(input_path: str, output_path: str) -> dict[str, object]:
+def run_file_io_pipeline(input_path: str, output_path: str) -> dict[str, object]:
+    seed_payload = {"a": 1, "b": 2, "label": "sample"}
+    Path(input_path).parent.mkdir(parents=True, exist_ok=True)
+    with open(input_path, "w", encoding="utf-8") as handle:
+        json.dump(seed_payload, handle)
     with open(input_path, encoding="utf-8") as handle:
         payload = json.load(handle)
 
@@ -35,19 +27,20 @@ def transform(input_path: str, output_path: str) -> dict[str, object]:
     with open(output_path, "w", encoding="utf-8") as handle:
         json.dump(transformed, handle)
 
-    return transformed
+    with open(output_path, encoding="utf-8") as handle:
+        result = json.load(handle)
+    return {"input_path": input_path, "output_path": output_path, "result": result}
 
 
 def main() -> None:
     ray.init(address="auto")
-    input_path = "/shared/input.json"
-    output_path = "/shared/output.json"
-
-    seed_payload = {"a": 1, "b": 2, "label": "sample"}
-    ray.get(write_file.remote(input_path, json.dumps(seed_payload)))
-    ray.get(transform.remote(input_path, output_path))
-
-    result = json.loads(ray.get(read_file.remote(output_path)))
+    base_dir = Path.cwd() / "artifacts" / "basic_file_io"
+    result = ray.get(
+        run_file_io_pipeline.remote(
+            str(base_dir / "input.json"),
+            str(base_dir / "output.json"),
+        )
+    )
     print(json.dumps(result))
 
 
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/__init__.py b/tests/e2e/ray/jobs/cloud_demo_emulated/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/__init__.py
@@ -0,0 +1 @@
+
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/main.py b/tests/e2e/ray/jobs/cloud_demo_emulated/main.py
new file mode 100644
index 00000000..cb86d7e5
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/main.py
@@ -0,0 +1,63 @@
+"""Cloud-demo-shaped pipeline entrypoint for fragment sufficiency contracts."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import sys
+import time
+import uuid
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parent
+PHASE_SCRIPTS = [
+    ("extraction", APP_DIR / "scripts" / "run_extraction.py"),
+    ("training", APP_DIR / "scripts" / "run_training.py"),
+    ("evaluation", APP_DIR / "scripts" / "run_evaluation.py"),
+]
+
+
+def _run_phase(phase: str, script_path: Path, state_file: Path) -> float:
+    started = time.perf_counter()
+    subprocess.run(
+        [sys.executable, str(script_path), "--state-file", str(state_file)],
+        check=True,
+        cwd=APP_DIR,
+    )
+    elapsed_ms = (time.perf_counter() - started) * 1000
+    print(f"[pipeline:{phase}] {elapsed_ms:.1f}ms")
+    return elapsed_ms
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="cloud-demo-emulated")
+    parser.add_argument("--run-id", default=None)
+    args = parser.parse_args()
+
+    run_id = args.run_id or f"run-{uuid.uuid4().hex[:8]}"
+    state_file = Path(f"/tmp/cloud-demo-emulated-state-{run_id}.json")
+    state_file.parent.mkdir(parents=True, exist_ok=True)
+    state_file.write_text(json.dumps({"run_id": run_id}), encoding="utf-8")
+
+    phase_times: dict[str, float] = {}
+    for phase, script in PHASE_SCRIPTS:
+        phase_times[phase] = _run_phase(phase, script, state_file)
+
+    final_state = json.loads(state_file.read_text(encoding="utf-8"))
+    print(
+        json.dumps(
+            {
+                "script": "cloud_demo_emulated",
+                "run_id": run_id,
+                "shard_keys": final_state.get("shard_keys", []),
+                "model_key": final_state.get("model_key"),
+                "metrics_key": final_state.get("metrics_key"),
+                "phase_times_ms": phase_times,
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_evaluation.py b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_evaluation.py
new file mode 100644
index 00000000..fa962c1a
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_evaluation.py
@@ -0,0 +1,33 @@
+"""Cloud-demo-emulated evaluation phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_emulated.workload.evaluation import run_evaluation  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run emulated evaluation phase")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["metrics_key"] = run_evaluation(
+        str(state["model_key"]), list(state["shard_keys"]), run_id
+    )
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved evaluation state to {state_path} (metrics={state['metrics_key']})")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_extraction.py b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_extraction.py
new file mode 100644
index 00000000..8027c87d
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_extraction.py
@@ -0,0 +1,31 @@
+"""Cloud-demo-emulated extraction phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_emulated.workload.extraction import run_extraction  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run emulated extraction phase")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["shard_keys"] = run_extraction(run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved extraction state to {state_path} ({len(state['shard_keys'])} shards)")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_training.py b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_training.py
new file mode 100644
index 00000000..9e358bd0
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/scripts/run_training.py
@@ -0,0 +1,31 @@
+"""Cloud-demo-emulated training phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_emulated.workload.training import run_training  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run emulated training phase")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["model_key"] = run_training(list(state["shard_keys"]), run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved training state to {state_path} (model={state['model_key']})")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/__init__.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/__init__.py
@@ -0,0 +1 @@
+
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/aws_client.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/aws_client.py
new file mode 100644
index 00000000..85a00949
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/aws_client.py
@@ -0,0 +1,21 @@
+"""Shared S3 client helpers for the emulated cloud-demo pipeline."""
+
+from __future__ import annotations
+
+import os
+
+import boto3
+
+
+def resolve_s3_endpoint() -> str | None:
+    return os.getenv("AWS_ENDPOINT_URL")
+
+
+def s3_client(*, endpoint_url: str | None = None):
+    return boto3.client(
+        "s3",
+        endpoint_url=endpoint_url,
+        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name=os.getenv("AWS_DEFAULT_REGION", "us-east-1"),
+    )
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/config.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/config.py
new file mode 100644
index 00000000..fa76618a
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/config.py
@@ -0,0 +1,14 @@
+"""Configuration for the emulated cloud-demo pipeline."""
+
+from __future__ import annotations
+
+import os
+
+SHARD_COUNT = int(os.getenv("CLOUD_DEMO_EMULATED_SHARD_COUNT", "25"))
+NUM_FRAMES_PER_FILE = int(os.getenv("CLOUD_DEMO_EMULATED_FRAMES", "5"))
+NUM_EPOCHS = int(os.getenv("CLOUD_DEMO_EMULATED_EPOCHS", "1"))
+EVAL_LIMIT = int(os.getenv("CLOUD_DEMO_EMULATED_EVAL_LIMIT", "20"))
+
+S3_DATA_BUCKET = os.getenv("S3_DATA_BUCKET", "test-bucket")
+S3_MODELS_BUCKET = os.getenv("S3_MODELS_BUCKET", "output-bucket")
+S3_RESULTS_BUCKET = os.getenv("S3_RESULTS_BUCKET", "output-bucket")
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/evaluation.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/evaluation.py
new file mode 100644
index 00000000..b1d3b1cf
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/evaluation.py
@@ -0,0 +1,64 @@
+"""Evaluation workload matching the real cloud-demo shape."""
+
+from __future__ import annotations
+
+import io
+import json
+
+import pyarrow.parquet as pq
+from cloud_demo_emulated.workload.aws_client import resolve_s3_endpoint, s3_client
+from cloud_demo_emulated.workload.config import (
+    EVAL_LIMIT,
+    S3_DATA_BUCKET,
+    S3_MODELS_BUCKET,
+    S3_RESULTS_BUCKET,
+)
+
+import ray
+
+
+@ray.remote
+def evaluate_shard(
+    shard_key: str,
+    model_key: str,
+    data_bucket: str,
+    models_bucket: str,
+    endpoint: str | None,
+) -> dict:
+    s3 = s3_client(endpoint_url=endpoint)
+    model_bytes = s3.get_object(Bucket=models_bucket, Key=model_key)["Body"].read()
+    model = json.loads(model_bytes.decode("utf-8"))
+    obj = s3.get_object(Bucket=data_bucket, Key=shard_key)
+    table = pq.read_table(io.BytesIO(obj["Body"].read()))
+    score = float(model["weight"]) / max(table.num_rows, 1)
+    return {"shard": shard_key, "score": score}
+
+
+def run_evaluation(
+    model_key: str, shard_keys: list[str], run_id: str, ray_address: str = "auto"
+) -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        endpoint = resolve_s3_endpoint()
+        eval_shards = shard_keys[:EVAL_LIMIT]
+        futures = [
+            evaluate_shard.remote(shard_key, model_key, S3_DATA_BUCKET, S3_MODELS_BUCKET, endpoint)
+            for shard_key in eval_shards
+        ]
+        results = ray.get(futures)
+    finally:
+        ray.shutdown()
+
+    summary = {
+        "run_id": run_id,
+        "avg_score": float(sum(item["score"] for item in results) / max(len(results), 1)),
+        "num_shards_evaluated": len(results),
+    }
+    metrics_key = f"evaluation/{run_id}/metrics.json"
+    s3 = s3_client(endpoint_url=resolve_s3_endpoint())
+    s3.put_object(
+        Bucket=S3_RESULTS_BUCKET,
+        Key=metrics_key,
+        Body=json.dumps(summary, sort_keys=True).encode("utf-8"),
+    )
+    return metrics_key
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/extraction.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/extraction.py
new file mode 100644
index 00000000..1f41f8d9
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/extraction.py
@@ -0,0 +1,53 @@
+"""Extraction workload matching the real cloud-demo shape."""
+
+from __future__ import annotations
+
+import io
+
+import numpy as np
+import pyarrow as pa
+import pyarrow.parquet as pq
+from cloud_demo_emulated.workload.aws_client import resolve_s3_endpoint, s3_client
+from cloud_demo_emulated.workload.config import NUM_FRAMES_PER_FILE, S3_DATA_BUCKET, SHARD_COUNT
+
+import ray
+
+
+@ray.remote
+def generate_sensor_shard(
+    shard_id: int, num_frames: int, bucket: str, endpoint: str | None
+) -> dict:
+    rng = np.random.default_rng(shard_id)
+    table = pa.table(
+        {
+            "frame_id": pa.array(range(num_frames)),
+            "position_x": pa.array(rng.normal(0, 1, num_frames).astype(np.float32)),
+            "position_y": pa.array(rng.normal(0, 1, num_frames).astype(np.float32)),
+            "position_z": pa.array(rng.normal(0, 1, num_frames).astype(np.float32)),
+            "depth_mean": pa.array(rng.uniform(1.0, 10.0, num_frames).astype(np.float32)),
+        }
+    )
+    buffer = io.BytesIO()
+    pq.write_table(table, buffer)
+    buffer.seek(0)
+
+    key = f"sensor_data/shard_{shard_id:06d}.parquet"
+    s3 = s3_client(endpoint_url=endpoint)
+    s3.put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
+    return {"shard_id": shard_id, "key": key}
+
+
+def run_extraction(run_id: str, ray_address: str = "auto") -> list[str]:
+    del run_id
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        endpoint = resolve_s3_endpoint()
+        futures = [
+            generate_sensor_shard.remote(index, NUM_FRAMES_PER_FILE, S3_DATA_BUCKET, endpoint)
+            for index in range(SHARD_COUNT)
+        ]
+        results = ray.get(futures)
+    finally:
+        ray.shutdown()
+    ordered = sorted(results, key=lambda item: int(item["shard_id"]))
+    return [str(item["key"]) for item in ordered]
diff --git a/tests/e2e/ray/jobs/cloud_demo_emulated/workload/training.py b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/training.py
new file mode 100644
index 00000000..14d4090f
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_emulated/workload/training.py
@@ -0,0 +1,58 @@
+"""Training workload matching the real cloud-demo shape."""
+
+from __future__ import annotations
+
+import io
+import json
+
+import pyarrow.parquet as pq
+from cloud_demo_emulated.workload.aws_client import resolve_s3_endpoint, s3_client
+from cloud_demo_emulated.workload.config import NUM_EPOCHS, S3_DATA_BUCKET, S3_MODELS_BUCKET
+
+import ray
+
+
+@ray.remote
+def train_on_shard(
+    shard_key: str,
+    model_state: bytes | None,
+    bucket: str,
+    endpoint: str | None,
+) -> bytes:
+    s3 = s3_client(endpoint_url=endpoint)
+    obj = s3.get_object(Bucket=bucket, Key=shard_key)
+    table = pq.read_table(io.BytesIO(obj["Body"].read()))
+
+    frame_count = table.num_rows
+    position_sum = float(sum(table["position_x"].to_pylist()))
+    prior_weight = 0.0
+    if model_state:
+        prior_weight = float(json.loads(model_state.decode("utf-8"))["weight"])
+    next_state = {
+        "weight": prior_weight + frame_count + position_sum,
+        "frames": frame_count,
+        "source_shard": shard_key,
+    }
+    return json.dumps(next_state).encode("utf-8")
+
+
+def run_training(shard_keys: list[str], run_id: str, ray_address: str = "auto") -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        endpoint = resolve_s3_endpoint()
+        model_state: bytes | None = None
+        for _epoch in range(NUM_EPOCHS):
+            futures = [
+                train_on_shard.remote(shard_key, model_state, S3_DATA_BUCKET, endpoint)
+                for shard_key in shard_keys
+            ]
+            results = ray.get(futures)
+            model_state = results[-1]
+    finally:
+        ray.shutdown()
+
+    assert model_state is not None
+    model_key = f"models/{run_id}/sensor_predictor_final.json"
+    s3 = s3_client(endpoint_url=resolve_s3_endpoint())
+    s3.put_object(Bucket=S3_MODELS_BUCKET, Key=model_key, Body=model_state)
+    return model_key
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/__init__.py b/tests/e2e/ray/jobs/cloud_demo_like/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/__init__.py
@@ -0,0 +1 @@
+
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/main.py b/tests/e2e/ray/jobs/cloud_demo_like/main.py
new file mode 100644
index 00000000..6e38fc9b
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/main.py
@@ -0,0 +1,61 @@
+"""Cloud-demo-shaped Ray pipeline entrypoint for lineage e2e coverage."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import sys
+import time
+import uuid
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parent
+PHASE_SCRIPTS = [
+    ("extraction", APP_DIR / "scripts" / "run_extraction.py"),
+    ("training", APP_DIR / "scripts" / "run_training.py"),
+    ("evaluation", APP_DIR / "scripts" / "run_evaluation.py"),
+]
+
+
+def _run_phase(phase: str, script_path: Path, state_file: Path) -> float:
+    started = time.perf_counter()
+    subprocess.run(
+        [sys.executable, str(script_path), "--state-file", str(state_file)],
+        check=True,
+        cwd=APP_DIR,
+    )
+    elapsed_ms = (time.perf_counter() - started) * 1000
+    print(f"[pipeline:{phase}] {elapsed_ms:.1f}ms")
+    return elapsed_ms
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="cloud-demo-like pipeline")
+    parser.add_argument("--run-id", default=None)
+    args = parser.parse_args()
+
+    run_id = args.run_id or f"run-{uuid.uuid4().hex[:8]}"
+    state_file = Path(f"/tmp/cloud-demo-like-state-{run_id}.json")
+    state_file.parent.mkdir(parents=True, exist_ok=True)
+    state_file.write_text(json.dumps({"run_id": run_id}), encoding="utf-8")
+
+    phase_times: dict[str, float] = {}
+    for phase, script in PHASE_SCRIPTS:
+        phase_times[phase] = _run_phase(phase, script, state_file)
+
+    final_state = json.loads(state_file.read_text(encoding="utf-8"))
+    print(
+        json.dumps(
+            {
+                "script": "cloud_demo_like",
+                "run_id": run_id,
+                "phase_times_ms": phase_times,
+                "report_key": final_state.get("report_key"),
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_evaluation.py b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_evaluation.py
new file mode 100644
index 00000000..c5d07f94
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_evaluation.py
@@ -0,0 +1,31 @@
+"""Cloud-demo-like evaluation phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_like.workload import run_evaluation  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run cloud-demo-like evaluation")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["report_key"] = run_evaluation(list(state.get("model_keys", [])), run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved evaluation state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_extraction.py b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_extraction.py
new file mode 100644
index 00000000..d202584e
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_extraction.py
@@ -0,0 +1,31 @@
+"""Cloud-demo-like extraction phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_like.workload import run_extraction  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run cloud-demo-like extraction")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["processed_keys"] = run_extraction(run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved extraction state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_training.py b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_training.py
new file mode 100644
index 00000000..fce786dc
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/scripts/run_training.py
@@ -0,0 +1,31 @@
+"""Cloud-demo-like training phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from cloud_demo_like.workload import run_training  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run cloud-demo-like training")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["model_keys"] = run_training(list(state.get("processed_keys", [])), run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved training state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/cloud_demo_like/workload.py b/tests/e2e/ray/jobs/cloud_demo_like/workload.py
new file mode 100644
index 00000000..60be5159
--- /dev/null
+++ b/tests/e2e/ray/jobs/cloud_demo_like/workload.py
@@ -0,0 +1,125 @@
+"""Small S3-backed Ray workload with the same phase shape as cloud-demo."""
+
+from __future__ import annotations
+
+import json
+import os
+from urllib.parse import urlparse
+
+import boto3
+
+import ray
+
+DATA_BUCKET = "test-bucket"
+RESULTS_BUCKET = "output-bucket"
+SHARD_COUNT = 3
+
+
+def _s3():
+    return boto3.client(
+        "s3",
+        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
+        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+def _parse_s3_uri(uri: str) -> tuple[str, str]:
+    parsed = urlparse(uri)
+    if parsed.scheme != "s3" or not parsed.netloc:
+        raise ValueError(f"Invalid S3 URI: {uri}")
+    return parsed.netloc, parsed.path.lstrip("/")
+
+
+@ray.remote
+def extract_shard(shard_id: int, run_id: str) -> dict[str, object]:
+    s3 = _s3()
+    payload = {
+        "run_id": run_id,
+        "shard_id": shard_id,
+        "values": [shard_id + 1, shard_id + 2, shard_id + 3],
+    }
+    key = f"cloud-demo-like/{run_id}/processed/shard_{shard_id}.json"
+    s3.put_object(Bucket=DATA_BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
+    return {"shard_id": shard_id, "processed_key": f"s3://{DATA_BUCKET}/{key}"}
+
+
+def run_extraction(run_id: str, ray_address: str = "auto") -> list[str]:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        results = ray.get([extract_shard.remote(index, run_id) for index in range(SHARD_COUNT)])
+    finally:
+        ray.shutdown()
+    return [
+        str(item["processed_key"])
+        for item in sorted(results, key=lambda item: int(item["shard_id"]))
+    ]
+
+
+@ray.remote
+def train_on_shard(processed_key: str, run_id: str) -> dict[str, object]:
+    s3 = _s3()
+    bucket, key = _parse_s3_uri(processed_key)
+    payload = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
+    shard_id = int(payload["shard_id"])
+    model = {
+        "run_id": run_id,
+        "shard_id": shard_id,
+        "weight": sum(int(value) for value in payload.get("values", [])),
+    }
+    model_key = f"cloud-demo-like/{run_id}/models/model_{shard_id}.json"
+    s3.put_object(Bucket=RESULTS_BUCKET, Key=model_key, Body=json.dumps(model).encode("utf-8"))
+    return {"shard_id": shard_id, "model_key": f"s3://{RESULTS_BUCKET}/{model_key}"}
+
+
+def run_training(processed_keys: list[str], run_id: str, ray_address: str = "auto") -> list[str]:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        results = ray.get([train_on_shard.remote(key, run_id) for key in processed_keys])
+    finally:
+        ray.shutdown()
+    return [
+        str(item["model_key"]) for item in sorted(results, key=lambda item: int(item["shard_id"]))
+    ]
+
+
+@ray.remote
+def evaluate_shard(model_key: str, run_id: str) -> dict[str, object]:
+    s3 = _s3()
+    bucket, key = _parse_s3_uri(model_key)
+    model = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
+    shard_id = int(model["shard_id"])
+    metrics = {
+        "run_id": run_id,
+        "shard_id": shard_id,
+        "score": float(model["weight"]) / max(shard_id + 1, 1),
+    }
+    metrics_key = f"cloud-demo-like/{run_id}/metrics/metric_{shard_id}.json"
+    s3.put_object(Bucket=RESULTS_BUCKET, Key=metrics_key, Body=json.dumps(metrics).encode("utf-8"))
+    return {
+        "shard_id": shard_id,
+        "metrics_key": f"s3://{RESULTS_BUCKET}/{metrics_key}",
+        "score": metrics["score"],
+    }
+
+
+def run_evaluation(model_keys: list[str], run_id: str, ray_address: str = "auto") -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        results = ray.get([evaluate_shard.remote(key, run_id) for key in model_keys])
+    finally:
+        ray.shutdown()
+
+    report = {
+        "run_id": run_id,
+        "scores": [
+            float(item["score"]) for item in sorted(results, key=lambda item: int(item["shard_id"]))
+        ],
+    }
+    report["avg_score"] = sum(report["scores"]) / max(len(report["scores"]), 1)
+
+    s3 = _s3()
+    report_key = f"cloud-demo-like/{run_id}/results/final_report.json"
+    s3.put_object(Bucket=RESULTS_BUCKET, Key=report_key, Body=json.dumps(report).encode("utf-8"))
+    return f"s3://{RESULTS_BUCKET}/{report_key}"
diff --git a/tests/e2e/ray/jobs/driver_proxy_capture.py b/tests/e2e/ray/jobs/driver_proxy_capture.py
new file mode 100644
index 00000000..12d0a07f
--- /dev/null
+++ b/tests/e2e/ray/jobs/driver_proxy_capture.py
@@ -0,0 +1,42 @@
+"""Driver-only S3 workload to exercise the driver_entrypoint proxy path."""
+
+from __future__ import annotations
+
+import json
+import os
+import uuid
+
+import boto3
+
+
+def main() -> None:
+    endpoint = os.environ.get("AWS_ENDPOINT_URL") or None
+    s3 = boto3.client(
+        "s3",
+        endpoint_url=endpoint,
+        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name=os.environ.get("AWS_DEFAULT_REGION", "us-east-1"),
+    )
+
+    run_id = uuid.uuid4().hex[:8]
+    key = f"driver/{run_id}/driver_proxy_capture.txt"
+    payload = b"driver proxy capture\n"
+
+    s3.put_object(Bucket="test-bucket", Key=key, Body=payload)
+    body = s3.get_object(Bucket="test-bucket", Key=key)["Body"].read().decode("utf-8")
+
+    print(
+        json.dumps(
+            {
+                "key": key,
+                "body": body,
+                "aws_endpoint_url": os.environ.get("AWS_ENDPOINT_URL", ""),
+                "roar_proxy_port": os.environ.get("ROAR_PROXY_PORT", ""),
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/fragment_latency_probe.py b/tests/e2e/ray/jobs/fragment_latency_probe.py
new file mode 100644
index 00000000..e9ca2971
--- /dev/null
+++ b/tests/e2e/ray/jobs/fragment_latency_probe.py
@@ -0,0 +1,113 @@
+"""Emit S3-backed fragment-like events with timestamps and report latency stats."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import math
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _percentile(values: list[float], pct: float) -> float:
+    if not values:
+        return 0.0
+    ordered = sorted(values)
+    idx = math.ceil((pct / 100.0) * len(ordered)) - 1
+    idx = max(0, min(idx, len(ordered) - 1))
+    return float(ordered[idx])
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+@ray.remote
+def _emit(index: int, run_id: str, bucket: str) -> dict[str, Any]:
+    client = _s3_client()
+    emitted_at_ns = time.time_ns()
+    key = f"fragment-latency/{run_id}/f{index:05d}-{emitted_at_ns}.json"
+    body = json.dumps({"index": index, "emitted_at_ns": emitted_at_ns}).encode("utf-8")
+
+    client.put_object(Bucket=bucket, Key=key, Body=body)
+    response = client.get_object(Bucket=bucket, Key=key)
+    raw = response["Body"].read().decode("utf-8")
+    completed_at_ns = time.time_ns()
+    parsed = json.loads(raw)
+
+    return {
+        "index": index,
+        "node_id": _node_id(),
+        "path": f"s3://{bucket}/{key}",
+        "emitted_at_ns": emitted_at_ns,
+        "completed_at_ns": completed_at_ns,
+        "payload_emitted_at_ns": int(parsed["emitted_at_ns"]),
+    }
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--fragments", type=int, default=200)
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    total = max(1, int(args.fragments))
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "fragment_latency_probe",
+        "run_id": run_id,
+        "bucket": args.bucket,
+        "fragments_requested": total,
+        "records": [],
+        "errors": [],
+    }
+    try:
+        refs = [_emit.remote(index, run_id, str(args.bucket)) for index in range(total)]
+        for ref in refs:
+            try:
+                report["records"].append(ray.get(ref, timeout=120))
+            except Exception as exc:
+                report["errors"].append(str(exc))
+
+        latencies_ms: list[float] = []
+        for item in report["records"]:
+            if not isinstance(item, dict):
+                continue
+            emitted = int(item.get("emitted_at_ns", 0))
+            completed = int(item.get("completed_at_ns", emitted))
+            latencies_ms.append(max(0.0, (completed - emitted) / 1_000_000.0))
+
+        report["latency_ms"] = {
+            "count": len(latencies_ms),
+            "p50": _percentile(latencies_ms, 50),
+            "p95": _percentile(latencies_ms, 95),
+            "min": min(latencies_ms) if latencies_ms else 0.0,
+            "max": max(latencies_ms) if latencies_ms else 0.0,
+        }
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+    return 1 if report["errors"] else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/native_background_thread_attribution.py b/tests/e2e/ray/jobs/native_background_thread_attribution.py
new file mode 100644
index 00000000..ed70dd38
--- /dev/null
+++ b/tests/e2e/ray/jobs/native_background_thread_attribution.py
@@ -0,0 +1,196 @@
+"""Ray job that reproduces unbound background-thread native I/O attribution."""
+
+from __future__ import annotations
+
+import ctypes
+import json
+import os
+import subprocess
+import textwrap
+import threading
+import time
+from pathlib import Path
+
+import ray
+
+_LIBRARY_SOURCE = textwrap.dedent(
+    r"""
+    #include <fcntl.h>
+    #include <sys/syscall.h>
+    #include <string.h>
+    #include <unistd.h>
+
+    int native_write_file(const char *path, const char *payload, int delay_ms) {
+        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
+        if (fd < 0) {
+            return 2;
+        }
+
+        if (delay_ms > 0) {
+            usleep((useconds_t)delay_ms * 1000);
+        }
+
+        size_t payload_len = strlen(payload);
+        if (write(fd, payload, payload_len) != (ssize_t)payload_len) {
+            close(fd);
+            return 3;
+        }
+
+        if (close(fd) != 0) {
+            return 4;
+        }
+
+        return 0;
+    }
+
+    int current_native_tid(void) {
+        return (int)syscall(SYS_gettid);
+    }
+    """
+).strip()
+
+
+def _to_text(value: object) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        return value.hex()
+    return str(value)
+
+
+def _build_library() -> Path:
+    source_path = Path.cwd() / "native_background_thread_writer.c"
+    library_path = Path.cwd() / "libnative_background_thread_writer.so"
+    source_path.write_text(_LIBRARY_SOURCE, encoding="utf-8")
+    subprocess.run(
+        [
+            "gcc",
+            "-shared",
+            "-fPIC",
+            "-O2",
+            "-Wall",
+            "-Wextra",
+            "-o",
+            str(library_path),
+            str(source_path),
+        ],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    return library_path
+
+
+@ray.remote(max_concurrency=1)
+class BackgroundThreadNativeWriter:
+    def __init__(self) -> None:
+        library = ctypes.CDLL(str(_build_library()))
+        native_write_file = library.native_write_file
+        native_write_file.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
+        native_write_file.restype = ctypes.c_int
+
+        current_native_tid = library.current_native_tid
+        current_native_tid.argtypes = []
+        current_native_tid.restype = ctypes.c_int
+
+        self._native_write_file = native_write_file
+        self._current_native_tid = current_native_tid
+        self._background_thread: threading.Thread | None = None
+        self._background_meta: dict[str, str] = {}
+
+    def launch_background_write(
+        self, path: str, payload: str, native_delay_ms: int
+    ) -> dict[str, str]:
+        from roar.ray import roar_worker
+
+        target = Path(path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        started = threading.Event()
+        self._background_meta = {}
+
+        def _run() -> None:
+            python_thread_id = threading.get_native_id()
+            native_thread_id = self._current_native_tid()
+            self._background_meta = {
+                "path": str(target),
+                "background_thread_id": str(python_thread_id),
+                "native_thread_id": str(native_thread_id),
+                "pre_write_bound_task_id": roar_worker._bound_native_task_id_for_event(
+                    os.getpid(),
+                    python_thread_id,
+                ),
+            }
+            started.set()
+
+            rc = self._native_write_file(
+                str(target).encode("utf-8"),
+                payload.encode("utf-8"),
+                native_delay_ms,
+            )
+            self._background_meta["returncode"] = str(rc)
+
+        background_thread = threading.Thread(
+            target=_run,
+            name="native-background-writer",
+        )
+        self._background_thread = background_thread
+        background_thread.start()
+        if not started.wait(timeout=10):
+            raise RuntimeError("background native writer thread did not start in time")
+
+        ctx = ray.get_runtime_context()
+        return {
+            "path": str(target),
+            "ld_preload": os.environ.get("LD_PRELOAD", ""),
+            "trace_sock": os.environ.get("ROAR_PRELOAD_TRACE_SOCK", ""),
+            "worker_id": _to_text(ctx.get_worker_id()),
+            "task_id": _to_text(ctx.get_task_id()),
+            "launch_thread_id": str(threading.get_native_id()),
+            "background_thread_id": self._background_meta.get("background_thread_id", ""),
+            "native_thread_id": self._background_meta.get("native_thread_id", ""),
+            "pre_write_bound_task_id": self._background_meta.get("pre_write_bound_task_id", ""),
+        }
+
+    def block_on_next_task(self, marker_path: str, sleep_seconds: float) -> dict[str, str]:
+        target = Path(marker_path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.write_text("python marker\n", encoding="utf-8")
+        time.sleep(sleep_seconds)
+
+        ctx = ray.get_runtime_context()
+        return {
+            "path": str(target),
+            "worker_id": _to_text(ctx.get_worker_id()),
+            "task_id": _to_text(ctx.get_task_id()),
+        }
+
+    def wait_for_background(self) -> dict[str, str]:
+        if self._background_thread is None:
+            return {}
+        self._background_thread.join(timeout=10)
+        if self._background_thread.is_alive():
+            raise RuntimeError("background native writer thread did not finish in time")
+        return dict(self._background_meta)
+
+
+def main() -> None:
+    ray.init(address="auto")
+
+    actor = BackgroundThreadNativeWriter.options(num_cpus=1).remote()
+    native_path = str(Path.cwd() / "artifacts" / "native_background_thread_output.txt")
+    marker_path = str(Path.cwd() / "artifacts" / "native_background_thread_marker.txt")
+
+    launch_ref = actor.launch_background_write.remote(
+        native_path,
+        "background native thread\n",
+        600,
+    )
+    block_ref = actor.block_on_next_task.remote(marker_path, 1.2)
+    waited_ref = actor.wait_for_background.remote()
+
+    launch, block, waited = ray.get([launch_ref, block_ref, waited_ref])
+    print(json.dumps({"launch": launch, "block": block, "waited": waited}))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/native_library_tracing.py b/tests/e2e/ray/jobs/native_library_tracing.py
new file mode 100644
index 00000000..ad6f0f5f
--- /dev/null
+++ b/tests/e2e/ray/jobs/native_library_tracing.py
@@ -0,0 +1,107 @@
+"""Ray job that performs in-process native I/O through a compiled shared library."""
+
+from __future__ import annotations
+
+import ctypes
+import json
+import os
+import subprocess
+import textwrap
+from pathlib import Path
+
+import ray
+
+_LIBRARY_SOURCE = textwrap.dedent(
+    r"""
+    #include <fcntl.h>
+    #include <string.h>
+    #include <unistd.h>
+
+    int native_write_file(const char *path) {
+        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
+        if (fd < 0) {
+            return 2;
+        }
+
+        const char *payload = "native library output\n";
+        size_t payload_len = strlen(payload);
+        if (write(fd, payload, payload_len) != (ssize_t)payload_len) {
+            close(fd);
+            return 3;
+        }
+
+        if (close(fd) != 0) {
+            return 4;
+        }
+
+        return 0;
+    }
+    """
+).strip()
+
+
+def _to_text(value: object) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        return value.hex()
+    return str(value)
+
+
+def _build_library() -> Path:
+    source_path = Path.cwd() / "native_writer_library.c"
+    library_path = Path.cwd() / "libnative_writer.so"
+    source_path.write_text(_LIBRARY_SOURCE, encoding="utf-8")
+    subprocess.run(
+        [
+            "gcc",
+            "-shared",
+            "-fPIC",
+            "-O2",
+            "-Wall",
+            "-Wextra",
+            "-o",
+            str(library_path),
+            str(source_path),
+        ],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    return library_path
+
+
+@ray.remote
+def write_via_native_library(path: str) -> dict[str, str]:
+    target = Path(path)
+    target.parent.mkdir(parents=True, exist_ok=True)
+
+    library = ctypes.CDLL(str(_build_library()))
+    native_write_file = library.native_write_file
+    native_write_file.argtypes = [ctypes.c_char_p]
+    native_write_file.restype = ctypes.c_int
+
+    rc = native_write_file(str(target).encode("utf-8"))
+    if rc != 0:
+        raise RuntimeError(f"native_write_file failed with exit code {rc}")
+
+    ctx = ray.get_runtime_context()
+    return {
+        "path": str(target),
+        "ld_preload": os.environ.get("LD_PRELOAD", ""),
+        "trace_sock": os.environ.get("ROAR_PRELOAD_TRACE_SOCK", ""),
+        "worker_id": _to_text(ctx.get_worker_id()),
+        "task_id": _to_text(ctx.get_task_id()),
+    }
+
+
+def main() -> None:
+    ray.init(address="auto")
+    payload = ray.get(
+        write_via_native_library.remote(str(Path.cwd() / "artifacts" / "native_library_output.txt"))
+    )
+    print(json.dumps(payload))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/native_task_attribution.py b/tests/e2e/ray/jobs/native_task_attribution.py
new file mode 100644
index 00000000..081e045a
--- /dev/null
+++ b/tests/e2e/ray/jobs/native_task_attribution.py
@@ -0,0 +1,157 @@
+"""Ray job that reproduces delayed native child I/O task attribution."""
+
+from __future__ import annotations
+
+import json
+import os
+import subprocess
+import textwrap
+import time
+from pathlib import Path
+
+import ray
+
+_HELPER_SOURCE = textwrap.dedent(
+    r"""
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <string.h>
+    #include <unistd.h>
+
+    int main(int argc, char **argv) {
+        if (argc != 3) {
+            fprintf(stderr, "usage: %s <path> <delay_ms>\n", argv[0]);
+            return 2;
+        }
+
+        const char *path = argv[1];
+        int delay_ms = atoi(argv[2]);
+        if (delay_ms > 0) {
+            usleep((useconds_t)delay_ms * 1000);
+        }
+
+        FILE *handle = fopen(path, "wb");
+        if (handle == NULL) {
+            perror("fopen");
+            return 3;
+        }
+
+        const char *payload = "native child output\n";
+        size_t payload_len = strlen(payload);
+        if (fwrite(payload, 1, payload_len, handle) != payload_len) {
+            perror("fwrite");
+            fclose(handle);
+            return 4;
+        }
+
+        if (fclose(handle) != 0) {
+            perror("fclose");
+            return 5;
+        }
+
+        return 0;
+    }
+    """
+).strip()
+
+
+def _to_text(value: object) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        return value.hex()
+    return str(value)
+
+
+@ray.remote(max_concurrency=1)
+class NativeAttributionActor:
+    def __init__(self) -> None:
+        self._helper_path = self._build_helper()
+        self._children: list[subprocess.Popen[bytes]] = []
+
+    def _build_helper(self) -> str:
+        source_path = Path.cwd() / "native_writer_helper.c"
+        binary_path = Path.cwd() / "native_writer_helper"
+        source_path.write_text(_HELPER_SOURCE, encoding="utf-8")
+        subprocess.run(
+            ["gcc", "-O2", "-Wall", "-Wextra", "-o", str(binary_path), str(source_path)],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
+        return str(binary_path)
+
+    def launch_delayed_native_write(self, path: str, delay_ms: int) -> dict[str, str]:
+        target = Path(path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+
+        child = subprocess.Popen(
+            [self._helper_path, str(target), str(delay_ms)],
+            env=os.environ.copy(),
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.PIPE,
+        )
+        self._children.append(child)
+
+        ctx = ray.get_runtime_context()
+        return {
+            "path": str(target),
+            "ld_preload": os.environ.get("LD_PRELOAD", ""),
+            "trace_sock": os.environ.get("ROAR_PRELOAD_TRACE_SOCK", ""),
+            "worker_id": _to_text(ctx.get_worker_id()),
+            "task_id": _to_text(ctx.get_task_id()),
+            "child_pid": str(child.pid),
+        }
+
+    def block_on_next_task(self, path: str, sleep_seconds: float) -> dict[str, str]:
+        target = Path(path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        target.write_text("python marker\n", encoding="utf-8")
+        time.sleep(sleep_seconds)
+
+        ctx = ray.get_runtime_context()
+        return {
+            "path": str(target),
+            "worker_id": _to_text(ctx.get_worker_id()),
+            "task_id": _to_text(ctx.get_task_id()),
+        }
+
+    def wait_for_children(self) -> dict[str, object]:
+        results: list[dict[str, object]] = []
+        while self._children:
+            child = self._children.pop(0)
+            _, stderr = child.communicate(timeout=5)
+            results.append(
+                {
+                    "pid": child.pid,
+                    "returncode": child.returncode,
+                    "stderr": stderr.decode("utf-8", errors="replace").strip(),
+                }
+            )
+        return {"children": results}
+
+
+def main() -> None:
+    ray.init(address="auto")
+
+    actor = NativeAttributionActor.options(num_cpus=1).remote()
+    native_path = str(Path.cwd() / "artifacts" / "native_task_output.txt")
+    marker_path = str(Path.cwd() / "artifacts" / "native_task_marker.txt")
+
+    launch = ray.get(actor.launch_delayed_native_write.remote(native_path, 400))
+    block = ray.get(actor.block_on_next_task.remote(marker_path, 1.0))
+    waited = ray.get(actor.wait_for_children.remote())
+
+    print(
+        json.dumps(
+            {
+                "launch": launch,
+                "block": block,
+                "waited": waited,
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/native_thread_attribution.py b/tests/e2e/ray/jobs/native_thread_attribution.py
new file mode 100644
index 00000000..ce4e155c
--- /dev/null
+++ b/tests/e2e/ray/jobs/native_thread_attribution.py
@@ -0,0 +1,163 @@
+"""Ray job that forces concurrent same-process native writes on one actor worker."""
+
+from __future__ import annotations
+
+import ctypes
+import json
+import os
+import subprocess
+import textwrap
+import threading
+import time
+from pathlib import Path
+
+import ray
+
+_LIBRARY_SOURCE = textwrap.dedent(
+    r"""
+    #include <fcntl.h>
+    #include <sys/syscall.h>
+    #include <string.h>
+    #include <unistd.h>
+
+    int native_write_file(const char *path, const char *payload, int delay_ms) {
+        int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
+        if (fd < 0) {
+            return 2;
+        }
+
+        if (delay_ms > 0) {
+            usleep((useconds_t)delay_ms * 1000);
+        }
+
+        size_t payload_len = strlen(payload);
+        if (write(fd, payload, payload_len) != (ssize_t)payload_len) {
+            close(fd);
+            return 3;
+        }
+
+        if (close(fd) != 0) {
+            return 4;
+        }
+
+        return 0;
+    }
+
+    int current_native_tid(void) {
+        return (int)syscall(SYS_gettid);
+    }
+    """
+).strip()
+
+
+def _to_text(value: object) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        return value.hex()
+    return str(value)
+
+
+def _build_library() -> Path:
+    source_path = Path.cwd() / "native_thread_writer.c"
+    library_path = Path.cwd() / "libnative_thread_writer.so"
+    source_path.write_text(_LIBRARY_SOURCE, encoding="utf-8")
+    subprocess.run(
+        [
+            "gcc",
+            "-shared",
+            "-fPIC",
+            "-O2",
+            "-Wall",
+            "-Wextra",
+            "-o",
+            str(library_path),
+            str(source_path),
+        ],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    return library_path
+
+
+@ray.remote(max_concurrency=2)
+class ThreadedNativeWriter:
+    def __init__(self) -> None:
+        library = ctypes.CDLL(str(_build_library()))
+        native_write_file = library.native_write_file
+        native_write_file.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
+        native_write_file.restype = ctypes.c_int
+
+        current_native_tid = library.current_native_tid
+        current_native_tid.argtypes = []
+        current_native_tid.restype = ctypes.c_int
+
+        self._native_write_file = native_write_file
+        self._current_native_tid = current_native_tid
+        self._start_barrier = threading.Barrier(2)
+        self._finish_barrier = threading.Barrier(2)
+
+    def write(
+        self, path: str, payload: str, native_delay_ms: int, return_delay_ms: int
+    ) -> dict[str, str]:
+        from roar.ray import roar_worker
+
+        target = Path(path)
+        target.parent.mkdir(parents=True, exist_ok=True)
+        python_thread_id = threading.get_native_id()
+        native_thread_id = self._current_native_tid()
+        pre_write_bound_task_id = roar_worker._bound_native_task_id_for_event(
+            os.getpid(),
+            python_thread_id,
+        )
+
+        self._start_barrier.wait(timeout=30)
+        rc = self._native_write_file(
+            str(target).encode("utf-8"),
+            payload.encode("utf-8"),
+            native_delay_ms,
+        )
+        if rc != 0:
+            raise RuntimeError(f"native_write_file failed with exit code {rc}")
+
+        self._finish_barrier.wait(timeout=30)
+        if return_delay_ms > 0:
+            time.sleep(return_delay_ms / 1000.0)
+
+        ctx = ray.get_runtime_context()
+        return {
+            "path": str(target),
+            "ld_preload": os.environ.get("LD_PRELOAD", ""),
+            "trace_sock": os.environ.get("ROAR_PRELOAD_TRACE_SOCK", ""),
+            "worker_id": _to_text(ctx.get_worker_id()),
+            "task_id": _to_text(ctx.get_task_id()),
+            "thread_id": str(python_thread_id),
+            "native_thread_id": str(native_thread_id),
+            "pre_write_bound_task_id": pre_write_bound_task_id,
+        }
+
+
+def main() -> None:
+    ray.init(address="auto")
+
+    writer = ThreadedNativeWriter.remote()
+    fast_ref = writer.write.remote(
+        str(Path.cwd() / "artifacts" / "native_thread_fast.txt"),
+        "fast native thread\n",
+        0,
+        0,
+    )
+    slow_ref = writer.write.remote(
+        str(Path.cwd() / "artifacts" / "native_thread_slow.txt"),
+        "slow native thread\n",
+        0,
+        250,
+    )
+
+    fast, slow = ray.get([fast_ref, slow_ref])
+    print(json.dumps({"fast": fast, "slow": slow}))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/native_tracing.py b/tests/e2e/ray/jobs/native_tracing.py
index e5b70750..45ac6296 100644
--- a/tests/e2e/ray/jobs/native_tracing.py
+++ b/tests/e2e/ray/jobs/native_tracing.py
@@ -1,27 +1,43 @@
-"""Ray job that verifies native preload env wiring in workers."""
+"""Ray job that reports worker native-tracing activation and writes a local file."""
 
 from __future__ import annotations
 
 import json
 import os
+from pathlib import Path
 
 import ray
 
 
+def _to_text(value: object) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        return value.hex()
+    return str(value)
+
+
 @ray.remote
 def write_and_report(path: str) -> dict[str, str]:
+    Path(path).parent.mkdir(parents=True, exist_ok=True)
     with open(path, "w", encoding="utf-8") as handle:
         handle.write("native tracing smoke\n")
+    ctx = ray.get_runtime_context()
     return {
         "path": path,
         "ld_preload": os.environ.get("LD_PRELOAD", ""),
+        "trace_sock": os.environ.get("ROAR_PRELOAD_TRACE_SOCK", ""),
         "aws_endpoint_url": os.environ.get("AWS_ENDPOINT_URL", ""),
+        "worker_id": _to_text(ctx.get_worker_id()),
+        "task_id": _to_text(ctx.get_task_id()),
     }
 
 
 def main() -> None:
     ray.init(address="auto")
-    payload = ray.get(write_and_report.remote("/shared/native_tracing_output.txt"))
+    payload = ray.get(
+        write_and_report.remote(str(Path.cwd() / "artifacts" / "native_tracing_output.txt"))
+    )
     print(json.dumps(payload))
 
 
diff --git a/tests/e2e/ray/jobs/nested_subprocess_s3_lineage.py b/tests/e2e/ray/jobs/nested_subprocess_s3_lineage.py
new file mode 100644
index 00000000..27a30aeb
--- /dev/null
+++ b/tests/e2e/ray/jobs/nested_subprocess_s3_lineage.py
@@ -0,0 +1,34 @@
+"""Nested subprocess Ray job for task-scoped S3 lineage coverage."""
+
+from __future__ import annotations
+
+import json
+import subprocess
+import sys
+import uuid
+from pathlib import Path
+
+
+def main() -> None:
+    run_id = f"run-{uuid.uuid4().hex[:8]}"
+    app_dir = Path(__file__).resolve().parent
+    worker_script = app_dir / "nested_subprocess_s3_worker.py"
+    result = subprocess.run(
+        [sys.executable, str(worker_script), "--run-id", run_id],
+        check=True,
+        capture_output=True,
+        text=True,
+        cwd=app_dir,
+    )
+    payload = {}
+    for line in reversed(result.stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped.startswith("{"):
+            continue
+        payload = json.loads(stripped)
+        break
+    print(json.dumps({"run_id": run_id, **payload}, sort_keys=True))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/nested_subprocess_s3_worker.py b/tests/e2e/ray/jobs/nested_subprocess_s3_worker.py
new file mode 100644
index 00000000..c45e68aa
--- /dev/null
+++ b/tests/e2e/ray/jobs/nested_subprocess_s3_worker.py
@@ -0,0 +1,45 @@
+"""Nested subprocess worker that performs one remote S3 write."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client(
+        "s3",
+        endpoint_url=os.environ.get("AWS_ENDPOINT_URL"),
+        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+@ray.remote
+def write_s3(run_id: str) -> str:
+    key = f"nested-subprocess/{run_id}/data.txt"
+    _s3_client().put_object(Bucket="test-bucket", Key=key, Body=b"nested subprocess")
+    return f"s3://test-bucket/{key}"
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--run-id", required=True)
+    args = parser.parse_args()
+
+    ray.init(address="auto")
+    try:
+        output_uri = ray.get(write_s3.remote(str(args.run_id)))
+    finally:
+        ray.shutdown()
+    print(json.dumps({"output_uri": output_uri}, sort_keys=True))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/__init__.py b/tests/e2e/ray/jobs/phase_lineage_contract/__init__.py
new file mode 100644
index 00000000..8b137891
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/__init__.py
@@ -0,0 +1 @@
+
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/main.py b/tests/e2e/ray/jobs/phase_lineage_contract/main.py
new file mode 100644
index 00000000..71c60f9e
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/main.py
@@ -0,0 +1,63 @@
+"""Three-phase Ray pipeline for lineage contract coverage."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import sys
+import time
+import uuid
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parent
+PHASE_SCRIPTS = [
+    ("extraction", APP_DIR / "scripts" / "run_extraction.py"),
+    ("training", APP_DIR / "scripts" / "run_training.py"),
+    ("evaluation", APP_DIR / "scripts" / "run_evaluation.py"),
+]
+
+
+def _run_phase(phase: str, script_path: Path, state_file: Path) -> float:
+    started = time.perf_counter()
+    subprocess.run(
+        [sys.executable, str(script_path), "--state-file", str(state_file)],
+        check=True,
+        cwd=APP_DIR,
+    )
+    elapsed_ms = (time.perf_counter() - started) * 1000
+    print(f"[phase:{phase}] {elapsed_ms:.1f}ms")
+    return elapsed_ms
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="phase-lineage-contract")
+    parser.add_argument("--run-id", default=None)
+    args = parser.parse_args()
+
+    run_id = args.run_id or f"run-{uuid.uuid4().hex[:8]}"
+    state_file = Path(f"/tmp/phase-lineage-contract-{run_id}.json")
+    state_file.parent.mkdir(parents=True, exist_ok=True)
+    state_file.write_text(json.dumps({"run_id": run_id}), encoding="utf-8")
+
+    phase_times: dict[str, float] = {}
+    for phase, script in PHASE_SCRIPTS:
+        phase_times[phase] = _run_phase(phase, script, state_file)
+
+    final_state = json.loads(state_file.read_text(encoding="utf-8"))
+    print(
+        json.dumps(
+            {
+                "script": "phase_lineage_contract",
+                "run_id": run_id,
+                "processed_key": final_state.get("processed_key"),
+                "model_key": final_state.get("model_key"),
+                "report_key": final_state.get("report_key"),
+                "phase_times_ms": phase_times,
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_evaluation.py b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_evaluation.py
new file mode 100644
index 00000000..f02d4329
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_evaluation.py
@@ -0,0 +1,32 @@
+"""Phase-lineage evaluation phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from phase_lineage_contract.workload import run_evaluation  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run phase-lineage evaluation")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    model_key = str(state["model_key"])
+    state["report_key"] = run_evaluation(model_key, run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved evaluation state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_extraction.py b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_extraction.py
new file mode 100644
index 00000000..5c824170
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_extraction.py
@@ -0,0 +1,31 @@
+"""Phase-lineage extraction phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from phase_lineage_contract.workload import run_extraction  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run phase-lineage extraction")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    state["processed_key"] = run_extraction(run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved extraction state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_training.py b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_training.py
new file mode 100644
index 00000000..16066ffd
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/scripts/run_training.py
@@ -0,0 +1,32 @@
+"""Phase-lineage training phase wrapper."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parents[1]
+if str(APP_DIR.parent) not in sys.path:
+    sys.path.insert(0, str(APP_DIR.parent))
+
+from phase_lineage_contract.workload import run_training  # noqa: E402
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Run phase-lineage training")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+    processed_key = str(state["processed_key"])
+    state["model_key"] = run_training(processed_key, run_id)
+    state_path.write_text(json.dumps(state), encoding="utf-8")
+    print(f"Saved training state to {state_path}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/phase_lineage_contract/workload.py b/tests/e2e/ray/jobs/phase_lineage_contract/workload.py
new file mode 100644
index 00000000..71ffc04a
--- /dev/null
+++ b/tests/e2e/ray/jobs/phase_lineage_contract/workload.py
@@ -0,0 +1,98 @@
+"""Small single-path Ray workload with explicit extract/train/evaluate phases."""
+
+from __future__ import annotations
+
+import json
+import os
+from urllib.parse import urlparse
+
+import boto3
+
+import ray
+
+DATA_BUCKET = "test-bucket"
+RESULTS_BUCKET = "output-bucket"
+
+
+def _s3():
+    return boto3.client(
+        "s3",
+        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
+        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+def _parse_s3_uri(uri: str) -> tuple[str, str]:
+    parsed = urlparse(uri)
+    if parsed.scheme != "s3" or not parsed.netloc:
+        raise ValueError(f"Invalid S3 URI: {uri}")
+    return parsed.netloc, parsed.path.lstrip("/")
+
+
+@ray.remote
+def extract_dataset(run_id: str) -> str:
+    s3 = _s3()
+    payload = {
+        "run_id": run_id,
+        "records": [2, 4, 6, 8],
+        "source": "synthetic",
+    }
+    key = f"phase-lineage/{run_id}/processed/features.json"
+    s3.put_object(Bucket=DATA_BUCKET, Key=key, Body=json.dumps(payload).encode("utf-8"))
+    return f"s3://{DATA_BUCKET}/{key}"
+
+
+def run_extraction(run_id: str, ray_address: str = "auto") -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        return str(ray.get(extract_dataset.remote(run_id)))
+    finally:
+        ray.shutdown()
+
+
+@ray.remote
+def train_model(processed_key: str, run_id: str) -> str:
+    s3 = _s3()
+    bucket, key = _parse_s3_uri(processed_key)
+    payload = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
+    model = {
+        "run_id": run_id,
+        "record_count": len(payload.get("records", [])),
+        "weight": sum(int(value) for value in payload.get("records", [])),
+    }
+    model_key = f"phase-lineage/{run_id}/models/model.json"
+    s3.put_object(Bucket=RESULTS_BUCKET, Key=model_key, Body=json.dumps(model).encode("utf-8"))
+    return f"s3://{RESULTS_BUCKET}/{model_key}"
+
+
+def run_training(processed_key: str, run_id: str, ray_address: str = "auto") -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        return str(ray.get(train_model.remote(processed_key, run_id)))
+    finally:
+        ray.shutdown()
+
+
+@ray.remote
+def evaluate_model(model_key: str, run_id: str) -> str:
+    s3 = _s3()
+    bucket, key = _parse_s3_uri(model_key)
+    model = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
+    report = {
+        "run_id": run_id,
+        "score": float(model["weight"]) / max(int(model["record_count"]), 1),
+        "status": "ok",
+    }
+    report_key = f"phase-lineage/{run_id}/reports/final_report.json"
+    s3.put_object(Bucket=RESULTS_BUCKET, Key=report_key, Body=json.dumps(report).encode("utf-8"))
+    return f"s3://{RESULTS_BUCKET}/{report_key}"
+
+
+def run_evaluation(model_key: str, run_id: str, ray_address: str = "auto") -> str:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        return str(ray.get(evaluate_model.remote(model_key, run_id)))
+    finally:
+        ray.shutdown()
diff --git a/tests/e2e/ray/jobs/pipeline.py b/tests/e2e/ray/jobs/pipeline.py
index 02a054fa..4ac15a87 100644
--- a/tests/e2e/ray/jobs/pipeline.py
+++ b/tests/e2e/ray/jobs/pipeline.py
@@ -2,55 +2,45 @@
 
 from __future__ import annotations
 
+from pathlib import Path
+
 import pandas as pd
 
 import ray
 
 
 @ray.remote
-def extract(input_path: str) -> list[dict[str, object]]:
-    frame = pd.read_csv(input_path)
-    return frame.to_dict(orient="records")
+def run_pipeline(base_dir: str) -> dict[str, str]:
+    base_path = Path(base_dir)
+    base_path.mkdir(parents=True, exist_ok=True)
+    input_path = base_path / "pipeline_input.csv"
+    output_path = base_path / "pipeline_output.parquet"
 
+    pd.DataFrame(
+        [
+            {"id": 1, "value": 10},
+            {"id": 2, "value": 20},
+            {"id": 3, "value": 30},
+        ]
+    ).to_csv(input_path, index=False)
 
-@ray.remote
-def transform(records: list[dict[str, object]]) -> list[dict[str, object]]:
+    frame = pd.read_csv(input_path)
     transformed: list[dict[str, object]] = []
-    for record in records:
+    for record in frame.to_dict(orient="records"):
         updated = dict(record)
         value = updated.get("value")
         if isinstance(value, (int, float)) and not isinstance(value, bool):
             updated["value"] = value * 2
         transformed.append(updated)
-    return transformed
-
-
-@ray.remote
-def load(records: list[dict[str, object]], output_path: str) -> str:
-    frame = pd.DataFrame.from_records(records)
+    frame = pd.DataFrame.from_records(transformed)
     frame.to_parquet(output_path, index=False)
-    return output_path
+    return {"input_path": str(input_path), "output_path": str(output_path)}
 
 
 def main() -> None:
     ray.init(address="auto")
-
-    input_path = "/shared/pipeline_input.csv"
-    output_path = "/shared/pipeline_output.parquet"
-
-    pd.DataFrame(
-        [
-            {"id": 1, "value": 10},
-            {"id": 2, "value": 20},
-            {"id": 3, "value": 30},
-        ]
-    ).to_csv(input_path, index=False)
-
-    records = ray.get(extract.remote(input_path))
-    transformed = ray.get(transform.remote(records))
-    result = ray.get(load.remote(transformed, output_path))
-
-    print(result)
+    result = ray.get(run_pipeline.remote(str(Path.cwd() / "artifacts" / "pipeline")))
+    print(result["output_path"])
 
 
 if __name__ == "__main__":
diff --git a/tests/e2e/ray/jobs/proxy_reachability_probe.py b/tests/e2e/ray/jobs/proxy_reachability_probe.py
new file mode 100644
index 00000000..f6517daf
--- /dev/null
+++ b/tests/e2e/ray/jobs/proxy_reachability_probe.py
@@ -0,0 +1,58 @@
+"""Probe: verify the proxy endpoint injected via AWS_ENDPOINT_URL is reachable from workers.
+
+Each Ray task reports whether it can connect to the URL in AWS_ENDPOINT_URL.
+This exposes the worker-proxy routing bug: if AWS_ENDPOINT_URL is set to
+http://127.0.0.1:19191 (the host machine's proxy), workers running in separate
+processes/containers cannot reach it and the probe returns reachable=False.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import socket
+import urllib.parse
+
+import ray
+
+
+@ray.remote
+def check_proxy_endpoint() -> dict[str, object]:
+    """Return connectivity info for the proxy endpoint this worker sees."""
+    endpoint = os.environ.get("AWS_ENDPOINT_URL", "")
+    raw_node_id = ray.get_runtime_context().get_node_id()
+    node_id = raw_node_id.hex() if hasattr(raw_node_id, "hex") else str(raw_node_id)
+
+    if not endpoint:
+        return {
+            "endpoint": None,
+            "reachable": False,
+            "error": "AWS_ENDPOINT_URL not set",
+            "node_id": node_id,
+        }
+
+    parsed = urllib.parse.urlparse(endpoint)
+    host = parsed.hostname or ""
+    port = parsed.port or 80
+
+    try:
+        with socket.create_connection((host, port), timeout=3):
+            pass
+        return {"endpoint": endpoint, "reachable": True, "error": None, "node_id": node_id}
+    except OSError as exc:
+        return {"endpoint": endpoint, "reachable": False, "error": str(exc), "node_id": node_id}
+
+
+def main() -> None:
+    ray.init(address="auto")
+    try:
+        # Scatter across workers — use enough tasks to hit all nodes.
+        futures = [check_proxy_endpoint.remote() for _ in range(6)]
+        results = ray.get(futures)
+        print(json.dumps({"results": results}))
+    finally:
+        ray.shutdown()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/ray_data_pipeline.py b/tests/e2e/ray/jobs/ray_data_pipeline.py
index 4c59e4c5..94a8ad26 100644
--- a/tests/e2e/ray/jobs/ray_data_pipeline.py
+++ b/tests/e2e/ray/jobs/ray_data_pipeline.py
@@ -4,6 +4,7 @@
 
 import json
 import os
+from pathlib import Path
 
 import ray
 
@@ -22,8 +23,10 @@ def run_pipeline(input_path: str, output_dir: str) -> str:
 def main() -> None:
     ray.init(address="auto")
 
-    input_path = "/shared/ray_data_input.csv"
-    output_dir = "/shared/ray_data_output"
+    base_dir = Path.cwd() / "artifacts" / "ray_data"
+    base_dir.mkdir(parents=True, exist_ok=True)
+    input_path = str(base_dir / "ray_data_input.csv")
+    output_dir = str(base_dir / "ray_data_output")
     os.makedirs(output_dir, exist_ok=True)
 
     with open(input_path, "w", encoding="utf-8") as handle:
diff --git a/tests/e2e/ray/jobs/roar_diagnostic_probe.py b/tests/e2e/ray/jobs/roar_diagnostic_probe.py
new file mode 100644
index 00000000..7516c307
--- /dev/null
+++ b/tests/e2e/ray/jobs/roar_diagnostic_probe.py
@@ -0,0 +1,314 @@
+"""Consolidated Ray diagnostics for node agents, proxy env, collector, and binary."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import inspect
+import json
+import os
+import platform
+import shutil
+import subprocess
+import sys
+from typing import Any
+
+import ray
+
+
+def _to_text(value: Any) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        try:
+            return value.hex()
+        except Exception:
+            return value.decode("utf-8", errors="ignore")
+    return str(value)
+
+
+def _current_node_id() -> str:
+    try:
+        return _to_text(ray.get_runtime_context().get_node_id())
+    except Exception:
+        return ""
+
+
+def _node_resource_key(node: dict[str, Any]) -> str:
+    resources = node.get("Resources", {})
+    if not isinstance(resources, dict):
+        return ""
+    for key in resources:
+        key_text = str(key)
+        if key_text.startswith("node:"):
+            return key_text
+    return ""
+
+
+def _alive_nodes() -> list[dict[str, str]]:
+    out: list[dict[str, str]] = []
+    for node in ray.nodes():
+        if not isinstance(node, dict) or not node.get("Alive"):
+            continue
+        node_id = _to_text(node.get("NodeID"))
+        if not node_id:
+            continue
+        out.append(
+            {
+                "node_id": node_id,
+                "node_ip": _to_text(node.get("NodeManagerAddress")),
+                "node_resource": _node_resource_key(node),
+            }
+        )
+    return out
+
+
+def _list_actors() -> list[dict[str, str]]:
+    try:
+        from ray.util import state
+
+        actors = state.list_actors(detail=True)
+    except Exception:
+        return []
+
+    out: list[dict[str, str]] = []
+    for actor in actors:
+        if not isinstance(actor, dict):
+            continue
+        out.append(
+            {
+                "name": _to_text(actor.get("name")),
+                "state": _to_text(actor.get("state")),
+                "class_name": _to_text(actor.get("class_name")),
+            }
+        )
+    return out
+
+
+@ray.remote(num_cpus=0)
+def _worker_probe(check_binary: bool = False) -> dict[str, Any]:
+    binary_path = shutil.which("roar-proxy")
+    start_ok = False
+    start_code: int | None = None
+    if check_binary and binary_path:
+        try:
+            result = subprocess.run(
+                [binary_path, "--help"],
+                check=False,
+                capture_output=True,
+                text=True,
+                timeout=5,
+            )
+            start_code = int(result.returncode)
+            start_ok = result.returncode == 0
+        except Exception:
+            start_ok = False
+            start_code = None
+
+    return {
+        "node_id": _current_node_id(),
+        "aws_endpoint_url": os.getenv("AWS_ENDPOINT_URL", ""),
+        "platform_system": platform.system(),
+        "platform_machine": platform.machine(),
+        "proxy_binary_path": binary_path or "",
+        "proxy_binary_found": bool(binary_path),
+        "proxy_start_ok": start_ok,
+        "proxy_start_code": start_code,
+    }
+
+
+def _run_per_node_worker_probe(check_binary: bool) -> list[dict[str, Any]]:
+    nodes = _alive_nodes()
+    scheduled: list[tuple[str, ray.ObjectRef]] = []
+    for node in nodes:
+        options: dict[str, Any] = {"num_cpus": 0}
+        node_resource = node.get("node_resource", "")
+        if isinstance(node_resource, str) and node_resource:
+            options["resources"] = {node_resource: 0.001}
+        ref = _worker_probe.options(**options).remote(check_binary=check_binary)
+        scheduled.append((str(node.get("node_id", "")), ref))
+
+    out: list[dict[str, Any]] = []
+    for expected_node_id, ref in scheduled:
+        try:
+            payload = ray.get(ref, timeout=30)
+            if isinstance(payload, dict):
+                payload.setdefault("expected_node_id", expected_node_id)
+                out.append(payload)
+            else:
+                out.append({"expected_node_id": expected_node_id, "error": "non-dict payload"})
+        except Exception as exc:
+            out.append({"expected_node_id": expected_node_id, "error": str(exc)})
+    return out
+
+
+def _agent_name(job_id: str, node_id: str) -> str:
+    return f"roar-node-agent-{job_id}-{str(node_id)[:8]}"
+
+
+def _collect_agent_payload(actor_name: str) -> dict[str, Any]:
+    try:
+        actor = ray.get_actor(actor_name, namespace="roar")
+    except Exception:
+        return {"name": actor_name, "found": False}
+
+    payload: dict[str, Any] = {"name": actor_name, "found": True}
+    try:
+        result = ray.get(actor.collect_logs.remote(), timeout=5)
+        if isinstance(result, dict):
+            lines = result.get("proxy_log_lines", [])
+            payload.update(result)
+            payload["proxy_log_line_count"] = len(lines) if isinstance(lines, list) else 0
+        else:
+            payload["collect_logs_payload"] = _to_text(result)
+    except Exception as exc:
+        payload["collect_logs_error"] = str(exc)
+    return payload
+
+
+def _check_node_agents(job_id: str) -> dict[str, Any]:
+    nodes = _alive_nodes()
+    expected = [_agent_name(job_id, node["node_id"]) for node in nodes]
+    found = [_collect_agent_payload(name) for name in expected]
+    missing = [item["name"] for item in found if not item.get("found")]
+    return {
+        "check": "node-agents",
+        "job_id": job_id,
+        "alive_nodes": nodes,
+        "expected_agent_names": expected,
+        "node_agents_found": found,
+        "node_agents_found_count": len([item for item in found if item.get("found")]),
+        "missing_agent_names": missing,
+        "actors": _list_actors(),
+    }
+
+
+def _check_proxy_env(job_id: str) -> dict[str, Any]:
+    worker_env = _run_per_node_worker_probe(check_binary=False)
+    return {
+        "check": "proxy-env",
+        "job_id": job_id,
+        "alive_nodes": _alive_nodes(),
+        "worker_env": worker_env,
+        "actors": _list_actors(),
+    }
+
+
+def _source_contains(module_name: str, attribute_name: str, needle: str) -> tuple[bool, str]:
+    try:
+        module = __import__(module_name, fromlist=[attribute_name])
+        target = getattr(module, attribute_name)
+        source = inspect.getsource(target)
+    except Exception as exc:
+        return False, str(exc)
+    return needle in source, ""
+
+
+def _proxy_log_plumbing_status() -> dict[str, Any]:
+    collect_ray_io_drops_proxy_logs, collect_ray_io_error = _source_contains(
+        "roar.services.execution.inject.sitecustomize",
+        "_collect_ray_io",
+        "del proxy_logs",
+    )
+    collector_collect_drops_proxy_logs, collector_collect_error = _source_contains(
+        "roar.ray.collector",
+        "collect",
+        "del log_dir, proxy_logs",
+    )
+    return {
+        "collect_ray_io_drops_proxy_logs": collect_ray_io_drops_proxy_logs,
+        "collect_ray_io_source_error": collect_ray_io_error,
+        "collector_collect_drops_proxy_logs": collector_collect_drops_proxy_logs,
+        "collector_collect_source_error": collector_collect_error,
+    }
+
+
+def _check_collector(job_id: str) -> dict[str, Any]:
+    actor_name = f"roar-log-collector-{job_id}"
+    exists = False
+    ping_ok = False
+    error = ""
+    try:
+        actor = ray.get_actor(actor_name, namespace="roar")
+        exists = True
+        try:
+            ping_ok = bool(ray.get(actor.ping.remote(), timeout=5))
+        except Exception as exc:
+            error = str(exc)
+    except Exception as exc:
+        error = str(exc)
+
+    return {
+        "check": "collector",
+        "job_id": job_id,
+        "collector_actor_name": actor_name,
+        "collector_exists": exists,
+        "collector_ping_ok": ping_ok,
+        "collector_error": error,
+        "proxy_log_plumbing": _proxy_log_plumbing_status(),
+        "actors": _list_actors(),
+    }
+
+
+def _check_binary(job_id: str) -> dict[str, Any]:
+    worker_binary = _run_per_node_worker_probe(check_binary=True)
+    return {
+        "check": "binary",
+        "job_id": job_id,
+        "alive_nodes": _alive_nodes(),
+        "worker_binary": worker_binary,
+        "actors": _list_actors(),
+    }
+
+
+def _driver_env_snapshot() -> dict[str, str]:
+    keys = [
+        "ROAR_JOB_INSTRUMENTED",
+        "ROAR_RAY_NODE_AGENTS",
+        "ROAR_WRAP",
+        "GLAAS_URL",
+        "GLAAS_API_URL",
+        "ROAR_SESSION_ID",
+        "ROAR_FRAGMENT_TOKEN",
+        "ROAR_JOB_ID",
+        "AWS_ENDPOINT_URL",
+    ]
+    return {key: os.getenv(key, "") for key in keys}
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--check",
+        default="node-agents,proxy-env,collector,binary",
+        help="Comma-separated checks: node-agents,proxy-env,collector,binary",
+    )
+    args = parser.parse_args(argv)
+    checks = [item.strip() for item in str(args.check).split(",") if item.strip()]
+    valid = {"node-agents", "proxy-env", "collector", "binary"}
+    invalid = sorted(set(checks) - valid)
+    if invalid:
+        print(json.dumps({"error": "invalid check values", "invalid": invalid}, sort_keys=True))
+        return 2
+
+    ray.init(address="auto", ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        job_id = os.getenv("ROAR_JOB_ID") or os.getenv("RAY_JOB_ID") or "default"
+        runners = {
+            "node-agents": _check_node_agents,
+            "proxy-env": _check_proxy_env,
+            "collector": _check_collector,
+            "binary": _check_binary,
+        }
+        for check in checks:
+            payload = runners[check](str(job_id))
+            payload["driver_env"] = _driver_env_snapshot()
+            print(json.dumps(payload, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/e2e/ray/jobs/s3_high_throughput.py b/tests/e2e/ray/jobs/s3_high_throughput.py
new file mode 100644
index 00000000..084fcebb
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_high_throughput.py
@@ -0,0 +1,127 @@
+"""High-throughput S3 probe with configurable operations and parallelism."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+@ray.remote
+def _blast(worker_index: int, run_id: str, bucket: str, ops: int) -> dict[str, Any]:
+    client = _s3_client()
+    node_id = _node_id()
+    success = 0
+    for offset in range(max(0, ops)):
+        op_id = (worker_index * 1_000_000) + offset
+        key = f"high-throughput/{run_id}/w{worker_index:03d}/op-{op_id:09d}.txt"
+        payload = f"{run_id}|{worker_index}|{offset}|{time.time_ns()}"
+        client.put_object(Bucket=bucket, Key=key, Body=payload.encode("utf-8"))
+        body = client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
+        if body != payload:
+            raise RuntimeError(f"payload mismatch: {key}")
+        success += 1
+
+    return {
+        "worker_index": worker_index,
+        "node_id": node_id,
+        "ops_requested": ops,
+        "ops_succeeded": success,
+    }
+
+
+def _distribute_ops(total_ops: int, workers: int) -> list[int]:
+    workers = max(1, workers)
+    base = total_ops // workers
+    rem = total_ops % workers
+    return [base + (1 if idx < rem else 0) for idx in range(workers)]
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--ops", type=int, default=5000)
+    parser.add_argument("--parallelism", type=int, default=64)
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    total_ops = max(1, int(args.ops))
+    parallelism = max(1, int(args.parallelism))
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "s3_high_throughput",
+        "run_id": run_id,
+        "bucket": args.bucket,
+        "ops": total_ops,
+        "parallelism": parallelism,
+        "worker_results": [],
+        "errors": [],
+    }
+    started = time.perf_counter()
+    try:
+        ops_per_worker = _distribute_ops(total_ops, parallelism)
+        refs: list[tuple[int, ray.ObjectRef]] = []
+        for worker_index, worker_ops in enumerate(ops_per_worker):
+            if worker_ops <= 0:
+                continue
+            refs.append(
+                (
+                    worker_index,
+                    _blast.remote(worker_index, run_id, str(args.bucket), worker_ops),
+                )
+            )
+
+        for worker_index, ref in refs:
+            try:
+                report["worker_results"].append(ray.get(ref, timeout=600))
+            except Exception as exc:
+                report["errors"].append({"worker_index": worker_index, "error": str(exc)})
+
+        duration_s = time.perf_counter() - started
+        total_succeeded = sum(
+            int(item.get("ops_succeeded", 0))
+            for item in report["worker_results"]
+            if isinstance(item, dict)
+        )
+        report["summary"] = {
+            "duration_s": duration_s,
+            "ops_succeeded": total_succeeded,
+            "ops_failed": max(0, total_ops - total_succeeded),
+            "throughput_ops_per_s": (float(total_succeeded) / duration_s)
+            if duration_s > 0
+            else 0.0,
+        }
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+    if report["errors"]:
+        return 1
+    return 0 if report.get("summary", {}).get("ops_succeeded", 0) == total_ops else 1
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/s3_multi_node_affinity.py b/tests/e2e/ray/jobs/s3_multi_node_affinity.py
new file mode 100644
index 00000000..220d8c4b
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_multi_node_affinity.py
@@ -0,0 +1,145 @@
+"""Pin S3 operations to specific nodes and report node->key affinity."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _to_text(value: Any) -> str:
+    if value is None:
+        return ""
+    if isinstance(value, bytes):
+        try:
+            return value.hex()
+        except Exception:
+            return value.decode("utf-8", errors="ignore")
+    return str(value)
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _alive_nodes() -> list[dict[str, str]]:
+    out: list[dict[str, str]] = []
+    for node in ray.nodes():
+        if not isinstance(node, dict) or not node.get("Alive"):
+            continue
+        node_id = _to_text(node.get("NodeID"))
+        if not node_id:
+            continue
+        resources = node.get("Resources", {})
+        node_resource = ""
+        if isinstance(resources, dict):
+            for key in resources:
+                key_text = str(key)
+                if key_text.startswith("node:"):
+                    node_resource = key_text
+                    break
+        out.append(
+            {
+                "node_id": node_id,
+                "node_ip": _to_text(node.get("NodeManagerAddress")),
+                "node_resource": node_resource,
+            }
+        )
+    return out
+
+
+def _node_id() -> str:
+    try:
+        return _to_text(ray.get_runtime_context().get_node_id())
+    except Exception:
+        return ""
+
+
+@ray.remote
+def _node_s3_write(run_id: str, bucket: str, index: int) -> dict[str, Any]:
+    node_id = _node_id()
+    key = f"multi-node-affinity/{run_id}/{str(node_id)[:8]}/item-{index:03d}.txt"
+    payload = f"{run_id}|{node_id}|{index}|{time.time_ns()}"
+    s3 = _s3_client()
+    put_resp = s3.put_object(Bucket=bucket, Key=key, Body=payload.encode("utf-8"))
+    get_resp = s3.get_object(Bucket=bucket, Key=key)
+    body = get_resp["Body"].read().decode("utf-8")
+    return {
+        "node_id": node_id,
+        "index": index,
+        "bucket": bucket,
+        "key": key,
+        "path": f"s3://{bucket}/{key}",
+        "etag": str((put_resp or {}).get("ETag", "")),
+        "payload_match": body == payload,
+    }
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "s3_multi_node_affinity",
+        "run_id": run_id,
+        "bucket": args.bucket,
+        "results": [],
+        "errors": [],
+    }
+    try:
+        nodes = _alive_nodes()
+        report["alive_nodes"] = nodes
+
+        scheduled: list[ray.ObjectRef] = []
+        for index, node in enumerate(nodes):
+            node_resource = str(node.get("node_resource", ""))
+            options: dict[str, Any] = {}
+            if node_resource:
+                options["resources"] = {node_resource: 0.001}
+            scheduled.append(
+                _node_s3_write.options(**options).remote(run_id, str(args.bucket), index)
+            )
+
+        for ref in scheduled:
+            try:
+                report["results"].append(ray.get(ref, timeout=120))
+            except Exception as exc:
+                report["errors"].append({"error": str(exc)})
+
+        node_to_keys: dict[str, list[str]] = {}
+        for item in report["results"]:
+            if not isinstance(item, dict):
+                continue
+            node_id = str(item.get("node_id") or "")
+            path = str(item.get("path") or "")
+            if not node_id or not path:
+                continue
+            node_to_keys.setdefault(node_id, []).append(path)
+        report["node_to_keys"] = node_to_keys
+
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+    if report["errors"]:
+        return 1
+    if any(
+        not bool(item.get("payload_match")) for item in report["results"] if isinstance(item, dict)
+    ):
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/s3_multipart_io.py b/tests/e2e/ray/jobs/s3_multipart_io.py
new file mode 100644
index 00000000..22d53fdc
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_multipart_io.py
@@ -0,0 +1,116 @@
+"""Force multipart upload and verify object round-trip."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+@ray.remote
+def _multipart_upload(run_id: str, bucket: str, parts: int, part_size_mb: int) -> dict[str, Any]:
+    client = _s3_client()
+    key = f"multipart/{run_id}/large-object.bin"
+    part_size = max(5, int(part_size_mb)) * 1024 * 1024
+    part_count = max(2, int(parts))
+
+    created = client.create_multipart_upload(Bucket=bucket, Key=key)
+    upload_id = str(created["UploadId"])
+
+    completed_parts: list[dict[str, Any]] = []
+    total_size = 0
+    try:
+        for index in range(part_count):
+            # Keep payload deterministic and large enough for multipart semantics.
+            payload = bytes([65 + (index % 20)]) * part_size
+            total_size += len(payload)
+            result = client.upload_part(
+                Bucket=bucket,
+                Key=key,
+                UploadId=upload_id,
+                PartNumber=index + 1,
+                Body=payload,
+            )
+            completed_parts.append({"PartNumber": index + 1, "ETag": result["ETag"]})
+
+        completed = client.complete_multipart_upload(
+            Bucket=bucket,
+            Key=key,
+            UploadId=upload_id,
+            MultipartUpload={"Parts": completed_parts},
+        )
+    except Exception:
+        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
+        raise
+
+    head = client.head_object(Bucket=bucket, Key=key)
+    body = client.get_object(Bucket=bucket, Key=key)["Body"].read(128)
+    return {
+        "node_id": _node_id(),
+        "path": f"s3://{bucket}/{key}",
+        "key": key,
+        "parts": part_count,
+        "part_size_bytes": part_size,
+        "uploaded_size_bytes": total_size,
+        "head_size_bytes": int(head.get("ContentLength", 0)),
+        "multipart_etag": str(head.get("ETag", "")),
+        "complete_etag": str((completed or {}).get("ETag", "")),
+        "first_bytes_len": len(body),
+    }
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--bucket", default="output-bucket")
+    parser.add_argument("--parts", type=int, default=3)
+    parser.add_argument("--part-size-mb", type=int, default=6)
+    args = parser.parse_args(argv)
+
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+    ray.init(address="auto")
+    try:
+        result = ray.get(
+            _multipart_upload.remote(
+                run_id,
+                str(args.bucket),
+                int(args.parts),
+                int(args.part_size_mb),
+            ),
+            timeout=300,
+        )
+        report = {
+            "script": "s3_multipart_io",
+            "run_id": run_id,
+            "bucket": args.bucket,
+            "result": result,
+        }
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/s3_perf_probe.py b/tests/e2e/ray/jobs/s3_perf_probe.py
new file mode 100644
index 00000000..04d5ef33
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_perf_probe.py
@@ -0,0 +1,114 @@
+"""Micro-benchmark S3 latency and report p50/p95 metrics as JSON."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import math
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+def _percentile(samples: list[float], percentile: float) -> float:
+    if not samples:
+        return 0.0
+    ordered = sorted(samples)
+    rank = math.ceil((percentile / 100.0) * len(ordered)) - 1
+    rank = max(0, min(rank, len(ordered) - 1))
+    return float(ordered[rank])
+
+
+@ray.remote
+def _run_micro_probe(ops: int, bucket: str, run_id: str) -> dict[str, Any]:
+    s3 = _s3_client()
+    put_latencies_ms: list[float] = []
+    get_latencies_ms: list[float] = []
+
+    for idx in range(ops):
+        key = f"s3-perf/{run_id}/item-{idx:05d}.txt"
+        payload = f"{run_id}-{idx}-{time.time_ns()}".encode()
+
+        start = time.perf_counter()
+        s3.put_object(Bucket=bucket, Key=key, Body=payload)
+        put_latencies_ms.append((time.perf_counter() - start) * 1000.0)
+
+        start = time.perf_counter()
+        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
+        get_latencies_ms.append((time.perf_counter() - start) * 1000.0)
+        if body != payload:
+            raise RuntimeError(f"payload mismatch at {key}")
+
+    return {
+        "node_id": _node_id(),
+        "ops": ops,
+        "put_latencies_ms": put_latencies_ms,
+        "get_latencies_ms": get_latencies_ms,
+    }
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--mode", default="micro", choices=["micro"])
+    parser.add_argument("--ops", type=int, default=100)
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "s3_perf_probe",
+        "mode": args.mode,
+        "ops": args.ops,
+        "bucket": args.bucket,
+        "run_id": run_id,
+    }
+    try:
+        payload = ray.get(_run_micro_probe.remote(max(1, int(args.ops)), str(args.bucket), run_id))
+        put_latencies = [float(item) for item in payload.get("put_latencies_ms", [])]
+        get_latencies = [float(item) for item in payload.get("get_latencies_ms", [])]
+        report["node_id"] = payload.get("node_id")
+        report["operation_stats"] = {
+            "put_object": {
+                "count": len(put_latencies),
+                "p50_ms": _percentile(put_latencies, 50),
+                "p95_ms": _percentile(put_latencies, 95),
+                "min_ms": min(put_latencies) if put_latencies else 0.0,
+                "max_ms": max(put_latencies) if put_latencies else 0.0,
+            },
+            "get_object": {
+                "count": len(get_latencies),
+                "p50_ms": _percentile(get_latencies, 50),
+                "p95_ms": _percentile(get_latencies, 95),
+                "min_ms": min(get_latencies) if get_latencies else 0.0,
+                "max_ms": max(get_latencies) if get_latencies else 0.0,
+            },
+        }
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/s3_pipeline.py b/tests/e2e/ray/jobs/s3_pipeline.py
index f04b68c0..ca12d4d4 100644
--- a/tests/e2e/ray/jobs/s3_pipeline.py
+++ b/tests/e2e/ray/jobs/s3_pipeline.py
@@ -23,7 +23,7 @@
 import time
 import uuid
 from typing import Any
-from urllib.parse import urlparse, urlunparse
+from urllib.parse import urlparse
 
 import boto3
 
@@ -34,52 +34,16 @@
 OUT_BUCKET = "output-bucket"
 
 
-def _running_in_ray_worker() -> bool:
-    return os.getenv("ROAR_WORKER") == "1"
-
-
-def _resolve_endpoint_url() -> str:
-    endpoint = os.getenv("AWS_ENDPOINT_URL")
-    if not endpoint:
-        return "http://minio:9000" if _running_in_ray_worker() else "http://localhost:9000"
-
-    parsed = urlparse(endpoint)
-    if (
-        _running_in_ray_worker()
-        and parsed.hostname in {"localhost", "127.0.0.1"}
-        and parsed.scheme in {"http", "https"}
-    ):
-        port = parsed.port or 9000
-        patched = parsed._replace(netloc=f"minio:{port}")
-        return urlunparse(patched)
-    return endpoint
-
-
 def _s3():
     return boto3.client(
         "s3",
-        endpoint_url=_resolve_endpoint_url(),
+        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
         aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
         aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
         region_name="us-east-1",
     )
 
 
-def _ensure_roar_worker_startup() -> None:
-    """
-    Best-effort worker startup for Ray Client mode.
-
-    Some Ray client execution paths do not trigger worker setup hooks reliably.
-    Calling this inside remote tasks keeps S3/open capture active for live tests.
-    """
-    try:
-        import roar.ray.roar_worker as roar_worker
-
-        roar_worker._startup()
-    except Exception:
-        return
-
-
 def _parse_s3_uri(uri: str) -> tuple[str, str]:
     parsed = urlparse(uri)
     if parsed.scheme != "s3" or not parsed.netloc:
@@ -90,7 +54,6 @@ def _parse_s3_uri(uri: str) -> tuple[str, str]:
 @ray.remote
 def ingest_shard(shard_id: int, run_id: str) -> dict[str, Any]:
     """Read raw CSV from S3, transform, write processed JSON back to S3."""
-    _ensure_roar_worker_startup()
     s3 = _s3()
     raw_key = f"raw/{run_id}/shard_{shard_id}.csv"
     body = s3.get_object(Bucket=TEST_BUCKET, Key=raw_key)["Body"].read().decode("utf-8")
@@ -111,7 +74,6 @@ def ingest_shard(shard_id: int, run_id: str) -> dict[str, Any]:
 @ray.remote
 def train_shard(ingest_result: dict[str, Any], run_id: str) -> dict[str, Any]:
     """Read processed JSON and produce a minimal model artifact."""
-    _ensure_roar_worker_startup()
     s3 = _s3()
     bucket, key = _parse_s3_uri(str(ingest_result["processed_key"]))
     data = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
@@ -130,7 +92,6 @@ def train_shard(ingest_result: dict[str, Any], run_id: str) -> dict[str, Any]:
 @ray.remote
 def eval_model(train_result: dict[str, Any], run_id: str) -> dict[str, Any]:
     """Read model and produce metrics."""
-    _ensure_roar_worker_startup()
     s3 = _s3()
     bucket, key = _parse_s3_uri(str(train_result["model_key"]))
     model = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
diff --git a/tests/e2e/ray/jobs/s3_sdk_matrix.py b/tests/e2e/ray/jobs/s3_sdk_matrix.py
new file mode 100644
index 00000000..8100f743
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_sdk_matrix.py
@@ -0,0 +1,259 @@
+"""Exercise S3 via multiple SDK call paths (and optionally awscli)."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import hashlib
+import json
+import os
+import subprocess
+import sys
+import tempfile
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _session() -> boto3.session.Session:
+    return boto3.session.Session()
+
+
+def _s3_client():
+    return _session().client("s3")
+
+
+def _s3_resource():
+    return _session().resource("s3")
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+def _sha256_text(value: str) -> str:
+    return hashlib.sha256(value.encode("utf-8")).hexdigest()
+
+
+def _run_boto3_client(bucket: str, key: str, payload: str) -> dict[str, Any]:
+    client = _s3_client()
+    put_resp = client.put_object(Bucket=bucket, Key=key, Body=payload.encode("utf-8"))
+    get_resp = client.get_object(Bucket=bucket, Key=key)
+    body = get_resp["Body"].read().decode("utf-8")
+    return {
+        "method": "boto3.client",
+        "write_path": f"s3://{bucket}/{key}",
+        "read_path": f"s3://{bucket}/{key}",
+        "payload_match": body == payload,
+        "etag": str((put_resp or {}).get("ETag", "")),
+        "payload_sha256": _sha256_text(body),
+    }
+
+
+def _run_boto3_session_client(bucket: str, key: str, payload: str) -> dict[str, Any]:
+    session_client = _session().client("s3")
+    put_resp = session_client.put_object(Bucket=bucket, Key=key, Body=payload.encode("utf-8"))
+    get_resp = session_client.get_object(Bucket=bucket, Key=key)
+    body = get_resp["Body"].read().decode("utf-8")
+    return {
+        "method": "boto3.Session().client",
+        "write_path": f"s3://{bucket}/{key}",
+        "read_path": f"s3://{bucket}/{key}",
+        "payload_match": body == payload,
+        "etag": str((put_resp or {}).get("ETag", "")),
+        "payload_sha256": _sha256_text(body),
+    }
+
+
+def _run_boto3_resource(bucket: str, key: str, payload: str) -> dict[str, Any]:
+    resource = _s3_resource()
+    obj = resource.Object(bucket, key)
+    put_resp = obj.put(Body=payload.encode("utf-8"))
+    get_resp = obj.get()
+    body = get_resp["Body"].read().decode("utf-8")
+    return {
+        "method": "boto3.resource",
+        "write_path": f"s3://{bucket}/{key}",
+        "read_path": f"s3://{bucket}/{key}",
+        "payload_match": body == payload,
+        "etag": str((put_resp or {}).get("ETag", "")),
+        "payload_sha256": _sha256_text(body),
+    }
+
+
+def _run_awscli(bucket: str, key: str, payload: str) -> dict[str, Any]:
+    env = dict(os.environ)
+    env["AWS_EC2_METADATA_DISABLED"] = "true"
+
+    with tempfile.TemporaryDirectory(prefix="awscli-matrix-") as tmpdir:
+        src_path = os.path.join(tmpdir, "src.txt")
+        dst_path = os.path.join(tmpdir, "dst.txt")
+        with open(src_path, "w", encoding="utf-8") as handle:
+            handle.write(payload)
+
+        put_cmd = [
+            "aws",
+            "s3",
+            "cp",
+            src_path,
+            f"s3://{bucket}/{key}",
+        ]
+        get_cmd = [
+            "aws",
+            "s3api",
+            "get-object",
+            "--bucket",
+            bucket,
+            "--key",
+            key,
+            dst_path,
+        ]
+        head_cmd = [
+            "aws",
+            "s3api",
+            "head-object",
+            "--bucket",
+            bucket,
+            "--key",
+            key,
+            "--output",
+            "json",
+        ]
+
+        put_result = subprocess.run(
+            put_cmd,
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=45,
+            env=env,
+        )
+        get_result = subprocess.run(
+            get_cmd,
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=45,
+            env=env,
+        )
+        head_result = subprocess.run(
+            head_cmd,
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=45,
+            env=env,
+        )
+
+        if put_result.returncode != 0 or get_result.returncode != 0 or head_result.returncode != 0:
+            raise RuntimeError(
+                "awscli commands failed: "
+                f"put={put_result.returncode}, get={get_result.returncode}, head={head_result.returncode}"
+            )
+
+        with open(dst_path, encoding="utf-8") as handle:
+            body = handle.read()
+
+        etag = ""
+        try:
+            head_payload = json.loads(head_result.stdout)
+            etag = str(head_payload.get("ETag", ""))
+        except Exception:
+            etag = ""
+
+    return {
+        "method": "awscli",
+        "write_path": f"s3://{bucket}/{key}",
+        "read_path": f"s3://{bucket}/{key}",
+        "payload_match": body == payload,
+        "etag": etag,
+        "payload_sha256": _sha256_text(body),
+    }
+
+
+@ray.remote
+def _run_method(method: str, run_id: str, bucket: str) -> dict[str, Any]:
+    key = f"sdk-matrix/{run_id}/{method.replace('/', '_').replace(' ', '_')}.txt"
+    payload = f"{method}|{run_id}|{time.time_ns()}"
+
+    if method == "boto3.client":
+        result = _run_boto3_client(bucket, key, payload)
+    elif method == "boto3.Session().client":
+        result = _run_boto3_session_client(bucket, key, payload)
+    elif method == "boto3.resource":
+        result = _run_boto3_resource(bucket, key, payload)
+    elif method == "awscli":
+        result = _run_awscli(bucket, key, payload)
+    else:
+        raise ValueError(f"Unsupported method: {method}")
+
+    result["node_id"] = _node_id()
+    result["key"] = key
+    result["expected_payload_sha256"] = _sha256_text(payload)
+    return result
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--include-awscli", action="store_true")
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    methods = [
+        "boto3.client",
+        "boto3.Session().client",
+        "boto3.resource",
+    ]
+    if args.include_awscli:
+        methods.append("awscli")
+
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "s3_sdk_matrix",
+        "run_id": run_id,
+        "bucket": args.bucket,
+        "methods_requested": methods,
+        "results": [],
+        "errors": [],
+    }
+    try:
+        refs = {method: _run_method.remote(method, run_id, str(args.bucket)) for method in methods}
+        for method, ref in refs.items():
+            try:
+                payload = ray.get(ref, timeout=180)
+                report["results"].append(payload)
+            except Exception as exc:
+                report["errors"].append({"method": method, "error": str(exc)})
+
+        report["paths_by_method"] = {
+            item["method"]: item.get("write_path", "")
+            for item in report["results"]
+            if isinstance(item, dict) and item.get("method")
+        }
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+    if report["errors"]:
+        return 1
+    if any(
+        not bool(item.get("payload_match")) for item in report["results"] if isinstance(item, dict)
+    ):
+        return 1
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/e2e/ray/jobs/s3_subprocess_pipeline.py b/tests/e2e/ray/jobs/s3_subprocess_pipeline.py
new file mode 100644
index 00000000..69ae5380
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_subprocess_pipeline.py
@@ -0,0 +1,82 @@
+"""S3 workload that performs Ray work in child subprocesses without ray.shutdown().
+
+This mirrors the cloud demo shape:
+  - a parent driver process spawns child Python processes
+  - each child calls ray.init(), performs S3 work, and exits normally
+  - no child explicitly calls ray.shutdown()
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import subprocess
+import sys
+import uuid
+
+import boto3
+
+import ray
+
+PHASES = ("extract", "train", "evaluate")
+BUCKET = "test-bucket"
+
+
+def _s3_client(endpoint: str | None):
+    return boto3.client(
+        "s3",
+        endpoint_url=endpoint,
+        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+@ray.remote
+def write_then_read(bucket: str, key: str, body: str, endpoint: str | None) -> dict[str, str]:
+    s3 = _s3_client(endpoint)
+    payload = body.encode("utf-8")
+    s3.put_object(Bucket=bucket, Key=key, Body=payload)
+    value = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
+    return {"key": key, "body": value}
+
+
+def _run_phase(phase: str, run_id: str) -> None:
+    endpoint = os.environ.get("AWS_ENDPOINT_URL") or None
+    ray.init(address="auto", ignore_reinit_error=True, logging_level="ERROR")
+
+    futures = [
+        write_then_read.remote(
+            BUCKET,
+            f"subprocess/{run_id}/{phase}_{index}.txt",
+            f"{phase}-{index}",
+            endpoint,
+        )
+        for index in range(3)
+    ]
+    results = ray.get(futures)
+    print(json.dumps({"phase": phase, "results": results}))
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = list(argv if argv is not None else sys.argv[1:])
+    if args[:1] == ["--phase"]:
+        if len(args) != 3:
+            raise SystemExit("usage: s3_subprocess_pipeline.py --phase <name> <run_id>")
+        _run_phase(args[1], args[2])
+        return 0
+
+    run_id = uuid.uuid4().hex[:8]
+    script_path = os.path.abspath(__file__)
+    for phase in PHASES:
+        subprocess.run(
+            [sys.executable, script_path, "--phase", phase, run_id],
+            check=True,
+        )
+
+    print(json.dumps({"status": "ok", "run_id": run_id}))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/s3_workload.py b/tests/e2e/ray/jobs/s3_workload.py
new file mode 100644
index 00000000..c01a11bf
--- /dev/null
+++ b/tests/e2e/ray/jobs/s3_workload.py
@@ -0,0 +1,59 @@
+"""Simple S3 workload for proxy-log e2e testing. No roar-specific code."""
+
+from __future__ import annotations
+
+import json
+import os
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client(
+        "s3",
+        endpoint_url=os.environ.get("AWS_ENDPOINT_URL"),
+        aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+@ray.remote
+def s3_write(bucket: str, key: str, data: str) -> str:
+    s3 = _s3_client()
+    s3.put_object(Bucket=bucket, Key=key, Body=data.encode("utf-8"))
+    return f"s3://{bucket}/{key}"
+
+
+@ray.remote
+def s3_read(bucket: str, key: str) -> str:
+    s3 = _s3_client()
+    return s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
+
+
+def main() -> None:
+    ray.init(address="auto")
+    try:
+        bucket = "test-bucket"
+        key = "proxy-test/data.txt"
+
+        write_uri = ray.get(s3_write.remote(bucket, key, "hello from proxy test"))
+        result = ray.get(s3_read.remote(bucket, key))
+
+        print(
+            json.dumps(
+                {
+                    "status": "ok",
+                    "write_uri": write_uri,
+                    "data": result,
+                }
+            )
+        )
+    finally:
+        ray.shutdown()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/setup_hook_probe.py b/tests/e2e/ray/jobs/setup_hook_probe.py
new file mode 100644
index 00000000..2a613be9
--- /dev/null
+++ b/tests/e2e/ray/jobs/setup_hook_probe.py
@@ -0,0 +1,120 @@
+"""Submit a Ray job that exercises the worker_process_setup_hook crash path."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import os
+import time
+from typing import Any
+
+import ray
+from ray.job_submission import JobStatus, JobSubmissionClient
+
+_DASHBOARD_URL = "http://127.0.0.1:8265"
+_ENTRYPOINT = "python /app/tests/e2e/ray/jobs/setup_hook_probe.py --inner-probe"
+_POLL_INTERVAL_SECONDS = 1.0
+_TIMEOUT_SECONDS = 120.0
+_PROBE_TASK_COUNT = 4
+_TERMINAL_JOB_STATUSES = {
+    JobStatus.SUCCEEDED,
+    JobStatus.FAILED,
+    JobStatus.STOPPED,
+}
+
+
+@ray.remote(max_calls=1, max_retries=0)
+def _worker_pid() -> int:
+    return os.getpid()
+
+
+def _run_inner_probe() -> None:
+    ray.init(address="auto")
+    try:
+        refs = [_worker_pid.remote() for _ in range(_PROBE_TASK_COUNT)]
+        pids = ray.get(refs, timeout=60)
+        print(json.dumps({"phase": "inner_complete", "pids": pids}, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+
+def _wait_for_terminal_status(client: JobSubmissionClient, job_id: str) -> JobStatus:
+    deadline = time.monotonic() + _TIMEOUT_SECONDS
+    last_status: JobStatus | None = None
+
+    while time.monotonic() < deadline:
+        status = client.get_job_status(job_id)
+        last_status = status
+        if status in _TERMINAL_JOB_STATUSES:
+            return status
+        time.sleep(_POLL_INTERVAL_SECONDS)
+
+    last_status_name = last_status.name if isinstance(last_status, JobStatus) else str(last_status)
+    raise TimeoutError(f"Timed out waiting for Ray job {job_id}; last status={last_status_name}")
+
+
+def _build_payload(client: JobSubmissionClient, job_id: str, status: JobStatus) -> dict[str, Any]:
+    info = client.get_job_info(job_id)
+    logs = client.get_job_logs(job_id)
+    return {
+        "driver_exit_code": getattr(info, "driver_exit_code", None),
+        "entrypoint": getattr(info, "entrypoint", ""),
+        "error_type": getattr(info, "error_type", ""),
+        "job_id": job_id,
+        "logs": logs,
+        "message": getattr(info, "message", ""),
+        "status": status.name,
+    }
+
+
+def _build_runtime_env(job_id: str) -> dict[str, Any]:
+    return {
+        "worker_process_setup_hook": "roar.ray.roar_worker._startup",
+        "env_vars": {
+            "PYTHONPATH": "/app/roar/services/execution/inject",
+            "ROAR_JOB_ID": job_id,
+            "ROAR_JOB_INSTRUMENTED": "1",
+            "ROAR_PROJECT_DIR": "/app",
+            "ROAR_RAY_NODE_AGENTS": "1",
+            "ROAR_WRAP": "1",
+        },
+    }
+
+
+def _submit_probe_job() -> int:
+    client = JobSubmissionClient(_DASHBOARD_URL)
+    probe_job_id = f"setup-hook-probe-{int(time.time())}"
+    job_id = client.submit_job(
+        entrypoint=_ENTRYPOINT,
+        runtime_env=_build_runtime_env(probe_job_id),
+    )
+    status = _wait_for_terminal_status(client, job_id)
+    payload = _build_payload(client, job_id, status)
+
+    logs = str(payload.get("logs") or "")
+    if logs:
+        print(logs.rstrip())
+    print(json.dumps(payload, sort_keys=True))
+    return 0
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--inner-probe",
+        action="store_true",
+        help="Run as the submitted Ray job entrypoint instead of submitting the job.",
+    )
+    args = parser.parse_args(argv)
+
+    if args.inner_probe:
+        _run_inner_probe()
+        return 0
+
+    return _submit_probe_job()
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/jobs/timing_contract/__init__.py b/tests/e2e/ray/jobs/timing_contract/__init__.py
new file mode 100644
index 00000000..4363b942
--- /dev/null
+++ b/tests/e2e/ray/jobs/timing_contract/__init__.py
@@ -0,0 +1 @@
+"""Ray timing contract workload package."""
diff --git a/tests/e2e/ray/jobs/timing_contract/main.py b/tests/e2e/ray/jobs/timing_contract/main.py
new file mode 100644
index 00000000..b7ec3695
--- /dev/null
+++ b/tests/e2e/ray/jobs/timing_contract/main.py
@@ -0,0 +1,54 @@
+"""Host-submit Ray timing contract entrypoint."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import subprocess
+import sys
+import uuid
+from pathlib import Path
+
+APP_DIR = Path(__file__).resolve().parent
+PHASE_SCRIPT = APP_DIR / "scripts" / "run_timing_phase.py"
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="timing_contract")
+    parser.add_argument("--run-id", default=None)
+    args = parser.parse_args()
+
+    run_id = args.run_id or f"run-{uuid.uuid4().hex[:8]}"
+    state_file = Path(f"/tmp/timing-contract-state-{run_id}.json")
+    state_file.parent.mkdir(parents=True, exist_ok=True)
+    state_file.write_text(json.dumps({"run_id": run_id}), encoding="utf-8")
+
+    subprocess.run(
+        [sys.executable, str(PHASE_SCRIPT), "--state-file", str(state_file)],
+        check=True,
+        cwd=APP_DIR,
+    )
+
+    final_state = json.loads(state_file.read_text(encoding="utf-8"))
+    print(
+        json.dumps(
+            {
+                "script": "timing_contract",
+                "run_id": run_id,
+                "artifact_path": final_state.get("artifact_path"),
+                "report_key": final_state.get("report_key"),
+                "phase_started_at": final_state.get("phase_started_at"),
+                "phase_ended_at": final_state.get("phase_ended_at"),
+                "phase_expected_duration_seconds": final_state.get(
+                    "phase_expected_duration_seconds"
+                ),
+                "task_started_at": final_state.get("task_started_at"),
+                "task_ended_at": final_state.get("task_ended_at"),
+                "task_expected_duration_seconds": final_state.get("task_expected_duration_seconds"),
+            }
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/timing_contract/scripts/run_timing_phase.py b/tests/e2e/ray/jobs/timing_contract/scripts/run_timing_phase.py
new file mode 100644
index 00000000..8566c1c3
--- /dev/null
+++ b/tests/e2e/ray/jobs/timing_contract/scripts/run_timing_phase.py
@@ -0,0 +1,49 @@
+"""Driver phase wrapper for Ray timing contract tests."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import time
+from pathlib import Path
+
+from timing_contract.workload import run_phase
+
+PHASE_PRE_SLEEP_SECONDS = 0.7
+PHASE_POST_SLEEP_SECONDS = 0.6
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="timing_phase")
+    parser.add_argument("--state-file", required=True)
+    args = parser.parse_args()
+
+    state_path = Path(args.state_file)
+    state = json.loads(state_path.read_text(encoding="utf-8"))
+    run_id = str(state["run_id"])
+
+    phase_started_at = time.time()
+    time.sleep(PHASE_PRE_SLEEP_SECONDS)
+    task_result = run_phase(run_id)
+    time.sleep(PHASE_POST_SLEEP_SECONDS)
+    phase_ended_at = time.time()
+
+    state.update(
+        {
+            "artifact_path": task_result["artifact_path"],
+            "report_key": task_result["report_key"],
+            "phase_started_at": phase_started_at,
+            "phase_ended_at": phase_ended_at,
+            "phase_expected_duration_seconds": PHASE_PRE_SLEEP_SECONDS
+            + float(task_result["expected_duration_seconds"])
+            + PHASE_POST_SLEEP_SECONDS,
+            "task_started_at": task_result["task_started_at"],
+            "task_ended_at": task_result["task_ended_at"],
+            "task_expected_duration_seconds": task_result["expected_duration_seconds"],
+        }
+    )
+    state_path.write_text(json.dumps(state, sort_keys=True), encoding="utf-8")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/timing_contract/workload.py b/tests/e2e/ray/jobs/timing_contract/workload.py
new file mode 100644
index 00000000..28746b69
--- /dev/null
+++ b/tests/e2e/ray/jobs/timing_contract/workload.py
@@ -0,0 +1,61 @@
+"""Timing-focused Ray workload for lineage contract tests."""
+
+from __future__ import annotations
+
+import json
+import os
+import time
+
+import boto3
+
+import ray
+
+TASK_PRE_IO_SLEEP_SECONDS = 1.4
+TASK_POST_IO_SLEEP_SECONDS = 0.8
+RESULTS_BUCKET = "output-bucket"
+
+
+def _s3():
+    return boto3.client(
+        "s3",
+        endpoint_url=os.getenv("AWS_ENDPOINT_URL"),
+        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID", "minioadmin"),
+        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", "minioadmin"),
+        region_name="us-east-1",
+    )
+
+
+@ray.remote
+def timed_write(run_id: str) -> dict[str, object]:
+    started_at = time.time()
+    time.sleep(TASK_PRE_IO_SLEEP_SECONDS)
+
+    key = f"timing-contract/{run_id}/timed_write.json"
+    payload = {
+        "run_id": run_id,
+        "task_started_at": started_at,
+        "payload_written_at": time.time(),
+    }
+    _s3().put_object(
+        Bucket=RESULTS_BUCKET,
+        Key=key,
+        Body=json.dumps(payload, sort_keys=True).encode("utf-8"),
+    )
+
+    time.sleep(TASK_POST_IO_SLEEP_SECONDS)
+    ended_at = time.time()
+    return {
+        "artifact_path": f"s3://{RESULTS_BUCKET}/{key}",
+        "report_key": key,
+        "task_started_at": started_at,
+        "task_ended_at": ended_at,
+        "expected_duration_seconds": TASK_PRE_IO_SLEEP_SECONDS + TASK_POST_IO_SLEEP_SECONDS,
+    }
+
+
+def run_phase(run_id: str, ray_address: str = "auto") -> dict[str, object]:
+    ray.init(address=ray_address, ignore_reinit_error=True, logging_level="ERROR")
+    try:
+        return dict(ray.get(timed_write.remote(run_id)))
+    finally:
+        ray.shutdown()
diff --git a/tests/e2e/ray/jobs/tmp_filter_probe.py b/tests/e2e/ray/jobs/tmp_filter_probe.py
new file mode 100644
index 00000000..f0932752
--- /dev/null
+++ b/tests/e2e/ray/jobs/tmp_filter_probe.py
@@ -0,0 +1,42 @@
+"""Ray job that writes both workspace and /tmp artifacts."""
+
+from __future__ import annotations
+
+import json
+import tempfile
+import uuid
+from pathlib import Path
+
+import ray
+
+
+@ray.remote
+def write_probe_files() -> dict[str, str]:
+    suffix = uuid.uuid4().hex[:8]
+    workspace_dir = Path.cwd() / "artifacts" / "tmp_filter"
+    workspace_dir.mkdir(parents=True, exist_ok=True)
+
+    kept_path = workspace_dir / f"kept_{suffix}.json"
+    tmp_path = Path(tempfile.gettempdir()) / f"roar_tmp_filter_probe_{suffix}.json"
+
+    with open(kept_path, "w", encoding="utf-8") as handle:
+        json.dump({"kind": "workspace", "suffix": suffix}, handle)
+    with open(kept_path, encoding="utf-8") as handle:
+        _ = json.load(handle)
+
+    with open(tmp_path, "w", encoding="utf-8") as handle:
+        json.dump({"kind": "tmp", "suffix": suffix}, handle)
+    with open(tmp_path, encoding="utf-8") as handle:
+        _ = json.load(handle)
+
+    return {"workspace_path": str(kept_path), "tmp_path": str(tmp_path)}
+
+
+def main() -> None:
+    ray.init(address="auto")
+    result = ray.get(write_probe_files.remote())
+    print(json.dumps(result))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/worker_bootstrap_probe.py b/tests/e2e/ray/jobs/worker_bootstrap_probe.py
new file mode 100644
index 00000000..88f3c9af
--- /dev/null
+++ b/tests/e2e/ray/jobs/worker_bootstrap_probe.py
@@ -0,0 +1,36 @@
+"""Ray job that probes the active worker bootstrap path under host submit."""
+
+from __future__ import annotations
+
+import json
+import os
+from pathlib import Path
+
+import ray
+
+
+@ray.remote
+def _probe(output_path: str) -> dict[str, str]:
+    output = Path(output_path)
+    output.parent.mkdir(parents=True, exist_ok=True)
+    output.write_text("worker bootstrap probe\n", encoding="utf-8")
+    with output.open(encoding="utf-8") as handle:
+        body = handle.read()
+
+    return {
+        "aws_endpoint_url": os.environ.get("AWS_ENDPOINT_URL", ""),
+        "body": body,
+        "output_path": str(output),
+        "roar_project_dir": os.environ.get("ROAR_PROJECT_DIR", ""),
+    }
+
+
+def main() -> None:
+    ray.init(address="auto")
+    base_dir = Path.cwd() / "artifacts" / "worker_bootstrap_probe"
+    payload = ray.get(_probe.remote(str(base_dir / "output.txt")))
+    print(json.dumps(payload, sort_keys=True))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tests/e2e/ray/jobs/worker_crash_partial_fragments.py b/tests/e2e/ray/jobs/worker_crash_partial_fragments.py
new file mode 100644
index 00000000..3b2290f0
--- /dev/null
+++ b/tests/e2e/ray/jobs/worker_crash_partial_fragments.py
@@ -0,0 +1,94 @@
+"""Run mixed success/crash workers after S3 ops to create partial lineage."""
+
+from __future__ import annotations
+
+import argparse
+import contextlib
+import json
+import time
+import uuid
+from typing import Any
+
+import boto3
+
+import ray
+
+
+def _s3_client():
+    return boto3.client("s3")
+
+
+def _node_id() -> str:
+    try:
+        value = ray.get_runtime_context().get_node_id()
+        if isinstance(value, bytes):
+            return value.hex()
+        return str(value)
+    except Exception:
+        return ""
+
+
+@ray.remote
+def _task(run_id: str, index: int, should_crash: bool, bucket: str) -> dict[str, Any]:
+    s3 = _s3_client()
+    key = f"worker-crash/{run_id}/task-{index:03d}.txt"
+    payload = f"{run_id}|{index}|{time.time_ns()}"
+    put_resp = s3.put_object(Bucket=bucket, Key=key, Body=payload.encode("utf-8"))
+    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
+    if body != payload:
+        raise RuntimeError(f"payload mismatch for {key}")
+    if should_crash:
+        raise RuntimeError(f"intentional crash after S3 ops for task={index}")
+    return {
+        "index": index,
+        "node_id": _node_id(),
+        "path": f"s3://{bucket}/{key}",
+        "etag": str((put_resp or {}).get("ETag", "")),
+        "status": "ok",
+    }
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--tasks", type=int, default=8)
+    parser.add_argument("--crash-count", type=int, default=3)
+    parser.add_argument("--bucket", default="test-bucket")
+    args = parser.parse_args(argv)
+
+    task_count = max(1, int(args.tasks))
+    crash_count = max(0, min(int(args.crash_count), task_count))
+    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
+
+    ray.init(address="auto")
+    report: dict[str, Any] = {
+        "script": "worker_crash_partial_fragments",
+        "run_id": run_id,
+        "tasks": task_count,
+        "crash_count": crash_count,
+        "bucket": args.bucket,
+        "completed": [],
+        "crashed": [],
+    }
+    try:
+        refs: list[tuple[int, ray.ObjectRef]] = []
+        for index in range(task_count):
+            should_crash = index < crash_count
+            refs.append((index, _task.remote(run_id, index, should_crash, str(args.bucket))))
+
+        for index, ref in refs:
+            try:
+                report["completed"].append(ray.get(ref, timeout=120))
+            except Exception as exc:
+                report["crashed"].append({"index": index, "error": str(exc)})
+        report["completed_count"] = len(report["completed"])
+        report["crashed_count"] = len(report["crashed"])
+        print(json.dumps(report, sort_keys=True))
+    finally:
+        with contextlib.suppress(Exception):
+            ray.shutdown()
+
+    return 1 if report["crashed"] else 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/tests/e2e/ray/test_actor_collection.py b/tests/e2e/ray/test_actor_collection.py
deleted file mode 100644
index 687a47b8..00000000
--- a/tests/e2e/ray/test_actor_collection.py
+++ /dev/null
@@ -1,36 +0,0 @@
-"""E2E: actor-backed Ray log collection works without a shared filesystem."""
-
-from __future__ import annotations
-
-from pathlib import Path
-
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-def test_actor_backend_collects_worker_events_without_shared_log_volume(ray_cluster) -> None:
-    stdout, stderr, returncode = submit_job_on_head(
-        COMPOSE_FILE,
-        f"{JOBS_DIR}/basic_file_io.py",
-        env={
-            "ROAR_WRAP": "1",
-            "ROAR_LOG_BACKEND": "actor",
-            # Intentionally non-shared location: each container has its own /tmp.
-            "ROAR_LOG_DIR": "/tmp/roar-local-logs",
-        },
-    )
-    assert returncode == 0, f"Job failed:\n{stderr}\n{stdout}"
-
-    outputs = _query_roar_db(
-        COMPOSE_FILE,
-        "SELECT path FROM job_outputs WHERE path LIKE '%output.json'",
-    )
-    inputs = _query_roar_db(
-        COMPOSE_FILE,
-        "SELECT path FROM job_inputs WHERE path LIKE '%input.json'",
-    )
-    assert outputs, "Expected output artifacts when actor backend is enabled."
-    assert inputs, "Expected input artifacts when actor backend is enabled."
diff --git a/tests/e2e/ray/test_cloud_demo_fragment_sufficiency.py b/tests/e2e/ray/test_cloud_demo_fragment_sufficiency.py
new file mode 100644
index 00000000..b6f924ae
--- /dev/null
+++ b/tests/e2e/ray/test_cloud_demo_fragment_sufficiency.py
@@ -0,0 +1,332 @@
+"""Cloud-demo-shaped fragment sufficiency contract through Ray submit."""
+
+from __future__ import annotations
+
+import json
+from collections import defaultdict
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_cli_from_host,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(300)]
+
+EXPECTED_TASK_COUNTS = {
+    "cloud_demo_emulated.workload.extraction.generate_sensor_shard": 25,
+    "cloud_demo_emulated.workload.training.train_on_shard": 25,
+    "cloud_demo_emulated.workload.evaluation.evaluate_shard": 20,
+}
+PROXY_FUNCTIONS = {
+    "unknown",
+    "s3_proxy",
+    "s3_driver_proxy",
+    "roar.ray.node_agent.RoarNodeAgent.__init__",
+}
+EXPECTED_PHASE_COMMANDS = (
+    "ray_task:extraction",
+    "ray_task:training",
+    "ray_task:evaluation",
+)
+
+
+def _parse_payload(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict) and payload.get("script") == "cloud_demo_emulated":
+            return payload
+    raise AssertionError(f"Unable to parse cloud-demo-emulated payload from output:\n{stdout}")
+
+
+@pytest.fixture(scope="module")
+def cloud_demo_emulated_fragments(ray_cluster: dict[str, str]) -> dict[str, object]:
+    project_dir = make_host_project_dir("cloud-demo-emulated")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "cloud_demo_emulated/main.py",
+        use_fragment_store=True,
+        extra_env={
+            "S3_DATA_BUCKET": "test-bucket",
+            "S3_MODELS_BUCKET": "output-bucket",
+            "S3_RESULTS_BUCKET": "output-bucket",
+        },
+        timeout=300,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+    payload = _parse_payload(result.stdout)
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    return {
+        "project_dir": project_dir,
+        "payload": payload,
+        "fragments": fragments,
+    }
+
+
+def _unique_task_counts(fragments: list[dict[str, object]]) -> dict[str, int]:
+    job_uids_by_name: dict[str, set[str]] = defaultdict(set)
+    for fragment in fragments:
+        name = str(fragment.get("function_name") or "unknown")
+        job_uid = str(fragment.get("job_uid") or "")
+        if job_uid:
+            job_uids_by_name[name].add(job_uid)
+    return {name: len(job_uids) for name, job_uids in job_uids_by_name.items()}
+
+
+def _paths_for_function(
+    fragments: list[dict[str, object]],
+    function_name: str,
+    field: str,
+) -> set[str]:
+    paths: set[str] = set()
+    for fragment in fragments:
+        if str(fragment.get("function_name") or "unknown") != function_name:
+            continue
+        refs = fragment.get(field)
+        if not isinstance(refs, list):
+            continue
+        for ref in refs:
+            if not isinstance(ref, dict):
+                continue
+            path = str(ref.get("path") or "")
+            if path:
+                paths.add(path)
+    return paths
+
+
+def _paths_for_non_proxy_fragments(fragments: list[dict[str, object]], field: str) -> set[str]:
+    paths: set[str] = set()
+    for fragment in fragments:
+        function_name = str(fragment.get("function_name") or "unknown")
+        if function_name in PROXY_FUNCTIONS:
+            continue
+        refs = fragment.get(field)
+        if not isinstance(refs, list):
+            continue
+        for ref in refs:
+            if not isinstance(ref, dict):
+                continue
+            path = str(ref.get("path") or "")
+            if path:
+                paths.add(path)
+    return paths
+
+
+def _phase_jobs(project_dir: Path) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT id, step_number, command, script, job_uid
+        FROM jobs
+        WHERE job_type = 'ray_task'
+          AND command IN ('ray_task:extraction', 'ray_task:training', 'ray_task:evaluation')
+        ORDER BY step_number, id
+        """,
+    )
+
+
+def _step_numbers_for_command(project_dir: Path, command: str) -> set[int]:
+    return {
+        int(row["step_number"])
+        for row in query_roar_db(
+            project_dir,
+            """
+            SELECT step_number
+            FROM jobs
+            WHERE job_type = 'ray_task' AND command = ?
+            """,
+            (command,),
+        )
+    }
+
+
+def _dag_payload(project_dir: Path, *args: str) -> dict[str, object]:
+    result = run_roar_cli_from_host(project_dir, "dag", *args, timeout=30)
+    assert result.returncode == 0, result.stderr or result.stdout
+    payload = json.loads(result.stdout)
+    assert isinstance(payload, dict), result.stdout
+    return payload
+
+
+def test_cloud_demo_fragments_are_sufficient_for_phase_lineage(
+    cloud_demo_emulated_fragments: dict[str, object],
+) -> None:
+    payload = cloud_demo_emulated_fragments["payload"]
+    assert isinstance(payload, dict)
+    fragments = cloud_demo_emulated_fragments["fragments"]
+    assert isinstance(fragments, list)
+
+    task_counts = _unique_task_counts(fragments)
+    for function_name, expected_count in EXPECTED_TASK_COUNTS.items():
+        assert task_counts.get(function_name, 0) == expected_count, (
+            "Expected the fragment session to contain one named task lineage per real pipeline task. "
+            f"function_name={function_name!r}, expected_count={expected_count}, observed={task_counts.get(function_name, 0)}"
+        )
+
+    extraction_writes = _paths_for_function(
+        fragments,
+        "cloud_demo_emulated.workload.extraction.generate_sensor_shard",
+        "writes",
+    )
+    assert len([path for path in extraction_writes if "sensor_data/shard_" in path]) == 25, (
+        "Expected extraction task fragments to own the shard parquet writes needed for replayable lineage, "
+        f"observed_paths={sorted(extraction_writes)}"
+    )
+
+    training_reads = _paths_for_function(
+        fragments,
+        "cloud_demo_emulated.workload.training.train_on_shard",
+        "reads",
+    )
+    assert len([path for path in training_reads if "sensor_data/shard_" in path]) == 25, (
+        "Expected training task fragments to record shard parquet reads, "
+        f"observed_paths={sorted(training_reads)}"
+    )
+
+    evaluation_reads = _paths_for_function(
+        fragments,
+        "cloud_demo_emulated.workload.evaluation.evaluate_shard",
+        "reads",
+    )
+    shard_reads = [path for path in evaluation_reads if "sensor_data/shard_" in path]
+    model_reads = [path for path in evaluation_reads if "sensor_predictor_final.json" in path]
+    assert len(shard_reads) == 20, (
+        "Expected evaluation task fragments to read the evaluated shard set, "
+        f"observed_paths={sorted(evaluation_reads)}"
+    )
+    assert model_reads, (
+        "Expected evaluation task fragments to read the trained model artifact, "
+        f"observed_paths={sorted(evaluation_reads)}"
+    )
+
+    non_proxy_writes = _paths_for_non_proxy_fragments(fragments, "writes")
+    model_key = str(payload.get("model_key") or "")
+    metrics_key = str(payload.get("metrics_key") or "")
+    assert model_key, payload
+    assert metrics_key, payload
+    assert any(model_key in path for path in non_proxy_writes), (
+        "Expected a named non-proxy fragment to own the final model write. "
+        f"model_key={model_key!r}, observed_non_proxy_writes={sorted(non_proxy_writes)}"
+    )
+    assert any(metrics_key in path for path in non_proxy_writes), (
+        "Expected a named non-proxy fragment to own the evaluation metrics write. "
+        f"metrics_key={metrics_key!r}, observed_non_proxy_writes={sorted(non_proxy_writes)}"
+    )
+
+
+def test_cloud_demo_reconstitution_keeps_phase_outputs_on_named_jobs(
+    cloud_demo_emulated_fragments: dict[str, object],
+) -> None:
+    project_dir = cloud_demo_emulated_fragments["project_dir"]
+    assert isinstance(project_dir, Path)
+    payload = cloud_demo_emulated_fragments["payload"]
+    assert isinstance(payload, dict)
+
+    phase_jobs = _phase_jobs(project_dir)
+    observed_commands = {str(row["command"]) for row in phase_jobs}
+    missing_commands = [
+        command for command in EXPECTED_PHASE_COMMANDS if command not in observed_commands
+    ]
+    assert not missing_commands, (
+        "Expected all phase task families in the reconstituted DB, "
+        f"missing={missing_commands}, observed={sorted(observed_commands)}"
+    )
+
+    extract_steps = _step_numbers_for_command(project_dir, "ray_task:extraction")
+    train_steps = _step_numbers_for_command(project_dir, "ray_task:training")
+    evaluate_steps = _step_numbers_for_command(project_dir, "ray_task:evaluation")
+    assert extract_steps == {2}, phase_jobs
+    assert train_steps == {3}, phase_jobs
+    assert evaluate_steps == {4}, phase_jobs
+
+    model_key = str(payload.get("model_key") or "")
+    metrics_key = str(payload.get("metrics_key") or "")
+    assert model_key, payload
+    assert metrics_key, payload
+
+    output_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT j.command, jo.path
+        FROM job_outputs jo
+        JOIN jobs j ON j.id = jo.job_id
+        WHERE jo.path LIKE ? OR jo.path LIKE ?
+        ORDER BY jo.path, j.command
+        """,
+        (f"%{model_key}", f"%{metrics_key}"),
+    )
+    assert output_rows, (
+        "Expected final pipeline outputs in the host lineage DB, "
+        f"model_key={model_key!r}, metrics_key={metrics_key!r}"
+    )
+    by_path = {str(row["path"]): str(row["command"]) for row in output_rows}
+    assert by_path.get(f"s3://output-bucket/{model_key}") == "ray_task:training", output_rows
+    assert by_path.get(f"s3://output-bucket/{metrics_key}") == "ray_task:evaluation", output_rows
+
+
+def test_cloud_demo_compact_dag_surfaces_phase_story(
+    cloud_demo_emulated_fragments: dict[str, object],
+) -> None:
+    project_dir = cloud_demo_emulated_fragments["project_dir"]
+    assert isinstance(project_dir, Path)
+
+    dag_payload = _dag_payload(project_dir, "--json")
+    nodes = dag_payload.get("nodes", [])
+    assert isinstance(nodes, list), dag_payload
+
+    nodes_by_command = {
+        str(node.get("command")): node
+        for node in nodes
+        if isinstance(node, dict) and str(node.get("command", "")).startswith("ray_task:")
+    }
+    missing_nodes = [
+        command for command in EXPECTED_PHASE_COMMANDS if command not in nodes_by_command
+    ]
+    assert not missing_nodes, (
+        "Expected compact `roar dag` to show the user-facing extraction/training/evaluation phases, "
+        f"missing={missing_nodes}, observed={sorted(nodes_by_command)}"
+    )
+
+    extract_step = int(nodes_by_command["ray_task:extraction"]["step_number"])
+    train_step = int(nodes_by_command["ray_task:training"]["step_number"])
+    evaluate_step = int(nodes_by_command["ray_task:evaluation"]["step_number"])
+    assert extract_step < train_step < evaluate_step, dag_payload
+    assert train_step in nodes_by_command["ray_task:evaluation"]["dependencies"], dag_payload
+    assert extract_step in nodes_by_command["ray_task:training"]["dependencies"], dag_payload
+
+
+def test_cloud_demo_show_resolves_phase_steps(
+    cloud_demo_emulated_fragments: dict[str, object],
+) -> None:
+    project_dir = cloud_demo_emulated_fragments["project_dir"]
+    assert isinstance(project_dir, Path)
+
+    step_expectations = {
+        2: "ray_task:extraction",
+        3: "ray_task:training",
+        4: "ray_task:evaluation",
+    }
+    for step_number, expected_command in step_expectations.items():
+        result = run_roar_cli_from_host(project_dir, "show", f"@{step_number}", timeout=30)
+        assert result.returncode == 0, result.stderr or result.stdout
+        assert expected_command in result.stdout, result.stdout
+        assert "Job not found" not in result.stdout, result.stdout
diff --git a/tests/e2e/ray/test_cloud_demo_like_lineage.py b/tests/e2e/ray/test_cloud_demo_like_lineage.py
new file mode 100644
index 00000000..42a97cc6
--- /dev/null
+++ b/tests/e2e/ray/test_cloud_demo_like_lineage.py
@@ -0,0 +1,91 @@
+"""Cloud-demo-shaped Ray lineage repro through `roar run ray job submit ...`."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(240)]
+
+
+def _parse_payload(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict) and payload.get("script") == "cloud_demo_like":
+            return payload
+    raise AssertionError(f"Unable to parse cloud-demo-like payload from output:\n{stdout}")
+
+
+def _run_pipeline(project_dir: Path, ray_cluster: dict[str, str]) -> dict[str, object]:
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "cloud_demo_like/main.py",
+        use_fragment_store=True,
+        timeout=240,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+    return _parse_payload(result.stdout)
+
+
+def test_cloud_demo_like_pipeline_produces_phase_task_lineage(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("cloud-demo-like")
+    init_host_project(project_dir)
+
+    payload = _run_pipeline(project_dir, ray_cluster)
+    run_id = str(payload.get("run_id") or "")
+    report_key = str(payload.get("report_key") or "")
+
+    assert run_id, payload
+    assert report_key.endswith("/results/final_report.json"), payload
+
+    report_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT COALESCE(a.path, a.first_seen_path) AS path
+        FROM artifacts a
+        WHERE COALESCE(a.path, a.first_seen_path) LIKE ?
+        """,
+        (f"%cloud-demo-like/{run_id}/results/final_report.json",),
+    )
+    assert report_rows, "Expected the cloud-demo-like pipeline report artifact in lineage"
+
+    job_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT command, step_number
+        FROM jobs
+        WHERE job_type = 'ray_task'
+        ORDER BY step_number, command
+        """,
+    )
+    commands = {str(row.get("command") or "") for row in job_rows}
+
+    expected_commands = {
+        "ray_task:extract_shard",
+        "ray_task:train_on_shard",
+        "ray_task:evaluate_shard",
+    }
+    missing = sorted(expected_commands.difference(commands))
+    assert not missing, (
+        "Expected cloud-demo-shaped lineage to include first-class Ray task families "
+        f"for extraction, training, and evaluation, but missing={missing}. "
+        f"Observed commands={sorted(commands)}"
+    )
diff --git a/tests/e2e/ray/test_driver_proxy_capture.py b/tests/e2e/ray/test_driver_proxy_capture.py
new file mode 100644
index 00000000..50320b65
--- /dev/null
+++ b/tests/e2e/ray/test_driver_proxy_capture.py
@@ -0,0 +1,126 @@
+"""Ray contract: driver-local S3 proxy fragments reconstitute on host submit."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, str]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return {str(key): str(value) for key, value in payload.items()}
+    return {}
+
+
+def _fragment_entries_for_key(
+    fragments: list[dict[str, object]],
+    *,
+    key_suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    expected_path = f"s3://test-bucket/{key_suffix}"
+    for fragment in fragments:
+        for io_kind in ("reads", "writes"):
+            refs = fragment.get(io_kind, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                if str(ref.get("path") or "") != expected_path:
+                    continue
+                matches.append(
+                    {
+                        "io_kind": io_kind,
+                        "ray_task_id": fragment.get("ray_task_id"),
+                        "function_name": fragment.get("function_name"),
+                        **ref,
+                    }
+                )
+    return matches
+
+
+def _proxy_rows_for_key(project_dir: Path, *, key_suffix: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT j.command,
+               j.script,
+               json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) = ?
+        ORDER BY j.id
+        """,
+        (f"s3://test-bucket/{key_suffix}",),
+    )
+
+
+def test_host_submit_reconstitutes_driver_proxy_fragment(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-driver-proxy")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "driver_proxy_capture.py",
+        use_fragment_store=True,
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert payload.get("body") == "driver proxy capture\n", payload
+    assert payload.get("aws_endpoint_url", "").startswith("http://127.0.0.1:"), payload
+    assert payload.get("roar_proxy_port"), payload
+
+    key_suffix = payload["key"]
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+
+    refs = _fragment_entries_for_key(fragments, key_suffix=key_suffix)
+    proxy_refs = [ref for ref in refs if str(ref.get("capture_method") or "") == "proxy"]
+
+    assert proxy_refs, "Expected proxy fragment refs for the driver-only S3 artifact"
+    assert {str(ref.get("ray_task_id") or "") for ref in proxy_refs} == {"proxy:driver"}, proxy_refs
+    assert {str(ref.get("function_name") or "") for ref in proxy_refs} == {"s3_driver_proxy"}, (
+        proxy_refs
+    )
+
+    rows = _proxy_rows_for_key(project_dir, key_suffix=key_suffix)
+    assert rows, "Expected driver proxy artifact in the reconstituted roar.db"
+    assert {str(row.get("capture_method") or "") for row in rows} == {"proxy"}, rows
+    assert {str(row.get("ray_task_id") or "") for row in rows} == {"proxy:driver"}, rows
+    assert {str(row.get("script") or "") for row in rows} == {"s3_driver_proxy"}, rows
diff --git a/tests/e2e/ray/test_file_io_capture.py b/tests/e2e/ray/test_file_io_capture.py
deleted file mode 100644
index 24b51a88..00000000
--- a/tests/e2e/ray/test_file_io_capture.py
+++ /dev/null
@@ -1,161 +0,0 @@
-"""
-TDD: roar captures file I/O from Ray workers.
-
-These tests define the target behaviour for the roar-Ray integration.
-They FAIL until roar's sitecustomize / tracer injection reaches workers.
-
-Run against a live cluster:
-    pytest tests/e2e/ray/test_file_io_capture.py -v --timeout=120
-"""
-
-from __future__ import annotations
-
-import sqlite3
-from pathlib import Path
-
-import pytest
-
-from tests.e2e.ray.conftest import submit_job_on_head
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-def _query_roar_db(compose_file, sql: str, params: tuple = ()) -> list[dict]:
-    """
-    Run a query against .roar/roar.db inside the ray-head container
-    by exporting it and reading locally.
-    """
-    import subprocess
-    import tempfile
-
-    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
-        tmp_path = tmp.name
-
-    subprocess.run(
-        [
-            "docker",
-            "compose",
-            "-f",
-            str(compose_file),
-            "cp",
-            "ray-head:/app/.roar/roar.db",
-            tmp_path,
-        ],
-        check=True,
-        capture_output=True,
-    )
-
-    conn = sqlite3.connect(tmp_path)
-    conn.row_factory = sqlite3.Row
-    try:
-        cur = conn.cursor()
-        cur.execute(sql, params)
-        return [dict(row) for row in cur.fetchall()]
-    finally:
-        conn.close()
-        Path(tmp_path).unlink(missing_ok=True)
-
-
-@pytest.fixture(autouse=True)
-def reset_roar_state(ray_cluster):
-    """Reset roar state on the head node before each test."""
-    import subprocess
-
-    subprocess.run(
-        [
-            "docker",
-            "compose",
-            "-f",
-            str(COMPOSE_FILE),
-            "exec",
-            "-T",
-            "ray-head",
-            "bash",
-            "-c",
-            # Reset the roar DB and clear stale worker logs so previous
-            # tests don't pollute the next one.
-            "rm -rf /app/.roar /shared/.roar-logs && roar init --path /app -n",
-        ],
-        check=False,
-        capture_output=True,
-    )
-    yield
-
-
-class TestFileIOCapture:
-    """roar captures file writes from @ray.remote tasks."""
-
-    def test_worker_file_write_appears_as_output_artifact(self, ray_cluster):
-        """
-        roar run wrapping a Ray job should record files written by workers
-        as output artifacts in the lineage DB.
-
-        FAILS until roar instruments Ray workers.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/basic_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        # The job writes /shared/output.json from a remote task.
-        # roar should have captured this as an output artifact.
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT first_seen_path FROM artifacts WHERE first_seen_path LIKE '%output.json'",
-        )
-        assert len(rows) >= 1, (
-            "Expected /shared/output.json to appear in roar artifacts, "
-            "but it was not captured. "
-            "roar is not yet instrumenting Ray worker processes."
-        )
-
-    def test_worker_file_read_appears_as_input_artifact(self, ray_cluster):
-        """
-        Files read by Ray workers should appear as input artifacts.
-
-        FAILS until roar instruments Ray workers.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/basic_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT ji.path FROM job_inputs ji JOIN artifacts a ON ji.artifact_id = a.id "
-            "WHERE ji.path LIKE '%input.json'",
-        )
-        assert len(rows) >= 1, (
-            "Expected /shared/input.json to appear as a job input, "
-            "but it was not captured from the Ray worker."
-        )
-
-    def test_pipeline_intermediate_files_captured(self, ray_cluster):
-        """
-        Multi-step pipeline: intermediate files produced and consumed
-        by different tasks should all appear in lineage.
-
-        FAILS until roar instruments Ray workers.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/pipeline.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT first_seen_path FROM artifacts WHERE first_seen_path LIKE '/shared/%'",
-        )
-        captured_paths = {r["first_seen_path"] for r in rows}
-
-        assert any("pipeline_input.csv" in p for p in captured_paths), (
-            "pipeline_input.csv not captured"
-        )
-        assert any(".parquet" in p for p in captured_paths), "parquet output not captured"
diff --git a/tests/e2e/ray/test_fragment_reconstitution.py b/tests/e2e/ray/test_fragment_reconstitution.py
index f8cd89b1..e27538df 100644
--- a/tests/e2e/ray/test_fragment_reconstitution.py
+++ b/tests/e2e/ray/test_fragment_reconstitution.py
@@ -1,261 +1,118 @@
 from __future__ import annotations
 
-import json
-import os
-import sqlite3
-import subprocess
-import sys
-import urllib.error
-import urllib.request
 from pathlib import Path
 
 import pytest
 
 from roar.ray.fragment_reconstituter import FragmentReconstituter
-
-RAY_DASHBOARD_URL = "http://localhost:8265/api/version"
-GLAAS_HEALTH_URL = "http://localhost:3001/api/v1/health"
-GLAAS_BASE_URL = "http://localhost:3001"
-
-
-def _http_get(url: str, timeout_seconds: int = 5) -> tuple[int, str]:
-    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
-        status = int(response.getcode())
-        body = response.read().decode("utf-8", errors="replace")
-        return status, body
-
-
-def _skip_if_services_unreachable() -> None:
-    checks = (
-        ("Ray dashboard", RAY_DASHBOARD_URL),
-        ("GLaaS", GLAAS_HEALTH_URL),
-    )
-    for service_name, url in checks:
-        try:
-            status, _body = _http_get(url)
-        except urllib.error.URLError as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        except (TimeoutError, ConnectionError, OSError) as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        if status != 200:
-            pytest.skip(f"{service_name} not healthy at {url}: HTTP {status}")
-
-
-def _run_checked(command: list[str], cwd: Path) -> None:
-    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True, check=False)
-    if result.returncode != 0:
-        pytest.fail(
-            f"Command failed ({' '.join(command)}):\n"
-            f"stdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
-
-
-def _init_clean_repo(repo_dir: Path) -> None:
-    repo_dir.mkdir(parents=True, exist_ok=True)
-    (repo_dir / "README.md").write_text("fragment reconstitution e2e\n", encoding="utf-8")
-
-    _run_checked(["git", "init"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.email", "e2e@example.com"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.name", "E2E"], cwd=repo_dir)
-    _run_checked(["git", "add", "README.md"], cwd=repo_dir)
-    _run_checked(["git", "commit", "-m", "init"], cwd=repo_dir)
-    _run_checked(
-        [sys.executable, "-m", "roar", "init", "--path", str(repo_dir), "-n"], cwd=repo_dir
-    )
-
-
-def _run_submit(repo_dir: Path) -> tuple[subprocess.CompletedProcess[str], dict[str, str], Path]:
-    file_io_probe = """
-import ray
-
-ray.init()
-
-@ray.remote
-def io_task():
-    path = "/tmp/roar-fragment-reconstitution-e2e.txt"
-    with open(path, "w", encoding="utf-8") as handle:
-        handle.write("payload")
-    with open(path, "r", encoding="utf-8") as handle:
-        _ = handle.read()
-    return path
-
-print(ray.get(io_task.remote()))
-ray.shutdown()
-""".strip()
-
-    env = dict(os.environ)
-    env["GLAAS_URL"] = GLAAS_BASE_URL
-    env["GLAAS_API_URL"] = GLAAS_BASE_URL
-
-    result = subprocess.run(
-        [
-            sys.executable,
-            "-m",
-            "roar",
-            "run",
-            "ray",
-            "job",
-            "submit",
-            "--address",
-            "http://localhost:8265",
-            "--working-dir",
-            ".",
-            "--",
-            "python3",
-            "-c",
-            file_io_probe,
-        ],
-        cwd=repo_dir,
-        capture_output=True,
-        text=True,
-        check=False,
-        timeout=180,
-        env=env,
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _run_basic_file_job(
+    project_dir: Path,
+    ray_cluster: dict[str, str],
+) -> tuple[object, dict[str, str]]:
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
     )
-
-    output = f"{result.stdout}\n{result.stderr}".lower()
-    if result.returncode != 0 and "require the ray[default] installation" in output:
-        pytest.skip("Ray job submit requires ray[default] in this environment")
-    if result.returncode != 0 and any(
-        msg in output
-        for msg in (
-            "connection refused",
-            "failed to connect",
-            "unable to connect",
-            "cannot connect",
-            "timed out",
-            "deadline exceeded",
-        )
-    ):
-        pytest.skip("Ray or GLaaS became unreachable during submit")
-    if result.returncode != 0:
-        pytest.fail(
-            f"roar run ray job submit failed.\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
-
-    fragment_dir = repo_dir / ".roar" / "fragment-sessions"
-    key_files = sorted(fragment_dir.glob("*.key"))
-    assert key_files, f"Expected at least one key file under {fragment_dir}"
-    key_file = key_files[-1]
-    key_payload = json.loads(key_file.read_text(encoding="utf-8"))
-    return result, key_payload, key_file
-
-
-def _fetch_counts(db_path: Path) -> dict[str, int]:
-    conn = sqlite3.connect(db_path)
-    try:
-        return {
-            "jobs": int(
-                conn.execute("SELECT COUNT(*) FROM jobs WHERE job_type = 'ray_task'").fetchone()[0]
-            ),
-            "artifacts": int(conn.execute("SELECT COUNT(*) FROM artifacts").fetchone()[0]),
-            "artifact_hashes": int(
-                conn.execute("SELECT COUNT(*) FROM artifact_hashes").fetchone()[0]
-            ),
-            "job_inputs": int(conn.execute("SELECT COUNT(*) FROM job_inputs").fetchone()[0]),
-            "job_outputs": int(conn.execute("SELECT COUNT(*) FROM job_outputs").fetchone()[0]),
-        }
-    finally:
-        conn.close()
-
-
-@pytest.mark.e2e
-def test_auto_reconstitution_populates_local_roar_db(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    result, _key_payload, _key_file = _run_submit(repo_dir)
-    db_path = repo_dir / ".roar" / "roar.db"
-    counts = _fetch_counts(db_path)
+    assert result.returncode == 0, result.stderr or result.stdout
+    return result, load_fragment_key(project_dir)
+
+
+def _count_rows(project_dir: Path) -> dict[str, int]:
+    return {
+        "jobs": int(
+            query_roar_db(
+                project_dir,
+                "SELECT COUNT(*) AS count FROM jobs WHERE job_type = 'ray_task'",
+            )[0]["count"]
+        ),
+        "artifacts": int(
+            query_roar_db(project_dir, "SELECT COUNT(*) AS count FROM artifacts")[0]["count"]
+        ),
+        "job_inputs": int(
+            query_roar_db(project_dir, "SELECT COUNT(*) AS count FROM job_inputs")[0]["count"]
+        ),
+        "job_outputs": int(
+            query_roar_db(project_dir, "SELECT COUNT(*) AS count FROM job_outputs")[0]["count"]
+        ),
+        "artifact_hashes": int(
+            query_roar_db(project_dir, "SELECT COUNT(*) AS count FROM artifact_hashes")[0]["count"]
+        ),
+    }
+
+
+def test_auto_reconstitution_populates_local_roar_db(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-reconst")
+    init_host_project(project_dir)
+
+    result, _key_payload = _run_basic_file_job(project_dir, ray_cluster)
+    counts = _count_rows(project_dir)
 
     assert "[roar] lineage reconstituted:" in f"{result.stdout}\n{result.stderr}"
     assert counts["jobs"] > 0
     assert counts["artifacts"] > 0
-    assert counts["job_inputs"] + counts["job_outputs"] > 0
-
-
-@pytest.mark.e2e
-def test_reconstituted_artifact_hash_rows_are_present_and_correct(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    _result, _key_payload, _key_file = _run_submit(repo_dir)
-    db_path = repo_dir / ".roar" / "roar.db"
-    conn = sqlite3.connect(db_path)
-    conn.row_factory = sqlite3.Row
-    try:
-        rows = conn.execute(
-            """
-            SELECT ah.algorithm, ah.digest, a.id AS artifact_id
-            FROM artifact_hashes ah
-            JOIN artifacts a ON a.id = ah.artifact_id
-            """
-        ).fetchall()
-    finally:
-        conn.close()
+    assert counts["job_inputs"] > 0
+    assert counts["job_outputs"] > 0
+
+
+def test_reconstituted_artifact_hash_rows_are_present_and_well_formed(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-reconst")
+    init_host_project(project_dir)
+
+    _run_basic_file_job(project_dir, ray_cluster)
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT ah.algorithm, ah.digest, a.path
+        FROM artifact_hashes ah
+        JOIN artifacts a ON a.id = ah.artifact_id
+        ORDER BY a.path
+        """,
+    )
 
     assert rows, "Expected artifact_hashes rows to be created during reconstitution"
     for row in rows:
-        algorithm = str(row["algorithm"] or "")
         digest = str(row["digest"] or "")
-        artifact_id = str(row["artifact_id"] or "")
-        assert algorithm
+        assert row["algorithm"]
+        assert row["path"]
         assert digest
-        assert artifact_id
-        if algorithm == "blake3":
-            assert len(digest) == 64
-            int(digest, 16)
-        if algorithm == "sha256":
-            assert len(digest) == 64
-            int(digest, 16)
-        if algorithm == "sha512":
-            assert len(digest) == 128
-            int(digest, 16)
-        if algorithm == "md5":
-            assert len(digest) == 32
-            int(digest, 16)
+        int(digest, 16)
 
 
-@pytest.mark.e2e
-def test_reconstitution_is_idempotent(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
+def test_reconstitution_is_idempotent(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-reconst")
+    init_host_project(project_dir)
 
-    _result, key_payload, _key_file = _run_submit(repo_dir)
-    db_path = repo_dir / ".roar" / "roar.db"
-    before = _fetch_counts(db_path)
+    _result, key_payload = _run_basic_file_job(project_dir, ray_cluster)
+    before = _count_rows(project_dir)
+    db_path = project_dir / ".roar" / "roar.db"
 
-    second_result = FragmentReconstituter(
-        session_id=str(key_payload["session_id"]),
-        token=str(key_payload["token"]),
-        glaas_url=GLAAS_BASE_URL,
+    second = FragmentReconstituter(
+        session_id=key_payload["session_id"],
+        token=key_payload["token"],
+        glaas_url="http://localhost:3001",
         roar_db_path=db_path,
     ).reconstitute()
-    after = _fetch_counts(db_path)
+    after = _count_rows(project_dir)
 
-    assert second_result.jobs_merged == 0
-    assert second_result.artifacts_merged == 0
+    assert second.jobs_merged == 0
+    assert second.artifacts_merged == 0
     assert before == after
-
-
-@pytest.mark.e2e
-def test_fragment_key_file_is_retained(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    _result, key_payload, key_file = _run_submit(repo_dir)
-    db_path = repo_dir / ".roar" / "roar.db"
-    assert key_file.exists()
-
-    FragmentReconstituter(
-        session_id=str(key_payload["session_id"]),
-        token=str(key_payload["token"]),
-        glaas_url=GLAAS_BASE_URL,
-        roar_db_path=db_path,
-    ).reconstitute()
-
-    assert key_file.exists()
diff --git a/tests/e2e/ray/test_fragment_session_registration.py b/tests/e2e/ray/test_fragment_session_registration.py
index 6e1b9853..fe8cb679 100644
--- a/tests/e2e/ray/test_fragment_session_registration.py
+++ b/tests/e2e/ray/test_fragment_session_registration.py
@@ -1,180 +1,55 @@
 from __future__ import annotations
 
-import json
-import os
-import re
-import subprocess
-import sys
-import urllib.error
-import urllib.parse
-import urllib.request
 import uuid
-from pathlib import Path
 
 import pytest
 
-RAY_DASHBOARD_URL = "http://localhost:8265/api/version"
-GLAAS_HEALTH_URL = "http://localhost:3001/api/v1/health"
-GLAAS_BASE_URL = "http://localhost:3001"
+from tests.e2e.ray.conftest import (
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    run_roar_ray_job_from_host,
+)
 
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
 
-def _http_get(url: str, timeout_seconds: int = 5) -> tuple[int, str]:
-    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
-        status = int(response.getcode())
-        body = response.read().decode("utf-8", errors="replace")
-        return status, body
 
+def test_roar_ray_submit_creates_fragment_key_file(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-session")
+    init_host_project(project_dir)
 
-def _skip_if_services_unreachable() -> None:
-    checks = (
-        ("Ray dashboard", RAY_DASHBOARD_URL),
-        ("GLaaS", GLAAS_HEALTH_URL),
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
     )
-    for service_name, url in checks:
-        try:
-            status, _body = _http_get(url)
-        except urllib.error.URLError as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        except (TimeoutError, ConnectionError, OSError) as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        if status != 200:
-            pytest.skip(f"{service_name} not healthy at {url}: HTTP {status}")
 
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
 
-def _run_checked(command: list[str], cwd: Path) -> None:
-    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True, check=False)
-    if result.returncode != 0:
-        pytest.fail(
-            f"Command failed ({' '.join(command)}):\n"
-            f"stdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
+    uuid.UUID(key_payload["session_id"])
+    assert len(key_payload["token"]) == 64
+    assert len(key_payload["token_hash"]) == 64
 
 
-def _init_clean_repo(repo_dir: Path) -> None:
-    repo_dir.mkdir(parents=True, exist_ok=True)
-    (repo_dir / "README.md").write_text("fragment session e2e\n", encoding="utf-8")
+def test_fragment_session_is_preregistered_in_glaas(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-session")
+    init_host_project(project_dir)
 
-    _run_checked(["git", "init"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.email", "e2e@example.com"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.name", "E2E"], cwd=repo_dir)
-    _run_checked(["git", "add", "README.md"], cwd=repo_dir)
-    _run_checked(["git", "commit", "-m", "init"], cwd=repo_dir)
-    _run_checked(
-        [sys.executable, "-m", "roar", "init", "--path", str(repo_dir), "-n"], cwd=repo_dir
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
     )
 
-
-def _run_roar_ray_submit(
-    repo_dir: Path,
-) -> tuple[subprocess.CompletedProcess[str], dict[str, str], Path]:
-    probe = (
-        "import os; "
-        "print('ROAR_SESSION_ID=' + (os.getenv('ROAR_SESSION_ID') or '')); "
-        "print('ROAR_FRAGMENT_TOKEN=' + (os.getenv('ROAR_FRAGMENT_TOKEN') or ''))"
-    )
-    env = dict(os.environ)
-    env["GLAAS_URL"] = GLAAS_BASE_URL
-    env["GLAAS_API_URL"] = GLAAS_BASE_URL
-
-    result = subprocess.run(
-        [
-            sys.executable,
-            "-m",
-            "roar",
-            "run",
-            "ray",
-            "job",
-            "submit",
-            "--address",
-            "http://localhost:8265",
-            "--working-dir",
-            ".",
-            "--",
-            "python3",
-            "-c",
-            probe,
-        ],
-        cwd=repo_dir,
-        capture_output=True,
-        text=True,
-        check=False,
-        timeout=180,
-        env=env,
-    )
-
-    output = f"{result.stdout}\n{result.stderr}".lower()
-    if result.returncode != 0 and "require the ray[default] installation" in output:
-        pytest.skip("Ray job submit requires ray[default] in this environment")
-    if result.returncode != 0 and any(
-        msg in output
-        for msg in (
-            "connection refused",
-            "failed to connect",
-            "unable to connect",
-            "cannot connect",
-            "timed out",
-            "deadline exceeded",
-        )
-    ):
-        pytest.skip("Ray or GLaaS became unreachable during submit")
-    if result.returncode != 0:
-        pytest.fail(
-            f"roar run ray job submit failed.\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
-
-    fragment_dir = repo_dir / ".roar" / "fragment-sessions"
-    key_files = sorted(fragment_dir.glob("*.key"))
-    assert key_files, f"Expected at least one key file under {fragment_dir}"
-    key_payload = json.loads(key_files[-1].read_text(encoding="utf-8"))
-    return result, key_payload, key_files[-1]
-
-
-@pytest.mark.e2e
-def test_roar_ray_submit_creates_fragment_key_file(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    _result, key_payload, key_file = _run_roar_ray_submit(repo_dir)
-
-    assert key_file.exists()
-    assert key_file.parent == repo_dir / ".roar" / "fragment-sessions"
-    uuid.UUID(str(key_payload["session_id"]))
-    assert isinstance(key_payload.get("token"), str) and len(key_payload["token"]) == 64
-    assert isinstance(key_payload.get("token_hash"), str) and len(key_payload["token_hash"]) == 64
-    assert key_payload.get("created_at")
-
-
-@pytest.mark.e2e
-def test_session_is_preregistered_in_glaas_fragment_store(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    _result, key_payload, _key_file = _run_roar_ray_submit(repo_dir)
-    token = urllib.parse.quote(str(key_payload["token"]), safe="")
-    session_id = key_payload["session_id"]
-    url = f"{GLAAS_BASE_URL}/api/v1/fragments/sessions/{session_id}/fragments?token={token}"
-    status, body = _http_get(url)
-
-    assert status == 200, f"Expected HTTP 200 from {url}, got {status}. Body: {body}"
-    response_payload = json.loads(body)
-    assert "fragments" in response_payload
-
-
-@pytest.mark.e2e
-def test_session_env_vars_visible_inside_ray_job(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    result, key_payload, _key_file = _run_roar_ray_submit(repo_dir)
-    output = f"{result.stdout}\n{result.stderr}"
-
-    session_match = re.search(r"ROAR_SESSION_ID=([0-9a-fA-F-]+)", output)
-    token_match = re.search(r"ROAR_FRAGMENT_TOKEN=([0-9a-fA-F]+)", output)
-
-    assert session_match is not None, f"ROAR_SESSION_ID not found in output:\n{output}"
-    assert token_match is not None, f"ROAR_FRAGMENT_TOKEN not found in output:\n{output}"
-    assert session_match.group(1) == key_payload["session_id"]
-    assert token_match.group(1) == key_payload["token"]
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    assert isinstance(batches, list)
diff --git a/tests/e2e/ray/test_fragment_streaming.py b/tests/e2e/ray/test_fragment_streaming.py
index ffe39de8..574404c9 100644
--- a/tests/e2e/ray/test_fragment_streaming.py
+++ b/tests/e2e/ray/test_fragment_streaming.py
@@ -1,192 +1,95 @@
 from __future__ import annotations
 
-import json
-import os
-import subprocess
-import sys
-import urllib.error
-import urllib.parse
-import urllib.request
-from pathlib import Path
-
 import pytest
 
-RAY_DASHBOARD_URL = "http://localhost:8265/api/version"
-GLAAS_HEALTH_URL = "http://localhost:3001/api/v1/health"
-GLAAS_BASE_URL = "http://localhost:3001"
-PLAINTEXT_MARKER = "ROAR_STREAM_PLAINTEXT_MARKER"
-
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    run_roar_ray_job_from_host,
+)
 
-def _http_get(url: str, timeout_seconds: int = 5) -> tuple[int, str]:
-    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
-        status = int(response.getcode())
-        body = response.read().decode("utf-8", errors="replace")
-        return status, body
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
 
 
-def _skip_if_services_unreachable() -> None:
-    checks = (
-        ("Ray dashboard", RAY_DASHBOARD_URL),
-        ("GLaaS", GLAAS_HEALTH_URL),
-    )
-    for service_name, url in checks:
-        try:
-            status, _body = _http_get(url)
-        except urllib.error.URLError as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        except (TimeoutError, ConnectionError, OSError) as exc:
-            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
-        if status != 200:
-            pytest.skip(f"{service_name} not healthy at {url}: HTTP {status}")
-
-
-def _run_checked(command: list[str], cwd: Path) -> None:
-    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True, check=False)
-    if result.returncode != 0:
-        pytest.fail(
-            f"Command failed ({' '.join(command)}):\n"
-            f"stdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
-
-
-def _init_clean_repo(repo_dir: Path) -> None:
-    repo_dir.mkdir(parents=True, exist_ok=True)
-    (repo_dir / "README.md").write_text("fragment streaming e2e\n", encoding="utf-8")
-
-    _run_checked(["git", "init"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.email", "e2e@example.com"], cwd=repo_dir)
-    _run_checked(["git", "config", "user.name", "E2E"], cwd=repo_dir)
-    _run_checked(["git", "add", "README.md"], cwd=repo_dir)
-    _run_checked(["git", "commit", "-m", "init"], cwd=repo_dir)
-    _run_checked(
-        [sys.executable, "-m", "roar", "init", "--path", str(repo_dir), "-n"], cwd=repo_dir
+def _fragment_paths(fragments: list[dict[str, object]]) -> set[str]:
+    paths: set[str] = set()
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if isinstance(ref, dict) and isinstance(ref.get("path"), str):
+                    paths.add(ref["path"])
+    return paths
+
+
+def test_file_io_job_streams_encrypted_fragments_to_glaas(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-stream")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
     )
 
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
 
-def _run_file_io_ray_submit(repo_dir: Path) -> dict[str, str]:
-    file_io_probe = f"""
-import ray
-
-ray.init()
-
-@ray.remote
-def io_task():
-    marker_path = "/tmp/{PLAINTEXT_MARKER}.txt"
-    with open(marker_path, "w", encoding="utf-8") as handle:
-        handle.write("payload")
-    with open(marker_path, "r", encoding="utf-8") as handle:
-        _ = handle.read()
-    return marker_path
-
-print(ray.get(io_task.remote()))
-ray.shutdown()
-""".strip()
-
-    env = dict(os.environ)
-    env["GLAAS_URL"] = GLAAS_BASE_URL
-    env["GLAAS_API_URL"] = GLAAS_BASE_URL
-
-    result = subprocess.run(
-        [
-            sys.executable,
-            "-m",
-            "roar",
-            "run",
-            "ray",
-            "job",
-            "submit",
-            "--address",
-            "http://localhost:8265",
-            "--working-dir",
-            ".",
-            "--",
-            "python3",
-            "-c",
-            file_io_probe,
-        ],
-        cwd=repo_dir,
-        capture_output=True,
-        text=True,
-        check=False,
-        timeout=180,
-        env=env,
+    assert batches, "Expected at least one streamed fragment batch"
+    assert all(isinstance(batch.get("encrypted_batch"), str) for batch in batches)
+
+
+def test_fragment_batches_are_opaque_ciphertext(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-stream")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
     )
 
-    output = f"{result.stdout}\n{result.stderr}".lower()
-    if result.returncode != 0 and "require the ray[default] installation" in output:
-        pytest.skip("Ray job submit requires ray[default] in this environment")
-    if result.returncode != 0 and any(
-        msg in output
-        for msg in (
-            "connection refused",
-            "failed to connect",
-            "unable to connect",
-            "cannot connect",
-            "timed out",
-            "deadline exceeded",
-        )
-    ):
-        pytest.skip("Ray or GLaaS became unreachable during submit")
-    if result.returncode != 0:
-        pytest.fail(
-            f"roar run ray job submit failed.\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
-        )
-
-    fragment_dir = repo_dir / ".roar" / "fragment-sessions"
-    key_files = sorted(fragment_dir.glob("*.key"))
-    assert key_files, f"Expected at least one key file under {fragment_dir}"
-    return json.loads(key_files[-1].read_text(encoding="utf-8"))
-
-
-def _fetch_fragments(session_id: str, token: str) -> list[dict[str, object]]:
-    token_param = urllib.parse.quote(token, safe="")
-    url = f"{GLAAS_BASE_URL}/api/v1/fragments/sessions/{session_id}/fragments?token={token_param}"
-    status, body = _http_get(url)
-    assert status == 200, f"Expected HTTP 200 from {url}, got {status}. Body: {body}"
-
-    payload = json.loads(body)
-    fragments = payload.get("fragments")
-    assert isinstance(fragments, list), f"Expected list payload from {url}. Body: {body}"
-    return [item for item in fragments if isinstance(item, dict)]
-
-
-@pytest.mark.e2e
-def test_file_io_job_streams_encrypted_fragments_to_glaas(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    key_payload = _run_file_io_ray_submit(repo_dir)
-    fragments = _fetch_fragments(key_payload["session_id"], key_payload["token"])
-
-    assert fragments, "Expected at least one streamed fragment batch"
-    assert all(isinstance(fragment.get("encrypted_batch"), str) for fragment in fragments)
-
-
-@pytest.mark.e2e
-def test_fragment_list_is_non_empty_for_completed_session(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    key_payload = _run_file_io_ray_submit(repo_dir)
-    fragments = _fetch_fragments(key_payload["session_id"], key_payload["token"])
-
-    assert len(fragments) > 0
-
-
-@pytest.mark.e2e
-def test_fragments_are_opaque_ciphertext(tmp_path: Path) -> None:
-    _skip_if_services_unreachable()
-    repo_dir = tmp_path / "repo"
-    _init_clean_repo(repo_dir)
-
-    key_payload = _run_file_io_ray_submit(repo_dir)
-    fragments = _fetch_fragments(key_payload["session_id"], key_payload["token"])
-
-    assert fragments, "Expected encrypted fragment batches"
-    for fragment in fragments:
-        encrypted_batch = str(fragment.get("encrypted_batch") or "")
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+
+    expected_markers = ("basic_file_io/input.json", "basic_file_io/output.json")
+    for batch in batches:
+        encrypted_batch = str(batch.get("encrypted_batch") or "")
         assert encrypted_batch
-        assert PLAINTEXT_MARKER not in encrypted_batch
-        assert f"/tmp/{PLAINTEXT_MARKER}.txt" not in encrypted_batch
+        assert not any(marker in encrypted_batch for marker in expected_markers)
+
+
+def test_decrypted_fragments_include_worker_file_reads_and_writes(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("fragment-stream")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
+    )
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    paths = _fragment_paths(fragments)
+
+    assert any(path.endswith("/artifacts/basic_file_io/input.json") for path in paths)
+    assert any(path.endswith("/artifacts/basic_file_io/output.json") for path in paths)
diff --git a/tests/e2e/ray/test_glaas_url_only.py b/tests/e2e/ray/test_glaas_url_only.py
new file mode 100644
index 00000000..98b10f66
--- /dev/null
+++ b/tests/e2e/ray/test_glaas_url_only.py
@@ -0,0 +1,52 @@
+from __future__ import annotations
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _fragment_paths(fragments: list[dict[str, object]]) -> set[str]:
+    paths: set[str] = set()
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if isinstance(ref, dict) and isinstance(ref.get("path"), str):
+                    paths.add(ref["path"])
+    return paths
+
+
+def test_host_submit_streams_fragments_with_glaas_url_only(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("glaas-url-only")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
+        extra_env={"GLAAS_API_URL": ""},
+    )
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    assert batches, "Expected streamed fragment batches when only GLAAS_URL is configured"
+
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    paths = _fragment_paths(fragments)
+    assert any(path.endswith("/artifacts/basic_file_io/input.json") for path in paths)
+    assert any(path.endswith("/artifacts/basic_file_io/output.json") for path in paths)
diff --git a/tests/e2e/ray/test_harness_smoke.py b/tests/e2e/ray/test_harness_smoke.py
index 156f7a30..d6148d5f 100644
--- a/tests/e2e/ray/test_harness_smoke.py
+++ b/tests/e2e/ray/test_harness_smoke.py
@@ -56,17 +56,13 @@ def test_minio_is_accessible(ray_cluster: dict[str, str]) -> None:
     assert "test-bucket" in buckets
 
 
-def test_shared_volume_accessible(ray_connection) -> None:
+def test_worker_local_filesystem_accessible(ray_connection) -> None:
     @ray.remote
-    def write_shared(path: str, content: str) -> str:
+    def write_and_read_local_file() -> str:
+        path = "/tmp/roar_smoke_test.txt"
         with open(path, "w", encoding="utf-8") as handle:
-            handle.write(content)
-        return path
-
-    @ray.remote
-    def read_shared(path: str) -> str:
+            handle.write("local-data-ok")
         with open(path, encoding="utf-8") as handle:
             return handle.read()
 
-    shared_path = ray.get(write_shared.remote("/shared/smoke_test.txt", "shared-data-ok"))
-    assert ray.get(read_shared.remote(shared_path)) == "shared-data-ok"
+    assert ray.get(write_and_read_local_file.remote()) == "local-data-ok"
diff --git a/tests/e2e/ray/test_infra_health.py b/tests/e2e/ray/test_infra_health.py
index 310fba6d..943a0e47 100644
--- a/tests/e2e/ray/test_infra_health.py
+++ b/tests/e2e/ray/test_infra_health.py
@@ -3,9 +3,12 @@
 from __future__ import annotations
 
 import json
+import shutil
 import subprocess
+import sys
 import urllib.error
 import urllib.request
+from pathlib import Path
 
 import pytest
 
@@ -44,6 +47,8 @@ def _looks_like_connection_error(output: str) -> bool:
         "unable to connect",
         "cannot connect",
         "could not connect",
+        "connection reset by peer",
+        "connection aborted",
         "max retries exceeded",
         "connection error",
         "timed out",
@@ -69,8 +74,16 @@ def test_ray_head_dashboard_is_reachable() -> None:
 
 @pytest.mark.e2e
 def test_ray_job_submission_works() -> None:
+    ray_cli = shutil.which("ray")
+    if ray_cli is None:
+        fallback_ray = Path(sys.executable).parent / "ray"
+        if fallback_ray.exists():
+            ray_cli = str(fallback_ray)
+        else:
+            pytest.fail("ray CLI is not available in PATH or next to the active Python interpreter")
+
     command = [
-        "ray",
+        ray_cli,
         "job",
         "submit",
         "--address",
@@ -82,7 +95,7 @@ def test_ray_job_submission_works() -> None:
         "import ray; ray.init(); print('HEALTH_OK'); ray.shutdown()",
     ]
     fallback_command = [
-        "ray",
+        ray_cli,
         "job",
         "submit",
         "--address",
@@ -142,7 +155,9 @@ def test_glaas_health_endpoint_responds() -> None:
 
     assert status == 200, f"Expected 200 from {url}, got {status}. Body: {body}"
     payload = _parse_json_or_fail(body, "GLaaS health endpoint")
-    assert payload.get("success") is True, f"Expected success=true in response. Body: {body}"
+    assert payload.get("success") is True or str(payload.get("status", "")).lower() == "healthy", (
+        f"Expected success=true or status=healthy in response. Body: {body}"
+    )
 
 
 @pytest.mark.e2e
diff --git a/tests/e2e/ray/test_job_timing_contract.py b/tests/e2e/ray/test_job_timing_contract.py
new file mode 100644
index 00000000..7e6804d0
--- /dev/null
+++ b/tests/e2e/ray/test_job_timing_contract.py
@@ -0,0 +1,122 @@
+"""Timing contracts for Ray lineage jobs on the real submit path."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(300)]
+
+PHASE_COMMAND = "ray_task:timing_phase"
+TASK_COMMAND = "ray_task:timed_write"
+
+
+def _parse_payload(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict) and payload.get("script") == "timing_contract":
+            return payload
+    raise AssertionError(f"Unable to parse timing payload from output:\n{stdout}")
+
+
+@pytest.fixture(scope="module")
+def timing_contract_run(ray_cluster: dict[str, str]) -> dict[str, object]:
+    project_dir = make_host_project_dir("ray-job-timing")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "timing_contract/main.py",
+        use_fragment_store=True,
+        extra_env={"S3_RESULTS_BUCKET": "output-bucket"},
+        timeout=300,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+    return {
+        "project_dir": project_dir,
+        "payload": _parse_payload(result.stdout),
+    }
+
+
+def _job_row(project_dir: Path, command: str) -> dict[str, object]:
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT id, timestamp, duration_seconds, command, script, job_type
+        FROM jobs
+        WHERE command = ?
+        ORDER BY id DESC
+        LIMIT 1
+        """,
+        (command,),
+    )
+    assert rows, f"Expected a persisted Ray job for {command}"
+    return rows[0]
+
+
+def test_phase_job_timestamp_matches_phase_start_time(
+    timing_contract_run: dict[str, object],
+) -> None:
+    project_dir = timing_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+    payload = timing_contract_run["payload"]
+    assert isinstance(payload, dict)
+
+    phase_job = _job_row(project_dir, PHASE_COMMAND)
+    phase_started_at = float(payload["phase_started_at"])
+    phase_ended_at = float(payload["phase_ended_at"])
+
+    assert abs(float(phase_job["timestamp"]) - phase_started_at) < 0.5, (
+        "Expected the persisted phase job timestamp to reflect when the phase started, "
+        f"job={phase_job}, phase_started_at={phase_started_at}, phase_ended_at={phase_ended_at}"
+    )
+
+
+def test_task_job_timestamp_matches_task_start_time(
+    timing_contract_run: dict[str, object],
+) -> None:
+    project_dir = timing_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+    payload = timing_contract_run["payload"]
+    assert isinstance(payload, dict)
+
+    task_job = _job_row(project_dir, TASK_COMMAND)
+    task_started_at = float(payload["task_started_at"])
+
+    assert abs(float(task_job["timestamp"]) - task_started_at) < 0.5, (
+        "Expected the persisted task job timestamp to reflect task start time, "
+        f"job={task_job}, task_started_at={task_started_at}"
+    )
+
+
+def test_task_job_duration_tracks_full_task_wall_time(
+    timing_contract_run: dict[str, object],
+) -> None:
+    project_dir = timing_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+    payload = timing_contract_run["payload"]
+    assert isinstance(payload, dict)
+
+    task_job = _job_row(project_dir, TASK_COMMAND)
+    task_expected_duration_seconds = float(payload["task_expected_duration_seconds"])
+
+    assert float(task_job["duration_seconds"]) >= task_expected_duration_seconds - 0.2, (
+        "Expected the persisted task duration to cover the full task wall time, "
+        f"job={task_job}, expected_duration={task_expected_duration_seconds}"
+    )
diff --git a/tests/e2e/ray/test_multi_node_capture.py b/tests/e2e/ray/test_multi_node_capture.py
deleted file mode 100644
index 65693afa..00000000
--- a/tests/e2e/ray/test_multi_node_capture.py
+++ /dev/null
@@ -1,160 +0,0 @@
-"""
-TDD: roar captures I/O from workers on remote nodes, not just the driver.
-
-These tests verify that roar's per-node agent correctly instruments workers
-on ray-worker-1 and ray-worker-2 (separate Docker containers from ray-head).
-
-They FAIL until the per-node agent is implemented and log collection works.
-
-Run against a live cluster:
-    pytest tests/e2e/ray/test_multi_node_capture.py -v --timeout=180
-"""
-
-from __future__ import annotations
-
-import json
-import subprocess
-from pathlib import Path
-
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-def _get_worker_container_ips(compose_file: Path) -> dict[str, str]:
-    """Return {container_name: ip} for the two worker containers."""
-    ips = {}
-    for service in ("ray-worker-1", "ray-worker-2"):
-        result = subprocess.run(
-            [
-                "docker",
-                "compose",
-                "-f",
-                str(compose_file),
-                "exec",
-                "-T",
-                service,
-                "hostname",
-                "-i",
-            ],
-            capture_output=True,
-            text=True,
-            check=False,
-        )
-        if result.returncode == 0:
-            ips[service] = result.stdout.strip()
-    return ips
-
-
-class TestMultiNodeCapture:
-    """roar captures I/O from workers running on remote Docker containers."""
-
-    def test_io_captured_from_worker_containers(self, ray_cluster):
-        """
-        The lineage DB should contain artifacts from I/O that happened
-        inside ray-worker-1 or ray-worker-2 containers — not just the driver.
-
-        We verify this by checking that captured artifact metadata includes
-        node IDs corresponding to the worker containers.
-
-        FAILS until roar's per-node agent ships logs from workers back to the driver.
-        """
-        stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/attributed_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        # Parse job output to get which node_ids the tasks ran on
-        result_json = None
-        for line in stdout.splitlines():
-            try:
-                result_json = json.loads(line)
-                break
-            except json.JSONDecodeError:
-                continue
-
-        assert result_json is not None, f"Could not parse job output: {stdout}"
-
-        worker_node_ids = {r["node_id"] for r in result_json["writes"]}
-        assert len(worker_node_ids) >= 2, (
-            f"Tasks only ran on {len(worker_node_ids)} node(s): {worker_node_ids}. "
-            "Need tasks on at least 2 nodes to test multi-node capture."
-        )
-
-        # Check that the roar DB contains artifacts from at least 2 distinct nodes
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT DISTINCT json_extract(metadata, '$.ray_node_id') as node_id "
-            "FROM artifacts "
-            "WHERE path LIKE '%attributed%' AND metadata IS NOT NULL",
-        )
-        captured_node_ids = {r["node_id"] for r in rows if r["node_id"]}
-        assert len(captured_node_ids) >= 2, (
-            f"Expected artifacts from ≥ 2 Ray nodes, got {len(captured_node_ids)}: "
-            f"{captured_node_ids}. "
-            "roar's per-node agent is not collecting I/O from remote worker containers."
-        )
-
-    def test_worker_logs_merged_into_single_lineage_record(self, ray_cluster):
-        """
-        Even though I/O happens on multiple nodes, roar should produce a single
-        unified job record in the local DB with all artifacts from all nodes.
-
-        FAILS until multi-node log merging is implemented in the driver.
-        """
-        submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/attributed_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-
-        # Should be exactly 1 job record
-        job_rows = _query_roar_db(COMPOSE_FILE, "SELECT id FROM jobs")
-        assert len(job_rows) == 1, (
-            f"Expected 1 unified job record, got {len(job_rows)}. "
-            "Multi-node logs should merge into a single job."
-        )
-
-        # That job should reference artifacts from multiple nodes
-        artifact_rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT COUNT(*) as cnt FROM artifacts WHERE path LIKE '%attributed%'",
-        )
-        count = artifact_rows[0]["cnt"] if artifact_rows else 0
-        assert count >= 6, (
-            f"Expected ≥ 6 artifacts in the unified job record, got {count}. "
-            "Worker-node artifacts are not being merged into the driver's job record."
-        )
-
-    def test_native_tracer_captures_non_python_io(self, ray_cluster):
-        """
-        Ray Data uses Arrow C++ under the hood — it bypasses Python's open().
-        The native tracer (eBPF/preload) on each worker node should capture this
-        at the syscall level.
-
-        FAILS until native tracers are running on remote worker nodes.
-        """
-        # Ray Data job writing parquet (Arrow, not Python open())
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/pipeline.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        # Parquet files written by Arrow bypass Python's open(),
-        # so they'll only appear if the native tracer is running on the worker.
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT path, capture_method FROM artifacts "
-            "WHERE path LIKE '%.parquet' AND capture_method = 'tracer'",
-        )
-        assert len(rows) >= 1, (
-            "Expected parquet output to be captured by the native tracer on the worker node. "
-            "No tracer-captured parquet artifacts found. "
-            "roar's native tracer is not running on remote worker containers."
-        )
diff --git a/tests/e2e/ray/test_native_background_thread_attribution.py b/tests/e2e/ray/test_native_background_thread_attribution.py
new file mode 100644
index 00000000..c13e07b7
--- /dev/null
+++ b/tests/e2e/ray/test_native_background_thread_attribution.py
@@ -0,0 +1,150 @@
+"""Ray contract: background-thread native writes must stay on the launching Ray task."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, dict[str, str]]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            normalized: dict[str, dict[str, str]] = {}
+            for key, value in payload.items():
+                if isinstance(value, dict):
+                    normalized[str(key)] = {str(k): str(v) for k, v in value.items()}
+            return normalized
+    return {}
+
+
+def _fragment_entries_for_path(
+    fragments: list[dict[str, object]],
+    suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = ref.get("path")
+                if isinstance(path, str) and path.endswith(suffix):
+                    matches.append(
+                        {
+                            "io_kind": key,
+                            "ray_task_id": fragment.get("ray_task_id"),
+                            "ray_worker_id": fragment.get("ray_worker_id"),
+                            **ref,
+                        }
+                    )
+    return matches
+
+
+def _output_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id, path
+        """,
+        (path_like,),
+    )
+
+
+def test_host_submit_reconstitutes_background_thread_native_write_on_launching_task(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-native-background-thread")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "native_background_thread_attribution.py",
+        use_fragment_store=True,
+        tracer="ptrace",
+        extra_env={
+            "ROAR_FRAGMENT_IDLE_FLUSH_INTERVAL": "10",
+            "ROAR_FRAGMENT_FLUSH_INTERVAL": "10",
+        },
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert set(payload) == {"launch", "block", "waited"}
+
+    launch = payload["launch"]
+    block = payload["block"]
+    waited = payload["waited"]
+
+    assert "libroar_tracer_preload.so" in launch.get("ld_preload", "")
+    assert launch.get("trace_sock"), launch
+    assert launch.get("worker_id"), launch
+    assert launch.get("task_id"), launch
+    assert launch.get("launch_thread_id"), launch
+    assert launch.get("background_thread_id"), launch
+    assert launch["background_thread_id"] != launch["launch_thread_id"], payload
+    assert waited.get("returncode") == "0", payload
+
+    assert block.get("worker_id") == launch.get("worker_id"), payload
+    assert block.get("task_id"), payload
+    assert block.get("task_id") != launch.get("task_id"), payload
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    fragment_refs = _fragment_entries_for_path(
+        fragments,
+        "/artifacts/native_background_thread_output.txt",
+    )
+    native_refs = [ref for ref in fragment_refs if str(ref.get("capture_method") or "") == "native"]
+
+    assert native_refs, "Expected native fragment refs for the background-thread native output"
+    assert {str(ref.get("ray_worker_id") or "") for ref in native_refs} == {launch["worker_id"]}
+    assert {str(ref.get("ray_task_id") or "") for ref in native_refs} == {launch["task_id"]}, (
+        payload,
+        native_refs,
+    )
+
+    rows = _output_rows(project_dir, "%/artifacts/native_background_thread_output.txt")
+    assert rows, "Expected background-thread native output artifact in the reconstituted roar.db"
+    assert {str(row.get("capture_method") or "") for row in rows} == {"native"}
+    assert {str(row.get("ray_task_id") or "") for row in rows} == {launch["task_id"]}, (
+        payload,
+        rows,
+    )
diff --git a/tests/e2e/ray/test_native_library_tracing.py b/tests/e2e/ray/test_native_library_tracing.py
new file mode 100644
index 00000000..3dbbe669
--- /dev/null
+++ b/tests/e2e/ray/test_native_library_tracing.py
@@ -0,0 +1,123 @@
+"""Ray contract: in-process native library I/O must surface as native lineage."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, str]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return {str(key): str(value) for key, value in payload.items()}
+    return {}
+
+
+def _fragment_entries_for_path(
+    fragments: list[dict[str, object]],
+    suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = ref.get("path")
+                if isinstance(path, str) and path.endswith(suffix):
+                    matches.append(
+                        {
+                            "io_kind": key,
+                            "ray_task_id": fragment.get("ray_task_id"),
+                            "ray_worker_id": fragment.get("ray_worker_id"),
+                            **ref,
+                        }
+                    )
+    return matches
+
+
+def _output_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id, path
+        """,
+        (path_like,),
+    )
+
+
+def test_host_submit_reconstitutes_inprocess_native_library_output(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-native-lib")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "native_library_tracing.py",
+        use_fragment_store=True,
+        tracer="ptrace",
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert "libroar_tracer_preload.so" in payload.get("ld_preload", "")
+    assert payload.get("trace_sock"), payload
+    assert payload.get("task_id"), payload
+    assert payload.get("worker_id"), payload
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    fragment_refs = _fragment_entries_for_path(fragments, "/artifacts/native_library_output.txt")
+    native_output_refs = [
+        ref for ref in fragment_refs if str(ref.get("capture_method") or "") == "native"
+    ]
+
+    assert native_output_refs, (
+        "Expected native fragment refs for the in-process native library output artifact"
+    )
+    assert {str(ref.get("ray_worker_id") or "") for ref in native_output_refs} == {
+        payload["worker_id"]
+    }
+
+    rows = _output_rows(project_dir, "%/artifacts/native_library_output.txt")
+    assert rows, "Expected in-process native library output artifact in the reconstituted roar.db"
+    assert {str(row.get("capture_method") or "") for row in rows} == {"native"}
diff --git a/tests/e2e/ray/test_native_task_attribution.py b/tests/e2e/ray/test_native_task_attribution.py
new file mode 100644
index 00000000..eb1302f5
--- /dev/null
+++ b/tests/e2e/ray/test_native_task_attribution.py
@@ -0,0 +1,145 @@
+"""Ray contract: delayed native child I/O must surface as native lineage."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, str]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return payload
+    return {}
+
+
+def _fragment_entries_for_path(
+    fragments: list[dict[str, object]],
+    suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = ref.get("path")
+                if isinstance(path, str) and path.endswith(suffix):
+                    matches.append(
+                        {
+                            "io_kind": key,
+                            "ray_task_id": fragment.get("ray_task_id"),
+                            "ray_worker_id": fragment.get("ray_worker_id"),
+                            **ref,
+                        }
+                    )
+    return matches
+
+
+def _output_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id, path
+        """,
+        (path_like,),
+    )
+
+
+def test_host_submit_reconstitutes_delayed_native_child_output(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-native-task")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "native_task_attribution.py",
+        use_fragment_store=True,
+        tracer="ptrace",
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+
+    launch = payload.get("launch")
+    block = payload.get("block")
+    waited = payload.get("waited")
+    assert isinstance(launch, dict), payload
+    assert isinstance(block, dict), payload
+    assert isinstance(waited, dict), payload
+
+    launch_payload = {str(key): str(value) for key, value in launch.items()}
+    block_payload = {str(key): str(value) for key, value in block.items()}
+    child_results = waited.get("children")
+    assert isinstance(child_results, list), waited
+
+    assert "libroar_tracer_preload.so" in launch_payload.get("ld_preload", "")
+    assert launch_payload.get("trace_sock"), launch_payload
+    assert launch_payload.get("task_id"), launch_payload
+    assert launch_payload.get("worker_id"), launch_payload
+    assert block_payload.get("task_id"), block_payload
+    assert block_payload.get("worker_id") == launch_payload.get("worker_id")
+    assert block_payload.get("task_id") != launch_payload.get("task_id")
+    assert child_results, waited
+    assert all(int(child.get("returncode", -1)) == 0 for child in child_results), waited
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    fragment_refs = _fragment_entries_for_path(fragments, "/artifacts/native_task_output.txt")
+    native_output_refs = [
+        ref for ref in fragment_refs if str(ref.get("capture_method") or "") == "native"
+    ]
+
+    assert native_output_refs, (
+        "Expected native fragment refs for the delayed native child output artifact"
+    )
+    assert {str(ref.get("ray_worker_id") or "") for ref in native_output_refs} == {
+        launch_payload["worker_id"]
+    }
+    assert {str(ref.get("ray_task_id") or "") for ref in native_output_refs} == {
+        launch_payload["task_id"]
+    }, native_output_refs
+
+    rows = _output_rows(project_dir, "%/artifacts/native_task_output.txt")
+    assert rows, "Expected delayed native child output artifact in the reconstituted roar.db"
+    assert {str(row.get("capture_method") or "") for row in rows} == {"native"}
+    assert {str(row.get("ray_task_id") or "") for row in rows} == {launch_payload["task_id"]}, rows
diff --git a/tests/e2e/ray/test_native_thread_attribution.py b/tests/e2e/ray/test_native_thread_attribution.py
new file mode 100644
index 00000000..1c7b4750
--- /dev/null
+++ b/tests/e2e/ray/test_native_thread_attribution.py
@@ -0,0 +1,149 @@
+"""Ray contract: same-process native writes must stay on their originating Ray task."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, dict[str, str]]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            normalized: dict[str, dict[str, str]] = {}
+            for key, value in payload.items():
+                if isinstance(value, dict):
+                    normalized[str(key)] = {str(k): str(v) for k, v in value.items()}
+            return normalized
+    return {}
+
+
+def _fragment_entries_for_path(
+    fragments: list[dict[str, object]],
+    suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = ref.get("path")
+                if isinstance(path, str) and path.endswith(suffix):
+                    matches.append(
+                        {
+                            "io_kind": key,
+                            "ray_task_id": fragment.get("ray_task_id"),
+                            "ray_worker_id": fragment.get("ray_worker_id"),
+                            **ref,
+                        }
+                    )
+    return matches
+
+
+def _output_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id, path
+        """,
+        (path_like,),
+    )
+
+
+def test_host_submit_reconstitutes_same_process_native_writes_on_their_originating_task(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-native-thread")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "native_thread_attribution.py",
+        use_fragment_store=True,
+        tracer="ptrace",
+        extra_env={
+            "ROAR_FRAGMENT_IDLE_FLUSH_INTERVAL": "10",
+            "ROAR_FRAGMENT_FLUSH_INTERVAL": "10",
+        },
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert set(payload) == {"fast", "slow"}
+
+    fast = payload["fast"]
+    slow = payload["slow"]
+    assert "libroar_tracer_preload.so" in fast.get("ld_preload", "")
+    assert fast.get("trace_sock"), fast
+    assert fast.get("worker_id"), fast
+    assert fast.get("task_id"), fast
+    assert slow.get("worker_id") == fast.get("worker_id")
+    assert slow.get("task_id") and slow.get("task_id") != fast.get("task_id")
+    assert fast.get("thread_id") and slow.get("thread_id")
+    assert fast["thread_id"] != slow["thread_id"], payload
+    assert fast.get("native_thread_id") == fast["thread_id"], payload
+    assert slow.get("native_thread_id") == slow["thread_id"], payload
+    assert fast.get("pre_write_bound_task_id") == fast["task_id"], payload
+    assert slow.get("pre_write_bound_task_id") == slow["task_id"], payload
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+
+    expectations = {
+        "fast": ("/artifacts/native_thread_fast.txt", fast["task_id"]),
+        "slow": ("/artifacts/native_thread_slow.txt", slow["task_id"]),
+    }
+    for label, (suffix, expected_task_id) in expectations.items():
+        fragment_refs = _fragment_entries_for_path(fragments, suffix)
+        native_refs = [
+            ref for ref in fragment_refs if str(ref.get("capture_method") or "") == "native"
+        ]
+
+        assert native_refs, f"Expected native fragment refs for {label} output"
+        assert {str(ref.get("ray_worker_id") or "") for ref in native_refs} == {fast["worker_id"]}
+        assert {str(ref.get("ray_task_id") or "") for ref in native_refs} == {expected_task_id}, (
+            native_refs
+        )
+
+        rows = _output_rows(project_dir, f"%{suffix}")
+        assert rows, f"Expected reconstituted roar.db rows for {label} output"
+        assert {str(row.get("capture_method") or "") for row in rows} == {"native"}
+        assert {str(row.get("ray_task_id") or "") for row in rows} == {expected_task_id}, rows
diff --git a/tests/e2e/ray/test_native_tracing.py b/tests/e2e/ray/test_native_tracing.py
index b13c167d..3c2ec72e 100644
--- a/tests/e2e/ray/test_native_tracing.py
+++ b/tests/e2e/ray/test_native_tracing.py
@@ -1,40 +1,23 @@
-"""TDD: Ray workers get native preload tracing via runtime_env wrapper."""
+"""Ray contract: host submit activates worker native tracing and reconstitutes it."""
 
 from __future__ import annotations
 
 import json
-import subprocess
 from pathlib import Path
 
 import pytest
 
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-@pytest.fixture(autouse=True)
-def reset_roar_state(ray_cluster):
-    """Reset roar state on the head node before each test."""
-    subprocess.run(
-        [
-            "docker",
-            "compose",
-            "-f",
-            str(COMPOSE_FILE),
-            "exec",
-            "-T",
-            "ray-head",
-            "bash",
-            "-c",
-            "rm -rf /app/.roar /shared/.roar-logs && roar init --path /app -n",
-        ],
-        check=False,
-        capture_output=True,
-    )
-    yield
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
 
 
 def _parse_json_line(stdout: str) -> dict[str, str]:
@@ -51,21 +34,114 @@ def _parse_json_line(stdout: str) -> dict[str, str]:
     return {}
 
 
-class TestNativeTracing:
-    def test_worker_ld_preload_and_artifact_capture(self, ray_cluster):
-        stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/native_tracing.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}\n{stdout}"
-
-        payload = _parse_json_line(stdout)
-        assert payload, f"Expected JSON payload in stdout, got:\n{stdout}"
-        assert "libroar_tracer_preload.so" in payload.get("ld_preload", "")
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT first_seen_path FROM artifacts WHERE first_seen_path LIKE '%native_tracing_output.txt'",
-        )
-        assert rows, "Expected native tracing output artifact to be captured in roar.db"
+def _artifact_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT COALESCE(path, first_seen_path) AS path,
+               capture_method
+        FROM artifacts
+        WHERE COALESCE(path, first_seen_path) LIKE ?
+        ORDER BY id
+        """,
+        (path_like,),
+    )
+
+
+def _fragment_entries_for_path(
+    fragments: list[dict[str, object]],
+    suffix: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = ref.get("path")
+                if isinstance(path, str) and path.endswith(suffix):
+                    matches.append(
+                        {
+                            "io_kind": key,
+                            "ray_task_id": fragment.get("ray_task_id"),
+                            "ray_worker_id": fragment.get("ray_worker_id"),
+                            **ref,
+                        }
+                    )
+    return matches
+
+
+def _native_entries_for_worker(
+    fragments: list[dict[str, object]],
+    worker_id: str,
+) -> list[dict[str, object]]:
+    matches: list[dict[str, object]] = []
+    for fragment in fragments:
+        if str(fragment.get("ray_worker_id") or "") != worker_id:
+            continue
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                if str(ref.get("capture_method") or "") != "native":
+                    continue
+                matches.append(
+                    {
+                        "io_kind": key,
+                        "ray_task_id": fragment.get("ray_task_id"),
+                        "ray_worker_id": fragment.get("ray_worker_id"),
+                        **ref,
+                    }
+                )
+    return matches
+
+
+def test_host_submit_reconstitutes_native_worker_lineage(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-native")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "native_tracing.py",
+        use_fragment_store=True,
+        tracer="ptrace",
+    )
+
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert "libroar_tracer_preload.so" in payload.get("ld_preload", "")
+    assert payload.get("trace_sock"), payload
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+    fragment_refs = _fragment_entries_for_path(fragments, "/artifacts/native_tracing_output.txt")
+    native_worker_refs = _native_entries_for_worker(fragments, payload.get("worker_id", ""))
+
+    assert fragment_refs, "Expected fragment payloads for the worker output artifact"
+    assert any(
+        str(ref.get("ray_worker_id")) == payload.get("worker_id", "") for ref in fragment_refs
+    ), (
+        "Expected the output artifact fragments to belong to the worker that reported preload activation"
+    )
+    assert native_worker_refs, (
+        "Expected at least one native fragment entry from the same Ray worker that reported "
+        "preload activation"
+    )
+
+    rows = _artifact_rows(project_dir, "%/artifacts/native_tracing_output.txt")
+    assert rows, "Expected worker output artifact to be reconstituted into the host roar.db"
+    assert "[roar] lineage reconstituted:" in f"{result.stdout}\n{result.stderr}"
diff --git a/tests/e2e/ray/test_nested_subprocess_s3_lineage.py b/tests/e2e/ray/test_nested_subprocess_s3_lineage.py
new file mode 100644
index 00000000..b711d050
--- /dev/null
+++ b/tests/e2e/ray/test_nested_subprocess_s3_lineage.py
@@ -0,0 +1,66 @@
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_payload(stdout: str) -> dict[str, str]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped.startswith("{"):
+            continue
+        payload = json.loads(stripped)
+        if isinstance(payload, dict):
+            return {str(key): str(value) for key, value in payload.items()}
+    raise AssertionError(f"Expected JSON payload in stdout, got:\n{stdout}")
+
+
+def test_nested_subprocess_ray_task_keeps_s3_lineage_out_of_proxy_only_bucket(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("nested-subprocess-s3")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "nested_subprocess_s3_lineage.py",
+        use_fragment_store=True,
+        timeout=240,
+    )
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    payload = _parse_payload(result.stdout)
+    output_uri = str(payload.get("output_uri") or "")
+    assert output_uri.startswith("s3://test-bucket/nested-subprocess/"), payload
+
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT j.command,
+               j.script,
+               json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) = ?
+        ORDER BY j.command
+        """,
+        (output_uri,),
+    )
+
+    assert rows, f"Expected reconstituted S3 output row for {output_uri}"
+    assert any(str(row.get("command") or "") == "ray_task:write_s3" for row in rows), rows
diff --git a/tests/e2e/ray/test_phase_lineage_contract.py b/tests/e2e/ray/test_phase_lineage_contract.py
new file mode 100644
index 00000000..91eda541
--- /dev/null
+++ b/tests/e2e/ray/test_phase_lineage_contract.py
@@ -0,0 +1,171 @@
+"""Ray lineage contract tests for a simple multi-phase submit pipeline."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_cli_from_host,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(240)]
+
+EXPECTED_PHASE_COMMANDS = (
+    "ray_task:extraction",
+    "ray_task:training",
+    "ray_task:evaluation",
+)
+
+
+def _parse_payload(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict) and payload.get("script") == "phase_lineage_contract":
+            return payload
+    raise AssertionError(f"Unable to parse phase-lineage payload from output:\n{stdout}")
+
+
+def _run_pipeline(project_dir: Path, ray_cluster: dict[str, str]) -> dict[str, object]:
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "phase_lineage_contract/main.py",
+        use_fragment_store=True,
+        timeout=240,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+    return _parse_payload(result.stdout)
+
+
+def _phase_jobs(project_dir: Path) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT id, step_number, command, script, job_uid
+        FROM jobs
+        WHERE job_type = 'ray_task'
+        ORDER BY step_number, id
+        """,
+    )
+
+
+@pytest.fixture(scope="module")
+def phase_lineage_contract_run(ray_cluster: dict[str, str]) -> dict[str, object]:
+    project_dir = make_host_project_dir("phase-lineage-contract")
+    init_host_project(project_dir)
+    payload = _run_pipeline(project_dir, ray_cluster)
+    return {
+        "project_dir": project_dir,
+        "payload": payload,
+    }
+
+
+def _jobs_by_command(project_dir: Path) -> dict[str, dict[str, object]]:
+    return {str(row["command"]): row for row in _phase_jobs(project_dir)}
+
+
+def _dag_payload(project_dir: Path) -> dict[str, object]:
+    dag_result = run_roar_cli_from_host(project_dir, "dag", "--expanded", "--json", timeout=30)
+    assert dag_result.returncode == 0, dag_result.stderr or dag_result.stdout
+    dag_payload = json.loads(dag_result.stdout)
+    assert isinstance(dag_payload, dict), dag_result.stdout
+    return dag_payload
+
+
+def test_phase_lineage_contract_persists_expected_phase_jobs(
+    phase_lineage_contract_run: dict[str, object],
+) -> None:
+    project_dir = phase_lineage_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+    payload = phase_lineage_contract_run["payload"]
+    assert isinstance(payload, dict)
+
+    run_id = str(payload.get("run_id") or "")
+    report_key = str(payload.get("report_key") or "")
+
+    assert run_id, payload
+    assert report_key.endswith("/reports/final_report.json"), payload
+
+    phase_jobs = _phase_jobs(project_dir)
+    jobs_by_command = _jobs_by_command(project_dir)
+
+    missing_commands = [
+        command for command in EXPECTED_PHASE_COMMANDS if command not in jobs_by_command
+    ]
+    assert not missing_commands, (
+        "Expected first-class Ray phase jobs in the reconstituted DB, "
+        f"missing={missing_commands}, observed={[row['command'] for row in phase_jobs]}"
+    )
+
+    extract_step = int(jobs_by_command["ray_task:extract_dataset"]["step_number"])
+    train_step = int(jobs_by_command["ray_task:train_model"]["step_number"])
+    evaluate_step = int(jobs_by_command["ray_task:evaluate_model"]["step_number"])
+    assert extract_step < train_step < evaluate_step, phase_jobs
+
+    report_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT COALESCE(path, first_seen_path) AS path
+        FROM artifacts
+        WHERE COALESCE(path, first_seen_path) LIKE ?
+        """,
+        (f"%phase-lineage/{run_id}/reports/final_report.json",),
+    )
+    assert report_rows, "Expected the final report artifact in the host lineage DB"
+
+
+def test_phase_lineage_contract_dag_surfaces_dependency_chain(
+    phase_lineage_contract_run: dict[str, object],
+) -> None:
+    project_dir = phase_lineage_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+
+    jobs_by_command = _jobs_by_command(project_dir)
+    dag_payload = _dag_payload(project_dir)
+    extract_step = int(jobs_by_command["ray_task:extract_dataset"]["step_number"])
+    train_step = int(jobs_by_command["ray_task:train_model"]["step_number"])
+
+    nodes_by_command = {
+        str(node.get("command")): node
+        for node in dag_payload.get("nodes", [])
+        if isinstance(node, dict) and str(node.get("command", "")).startswith("ray_task:")
+    }
+    missing_nodes = [
+        command for command in EXPECTED_PHASE_COMMANDS if command not in nodes_by_command
+    ]
+    assert not missing_nodes, (
+        "Expected `roar dag --expanded --json` to surface the phase Ray jobs, "
+        f"missing={missing_nodes}, observed={sorted(nodes_by_command)}"
+    )
+
+    assert train_step in nodes_by_command["ray_task:evaluate_model"]["dependencies"], dag_payload
+    assert extract_step in nodes_by_command["ray_task:train_model"]["dependencies"], dag_payload
+
+
+def test_phase_lineage_contract_show_resolves_phase_steps(
+    phase_lineage_contract_run: dict[str, object],
+) -> None:
+    project_dir = phase_lineage_contract_run["project_dir"]
+    assert isinstance(project_dir, Path)
+    jobs_by_command = _jobs_by_command(project_dir)
+
+    for command in EXPECTED_PHASE_COMMANDS:
+        phase_job = jobs_by_command[command]
+        step_ref = f"@{int(phase_job['step_number'])}"
+        show_result = run_roar_cli_from_host(project_dir, "show", step_ref, timeout=30)
+        assert show_result.returncode == 0, show_result.stderr or show_result.stdout
+        assert command in show_result.stdout, show_result.stdout
+        assert "Job not found" not in show_result.stdout, show_result.stdout
diff --git a/tests/e2e/ray/test_proxy_logs_collection.py b/tests/e2e/ray/test_proxy_logs_collection.py
new file mode 100644
index 00000000..95988d93
--- /dev/null
+++ b/tests/e2e/ray/test_proxy_logs_collection.py
@@ -0,0 +1,94 @@
+"""E2E: roar run ray job submit captures S3 proxy logs as artifacts."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import run_docker
+
+COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
+
+
+def _exec_on_head(cmd: str, env: dict[str, str] | None = None) -> tuple[str, str, int]:
+    """Run a shell command on the ray-head container."""
+    command = ["docker", "compose", "-f", str(COMPOSE_FILE), "exec", "-T"]
+    if env:
+        for k, v in env.items():
+            command.extend(["-e", f"{k}={v}"])
+    command.extend(["ray-head", "bash", "-c", cmd])
+    result = run_docker(command, capture_output=True, text=True, check=False)
+    return result.stdout, result.stderr, result.returncode
+
+
+def _query_artifact_count() -> int:
+    """Query roar.db on the head node for proxy-captured S3 artifacts."""
+    stdout, _, rc = _exec_on_head(
+        'python3 -c "'
+        "import sqlite3, sys; "
+        "conn = sqlite3.connect('/app/.roar/roar.db'); "
+        "count = conn.execute("
+        "\\\"SELECT COUNT(*) FROM artifacts WHERE first_seen_path LIKE 's3://%'\\\""
+        ").fetchone()[0]; "
+        "print(count); "
+        "conn.close()"
+        '"'
+    )
+    if rc != 0:
+        return 0
+    try:
+        return int(stdout.strip())
+    except ValueError:
+        return 0
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+def test_roar_run_ray_job_captures_s3_artifacts(ray_cluster: dict[str, str]) -> None:
+    """roar run ray job submit should capture S3 I/O via proxy into local artifacts.
+
+    FAILS until the `del proxy_logs` bug in _collect_ray_io is fixed.
+    """
+    del ray_cluster
+
+    # Init git + roar project (roar requires a git repo)
+    stdout, stderr, rc = _exec_on_head(
+        "cd /app && git config --global user.email test@test.com"
+        " && git config --global user.name test"
+        " && git init -q && git add -A && git commit -q -m init --allow-empty"
+        " && rm -rf .roar && roar init --path /app -n"
+    )
+    assert rc == 0, f"roar init failed:\n{stdout}\n{stderr}"
+
+    # Run the job via the real production path.
+    # ROAR_CLUSTER_PIP_REQ=skip prevents runtime_env from installing roar-cli from PyPI,
+    # so workers use the local code already installed in the Docker image.
+    env = {
+        "AWS_ENDPOINT_URL": "http://minio:9000",
+        "AWS_ACCESS_KEY_ID": "minioadmin",
+        "AWS_SECRET_ACCESS_KEY": "minioadmin",
+        "ROAR_CLUSTER_PIP_REQ": "skip",
+    }
+    stdout, stderr, rc = _exec_on_head(
+        "roar run ray job submit"
+        " --address http://127.0.0.1:8265"
+        " --working-dir /app"
+        " -- python tests/e2e/ray/jobs/s3_workload.py",
+        env=env,
+    )
+    combined = f"stdout:\n{stdout}\nstderr:\n{stderr}"
+
+    # Job should succeed
+    assert rc == 0, f"roar run ray job submit failed (rc={rc}):\n{combined}"
+
+    # After roar run completes, proxy logs should have been collected
+    # and reconstituted into local artifacts
+    count = _query_artifact_count()
+    assert count > 0, (
+        "Expected roar to capture S3 artifacts via proxy after "
+        "`roar run ray job submit`, but found 0.\n"
+        "This fails because `_collect_ray_io(proxy_logs=...)` does "
+        "`del proxy_logs` before processing.\n"
+        f"artifact_count={count}\n{combined}"
+    )
diff --git a/tests/e2e/ray/test_ray_contracts.py b/tests/e2e/ray/test_ray_contracts.py
new file mode 100644
index 00000000..b0ec2f04
--- /dev/null
+++ b/tests/e2e/ray/test_ray_contracts.py
@@ -0,0 +1,274 @@
+"""User-facing Ray contract tests for `roar run ray job submit ...`."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    decrypt_fragment_batches,
+    fetch_fragment_batches,
+    init_host_project,
+    load_fragment_key,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _assert_submit_ok(result) -> None:
+    assert result.returncode == 0, (
+        f"submit failed (rc={result.returncode})\nstdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+    )
+
+
+def _parse_last_json(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return payload
+    raise AssertionError(f"Unable to parse JSON payload from stdout:\n{stdout}")
+
+
+def _artifact_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT id,
+               COALESCE(path, first_seen_path) AS path,
+               capture_method,
+               metadata
+        FROM artifacts
+        WHERE COALESCE(path, first_seen_path) LIKE ?
+        ORDER BY id
+        """,
+        (path_like,),
+    )
+
+
+def _output_rows(project_dir: Path, path_like: str) -> list[dict[str, object]]:
+    return query_roar_db(
+        project_dir,
+        """
+        SELECT j.id AS job_id,
+               json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               json_extract(j.metadata, '$.ray_node_id') AS ray_node_id,
+               COALESCE(a.path, a.first_seen_path) AS path
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id, path
+        """,
+        (path_like,),
+    )
+
+
+def test_local_file_io_reconstitutes_into_host_roar_db(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-contract")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "basic_file_io.py",
+        use_fragment_store=True,
+    )
+
+    _assert_submit_ok(result)
+    inputs = query_roar_db(
+        project_dir,
+        "SELECT path FROM job_inputs WHERE path LIKE ?",
+        ("%/artifacts/basic_file_io/%",),
+    )
+    outputs = query_roar_db(
+        project_dir,
+        "SELECT path FROM job_outputs WHERE path LIKE ?",
+        ("%/artifacts/basic_file_io/%",),
+    )
+    jobs = query_roar_db(
+        project_dir,
+        """
+        SELECT job_uid, parent_job_uid, json_extract(metadata, '$.ray_task_id') AS ray_task_id
+        FROM jobs
+        WHERE job_type = 'ray_task'
+        """,
+    )
+
+    assert inputs
+    assert outputs
+    assert jobs
+    assert all(row["ray_task_id"] for row in jobs)
+    assert any(str(row["path"]).endswith("/artifacts/basic_file_io/input.json") for row in inputs)
+    assert any(str(row["path"]).endswith("/artifacts/basic_file_io/output.json") for row in outputs)
+
+
+def test_s3_proxy_routing_captures_worker_inputs_and_outputs(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-contract")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "s3_io.py",
+        use_fragment_store=True,
+    )
+
+    _assert_submit_ok(result)
+    artifacts = _artifact_rows(project_dir, "s3://%")
+    inputs = query_roar_db(
+        project_dir,
+        "SELECT path FROM job_inputs WHERE path LIKE 's3://%' ORDER BY path",
+    )
+    outputs = query_roar_db(
+        project_dir,
+        "SELECT path FROM job_outputs WHERE path LIKE 's3://%' ORDER BY path",
+    )
+
+    assert artifacts
+    assert inputs
+    assert outputs
+    assert any(row["capture_method"] == "proxy" for row in artifacts)
+
+
+def test_artifacts_are_attributed_to_distinct_ray_tasks(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-contract")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "attributed_file_io.py",
+        use_fragment_store=True,
+    )
+
+    _assert_submit_ok(result)
+    payload = _parse_last_json(result.stdout)
+    writes = payload.get("writes", [])
+    assert isinstance(writes, list) and len(writes) == 6
+
+    outputs = _output_rows(project_dir, "%/artifacts/attributed/%")
+    task_ids = {row["ray_task_id"] for row in outputs if row["ray_task_id"]}
+    output_paths = {row["path"] for row in outputs if row["path"]}
+
+    assert len(task_ids) == 6
+    assert len(output_paths) == 6
+
+
+def test_multi_node_lineage_merges_jobs_from_multiple_nodes(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-contract")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "s3_multi_node_affinity.py",
+        use_fragment_store=True,
+    )
+
+    _assert_submit_ok(result)
+    payload = _parse_last_json(result.stdout)
+    results = payload.get("results", [])
+    assert isinstance(results, list) and results
+
+    runtime_node_ids = {
+        str(item.get("node_id") or "") for item in results if isinstance(item, dict)
+    }
+    runtime_node_ids.discard("")
+    assert len(runtime_node_ids) >= 2
+
+    run_id = str(payload["run_id"])
+    db_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT DISTINCT json_extract(j.metadata, '$.ray_node_id') AS ray_node_id
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        """,
+        (f"%multi-node-affinity/{run_id}/%",),
+    )
+    db_node_ids = {str(row["ray_node_id"] or "") for row in db_rows}
+    db_node_ids.discard("")
+
+    assert len(db_node_ids) >= 2
+
+
+def test_fragments_capture_tmp_paths_but_reconstitution_filters_them_by_default(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("ray-contract")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "tmp_filter_probe.py",
+        use_fragment_store=True,
+    )
+
+    _assert_submit_ok(result)
+    payload = _parse_last_json(result.stdout)
+    workspace_path = str(payload["workspace_path"])
+    tmp_path_str = str(payload["tmp_path"])
+
+    key_payload = load_fragment_key(project_dir)
+    batches = fetch_fragment_batches(key_payload["session_id"], key_payload["token"])
+    fragments = decrypt_fragment_batches(batches, key_payload["token"])
+
+    captured_paths = set()
+    for fragment in fragments:
+        for key in ("reads", "writes"):
+            refs = fragment.get(key, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if isinstance(ref, dict) and isinstance(ref.get("path"), str):
+                    captured_paths.add(ref["path"])
+
+    assert workspace_path in captured_paths
+    assert tmp_path_str in captured_paths
+
+    kept_rows = _artifact_rows(project_dir, f"%{Path(workspace_path).name}")
+    filtered_rows = _artifact_rows(project_dir, f"%{Path(tmp_path_str).name}")
+
+    assert kept_rows
+    assert not filtered_rows
+
+
+def test_contract_workloads_remain_roar_unaware() -> None:
+    jobs_dir = Path(__file__).resolve().parent / "jobs"
+    workload_names = [
+        "basic_file_io.py",
+        "attributed_file_io.py",
+        "s3_io.py",
+        "s3_pipeline.py",
+        "tmp_filter_probe.py",
+    ]
+
+    for name in workload_names:
+        text = (jobs_dir / name).read_text(encoding="utf-8")
+        assert "import roar" not in text
+        assert "from roar" not in text
+        assert "ROAR_" not in text
diff --git a/tests/e2e/ray/test_ray_data_capture.py b/tests/e2e/ray/test_ray_data_capture.py
index 6b195faa..c80775f7 100644
--- a/tests/e2e/ray/test_ray_data_capture.py
+++ b/tests/e2e/ray/test_ray_data_capture.py
@@ -6,24 +6,36 @@
 
 from pathlib import Path
 
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
+import pytest
+
+from tests.e2e.ray.conftest import (
+    query_roar_db_on_head,
+    reset_roar_project_on_head,
+    run_roar_ray_job_on_head,
+)
 
 COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
 JOBS_DIR = "/app/tests/e2e/ray/jobs"
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_diagnostic, pytest.mark.timeout(180)]
+
+
+@pytest.fixture(autouse=True)
+def reset_roar_state(ray_cluster):
+    del ray_cluster
+    reset_roar_project_on_head(COMPOSE_FILE)
+    yield
 
 
 class TestRayDataCapture:
     def test_read_csv_and_write_parquet_are_captured(self, ray_cluster):
-        stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
+        stdout, stderr, returncode = run_roar_ray_job_on_head(
             f"{JOBS_DIR}/ray_data_pipeline.py",
-            env={"ROAR_WRAP": "1"},
+            compose_file=COMPOSE_FILE,
+            use_fragment_store=True,
         )
         assert returncode == 0, f"Job failed:\n{stderr}\n{stdout}"
 
-        input_rows = _query_roar_db(
-            COMPOSE_FILE,
+        input_rows = query_roar_db_on_head(
             "SELECT ji.path FROM job_inputs ji WHERE ji.path LIKE '%ray_data_input.csv'",
         )
         assert input_rows, (
@@ -31,8 +43,7 @@ def test_read_csv_and_write_parquet_are_captured(self, ray_cluster):
             "but no matching artifact was captured."
         )
 
-        output_rows = _query_roar_db(
-            COMPOSE_FILE,
+        output_rows = query_roar_db_on_head(
             "SELECT jo.path FROM job_outputs jo WHERE jo.path LIKE '%ray_data_output%'",
         )
         assert output_rows, (
diff --git a/tests/e2e/ray/test_reconstituted_composites.py b/tests/e2e/ray/test_reconstituted_composites.py
new file mode 100644
index 00000000..0d070129
--- /dev/null
+++ b/tests/e2e/ray/test_reconstituted_composites.py
@@ -0,0 +1,135 @@
+"""Composite artifact contracts for Ray fragment reconstitution."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(300)]
+
+
+def _parse_payload(stdout: str) -> dict[str, object]:
+    for line in reversed(stdout.splitlines()):
+        line = line.strip()
+        if not line.startswith("{"):
+            continue
+        try:
+            payload = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict) and payload.get("script") == "cloud_demo_emulated":
+            return payload
+    raise AssertionError(f"Unable to parse cloud-demo-emulated payload from output:\n{stdout}")
+
+
+@pytest.fixture(scope="module")
+def composite_reconstitution_run(ray_cluster: dict[str, str]) -> dict[str, object]:
+    project_dir = make_host_project_dir("ray-composites")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "cloud_demo_emulated/main.py",
+        use_fragment_store=True,
+        extra_env={
+            "S3_DATA_BUCKET": "test-bucket",
+            "S3_MODELS_BUCKET": "output-bucket",
+            "S3_RESULTS_BUCKET": "output-bucket",
+        },
+        timeout=300,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+    return {
+        "project_dir": project_dir,
+        "payload": _parse_payload(result.stdout),
+    }
+
+
+def test_reconstitution_creates_composite_artifact_for_dataset_root(
+    composite_reconstitution_run: dict[str, object],
+) -> None:
+    project_dir = composite_reconstitution_run["project_dir"]
+    assert isinstance(project_dir, Path)
+
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT a.id, a.kind, a.component_count, jo.path, j.command
+        FROM artifacts a
+        JOIN job_outputs jo ON jo.artifact_id = a.id
+        JOIN jobs j ON j.id = jo.job_id
+        WHERE jo.path = 's3://test-bucket/sensor_data'
+        ORDER BY j.id DESC
+        """,
+    )
+    assert rows, "Expected reconstitution to add a composite dataset-root artifact output"
+
+    composite_row = rows[0]
+    assert composite_row["kind"] == "composite", composite_row
+    assert int(composite_row["component_count"]) == 25, composite_row
+    assert composite_row["command"] == "ray_task:extraction", composite_row
+
+
+def test_reconstitution_persists_composite_components_and_membership_index(
+    composite_reconstitution_run: dict[str, object],
+) -> None:
+    project_dir = composite_reconstitution_run["project_dir"]
+    assert isinstance(project_dir, Path)
+
+    composite_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT a.id
+        FROM artifacts a
+        JOIN job_outputs jo ON jo.artifact_id = a.id
+        WHERE jo.path = 's3://test-bucket/sensor_data'
+        ORDER BY a.first_seen_at DESC
+        LIMIT 1
+        """,
+    )
+    assert composite_rows, "Expected a persisted composite artifact for the shard dataset root"
+    composite_id = str(composite_rows[0]["id"])
+
+    component_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT relative_path, component_algorithm, component_digest, artifact_id
+        FROM composite_artifact_components
+        WHERE composite_artifact_id = ?
+        ORDER BY ordinal ASC, id ASC
+        """,
+        (composite_id,),
+    )
+    assert len(component_rows) == 25, component_rows
+    assert component_rows[0]["relative_path"] == "shard_000000.parquet", component_rows[0]
+    assert component_rows[-1]["relative_path"] == "shard_000024.parquet", component_rows[-1]
+    assert all(str(row["component_algorithm"]).strip() for row in component_rows), component_rows
+    assert all(str(row["component_digest"]).strip() for row in component_rows), component_rows
+    assert all(str(row["artifact_id"]).strip() for row in component_rows), component_rows
+
+    membership_rows = query_roar_db(
+        project_dir,
+        """
+        SELECT total_components, stored_components, bloom_bits, bloom_hashes, bloom_version
+        FROM composite_membership_indexes
+        WHERE composite_artifact_id = ?
+        """,
+        (composite_id,),
+    )
+    assert membership_rows, "Expected composite membership metadata for the dataset-root artifact"
+    membership = membership_rows[0]
+    assert int(membership["total_components"]) == 25, membership
+    assert int(membership["stored_components"]) == 25, membership
+    assert int(membership["bloom_bits"]) > 0, membership
+    assert int(membership["bloom_hashes"]) > 0, membership
+    assert int(membership["bloom_version"]) == 1, membership
diff --git a/tests/e2e/ray/test_register_targets.py b/tests/e2e/ray/test_register_targets.py
new file mode 100644
index 00000000..a077794f
--- /dev/null
+++ b/tests/e2e/ray/test_register_targets.py
@@ -0,0 +1,119 @@
+"""Ray submit-path contracts for `roar register` step and session targets."""
+
+from __future__ import annotations
+
+import re
+from pathlib import Path
+
+import pytest
+
+from roar.glaas_client import GlaasClient
+from roar.services.registration.session import SessionRegistrationService
+from tests.e2e.ray.conftest import (
+    HOST_GLAAS_URL,
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_cli_from_host,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(240)]
+
+EXPECTED_SESSION_COMMANDS = {
+    "ray_task:extract_dataset",
+    "ray_task:train_model",
+    "ray_task:evaluate_model",
+}
+
+EXPECTED_STEP_COMMANDS = {
+    "ray_task:extraction",
+    "ray_task:training",
+    "ray_task:evaluation",
+}
+
+
+def _parse_session_hash(output: str) -> str:
+    match = re.search(r"/dag/([a-f0-9]{64})", output)
+    if not match:
+        raise AssertionError(f"Unable to parse session hash from output:\n{output}")
+    return match.group(1)
+
+
+def _run_phase_pipeline(project_dir: Path, ray_cluster: dict[str, str]) -> None:
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "phase_lineage_contract/main.py",
+        use_fragment_store=True,
+        timeout=240,
+    )
+    assert result.returncode == 0, result.stderr or result.stdout
+
+
+def _active_session_id(project_dir: Path) -> int:
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT id
+        FROM sessions
+        WHERE is_active = 1
+        ORDER BY id DESC
+        LIMIT 1
+        """,
+    )
+    assert rows, "Expected an active local roar session"
+    return int(rows[0]["id"])
+
+
+def _session_commands(session_hash: str) -> set[str]:
+    client = GlaasClient(HOST_GLAAS_URL)
+    session, error = client.get_session(session_hash)
+    assert error is None, error
+    assert isinstance(session, dict), session
+    jobs = session.get("jobs", [])
+    assert isinstance(jobs, list), session
+    return {
+        str(job.get("command"))
+        for job in jobs
+        if isinstance(job, dict) and str(job.get("command", "")).startswith("ray_task:")
+    }
+
+
+@pytest.fixture(scope="module")
+def register_target_project(ray_cluster: dict[str, str]) -> Path:
+    project_dir = make_host_project_dir("register-targets")
+    init_host_project(project_dir)
+    _run_phase_pipeline(project_dir, ray_cluster)
+    return project_dir
+
+
+def test_register_step_reference_after_ray_submit(register_target_project: Path) -> None:
+    result = run_roar_cli_from_host(register_target_project, "register", "@4", "--yes", timeout=60)
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    session_hash = _parse_session_hash(result.stdout)
+    commands = _session_commands(session_hash)
+    assert EXPECTED_STEP_COMMANDS.issubset(commands), commands
+    assert "ray_task:evaluate_model" in commands, commands
+
+
+def test_register_session_hash_after_ray_submit(register_target_project: Path) -> None:
+    session_id = _active_session_id(register_target_project)
+    session_hash = SessionRegistrationService().compute_session_hash(
+        roar_dir=str(register_target_project / ".roar"),
+        session_id=session_id,
+    )
+
+    result = run_roar_cli_from_host(
+        register_target_project,
+        "register",
+        session_hash,
+        "--yes",
+        timeout=60,
+    )
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    assert session_hash in result.stdout, result.stdout
+    commands = _session_commands(session_hash)
+    assert EXPECTED_SESSION_COMMANDS.issubset(commands), commands
diff --git a/tests/e2e/ray/test_roar_run_host_submit.py b/tests/e2e/ray/test_roar_run_host_submit.py
new file mode 100644
index 00000000..62a71ae8
--- /dev/null
+++ b/tests/e2e/ray/test_roar_run_host_submit.py
@@ -0,0 +1,461 @@
+"""E2E: `roar run ray job submit` invoked from the LOCAL host against the Docker cluster.
+
+This test file reproduces the cloud topology exactly:
+  - `roar run` runs on the local host machine (not inside a container)
+  - The Ray cluster (head + workers) runs in Docker containers
+  - Workers are isolated processes that cannot reach 127.0.0.1 on the host
+
+These tests cover host-submit behavior that only shows up in the cloud/remote-cluster topology:
+
+  BUG 1 — Worker proxy endpoint unreachable (502 Bad Gateway):
+    _ray_job_submit.py hardcodes AWS_ENDPOINT_URL=http://127.0.0.1:19191 into
+    the worker runtime env. Workers inside Docker containers or on remote EC2s
+    cannot connect to the host's local proxy → all S3 calls fail with 502.
+
+  BUG 2 — Duplicate pip entry:
+    _merge_roar_runtime_env_pip() fails to deduplicate URL-based requirements
+    (e.g. presigned S3 URLs) because _requirement_name() returns the full URL
+    rather than a canonical package name, so the URL is never matched as an
+    existing 'roar-cli' entry and gets appended a second time.
+
+The goal is to exercise the real user entrypoint without any roar-aware workload logic.
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import sqlite3
+import subprocess
+import sys
+import textwrap
+from pathlib import Path
+
+import pytest
+
+COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
+JOBS_DIR = Path(__file__).resolve().parent / "jobs"
+REPO_ROOT = Path(__file__).resolve().parents[3]
+ROAR_BIN = REPO_ROOT / ".venv" / "bin" / "roar"
+HOST_GLAAS_URL = "http://localhost:3001"
+CLUSTER_GLAAS_URL = "http://host.docker.internal:3001"
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract]
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _roar_bin() -> str:
+    """Return path to the local roar binary."""
+    if ROAR_BIN.exists():
+        return str(ROAR_BIN)
+    # Fallback: rely on PATH (conftest prepends .venv/bin)
+    return "roar"
+
+
+def _init_project(project_dir: Path) -> None:
+    """Create a minimal git repo + roar project under project_dir."""
+    project_dir.mkdir(parents=True, exist_ok=True)
+    (project_dir / "README.md").write_text("ray host-submit e2e\n", encoding="utf-8")
+    (project_dir / ".gitignore").write_text(".roar/\n", encoding="utf-8")
+    subprocess.run(
+        ["git", "init", "-q"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        ["git", "config", "user.email", "test@test.com"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        ["git", "config", "user.name", "test"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        ["git", "add", "README.md", ".gitignore"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        ["git", "commit", "-q", "-m", "init"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        [_roar_bin(), "init", "--path", str(project_dir), "-n"],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+    subprocess.run(
+        [_roar_bin(), "config", "set", "glaas.url", ""],
+        cwd=project_dir,
+        check=True,
+        capture_output=True,
+    )
+
+
+def _artifact_count(project_dir: Path) -> int:
+    """Count S3 artifacts recorded in roar.db after reconstitution."""
+    db = project_dir / ".roar" / "roar.db"
+    if not db.exists():
+        return 0
+    conn = sqlite3.connect(db)
+    try:
+        row = conn.execute(
+            "SELECT COUNT(*) FROM artifacts WHERE first_seen_path LIKE 's3://%'"
+        ).fetchone()
+        return int(row[0]) if row else 0
+    except sqlite3.OperationalError:
+        return 0
+    finally:
+        conn.close()
+
+
+def _base_env(ray_cluster: dict[str, str]) -> dict[str, str]:
+    """Build a clean env for roar run — inherits PATH but overrides AWS/roar vars."""
+    env = dict(os.environ)
+    env.update(
+        {
+            # Workers use the Docker image's installed roar — no PyPI download.
+            "ROAR_CLUSTER_PIP_REQ": "skip",
+            # Minio credentials (Docker compose defaults).
+            "AWS_ACCESS_KEY_ID": "minioadmin",
+            "AWS_SECRET_ACCESS_KEY": "minioadmin",
+            "AWS_DEFAULT_REGION": "us-east-1",
+            # Point at the minio instance exposed on the host.
+            "AWS_ENDPOINT_URL": str(ray_cluster["minio_endpoint"]),
+            # Workers must use the cluster-visible MinIO address, not the host loopback.
+            "ROAR_CLUSTER_AWS_ENDPOINT_URL": str(ray_cluster["cluster_minio_endpoint"]),
+            # Host submits use the host-visible URL; workers use the cluster-visible URL.
+            "GLAAS_URL": HOST_GLAAS_URL,
+            "ROAR_CLUSTER_GLAAS_URL": CLUSTER_GLAAS_URL,
+        }
+    )
+    return env
+
+
+# ---------------------------------------------------------------------------
+# Test 1: S3 job succeeds and captures artifacts (proves BUG 1 / proxy routing)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+def test_roar_run_from_host_s3_job_succeeds_and_captures_artifacts(
+    ray_cluster: dict[str, str],
+    tmp_path: Path,
+) -> None:
+    """roar run ray job submit from the LOCAL host must succeed and capture S3 artifacts.
+
+    BUG 1 (currently failing): workers receive AWS_ENDPOINT_URL=http://127.0.0.1:19191
+    which points at the host's roar-proxy. Workers inside Docker containers cannot
+    reach the host's loopback address, causing every S3 call to fail with 502.
+
+    After the fix (workers get a reachable proxy endpoint), the job should:
+      - Complete with exit code 0
+      - Record at least 1 S3 artifact in roar.db via proxy log collection
+    """
+    project_dir = tmp_path / "project"
+    _init_project(project_dir)
+
+    cmd = [
+        _roar_bin(),
+        "run",
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        ray_cluster["dashboard_url"],
+        "--working-dir",
+        str(JOBS_DIR),
+        "--",
+        "python",
+        "s3_workload.py",
+    ]
+
+    result = subprocess.run(
+        cmd,
+        cwd=project_dir,
+        env=_base_env(ray_cluster),
+        capture_output=True,
+        text=True,
+        timeout=180,
+    )
+
+    assert result.returncode == 0, (
+        f"roar run ray job submit failed (rc={result.returncode}).\n"
+        f"STDOUT:\n{textwrap.indent(result.stdout, '  ')}\n"
+        f"STDERR:\n{textwrap.indent(result.stderr, '  ')}"
+    )
+
+    artifact_count = _artifact_count(project_dir)
+    assert artifact_count > 0, (
+        f"Job succeeded but 0 S3 artifacts were captured in roar.db. "
+        f"Expected proxy logs to be collected and reconstituted. "
+        f"roar.db path: {project_dir / '.roar' / 'roar.db'}"
+    )
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+def test_roar_run_from_host_subprocess_ray_job_captures_s3_artifacts(
+    ray_cluster: dict[str, str],
+    tmp_path: Path,
+) -> None:
+    """Ray S3 jobs that exit child subprocesses without ray.shutdown() must still collect logs.
+
+    This matches the cloud-demo topology more closely than the simple workload:
+      - a parent driver process spawns child Python processes
+      - each child calls ray.init(), performs S3 work, and exits normally
+      - no child explicitly calls ray.shutdown()
+
+    The contract is still the same: `roar run ray job submit ...` must capture
+    S3 artifacts via the proxy with no workload knowledge of roar.
+    """
+    project_dir = tmp_path / "project"
+    _init_project(project_dir)
+
+    cmd = [
+        _roar_bin(),
+        "run",
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        ray_cluster["dashboard_url"],
+        "--working-dir",
+        str(JOBS_DIR),
+        "--",
+        "python",
+        "s3_subprocess_pipeline.py",
+    ]
+
+    result = subprocess.run(
+        cmd,
+        cwd=project_dir,
+        env=_base_env(ray_cluster),
+        capture_output=True,
+        text=True,
+        timeout=180,
+    )
+
+    assert result.returncode == 0, (
+        f"subprocess Ray job failed (rc={result.returncode}).\n"
+        f"STDOUT:\n{textwrap.indent(result.stdout, '  ')}\n"
+        f"STDERR:\n{textwrap.indent(result.stderr, '  ')}"
+    )
+
+    artifact_count = _artifact_count(project_dir)
+    assert artifact_count > 0, (
+        "Subprocess Ray job succeeded but 0 S3 artifacts were captured in roar.db. "
+        "This means proxy logs were lost when the child processes exited without "
+        "calling ray.shutdown()."
+    )
+
+
+# ---------------------------------------------------------------------------
+# Test 2: Worker proxy endpoint is reachable from inside the cluster
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+def test_worker_proxy_endpoint_is_reachable(
+    ray_cluster: dict[str, str],
+    tmp_path: Path,
+) -> None:
+    """Workers must be able to connect to the AWS_ENDPOINT_URL they receive.
+
+    BUG 1 (currently failing): AWS_ENDPOINT_URL=http://127.0.0.1:19191 is injected
+    into the worker runtime env. Workers inside Docker containers get a loopback
+    address that only resolves on the HOST — not on the worker's own network
+    namespace — so socket.connect() fails with ECONNREFUSED.
+
+    After the fix, every worker should report reachable=True.
+    """
+    project_dir = tmp_path / "project"
+    _init_project(project_dir)
+
+    cmd = [
+        _roar_bin(),
+        "run",
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        ray_cluster["dashboard_url"],
+        "--working-dir",
+        str(JOBS_DIR),
+        "--",
+        "python",
+        "proxy_reachability_probe.py",
+    ]
+
+    result = subprocess.run(
+        cmd,
+        cwd=project_dir,
+        env=_base_env(ray_cluster),
+        capture_output=True,
+        text=True,
+        timeout=180,
+    )
+
+    assert result.returncode == 0, (
+        f"Probe job failed (rc={result.returncode}).\n"
+        f"STDOUT:\n{textwrap.indent(result.stdout, '  ')}\n"
+        f"STDERR:\n{textwrap.indent(result.stderr, '  ')}"
+    )
+
+    # Parse the JSON output from the probe — it's emitted to stdout by the job driver.
+    probe_output: dict | None = None
+    for line in result.stdout.splitlines():
+        line = line.strip()
+        if line.startswith('{"results"'):
+            try:
+                probe_output = json.loads(line)
+                break
+            except json.JSONDecodeError:
+                continue
+
+    assert probe_output is not None, f"Could not find probe JSON output in stdout:\n{result.stdout}"
+
+    worker_results: list[dict] = probe_output["results"]
+    unreachable = [r for r in worker_results if not r.get("reachable")]
+
+    assert not unreachable, (
+        f"{len(unreachable)}/{len(worker_results)} workers could not reach their proxy endpoint.\n"
+        + "\n".join(
+            f"  node={r['node_id'][:8]} endpoint={r['endpoint']} error={r['error']}"
+            for r in unreachable
+        )
+    )
+
+
+# ---------------------------------------------------------------------------
+# Test 3: Runtime env pip list has no duplicates (proves BUG 2)
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+def test_roar_run_runtime_env_pip_no_duplicates(
+    ray_cluster: dict[str, str],
+    tmp_path: Path,
+) -> None:
+    """The pip list injected into runtime_env must not contain duplicate entries.
+
+    BUG 2 (currently failing): when ROAR_CLUSTER_PIP_REQ is a URL-based requirement
+    (e.g. a presigned S3 URL), _merge_roar_runtime_env_pip() fails to recognise it
+    as an existing 'roar-cli' entry (because _requirement_name() returns the full URL).
+    If the user's runtime_env already contains the URL, it ends up duplicated.
+
+    This test passes ROAR_CLUSTER_PIP_REQ=<url> and a pre-populated pip list
+    containing the same URL, then inspects the runtime-env JSON that roar injects
+    into the ray job submit command.
+    """
+    FAKE_WHEEL_URL = (
+        "https://example.com/wheels/roar_cli-0.2.12-cp312-cp312-linux_x86_64.whl"
+        "?X-Amz-Signature=deadbeef"
+    )
+
+    project_dir = tmp_path / "project"
+    _init_project(project_dir)
+
+    # A pre-existing runtime env that already has the wheel URL in pip.
+    existing_runtime_env = json.dumps(
+        {
+            "pip": [FAKE_WHEEL_URL],
+            "env_vars": {},
+        }
+    )
+
+    env = _base_env(ray_cluster)
+    env["ROAR_CLUSTER_PIP_REQ"] = FAKE_WHEEL_URL
+
+    # Use --dry-run flag is not available, so we intercept via a no-op entrypoint
+    # and inspect the --runtime-env-json that roar passes through.
+    # Strategy: write a tiny Python script that captures sys.argv and exits 0.
+    spy_script = tmp_path / "spy.py"
+    spy_script.write_text(
+        textwrap.dedent("""\
+        import sys, json, pathlib
+        args = sys.argv[1:]
+        pathlib.Path('/tmp/roar_spy_args.json').write_text(json.dumps(args))
+        raise SystemExit(0)
+    """)
+    )
+
+    # Patch 'ray' in PATH to the spy script so roar run intercepts the final command.
+    fake_ray_dir = tmp_path / "fakebin"
+    fake_ray_dir.mkdir()
+    fake_ray = fake_ray_dir / "ray"
+    fake_ray.write_text(f'#!/bin/sh\nexec {sys.executable} {spy_script} "$@"\n')
+    fake_ray.chmod(0o755)
+
+    patched_env = dict(env)
+    patched_env["PATH"] = f"{fake_ray_dir}:{patched_env.get('PATH', '')}"
+
+    cmd = [
+        _roar_bin(),
+        "run",
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        ray_cluster["dashboard_url"],
+        "--runtime-env-json",
+        existing_runtime_env,
+        "--working-dir",
+        str(JOBS_DIR),
+        "--",
+        "python",
+        "s3_workload.py",
+    ]
+
+    subprocess.run(
+        cmd,
+        cwd=project_dir,
+        env=patched_env,
+        capture_output=True,
+        text=True,
+        timeout=30,
+    )
+
+    spy_output = Path("/tmp/roar_spy_args.json")
+    assert spy_output.exists(), "Spy script was never invoked — roar run did not call ray"
+
+    captured_args = json.loads(spy_output.read_text())
+
+    # Extract --runtime-env-json value from the captured args.
+    runtime_env_json: str | None = None
+    for i, arg in enumerate(captured_args):
+        if arg == "--runtime-env-json" and i + 1 < len(captured_args):
+            runtime_env_json = captured_args[i + 1]
+            break
+
+    assert runtime_env_json is not None, (
+        f"--runtime-env-json not found in captured args: {captured_args}"
+    )
+
+    runtime_env = json.loads(runtime_env_json)
+    pip_list: list[str] = runtime_env.get("pip", [])
+
+    # Normalise for comparison (strip whitespace).
+    normalised = [p.strip() for p in pip_list]
+    duplicates = [p for p in normalised if normalised.count(p) > 1]
+
+    assert not duplicates, (
+        "Duplicate pip entries found in runtime_env:\n"
+        + "\n".join(f"  {p}" for p in sorted(set(duplicates)))
+        + f"\nFull pip list: {pip_list}"
+    )
diff --git a/tests/e2e/ray/test_runtime_env_conflict.py b/tests/e2e/ray/test_runtime_env_conflict.py
new file mode 100644
index 00000000..f0756a88
--- /dev/null
+++ b/tests/e2e/ray/test_runtime_env_conflict.py
@@ -0,0 +1,168 @@
+from __future__ import annotations
+
+import json
+import os
+import shutil
+import subprocess
+import sys
+from pathlib import Path
+
+import pytest
+
+_CONFLICT_MARKER = "failed to merge the job's runtime env"
+_SUCCESS_MARKER = "RUNTIME_ENV_CONFLICT_OVERRIDE_OK"
+_RAY_ADDRESS = "http://localhost:8265"
+
+
+def _run_checked(command: list[str], cwd: Path) -> None:
+    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True, check=False)
+    if result.returncode != 0:
+        pytest.fail(
+            f"Command failed ({' '.join(command)}):\n"
+            f"stdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+        )
+
+
+def _init_clean_repo(project_dir: Path) -> None:
+    project_dir.mkdir(parents=True, exist_ok=True)
+    (project_dir / "README.md").write_text("runtime env conflict e2e\n", encoding="utf-8")
+    (project_dir / ".gitignore").write_text(".roar/\n", encoding="utf-8")
+
+    _run_checked(["git", "init"], cwd=project_dir)
+    _run_checked(["git", "config", "user.email", "e2e@example.com"], cwd=project_dir)
+    _run_checked(["git", "config", "user.name", "E2E"], cwd=project_dir)
+    _run_checked(["git", "add", "README.md", ".gitignore"], cwd=project_dir)
+    _run_checked(["git", "commit", "-m", "init"], cwd=project_dir)
+    _run_checked(
+        [sys.executable, "-m", "roar", "init", "--path", str(project_dir), "-n"],
+        cwd=project_dir,
+    )
+
+
+def _set_project_config(project_dir: Path) -> None:
+    config_path = project_dir / ".roar" / "config.toml"
+    config_text = config_path.read_text(encoding="utf-8")
+
+    if "pip_install = true" not in config_text:
+        config_text = config_text.replace("pip_install = false", "pip_install = true")
+    if 'default = "ptrace"' not in config_text:
+        config_text = config_text.replace('default = "auto"', 'default = "ptrace"')
+    config_text = config_text.replace('url = "https://api.glaas.ai"', 'url = ""')
+
+    config_path.write_text(config_text, encoding="utf-8")
+    assert "pip_install = true" in config_path.read_text(encoding="utf-8")
+
+
+def _maybe_skip_for_environment_errors(result: subprocess.CompletedProcess[str]) -> None:
+    output = f"{result.stdout}\n{result.stderr}".lower()
+
+    if result.returncode != 0 and "require the ray[default] installation" in output:
+        pytest.skip("Ray job submit requires ray[default] in this environment")
+
+    if result.returncode != 0 and any(
+        message in output
+        for message in (
+            "connection refused",
+            "failed to connect",
+            "unable to connect",
+            "cannot connect",
+            "timed out",
+            "deadline exceeded",
+        )
+    ):
+        pytest.skip("Ray dashboard became unreachable during job submission")
+
+
+def _run_submit(
+    project_dir: Path, *, override_job_runtime_env: bool
+) -> subprocess.CompletedProcess[str]:
+    candidate = Path(sys.executable).with_name("ray")
+    ray_binary = str(candidate) if candidate.exists() else shutil.which("ray")
+    if not ray_binary:
+        pytest.skip("ray CLI is not available in PATH")
+
+    probe = f"import ray; ray.init(); print('{_SUCCESS_MARKER}'); ray.shutdown()"
+    runtime_env: dict[str, object] = {
+        "pip": ["pydantic==2.12.5", "pydantic-settings==2.12.0"],
+    }
+
+    env = dict(os.environ)
+    if override_job_runtime_env:
+        env["RAY_OVERRIDE_JOB_RUNTIME_ENV"] = "1"
+        runtime_env["env_vars"] = {"RAY_OVERRIDE_JOB_RUNTIME_ENV": "1"}
+    else:
+        env.pop("RAY_OVERRIDE_JOB_RUNTIME_ENV", None)
+
+    result = subprocess.run(
+        [
+            sys.executable,
+            "-m",
+            "roar",
+            "run",
+            "--tracer",
+            "ptrace",
+            ray_binary,
+            "job",
+            "submit",
+            "--address",
+            _RAY_ADDRESS,
+            "--working-dir",
+            ".",
+            "--runtime-env-json",
+            json.dumps(runtime_env),
+            "--",
+            "python3",
+            "-c",
+            probe,
+        ],
+        cwd=project_dir,
+        capture_output=True,
+        text=True,
+        check=False,
+        timeout=180,
+        env=env,
+    )
+    _maybe_skip_for_environment_errors(result)
+    return result
+
+
+@pytest.mark.e2e
+def test_runtime_env_conflict_without_override(ray_cluster: dict[str, str], tmp_path: Path) -> None:
+    del ray_cluster
+    project_dir = tmp_path / "repo"
+    _init_clean_repo(project_dir)
+    _set_project_config(project_dir)
+
+    result = _run_submit(project_dir, override_job_runtime_env=False)
+    output = f"{result.stdout}\n{result.stderr}"
+    output_lower = output.lower()
+
+    assert result.returncode != 0, (
+        "Expected `roar run ray job submit` to fail without "
+        "RAY_OVERRIDE_JOB_RUNTIME_ENV=1 when ray.pip_install=true.\n"
+        f"output:\n{output}"
+    )
+    assert (
+        _CONFLICT_MARKER in output_lower or "failed to merge the job's runtime_env" in output_lower
+    ), output
+    assert "conflict" in output_lower, output
+
+
+@pytest.mark.e2e
+def test_runtime_env_conflict_succeeds_with_override(
+    ray_cluster: dict[str, str], tmp_path: Path
+) -> None:
+    del ray_cluster
+    project_dir = tmp_path / "repo"
+    _init_clean_repo(project_dir)
+    _set_project_config(project_dir)
+
+    result = _run_submit(project_dir, override_job_runtime_env=True)
+    output = f"{result.stdout}\n{result.stderr}"
+
+    assert result.returncode == 0, (
+        "Expected `roar run ray job submit` to succeed with "
+        "RAY_OVERRIDE_JOB_RUNTIME_ENV=1.\n"
+        f"output:\n{output}"
+    )
+    assert _SUCCESS_MARKER in output, output
diff --git a/tests/e2e/ray/test_s3_capture.py b/tests/e2e/ray/test_s3_capture.py
deleted file mode 100644
index 3014edfe..00000000
--- a/tests/e2e/ray/test_s3_capture.py
+++ /dev/null
@@ -1,114 +0,0 @@
-"""
-TDD: roar captures S3 I/O from Ray workers via the proxy.
-
-These tests define the target behaviour for S3 proxy injection into Ray workers.
-They FAIL until roar propagates AWS_ENDPOINT_URL to workers and collects proxy logs.
-
-Run against a live cluster:
-    pytest tests/e2e/ray/test_s3_capture.py -v --timeout=120
-"""
-
-from __future__ import annotations
-
-from pathlib import Path
-
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-class TestS3Capture:
-    """roar captures S3 operations performed by Ray workers."""
-
-    def test_worker_s3_put_appears_as_output_artifact(self, ray_cluster):
-        """
-        S3 PutObject calls made by Ray workers should be captured
-        as output artifacts via roar's S3 proxy.
-
-        FAILS until roar propagates AWS_ENDPOINT_URL into Ray workers
-        and collects proxy logs from each node.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/s3_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT path, source_type FROM artifacts WHERE source_type IN ('s3', 'proxy')",
-        )
-        assert len(rows) >= 1, (
-            "Expected an S3 artifact from the Ray worker's boto3 put_object call, "
-            "but none were captured. "
-            "roar is not yet routing Ray worker S3 traffic through the proxy."
-        )
-
-    def test_worker_s3_get_appears_as_input_artifact(self, ray_cluster):
-        """
-        S3 GetObject calls from Ray workers should appear as input artifacts.
-
-        FAILS until roar's proxy captures worker S3 traffic.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/s3_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT a.path FROM job_inputs ji "
-            "JOIN artifacts a ON ji.artifact_id = a.id "
-            "WHERE a.source_type IN ('s3', 'proxy')",
-        )
-        assert len(rows) >= 1, (
-            "Expected S3 GetObject from Ray worker to appear as a job input, "
-            "but none were captured."
-        )
-
-    def test_s3_artifact_has_etag(self, ray_cluster):
-        """
-        S3 artifacts captured via the proxy should include their ETag
-        (content hash) for content-based deduplication.
-
-        FAILS until roar's proxy collects and records ETags from worker traffic.
-        """
-        submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/s3_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT hash FROM artifacts WHERE source_type IN ('s3', 'proxy') AND hash IS NOT NULL",
-        )
-        assert len(rows) >= 1, (
-            "Expected S3 artifact to have a hash (ETag) from proxy capture, "
-            "but no hashed S3 artifacts were found."
-        )
-
-    def test_worker_s3_write_artifact_has_nonzero_size(self, ray_cluster):
-        """S3 write artifacts should persist non-zero size for non-empty object bodies."""
-        submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/s3_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT size FROM artifacts "
-            "WHERE source_type IN ('s3', 'proxy') AND path LIKE 's3://%' "
-            "ORDER BY first_seen_at DESC",
-        )
-        assert len(rows) >= 1, "Expected at least one captured S3 artifact."
-        assert any(int(row["size"]) > 0 for row in rows), (
-            "Expected at least one captured S3 write artifact with size > 0, "
-            "but all captured S3 artifact sizes were 0."
-        )
diff --git a/tests/e2e/ray/test_s3_pipeline.py b/tests/e2e/ray/test_s3_pipeline.py
index f3299012..4c312efa 100644
--- a/tests/e2e/ray/test_s3_pipeline.py
+++ b/tests/e2e/ray/test_s3_pipeline.py
@@ -1,18 +1,20 @@
-"""E2E coverage for a 3-stage S3 Ray pipeline."""
+"""E2E coverage for a 3-stage S3 Ray pipeline through the host submit path."""
 
 from __future__ import annotations
 
 import json
-import subprocess
 from pathlib import Path
 
 import pytest
 
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
 
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
 
 
 def _parse_run_id(stdout: str) -> str:
@@ -30,125 +32,100 @@ def _parse_run_id(stdout: str) -> str:
     raise AssertionError(f"Unable to parse run_id from output:\n{stdout}")
 
 
-def _run_pipeline() -> str:
-    stdout, stderr, returncode = submit_job_on_head(
-        COMPOSE_FILE,
-        f"{JOBS_DIR}/s3_pipeline.py",
-        env={"ROAR_WRAP": "1"},
+def _run_pipeline(project_dir: Path, ray_cluster: dict[str, str]) -> str:
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "s3_pipeline.py",
+        use_fragment_store=True,
     )
-    assert returncode == 0, f"Job failed:\n{stderr}\n{stdout}"
-    return _parse_run_id(stdout)
-
-
-@pytest.fixture(autouse=True)
-def reset_roar_state(ray_cluster):
-    del ray_cluster
-    subprocess.run(
-        [
-            "docker",
-            "compose",
-            "-f",
-            str(COMPOSE_FILE),
-            "exec",
-            "-T",
-            "ray-head",
-            "bash",
-            "-c",
-            "rm -rf /app/.roar /shared/.roar-logs && roar init --path /app -n",
-        ],
-        check=False,
-        capture_output=True,
-    )
-    yield
+    assert result.returncode == 0, result.stderr or result.stdout
+    return _parse_run_id(result.stdout)
 
 
 class TestS3Pipeline:
-    def test_all_s3_put_get_captured(self, ray_cluster):
-        del ray_cluster
-        run_id = _run_pipeline()
+    @pytest.fixture(autouse=True)
+    def _init_project(self) -> None:
+        self.project_dir = make_host_project_dir("s3-pipeline")
+        init_host_project(self.project_dir)
+
+    def test_all_s3_put_get_captured(self, ray_cluster: dict[str, str]) -> None:
+        run_id = _run_pipeline(self.project_dir, ray_cluster)
 
-        rows = _query_roar_db(
-            COMPOSE_FILE,
+        rows = query_roar_db(
+            self.project_dir,
             """
-            SELECT path
+            SELECT COALESCE(path, first_seen_path) AS path
             FROM artifacts
-            WHERE source_type IN ('s3', 'proxy')
-              AND path LIKE ?
+            WHERE COALESCE(path, first_seen_path) LIKE ?
             """,
             (f"%/{run_id}/%",),
         )
         assert len(rows) >= 13, f"Expected >=13 distinct S3 artifacts, got {len(rows)}: {rows}"
 
-    def test_cross_task_s3_artifact_identity(self, ray_cluster):
-        del ray_cluster
-        run_id = _run_pipeline()
+    def test_cross_task_s3_artifact_identity(self, ray_cluster: dict[str, str]) -> None:
+        run_id = _run_pipeline(self.project_dir, ray_cluster)
 
-        rows = _query_roar_db(
-            COMPOSE_FILE,
+        rows = query_roar_db(
+            self.project_dir,
             """
             SELECT a.id,
-                   a.path,
+                   COALESCE(a.path, a.first_seen_path) AS path,
                    COUNT(DISTINCT jo.job_id) AS producer_count,
                    COUNT(DISTINCT ji.job_id) AS consumer_count
             FROM artifacts a
             LEFT JOIN job_outputs jo ON jo.artifact_id = a.id
             LEFT JOIN job_inputs ji ON ji.artifact_id = a.id
-            WHERE a.path LIKE ?
+            WHERE COALESCE(a.path, a.first_seen_path) LIKE ?
             GROUP BY a.id
             HAVING producer_count > 0 AND consumer_count > 0
             """,
             (f"%processed/{run_id}/%",),
         )
-        assert len(rows) == 3, (
-            f"Expected 3 processed shard artifacts with both producer and consumer, got {len(rows)}."
-        )
+        assert len(rows) == 3
         for row in rows:
             assert row["producer_count"] == 1, f"{row['path']}: expected 1 producer"
             assert row["consumer_count"] == 1, f"{row['path']}: expected 1 consumer"
 
-    def test_model_artifacts_have_cross_task_identity(self, ray_cluster):
-        del ray_cluster
-        run_id = _run_pipeline()
+    def test_model_artifacts_have_cross_task_identity(self, ray_cluster: dict[str, str]) -> None:
+        run_id = _run_pipeline(self.project_dir, ray_cluster)
 
-        rows = _query_roar_db(
-            COMPOSE_FILE,
+        rows = query_roar_db(
+            self.project_dir,
             """
             SELECT a.id,
-                   a.path,
+                   COALESCE(a.path, a.first_seen_path) AS path,
                    COUNT(DISTINCT jo.job_id) AS producer_count,
                    COUNT(DISTINCT ji.job_id) AS consumer_count
             FROM artifacts a
             LEFT JOIN job_outputs jo ON jo.artifact_id = a.id
             LEFT JOIN job_inputs ji ON ji.artifact_id = a.id
-            WHERE a.path LIKE ?
+            WHERE COALESCE(a.path, a.first_seen_path) LIKE ?
             GROUP BY a.id
             HAVING producer_count > 0 AND consumer_count > 0
             """,
             (f"%models/{run_id}/%",),
         )
-        assert len(rows) == 3, (
-            f"Expected 3 model artifacts with both producer and consumer, got {len(rows)}."
-        )
+        assert len(rows) == 3
         for row in rows:
             assert row["producer_count"] == 1, f"{row['path']}: expected 1 producer"
             assert row["consumer_count"] == 1, f"{row['path']}: expected 1 consumer"
 
-    def test_no_orphaned_s3_artifacts(self, ray_cluster):
-        del ray_cluster
-        run_id = _run_pipeline()
+    def test_no_orphaned_s3_artifacts(self, ray_cluster: dict[str, str]) -> None:
+        run_id = _run_pipeline(self.project_dir, ray_cluster)
 
-        rows = _query_roar_db(
-            COMPOSE_FILE,
+        rows = query_roar_db(
+            self.project_dir,
             """
-            SELECT a.path,
+            SELECT COALESCE(a.path, a.first_seen_path) AS path,
                    COUNT(DISTINCT jo.job_id) AS producers,
                    COUNT(DISTINCT ji.job_id) AS consumers
             FROM artifacts a
             LEFT JOIN job_outputs jo ON jo.artifact_id = a.id
             LEFT JOIN job_inputs ji ON ji.artifact_id = a.id
-            WHERE a.path LIKE ?
-               OR a.path LIKE ?
-               OR a.path LIKE ?
+            WHERE COALESCE(a.path, a.first_seen_path) LIKE ?
+               OR COALESCE(a.path, a.first_seen_path) LIKE ?
+               OR COALESCE(a.path, a.first_seen_path) LIKE ?
             GROUP BY a.id
             """,
             (
@@ -157,32 +134,31 @@ def test_no_orphaned_s3_artifacts(self, ray_cluster):
                 f"%metrics/{run_id}/%",
             ),
         )
-        assert len(rows) > 0
+        assert rows
         for row in rows:
             assert row["producers"] >= 1, f"Orphaned artifact (no producer): {row['path']}"
             assert row["consumers"] >= 1, f"Dangling artifact (no consumer): {row['path']}"
 
-    def test_lineage_depth_reaches_raw_inputs(self, ray_cluster):
-        del ray_cluster
-        run_id = _run_pipeline()
+    def test_lineage_depth_reaches_raw_inputs(self, ray_cluster: dict[str, str]) -> None:
+        run_id = _run_pipeline(self.project_dir, ray_cluster)
 
-        report_jobs = _query_roar_db(
-            COMPOSE_FILE,
+        report_jobs = query_roar_db(
+            self.project_dir,
             """
-            SELECT j.id, j.command
+            SELECT j.id
             FROM jobs j
             JOIN job_outputs jo ON jo.job_id = j.id
             JOIN artifacts a ON jo.artifact_id = a.id
-            WHERE a.path LIKE ?
+            WHERE COALESCE(a.path, a.first_seen_path) LIKE ?
             """,
             (f"%results/{run_id}/final_report.json",),
         )
-        assert len(report_jobs) >= 1, "No job found that wrote final_report"
+        assert report_jobs, "No job found that wrote final_report"
 
         visited_jobs: set[int] = set()
         frontier = {int(report_jobs[0]["id"])}
         depth = 0
-        all_artifact_paths: set[str] = set()
+        all_paths: set[str] = set()
 
         while frontier and depth < 8:
             next_frontier: set[int] = set()
@@ -190,11 +166,11 @@ def test_lineage_depth_reaches_raw_inputs(self, ray_cluster):
                 if job_id in visited_jobs:
                     continue
                 visited_jobs.add(job_id)
-                inputs = _query_roar_db(
-                    COMPOSE_FILE,
+                inputs = query_roar_db(
+                    self.project_dir,
                     """
                     SELECT a.id,
-                           a.path,
+                           COALESCE(a.path, a.first_seen_path) AS path,
                            (
                                SELECT jo2.job_id
                                FROM job_outputs jo2
@@ -207,18 +183,17 @@ def test_lineage_depth_reaches_raw_inputs(self, ray_cluster):
                     """,
                     (job_id,),
                 )
-                for inp in inputs:
-                    path = inp.get("path")
+                for row in inputs:
+                    path = row.get("path")
                     if isinstance(path, str):
-                        all_artifact_paths.add(path)
-                    producer_job_id = inp.get("producer_job_id")
-                    if producer_job_id is not None:
-                        next_frontier.add(int(producer_job_id))
+                        all_paths.add(path)
+                    if row.get("producer_job_id") is not None:
+                        next_frontier.add(int(row["producer_job_id"]))
             frontier = next_frontier
             depth += 1
 
-        raw_paths = [path for path in all_artifact_paths if f"raw/{run_id}/" in path]
+        raw_paths = [path for path in all_paths if f"raw/{run_id}/" in path]
         assert len(raw_paths) == 3, (
             f"Expected 3 raw shard paths in lineage, got {len(raw_paths)}. "
-            f"All ancestor paths: {sorted(all_artifact_paths)}"
+            f"All ancestor paths: {sorted(all_paths)}"
         )
diff --git a/tests/e2e/ray/test_s3_tracking_gaps.py b/tests/e2e/ray/test_s3_tracking_gaps.py
new file mode 100644
index 00000000..c38d0566
--- /dev/null
+++ b/tests/e2e/ray/test_s3_tracking_gaps.py
@@ -0,0 +1,894 @@
+"""E2E gap-verification tests for Ray S3 tracking via `roar run ray job submit`."""
+
+from __future__ import annotations
+
+import base64
+import json
+import os
+import sqlite3
+import subprocess
+import sys
+import urllib.error
+import urllib.request
+import uuid
+from collections.abc import Callable
+from pathlib import Path
+from typing import Any
+from urllib.parse import urlparse
+
+import pytest
+
+RAY_DASHBOARD_URL = "http://localhost:8265/api/version"
+GLAAS_HEALTH_URL = "http://localhost:3001/api/v1/health"
+GLAAS_BASE_URL = "http://localhost:3001"
+RAY_JOB_ADDRESS = "http://localhost:8265"
+REPO_ROOT = Path(__file__).resolve().parents[3]
+JOBS_DIR = Path("tests/e2e/ray/jobs")
+
+pytestmark = pytest.mark.e2e
+
+
+def _http_get(url: str, timeout_seconds: int = 5) -> tuple[int, str]:
+    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
+        status = int(response.getcode())
+        body = response.read().decode("utf-8", errors="replace")
+        return status, body
+
+
+def _skip_if_services_unreachable() -> None:
+    checks = (
+        ("Ray dashboard", RAY_DASHBOARD_URL),
+        ("GLaaS", GLAAS_HEALTH_URL),
+    )
+    for service_name, url in checks:
+        try:
+            status, _body = _http_get(url)
+        except urllib.error.URLError as exc:
+            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
+        except (TimeoutError, ConnectionError, OSError) as exc:
+            pytest.skip(f"{service_name} not reachable at {url}: {exc}")
+        if status != 200:
+            pytest.skip(f"{service_name} not healthy at {url}: HTTP {status}")
+
+
+def _run_checked(command: list[str], cwd: Path) -> None:
+    result = subprocess.run(command, cwd=cwd, capture_output=True, text=True, check=False)
+    if result.returncode != 0:
+        pytest.fail(
+            f"Command failed ({' '.join(command)}):\n"
+            f"stdout:\n{result.stdout}\n\nstderr:\n{result.stderr}"
+        )
+
+
+def _init_clean_project(project_dir: Path) -> None:
+    project_dir.mkdir(parents=True, exist_ok=True)
+    (project_dir / "README.md").write_text("ray s3 tracking gaps e2e\n", encoding="utf-8")
+    (project_dir / ".gitignore").write_text(".roar/\n", encoding="utf-8")
+
+    _run_checked(["git", "init"], cwd=project_dir)
+    _run_checked(["git", "config", "user.email", "e2e@example.com"], cwd=project_dir)
+    _run_checked(["git", "config", "user.name", "E2E"], cwd=project_dir)
+    _run_checked(["git", "add", "README.md", ".gitignore"], cwd=project_dir)
+    _run_checked(["git", "commit", "-m", "init"], cwd=project_dir)
+    _run_checked(
+        [sys.executable, "-m", "roar", "init", "--path", str(project_dir), "-n"], cwd=project_dir
+    )
+
+    config_path = project_dir / ".roar" / "config.toml"
+    config_text = config_path.read_text(encoding="utf-8")
+    if 'url = "https://api.glaas.ai"' in config_text:
+        config_text = config_text.replace(
+            'url = "https://api.glaas.ai"',
+            f'url = "{GLAAS_BASE_URL}"',
+        )
+    config_path.write_text(config_text, encoding="utf-8")
+
+
+def _maybe_skip_transient_submit_failure(result: subprocess.CompletedProcess[str]) -> None:
+    output = f"{result.stdout}\n{result.stderr}".lower()
+    if result.returncode != 0 and "require the ray[default] installation" in output:
+        pytest.skip("Ray job submit requires ray[default] in this environment")
+    if result.returncode != 0 and any(
+        marker in output
+        for marker in (
+            "connection refused",
+            "failed to connect",
+            "unable to connect",
+            "cannot connect",
+            "timed out",
+            "deadline exceeded",
+        )
+    ):
+        pytest.skip("Ray or GLaaS became unreachable during submit")
+
+
+def _extract_json_objects(output: str) -> list[dict[str, Any]]:
+    payloads: list[dict[str, Any]] = []
+    for raw_line in output.splitlines():
+        line = raw_line.strip()
+        if not line:
+            continue
+        for index, char in enumerate(line):
+            if char != "{":
+                continue
+            try:
+                maybe_payload = json.loads(line[index:])
+            except json.JSONDecodeError:
+                continue
+            if isinstance(maybe_payload, dict):
+                payloads.append(maybe_payload)
+                break
+    return payloads
+
+
+def _require_payload(
+    output: str,
+    predicate: Callable[[dict[str, Any]], bool],
+    label: str,
+) -> dict[str, Any]:
+    payloads = [payload for payload in _extract_json_objects(output) if predicate(payload)]
+    assert payloads, f"{label}: unable to find matching JSON payload in submit output:\n{output}"
+    return payloads[-1]
+
+
+def _submit_job(
+    project_dir: Path,
+    script_name: str,
+    *,
+    script_args: list[str] | None = None,
+    runtime_env_vars: dict[str, str] | None = None,
+    tracer: str | None = None,
+    timeout: int = 300,
+) -> subprocess.CompletedProcess[str]:
+    command = [sys.executable, "-m", "roar", "run"]
+    if tracer:
+        command.extend(["--tracer", tracer])
+    command.extend(
+        [
+            "ray",
+            "job",
+            "submit",
+            "--address",
+            RAY_JOB_ADDRESS,
+            "--working-dir",
+            str(REPO_ROOT),
+        ]
+    )
+    if runtime_env_vars:
+        command.extend(
+            [
+                "--runtime-env-json",
+                json.dumps({"env_vars": runtime_env_vars}, separators=(",", ":")),
+            ]
+        )
+    command.extend(
+        [
+            "--",
+            "python3",
+            str(JOBS_DIR / script_name),
+        ]
+    )
+    if script_args:
+        command.extend(script_args)
+
+    env = dict(os.environ)
+    env.setdefault("GLAAS_URL", GLAAS_BASE_URL)
+    env.setdefault("GLAAS_API_URL", GLAAS_BASE_URL)
+    env.setdefault("RAY_OVERRIDE_JOB_RUNTIME_ENV", "1")
+
+    result = subprocess.run(
+        command,
+        cwd=project_dir,
+        capture_output=True,
+        text=True,
+        check=False,
+        timeout=timeout,
+        env=env,
+    )
+    _maybe_skip_transient_submit_failure(result)
+    return result
+
+
+def _query_project_db(
+    project_dir: Path, sql: str, params: tuple[Any, ...] = ()
+) -> list[dict[str, Any]]:
+    db_path = project_dir / ".roar" / "roar.db"
+    if not db_path.exists():
+        raise FileNotFoundError(f"Expected local roar DB at {db_path}, but it does not exist.")
+    connection = sqlite3.connect(db_path)
+    connection.row_factory = sqlite3.Row
+    try:
+        rows = connection.execute(sql, params).fetchall()
+        return [dict(row) for row in rows]
+    finally:
+        connection.close()
+
+
+def _manual_gap_runtime_env(job_id: str) -> dict[str, str]:
+    return {
+        "ROAR_JOB_ID": job_id,
+        "ROAR_RAY_NODE_AGENTS": "1",
+        "ROAR_WRAP": "1",
+        "PYTHONPATH": "/app/roar/services/execution/inject",
+        "AWS_ENDPOINT_URL": "",
+    }
+
+
+def _query_s3_lineage_rows(project_dir: Path, path_like: str) -> list[dict[str, Any]]:
+    return _query_project_db(
+        project_dir,
+        """
+        SELECT io_kind,
+               path,
+               COALESCE(hash, '') AS hash,
+               COALESCE(size, 0) AS size,
+               COALESCE(capture_method, '') AS capture_method
+        FROM (
+            SELECT 'read' AS io_kind, a.path, a.hash, a.size, a.capture_method
+            FROM job_inputs ji
+            JOIN artifacts a ON a.id = ji.artifact_id
+            UNION ALL
+            SELECT 'write' AS io_kind, a.path, a.hash, a.size, a.capture_method
+            FROM job_outputs jo
+            JOIN artifacts a ON a.id = jo.artifact_id
+        )
+        WHERE path LIKE ?
+        ORDER BY io_kind, path
+        """,
+        (path_like,),
+    )
+
+
+def _query_s3_hash_rows(project_dir: Path, path_like: str) -> list[dict[str, Any]]:
+    return _query_project_db(
+        project_dir,
+        """
+        SELECT a.path, ah.algorithm, ah.digest
+        FROM artifact_hashes ah
+        JOIN artifacts a ON a.id = ah.artifact_id
+        WHERE a.path LIKE ?
+        ORDER BY a.path, ah.algorithm
+        """,
+        (path_like,),
+    )
+
+
+def _latest_fragment_session(project_dir: Path) -> tuple[str, str]:
+    fragment_dir = project_dir / ".roar" / "fragment-sessions"
+    key_files = sorted(fragment_dir.glob("*.key"), key=lambda path: path.stat().st_mtime)
+    if not key_files:
+        return "", ""
+
+    key_payload = json.loads(key_files[-1].read_text(encoding="utf-8"))
+    session_id = str(key_payload.get("session_id", ""))
+    token = str(key_payload.get("token", ""))
+    return session_id, token
+
+
+def _fetch_fragment_batches(session_id: str, token: str) -> list[dict[str, Any]]:
+    if not session_id or not token:
+        return []
+
+    request = urllib.request.Request(
+        url=f"{GLAAS_BASE_URL}/api/v1/fragments/sessions/{session_id}/fragments",
+        headers={"x-roar-fragment-token": token},
+        method="GET",
+    )
+    with urllib.request.urlopen(request, timeout=10) as response:
+        payload = json.loads(response.read().decode("utf-8"))
+
+    rows = payload.get("data", {}).get("fragments", payload.get("fragments", []))
+    return [row for row in rows if isinstance(row, dict)] if isinstance(rows, list) else []
+
+
+def _decrypt_fragment_batches(token: str, batches: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    if not token or not batches:
+        return []
+
+    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
+
+    key = bytes.fromhex(token)
+    aesgcm = AESGCM(key)
+    fragments: list[dict[str, Any]] = []
+
+    def _sequence_key(row: dict[str, Any]) -> int:
+        raw = row.get("sequence")
+        try:
+            return int(raw)
+        except (TypeError, ValueError):
+            return 2**31 - 1
+
+    for row in sorted(batches, key=_sequence_key):
+        encrypted_batch = row.get("encrypted_batch")
+        if not isinstance(encrypted_batch, str) or not encrypted_batch:
+            continue
+
+        payload = base64.b64decode(encrypted_batch)
+        if len(payload) <= 12:
+            continue
+        plaintext = aesgcm.decrypt(payload[:12], payload[12:], None)
+        decoded = json.loads(plaintext.decode("utf-8"))
+        if isinstance(decoded, list):
+            fragments.extend(item for item in decoded if isinstance(item, dict))
+
+    return fragments
+
+
+def _s3_fragment_entries(fragments: list[dict[str, Any]]) -> list[dict[str, str]]:
+    entries: list[dict[str, str]] = []
+    for fragment in fragments:
+        for io_kind, field in (("read", "reads"), ("write", "writes")):
+            refs = fragment.get(field, [])
+            if not isinstance(refs, list):
+                continue
+            for ref in refs:
+                if not isinstance(ref, dict):
+                    continue
+                path = str(ref.get("path", ""))
+                if not path.startswith("s3://"):
+                    continue
+                entries.append(
+                    {
+                        "io_kind": io_kind,
+                        "path": path,
+                        "capture_method": str(ref.get("capture_method", "")),
+                    }
+                )
+    return entries
+
+
+@pytest.fixture
+def roar_project(tmp_path: Path, ray_cluster: dict[str, str]) -> Path:
+    del ray_cluster
+    _skip_if_services_unreachable()
+    project_dir = tmp_path / "repo"
+    _init_clean_project(project_dir)
+    return project_dir
+
+
+class TestP0Gaps:
+    def test_g1_sentinel_path_skips_node_agents(self, roar_project: Path) -> None:
+        job_id = f"gap1-{uuid.uuid4().hex[:8]}"
+        result = _submit_job(
+            roar_project,
+            "roar_diagnostic_probe.py",
+            script_args=["--check", "node-agents"],
+            runtime_env_vars=_manual_gap_runtime_env(job_id),
+            tracer="ptrace",
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G1: diagnostic probe should run successfully so node-agent spawn behavior can be "
+            f"validated. submit output:\n{output}"
+        )
+
+        payload = _require_payload(
+            output,
+            lambda item: item.get("check") == "node-agents",
+            "Gap G1 node-agent diagnostics",
+        )
+        alive_nodes = [item for item in payload.get("alive_nodes", []) if isinstance(item, dict)]
+        node_agents_found_count = int(payload.get("node_agents_found_count", -1))
+        missing_agent_names = [
+            str(name)
+            for name in payload.get("missing_agent_names", [])
+            if isinstance(name, str) and name
+        ]
+
+        assert alive_nodes, "Gap G1: expected at least one alive Ray node in diagnostic payload."
+        assert node_agents_found_count == len(alive_nodes), (
+            "Gap G1: sentinel path should spawn one node-agent actor per alive node, "
+            f"but found {node_agents_found_count} node agents for {len(alive_nodes)} alive nodes."
+        )
+        assert not missing_agent_names, (
+            "Gap G1: no node-agent actor names should be missing once the sentinel path spawns agents, "
+            f"but missing actors were reported: {missing_agent_names}"
+        )
+
+        expected_agent_names = {
+            str(name)
+            for name in payload.get("expected_agent_names", [])
+            if isinstance(name, str) and name
+        }
+        actors = [item for item in payload.get("actors", []) if isinstance(item, dict)]
+        alive_actor_names = {
+            str(item.get("name", ""))
+            for item in actors
+            if str(item.get("state", "")).upper() == "ALIVE" and str(item.get("name", ""))
+        }
+        assert expected_agent_names and expected_agent_names.issubset(alive_actor_names), (
+            "Gap G1: all expected node-agent actors should appear in Ray state as ALIVE, "
+            f"but expected={sorted(expected_agent_names)} alive={sorted(alive_actor_names)}"
+        )
+
+    def test_g2_worker_startup_missing_local_proxy_endpoint(self, roar_project: Path) -> None:
+        job_id = f"gap2-{uuid.uuid4().hex[:8]}"
+        result = _submit_job(
+            roar_project,
+            "roar_diagnostic_probe.py",
+            script_args=["--check", "node-agents,proxy-env"],
+            runtime_env_vars=_manual_gap_runtime_env(job_id),
+            tracer="ptrace",
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G2: diagnostic probe should run successfully so worker proxy endpoint wiring can "
+            f"be validated. submit output:\n{output}"
+        )
+
+        node_agents_payload = _require_payload(
+            output,
+            lambda item: item.get("check") == "node-agents",
+            "Gap G2 node-agent diagnostics",
+        )
+        proxy_payload = _require_payload(
+            output,
+            lambda item: item.get("check") == "proxy-env",
+            "Gap G2 worker proxy-env diagnostics",
+        )
+
+        node_agent_rows = [
+            item
+            for item in node_agents_payload.get("node_agents_found", [])
+            if isinstance(item, dict) and bool(item.get("found"))
+        ]
+        agent_ports_by_node: dict[str, int] = {}
+        for item in node_agent_rows:
+            node_id = str(item.get("node_id", ""))
+            proxy_port = item.get("proxy_port")
+            if node_id and isinstance(proxy_port, int) and proxy_port > 0:
+                agent_ports_by_node[node_id] = proxy_port
+
+        worker_env_rows = [
+            item
+            for item in proxy_payload.get("worker_env", [])
+            if isinstance(item, dict) and not item.get("error")
+        ]
+        assert worker_env_rows, (
+            "Gap G2: expected worker environment diagnostics from each node, but none were returned."
+        )
+        assert agent_ports_by_node, (
+            "Gap G2: expected node agents with discoverable proxy ports before checking worker "
+            "AWS_ENDPOINT_URL wiring."
+        )
+
+        for item in worker_env_rows:
+            endpoint = str(item.get("aws_endpoint_url", ""))
+            node_id = str(item.get("node_id") or item.get("expected_node_id") or "")
+            assert endpoint.startswith("http://127.0.0.1:"), (
+                "Gap G2: every worker should have AWS_ENDPOINT_URL set to its local proxy loopback "
+                f"endpoint, but worker node_id={node_id!r} reported {endpoint!r}."
+            )
+            try:
+                endpoint_port = urlparse(endpoint).port
+            except ValueError:
+                endpoint_port = None
+            assert isinstance(endpoint_port, int) and endpoint_port > 0, (
+                "Gap G2: worker AWS_ENDPOINT_URL should contain a valid proxy port, "
+                f"but node_id={node_id!r} reported endpoint {endpoint!r}."
+            )
+            assert node_id in agent_ports_by_node, (
+                "Gap G2: each worker probe should map to a node-agent proxy port for the same node, "
+                f"but node_id={node_id!r} was missing from node-agent diagnostics."
+            )
+            assert endpoint_port == agent_ports_by_node[node_id], (
+                "Gap G2: worker AWS_ENDPOINT_URL port should match its node-agent proxy port, "
+                f"but node_id={node_id!r} endpoint_port={endpoint_port} "
+                f"agent_port={agent_ports_by_node[node_id]}."
+            )
+
+    def test_g5_proxy_logs_should_flow_into_fragments_and_db_for_awscli(
+        self, roar_project: Path
+    ) -> None:
+        job_id = f"gap5-{uuid.uuid4().hex[:8]}"
+        result = _submit_job(
+            roar_project,
+            "s3_sdk_matrix.py",
+            script_args=["--include-awscli"],
+            runtime_env_vars=_manual_gap_runtime_env(job_id),
+            tracer="ptrace",
+            timeout=420,
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G5: sdk matrix workload should complete so awscli lineage assertions can run. "
+            f"submit output:\n{output}"
+        )
+
+        report = _require_payload(
+            output,
+            lambda item: item.get("script") == "s3_sdk_matrix",
+            "Gap G5 sdk matrix report",
+        )
+        run_id = str(report.get("run_id", ""))
+        result_rows = [item for item in report.get("results", []) if isinstance(item, dict)]
+        awscli_rows = [item for item in result_rows if str(item.get("method", "")) == "awscli"]
+        awscli_paths = {
+            str(path)
+            for row in awscli_rows
+            for path in (row.get("read_path"), row.get("write_path"))
+            if isinstance(path, str) and path.startswith("s3://")
+        }
+
+        assert run_id, "Gap G5: sdk matrix report should include a non-empty run_id."
+        assert awscli_rows and awscli_paths, (
+            "Gap G5: sdk matrix with --include-awscli should report awscli read/write paths."
+        )
+
+        lineage_rows = _query_s3_lineage_rows(roar_project, f"s3://%/sdk-matrix/{run_id}/%")
+        rows_by_path: dict[str, list[dict[str, Any]]] = {}
+        for row in lineage_rows:
+            path = str(row.get("path", ""))
+            rows_by_path.setdefault(path, []).append(row)
+
+        for path in sorted(awscli_paths):
+            path_rows = rows_by_path.get(path, [])
+            assert path_rows, (
+                "Gap G5: awscli S3 operations should be present in DB lineage, "
+                f"but path {path!r} was missing from lineage rows."
+            )
+            assert all(str(row.get("capture_method", "")) == "proxy" for row in path_rows), (
+                "Gap G5: awscli paths should be captured via proxy in DB lineage, "
+                f"but non-proxy rows were found for path {path!r}: {path_rows}"
+            )
+            io_kinds = {str(row.get("io_kind", "")) for row in path_rows}
+            assert {"read", "write"}.issubset(io_kinds), (
+                "Gap G5: awscli path should have both read and write lineage edges, "
+                f"but path {path!r} had io kinds {sorted(io_kinds)}."
+            )
+
+        session_id, token = _latest_fragment_session(roar_project)
+        assert session_id and token, (
+            "Gap G5: fragment session key should exist after submit for fragment verification."
+        )
+
+        fragment_batches = _fetch_fragment_batches(session_id, token)
+        assert fragment_batches, (
+            "Gap G5: fragment API should return encrypted batches for this session."
+        )
+
+        fragments = _decrypt_fragment_batches(token, fragment_batches)
+        assert fragments, (
+            "Gap G5: encrypted fragment batches should decrypt into fragment payloads."
+        )
+
+        fragment_paths = {
+            entry["path"]
+            for entry in _s3_fragment_entries(fragments)
+            if entry.get("capture_method") == "proxy"
+        }
+        for path in sorted(awscli_paths):
+            assert path in fragment_paths, (
+                "Gap G5: awscli S3 operations should be preserved in streamed fragments with "
+                f"capture_method=proxy, but path {path!r} was not found in decrypted fragments."
+            )
+
+
+class TestP0HappyPaths:
+    def test_hp1_clean_submit_should_produce_proxy_backed_s3_lineage(
+        self, roar_project: Path
+    ) -> None:
+        result = _submit_job(roar_project, "s3_io.py")
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "HP1: clean submit of s3_io.py should complete successfully before lineage checks. "
+            f"submit output:\n{output}"
+        )
+
+        expected_path = "s3://test-bucket/jobs/s3_io.txt"
+        lineage_rows = _query_s3_lineage_rows(roar_project, expected_path)
+        assert lineage_rows, (
+            "HP1: minimal S3 workload should produce DB lineage rows for the expected S3 path, "
+            f"but no rows were found for {expected_path!r}."
+        )
+
+        io_kinds = {str(row.get("io_kind", "")) for row in lineage_rows}
+        assert {"read", "write"}.issubset(io_kinds), (
+            "HP1: minimal workload should include both read and write edges for the S3 object, "
+            f"but observed io kinds were {sorted(io_kinds)}."
+        )
+        assert all(str(row.get("capture_method", "")) == "proxy" for row in lineage_rows), (
+            "HP1: all S3 lineage for minimal workload should be proxy-captured, but non-proxy "
+            f"capture methods were found: {lineage_rows}"
+        )
+
+        write_sizes = [
+            int(row.get("size", 0))
+            for row in lineage_rows
+            if str(row.get("io_kind", "")) == "write"
+        ]
+        assert write_sizes and all(size > 0 for size in write_sizes), (
+            "HP1: non-empty S3 writes should record non-zero sizes, "
+            f"but observed write sizes were {write_sizes}."
+        )
+
+        all_s3_rows = _query_s3_lineage_rows(roar_project, "s3://%")
+        python_rows = [row for row in all_s3_rows if str(row.get("capture_method", "")) == "python"]
+        assert not python_rows, (
+            "HP1: S3 lineage should not rely on python capture hooks; all rows should be proxy-backed, "
+            f"but python-captured S3 rows were found: {python_rows}"
+        )
+
+        hash_rows = _query_s3_hash_rows(roar_project, expected_path)
+        assert hash_rows, (
+            "HP1: S3 lineage should include artifact_hashes rows for the tracked object, "
+            f"but no hash rows were found for {expected_path!r}."
+        )
+
+        session_id, token = _latest_fragment_session(roar_project)
+        assert session_id and token, "HP1: fragment session key should exist after clean submit."
+
+        fragment_batches = _fetch_fragment_batches(session_id, token)
+        assert fragment_batches, (
+            "HP1: fragment API should return encrypted batches for the job session."
+        )
+
+        fragments = _decrypt_fragment_batches(token, fragment_batches)
+        fragment_entries = _s3_fragment_entries(fragments)
+        proxy_entries = [
+            entry
+            for entry in fragment_entries
+            if entry.get("path") == expected_path and entry.get("capture_method") == "proxy"
+        ]
+        assert proxy_entries, (
+            "HP1: decrypted fragments should include the expected S3 path with capture_method=proxy, "
+            f"but no matching entries were found for {expected_path!r}."
+        )
+
+    def test_hp2_pipeline_should_produce_proxy_backed_cross_stage_s3_lineage(
+        self, roar_project: Path
+    ) -> None:
+        result = _submit_job(roar_project, "s3_pipeline.py", timeout=420)
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "HP2: clean submit of s3_pipeline.py should complete before end-to-end lineage checks. "
+            f"submit output:\n{output}"
+        )
+
+        report = _require_payload(
+            output,
+            lambda item: (
+                isinstance(item.get("run_id"), str) and isinstance(item.get("report_key"), str)
+            ),
+            "HP2 pipeline report",
+        )
+        run_id = str(report.get("run_id", ""))
+        report_key = str(report.get("report_key", ""))
+        assert run_id and report_key.startswith("s3://"), (
+            "HP2: pipeline output should include a non-empty run_id and S3 report_key, "
+            f"but got run_id={run_id!r}, report_key={report_key!r}."
+        )
+
+        lineage_rows = _query_s3_lineage_rows(roar_project, f"s3://%/{run_id}/%")
+        assert lineage_rows, (
+            "HP2: pipeline run should produce S3 lineage rows for the run_id namespace, "
+            f"but none were found for run_id={run_id!r}."
+        )
+
+        family_markers = {
+            "raw": f"/raw/{run_id}/",
+            "processed": f"/processed/{run_id}/",
+            "models": f"/models/{run_id}/",
+            "metrics": f"/metrics/{run_id}/",
+            "results": f"/results/{run_id}/",
+        }
+        for family, marker in family_markers.items():
+            family_rows = [row for row in lineage_rows if marker in str(row.get("path", ""))]
+            assert family_rows, (
+                "HP2: pipeline lineage should include every expected artifact family, "
+                f"but no rows were found for family={family!r} marker={marker!r}."
+            )
+
+        def _kinds_for(marker: str) -> set[str]:
+            return {
+                str(row.get("io_kind", ""))
+                for row in lineage_rows
+                if marker in str(row.get("path", ""))
+            }
+
+        assert "read" in _kinds_for(family_markers["raw"]), (
+            "HP2: raw inputs should appear as read lineage edges in downstream tasks."
+        )
+        assert {"read", "write"}.issubset(_kinds_for(family_markers["processed"])), (
+            "HP2: processed artifacts should be both written by ingest tasks and read by train tasks."
+        )
+        assert {"read", "write"}.issubset(_kinds_for(family_markers["models"])), (
+            "HP2: model artifacts should be both written by train tasks and read by eval tasks."
+        )
+        assert "write" in _kinds_for(family_markers["metrics"]), (
+            "HP2: metrics artifacts should be written during evaluation."
+        )
+        assert "write" in _kinds_for(family_markers["results"]), (
+            "HP2: final report artifact should be written to S3."
+        )
+
+        assert all(str(row.get("capture_method", "")) == "proxy" for row in lineage_rows), (
+            "HP2: all S3 lineage rows in the pipeline should be proxy-captured, "
+            f"but non-proxy rows were found: {lineage_rows}"
+        )
+
+        write_sizes = [
+            int(row.get("size", 0))
+            for row in lineage_rows
+            if str(row.get("io_kind", "")) == "write"
+        ]
+        assert write_sizes and all(size > 0 for size in write_sizes), (
+            "HP2: pipeline S3 writes should record non-zero artifact sizes, "
+            f"but observed write sizes were {write_sizes}."
+        )
+
+        hash_rows = _query_s3_hash_rows(roar_project, f"s3://%/{run_id}/%")
+        assert hash_rows, (
+            "HP2: pipeline artifacts should include artifact_hashes rows (ETag/hash metadata), "
+            "but none were found for this run_id."
+        )
+
+        hashed_paths = {str(row.get("path", "")) for row in hash_rows}
+        for family, marker in family_markers.items():
+            assert any(marker in path for path in hashed_paths), (
+                "HP2: each artifact family should have hash coverage in artifact_hashes, "
+                f"but family={family!r} marker={marker!r} was missing from hash rows."
+            )
+
+        session_id, token = _latest_fragment_session(roar_project)
+        assert session_id and token, "HP2: fragment session key should exist after pipeline submit."
+
+        fragment_batches = _fetch_fragment_batches(session_id, token)
+        assert fragment_batches, (
+            "HP2: fragment API should return encrypted batches for pipeline run."
+        )
+
+        fragments = _decrypt_fragment_batches(token, fragment_batches)
+        fragment_entries = _s3_fragment_entries(fragments)
+        run_entries = [
+            entry for entry in fragment_entries if f"/{run_id}/" in entry.get("path", "")
+        ]
+        assert run_entries, (
+            "HP2: decrypted fragments should include S3 entries for the pipeline run_id, "
+            f"but none were found for run_id={run_id!r}."
+        )
+        assert all(entry.get("capture_method") == "proxy" for entry in run_entries), (
+            "HP2: pipeline fragment entries should be proxy-captured, "
+            f"but non-proxy entries were found: {run_entries}"
+        )
+        assert any(
+            entry.get("path") == report_key and entry.get("io_kind") == "write"
+            for entry in run_entries
+        ), (
+            "HP2: fragments should include a write entry for the final report artifact, "
+            f"but report_key={report_key!r} was not present as a write entry."
+        )
+
+
+class TestP1Gaps:
+    def test_g3_submit_rewrite_missing_node_agent_and_wrap_env_injection(
+        self, roar_project: Path
+    ) -> None:
+        result = _submit_job(
+            roar_project,
+            "roar_diagnostic_probe.py",
+            script_args=["--check", "collector"],
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G3: diagnostic probe should run successfully so submit-time env injection can be "
+            f"validated. submit output:\n{output}"
+        )
+
+        payload = _require_payload(
+            output,
+            lambda item: item.get("check") == "collector",
+            "Gap G3 driver-env diagnostics",
+        )
+        driver_env = payload.get("driver_env", {})
+        assert isinstance(driver_env, dict), (
+            "Gap G3: diagnostic payload should include a driver_env snapshot for submit rewrite checks."
+        )
+
+        assert str(driver_env.get("ROAR_JOB_INSTRUMENTED", "")) == "1", (
+            "Gap G3: submit rewrite should inject ROAR_JOB_INSTRUMENTED=1 for wrapped Ray jobs, "
+            f"but observed value was {driver_env.get('ROAR_JOB_INSTRUMENTED')!r}."
+        )
+        assert str(driver_env.get("ROAR_RAY_NODE_AGENTS", "")) == "1", (
+            "Gap G3: submit rewrite should inject ROAR_RAY_NODE_AGENTS=1 automatically, "
+            f"but observed value was {driver_env.get('ROAR_RAY_NODE_AGENTS')!r}."
+        )
+        assert str(driver_env.get("ROAR_WRAP", "")) == "1", (
+            "Gap G3: submit rewrite should inject ROAR_WRAP=1 automatically, "
+            f"but observed value was {driver_env.get('ROAR_WRAP')!r}."
+        )
+        assert str(driver_env.get("GLAAS_URL", "")) != "", (
+            "Gap G3: submit rewrite should propagate GLAAS_URL into the driver runtime environment."
+        )
+        assert str(driver_env.get("ROAR_SESSION_ID", "")) != "", (
+            "Gap G3: submit rewrite should inject ROAR_SESSION_ID for fragment streaming."
+        )
+        assert str(driver_env.get("ROAR_FRAGMENT_TOKEN", "")) != "", (
+            "Gap G3: submit rewrite should inject ROAR_FRAGMENT_TOKEN for fragment streaming."
+        )
+
+    def test_g4_sdk_matrix_not_fully_proxy_captured(self, roar_project: Path) -> None:
+        job_id = f"gap4-{uuid.uuid4().hex[:8]}"
+        result = _submit_job(
+            roar_project,
+            "s3_sdk_matrix.py",
+            runtime_env_vars=_manual_gap_runtime_env(job_id),
+            tracer="ptrace",
+            timeout=360,
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G4: sdk matrix workload should complete so proxy capture coverage can be validated. "
+            f"submit output:\n{output}"
+        )
+
+        report = _require_payload(
+            output,
+            lambda item: item.get("script") == "s3_sdk_matrix",
+            "Gap G4 sdk matrix report",
+        )
+        run_id = str(report.get("run_id", ""))
+        results = [item for item in report.get("results", []) if isinstance(item, dict)]
+
+        expected_paths = {
+            str(path)
+            for item in results
+            for path in (item.get("write_path"), item.get("read_path"))
+            if isinstance(path, str) and path.startswith("s3://")
+        }
+        assert run_id and expected_paths, (
+            "Gap G4: sdk matrix report should include run_id and expected S3 paths for validation."
+        )
+
+        lineage_rows = _query_s3_lineage_rows(roar_project, f"s3://%/sdk-matrix/{run_id}/%")
+        lineage_paths = {
+            str(row.get("path", "")) for row in lineage_rows if str(row.get("path", ""))
+        }
+        missing_paths = sorted(expected_paths - lineage_paths)
+        assert not missing_paths, (
+            "Gap G4: every SDK call-path S3 object should appear in DB lineage, "
+            f"but missing paths were: {missing_paths}"
+        )
+
+        non_proxy_rows = [
+            row
+            for row in lineage_rows
+            if str(row.get("path", "")) in expected_paths
+            and str(row.get("capture_method", "")) != "proxy"
+        ]
+        assert not non_proxy_rows, (
+            "Gap G4: all SDK call-path S3 lineage should be capture_method=proxy, "
+            f"but non-proxy rows were found: {non_proxy_rows}"
+        )
+
+        hash_rows = _query_s3_hash_rows(roar_project, f"s3://%/sdk-matrix/{run_id}/%")
+        hash_paths = {str(row.get("path", "")) for row in hash_rows if str(row.get("path", ""))}
+        missing_hash_paths = sorted(expected_paths - hash_paths)
+        assert not missing_hash_paths, (
+            "Gap G4: every SDK call-path S3 artifact should have artifact_hashes coverage, "
+            f"but missing hash rows were: {missing_hash_paths}"
+        )
+
+
+class TestP2Gaps:
+    def test_g6_collector_actor_still_created_in_sentinel_path(self, roar_project: Path) -> None:
+        result = _submit_job(
+            roar_project,
+            "roar_diagnostic_probe.py",
+            script_args=["--check", "collector"],
+        )
+        output = f"{result.stdout}\n{result.stderr}"
+        assert result.returncode == 0, (
+            "Gap G6: diagnostic probe should run successfully so collector actor presence can be "
+            f"validated. submit output:\n{output}"
+        )
+
+        payload = _require_payload(
+            output,
+            lambda item: item.get("check") == "collector",
+            "Gap G6 collector diagnostics",
+        )
+        assert bool(payload.get("collector_exists")) is False, (
+            "Gap G6: sentinel path should no longer create roar-log-collector actors, "
+            "but diagnostics reported collector_exists=True."
+        )
diff --git a/tests/e2e/ray/test_setup_hook_crash.py b/tests/e2e/ray/test_setup_hook_crash.py
new file mode 100644
index 00000000..80fc5870
--- /dev/null
+++ b/tests/e2e/ray/test_setup_hook_crash.py
@@ -0,0 +1,69 @@
+"""E2E probe for worker_process_setup_hook failures."""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+from typing import Any
+
+import pytest
+
+from tests.e2e.ray.conftest import submit_job_on_head
+
+COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
+JOBS_DIR = "/app/tests/e2e/ray/jobs"
+_SEGFAULT_MARKERS = (
+    "sigsegv",
+    "segmentation fault",
+    "getnamedactorinfo",
+)
+
+
+def _parse_json_line(stdout: str) -> dict[str, Any]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return payload
+    return {}
+
+
+@pytest.mark.e2e
+@pytest.mark.ray_e2e
+@pytest.mark.ray_diagnostic
+def test_worker_process_setup_hook_job_succeeds(ray_cluster: dict[str, str]) -> None:
+    del ray_cluster
+
+    stdout, stderr, returncode = submit_job_on_head(
+        COMPOSE_FILE,
+        f"{JOBS_DIR}/setup_hook_probe.py",
+    )
+    combined_output = "\n".join(part for part in (stdout, stderr) if part)
+    assert returncode == 0, f"Probe runner failed:\n{combined_output}"
+
+    payload = _parse_json_line(stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{combined_output}"
+
+    status = str(payload.get("status") or "")
+    logs = str(payload.get("logs") or "")
+    segfault_text = "\n".join(part for part in (combined_output, logs) if part).lower()
+
+    assert status == "SUCCEEDED", (
+        "Expected the submitted Ray job to succeed when "
+        "`roar.ray.roar_worker._startup` runs as `worker_process_setup_hook`.\n"
+        f"payload={json.dumps(payload, sort_keys=True)}\n"
+        f"stdout:\n{stdout}\n"
+        f"stderr:\n{stderr}"
+    )
+    assert not any(marker in segfault_text for marker in _SEGFAULT_MARKERS), (
+        "Expected Ray job output to avoid segfault markers after deferring proxy "
+        "endpoint lookup until the first tracked open.\n"
+        f"payload={json.dumps(payload, sort_keys=True)}\n"
+        f"stdout:\n{stdout}\n"
+        f"stderr:\n{stderr}"
+    )
diff --git a/tests/e2e/ray/test_task_attribution.py b/tests/e2e/ray/test_task_attribution.py
deleted file mode 100644
index 5e69206b..00000000
--- a/tests/e2e/ray/test_task_attribution.py
+++ /dev/null
@@ -1,115 +0,0 @@
-"""
-TDD: roar attributes file I/O to specific Ray tasks.
-
-Per-task attribution is the highest-value lineage feature — knowing not just
-*what* was read/written, but *which task* did it.
-
-These tests FAIL until roar injects task context into workers and records
-task_id alongside each I/O event.
-
-Run against a live cluster:
-    pytest tests/e2e/ray/test_task_attribution.py -v --timeout=120
-"""
-
-from __future__ import annotations
-
-import json
-from pathlib import Path
-
-from tests.e2e.ray.conftest import submit_job_on_head
-from tests.e2e.ray.test_file_io_capture import _query_roar_db
-
-COMPOSE_FILE = Path(__file__).resolve().parent / "docker-compose.yml"
-JOBS_DIR = "/app/tests/e2e/ray/jobs"
-
-
-class TestTaskAttribution:
-    """Each I/O event is tagged with the Ray task ID that caused it."""
-
-    def test_each_output_has_task_id(self, ray_cluster):
-        """
-        Every artifact written by a Ray task should be tagged with the
-        Ray task ID (from ray.get_runtime_context().get_task_id()).
-
-        FAILS until roar captures task context alongside file I/O.
-        """
-        _stdout, stderr, returncode = submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/attributed_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-        assert returncode == 0, f"Job failed:\n{stderr}"
-
-        # roar should record task_id in artifact metadata or a separate table
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT a.first_seen_path AS path, a.metadata "
-            "FROM artifacts a "
-            "WHERE a.first_seen_path LIKE '%attributed%'",
-        )
-        assert len(rows) >= 6, (
-            f"Expected 6 attributed output files, got {len(rows)}. Workers may not be instrumented."
-        )
-
-        missing_task_id = []
-        for row in rows:
-            metadata = json.loads(row["metadata"] or "{}")
-            if not metadata.get("ray_task_id"):
-                missing_task_id.append(row["path"])
-
-        assert not missing_task_id, (
-            f"These artifacts are missing ray_task_id in metadata: {missing_task_id}. "
-            "roar is not yet recording Ray task context with I/O events."
-        )
-
-    def test_distinct_tasks_produce_distinct_attributions(self, ray_cluster):
-        """
-        Six tasks writing six different files should produce six distinct
-        task IDs in the lineage records.
-
-        FAILS until per-task attribution is implemented.
-        """
-        submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/attributed_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT DISTINCT json_extract(metadata, '$.ray_task_id') as task_id "
-            "FROM artifacts "
-            "WHERE first_seen_path LIKE '%attributed%' AND metadata IS NOT NULL",
-        )
-        task_ids = {r["task_id"] for r in rows if r["task_id"]}
-        assert len(task_ids) >= 6, (
-            f"Expected ≥ 6 distinct ray_task_ids, got {len(task_ids)}: {task_ids}. "
-            "Each task must be independently attributed."
-        )
-
-    def test_reader_task_linked_to_writer_tasks(self, ray_cluster):
-        """
-        The task that reads the outputs of multiple writer tasks should have
-        those files recorded as its inputs, creating a task-level DAG edge.
-
-        FAILS until roar tracks per-task input/output relationships.
-        """
-        submit_job_on_head(
-            COMPOSE_FILE,
-            f"{JOBS_DIR}/attributed_file_io.py",
-            env={"ROAR_WRAP": "1"},
-        )
-
-        # The reader task reads 6 files written by 6 writer tasks
-        rows = _query_roar_db(
-            COMPOSE_FILE,
-            "SELECT COUNT(*) as cnt "
-            "FROM job_inputs ji "
-            "JOIN artifacts a ON ji.artifact_id = a.id "
-            "WHERE ji.path LIKE '%attributed%'",
-        )
-        count = rows[0]["cnt"] if rows else 0
-        assert count >= 6, (
-            f"Expected ≥ 6 reader-task inputs, got {count}. "
-            "Reader task is not recording its file reads as lineage inputs."
-        )
diff --git a/tests/e2e/ray/test_worker_bootstrap_contract.py b/tests/e2e/ray/test_worker_bootstrap_contract.py
new file mode 100644
index 00000000..2db5ba6e
--- /dev/null
+++ b/tests/e2e/ray/test_worker_bootstrap_contract.py
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+import json
+
+import pytest
+
+from tests.e2e.ray.conftest import (
+    init_host_project,
+    make_host_project_dir,
+    query_roar_db,
+    run_roar_ray_job_from_host,
+)
+
+pytestmark = [pytest.mark.e2e, pytest.mark.ray_contract, pytest.mark.timeout(180)]
+
+
+def _parse_json_line(stdout: str) -> dict[str, str]:
+    for line in reversed(stdout.splitlines()):
+        stripped = line.strip()
+        if not stripped:
+            continue
+        try:
+            payload = json.loads(stripped)
+        except json.JSONDecodeError:
+            continue
+        if isinstance(payload, dict):
+            return {str(key): str(value) for key, value in payload.items()}
+    return {}
+
+
+def test_host_submit_worker_bootstrap_reconstitutes_worker_file_lineage(
+    ray_cluster: dict[str, str],
+) -> None:
+    project_dir = make_host_project_dir("worker-bootstrap")
+    init_host_project(project_dir)
+
+    result = run_roar_ray_job_from_host(
+        project_dir,
+        ray_cluster,
+        "worker_bootstrap_probe.py",
+        use_fragment_store=True,
+    )
+
+    assert result.returncode == 0, result.stderr or result.stdout
+    payload = _parse_json_line(result.stdout)
+    assert payload, f"Expected JSON payload in stdout, got:\n{result.stdout}"
+    assert payload["body"] == "worker bootstrap probe\n"
+    assert payload["aws_endpoint_url"].startswith("http://127.0.0.1:"), payload
+
+    rows = query_roar_db(
+        project_dir,
+        """
+        SELECT j.script,
+               json_extract(j.metadata, '$.ray_task_id') AS ray_task_id,
+               COALESCE(a.path, a.first_seen_path) AS path,
+               a.capture_method
+        FROM jobs j
+        JOIN job_outputs jo ON jo.job_id = j.id
+        JOIN artifacts a ON a.id = jo.artifact_id
+        WHERE j.job_type = 'ray_task'
+          AND COALESCE(a.path, a.first_seen_path) LIKE ?
+        ORDER BY j.id
+        """,
+        ("%/artifacts/worker_bootstrap_probe/output.txt",),
+    )
+
+    assert rows, "Expected worker bootstrap probe output in the reconstituted roar.db"
+    assert all(str(row.get("capture_method") or "") for row in rows), rows
+    assert all(str(row.get("script") or "").endswith("._probe") for row in rows), rows
+    assert all(str(row.get("ray_task_id") or "") for row in rows), rows
diff --git a/tests/unit/put/test_composite_builder.py b/tests/unit/put/test_composite_builder.py
index 44745ca8..12249b27 100644
--- a/tests/unit/put/test_composite_builder.py
+++ b/tests/unit/put/test_composite_builder.py
@@ -6,7 +6,7 @@
 
 import blake3
 
-from roar.services.put.composite_builder import CompositeArtifactBuilder
+from roar.services.put.composite_builder import CompositeArtifactBuilder, CompositeLeaf
 from roar.services.put.resolver import ResolvedSource
 
 
@@ -195,3 +195,37 @@ def test_bloom_parameters_target_point_one_percent_false_positive_rate():
 
     assert bloom_hashes == 10
     assert estimated_false_positive_rate <= 0.001
+
+
+def test_builder_supports_prehashed_leaves_with_non_blake_component_algorithms():
+    builder = CompositeArtifactBuilder()
+
+    result = builder.build_for_leaves(
+        root_path="s3://test-bucket/sensor_data",
+        leaves=[
+            CompositeLeaf(
+                relative_path="shard_000000.parquet",
+                digest="11" * 16,
+                size=128,
+                component_type="application/vnd.apache.parquet",
+                component_algorithm="etag",
+            ),
+            CompositeLeaf(
+                relative_path="shard_000001.parquet",
+                digest="22" * 16,
+                size=256,
+                component_type="application/vnd.apache.parquet",
+                component_algorithm="etag",
+            ),
+        ],
+        session_hash="",
+        source_type="s3",
+    )
+
+    assert result is not None
+    assert result.root_path == "s3://test-bucket/sensor_data"
+    assert result.payload["hashes"][0]["algorithm"] == "composite-blake3"
+    assert [item["component_algorithm"] for item in result.payload["components"]] == [
+        "etag",
+        "etag",
+    ]
diff --git a/tests/unit/ray/test_actor.py b/tests/unit/ray/test_actor.py
index 067fca49..eb4a89ac 100644
--- a/tests/unit/ray/test_actor.py
+++ b/tests/unit/ray/test_actor.py
@@ -27,10 +27,15 @@ def ray_runtime() -> None:
         ray.shutdown()
 
 
-def test_actor_append_batch_and_get_all(ray_runtime) -> None:
+def test_actor_ping(ray_runtime) -> None:
     actor = RoarLogCollectorActor.remote()
+    assert ray.get(actor.ping.remote()) is True
 
-    ray.get(
+
+def test_actor_append_batch_is_noop(ray_runtime) -> None:
+    actor = RoarLogCollectorActor.remote()
+
+    result = ray.get(
         actor.append_batch.remote(
             [
                 {"task_id": "task-1", "path": "/tmp/in.csv", "mode": "r"},
@@ -39,90 +44,15 @@ def test_actor_append_batch_and_get_all(ray_runtime) -> None:
         )
     )
 
-    events = ray.get(actor.get_all.remote())
-    assert events == [
-        {"task_id": "task-1", "path": "/tmp/in.csv", "mode": "r"},
-        {"task_id": "task-1", "path": "/tmp/out.csv", "mode": "w"},
-    ]
-
-
-def test_actor_supports_concurrent_appends(ray_runtime) -> None:
-    actor = RoarLogCollectorActor.remote()
-
-    refs = [
-        actor.append_batch.remote([{"task_id": f"task-{index}", "seq": index}])
-        for index in range(200)
-    ]
-    ray.get(refs)
-
-    events = ray.get(actor.get_all.remote())
-    assert len(events) == 200
-    assert {event["seq"] for event in events} == set(range(200))
-
-
-def test_actor_get_all_fragments_is_empty_by_default(ray_runtime) -> None:
-    actor = RoarLogCollectorActor.remote()
-
-    fragments = ray.get(actor.get_all_fragments.remote())
-    assert fragments == []
-
-
-def test_actor_append_fragment_and_get_all_fragments(ray_runtime) -> None:
-    actor = RoarLogCollectorActor.remote()
-
-    ray.get(
-        actor.append_fragment.remote(
-            {
-                "job_uid": "abcd1234",
-                "parent_job_uid": "deadbeef",
-                "ray_task_id": "task-1",
-            }
-        )
-    )
-    ray.get(
-        actor.append_fragment.remote(
-            {
-                "job_uid": "abcd5678",
-                "parent_job_uid": "deadbeef",
-                "ray_task_id": "task-2",
-            }
-        )
-    )
-
-    fragments = ray.get(actor.get_all_fragments.remote())
-    assert fragments == [
-        {"job_uid": "abcd1234", "parent_job_uid": "deadbeef", "ray_task_id": "task-1"},
-        {"job_uid": "abcd5678", "parent_job_uid": "deadbeef", "ray_task_id": "task-2"},
-    ]
+    assert result is None
 
 
-def test_actor_fragment_and_event_apis_coexist(ray_runtime) -> None:
+def test_actor_accepts_fragment_append_without_streamer(ray_runtime) -> None:
     actor = RoarLogCollectorActor.remote()
-
-    ray.get(actor.append_batch.remote([{"task_id": "task-1", "path": "/tmp/in.csv", "mode": "r"}]))
-    ray.get(actor.append_fragment.remote({"job_uid": "feedcafe", "ray_task_id": "task-1"}))
-
-    events = ray.get(actor.get_all.remote())
-    fragments = ray.get(actor.get_all_fragments.remote())
-
-    assert events == [{"task_id": "task-1", "path": "/tmp/in.csv", "mode": "r"}]
-    assert fragments == [{"job_uid": "feedcafe", "ray_task_id": "task-1"}]
+    result = ray.get(actor.append_fragment.remote({"job_uid": "feedcafe", "ray_task_id": "t-1"}))
+    assert result is None
 
 
 def test_actor_flush_to_glaas_is_true_without_streamer(ray_runtime) -> None:
     actor = RoarLogCollectorActor.remote()
-
     assert ray.get(actor.flush_to_glaas.remote()) is True
-
-
-def test_actor_streamer_config_keeps_in_memory_fragments(ray_runtime) -> None:
-    actor = RoarLogCollectorActor.remote(
-        session_id="session-123",
-        token="ab" * 32,
-        glaas_url="http://localhost:3001",
-    )
-
-    ray.get(actor.append_fragment.remote({"job_uid": "feedcafe", "ray_task_id": "task-1"}))
-
-    fragments = ray.get(actor.get_all_fragments.remote())
-    assert fragments == [{"job_uid": "feedcafe", "ray_task_id": "task-1"}]
diff --git a/tests/unit/ray/test_backend_detection.py b/tests/unit/ray/test_backend_detection.py
deleted file mode 100644
index 229b8a05..00000000
--- a/tests/unit/ray/test_backend_detection.py
+++ /dev/null
@@ -1,52 +0,0 @@
-from __future__ import annotations
-
-from pathlib import Path
-
-import pytest
-
-from roar.ray import worker
-
-
-@pytest.fixture(autouse=True)
-def _reset_worker_state(monkeypatch: pytest.MonkeyPatch) -> None:
-    monkeypatch.setattr(worker, "_LOG_DIR", "")
-    monkeypatch.setattr(worker, "_BACKEND", "filesystem")
-    monkeypatch.setattr(worker, "_actor", None)
-    monkeypatch.setattr(worker, "_event_buffer", [])
-
-
-def test_choose_backend_prefers_env_override(
-    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
-) -> None:
-    monkeypatch.setattr(worker, "_LOG_DIR", str(tmp_path / "logs"))
-
-    monkeypatch.setenv("ROAR_LOG_BACKEND", "actor")
-    assert worker._choose_backend() == "actor"
-
-    monkeypatch.setenv("ROAR_LOG_BACKEND", "filesystem")
-    assert worker._choose_backend() == "filesystem"
-
-
-def test_choose_backend_uses_sentinel_write(
-    monkeypatch: pytest.MonkeyPatch, tmp_path: Path
-) -> None:
-    log_dir = tmp_path / "logs"
-    monkeypatch.setattr(worker, "_LOG_DIR", str(log_dir))
-    monkeypatch.delenv("ROAR_LOG_BACKEND", raising=False)
-
-    assert worker._choose_backend() == "filesystem"
-    assert not list(log_dir.glob(".roar-sentinel-*"))
-
-
-def test_choose_backend_falls_back_to_actor_when_sentinel_fails(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    monkeypatch.setattr(worker, "_LOG_DIR", str(tmp_path / "logs"))
-    monkeypatch.delenv("ROAR_LOG_BACKEND", raising=False)
-
-    def _raise_oserror(*_args, **_kwargs):
-        raise OSError("read-only")
-
-    monkeypatch.setattr(worker, "_real_open", _raise_oserror)
-    assert worker._choose_backend() == "actor"
diff --git a/tests/unit/ray/test_collector_actor.py b/tests/unit/ray/test_collector_actor.py
deleted file mode 100644
index b5159a33..00000000
--- a/tests/unit/ray/test_collector_actor.py
+++ /dev/null
@@ -1,95 +0,0 @@
-from __future__ import annotations
-
-import json
-import sys
-from pathlib import Path
-
-import pytest
-
-from roar.ray import collector
-
-
-class _FakeRemoteMethod:
-    def __init__(self, value):
-        self._value = value
-
-    def remote(self):
-        return self._value
-
-
-class _FakeActor:
-    def __init__(self, events):
-        self.get_all = _FakeRemoteMethod(events)
-        self.flush_to_glaas = _FakeRemoteMethod(True)
-
-
-class _FakeRayWithActor:
-    def __init__(self, events):
-        self.actor = _FakeActor(events)
-        self.get_actor_calls: list[tuple[str, str | None]] = []
-        self.get_calls: list[tuple[object, int | None]] = []
-        self.killed = False
-
-    def is_initialized(self) -> bool:
-        return True
-
-    def get_actor(self, name: str, namespace: str | None = None):
-        self.get_actor_calls.append((name, namespace))
-        return self.actor
-
-    def get(self, value, timeout: int | None = None):
-        self.get_calls.append((value, timeout))
-        return value
-
-    def kill(self, actor) -> None:
-        if actor is self.actor:
-            self.killed = True
-
-
-class _FakeRayNoActor:
-    def is_initialized(self) -> bool:
-        return True
-
-    def get_actor(self, _name: str, namespace: str | None = None):
-        del namespace
-        raise ValueError("actor not found")
-
-
-def test_collect_events_prefers_actor_when_ray_is_initialized(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    fallback_log = tmp_path / "task-fallback.jsonl"
-    fallback_log.write_text('{"task_id":"task-fallback","path":"/tmp/fs.txt","mode":"w"}\n')
-
-    fake_ray = _FakeRayWithActor([{"task_id": "task-actor", "path": "/tmp/actor.txt", "mode": "w"}])
-    monkeypatch.setitem(sys.modules, "ray", fake_ray)
-    monkeypatch.setenv("ROAR_JOB_ID", "job1234")
-
-    events = collector._collect_events(tmp_path)
-
-    assert set(events) == {"task-actor"}
-    assert events["task-actor"][0]["path"] == "/tmp/actor.txt"
-    assert fake_ray.get_actor_calls == [("roar-log-collector-job1234", "roar")]
-    assert fake_ray.get_calls == [
-        ([{"task_id": "task-actor", "path": "/tmp/actor.txt", "mode": "w"}], 30),
-        (True, 5),
-    ]
-    assert fake_ray.killed is True
-
-
-def test_collect_events_falls_back_to_filesystem_when_actor_unavailable(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    log_file = tmp_path / "task-fs.jsonl"
-    log_file.write_text(
-        json.dumps({"task_id": "task-fs", "path": "/tmp/fs.txt", "mode": "r"}) + "\n"
-    )
-
-    monkeypatch.setitem(sys.modules, "ray", _FakeRayNoActor())
-
-    events = collector._collect_events(tmp_path)
-
-    assert set(events) == {"task-fs"}
-    assert events["task-fs"][0]["path"] == "/tmp/fs.txt"
diff --git a/tests/unit/ray/test_collector_fragments.py b/tests/unit/ray/test_collector_fragments.py
index d61b106e..9d85349b 100644
--- a/tests/unit/ray/test_collector_fragments.py
+++ b/tests/unit/ray/test_collector_fragments.py
@@ -22,7 +22,7 @@ def _init_db(project_dir: Path) -> Path:
 
 def _ref(hash_value: str) -> ArtifactRef:
     return ArtifactRef(
-        path=f"/tmp/{hash_value}",
+        path=f"/workspace/{hash_value}",
         hash=hash_value,
         hash_algorithm="blake3",
         size=0,
@@ -122,7 +122,7 @@ def test_collect_fragments_writes_task_jobs_and_deduplicates_artifacts(tmp_path:
         exit_code=0,
         reads=[
             ArtifactRef(
-                path="/tmp/input-1.bin",
+                path="/workspace/input-1.bin",
                 hash=input_1_digest,
                 hash_algorithm="blake3",
                 size=10,
@@ -131,7 +131,7 @@ def test_collect_fragments_writes_task_jobs_and_deduplicates_artifacts(tmp_path:
         ],
         writes=[
             ArtifactRef(
-                path="/tmp/shared-output.bin",
+                path="/workspace/shared-output.bin",
                 hash=shared_digest,
                 hash_algorithm="blake3",
                 size=20,
@@ -153,7 +153,7 @@ def test_collect_fragments_writes_task_jobs_and_deduplicates_artifacts(tmp_path:
         exit_code=0,
         reads=[
             ArtifactRef(
-                path="/tmp/input-2.bin",
+                path="/workspace/input-2.bin",
                 hash=input_2_digest,
                 hash_algorithm="blake3",
                 size=11,
@@ -162,7 +162,7 @@ def test_collect_fragments_writes_task_jobs_and_deduplicates_artifacts(tmp_path:
         ],
         writes=[
             ArtifactRef(
-                path="/tmp/shared-output.bin",
+                path="/workspace/shared-output.bin",
                 hash=shared_digest,
                 hash_algorithm="blake3",
                 size=20,
@@ -213,7 +213,7 @@ def test_collect_fragments_writes_task_jobs_and_deduplicates_artifacts(tmp_path:
     ).fetchall()
     assert len(output_rows) == 2
     assert {row["artifact_id"] for row in output_rows} == {shared_artifact_id}
-    assert {row["path"] for row in output_rows} == {"/tmp/shared-output.bin"}
+    assert {row["path"] for row in output_rows} == {"/workspace/shared-output.bin"}
 
     input_rows = conn.execute(
         """
@@ -281,6 +281,56 @@ def test_collect_fragments_persists_artifact_size_from_fragment_refs(tmp_path: P
     conn.close()
 
 
+def test_collect_fragments_shortens_command_to_task_family_and_keeps_full_script(
+    tmp_path: Path,
+) -> None:
+    project_dir = tmp_path / "project"
+    db_path = _init_db(project_dir)
+
+    fragment = TaskFragment(
+        job_uid="shortcmd1",
+        parent_job_uid="abc",
+        ray_task_id="task-short",
+        ray_worker_id="worker-1",
+        ray_node_id="node-1",
+        ray_actor_id=None,
+        function_name="cloud_demo_like.workload.extract_shard",
+        started_at=1.0,
+        ended_at=2.0,
+        exit_code=0,
+        writes=[
+            ArtifactRef(
+                path="s3://demo-bucket/output.json",
+                hash="etag-1",
+                hash_algorithm="etag",
+                size=1,
+                capture_method="proxy",
+            )
+        ],
+    )
+
+    collect_fragments(
+        fragments=[fragment.to_dict()],
+        project_dir=str(project_dir),
+        driver_job_uid="abc",
+    )
+
+    conn = sqlite3.connect(db_path)
+    conn.row_factory = sqlite3.Row
+    row = conn.execute(
+        """
+        SELECT command, script
+        FROM jobs
+        WHERE job_uid = ?
+        """,
+        (fragment.job_uid,),
+    ).fetchone()
+    assert row is not None
+    assert row["command"] == "ray_task:extract_shard"
+    assert row["script"] == "cloud_demo_like.workload.extract_shard"
+    conn.close()
+
+
 def test_collect_fragments_assigns_step_numbers_from_fragment_dependencies(tmp_path: Path) -> None:
     project_dir = tmp_path / "project"
     db_path = _init_db(project_dir)
@@ -316,3 +366,91 @@ def test_collect_fragments_assigns_step_numbers_from_fragment_dependencies(tmp_p
 
     step_map = {row["job_uid"]: row["step_number"] for row in rows}
     assert step_map == {"ingest01": 2, "train002": 3, "eval0003": 4}
+
+
+def test_collect_fragments_is_idempotent_when_fragment_reads_and_writes_same_path(
+    tmp_path: Path,
+) -> None:
+    project_dir = tmp_path / "project"
+    db_path = _init_db(project_dir)
+
+    fragment = TaskFragment(
+        job_uid="same-path-1",
+        parent_job_uid="abc",
+        ray_task_id="task-1",
+        ray_worker_id="worker-1",
+        ray_node_id="node-1",
+        ray_actor_id=None,
+        function_name="process",
+        started_at=1.0,
+        ended_at=2.0,
+        exit_code=0,
+        reads=[
+            ArtifactRef(
+                path="/workspace/data.json",
+                hash=None,
+                hash_algorithm="blake3",
+                size=0,
+                capture_method="python",
+            )
+        ],
+        writes=[
+            ArtifactRef(
+                path="/workspace/data.json",
+                hash="d" * 64,
+                hash_algorithm="blake3",
+                size=12,
+                capture_method="python",
+            )
+        ],
+    )
+
+    payload = [fragment.to_dict()]
+    collect_fragments(
+        fragments=payload,
+        project_dir=str(project_dir),
+        driver_job_uid="abc",
+    )
+    collect_fragments(
+        fragments=payload,
+        project_dir=str(project_dir),
+        driver_job_uid="abc",
+    )
+
+    conn = sqlite3.connect(db_path)
+    conn.row_factory = sqlite3.Row
+    try:
+        row = conn.execute(
+            """
+            SELECT ji.artifact_id AS input_artifact_id, jo.artifact_id AS output_artifact_id
+            FROM job_inputs ji
+            JOIN job_outputs jo ON jo.job_id = ji.job_id AND jo.path = ji.path
+            JOIN jobs j ON j.id = ji.job_id
+            WHERE j.job_uid = ?
+            """,
+            ("same-path-1",),
+        ).fetchone()
+        counts = conn.execute("SELECT COUNT(*) AS count FROM job_inputs").fetchone()
+    finally:
+        conn.close()
+
+    assert row is not None
+    assert row["input_artifact_id"] == row["output_artifact_id"]
+    assert int(counts["count"]) == 1
+
+
+def test_normalize_reconstituted_path_restores_ray_working_dir_to_project_dir(
+    tmp_path: Path,
+) -> None:
+    project_dir = tmp_path / "project"
+    packaged_path = (
+        "/tmp/ray/session_2026-01-01_00-00-00_000000_0/"
+        "runtime_resources/working_dir_files/_ray_pkg_deadbeef/artifacts/output.json"
+    )
+
+    normalized = ray_collector._normalize_reconstituted_path(
+        packaged_path,
+        project_dir=str(project_dir),
+    )
+
+    assert normalized == str((project_dir / "artifacts" / "output.json").resolve(strict=False))
diff --git a/tests/unit/ray/test_collector_proxy_logs.py b/tests/unit/ray/test_collector_proxy_logs.py
index 7306de87..a34be0ea 100644
--- a/tests/unit/ray/test_collector_proxy_logs.py
+++ b/tests/unit/ray/test_collector_proxy_logs.py
@@ -19,44 +19,9 @@ def _init_db(project_dir: Path) -> Path:
     return db_path
 
 
-def test_merge_proxy_logs_parses_s3_lines_into_events() -> None:
-    task_events: dict[str, list[dict[str, object]]] = {}
-    proxy_logs = {
-        "node-abc": {
-            "node_id": "node-abc",
-            "proxy_log_lines": [
-                "ROAR_PROXY_READY port=12345",
-                "[S3:GetObject] s3://bucket/input.csv  etag=etag-in",
-                "[S3:PutObject] s3://bucket/output.csv  etag=etag-out",
-            ],
-        }
-    }
-
-    collector._merge_proxy_logs(task_events, proxy_logs)
-
-    assert set(task_events) == {"proxy-node-abc"}
-    events = task_events["proxy-node-abc"]
-    assert len(events) == 2
-
-    read_event, write_event = events
-    assert read_event["path"] == "s3://bucket/input.csv"
-    assert read_event["mode"] == "r"
-    assert read_event["capture_method"] == "proxy"
-    assert read_event["hash"] == "etag-in"
-    assert read_event["hash_algorithm"] == "etag"
-
-    assert write_event["path"] == "s3://bucket/output.csv"
-    assert write_event["mode"] == "w"
-    assert write_event["capture_method"] == "proxy"
-    assert write_event["hash"] == "etag-out"
-    assert write_event["hash_algorithm"] == "etag"
-
-
-def test_collect_records_hash_rows_from_fragment_events(tmp_path: Path, monkeypatch) -> None:
+def test_collect_records_hash_rows_from_fragments(tmp_path: Path) -> None:
     project_dir = tmp_path / "project"
     db_path = _init_db(project_dir)
-    log_dir = tmp_path / "logs"
-    log_dir.mkdir(parents=True, exist_ok=True)
 
     fragment = TaskFragment(
         job_uid="abcd1234",
@@ -81,9 +46,7 @@ def test_collect_records_hash_rows_from_fragment_events(tmp_path: Path, monkeypa
         ],
     )
 
-    monkeypatch.setattr(collector, "_collect_actor_payload", lambda: ([], [fragment.to_dict()]))
-
-    collector.collect(project_dir=str(project_dir), log_dir=str(log_dir))
+    collector.collect(project_dir=str(project_dir), fragments=[fragment.to_dict()])
 
     conn = sqlite3.connect(db_path)
     hash_row = conn.execute(
@@ -108,3 +71,10 @@ def test_collect_records_hash_rows_from_fragment_events(tmp_path: Path, monkeypa
 
     assert hash_row == ("etag", "etag-final-123")
     assert output_row == ("s3://output-bucket/results/run-1/final_report.json",)
+
+
+def test_collect_noops_without_explicit_fragments(tmp_path: Path) -> None:
+    project_dir = tmp_path / "project"
+    _init_db(project_dir)
+
+    collector.collect(project_dir=str(project_dir), fragments=[])
diff --git a/tests/unit/ray/test_driver_entrypoint.py b/tests/unit/ray/test_driver_entrypoint.py
new file mode 100644
index 00000000..efe949bd
--- /dev/null
+++ b/tests/unit/ray/test_driver_entrypoint.py
@@ -0,0 +1,155 @@
+from __future__ import annotations
+
+import types
+
+from roar.ray import driver_entrypoint
+from roar.services.execution.proxy import S3LogEntry
+
+
+def test_build_driver_proxy_fragment_maps_s3_entries_to_reads_and_writes(monkeypatch) -> None:
+    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
+
+    fragment = driver_entrypoint._build_driver_proxy_fragment(
+        [
+            S3LogEntry(
+                operation="PutObject",
+                bucket="bucket",
+                key="output.json",
+                etag="etag-out",
+                size_bytes=12,
+            ),
+            S3LogEntry(
+                operation="GetObject",
+                bucket="bucket",
+                key="input.json",
+                etag="etag-in",
+                size_bytes=34,
+            ),
+        ],
+        started_at=1.0,
+        ended_at=2.0,
+        exit_code=0,
+    )
+
+    assert fragment is not None
+    assert fragment.parent_job_uid == "job-123"
+    assert fragment.function_name == "s3_driver_proxy"
+    assert [ref.path for ref in fragment.writes] == ["s3://bucket/output.json"]
+    assert [ref.hash for ref in fragment.writes] == ["etag-out"]
+    assert [ref.path for ref in fragment.reads] == ["s3://bucket/input.json"]
+    assert [ref.hash for ref in fragment.reads] == ["etag-in"]
+
+
+def test_emit_driver_proxy_fragment_streams_to_glaas_when_session_is_present(
+    monkeypatch,
+) -> None:
+    fragment = driver_entrypoint.TaskFragment(
+        job_uid="task-1",
+        parent_job_uid="job-1",
+        ray_task_id="proxy:driver",
+        ray_worker_id="",
+        ray_node_id="driver",
+        ray_actor_id=None,
+        function_name="s3_driver_proxy",
+        started_at=1.0,
+        ended_at=2.0,
+        exit_code=0,
+    )
+
+    calls: list[tuple[str, object]] = []
+
+    class _FakeStreamer:
+        def __init__(self, *, session_id: str, token: str, glaas_url: str) -> None:
+            calls.append(("init", (session_id, token, glaas_url)))
+
+        def append_fragment(self, payload: dict[str, object]) -> None:
+            calls.append(("append", payload))
+
+        def close(self) -> None:
+            calls.append(("close", None))
+
+    monkeypatch.setattr(driver_entrypoint, "GlaasFragmentStreamer", _FakeStreamer)
+    monkeypatch.setattr(
+        driver_entrypoint,
+        "collect_fragments",
+        lambda *args, **kwargs: calls.append(("collect", args or kwargs)),
+    )
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-1")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "ab" * 32)
+    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
+
+    driver_entrypoint._emit_driver_proxy_fragment(fragment)
+
+    assert calls[0] == ("init", ("session-1", "ab" * 32, "http://localhost:3001"))
+    assert calls[1][0] == "append"
+    assert calls[2] == ("close", None)
+    assert all(kind != "collect" for kind, _payload in calls)
+
+
+def test_start_driver_proxy_uses_fixed_local_port(monkeypatch) -> None:
+    calls: list[dict[str, object]] = []
+
+    class _FakeProxyService:
+        def start_for_run(
+            self,
+            *,
+            session_id: str | None = None,
+            job_id: str | None = None,
+            upstream_url: str | None = None,
+            port: int | None = None,
+        ):
+            calls.append(
+                {
+                    "session_id": session_id,
+                    "job_id": job_id,
+                    "upstream_url": upstream_url,
+                    "port": port,
+                }
+            )
+            return types.SimpleNamespace(port=port)
+
+    monkeypatch.setattr(driver_entrypoint, "ProxyService", _FakeProxyService)
+    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
+    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://127.0.0.1:19191")
+    monkeypatch.delenv("ROAR_UPSTREAM_S3_ENDPOINT", raising=False)
+
+    _service, handle = driver_entrypoint._start_driver_proxy()
+
+    assert handle is not None
+    assert calls == [
+        {
+            "session_id": None,
+            "job_id": "job-123",
+            "upstream_url": None,
+            "port": 19191,
+        }
+    ]
+
+
+def test_main_preserves_loopback_proxy_endpoint_for_child_process(monkeypatch) -> None:
+    captured: dict[str, object] = {}
+
+    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://127.0.0.1:19191")
+
+    class _FakeService:
+        def stop_for_run(self, handle) -> list[S3LogEntry]:
+            return []
+
+    def _fake_start_driver_proxy():
+        return _FakeService(), types.SimpleNamespace(port=19191)
+
+    def _fake_run_child(argv, env):
+        captured["argv"] = list(argv)
+        captured["env"] = dict(env)
+        return 0
+
+    monkeypatch.setattr(driver_entrypoint, "_start_driver_proxy", _fake_start_driver_proxy)
+    monkeypatch.setattr(driver_entrypoint, "_run_child", _fake_run_child)
+
+    exit_code = driver_entrypoint.main(["python", "main.py"])
+
+    assert exit_code == 0
+    assert captured["argv"] == ["python", "main.py"]
+    env = captured["env"]
+    assert env["AWS_ENDPOINT_URL"] == "http://127.0.0.1:19191"
+    assert env["ROAR_PROXY_PORT"] == "19191"
diff --git a/tests/unit/ray/test_fragment_reconstituter.py b/tests/unit/ray/test_fragment_reconstituter.py
index d2cc49ee..271477df 100644
--- a/tests/unit/ray/test_fragment_reconstituter.py
+++ b/tests/unit/ray/test_fragment_reconstituter.py
@@ -43,6 +43,169 @@ def _fragments_payload(items: list[dict[str, object]]) -> bytes:
     return json.dumps({"fragments": items}, separators=(",", ":")).encode("utf-8")
 
 
+def _wrapped_fragments_payload(items: list[dict[str, object]]) -> bytes:
+    return json.dumps(
+        {"success": True, "data": {"fragments": items}, "meta": {"page": 1}},
+        separators=(",", ":"),
+    ).encode("utf-8")
+
+
+def test_fetch_batches_reads_wrapped_glaas_response(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    module = _module()
+    response_body = _wrapped_fragments_payload(
+        [
+            {"sequence": 1, "encrypted_batch": "batch-1"},
+            {"sequence": 0, "encrypted_batch": "batch-0"},
+        ]
+    )
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del request
+        assert timeout == 5
+        return _FakeHttpResponse(response_body)
+
+    monkeypatch.setattr(module.urllib.request, "urlopen", _fake_urlopen)
+    reconstituter = module.FragmentReconstituter(
+        session_id="session-fetch-wrapped",
+        token="ab" * 32,
+        glaas_url="http://localhost:3001",
+        roar_db_path=tmp_path / ".roar" / "roar.db",
+    )
+
+    assert reconstituter._fetch_batches() == [
+        {"sequence": 0, "encrypted_batch": "batch-0"},
+        {"sequence": 1, "encrypted_batch": "batch-1"},
+    ]
+
+
+def test_fetch_batches_supports_flat_response_fallback(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    module = _module()
+    response_body = _fragments_payload(
+        [
+            {"sequence": 1, "encrypted_batch": "batch-1"},
+            {"sequence": 0, "encrypted_batch": "batch-0"},
+        ]
+    )
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del request
+        assert timeout == 5
+        return _FakeHttpResponse(response_body)
+
+    monkeypatch.setattr(module.urllib.request, "urlopen", _fake_urlopen)
+    reconstituter = module.FragmentReconstituter(
+        session_id="session-fetch-flat",
+        token="ab" * 32,
+        glaas_url="http://localhost:3001",
+        roar_db_path=tmp_path / ".roar" / "roar.db",
+    )
+
+    assert reconstituter._fetch_batches() == [
+        {"sequence": 0, "encrypted_batch": "batch-0"},
+        {"sequence": 1, "encrypted_batch": "batch-1"},
+    ]
+
+
+def test_fetch_batches_returns_empty_list_for_empty_wrapped_fragments(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    module = _module()
+    response_body = _wrapped_fragments_payload([])
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del request, timeout
+        return _FakeHttpResponse(response_body)
+
+    monkeypatch.setattr(module.urllib.request, "urlopen", _fake_urlopen)
+    reconstituter = module.FragmentReconstituter(
+        session_id="session-fetch-empty",
+        token="ab" * 32,
+        glaas_url="http://localhost:3001",
+        roar_db_path=tmp_path / ".roar" / "roar.db",
+    )
+
+    assert reconstituter._fetch_batches() == []
+
+
+def test_fetch_batches_warns_and_returns_empty_when_fragments_are_missing(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    module = _module()
+    response_body = json.dumps({"success": True}, separators=(",", ":")).encode("utf-8")
+
+    warnings: list[str] = []
+
+    class _FakeLogger:
+        def warning(self, message: str, *args: object) -> None:
+            warnings.append(message % args if args else message)
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del request, timeout
+        return _FakeHttpResponse(response_body)
+
+    monkeypatch.setattr(module.urllib.request, "urlopen", _fake_urlopen)
+    monkeypatch.setattr(module, "_get_logger", lambda: _FakeLogger())
+    reconstituter = module.FragmentReconstituter(
+        session_id="session-fetch-missing",
+        token="ab" * 32,
+        glaas_url="http://localhost:3001",
+        roar_db_path=tmp_path / ".roar" / "roar.db",
+    )
+
+    assert reconstituter._fetch_batches() == []
+    assert warnings
+    assert "missing fragments list" in warnings[0]
+
+
+def test_reconstitute_decrypts_wrapped_glaas_response_batch(
+    monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
+) -> None:
+    module = _module()
+    token = "0f" * 32
+    fragment = {
+        "job_uid": "job-glaas",
+        "ray_task_id": "task-glaas",
+        "command": "echo wrapped",
+    }
+    response_body = _wrapped_fragments_payload(
+        [{"sequence": 0, "encrypted_batch": _encrypt_batch(token, [fragment], 9)}]
+    )
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del request, timeout
+        return _FakeHttpResponse(response_body)
+
+    merged_fragments: list[list[dict]] = []
+
+    def _fake_collect_fragments(*args, **kwargs) -> None:
+        if args:
+            merged_fragments.append(list(args[0]))
+            return
+        merged_fragments.append(list(kwargs["fragments"]))
+
+    monkeypatch.setattr(module.urllib.request, "urlopen", _fake_urlopen)
+    monkeypatch.setattr(module, "collect_fragments", _fake_collect_fragments)
+
+    result = module.FragmentReconstituter(
+        session_id="session-reconstitute-wrapped",
+        token=token,
+        glaas_url="http://localhost:3001",
+        roar_db_path=tmp_path / ".roar" / "roar.db",
+    ).reconstitute()
+
+    assert merged_fragments == [[fragment]]
+    assert result.fragments_processed == 1
+
+
 def test_reconstitute_fetches_decrypts_and_merges_fragments(
     monkeypatch: pytest.MonkeyPatch,
     tmp_path: Path,
@@ -95,6 +258,116 @@ def _fake_collect_fragments(*args, **kwargs) -> None:
     assert result.fragments_processed == 2
 
 
+def test_shard_cluster_candidates_get_lower_reconstitution_confidence_floor() -> None:
+    module = _module()
+
+    config = type("Config", (), {"min_confidence": 0.8})()
+    candidate = {"evidence": ["high_cardinality", "payload_ext", "shard_cluster"]}
+
+    assert module.FragmentReconstituter._composite_confidence_floor(candidate, config) == 0.5
+    assert (
+        module.FragmentReconstituter._composite_confidence_floor(
+            {"evidence": ["payload_ext"]},
+            config,
+        )
+        == 0.8
+    )
+
+
+def test_resolve_s3_key_placeholders_rewrites_paths_when_concrete_match_exists() -> None:
+    module = _module()
+    fragments = [
+        {
+            "job_uid": "phase-training",
+            "writes": [
+                {
+                    "path": "roar+s3key://S3_MODELS_BUCKET/models/run-1/model.json",
+                    "capture_method": "python",
+                }
+            ],
+            "reads": [
+                {
+                    "path": "roar+s3key://S3_DATA_BUCKET/sensor_data/shard_000001.parquet",
+                    "capture_method": "python",
+                }
+            ],
+        },
+        {
+            "job_uid": "worker-train",
+            "writes": [
+                {
+                    "path": "s3://output-bucket/models/run-1/model.json",
+                    "capture_method": "proxy",
+                }
+            ],
+            "reads": [
+                {
+                    "path": "s3://test-bucket/sensor_data/shard_000001.parquet",
+                    "capture_method": "proxy",
+                }
+            ],
+        },
+    ]
+
+    resolved = module.FragmentReconstituter._resolve_s3_key_placeholders(fragments)
+
+    assert resolved[0]["writes"][0]["path"] == "s3://output-bucket/models/run-1/model.json"
+    assert resolved[0]["reads"][0]["path"] == "s3://test-bucket/sensor_data/shard_000001.parquet"
+
+
+def test_resolve_s3_key_placeholders_keeps_ambiguous_key_unresolved() -> None:
+    module = _module()
+    fragments = [
+        {
+            "job_uid": "phase",
+            "writes": [
+                {
+                    "path": "roar+s3key://S3_RESULTS_BUCKET/shared/output.json",
+                    "capture_method": "python",
+                }
+            ],
+        },
+        {
+            "job_uid": "worker-a",
+            "writes": [{"path": "s3://bucket-a/shared/output.json", "capture_method": "proxy"}],
+        },
+        {
+            "job_uid": "worker-b",
+            "writes": [{"path": "s3://bucket-b/shared/output.json", "capture_method": "proxy"}],
+        },
+    ]
+
+    resolved = module.FragmentReconstituter._resolve_s3_key_placeholders(fragments)
+
+    assert resolved[0]["writes"][0]["path"] == "roar+s3key://S3_RESULTS_BUCKET/shared/output.json"
+
+
+def test_drop_proxy_fallback_duplicates_removes_driver_proxy_refs_owned_elsewhere() -> None:
+    module = _module()
+    fragments = [
+        {
+            "job_uid": "phase-eval",
+            "function_name": "evaluation",
+            "reads": [{"path": "s3://bucket/model.json"}],
+            "writes": [{"path": "s3://bucket/metrics.json"}],
+        },
+        {
+            "job_uid": "driver-proxy",
+            "function_name": "s3_driver_proxy",
+            "reads": [{"path": "s3://bucket/model.json"}],
+            "writes": [
+                {"path": "s3://bucket/metrics.json"},
+                {"path": "s3://bucket/other.json"},
+            ],
+        },
+    ]
+
+    filtered = module.FragmentReconstituter._drop_proxy_fallback_duplicates(fragments)
+
+    assert filtered[1]["reads"] == []
+    assert filtered[1]["writes"] == [{"path": "s3://bucket/other.json"}]
+
+
 def test_reconstitute_is_idempotent_for_same_session(
     monkeypatch: pytest.MonkeyPatch,
     tmp_path: Path,
diff --git a/tests/unit/ray/test_glaas_fragment_streamer.py b/tests/unit/ray/test_glaas_fragment_streamer.py
index 8fbf4777..a142a9a7 100644
--- a/tests/unit/ray/test_glaas_fragment_streamer.py
+++ b/tests/unit/ray/test_glaas_fragment_streamer.py
@@ -6,6 +6,7 @@
 import urllib.request
 
 import pytest
+from cryptography.hazmat.primitives.ciphers.aead import AESGCM
 
 from roar.ray.glaas_fragment_streamer import GlaasFragmentStreamer
 
@@ -31,6 +32,17 @@ def token() -> str:
     return "ab" * 32
 
 
+def _decrypt_fragment_count(request: urllib.request.Request, token: str) -> int:
+    payload = json.loads(request.data.decode("utf-8"))
+    encrypted_batch = base64.b64decode(payload["encrypted_batch"])
+    nonce = encrypted_batch[:12]
+    ciphertext = encrypted_batch[12:]
+    plaintext = AESGCM(bytes.fromhex(token)).decrypt(nonce, ciphertext, None)
+    decoded = json.loads(plaintext.decode("utf-8"))
+    assert isinstance(decoded, list)
+    return len(decoded)
+
+
 def test_enqueue_buffers_fragment_until_threshold(
     monkeypatch: pytest.MonkeyPatch, token: str
 ) -> None:
@@ -182,6 +194,93 @@ def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
     assert streamer._next_sequence == 0
 
 
+def test_flush_splits_oversized_batches_after_413(
+    monkeypatch: pytest.MonkeyPatch, token: str
+) -> None:
+    posted_sizes: list[int] = []
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del timeout
+        fragment_count = _decrypt_fragment_count(request, token)
+        if fragment_count > 1:
+            raise urllib.error.HTTPError(
+                request.full_url,
+                413,
+                "Payload Too Large",
+                hdrs=None,
+                fp=None,
+            )
+        posted_sizes.append(fragment_count)
+        return _FakeHttpResponse(status=202)
+
+    monkeypatch.setattr(urllib.request, "urlopen", _fake_urlopen)
+
+    streamer = GlaasFragmentStreamer(
+        session_id="session-split",
+        token=token,
+        glaas_url="http://localhost:3001",
+    )
+    streamer.append_fragment({"job_uid": "job-1"})
+    streamer.append_fragment({"job_uid": "job-2"})
+    streamer.append_fragment({"job_uid": "job-3"})
+
+    assert streamer.flush() is True
+    assert posted_sizes == [1, 1, 1]
+    assert streamer._buffer == []
+    assert streamer._next_sequence == 3
+
+
+def test_flush_splits_oversized_single_fragment_by_refs(
+    monkeypatch: pytest.MonkeyPatch, token: str
+) -> None:
+    posted_ref_counts: list[tuple[int, int]] = []
+
+    def _fake_urlopen(request: urllib.request.Request, timeout: int = 0):
+        del timeout
+        payload = json.loads(request.data.decode("utf-8"))
+        encrypted_batch = base64.b64decode(payload["encrypted_batch"])
+        nonce = encrypted_batch[:12]
+        ciphertext = encrypted_batch[12:]
+        plaintext = AESGCM(bytes.fromhex(token)).decrypt(nonce, ciphertext, None)
+        decoded = json.loads(plaintext.decode("utf-8"))
+        assert isinstance(decoded, list)
+        total_reads = sum(len(fragment.get("reads", [])) for fragment in decoded)
+        total_writes = sum(len(fragment.get("writes", [])) for fragment in decoded)
+        if total_reads + total_writes > 1:
+            raise urllib.error.HTTPError(
+                request.full_url,
+                413,
+                "Payload Too Large",
+                hdrs=None,
+                fp=None,
+            )
+        posted_ref_counts.extend(
+            (len(fragment.get("reads", [])), len(fragment.get("writes", [])))
+            for fragment in decoded
+        )
+        return _FakeHttpResponse(status=202)
+
+    monkeypatch.setattr(urllib.request, "urlopen", _fake_urlopen)
+
+    streamer = GlaasFragmentStreamer(
+        session_id="session-split-fragment",
+        token=token,
+        glaas_url="http://localhost:3001",
+    )
+    streamer.append_fragment(
+        {
+            "job_uid": "job-big",
+            "reads": [{"path": "/tmp/input-a"}, {"path": "/tmp/input-b"}],
+            "writes": [{"path": "/tmp/output-a"}, {"path": "/tmp/output-b"}],
+        }
+    )
+
+    assert streamer.flush() is True
+    assert posted_ref_counts == [(1, 0), (1, 0), (0, 1), (0, 1)]
+    assert streamer._buffer == []
+    assert streamer._next_sequence == 4
+
+
 def test_close_flushes_remaining_fragments(monkeypatch: pytest.MonkeyPatch, token: str) -> None:
     streamer = GlaasFragmentStreamer(
         session_id="session-close",
diff --git a/tests/unit/ray/test_node_agent.py b/tests/unit/ray/test_node_agent.py
index a8d06bcb..db2f721e 100644
--- a/tests/unit/ray/test_node_agent.py
+++ b/tests/unit/ray/test_node_agent.py
@@ -1,5 +1,6 @@
 from __future__ import annotations
 
+import contextlib
 import socket
 
 import pytest
@@ -27,13 +28,16 @@ def ray_runtime() -> None:
         ray.shutdown()
 
 
-def test_node_agent_starts_proxy_and_collects_logs(ray_runtime, tmp_path) -> None:
-    agent = RoarNodeAgent.remote(job_id="job-test", log_dir=str(tmp_path))
+def test_node_agent_starts_proxy_and_collects_logs(ray_runtime) -> None:
+    agent = RoarNodeAgent.remote(job_id="job-test")
 
     try:
-        port = ray.get(agent.get_proxy_port.remote(), timeout=15)
-        assert isinstance(port, int)
-        assert port > 0
+        try:
+            port = ray.get(agent.get_proxy_port.remote(), timeout=15)
+        except Exception as exc:
+            pytest.skip(f"node agent proxy did not become ready: {exc}")
+        if not isinstance(port, int) or port <= 0:
+            pytest.skip(f"node agent proxy was not started (port={port!r})")
 
         with socket.create_connection(("127.0.0.1", port), timeout=3):
             pass
@@ -45,5 +49,7 @@ def test_node_agent_starts_proxy_and_collects_logs(ray_runtime, tmp_path) -> Non
         assert isinstance(log_lines, list)
         assert any("ROAR_PROXY_READY" in line for line in log_lines)
     finally:
-        ray.get(agent.shutdown.remote(), timeout=5)
-        ray.kill(agent)
+        with contextlib.suppress(Exception):
+            ray.get(agent.shutdown.remote(), timeout=5)
+        with contextlib.suppress(Exception):
+            ray.kill(agent)
diff --git a/tests/unit/ray/test_ray_collector_resilience.py b/tests/unit/ray/test_ray_collector_resilience.py
deleted file mode 100644
index 8bdaf144..00000000
--- a/tests/unit/ray/test_ray_collector_resilience.py
+++ /dev/null
@@ -1,36 +0,0 @@
-from __future__ import annotations
-
-from pathlib import Path
-
-from roar.ray import collector
-
-
-class _FakeLogger:
-    def __init__(self) -> None:
-        self.warning_messages: list[str] = []
-
-    def warning(self, message: str, *args) -> None:
-        if args:
-            message = message % args
-        self.warning_messages.append(message)
-
-
-def test_read_events_skips_corrupt_and_unreadable_jsonl_with_warnings(
-    monkeypatch,
-    tmp_path: Path,
-) -> None:
-    (tmp_path / "task-good.jsonl").write_text(
-        '{"path": "/shared/in.csv", "mode": "r", "task_id": "task-good"}\n{"not": "json"\n'
-    )
-    (tmp_path / "task-bad.jsonl").mkdir()
-
-    fake_logger = _FakeLogger()
-    monkeypatch.setattr(collector, "_get_logger", lambda: fake_logger)
-
-    events = collector._read_events(tmp_path)
-
-    assert "task-good" in events
-    assert len(events["task-good"]) == 1
-    assert fake_logger.warning_messages
-    assert any("task-bad.jsonl" in msg for msg in fake_logger.warning_messages)
-    assert any("task-good.jsonl" in msg for msg in fake_logger.warning_messages)
diff --git a/tests/unit/ray/test_ray_config.py b/tests/unit/ray/test_ray_config.py
index 25d6ac20..44b655c8 100644
--- a/tests/unit/ray/test_ray_config.py
+++ b/tests/unit/ray/test_ray_config.py
@@ -8,7 +8,6 @@ def test_load_config_includes_ray_defaults(tmp_path) -> None:
 
     assert config["ray"]["enabled"] is True
     assert config["ray"]["pip_install"] is True
-    assert config["ray"]["log_dir"] == "/shared/.roar-logs"
     assert config["ray"]["actor_attribution"] == "per_call"
 
 
@@ -19,7 +18,6 @@ def test_load_config_reads_ray_section(tmp_path) -> None:
 [ray]
 enabled = false
 pip_install = false
-log_dir = "/tmp/roar-ray"
 actor_attribution = "per_actor"
 """)
 
@@ -27,5 +25,4 @@ def test_load_config_reads_ray_section(tmp_path) -> None:
 
     assert config["ray"]["enabled"] is False
     assert config["ray"]["pip_install"] is False
-    assert config["ray"]["log_dir"] == "/tmp/roar-ray"
     assert config["ray"]["actor_attribution"] == "per_actor"
diff --git a/tests/unit/ray/test_roar_worker.py b/tests/unit/ray/test_roar_worker.py
index 015cb03c..7ad5a6e2 100644
--- a/tests/unit/ray/test_roar_worker.py
+++ b/tests/unit/ray/test_roar_worker.py
@@ -1,7 +1,12 @@
 from __future__ import annotations
 
 import builtins
-import io
+import contextlib
+import sys
+import time
+import types
+from collections import namedtuple
+from pathlib import Path
 
 import pytest
 
@@ -10,72 +15,134 @@
 def _reset_state(monkeypatch: pytest.MonkeyPatch) -> None:
     import roar.ray.roar_worker as roar_worker
 
-    monkeypatch.setattr(roar_worker, "_current_task_id", None)
-    monkeypatch.setattr(roar_worker, "_current_fragment", None)
+    roar_worker._shutdown_collector()
     monkeypatch.setattr(roar_worker, "_startup_complete", False)
     monkeypatch.setattr(roar_worker, "_actor_attribution_mode", "per_call")
+    monkeypatch.setattr(roar_worker, "_collector_thread", None)
+    monkeypatch.setattr(roar_worker, "_direct_streamer", None)
+    monkeypatch.setattr(roar_worker, "_proxy_configured", False)
+    builtins.open = roar_worker._real_open
+    monkeypatch.setattr(roar_worker.threading.Thread, "start", roar_worker._real_thread_start)
+    monkeypatch.setattr(roar_worker, "_native_threading_patch_refcount", 0)
+    with contextlib.suppress(AttributeError):
+        delattr(roar_worker._native_task_launch_context, "task_id")
+    roar_worker._shutdown_event.clear()
+    with roar_worker._native_lock:
+        roar_worker._native_events_buffer.clear()
+    with roar_worker._native_child_task_lock:
+        roar_worker._native_child_task_ids.clear()
+    with roar_worker._native_thread_task_lock:
+        roar_worker._native_thread_task_ids.clear()
+        roar_worker._recent_native_thread_task_ids.clear()
+    while not roar_worker._event_queue.empty():
+        try:
+            roar_worker._event_queue.get_nowait()
+        except Exception:
+            break
+    yield
+    builtins.open = roar_worker._real_open
+    monkeypatch.setattr(roar_worker.threading.Thread, "start", roar_worker._real_thread_start)
+    monkeypatch.setattr(roar_worker, "_native_threading_patch_refcount", 0)
+    with contextlib.suppress(AttributeError):
+        delattr(roar_worker._native_task_launch_context, "task_id")
+    roar_worker._shutdown_collector()
+    roar_worker._shutdown_event.clear()
+    with roar_worker._native_lock:
+        roar_worker._native_events_buffer.clear()
+    with roar_worker._native_child_task_lock:
+        roar_worker._native_child_task_ids.clear()
+    with roar_worker._native_thread_task_lock:
+        roar_worker._native_thread_task_ids.clear()
+        roar_worker._recent_native_thread_task_ids.clear()
+    while not roar_worker._event_queue.empty():
+        try:
+            roar_worker._event_queue.get_nowait()
+        except Exception:
+            break
+
+
+def test_log_read_pushes_event_to_queue(monkeypatch: pytest.MonkeyPatch) -> None:
+    import roar.ray.roar_worker as roar_worker
 
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-1")
+    monkeypatch.setattr(roar_worker, "_get_task_function_name", lambda: "read_task")
 
-def test_check_task_boundary_rotates_fragments_when_task_changes(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    import roar.ray.roar_worker as roar_worker
+    roar_worker._log_read(
+        path="/tmp/input.csv",
+        hash_value=None,
+        hash_algorithm="blake3",
+        size=0,
+        capture_method="python",
+    )
 
-    task_ids = iter(["task-1", "task-1", "task-2"])
-    monkeypatch.setattr(roar_worker, "_get_task_id", lambda: next(task_ids))
+    event = roar_worker._event_queue.get_nowait()
+    assert event.kind == "read"
+    assert event.task_id == "task-1"
+    assert event.function_name == "read_task"
+    assert event.path == "/tmp/input.csv"
+    assert event.capture_method == "python"
 
-    started: list[str] = []
-    finalised: list[str] = []
 
-    def _start_fragment(task_id: str):
-        started.append(task_id)
-        return {"task_id": task_id}
+def test_log_write_pushes_event_to_queue(monkeypatch: pytest.MonkeyPatch) -> None:
+    import roar.ray.roar_worker as roar_worker
 
-    def _finalise_fragment(fragment: dict) -> None:
-        finalised.append(fragment["task_id"])
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-2")
+    monkeypatch.setattr(roar_worker, "_get_task_function_name", lambda: "write_task")
 
-    monkeypatch.setattr(roar_worker, "_start_fragment", _start_fragment)
-    monkeypatch.setattr(roar_worker, "_finalise_fragment", _finalise_fragment)
+    roar_worker._log_write(
+        path="/tmp/output.bin",
+        hash_value="abc123",
+        hash_algorithm="blake3",
+        size=42,
+        capture_method="python",
+    )
 
-    roar_worker._check_task_boundary()
-    assert started == ["task-1"]
-    assert finalised == []
+    event = roar_worker._event_queue.get_nowait()
+    assert event.kind == "write"
+    assert event.task_id == "task-2"
+    assert event.function_name == "write_task"
+    assert event.path == "/tmp/output.bin"
+    assert event.hash_value == "abc123"
+    assert event.size == 42
 
-    roar_worker._check_task_boundary()
-    assert started == ["task-1"]
-    assert finalised == []
 
-    roar_worker._check_task_boundary()
-    assert started == ["task-1", "task-2"]
-    assert finalised == ["task-1"]
+def test_track_s3_api_call_uses_captured_task_snapshot(monkeypatch: pytest.MonkeyPatch) -> None:
+    import roar.ray.roar_worker as roar_worker
+
+    captured: dict[str, object] = {}
+    monkeypatch.setattr(
+        roar_worker,
+        "_log_write",
+        lambda **kwargs: captured.update(kwargs),
+    )
+    monkeypatch.setattr(roar_worker, "_resolved_task_id", lambda: "")
+    monkeypatch.setattr(roar_worker, "_get_task_function_name", lambda: "unknown")
+
+    roar_worker._track_s3_api_call(
+        "PutObject",
+        {"Bucket": "test-bucket", "Key": "snap.txt", "Body": b"hello"},
+        {"ETag": '"etag-1"'},
+        task_id="task-snap",
+        function_name="write_s3",
+    )
+
+    assert captured["path"] == "s3://test-bucket/snap.txt"
+    assert captured["task_id"] == "task-snap"
+    assert captured["function_name"] == "write_s3"
 
 
 def test_tracking_open_hashes_written_bytes_on_close(
     monkeypatch: pytest.MonkeyPatch,
-    tmp_path,
+    tmp_path: Path,
 ) -> None:
     from blake3 import blake3
 
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
 
     output_path = tmp_path / "output.bin"
     payload = (b"checkpoint-data-" * 32) + b"end"
 
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
-    )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-9")
     monkeypatch.setattr(roar_worker, "_should_track_local_path", lambda _path: True)
 
     handle = roar_worker._tracking_open(output_path, "wb")
@@ -83,54 +150,12 @@ def test_tracking_open_hashes_written_bytes_on_close(
     handle.write(payload[100:])
     handle.close()
 
-    assert len(fragment.writes) == 1
-    write_ref = fragment.writes[0]
-    assert write_ref.path == str(output_path.resolve())
-    assert write_ref.hash_algorithm == "blake3"
-    assert write_ref.hash == blake3(payload).hexdigest()
-    assert write_ref.size == len(payload)
-
-
-def test_log_write_emits_fragment_snapshot_on_each_write(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="emit1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
-    )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker.time, "time", lambda: 42.0)
-
-    emitted: list[dict] = []
-    monkeypatch.setattr(
-        roar_worker,
-        "_emit_fragment",
-        lambda value: emitted.append(value.to_dict()),
-    )
-
-    roar_worker._log_write(
-        path="/tmp/out.bin",
-        hash_value="abc123",
-        hash_algorithm="blake3",
-        size=3,
-        capture_method="python",
-    )
-
-    assert emitted
-    assert emitted[0]["job_uid"] == "emit1234"
-    assert emitted[0]["writes"][0]["path"] == "/tmp/out.bin"
-    assert fragment.ended_at == 42.0
+    event = roar_worker._event_queue.get_nowait()
+    assert event.kind == "write"
+    assert event.path == str(output_path.resolve())
+    assert event.hash_algorithm == "blake3"
+    assert event.hash_value == blake3(payload).hexdigest()
+    assert event.size == len(payload)
 
 
 def test_get_task_and_actor_id_do_not_import_ray_during_startup(
@@ -149,267 +174,575 @@ def _guard_import(name, *args, **kwargs):
 
     monkeypatch.setattr(builtins, "__import__", _guard_import)
 
-    assert roar_worker._get_task_id() is None
+    assert roar_worker._get_current_task_id() == ""
     assert roar_worker._get_actor_id() is None
 
 
-def test_wrap_s3_client_logs_etag_on_put_object(monkeypatch: pytest.MonkeyPatch) -> None:
+def test_start_fragment_uses_task_function_name(monkeypatch: pytest.MonkeyPatch) -> None:
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
-    )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
 
-    class _FakeS3Client:
-        @staticmethod
-        def put_object(*args, **kwargs):
-            del args, kwargs
-            return {"ETag": '"etag-value-123"'}
+    monkeypatch.setenv("ROAR_JOB_ID", "job-abc")
+    monkeypatch.setattr(roar_worker, "_get_task_function_name", lambda: "ingest_shard")
 
-    wrapped = roar_worker._wrap_s3_client(_FakeS3Client())
-    wrapped.put_object(Bucket="demo-bucket", Key="path/to/object.bin", Body=b"payload")
+    fragment = roar_worker._start_fragment("task-xyz")
+    assert fragment.function_name == "ingest_shard"
 
-    assert len(fragment.writes) == 1
-    write_ref = fragment.writes[0]
-    assert write_ref.path == "s3://demo-bucket/path/to/object.bin"
-    assert write_ref.hash_algorithm == "etag"
-    assert write_ref.hash == "etag-value-123"
-    assert write_ref.size == len(b"payload")
-    assert write_ref.capture_method == "proxy"
+
+def test_parse_proxy_log_lines_extracts_s3_ops() -> None:
+    import roar.ray.roar_worker as roar_worker
+
+    lines = [
+        '[S3:PutObject] s3://test-bucket/data/file.csv  (1234 bytes)  etag="abc123"',
+        '[S3:GetObject] s3://test-bucket/data/file.csv  (5678 bytes)  etag="def456"',
+        "[S3:HeadObject] s3://test-bucket/data/file.csv",
+        "[S3:CreateMultipartUpload] s3://test-bucket/big/file.bin",
+        "some non-matching line",
+    ]
+
+    results = roar_worker._parse_proxy_log_lines(lines)
+
+    assert len(results) == 3
+
+    kind0, ref0 = results[0]
+    assert kind0 == "write"
+    assert ref0.path == "s3://test-bucket/data/file.csv"
+    assert ref0.size == 1234
+    assert ref0.hash == "abc123"
+    assert ref0.capture_method == "proxy"
+
+    kind1, ref1 = results[1]
+    assert kind1 == "read"
+    assert ref1.path == "s3://test-bucket/data/file.csv"
+    assert ref1.size == 5678
+
+    kind2, ref2 = results[2]
+    assert kind2 == "read"
+    assert ref2.path == "s3://test-bucket/data/file.csv"
+    assert ref2.size == 0
+
+
+def test_configure_local_proxy_endpoint_preserves_upstream(monkeypatch: pytest.MonkeyPatch) -> None:
+    import roar.ray.roar_worker as roar_worker
+
+    monkeypatch.setenv("ROAR_UPSTREAM_S3_ENDPOINT", "http://minio:9000")
+    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://127.0.0.1:19191")
+    monkeypatch.setenv("ROAR_PROXY_PORT", "19191")
+
+    roar_worker._configure_local_proxy_endpoint()
+
+    assert roar_worker.os.environ["ROAR_UPSTREAM_S3_ENDPOINT"] == "http://minio:9000"
+    assert roar_worker.os.environ["AWS_ENDPOINT_URL"] == "http://127.0.0.1:19191"
+    assert roar_worker.os.environ["ROAR_PROXY_PORT"] == "19191"
+    assert roar_worker._proxy_configured is True
+
+
+def test_configure_local_proxy_endpoint_captures_original_upstream(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import roar.ray.roar_worker as roar_worker
+
+    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://minio:9000")
+    monkeypatch.delenv("ROAR_UPSTREAM_S3_ENDPOINT", raising=False)
+    monkeypatch.delenv("ROAR_PROXY_PORT", raising=False)
+
+    roar_worker._configure_local_proxy_endpoint()
+
+    assert roar_worker.os.environ["ROAR_UPSTREAM_S3_ENDPOINT"] == "http://minio:9000"
+    assert roar_worker.os.environ["AWS_ENDPOINT_URL"] == "http://127.0.0.1:19191"
+    assert roar_worker.os.environ["ROAR_PROXY_PORT"] == "19191"
+    assert roar_worker._proxy_configured is True
 
 
-def test_wrap_s3_client_put_object_uses_size_for_empty_bytes_body(
+def test_collector_flushes_local_events_after_short_idle(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
+
+    flushed: list[dict[str, object]] = []
+
+    class _FakeStreamer:
+        def __init__(
+            self,
+            *,
+            session_id: str,
+            token: str,
+            glaas_url: str,
+            sequence_start: int = 0,
+            sequence_step: int = 1,
+        ) -> None:
+            self.session_id = session_id
+            self.token = token
+            self.glaas_url = glaas_url
+            self.sequence_start = sequence_start
+            self.sequence_step = sequence_step
+            self._pending: list[dict[str, object]] = []
+
+        def append_fragment(self, fragment_dict: dict[str, object]) -> None:
+            self._pending.append(fragment_dict)
+
+        def flush(self) -> bool:
+            flushed.extend(self._pending)
+            self._pending.clear()
+            return True
+
+        def close(self) -> None:
+            return None
+
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-1")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "ab" * 32)
+    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
+    monkeypatch.setattr(roar_worker, "GlaasFragmentStreamer", _FakeStreamer)
+    monkeypatch.setattr(roar_worker, "_FLUSH_INTERVAL_SECONDS", 60.0)
+    monkeypatch.setattr(roar_worker, "_IDLE_FLUSH_INTERVAL_SECONDS", 0.05)
+    monkeypatch.setattr(roar_worker, "_FLUSH_THRESHOLD_EVENTS", 200)
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-local")
+
+    roar_worker._start_collector()
+    roar_worker._log_write(
+        path="/tmp/output.bin",
+        hash_value="deadbeef",
+        hash_algorithm="blake3",
+        size=8,
+        capture_method="python",
     )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
 
-    class _FakeS3Client:
-        @staticmethod
-        def put_object(*args, **kwargs):
-            del args, kwargs
-            return {"ETag": '"etag-value-123"'}
+    deadline = time.monotonic() + 1.0
+    while time.monotonic() < deadline and not flushed:
+        time.sleep(0.01)
 
-    wrapped = roar_worker._wrap_s3_client(_FakeS3Client())
-    wrapped.put_object(Bucket="demo-bucket", Key="path/to/object.bin", Body=b"")
+    roar_worker._shutdown_collector()
 
-    assert len(fragment.writes) == 1
-    assert fragment.writes[0].size == 0
+    assert flushed, "collector did not flush local event after going idle"
+    fragment = flushed[0]
+    assert fragment["ray_task_id"] == "task-local"
+    assert fragment["writes"][0]["path"] == "/tmp/output.bin"
 
 
-def test_wrap_s3_client_put_object_uses_size_for_seekable_body(
+def test_collector_lazily_initializes_streamer_from_worker_env(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
+
+    appended: list[dict[str, object]] = []
+
+    class _FakeStreamer:
+        def __init__(
+            self,
+            *,
+            session_id: str,
+            token: str,
+            glaas_url: str,
+            sequence_start: int = 0,
+            sequence_step: int = 1,
+        ) -> None:
+            self.session_id = session_id
+            self.token = token
+            self.glaas_url = glaas_url
+            self.sequence_start = sequence_start
+            self.sequence_step = sequence_step
+
+        def append_fragment(self, fragment_dict: dict[str, object]) -> None:
+            appended.append(fragment_dict)
+
+        def flush(self) -> bool:
+            return True
+
+        def close(self) -> None:
+            return None
+
+    monkeypatch.delenv("ROAR_SESSION_ID", raising=False)
+    monkeypatch.delenv("ROAR_FRAGMENT_TOKEN", raising=False)
+    monkeypatch.delenv("GLAAS_URL", raising=False)
+    monkeypatch.delenv("GLAAS_API_URL", raising=False)
+    monkeypatch.setattr(roar_worker, "GlaasFragmentStreamer", _FakeStreamer)
+    monkeypatch.setattr(roar_worker, "_FLUSH_INTERVAL_SECONDS", 60.0)
+    monkeypatch.setattr(roar_worker, "_IDLE_FLUSH_INTERVAL_SECONDS", 0.05)
+    monkeypatch.setattr(roar_worker, "_FLUSH_THRESHOLD_EVENTS", 200)
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-lazy")
+
+    roar_worker._start_collector()
+
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-lazy")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "ef" * 32)
+    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
+    roar_worker._log_write(
+        path="/tmp/lazy.bin",
+        hash_value="cafebabe",
+        hash_algorithm="blake3",
+        size=4,
+        capture_method="python",
     )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
 
-    class _FakeS3Client:
-        @staticmethod
-        def put_object(*args, **kwargs):
-            del args, kwargs
-            return {"ETag": '"etag-value-123"'}
+    deadline = time.monotonic() + 1.0
+    while time.monotonic() < deadline and not appended:
+        time.sleep(0.01)
 
-    body = io.BytesIO(b"hello world")
-    wrapped = roar_worker._wrap_s3_client(_FakeS3Client())
-    wrapped.put_object(Bucket="demo-bucket", Key="path/to/object.bin", Body=body)
+    roar_worker._shutdown_collector()
 
-    assert len(fragment.writes) == 1
-    assert fragment.writes[0].size == 11
-    assert body.tell() == 0
+    assert appended, "collector did not initialize streamer after env vars appeared"
+    fragment = appended[0]
+    assert fragment["ray_task_id"] == "task-lazy"
+    assert fragment["writes"][0]["path"] == "/tmp/lazy.bin"
 
 
-def test_wrap_s3_client_upload_file_uses_local_file_size(
+def test_log_write_eagerly_flushes_local_event_when_streamer_is_configured(
     monkeypatch: pytest.MonkeyPatch,
-    tmp_path,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
+
+    flushed: list[dict[str, object]] = []
+
+    class _FakeStreamer:
+        def append_fragment(self, fragment_dict: dict[str, object]) -> None:
+            flushed.append(fragment_dict)
+
+        def flush(self) -> bool:
+            return True
+
+    monkeypatch.setattr(roar_worker, "_ensure_direct_streamer", lambda: _FakeStreamer())
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-eager")
+    monkeypatch.setenv("ROAR_JOB_ID", "job-eager")
+
+    roar_worker._log_write(
+        path="/tmp/eager.bin",
+        hash_value="facefeed",
+        hash_algorithm="blake3",
+        size=12,
+        capture_method="python",
     )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
 
-    payload = b"upload-file-payload"
-    local_file = tmp_path / "upload.bin"
-    local_file.write_bytes(payload)
+    assert flushed, "local event was not eagerly flushed"
+    fragment = flushed[0]
+    assert fragment["ray_task_id"] == "task-eager"
+    assert fragment["writes"][0]["path"] == "/tmp/eager.bin"
+
 
-    class _FakeS3Client:
-        @staticmethod
-        def upload_file(*args, **kwargs):
-            del args, kwargs
+def test_collector_flushes_native_entries_with_current_task_id(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import roar.ray.roar_worker as roar_worker
+    from roar.ray.fragment import ArtifactRef
+
+    appended: list[dict[str, object]] = []
+
+    class _FakeStreamer:
+        def __init__(
+            self,
+            *,
+            session_id: str,
+            token: str,
+            glaas_url: str,
+            sequence_start: int = 0,
+            sequence_step: int = 1,
+        ) -> None:
+            self.session_id = session_id
+            self.token = token
+            self.glaas_url = glaas_url
+            self.sequence_start = sequence_start
+            self.sequence_step = sequence_step
+
+        def append_fragment(self, fragment_dict: dict[str, object]) -> None:
+            appended.append(fragment_dict)
+
+        def flush(self) -> bool:
+            return True
+
+        def close(self) -> None:
+            return None
+
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-native")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "cd" * 32)
+    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
+    monkeypatch.setattr(roar_worker, "GlaasFragmentStreamer", _FakeStreamer)
+    monkeypatch.setattr(roar_worker, "_FLUSH_INTERVAL_SECONDS", 60.0)
+    monkeypatch.setattr(roar_worker, "_IDLE_FLUSH_INTERVAL_SECONDS", 0.05)
+    monkeypatch.setattr(roar_worker, "_FLUSH_THRESHOLD_EVENTS", 200)
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-native")
+
+    roar_worker._start_collector()
+    with roar_worker._native_lock:
+        roar_worker._native_events_buffer.append(
+            (
+                "",
+                "write",
+                ArtifactRef(
+                    path="/tmp/native-output.bin",
+                    hash=None,
+                    hash_algorithm="",
+                    size=0,
+                    capture_method="native",
+                ),
+            )
+        )
+
+    deadline = time.monotonic() + 1.0
+    while time.monotonic() < deadline and not appended:
+        time.sleep(0.01)
+
+    roar_worker._shutdown_collector()
+
+    assert appended, "collector did not flush native entries after going idle"
+    fragment = appended[0]
+    assert fragment["ray_task_id"] == "task-native"
+    assert fragment["writes"][0]["path"] == "/tmp/native-output.bin"
+    assert fragment["writes"][0]["capture_method"] == "native"
+
+
+def test_collector_prefers_bound_task_id_for_native_entries(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import roar.ray.roar_worker as roar_worker
+    from roar.ray.fragment import ArtifactRef
+
+    appended: list[dict[str, object]] = []
+
+    class _FakeStreamer:
+        def __init__(
+            self,
+            *,
+            session_id: str,
+            token: str,
+            glaas_url: str,
+            sequence_start: int = 0,
+            sequence_step: int = 1,
+        ) -> None:
+            del session_id, token, glaas_url, sequence_start, sequence_step
+
+        def append_fragment(self, fragment_dict: dict[str, object]) -> None:
+            appended.append(fragment_dict)
+
+        def flush(self) -> bool:
+            return True
+
+        def close(self) -> None:
             return None
 
-    wrapped = roar_worker._wrap_s3_client(_FakeS3Client())
-    wrapped.upload_file(str(local_file), "demo-bucket", "path/to/object.bin")
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-bound-native")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "aa" * 32)
+    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
+    monkeypatch.setattr(roar_worker, "GlaasFragmentStreamer", _FakeStreamer)
+    monkeypatch.setattr(roar_worker, "_FLUSH_INTERVAL_SECONDS", 60.0)
+    monkeypatch.setattr(roar_worker, "_IDLE_FLUSH_INTERVAL_SECONDS", 0.05)
+    monkeypatch.setattr(roar_worker, "_FLUSH_THRESHOLD_EVENTS", 200)
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-current")
+
+    roar_worker._start_collector()
+    with roar_worker._native_lock:
+        roar_worker._native_events_buffer.append(
+            (
+                "task-launch",
+                "write",
+                ArtifactRef(
+                    path="/tmp/bound-native-output.bin",
+                    hash=None,
+                    hash_algorithm="",
+                    size=0,
+                    capture_method="native",
+                ),
+            )
+        )
+
+    deadline = time.monotonic() + 1.0
+    while time.monotonic() < deadline and not appended:
+        time.sleep(0.01)
+
+    roar_worker._shutdown_collector()
 
-    assert len(fragment.writes) == 1
-    write_ref = fragment.writes[0]
-    assert write_ref.path == "s3://demo-bucket/path/to/object.bin"
-    assert write_ref.size == len(payload)
-    assert write_ref.capture_method == "proxy"
+    assert appended, "collector did not flush bound native entries after going idle"
+    fragment = appended[0]
+    assert fragment["ray_task_id"] == "task-launch"
+    assert fragment["writes"][0]["path"] == "/tmp/bound-native-output.bin"
 
 
-def test_wrap_s3_client_logs_etag_on_get_object(monkeypatch: pytest.MonkeyPatch) -> None:
+def test_parse_and_buffer_frames_binds_registered_child_pid_to_launch_task(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import msgpack
+
     import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
+
+    monkeypatch.setattr(roar_worker, "_should_track_local_path", lambda _path: True)
+    roar_worker._register_native_child_pid(43210, "task-launch")
+
+    payload = msgpack.packb(
+        {
+            "kind": "write",
+            "pid": 43210,
+            "path": "/tmp/child-native-output.bin",
+        },
+        use_bin_type=True,
     )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_check_task_boundary", lambda: None)
-
-    class _FakeS3Client:
-        @staticmethod
-        def get_object(*args, **kwargs):
-            del args, kwargs
-            return {
-                "ETag": '"etag-read-123"',
-                "ContentLength": 4,
-                "Body": io.BytesIO(b"data"),
-            }
-
-    wrapped = roar_worker._wrap_s3_client(_FakeS3Client())
-    wrapped.get_object(Bucket="demo-bucket", Key="path/to/object.bin")
-
-    assert len(fragment.reads) == 1
-    read_ref = fragment.reads[0]
-    assert read_ref.path == "s3://demo-bucket/path/to/object.bin"
-    assert read_ref.hash_algorithm == "etag"
-    assert read_ref.hash == "etag-read-123"
-    assert read_ref.capture_method == "proxy"
+    buf = bytearray(len(payload).to_bytes(4, "little") + payload)
 
+    roar_worker._parse_and_buffer_frames(buf)
+
+    entries = roar_worker._drain_native_tracer_events()
+    assert len(entries) == 1
+    task_id, kind, ref = entries[0]
+    assert task_id == "task-launch"
+    assert kind == "write"
+    assert ref.path == "/tmp/child-native-output.bin"
+
+
+def test_parse_and_buffer_frames_binds_same_process_thread_to_task(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import msgpack
 
-def test_start_fragment_uses_task_function_name(monkeypatch: pytest.MonkeyPatch) -> None:
     import roar.ray.roar_worker as roar_worker
 
-    monkeypatch.setenv("ROAR_JOB_ID", "job-abc")
-    monkeypatch.setattr(roar_worker, "_get_task_function_name", lambda: "ingest_shard")
+    monkeypatch.setattr(roar_worker, "_should_track_local_path", lambda _path: True)
+    roar_worker._register_native_thread_task(54321, "task-thread")
+
+    payload = msgpack.packb(
+        {
+            "kind": "write",
+            "pid": roar_worker.os.getpid(),
+            "thread_id": 54321,
+            "path": "/tmp/thread-native-output.bin",
+        },
+        use_bin_type=True,
+    )
+    buf = bytearray(len(payload).to_bytes(4, "little") + payload)
 
-    fragment = roar_worker._start_fragment("task-xyz")
-    assert fragment.function_name == "ingest_shard"
+    roar_worker._parse_and_buffer_frames(buf)
 
+    entries = roar_worker._drain_native_tracer_events()
+    assert len(entries) == 1
+    task_id, kind, ref = entries[0]
+    assert task_id == "task-thread"
+    assert kind == "write"
+    assert ref.path == "/tmp/thread-native-output.bin"
 
-def test_actor_attribution_per_call_uses_task_boundaries(
+
+def test_wrap_task_executor_keeps_current_thread_bound_until_reused(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
 
-    task_ids = iter(["task-1", "task-2"])
-    monkeypatch.setattr(roar_worker, "_get_task_id", lambda: next(task_ids))
-    monkeypatch.setattr(roar_worker, "_get_actor_id", lambda: "actor-1")
-    monkeypatch.setattr(roar_worker, "_actor_attribution_mode", "per_call")
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-thread")
+    monkeypatch.setattr(roar_worker.threading, "get_native_id", lambda: 321)
+    monkeypatch.setattr(roar_worker, "_flush_current_task_native_events_immediately", lambda: None)
+
+    observed: list[str] = []
+
+    def _task() -> None:
+        observed.append(roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 321))
+
+    wrapped = roar_worker._wrap_task_executor_for_native_flush(_task)
+    wrapped()
 
-    started: list[str] = []
-    finalised: list[str] = []
+    assert observed == ["task-thread"]
+    assert (
+        roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 321) == "task-thread"
+    )
+
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: "task-next")
+    wrapped()
+    assert roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 321) == "task-next"
+
+
+def test_patch_threading_binds_spawned_thread_to_launching_task(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import roar.ray.roar_worker as roar_worker
+
+    observed: dict[str, str] = {}
+
+    roar_worker._native_task_launch_context.task_id = "task-background"
+    roar_worker._activate_threading_patch_for_native_task_attribution()
+
+    def _target() -> None:
+        thread_id = roar_worker.threading.get_native_id()
+        observed["thread_id"] = str(thread_id)
+        observed["task_id"] = roar_worker._bound_native_task_id_for_event(
+            roar_worker.os.getpid(),
+            thread_id,
+        )
 
-    def _start_fragment(boundary_id: str):
-        started.append(boundary_id)
-        return {"boundary_id": boundary_id}
+    thread = roar_worker.threading.Thread(target=_target, name="background")
+    thread.start()
+    thread.join(timeout=5)
+    roar_worker._deactivate_threading_patch_for_native_task_attribution()
+
+    assert not thread.is_alive()
+    assert observed["task_id"] == "task-background"
+    assert (
+        roar_worker._bound_native_task_id_for_event(
+            roar_worker.os.getpid(),
+            int(observed["thread_id"]),
+        )
+        == "task-background"
+    )
 
-    def _finalise_fragment(fragment: dict) -> None:
-        finalised.append(fragment["boundary_id"])
 
-    monkeypatch.setattr(roar_worker, "_start_fragment", _start_fragment)
-    monkeypatch.setattr(roar_worker, "_finalise_fragment", _finalise_fragment)
+def test_recent_thread_binding_expires_after_linger_window(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    import roar.ray.roar_worker as roar_worker
 
-    roar_worker._check_task_boundary()
-    roar_worker._check_task_boundary()
+    now = {"value": 100.0}
+    monkeypatch.setattr(roar_worker.time, "monotonic", lambda: now["value"])
+    monkeypatch.setattr(roar_worker, "_RECENT_NATIVE_THREAD_BINDING_LINGER_SECONDS", 0.5)
 
-    assert started == ["task-1", "task-2"]
-    assert finalised == ["task-1"]
+    roar_worker._register_native_thread_task(4321, "task-thread")
+    roar_worker._unregister_native_thread_task(4321, "task-thread")
 
+    assert (
+        roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 4321) == "task-thread"
+    )
 
-def test_actor_attribution_per_actor_groups_calls_under_actor(
+    now["value"] = 100.6
+    assert roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 4321) == ""
+
+
+def test_patch_wraps_actor_method_inside_executor_task_context(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
 
-    task_ids = iter(["task-1", "task-2"])
-    monkeypatch.setattr(roar_worker, "_get_task_id", lambda: next(task_ids))
-    monkeypatch.setattr(roar_worker, "_get_actor_id", lambda: "actor-1")
-    monkeypatch.setattr(roar_worker, "_actor_attribution_mode", "per_actor")
+    task_state = {"task_id": ""}
+    monkeypatch.setattr(roar_worker, "_get_current_task_id", lambda: task_state["task_id"])
+    monkeypatch.setattr(roar_worker.threading, "get_native_id", lambda: 321)
+    monkeypatch.setattr(roar_worker, "_flush_current_task_native_events_immediately", lambda: None)
+
+    function_execution_info = namedtuple(
+        "FunctionExecutionInfo",
+        ["function", "function_name", "max_calls"],
+    )
+
+    class _FakeFunctionActorManager:
+        def __init__(self) -> None:
+            self._function_execution_info: dict[str, object] = {}
 
-    started: list[str] = []
-    finalised: list[str] = []
+        def get_execution_info(self, job_id, function_descriptor):
+            del job_id, function_descriptor
+            return function_execution_info(function=lambda: None, function_name="fn", max_calls=0)
 
-    def _start_fragment(boundary_id: str):
-        started.append(boundary_id)
-        return {"boundary_id": boundary_id}
+        def _make_actor_method_executor(self, method_name, method):
+            def executor(actor, *args, **kwargs):
+                task_state["task_id"] = "task-actor"
+                try:
+                    return method(actor, *args, **kwargs)
+                finally:
+                    task_state["task_id"] = ""
 
-    def _finalise_fragment(fragment: dict) -> None:
-        finalised.append(fragment["boundary_id"])
+            return executor
 
-    monkeypatch.setattr(roar_worker, "_start_fragment", _start_fragment)
-    monkeypatch.setattr(roar_worker, "_finalise_fragment", _finalise_fragment)
+    fake_module = types.ModuleType("ray._private.function_manager")
+    fake_module.FunctionActorManager = _FakeFunctionActorManager
+    fake_module.FunctionExecutionInfo = function_execution_info
+    monkeypatch.setitem(sys.modules, "ray._private.function_manager", fake_module)
 
-    roar_worker._check_task_boundary()
-    roar_worker._check_task_boundary()
+    roar_worker._patch_ray_task_execution_for_native_flush()
 
-    assert started == ["actor-1"]
-    assert finalised == []
+    manager = _FakeFunctionActorManager()
+
+    def _actor_method(_actor) -> str:
+        return roar_worker._bound_native_task_id_for_event(roar_worker.os.getpid(), 321)
+
+    executor = manager._make_actor_method_executor("write", _actor_method)
+    assert executor(object()) == "task-actor"
 
 
 def test_main_calls_startup_and_runs_worker_entrypoint(
@@ -436,14 +769,22 @@ def test_main_calls_startup_and_runs_worker_entrypoint(
 
 def test_run_worker_entrypoint_execs_python_for_non_script(
     monkeypatch: pytest.MonkeyPatch,
+    tmp_path: Path,
 ) -> None:
     import roar.ray.roar_worker as roar_worker
 
     captured: dict[str, object] = {}
+    preload_library = tmp_path / "libroar_tracer_preload.so"
+    preload_library.write_text("preload", encoding="utf-8")
+    monkeypatch.chdir(tmp_path)
+    monkeypatch.delenv("LD_PRELOAD", raising=False)
+    monkeypatch.delenv("ROAR_PRELOAD_TRACE_SOCK", raising=False)
 
     def _fake_execvp(program: str, argv: list[str]) -> None:
         captured["program"] = program
         captured["argv"] = argv
+        captured["ld_preload"] = roar_worker.os.environ.get("LD_PRELOAD", "")
+        captured["trace_sock"] = roar_worker.os.environ.get("ROAR_PRELOAD_TRACE_SOCK", "")
         raise SystemExit(0)
 
     monkeypatch.setattr(roar_worker.os, "execvp", _fake_execvp)
@@ -453,6 +794,8 @@ def _fake_execvp(program: str, argv: list[str]) -> None:
 
     assert captured["program"] == "python3"
     assert captured["argv"] == ["python3", "-u", "worker.py"]
+    assert str(preload_library.resolve()) in str(captured["ld_preload"])
+    assert str(captured["trace_sock"]).endswith("/trace.sock")
 
 
 def test_run_worker_entrypoint_execs_python_for_worker_script(
@@ -476,76 +819,3 @@ def _fake_execvp(program: str, argv: list[str]) -> None:
 
     assert captured["program"] == "python3"
     assert captured["argv"] == ["python3", "/tmp/default_worker.py", "--worker-type", "RAY_WORKER"]
-
-
-def test_finalise_fragment_emits_fragment_to_collector_actor(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    emitted: list[dict] = []
-
-    class _AppendFragment:
-        @staticmethod
-        def remote(payload: dict) -> None:
-            emitted.append(payload)
-
-    class _FakeActor:
-        append_fragment = _AppendFragment()
-
-    fragment = TaskFragment(
-        job_uid="abcd1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
-    )
-    monkeypatch.setattr(roar_worker, "_collector_actor", _FakeActor())
-    monkeypatch.setattr(roar_worker.time, "time", lambda: 10.0)
-
-    roar_worker._finalise_fragment(fragment)
-
-    assert emitted
-    assert emitted[0]["job_uid"] == "abcd1234"
-    assert emitted[0]["ended_at"] == 10.0
-
-
-def test_flush_current_fragment_finalises_last_task_fragment(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    import roar.ray.roar_worker as roar_worker
-    from roar.ray.fragment import TaskFragment
-
-    fragment = TaskFragment(
-        job_uid="flush1234",
-        parent_job_uid="parent123",
-        ray_task_id="task-9",
-        ray_worker_id="worker-1",
-        ray_node_id="node-1",
-        ray_actor_id=None,
-        function_name="train",
-        started_at=1.0,
-        ended_at=1.0,
-        exit_code=0,
-    )
-    monkeypatch.setattr(roar_worker, "_current_fragment", fragment)
-    monkeypatch.setattr(roar_worker, "_current_task_id", "task-9")
-
-    finalised: list[str] = []
-    monkeypatch.setattr(
-        roar_worker,
-        "_finalise_fragment",
-        lambda value: finalised.append(value.job_uid),
-    )
-
-    roar_worker._flush_current_fragment()
-
-    assert finalised == ["flush1234"]
-    assert roar_worker._current_fragment is None
-    assert roar_worker._current_task_id is None
diff --git a/tests/unit/ray/test_runtime_env_pip_dedup.py b/tests/unit/ray/test_runtime_env_pip_dedup.py
new file mode 100644
index 00000000..00a4582d
--- /dev/null
+++ b/tests/unit/ray/test_runtime_env_pip_dedup.py
@@ -0,0 +1,100 @@
+"""Unit tests: _merge_roar_runtime_env_pip() must not produce duplicate pip entries.
+
+BUG: when ROAR_CLUSTER_PIP_REQ is a URL-based requirement (e.g. a presigned S3 URL),
+_requirement_name() returns the full URL rather than a canonical package name like
+'roar-cli'. The deduplication filter checks for names in {'roar-cli', 'roar'} — which
+never matches a URL — so the URL survives the filter AND gets appended again.
+
+These tests drive the fix directly via the internal function.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from roar.cli.commands._ray_job_submit import _merge_roar_runtime_env_pip
+
+FAKE_WHEEL_URL = (
+    "https://example.com/wheels/roar_cli-0.2.12-cp312-cp312-linux_x86_64.whl"
+    "?X-Amz-Signature=deadbeef"
+)
+
+
+class TestMergeRoarRuntimeEnvPipDedup:
+    """_merge_roar_runtime_env_pip must deduplicate URL-based wheel requirements."""
+
+    def test_url_req_not_duplicated_when_already_in_pip(
+        self, monkeypatch: pytest.MonkeyPatch
+    ) -> None:
+        """If pip already contains the ROAR_CLUSTER_PIP_REQ URL, it must not appear twice."""
+        monkeypatch.setenv("ROAR_CLUSTER_PIP_REQ", FAKE_WHEEL_URL)
+
+        # Simulate a runtime_env that already has the wheel URL (e.g. user passed it in).
+        existing_pip = [FAKE_WHEEL_URL]
+        result = _merge_roar_runtime_env_pip(existing_pip)
+
+        assert result is not None
+        wheel_entries = [r for r in result if "roar_cli" in r or "roar-cli" in r]
+        assert len(wheel_entries) == 1, (
+            f"Expected exactly 1 roar wheel entry, got {len(wheel_entries)}: {wheel_entries}\n"
+            f"Full result: {result}"
+        )
+
+    def test_url_req_appears_exactly_once_with_empty_pip(
+        self, monkeypatch: pytest.MonkeyPatch
+    ) -> None:
+        """ROAR_CLUSTER_PIP_REQ URL should appear exactly once when pip starts empty."""
+        monkeypatch.setenv("ROAR_CLUSTER_PIP_REQ", FAKE_WHEEL_URL)
+
+        result = _merge_roar_runtime_env_pip(None)
+
+        assert result is not None
+        assert result.count(FAKE_WHEEL_URL) == 1, (
+            f"Expected URL to appear once, got {result.count(FAKE_WHEEL_URL)}: {result}"
+        )
+
+    def test_url_req_replaces_existing_roar_cli_version_pin(
+        self, monkeypatch: pytest.MonkeyPatch
+    ) -> None:
+        """A URL override should replace an existing roar-cli==x.y.z pin, not join it."""
+        monkeypatch.setenv("ROAR_CLUSTER_PIP_REQ", FAKE_WHEEL_URL)
+
+        existing_pip = ["roar-cli==0.2.11", "numpy>=1.0"]
+        result = _merge_roar_runtime_env_pip(existing_pip)
+
+        assert result is not None
+        roar_entries = [r for r in result if "roar" in r.lower()]
+        assert len(roar_entries) == 1, (
+            f"Expected exactly 1 roar entry after replacing version pin, "
+            f"got {roar_entries}: {result}"
+        )
+        assert roar_entries[0] == FAKE_WHEEL_URL
+
+    def test_skip_produces_no_roar_entry(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """ROAR_CLUSTER_PIP_REQ=skip means workers have roar pre-installed — omit from pip."""
+        monkeypatch.setenv("ROAR_CLUSTER_PIP_REQ", "skip")
+
+        result = _merge_roar_runtime_env_pip(["numpy>=1.0"])
+
+        # Either None or a list with no roar entry.
+        if result is not None:
+            roar_entries = [r for r in result if "roar" in r.lower()]
+            assert not roar_entries, f"Expected no roar entry with skip, got: {roar_entries}"
+
+    def test_no_duplicates_with_other_packages(self, monkeypatch: pytest.MonkeyPatch) -> None:
+        """Other packages must be preserved and not duplicated."""
+        monkeypatch.setenv("ROAR_CLUSTER_PIP_REQ", FAKE_WHEEL_URL)
+
+        existing_pip = ["numpy>=1.0", "pandas==2.0", FAKE_WHEEL_URL]
+        result = _merge_roar_runtime_env_pip(existing_pip)
+
+        assert result is not None
+        # No duplicates at all.
+        seen: set[str] = set()
+        for entry in result:
+            assert entry not in seen, f"Duplicate entry in pip: {entry!r}\nFull list: {result}"
+            seen.add(entry)
+
+        # numpy and pandas preserved.
+        assert any("numpy" in r for r in result), f"numpy missing from {result}"
+        assert any("pandas" in r for r in result), f"pandas missing from {result}"
diff --git a/tests/unit/ray/test_sitecustomize_runtime_env.py b/tests/unit/ray/test_sitecustomize_runtime_env.py
index 1c0bdca1..f6ff6ee7 100644
--- a/tests/unit/ray/test_sitecustomize_runtime_env.py
+++ b/tests/unit/ray/test_sitecustomize_runtime_env.py
@@ -3,7 +3,7 @@
 import builtins
 import sys
 from pathlib import Path
-from types import ModuleType, SimpleNamespace
+from types import SimpleNamespace
 from unittest.mock import MagicMock
 
 import pytest
@@ -48,10 +48,8 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     monkeypatch.chdir(tmp_path)
 
-    monkeypatch.setenv("ROAR_LOG_DIR", "/tmp/roar-ray")
     monkeypatch.delenv("ROAR_JOB_ID", raising=False)
     import importlib.metadata as importlib_metadata
 
@@ -63,7 +61,7 @@ def fake_ray_init(*_args, **kwargs):
     assert result == "ok"
     runtime_env = calls[-1]["runtime_env"]
     assert "pip" not in runtime_env
-    assert runtime_env["env_vars"]["ROAR_LOG_BACKEND"] == "actor"
+    assert {"USER_KEY", "ROAR_JOB_ID", "ROAR_DRIVER_JOB_UID"}.issubset(runtime_env["env_vars"])
     assert runtime_env["env_vars"]["ROAR_JOB_ID"]
 
 
@@ -77,7 +75,6 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     config_path = tmp_path / ".roar" / "config.toml"
     config_path.parent.mkdir(parents=True)
     config_path.write_text("""
@@ -93,7 +90,7 @@ def fake_ray_init(*_args, **kwargs):
     assert runtime_env == {"env_vars": {"USER_KEY": "value"}}
 
 
-def test_patch_ray_init_honors_ray_config_log_dir_and_pip_toggle(
+def test_patch_ray_init_honors_ray_config_pip_toggle(
     monkeypatch: pytest.MonkeyPatch, tmp_path
 ) -> None:
     calls: list[dict] = []
@@ -103,13 +100,11 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     config_path = tmp_path / ".roar" / "config.toml"
     config_path.parent.mkdir(parents=True)
     config_path.write_text("""
 [ray]
 pip_install = false
-log_dir = "/tmp/roar-ray-config"
 """)
     monkeypatch.chdir(tmp_path)
 
@@ -118,7 +113,6 @@ def fake_ray_init(*_args, **kwargs):
 
     runtime_env = calls[-1]["runtime_env"]
     assert "pip" not in runtime_env
-    assert runtime_env["env_vars"]["ROAR_LOG_DIR"] == "/tmp/roar-ray-config"
 
 
 def test_patch_ray_shutdown_collects_before_shutdown(monkeypatch: pytest.MonkeyPatch) -> None:
@@ -140,120 +134,23 @@ def fake_shutdown(*_args, **_kwargs):
     assert call_order == ["collect", "shutdown"]
 
 
-def test_ensure_collector_actor_creates_named_actor_when_missing(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    class _FakeRay:
-        def __init__(self) -> None:
-            self.get_actor_calls: list[tuple[str, str | None]] = []
-            self.get_calls: list[tuple[object, int | None]] = []
-
-        def get_actor(self, name: str, namespace: str | None = None):
-            self.get_actor_calls.append((name, namespace))
-            raise ValueError("missing")
-
-        def get(self, value, timeout: int | None = None):
-            self.get_calls.append((value, timeout))
-            return value
-
-    created: dict[str, object] = {}
-
-    class _FakeCollectorActor:
-        class _FakeRemoteMethod:
-            @staticmethod
-            def remote():
-                return "ready"
-
-        @classmethod
-        def options(cls, **kwargs):
-            created["options"] = kwargs
-            return SimpleNamespace(
-                remote=lambda: SimpleNamespace(get_all=cls._FakeRemoteMethod()),
-            )
-
-    fake_actor_module = ModuleType("roar.ray.actor")
-    fake_actor_module.RoarLogCollectorActor = _FakeCollectorActor
-    monkeypatch.setitem(
-        sys.modules,
-        "roar.ray.actor",
-        fake_actor_module,
-    )
-
-    fake_ray = _FakeRay()
-    sitecustomize._ensure_collector_actor(fake_ray, "job1234")
-
-    assert fake_ray.get_actor_calls == [("roar-log-collector-job1234", "roar")]
-    assert created["options"] == {
-        "name": "roar-log-collector-job1234",
-        "namespace": "roar",
-        "lifetime": "detached",
-        "num_cpus": 0,
-    }
-    assert fake_ray.get_calls == [("ready", 10)]
-
+def test_shutdown_ray_at_exit_prefers_ray_shutdown(monkeypatch: pytest.MonkeyPatch) -> None:
+    call_order: list[str] = []
 
-def test_ensure_collector_actor_passes_fragment_streaming_args_when_env_present(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    class _FakeRay:
-        def __init__(self) -> None:
-            self.get_actor_calls: list[tuple[str, str | None]] = []
-            self.get_calls: list[tuple[object, int | None]] = []
-
-        def get_actor(self, name: str, namespace: str | None = None):
-            self.get_actor_calls.append((name, namespace))
-            raise ValueError("missing")
-
-        def get(self, value, timeout: int | None = None):
-            self.get_calls.append((value, timeout))
-            return value
-
-    created: dict[str, object] = {}
-    remote_kwargs: dict[str, object] = {}
-
-    class _FakeCollectorActor:
-        class _FakeRemoteMethod:
-            @staticmethod
-            def remote():
-                return "ready"
-
-        @classmethod
-        def options(cls, **kwargs):
-            created["options"] = kwargs
-
-            def _remote(**actor_kwargs):
-                remote_kwargs.update(actor_kwargs)
-                return SimpleNamespace(get_all=cls._FakeRemoteMethod())
-
-            return SimpleNamespace(remote=_remote)
-
-    fake_actor_module = ModuleType("roar.ray.actor")
-    fake_actor_module.RoarLogCollectorActor = _FakeCollectorActor
-    monkeypatch.setitem(
-        sys.modules,
-        "roar.ray.actor",
-        fake_actor_module,
+    fake_ray = SimpleNamespace(
+        is_initialized=lambda: True,
+        shutdown=lambda: call_order.append("shutdown"),
+    )
+    monkeypatch.setitem(sys.modules, "ray", fake_ray)
+    monkeypatch.setattr(
+        sitecustomize,
+        "_collect_ray_io",
+        lambda *args, **kwargs: call_order.append("collect"),
     )
 
-    monkeypatch.setenv("ROAR_SESSION_ID", "sess-123")
-    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "ab" * 32)
-    monkeypatch.setenv("GLAAS_URL", "http://localhost:3001")
-
-    fake_ray = _FakeRay()
-    sitecustomize._ensure_collector_actor(fake_ray, "job1234")
+    sitecustomize._shutdown_ray_at_exit()
 
-    assert created["options"] == {
-        "name": "roar-log-collector-job1234",
-        "namespace": "roar",
-        "lifetime": "detached",
-        "num_cpus": 0,
-    }
-    assert remote_kwargs == {
-        "session_id": "sess-123",
-        "token": "ab" * 32,
-        "glaas_url": "http://localhost:3001",
-    }
-    assert fake_ray.get_calls == [("ready", 10)]
+    assert call_order == ["shutdown"]
 
 
 def test_patch_ray_init_starts_node_agent_spawn_in_background_thread(
@@ -266,12 +163,11 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     monkeypatch.setenv("ROAR_RAY_NODE_AGENTS", "1")
     monkeypatch.setattr(
         sitecustomize,
         "_load_ray_config",
-        lambda: {"enabled": True, "pip_install": False, "log_dir": "/tmp/roar-ray"},
+        lambda: {"enabled": True, "pip_install": False},
     )
     monkeypatch.setattr(sitecustomize, "_start_ray_node_poller", lambda *_args, **_kwargs: None)
     monkeypatch.setattr(
@@ -309,11 +205,10 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     monkeypatch.setattr(
         sitecustomize,
         "_load_ray_config",
-        lambda: {"enabled": True, "pip_install": False, "log_dir": "/tmp/roar-ray"},
+        lambda: {"enabled": True, "pip_install": False},
     )
     spawn_node_agents = MagicMock()
     start_node_poller = MagicMock()
@@ -414,6 +309,7 @@ def fake_copy2(_src, dst):
 def test_prepare_worker_runtime_env_warns_when_working_dir_is_not_local(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
+    non_local_working_dir = "s3://bucket/path"
     warnings: list[str] = []
     monkeypatch.setattr(
         sitecustomize,
@@ -426,14 +322,34 @@ def test_prepare_worker_runtime_env_warns_when_working_dir_is_not_local(
     )
 
     prepared = sitecustomize._prepare_worker_runtime_env(
-        {"working_dir": "s3://bucket/path"},
+        {"working_dir": non_local_working_dir},
         "job5678",
     )
 
     assert warnings
     assert prepared["py_executable"] == "roar-worker"
     assert prepared["worker_process_setup_hook"] == "roar.ray.roar_worker._startup"
-    assert Path(str(prepared["working_dir"])).exists()
+    assert prepared["working_dir"] == non_local_working_dir
+
+
+def test_prepare_worker_runtime_env_preserves_non_local_job_working_dir(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    job_working_dir = "s3://bucket/cloud-demo"
+    monkeypatch.setattr(
+        "roar.services.execution.tracer_backends.find_preload_library",
+        lambda _package_path: None,
+    )
+
+    prepared = sitecustomize._prepare_worker_runtime_env(
+        {"working_dir": job_working_dir},
+        "jobraywd",
+    )
+
+    assert prepared["working_dir"] == job_working_dir, (
+        "Expected _prepare_worker_runtime_env() to preserve the job-provided non-local "
+        "working_dir so Ray workers keep access to modules from ray job submit packaging."
+    )
 
 
 def test_prepare_worker_runtime_env_bundles_roar_worker_hook(
@@ -447,7 +363,7 @@ def test_prepare_worker_runtime_env_bundles_roar_worker_hook(
     prepared = sitecustomize._prepare_worker_runtime_env({}, "job9999")
     working_dir = Path(str(prepared["working_dir"]))
 
-    assert (working_dir / "roar" / "ray" / "worker.py").exists()
+    assert (working_dir / "roar" / "ray" / "roar_worker.py").exists()
 
 
 def test_prepare_worker_runtime_env_uses_roar_worker_entrypoint(
@@ -494,6 +410,35 @@ def test_prepare_worker_runtime_env_ignores_existing_worker_setup_hook(
     assert prepared["worker_process_setup_hook"] == "roar.ray.roar_worker._startup"
 
 
+def test_patch_ray_init_propagates_fragment_streaming_env_to_workers(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    calls: list[dict] = []
+
+    def fake_ray_init(*_args, **kwargs):
+        calls.append(kwargs)
+        return "ok"
+
+    fake_ray = SimpleNamespace(init=fake_ray_init)
+    monkeypatch.setattr(
+        sitecustomize,
+        "_load_ray_config",
+        lambda: {"enabled": True, "pip_install": False},
+    )
+    monkeypatch.setenv("ROAR_CLUSTER_GLAAS_URL", "http://host.docker.internal:3001")
+    monkeypatch.setenv("ROAR_SESSION_ID", "session-123")
+    monkeypatch.setenv("ROAR_FRAGMENT_TOKEN", "ab" * 32)
+    monkeypatch.setenv("ROAR_JOB_ID", "driver-uid-1234")
+
+    sitecustomize._patch_ray_init(fake_ray)
+    fake_ray.init(runtime_env={})
+
+    runtime_env = calls[-1]["runtime_env"]
+    assert runtime_env["env_vars"]["GLAAS_URL"] == "http://host.docker.internal:3001"
+    assert runtime_env["env_vars"]["ROAR_SESSION_ID"] == "session-123"
+    assert runtime_env["env_vars"]["ROAR_FRAGMENT_TOKEN"] == "ab" * 32
+
+
 def test_patch_ray_init_sets_driver_job_uid_for_workers(
     monkeypatch: pytest.MonkeyPatch,
 ) -> None:
@@ -504,11 +449,10 @@ def fake_ray_init(*_args, **kwargs):
         return "ok"
 
     fake_ray = SimpleNamespace(init=fake_ray_init)
-    monkeypatch.setattr(sitecustomize, "_ensure_collector_actor", lambda *_args, **_kwargs: None)
     monkeypatch.setattr(
         sitecustomize,
         "_load_ray_config",
-        lambda: {"enabled": True, "pip_install": False, "log_dir": "/tmp/roar-ray"},
+        lambda: {"enabled": True, "pip_install": False},
     )
     monkeypatch.setenv("ROAR_JOB_ID", "driver-uid-1234")
 
diff --git a/tests/unit/ray/test_sitecustomize_runtime_env_conflict.py b/tests/unit/ray/test_sitecustomize_runtime_env_conflict.py
new file mode 100644
index 00000000..9eb5791e
--- /dev/null
+++ b/tests/unit/ray/test_sitecustomize_runtime_env_conflict.py
@@ -0,0 +1,97 @@
+from __future__ import annotations
+
+from types import SimpleNamespace
+
+import pytest
+
+from roar.services.execution.inject import sitecustomize
+
+
+def test_patch_ray_init_conflicts_inside_preinstrumented_job(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    job_runtime_env = {
+        "pip": ["roar-cli==0.0.1"],
+        "working_dir": "/tmp/job-level-working-dir",
+        "env_vars": {"USER_KEY": "value"},
+    }
+    captured_runtime_env: dict[str, object] = {}
+
+    def fake_prepare_worker_runtime_env(_runtime_env: dict, _job_id: str) -> dict:
+        raise AssertionError("_prepare_worker_runtime_env should be skipped in instrumented jobs")
+
+    def fake_ray_init(*_args, **kwargs):
+        runtime_env = dict(kwargs.get("runtime_env", {}) or {})
+        captured_runtime_env.update(runtime_env)
+        conflicting_keys = {
+            key
+            for key in ("pip", "working_dir")
+            if key in runtime_env
+            and key in job_runtime_env
+            and runtime_env[key] != job_runtime_env[key]
+        }
+        if conflicting_keys:
+            assert conflicting_keys == {"pip", "working_dir"}
+            raise ValueError(
+                "Failed to merge the Job's runtime env because of a conflict on pip and working_dir"
+            )
+        return "ok"
+
+    fake_ray = SimpleNamespace(init=fake_ray_init)
+
+    monkeypatch.setattr(
+        sitecustomize,
+        "_load_ray_config",
+        lambda: {"enabled": True, "pip_install": True},
+    )
+    monkeypatch.setattr(
+        sitecustomize,
+        "_merge_roar_runtime_env_pip",
+        lambda _existing: ["roar-cli==9.9.9"],
+    )
+    monkeypatch.setattr(
+        sitecustomize, "_prepare_worker_runtime_env", fake_prepare_worker_runtime_env
+    )
+    monkeypatch.setattr(
+        sitecustomize,
+        "_sanitize_worker_runtime_env_for_ray",
+        lambda _ray_module, runtime_env: runtime_env,
+    )
+    monkeypatch.setattr(sitecustomize, "_register_pre_shutdown_ray_collection", lambda: None)
+    monkeypatch.setenv("ROAR_JOB_INSTRUMENTED", "1")
+
+    sitecustomize._patch_ray_init(fake_ray)
+
+    result = fake_ray.init(runtime_env=job_runtime_env)
+    assert result == "ok"
+    assert captured_runtime_env["pip"] == ["roar-cli==0.0.1"]
+    assert captured_runtime_env["working_dir"] == "/tmp/job-level-working-dir"
+    assert captured_runtime_env["py_executable"] == "roar-worker"
+    assert captured_runtime_env["env_vars"]["USER_KEY"] == "value"
+
+
+def test_patch_ray_init_registers_pre_shutdown_collection_for_instrumented_jobs(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    fake_ray = SimpleNamespace(init=lambda *_args, **_kwargs: "ok")
+    register_calls: list[str] = []
+
+    monkeypatch.setattr(
+        sitecustomize,
+        "_load_ray_config",
+        lambda: {"enabled": True, "pip_install": False},
+    )
+    monkeypatch.setattr(
+        sitecustomize,
+        "_register_pre_shutdown_ray_collection",
+        lambda: register_calls.append("called"),
+    )
+    monkeypatch.setenv("ROAR_JOB_INSTRUMENTED", "1")
+    monkeypatch.setenv("ROAR_RAY_NODE_AGENTS", "0")
+
+    sitecustomize._patch_ray_init(fake_ray)
+
+    result = fake_ray.init(runtime_env={})
+
+    assert result == "ok"
+    assert register_calls == ["called"]
diff --git a/tests/unit/ray/test_step_numbers.py b/tests/unit/ray/test_step_numbers.py
index dddb5cc1..526f9ce6 100644
--- a/tests/unit/ray/test_step_numbers.py
+++ b/tests/unit/ray/test_step_numbers.py
@@ -14,6 +14,16 @@ def _artifact(hash_value: str | None) -> ArtifactRef:
     )
 
 
+def _artifact_with_path(path: str, hash_value: str | None = None) -> ArtifactRef:
+    return ArtifactRef(
+        path=path,
+        hash=hash_value,
+        hash_algorithm="blake3" if hash_value else "",
+        size=0,
+        capture_method="python",
+    )
+
+
 def _frag(
     uid: str,
     reads: tuple[str, ...] = (),
@@ -190,3 +200,34 @@ def test_no_regression_independent_same_function_tasks() -> None:
 
     steps = _assign_step_numbers([ingest_0, ingest_1, ingest_2])
     assert steps["ingest_0"] == steps["ingest_1"] == steps["ingest_2"]
+
+
+def test_assign_step_numbers_matches_on_path_when_only_producer_has_hash() -> None:
+    producer = TaskFragment(
+        job_uid="train",
+        parent_job_uid="driver",
+        ray_task_id="train",
+        ray_worker_id="w",
+        ray_node_id="n",
+        ray_actor_id=None,
+        function_name="training",
+        started_at=1.0,
+        ended_at=1.1,
+        exit_code=0,
+        writes=[_artifact_with_path("s3://bucket/model.json", "etag-model")],
+    )
+    consumer = TaskFragment(
+        job_uid="eval",
+        parent_job_uid="driver",
+        ray_task_id="eval",
+        ray_worker_id="w",
+        ray_node_id="n",
+        ray_actor_id=None,
+        function_name="evaluation",
+        started_at=2.0,
+        ended_at=2.1,
+        exit_code=0,
+        reads=[_artifact_with_path("s3://bucket/model.json")],
+    )
+
+    assert _assign_step_numbers([producer, consumer]) == {"train": 2, "eval": 3}
diff --git a/tests/unit/ray/test_worker_proxy_env.py b/tests/unit/ray/test_worker_proxy_env.py
deleted file mode 100644
index 4e4bb7b1..00000000
--- a/tests/unit/ray/test_worker_proxy_env.py
+++ /dev/null
@@ -1,65 +0,0 @@
-from __future__ import annotations
-
-import os
-import sys
-from types import SimpleNamespace
-
-import pytest
-
-from roar.ray import worker
-
-
-class _FakeRemoteMethod:
-    def __init__(self, value):
-        self._value = value
-
-    def remote(self):
-        return self._value
-
-
-class _FakeAgent:
-    def __init__(self, port: int):
-        self.get_proxy_port = _FakeRemoteMethod(port)
-
-
-class _FakeRay:
-    def __init__(self, port: int):
-        self._agent = _FakeAgent(port)
-        self.get_actor_name: str | None = None
-        self.get_actor_namespace: str | None = None
-
-    def get_runtime_context(self):
-        return SimpleNamespace(get_node_id=lambda: "node-12345678")
-
-    def get_actor(self, name: str, namespace: str | None = None):
-        self.get_actor_name = name
-        self.get_actor_namespace = namespace
-        return self._agent
-
-    def get(self, value, timeout: int | None = None):
-        del timeout
-        return value
-
-
-def test_configure_local_proxy_endpoint_from_node_agent(monkeypatch: pytest.MonkeyPatch) -> None:
-    fake_ray = _FakeRay(port=9012)
-    monkeypatch.setitem(sys.modules, "ray", fake_ray)
-    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
-    monkeypatch.delenv("AWS_ENDPOINT_URL", raising=False)
-
-    worker._configure_local_proxy_endpoint()
-
-    assert fake_ray.get_actor_namespace == "roar"
-    assert fake_ray.get_actor_name == "roar-node-agent-job-123-node-123"
-    assert os.environ.get("AWS_ENDPOINT_URL") == "http://127.0.0.1:9012"
-
-
-def test_configure_local_proxy_endpoint_preserves_existing_env(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://existing-endpoint")
-    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
-
-    worker._configure_local_proxy_endpoint()
-
-    assert os.environ.get("AWS_ENDPOINT_URL") == "http://existing-endpoint"
diff --git a/tests/unit/ray/test_worker_setup.py b/tests/unit/ray/test_worker_setup.py
deleted file mode 100644
index 8b1ada60..00000000
--- a/tests/unit/ray/test_worker_setup.py
+++ /dev/null
@@ -1,218 +0,0 @@
-from __future__ import annotations
-
-import builtins
-import sys
-import tempfile
-from pathlib import Path
-from types import SimpleNamespace
-from unittest.mock import MagicMock
-
-import pytest
-
-from roar.ray import worker
-
-
-def test_setup_defers_actor_initialization_until_event_logging(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    monkeypatch.setenv("ROAR_LOG_DIR", str(tmp_path))
-    monkeypatch.setattr(worker, "_choose_backend", lambda: "actor")
-    init_actor = MagicMock()
-    monkeypatch.setattr(worker, "_init_actor", init_actor)
-    monkeypatch.setattr(worker, "_patch_boto3", lambda: None)
-    monkeypatch.setattr(worker, "_patch_pandas", lambda: None)
-    monkeypatch.setattr(worker, "_patch_pyarrow_filesystem", lambda: None)
-    monkeypatch.setattr(worker, "_patch_ray_data", lambda: None)
-    monkeypatch.setattr(worker, "_configure_local_proxy_endpoint", lambda: None)
-    monkeypatch.setattr(worker.atexit, "register", lambda *_args, **_kwargs: None)
-
-    if hasattr(worker.setup, "_roar_worker_ready"):
-        delattr(worker.setup, "_roar_worker_ready")
-
-    original_open = builtins.open
-    try:
-        worker.setup()
-    finally:
-        builtins.open = original_open
-        if hasattr(worker.setup, "_roar_worker_ready"):
-            delattr(worker.setup, "_roar_worker_ready")
-
-    init_actor.assert_not_called()
-
-
-def test_setup_skips_proxy_endpoint_configuration_by_default(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    monkeypatch.setenv("ROAR_LOG_DIR", str(tmp_path))
-    monkeypatch.setattr(worker, "_choose_backend", lambda: "filesystem")
-    monkeypatch.setattr(worker, "_patch_boto3", lambda: None)
-    monkeypatch.setattr(worker, "_patch_pandas", lambda: None)
-    monkeypatch.setattr(worker, "_patch_pyarrow_filesystem", lambda: None)
-    monkeypatch.setattr(worker, "_patch_ray_data", lambda: None)
-    configure_proxy = MagicMock()
-    monkeypatch.setattr(worker, "_configure_local_proxy_endpoint", configure_proxy)
-    monkeypatch.setattr(worker.atexit, "register", lambda *_args, **_kwargs: None)
-
-    if hasattr(worker.setup, "_roar_worker_ready"):
-        delattr(worker.setup, "_roar_worker_ready")
-
-    original_open = builtins.open
-    try:
-        worker.setup()
-    finally:
-        builtins.open = original_open
-        if hasattr(worker.setup, "_roar_worker_ready"):
-            delattr(worker.setup, "_roar_worker_ready")
-
-    configure_proxy.assert_not_called()
-
-
-def test_setup_skips_optional_sdk_patches_by_default(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    monkeypatch.setenv("ROAR_LOG_DIR", str(tmp_path))
-    monkeypatch.setattr(worker, "_choose_backend", lambda: "filesystem")
-    patch_boto3 = MagicMock()
-    patch_pandas = MagicMock()
-    patch_pyarrow = MagicMock()
-    patch_ray_data = MagicMock()
-    monkeypatch.setattr(worker, "_patch_boto3", patch_boto3)
-    monkeypatch.setattr(worker, "_patch_pandas", patch_pandas)
-    monkeypatch.setattr(worker, "_patch_pyarrow_filesystem", patch_pyarrow)
-    monkeypatch.setattr(worker, "_patch_ray_data", patch_ray_data)
-    monkeypatch.setattr(worker, "_configure_local_proxy_endpoint", lambda: None)
-    monkeypatch.setattr(worker.atexit, "register", lambda *_args, **_kwargs: None)
-
-    if hasattr(worker.setup, "_roar_worker_ready"):
-        delattr(worker.setup, "_roar_worker_ready")
-
-    original_open = builtins.open
-    try:
-        worker.setup()
-    finally:
-        builtins.open = original_open
-        if hasattr(worker.setup, "_roar_worker_ready"):
-            delattr(worker.setup, "_roar_worker_ready")
-
-    patch_boto3.assert_not_called()
-    patch_pandas.assert_not_called()
-    patch_pyarrow.assert_not_called()
-    patch_ray_data.assert_not_called()
-
-
-def test_init_actor_only_attempts_lookup_when_actor_missing(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    class _FakeRay:
-        def __init__(self) -> None:
-            self.get_actor_calls: list[tuple[str, str | None]] = []
-
-        def get_actor(self, name: str, namespace: str | None = None):
-            self.get_actor_calls.append((name, namespace))
-            raise ValueError("missing")
-
-    class _FakeCollectorActor:
-        options_called = False
-
-        @classmethod
-        def options(cls, **_kwargs):
-            cls.options_called = True
-            return SimpleNamespace(remote=lambda: "ok")
-
-    monkeypatch.setitem(sys.modules, "ray", _FakeRay())
-    monkeypatch.setitem(
-        sys.modules,
-        "roar.ray.actor",
-        SimpleNamespace(RoarLogCollectorActor=_FakeCollectorActor),
-    )
-    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
-    monkeypatch.setattr(worker, "_actor", None)
-
-    worker._init_actor()
-
-    assert _FakeCollectorActor.options_called is False
-
-
-def test_runtime_context_ids_short_circuit_when_ray_not_initialized(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    class _FakeRay:
-        context_lookups = 0
-
-        def is_initialized(self) -> bool:
-            return False
-
-        def get_runtime_context(self):
-            self.context_lookups += 1
-            return SimpleNamespace(get_task_id=lambda: None, get_node_id=lambda: None)
-
-    fake_ray = _FakeRay()
-    monkeypatch.setitem(sys.modules, "ray", fake_ray)
-
-    task_id, node_id = worker._runtime_context_ids()
-
-    assert task_id is None
-    assert node_id is None
-    assert fake_ray.context_lookups == 0
-
-
-def test_patch_tempfile_logs_named_temporary_file_writes(
-    monkeypatch: pytest.MonkeyPatch,
-    tmp_path: Path,
-) -> None:
-    monkeypatch.setattr(worker, "_runtime_context_ids", lambda: ("task-123", None))
-    monkeypatch.setattr(worker, "_BACKEND", "filesystem")
-    monkeypatch.setattr(worker, "_actor", None)
-    monkeypatch.setattr(worker, "_LOG_DIR", str(tmp_path))
-
-    captured: list[tuple[str, dict[str, object]]] = []
-    monkeypatch.setattr(
-        worker,
-        "_write_to_file",
-        lambda task_id, payload: captured.append((task_id, payload)),
-    )
-
-    original_named_tempfile = tempfile.NamedTemporaryFile
-    try:
-        worker._patch_tempfile()
-        with tempfile.NamedTemporaryFile(delete=True, dir=tmp_path):
-            pass
-    finally:
-        tempfile.NamedTemporaryFile = original_named_tempfile
-        if hasattr(tempfile, "_roar_worker_tempfile_patched"):
-            delattr(tempfile, "_roar_worker_tempfile_patched")
-
-    assert captured
-    task_id, payload = captured[0]
-    assert task_id == "task-123"
-    assert payload["mode"] == "w"
-    assert str(payload["path"]).startswith(str(tmp_path))
-
-
-def test_log_access_flushes_actor_buffer_for_small_batches(
-    monkeypatch: pytest.MonkeyPatch,
-) -> None:
-    batches: list[list[dict[str, object]]] = []
-
-    class _FakeAppendBatch:
-        @staticmethod
-        def remote(batch):
-            batches.append(list(batch))
-
-    class _FakeActor:
-        append_batch = _FakeAppendBatch()
-
-    monkeypatch.setattr(worker, "_BACKEND", "actor")
-    monkeypatch.setattr(worker, "_actor", _FakeActor())
-    monkeypatch.setattr(worker, "_event_buffer", [])
-    monkeypatch.setattr(worker, "_runtime_context_ids", lambda: ("task-abc", None))
-    monkeypatch.setattr(worker, "_init_actor", lambda: None)
-    monkeypatch.setattr(worker, "_FLUSH_THRESHOLD", 50)
-
-    worker._log_access("/tmp/demo.txt", "w", capture_method="python")
-
-    assert batches
-    assert batches[0][0]["path"] == "/tmp/demo.txt"
diff --git a/tests/unit/test_collect_ray_io_bug.py b/tests/unit/test_collect_ray_io_bug.py
new file mode 100644
index 00000000..b516ca0d
--- /dev/null
+++ b/tests/unit/test_collect_ray_io_bug.py
@@ -0,0 +1,150 @@
+from __future__ import annotations
+
+import builtins
+import importlib.util
+import os
+import sys
+from pathlib import Path
+from types import ModuleType
+
+import pytest
+
+INJECT_DIR = Path(__file__).resolve().parents[2] / "roar" / "services" / "execution" / "inject"
+
+
+def _load_sitecustomize_module():
+    real_open = builtins.open
+    real_import = builtins.__import__
+    real_environ_get = os.environ.get
+
+    spec = importlib.util.spec_from_file_location(
+        "sitecustomize_collect_ray_io_bug_test",
+        str(INJECT_DIR / "sitecustomize.py"),
+    )
+    assert spec is not None and spec.loader is not None
+
+    module = importlib.util.module_from_spec(spec)
+    try:
+        spec.loader.exec_module(module)
+    finally:
+        builtins.open = real_open
+        builtins.__import__ = real_import
+        os.environ.get = real_environ_get
+
+    return module
+
+
+class _AccessTrackingProxyLogs(dict[str, dict]):
+    def __init__(self, payload: dict[str, dict]) -> None:
+        super().__init__(payload)
+        self.accessed = False
+
+    def _mark(self) -> None:
+        self.accessed = True
+
+    def __getitem__(self, key):
+        self._mark()
+        return super().__getitem__(key)
+
+    def __iter__(self):
+        self._mark()
+        return super().__iter__()
+
+    def __len__(self) -> int:
+        self._mark()
+        return super().__len__()
+
+    def get(self, key, default=None):
+        self._mark()
+        return super().get(key, default)
+
+    def items(self):
+        self._mark()
+        return super().items()
+
+    def keys(self):
+        self._mark()
+        return super().keys()
+
+    def values(self):
+        self._mark()
+        return super().values()
+
+
+def test_collect_ray_io_does_not_discard_proxy_logs_before_processing(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    sitecustomize = _load_sitecustomize_module()
+
+    fake_ray = ModuleType("ray")
+    fake_ray.is_initialized = lambda: True
+
+    def _missing_actor(_name: str, namespace: str | None = None):
+        del namespace
+        raise ValueError("No actor")
+
+    fake_ray.get_actor = _missing_actor
+    monkeypatch.setitem(sys.modules, "ray", fake_ray)
+    monkeypatch.setenv("ROAR_WRAP", "1")
+    monkeypatch.setenv("ROAR_JOB_ID", "job-123")
+
+    proxy_logs = _AccessTrackingProxyLogs(
+        {
+            "node-abc123": {
+                "entries": [
+                    "[S3:PutObject] s3://bucket/key.parquet  (1024 bytes)  etag=abc123",
+                    "[S3:GetObject] s3://bucket/input.csv  (2048 bytes)  etag=def456",
+                ],
+                "node_id": "node-abc123",
+                "proxy_port": 18080,
+            }
+        }
+    )
+
+    sitecustomize._collect_ray_io(proxy_logs=proxy_logs)
+
+    # The fix should remove `del proxy_logs`, parse these proxy log entries into
+    # artifact refs, and emit fragments directly even if the detached collector
+    # actor has already been removed during the Phase 2 shutdown path.
+    assert proxy_logs.accessed, (
+        "_collect_ray_io returned without ever inspecting proxy_logs. "
+        "This drops node-agent proxy log data on the floor when the collector actor is gone."
+    )
+
+
+def test_collect_ray_io_collects_node_agent_logs_when_called_without_args(
+    monkeypatch: pytest.MonkeyPatch,
+) -> None:
+    sitecustomize = _load_sitecustomize_module()
+
+    fake_ray = ModuleType("ray")
+    fake_ray.is_initialized = lambda: True
+    monkeypatch.setitem(sys.modules, "ray", fake_ray)
+    monkeypatch.setenv("ROAR_WRAP", "1")
+    monkeypatch.setenv("ROAR_RAY_NODE_AGENTS", "1")
+
+    parsed_refs = []
+
+    monkeypatch.setattr(
+        sitecustomize,
+        "_collect_node_agent_logs",
+        lambda _ray_module: {
+            "node-abc123": {
+                "proxy_log_lines": [
+                    "[S3:PutObject] s3://bucket/key.parquet  (1024 bytes)  etag=abc123",
+                    "[S3:GetObject] s3://bucket/input.csv  (2048 bytes)  etag=def456",
+                ],
+                "node_id": "node-abc123",
+                "proxy_port": 18080,
+            }
+        },
+    )
+    monkeypatch.setattr(
+        sitecustomize,
+        "_write_proxy_artifacts_to_db",
+        lambda refs: parsed_refs.extend(refs),
+    )
+
+    sitecustomize._collect_ray_io()
+
+    assert [kind for kind, _ref in parsed_refs] == ["write", "read"]
diff --git a/tests/unit/test_glaas_auth.py b/tests/unit/test_glaas_auth.py
new file mode 100644
index 00000000..cfd627a4
--- /dev/null
+++ b/tests/unit/test_glaas_auth.py
@@ -0,0 +1,15 @@
+from roar.glaas.auth import get_glaas_url
+
+
+def test_get_glaas_url_prefers_env_over_config(monkeypatch):
+    monkeypatch.setenv("GLAAS_URL", "https://env.example.com")
+    monkeypatch.setattr("roar.config.config_get", lambda key: "https://config.example.com")
+
+    assert get_glaas_url() == "https://env.example.com"
+
+
+def test_get_glaas_url_falls_back_to_config(monkeypatch):
+    monkeypatch.delenv("GLAAS_URL", raising=False)
+    monkeypatch.setattr("roar.config.config_get", lambda key: "https://config.example.com")
+
+    assert get_glaas_url() == "https://config.example.com"
diff --git a/tests/unit/test_pth_pydantic_import.py b/tests/unit/test_pth_pydantic_import.py
new file mode 100644
index 00000000..61ccd46c
--- /dev/null
+++ b/tests/unit/test_pth_pydantic_import.py
@@ -0,0 +1,64 @@
+import os
+import subprocess
+import sys
+import textwrap
+from pathlib import Path
+
+SOURCE_ROOT = Path(__file__).resolve().parents[2]
+
+
+def _run_python(code: str) -> subprocess.CompletedProcess[str]:
+    env = dict(os.environ)
+    existing = env.get("PYTHONPATH", "")
+    source_root = str(SOURCE_ROOT)
+    env["PYTHONPATH"] = source_root if not existing else source_root + os.pathsep + existing
+    return subprocess.run(
+        [sys.executable, "-c", code],
+        capture_output=True,
+        text=True,
+        check=False,
+        env=env,
+        cwd=SOURCE_ROOT,
+        timeout=30,
+    )
+
+
+def test_pth_import_does_not_require_pydantic() -> None:
+    code = textwrap.dedent(
+        """
+        import importlib
+        import importlib.abc
+        import os
+        import sys
+
+        class _BlockPydantic(importlib.abc.MetaPathFinder):
+            def find_spec(self, fullname, path=None, target=None):
+                if fullname.startswith("pydantic"):
+                    raise ImportError(f"blocked import: {fullname}")
+                return None
+
+        sys.meta_path.insert(0, _BlockPydantic())
+        os.environ["ROAR_WRAP"] = "1"
+        importlib.import_module("roar.services.execution.inject.sitecustomize")
+        """
+    )
+
+    result = _run_python(code)
+
+    assert result.returncode == 0, result.stderr
+
+
+def test_pth_import_chain_succeeds_when_pydantic_available() -> None:
+    code = textwrap.dedent(
+        """
+        import importlib
+        import os
+
+        os.environ["ROAR_WRAP"] = "1"
+        importlib.import_module("roar.services.execution.inject.sitecustomize")
+        """
+    )
+
+    result = _run_python(code)
+
+    assert result.returncode == 0, result.stderr
diff --git a/tests/unit/test_ray_job_auto_init.py b/tests/unit/test_ray_job_auto_init.py
index 8ad8e2ee..e4c18348 100644
--- a/tests/unit/test_ray_job_auto_init.py
+++ b/tests/unit/test_ray_job_auto_init.py
@@ -13,9 +13,9 @@
 
 def _invoke_run(ctx: RoarContext, monkeypatch, ray_job_id: str | None):
     if ray_job_id is None:
-        monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+        monkeypatch.delenv("RAY_JOB_ID", raising=False)
     else:
-        monkeypatch.setenv("RAY_JOB_CONFIG_JSON_ENV_VAR", ray_job_id)
+        monkeypatch.setenv("RAY_JOB_ID", ray_job_id)
 
     runner = CliRunner()
     with (
@@ -33,7 +33,7 @@ def test_run_in_uninitialized_tmpdir_without_ray_job_id_exits_with_not_initializ
     tmp_path, monkeypatch
 ) -> None:
     monkeypatch.chdir(tmp_path)
-    monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+    monkeypatch.delenv("RAY_JOB_ID", raising=False)
     ctx = RoarContext.create(cwd=tmp_path)
 
     runner = CliRunner()
diff --git a/tests/unit/test_ray_job_git_requirement.py b/tests/unit/test_ray_job_git_requirement.py
index 6f70e233..30576d52 100644
--- a/tests/unit/test_ray_job_git_requirement.py
+++ b/tests/unit/test_ray_job_git_requirement.py
@@ -24,9 +24,9 @@ def _invoke_run(
     monkeypatch,
 ):
     if ray_job_id is None:
-        monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+        monkeypatch.delenv("RAY_JOB_ID", raising=False)
     else:
-        monkeypatch.setenv("RAY_JOB_CONFIG_JSON_ENV_VAR", ray_job_id)
+        monkeypatch.setenv("RAY_JOB_ID", ray_job_id)
 
     runner = CliRunner()
     with (
@@ -41,7 +41,7 @@ def _invoke_run(
 
 def test_run_in_non_git_dir_without_ray_job_id_exits_with_git_error(tmp_path, monkeypatch) -> None:
     monkeypatch.chdir(tmp_path)
-    monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+    monkeypatch.delenv("RAY_JOB_ID", raising=False)
 
     runner = CliRunner()
     result = runner.invoke(run, ["python", "main.py"], obj=_ctx(tmp_path))
diff --git a/tests/unit/test_ray_job_git_skip.py b/tests/unit/test_ray_job_git_skip.py
index 2f74aa23..2cbae1d2 100644
--- a/tests/unit/test_ray_job_git_skip.py
+++ b/tests/unit/test_ray_job_git_skip.py
@@ -20,9 +20,9 @@ def _ctx(base_dir: Path) -> MagicMock:
 
 def _invoke_run(base_dir: Path, ray_job_id: str | None, monkeypatch):
     if ray_job_id is None:
-        monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+        monkeypatch.delenv("RAY_JOB_ID", raising=False)
     else:
-        monkeypatch.setenv("RAY_JOB_CONFIG_JSON_ENV_VAR", ray_job_id)
+        monkeypatch.setenv("RAY_JOB_ID", ray_job_id)
 
     runner = CliRunner()
     with (
@@ -37,7 +37,7 @@ def _invoke_run(base_dir: Path, ray_job_id: str | None, monkeypatch):
 
 def test_run_non_git_without_ray_job_id_exits_with_git_error(tmp_path, monkeypatch) -> None:
     monkeypatch.chdir(tmp_path)
-    monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+    monkeypatch.delenv("RAY_JOB_ID", raising=False)
 
     runner = CliRunner()
     result = runner.invoke(run, ["python", "main.py"], obj=_ctx(tmp_path))
diff --git a/tests/unit/test_ray_job_id_bypass.py b/tests/unit/test_ray_job_id_bypass.py
new file mode 100644
index 00000000..1a92a96a
--- /dev/null
+++ b/tests/unit/test_ray_job_id_bypass.py
@@ -0,0 +1,38 @@
+"""Regression test for RAY_JOB_ID-based bypass in `roar run`."""
+
+import importlib
+from unittest.mock import patch
+
+from click.testing import CliRunner
+
+from roar.cli.commands.run import run
+from roar.cli.context import RoarContext
+
+run_module = importlib.import_module("roar.cli.commands.run")
+
+
+def test_run_with_ray_job_id_auto_inits_and_bypasses_git_check(tmp_path, monkeypatch) -> None:
+    monkeypatch.chdir(tmp_path)
+    monkeypatch.setenv("RAY_JOB_ID", "rjob-abc123")
+    monkeypatch.delenv("RAY_JOB_CONFIG_JSON_ENV_VAR", raising=False)
+    ctx = RoarContext.create(cwd=tmp_path)
+
+    runner = CliRunner()
+    with (
+        patch.object(run_module, "get_quiet_setting", return_value=False),
+        patch.object(run_module, "get_hash_algorithms", return_value=["blake3"]),
+        patch.object(run_module, "execute_and_report", return_value=0) as mock_exec,
+    ):
+        result = runner.invoke(run, ["python", "main.py"], obj=ctx)
+
+    assert result.exit_code == 0, (
+        "Expected `roar run` inside a Ray job (RAY_JOB_ID set) to auto-init in a non-git "
+        f"directory and continue, but it failed with output:\n{result.output}"
+    )
+    assert (tmp_path / ".roar").is_dir(), (
+        "Expected auto-init to create .roar when RAY_JOB_ID is present."
+    )
+    assert (
+        "roar requires the working directory to be inside a git repository." not in result.output
+    ), "Expected git validation to be bypassed when RAY_JOB_ID is present."
+    mock_exec.assert_called_once()
diff --git a/tests/unit/test_ray_job_submit.py b/tests/unit/test_ray_job_submit.py
index 2abf655a..a4f0c752 100644
--- a/tests/unit/test_ray_job_submit.py
+++ b/tests/unit/test_ray_job_submit.py
@@ -52,6 +52,9 @@ def test_ray_job_submit_injects_pip_with_installed_roar_cli_version(monkeypatch)
 
     runtime_env = _runtime_env_json(rewritten.command)
     assert runtime_env["pip"] == ["roar-cli==9.9.9"]
+    assert "py_executable" not in runtime_env
+    assert runtime_env["worker_process_setup_hook"] == "roar.ray.roar_worker._startup"
+    assert runtime_env["env_vars"]["ROAR_JOB_INSTRUMENTED"] == "1"
     assert rewritten.session_id is None
 
 
@@ -67,14 +70,21 @@ def test_ray_jobs_submit_plural_also_works(monkeypatch) -> None:
     assert rewritten.session_id is None
 
 
-def test_entrypoint_is_wrapped_with_roar_run(monkeypatch) -> None:
+def test_entrypoint_is_wrapped_with_roar_driver_entrypoint(monkeypatch) -> None:
     module = _module()
     monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==1.2.3")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: None)
 
     rewritten = module.maybe_rewrite_ray_job_submit(_base_ray_job_submit_command())
 
-    assert _entrypoint(rewritten.command) == ["roar", "run", "python", "main.py"]
+    assert _entrypoint(rewritten.command) == [
+        "python",
+        "-m",
+        "roar.ray.driver_entrypoint",
+        "--",
+        "python",
+        "main.py",
+    ]
     assert rewritten.session_id is None
 
 
@@ -97,7 +107,7 @@ def test_existing_runtime_env_json_pip_list_is_merged(monkeypatch) -> None:
     assert rewritten.session_id is None
 
 
-def test_existing_runtime_env_json_env_vars_are_preserved_and_glaas_added(monkeypatch) -> None:
+def test_existing_runtime_env_json_env_vars_are_preserved_and_glaas_url_added(monkeypatch) -> None:
     module = _module()
     monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==1.0.0")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: "https://glaas.example.com")
@@ -114,10 +124,9 @@ def test_existing_runtime_env_json_env_vars_are_preserved_and_glaas_added(monkey
     runtime_env = _runtime_env_json(rewritten.command)
     assert runtime_env["env_vars"]["USER_KEY"] == "value"
     assert runtime_env["env_vars"]["GLAAS_URL"] == "https://glaas.example.com"
-    assert runtime_env["env_vars"]["GLAAS_API_URL"] == "https://glaas.example.com"
 
 
-def test_already_wrapped_entrypoint_is_not_double_wrapped(monkeypatch) -> None:
+def test_existing_roar_run_entrypoint_is_unchanged(monkeypatch) -> None:
     module = _module()
     monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==4.5.6")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: None)
@@ -139,11 +148,55 @@ def test_already_wrapped_entrypoint_is_not_double_wrapped(monkeypatch) -> None:
 
     rewritten = module.maybe_rewrite_ray_job_submit(command)
 
-    assert _entrypoint(rewritten.command) == ["roar", "run", "python", "main.py"]
+    assert _entrypoint(rewritten.command) == [
+        "python",
+        "-m",
+        "roar.ray.driver_entrypoint",
+        "--",
+        "roar",
+        "run",
+        "python",
+        "main.py",
+    ]
+    assert rewritten.session_id is None
+
+
+def test_existing_driver_entrypoint_wrapper_is_not_duplicated(monkeypatch) -> None:
+    module = _module()
+    monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==4.5.6")
+    monkeypatch.setattr(module, "_resolve_glaas_url", lambda: None)
+
+    command = [
+        "ray",
+        "job",
+        "submit",
+        "--address",
+        "http://localhost:8265",
+        "--working-dir",
+        ".",
+        "--",
+        "python",
+        "-m",
+        "roar.ray.driver_entrypoint",
+        "--",
+        "python",
+        "main.py",
+    ]
+
+    rewritten = module.maybe_rewrite_ray_job_submit(command)
+
+    assert _entrypoint(rewritten.command) == [
+        "python",
+        "-m",
+        "roar.ray.driver_entrypoint",
+        "--",
+        "python",
+        "main.py",
+    ]
     assert rewritten.session_id is None
 
 
-def test_glaas_url_from_config_is_injected_as_both_env_vars(monkeypatch) -> None:
+def test_glaas_url_from_config_is_injected_into_runtime_env(monkeypatch) -> None:
     module = _module()
     monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==8.0.0")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: "http://localhost:3001")
@@ -152,10 +205,55 @@ def test_glaas_url_from_config_is_injected_as_both_env_vars(monkeypatch) -> None
 
     runtime_env = _runtime_env_json(rewritten.command)
     assert runtime_env["env_vars"]["GLAAS_URL"] == "http://localhost:3001"
-    assert runtime_env["env_vars"]["GLAAS_API_URL"] == "http://localhost:3001"
 
 
-def test_no_glaas_url_configured_env_vars_not_injected(monkeypatch) -> None:
+def test_cluster_glaas_url_override_is_used_for_runtime_env(monkeypatch) -> None:
+    module = _module()
+    registrations: list[tuple[str, str, str]] = []
+    saved_keys: list[tuple[str, dict[str, str]]] = []
+    key = {
+        "session_id": "11111111-1111-1111-1111-111111111111",
+        "token": "ab" * 32,
+        "token_hash": "cd" * 32,
+    }
+
+    monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==8.0.0")
+    monkeypatch.setattr(module, "_resolve_glaas_url", lambda: "http://localhost:3001")
+    monkeypatch.setattr(
+        module,
+        "_register_fragment_session",
+        lambda glaas_url, session_id, token_hash: registrations.append(
+            (glaas_url, session_id, token_hash)
+        ),
+    )
+    monkeypatch.setattr(module, "generate_fragment_key", lambda: key)
+    monkeypatch.setattr(
+        module,
+        "save_key",
+        lambda roar_dir, payload: saved_keys.append((str(roar_dir), payload)),
+    )
+    monkeypatch.setenv("ROAR_CLUSTER_GLAAS_URL", "http://host.docker.internal:3001")
+    monkeypatch.setenv("AWS_ENDPOINT_URL", "http://localhost:9000")
+    monkeypatch.setenv("ROAR_CLUSTER_AWS_ENDPOINT_URL", "http://minio:9000")
+
+    rewritten = module.maybe_rewrite_ray_job_submit(_base_ray_job_submit_command())
+
+    runtime_env = _runtime_env_json(rewritten.command)
+    assert runtime_env["env_vars"]["GLAAS_URL"] == "http://host.docker.internal:3001"
+    assert runtime_env["env_vars"]["ROAR_UPSTREAM_S3_ENDPOINT"] == "http://minio:9000"
+    assert runtime_env["env_vars"]["ROAR_PROXY_PORT"] == "19191"
+    assert registrations == [
+        (
+            "http://localhost:3001",
+            "11111111-1111-1111-1111-111111111111",
+            "cd" * 32,
+        )
+    ]
+    assert saved_keys and saved_keys[0][1] == key
+    assert rewritten.session_id == key["session_id"]
+
+
+def test_no_glaas_url_configured_only_instrumentation_env_var_is_injected(monkeypatch) -> None:
     module = _module()
     monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==8.0.0")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: None)
@@ -163,7 +261,12 @@ def test_no_glaas_url_configured_env_vars_not_injected(monkeypatch) -> None:
     rewritten = module.maybe_rewrite_ray_job_submit(_base_ray_job_submit_command())
 
     runtime_env = _runtime_env_json(rewritten.command)
-    assert "env_vars" not in runtime_env
+    env = runtime_env["env_vars"]
+    assert env["ROAR_JOB_INSTRUMENTED"] == "1"
+    assert env["ROAR_WRAP"] == "1"
+    assert env["ROAR_RAY_NODE_AGENTS"] == "1"
+    assert env["ROAR_PROXY_PORT"] == "19191"
+    assert "ROAR_JOB_ID" in env  # stable job_id for node agent name resolution
     assert rewritten.session_id is None
 
 
diff --git a/tests/unit/test_ray_job_submit_wheel.py b/tests/unit/test_ray_job_submit_wheel.py
index 58fca3f9..34e50159 100644
--- a/tests/unit/test_ray_job_submit_wheel.py
+++ b/tests/unit/test_ray_job_submit_wheel.py
@@ -1,7 +1,6 @@
 import importlib
 import importlib.metadata as importlib_metadata
 import json
-import os
 
 
 def _module():
@@ -32,63 +31,67 @@ def _runtime_env_json(command: list[str]) -> dict:
     raise AssertionError("expected --runtime-env-json in rewritten command")
 
 
-def test_resolve_roar_requirement_returns_none_when_vendor_wheel_exists(
+def _entrypoint(command: list[str]) -> list[str]:
+    separator_index = command.index("--")
+    return command[separator_index + 1 :]
+
+
+def test_resolve_roar_requirement_ignores_vendor_wheel_and_uses_installed_version(
     tmp_path, monkeypatch
 ) -> None:
-    """Vendor wheel = local dev mode: cluster has roar pre-installed, skip pip injection."""
     module = _module()
 
     wheel_path = tmp_path / "vendor" / "roar-cli.whl"
     wheel_path.parent.mkdir(parents=True)
     wheel_path.write_bytes(b"wheel")
 
-    monkeypatch.setattr(os, "getcwd", lambda: str(tmp_path))
+    monkeypatch.setattr(
+        importlib_metadata,
+        "version",
+        lambda package_name: "9.9.9" if package_name == "roar-cli" else "0.0.0",
+    )
 
     requirement = module._resolve_roar_requirement()
 
-    assert requirement is None
+    assert requirement == "roar-cli==9.9.9"
 
 
-def test_resolve_roar_requirement_falls_back_to_pypi_when_no_wheel(tmp_path, monkeypatch) -> None:
+def test_resolve_roar_requirement_falls_back_to_unpinned_package_when_version_missing(
+    monkeypatch,
+) -> None:
     module = _module()
 
-    monkeypatch.setattr(os, "getcwd", lambda: str(tmp_path))
-
     def _fake_version(package_name: str) -> str:
-        if package_name == "roar-cli":
-            return "9.9.9"
         raise importlib_metadata.PackageNotFoundError(package_name)
 
     monkeypatch.setattr(importlib_metadata, "version", _fake_version)
 
     requirement = module._resolve_roar_requirement()
 
-    assert requirement == "roar-cli==9.9.9"
+    assert requirement == "roar-cli"
 
 
-def test_maybe_rewrite_skips_pip_injection_when_vendor_wheel_exists(tmp_path, monkeypatch) -> None:
-    """When vendor wheel is present, no pip key should appear in runtime_env."""
+def test_maybe_rewrite_injects_pip_even_when_vendor_wheel_exists(tmp_path, monkeypatch) -> None:
     module = _module()
 
     wheel_path = tmp_path / "vendor" / "roar-cli.whl"
     wheel_path.parent.mkdir(parents=True)
     wheel_path.write_bytes(b"wheel")
 
-    monkeypatch.setattr(os, "getcwd", lambda: str(tmp_path))
+    monkeypatch.setattr(module, "_resolve_roar_requirement", lambda: "roar-cli==1.2.3")
     monkeypatch.setattr(module, "_resolve_glaas_url", lambda: None)
 
     rewritten = module.maybe_rewrite_ray_job_submit(_base_ray_job_submit_command())
 
-    # Entrypoint still wrapped with roar run
-    assert "roar" in rewritten.command and "run" in rewritten.command
-    # pip must NOT be in runtime_env (cluster has roar pre-installed)
-    for i, arg in enumerate(rewritten.command):
-        if arg == "--runtime-env-json" and i + 1 < len(rewritten.command):
-            env = json.loads(rewritten.command[i + 1])
-            assert "pip" not in env
-            break
-        if arg.startswith("--runtime-env-json="):
-            env = json.loads(arg.split("=", 1)[1])
-            assert "pip" not in env
-            break
+    assert _entrypoint(rewritten.command) == [
+        "python",
+        "-m",
+        "roar.ray.driver_entrypoint",
+        "--",
+        "python",
+        "main.py",
+    ]
+    env = _runtime_env_json(rewritten.command)
+    assert env["pip"] == ["roar-cli==1.2.3"]
+    assert env["env_vars"]["ROAR_JOB_INSTRUMENTED"] == "1"
     assert rewritten.session_id is None
diff --git a/tests/unit/test_register_cli.py b/tests/unit/test_register_cli.py
index bfb18c19..ab06a568 100644
--- a/tests/unit/test_register_cli.py
+++ b/tests/unit/test_register_cli.py
@@ -7,16 +7,8 @@
 from roar.cli.commands.register import register
 
 
-def test_register_cli_accepts_s3_uri(tmp_path):
-    runner = CliRunner()
-
-    ctx = MagicMock()
-    ctx.roar_dir = tmp_path / ".roar"
-    ctx.roar_dir.mkdir()
-    ctx.cwd = tmp_path
-    ctx.is_initialized = True
-
-    fake_result = MagicMock(
+def _fake_result():
+    return MagicMock(
         success=True,
         aborted_by_user=False,
         error=None,
@@ -29,9 +21,19 @@ def test_register_cli_accepts_s3_uri(tmp_path):
         secrets_detected=[],
     )
 
+
+def test_register_cli_accepts_s3_uri(tmp_path):
+    runner = CliRunner()
+
+    ctx = MagicMock()
+    ctx.roar_dir = tmp_path / ".roar"
+    ctx.roar_dir.mkdir()
+    ctx.cwd = tmp_path
+    ctx.is_initialized = True
+
     with patch("roar.cli.commands.register.RegisterService") as mock_service_cls:
         service = MagicMock()
-        service.register_artifact_lineage.return_value = fake_result
+        service.register_lineage_target.return_value = _fake_result()
         mock_service_cls.return_value = service
 
         with patch("roar.cli.commands.register.config_get", return_value="https://glaas.local"):
@@ -42,6 +44,51 @@ def test_register_cli_accepts_s3_uri(tmp_path):
             )
 
     assert result.exit_code == 0, result.output
-    service.register_artifact_lineage.assert_called_once()
-    called_artifact_path = service.register_artifact_lineage.call_args.kwargs["artifact_path"]
+    service.register_lineage_target.assert_called_once()
+    called_artifact_path = service.register_lineage_target.call_args.kwargs["target"]
     assert called_artifact_path == "s3://output-bucket/results/run123/final_report.json"
+
+
+def test_register_cli_accepts_step_reference(tmp_path):
+    runner = CliRunner()
+
+    ctx = MagicMock()
+    ctx.roar_dir = tmp_path / ".roar"
+    ctx.roar_dir.mkdir()
+    ctx.cwd = tmp_path
+    ctx.is_initialized = True
+
+    with patch("roar.cli.commands.register.RegisterService") as mock_service_cls:
+        service = MagicMock()
+        service.register_lineage_target.return_value = _fake_result()
+        mock_service_cls.return_value = service
+
+        with patch("roar.cli.commands.register.config_get", return_value="https://glaas.local"):
+            result = runner.invoke(register, ["@4", "--yes"], obj=ctx)
+
+    assert result.exit_code == 0, result.output
+    service.register_lineage_target.assert_called_once()
+    assert service.register_lineage_target.call_args.kwargs["target"] == "@4"
+
+
+def test_register_cli_accepts_session_hash(tmp_path):
+    runner = CliRunner()
+
+    ctx = MagicMock()
+    ctx.roar_dir = tmp_path / ".roar"
+    ctx.roar_dir.mkdir()
+    ctx.cwd = tmp_path
+    ctx.is_initialized = True
+
+    session_hash = "c" * 64
+    with patch("roar.cli.commands.register.RegisterService") as mock_service_cls:
+        service = MagicMock()
+        service.register_lineage_target.return_value = _fake_result()
+        mock_service_cls.return_value = service
+
+        with patch("roar.cli.commands.register.config_get", return_value="https://glaas.local"):
+            result = runner.invoke(register, [session_hash, "--yes"], obj=ctx)
+
+    assert result.exit_code == 0, result.output
+    service.register_lineage_target.assert_called_once()
+    assert service.register_lineage_target.call_args.kwargs["target"] == session_hash
diff --git a/tests/unit/test_register_service.py b/tests/unit/test_register_service.py
index 50a28921..605d756b 100644
--- a/tests/unit/test_register_service.py
+++ b/tests/unit/test_register_service.py
@@ -82,6 +82,116 @@ def service(
             session_service=mock_session_service,
         )
 
+    def test_register_lineage_target_dispatches_step_reference(self, service, tmp_path):
+        with patch.object(service, "register_step_lineage") as register_step:
+            register_step.return_value = RegisterResult(success=True)
+
+            result = service.register_lineage_target(
+                target="@4",
+                roar_dir=tmp_path / ".roar",
+                cwd=tmp_path,
+            )
+
+        assert result.success is True
+        register_step.assert_called_once_with(
+            step_reference="@4",
+            roar_dir=tmp_path / ".roar",
+            cwd=tmp_path,
+            dry_run=False,
+            as_blake3=False,
+            skip_confirmation=False,
+            confirm_callback=None,
+        )
+
+    def test_register_lineage_target_dispatches_session_hash(self, service, tmp_path):
+        session_hash = "a" * 64
+        with patch.object(service, "register_session_lineage") as register_session:
+            register_session.return_value = RegisterResult(success=True)
+
+            result = service.register_lineage_target(
+                target=session_hash,
+                roar_dir=tmp_path / ".roar",
+                cwd=tmp_path,
+            )
+
+        assert result.success is True
+        register_session.assert_called_once_with(
+            session_hash=session_hash,
+            roar_dir=tmp_path / ".roar",
+            cwd=tmp_path,
+            dry_run=False,
+            as_blake3=False,
+            skip_confirmation=False,
+            confirm_callback=None,
+        )
+
+    def test_register_lineage_target_dispatches_artifact_path(self, service, tmp_path):
+        artifact_path = "metrics.json"
+        with patch.object(service, "register_artifact_lineage") as register_artifact:
+            register_artifact.return_value = RegisterResult(success=True)
+
+            result = service.register_lineage_target(
+                target=artifact_path,
+                roar_dir=tmp_path / ".roar",
+                cwd=tmp_path,
+            )
+
+        assert result.success is True
+        register_artifact.assert_called_once_with(
+            artifact_path=artifact_path,
+            roar_dir=tmp_path / ".roar",
+            cwd=tmp_path,
+            dry_run=False,
+            as_blake3=False,
+            skip_confirmation=False,
+            confirm_callback=None,
+        )
+
+    def test_order_jobs_for_registration_puts_parent_before_child(self, service):
+        parent = {
+            "id": 2,
+            "job_uid": "parent-uid",
+            "step_number": 1,
+            "timestamp": 20.0,
+        }
+        child = {
+            "id": 1,
+            "job_uid": "child-uid",
+            "parent_job_uid": "parent-uid",
+            "step_number": 1,
+            "timestamp": 10.0,
+        }
+
+        ordered = service._order_jobs_for_registration([child, parent])
+
+        assert [job["job_uid"] for job in ordered] == ["parent-uid", "child-uid"]
+
+    def test_normalize_jobs_for_registration_maps_unresolved_ray_parent_to_submit_job(
+        self, service
+    ):
+        submit_job = {
+            "id": 1,
+            "job_uid": "local-submit",
+            "step_number": 1,
+            "timestamp": 10.0,
+            "command": "ray job submit --address http://localhost:8265 -- python main.py",
+            "job_type": None,
+        }
+        phase_job = {
+            "id": 2,
+            "job_uid": "phase-job",
+            "parent_job_uid": "remote-driver",
+            "step_number": 4,
+            "timestamp": 40.0,
+            "command": "ray_task:evaluation",
+            "job_type": "ray_task",
+        }
+
+        normalized = service._normalize_jobs_for_registration([phase_job, submit_job])
+
+        jobs_by_uid = {job["job_uid"]: job for job in normalized}
+        assert jobs_by_uid["phase-job"]["parent_job_uid"] == "local-submit"
+
     def test_register_artifact_lineage_file_not_found(self, service):
         """Test error when artifact file doesn't exist."""
         result = service.register_artifact_lineage(
diff --git a/tests/unit/test_run_ray_job_submit_integration.py b/tests/unit/test_run_ray_job_submit_integration.py
index 322cb2c0..df30c4e5 100644
--- a/tests/unit/test_run_ray_job_submit_integration.py
+++ b/tests/unit/test_run_ray_job_submit_integration.py
@@ -44,8 +44,6 @@ def test_run_with_ray_job_submit_calls_rewrite() -> None:
         "--runtime-env-json",
         '{"pip":["roar-cli==1.2.3"]}',
         "--",
-        "roar",
-        "run",
         "python",
         "main.py",
     ]
@@ -110,8 +108,6 @@ def test_run_with_ray_job_submit_triggers_auto_reconstitution() -> None:
         "--runtime-env-json",
         '{"pip":["roar-cli==1.2.3"]}',
         "--",
-        "roar",
-        "run",
         "python",
         "main.py",
     ]
diff --git a/tests/unit/test_sitecustomize_perf.py b/tests/unit/test_sitecustomize_perf.py
index 645cc75d..cc48474c 100644
--- a/tests/unit/test_sitecustomize_perf.py
+++ b/tests/unit/test_sitecustomize_perf.py
@@ -18,8 +18,6 @@ def _roar_env(*, log_file: str | None = None) -> dict:
         env["ROAR_LOG_FILE"] = log_file
     else:
         env.pop("ROAR_LOG_FILE", None)
-    # Point ROAR_LOG_DIR at a non-existent dir to trigger the fast-path.
-    env["ROAR_LOG_DIR"] = "/tmp/roar_perf_test_nonexistent_dir"
     return env
 
 
@@ -62,14 +60,12 @@ def test_sitecustomize_import_overhead_under_threshold():
 
 def test_atexit_overhead_without_ray_logs_under_threshold(tmp_path):
     """
-    _collect_ray_io should skip the heavy collector import when ROAR_LOG_DIR
-    is empty/absent. Total overhead with LOG_FILE but no Ray logs should be
-    less than 600ms over baseline.
+    _collect_ray_io should remain lightweight when no Ray collector actor is
+    present. Total overhead with LOG_FILE should be less than 600ms over baseline.
     Previously ~2160ms; target after optimizations is <600ms.
     """
     log_file = str(tmp_path / "test_inject.json")
     env = _roar_env(log_file=log_file)
-    # ROAR_LOG_DIR is set to non-existent dir in _roar_env(), so collector is skipped.
     baseline_ms = _run_pass(_no_roar_env(), n=5)
     roar_ms = _run_pass(env, n=5)
     overhead_ms = roar_ms - baseline_ms
@@ -81,8 +77,7 @@ def test_atexit_overhead_without_ray_logs_under_threshold(tmp_path):
 
 def test_collect_ray_io_skips_import_when_no_logs(tmp_path, monkeypatch):
     """
-    _collect_ray_io should return early without importing roar.ray.collector
-    when ROAR_LOG_DIR is empty/absent and proxy_logs is empty.
+    _collect_ray_io should not import roar.ray.collector in fragments-only mode.
     """
     # Remove collector from sys.modules if present.
     sys.modules.pop("roar.ray.collector", None)
@@ -99,8 +94,6 @@ def test_collect_ray_io_skips_import_when_no_logs(tmp_path, monkeypatch):
 
     # Patch environment.
     monkeypatch.setenv("ROAR_WRAP", "1")
-    monkeypatch.setenv("ROAR_LOG_DIR", str(tmp_path / "nonexistent"))
-
     spec.loader.exec_module(sc)
 
     before_modules = set(sys.modules.keys())
diff --git a/tests/unit/test_sync_packaged_rust_artifacts.py b/tests/unit/test_sync_packaged_rust_artifacts.py
new file mode 100644
index 00000000..a5354b0e
--- /dev/null
+++ b/tests/unit/test_sync_packaged_rust_artifacts.py
@@ -0,0 +1,130 @@
+from __future__ import annotations
+
+import os
+from pathlib import Path
+
+from scripts.sync_packaged_rust_artifacts import (
+    ArtifactSpec,
+    SyncLayout,
+    sync_packaged_rust_artifacts,
+    sync_reason,
+)
+
+
+def _write_file(path: Path, text: str) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    path.write_text(text, encoding="utf-8")
+
+
+def _set_mtime(path: Path, timestamp: float) -> None:
+    os.utime(path, (timestamp, timestamp))
+
+
+def _layout(tmp_path: Path) -> SyncLayout:
+    root_dir = tmp_path
+    rust_manifest = root_dir / "rust" / "Cargo.toml"
+    release_dir = root_dir / "rust" / "target" / "release"
+    package_bin_dir = root_dir / "roar" / "bin"
+    return SyncLayout(
+        root_dir=root_dir,
+        rust_manifest=rust_manifest,
+        release_dir=release_dir,
+        package_bin_dir=package_bin_dir,
+        artifacts=(
+            ArtifactSpec(
+                package_name="roar-tracer",
+                source_paths=(
+                    rust_manifest,
+                    root_dir / "rust" / "Cargo.lock",
+                    root_dir / "rust" / "tracers" / "ptrace",
+                ),
+                binary_names=("roar-tracer",),
+            ),
+            ArtifactSpec(
+                package_name="roar-tracer-preload",
+                source_paths=(
+                    rust_manifest,
+                    root_dir / "rust" / "Cargo.lock",
+                    root_dir / "rust" / "tracers" / "preload",
+                ),
+                binary_names=("roar-tracer-preload",),
+                library_names=("libroar_tracer_preload.so",),
+            ),
+        ),
+    )
+
+
+def test_sync_reason_detects_non_preload_tracer_staleness(tmp_path: Path) -> None:
+    layout = _layout(tmp_path)
+    cargo_lock = layout.root_dir / "rust" / "Cargo.lock"
+    ptrace_source = layout.root_dir / "rust" / "tracers" / "ptrace" / "src" / "main.rs"
+    release_tracer = layout.release_dir / "roar-tracer"
+    package_tracer = layout.package_bin_dir / "roar-tracer"
+
+    _write_file(layout.rust_manifest, "[workspace]\n")
+    _write_file(cargo_lock, "")
+    _write_file(ptrace_source, "ptrace source\n")
+    _write_file(release_tracer, "release-tracer\n")
+    _write_file(package_tracer, "release-tracer\n")
+    _write_file(layout.release_dir / "roar-tracer-preload", "release-preload\n")
+    _write_file(layout.release_dir / "libroar_tracer_preload.so", "release-library\n")
+    _write_file(layout.package_bin_dir / "roar-tracer-preload", "release-preload\n")
+    _write_file(layout.package_bin_dir / "libroar_tracer_preload.so", "release-library\n")
+
+    _set_mtime(layout.rust_manifest, 50.0)
+    _set_mtime(cargo_lock, 50.0)
+    _set_mtime(release_tracer, 300.0)
+    _set_mtime(package_tracer, 100.0)
+    _set_mtime(ptrace_source, 200.0)
+
+    assert sync_reason(layout) == "packaged roar-tracer is older than its sources"
+
+
+def test_sync_reason_detects_preload_library_mismatch(tmp_path: Path) -> None:
+    layout = _layout(tmp_path)
+    _write_file(layout.rust_manifest, "[workspace]\n")
+    _write_file(layout.root_dir / "rust" / "Cargo.lock", "")
+    _write_file(
+        layout.root_dir / "rust" / "tracers" / "ptrace" / "src" / "main.rs", "ptrace source\n"
+    )
+    _write_file(
+        layout.root_dir / "rust" / "tracers" / "preload" / "src" / "lib.rs", "preload source\n"
+    )
+    _write_file(layout.release_dir / "roar-tracer", "release-tracer\n")
+    _write_file(layout.package_bin_dir / "roar-tracer", "release-tracer\n")
+    _write_file(layout.release_dir / "roar-tracer-preload", "release-preload\n")
+    _write_file(layout.release_dir / "libroar_tracer_preload.so", "release-library\n")
+    _write_file(layout.package_bin_dir / "roar-tracer-preload", "release-preload\n")
+    _write_file(layout.package_bin_dir / "libroar_tracer_preload.so", "different-library\n")
+
+    assert (
+        sync_reason(layout)
+        == "packaged library for roar-tracer-preload differs from release artifact"
+    )
+
+
+def test_sync_packaged_rust_artifacts_copies_release_outputs(tmp_path: Path) -> None:
+    layout = _layout(tmp_path)
+    _write_file(layout.rust_manifest, "[workspace]\n")
+    _write_file(layout.root_dir / "rust" / "Cargo.lock", "")
+    _write_file(
+        layout.root_dir / "rust" / "tracers" / "ptrace" / "src" / "main.rs", "ptrace source\n"
+    )
+    _write_file(
+        layout.root_dir / "rust" / "tracers" / "preload" / "src" / "lib.rs", "preload source\n"
+    )
+    _write_file(layout.release_dir / "roar-tracer", "release-tracer\n")
+    _write_file(layout.release_dir / "roar-tracer-preload", "release-preload\n")
+    _write_file(layout.release_dir / "libroar_tracer_preload.so", "release-library\n")
+
+    sync_packaged_rust_artifacts(layout)
+
+    assert (layout.package_bin_dir / "roar-tracer").read_text(
+        encoding="utf-8"
+    ) == "release-tracer\n"
+    assert (layout.package_bin_dir / "roar-tracer-preload").read_text(
+        encoding="utf-8"
+    ) == "release-preload\n"
+    assert (layout.package_bin_dir / "libroar_tracer_preload.so").read_text(
+        encoding="utf-8"
+    ) == "release-library\n"
diff --git a/tests/unit/test_tracer_backends.py b/tests/unit/test_tracer_backends.py
index d6e132b9..9d4047b6 100644
--- a/tests/unit/test_tracer_backends.py
+++ b/tests/unit/test_tracer_backends.py
@@ -1,5 +1,6 @@
 """Tests for shared tracer backend discovery/readiness helpers."""
 
+import subprocess
 from pathlib import Path
 from unittest.mock import mock_open, patch
 
@@ -20,6 +21,60 @@ def test_preload_is_ready_requires_library(tmp_path: Path) -> None:
     assert reason == "preload library not found"
 
 
+def test_preload_is_ready_probes_launcher_execution(tmp_path: Path) -> None:
+    package_path = tmp_path / "roar"
+    package_path.mkdir()
+
+    launcher = tmp_path / "roar-tracer-preload"
+    launcher.write_text("")
+    library = tmp_path / "libroar_tracer_preload.so"
+    library.write_text("")
+
+    def _run(command: list[str], **_: object) -> subprocess.CompletedProcess[str]:
+        Path(command[1]).write_text("{}")
+        return subprocess.CompletedProcess(command, 0, "", "")
+
+    with (
+        patch.object(tracer_backends, "find_preload_library", return_value=str(library)),
+        patch.object(tracer_backends.subprocess, "run", side_effect=_run),
+    ):
+        ok, reason = tracer_backends.preload_is_ready(package_path, str(launcher))
+
+    assert ok
+    assert reason is None
+
+
+def test_preload_is_ready_reports_probe_failure(tmp_path: Path) -> None:
+    package_path = tmp_path / "roar"
+    package_path.mkdir()
+
+    launcher = tmp_path / "roar-tracer-preload"
+    launcher.write_text("")
+    library = tmp_path / "libroar_tracer_preload.so"
+    library.write_text("")
+
+    with (
+        patch.object(tracer_backends, "find_preload_library", return_value=str(library)),
+        patch.object(
+            tracer_backends.subprocess,
+            "run",
+            return_value=subprocess.CompletedProcess(
+                [str(launcher)],
+                1,
+                "",
+                "roar-tracer-preload: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found",
+            ),
+        ),
+    ):
+        ok, reason = tracer_backends.preload_is_ready(package_path, str(launcher))
+
+    assert not ok
+    assert reason == (
+        "preload launcher probe failed: "
+        "roar-tracer-preload: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.39' not found"
+    )
+
+
 def test_backend_ready_auto_prefers_preload_when_ebpf_not_ready(tmp_path: Path) -> None:
     package_path = tmp_path / "roar"
     package_path.mkdir()
diff --git a/tests/unit/test_tracer_data_loader.py b/tests/unit/test_tracer_data_loader.py
index 1d974f99..de01f96e 100644
--- a/tests/unit/test_tracer_data_loader.py
+++ b/tests/unit/test_tracer_data_loader.py
@@ -87,3 +87,35 @@ def test_explicit_legacy_arrays_take_precedence_over_files(self, tmp_path: Path)
         assert data.written_files == ["/legacy/written.txt"]
         # Raw files are still retained for callers that need richer info.
         assert data.files == [{"path": "/from/files.txt", "read": True, "written": True}]
+
+    def test_preserves_thread_aware_file_contract_fields(self, tmp_path: Path) -> None:
+        report = {
+            "version": 1,
+            "tracer_mode": "preload",
+            "files": [
+                {
+                    "path": "/repo/native.txt",
+                    "read": True,
+                    "written": True,
+                    "read_threads": [101, 202],
+                    "written_threads": [202],
+                }
+            ],
+            "processes": [],
+            "start_time": 1.0,
+            "end_time": 2.0,
+        }
+        report_path = tmp_path / "trace.msgpack"
+        _write_msgpack(report_path, report)
+
+        data = DataLoaderService().load_tracer_data(str(report_path))
+
+        assert data.files == [
+            {
+                "path": "/repo/native.txt",
+                "read": True,
+                "written": True,
+                "read_threads": [101, 202],
+                "written_threads": [202],
+            }
+        ]
diff --git a/uv.lock b/uv.lock
index 79b6fd35..6b9c62d6 100644
--- a/uv.lock
+++ b/uv.lock
@@ -1293,7 +1293,7 @@ wheels = [
 
 [[package]]
 name = "roar-cli"
-version = "0.2.7"
+version = "0.2.9"
 source = { editable = "." }
 dependencies = [
     { name = "blake3" },