[Bug]: Can't save profiling data to GCS Bucket - VLLM_TORCH_PROFILER_DIR coerced with os.path() in base vllm library #1034

@RobMulla

🐛 Describe the bug

Setting the VLLM_TORCH_PROFILER_DIR environment variable to a Google Cloud Storage (GCS) bucket path (e.g., gs://my-bucket/profiles) does not work as expected. Instead of saving the profiling data to the GCS bucket, vLLM creates a local directory with a malformed name (e.g., ./gs:/...).

This issue stems from the base vllm library, which processes the VLLM_TORCH_PROFILER_DIR variable before tpu-inference uses it. The base library unconditionally applies os.path.abspath() to the value. That call treats the GCS URI as a relative local path: it collapses the double slash after gs: into a single slash and prepends the current working directory, so the URI becomes a local path.

The problematic upstream code: https://github.com/vllm-project/vllm/blob/5e0c1fe69c516fe4796965185c7d7ca503e44e92/vllm/envs.py#L821-L827

# vllm/vllm/envs.py
"VLLM_TORCH_PROFILER_DIR": lambda: (
    None
    if os.getenv("VLLM_TORCH_PROFILER_DIR", None) is None
    else os.path.abspath(
        os.path.expanduser(os.getenv("VLLM_TORCH_PROFILER_DIR", "."))
    )
),
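The effect can be reproduced outside vLLM with the same two os.path calls. This is a minimal sketch, assuming a POSIX system; the bucket name is the placeholder from the description above, not a real bucket.

```python
import os

# Placeholder GCS URI taken from the example in this issue.
uri = "gs://my-bucket/profiles"

# Same processing that envs.py applies to VLLM_TORCH_PROFILER_DIR.
mangled = os.path.abspath(os.path.expanduser(uri))

# On POSIX, abspath() joins the value onto the current working directory
# and normalizes it, which collapses "gs://" into "gs:/".
print(mangled)
print("scheme preserved:", "gs://" in mangled)
```

The printed path is the current working directory followed by /gs:/my-bucket/profiles, which matches the malformed local directory described above.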

I'd suggest using a different environment variable for profiling. Since we are actually using the JAX profiler, it doesn't make sense to use an environment variable with TORCH in the name, though the old one could be phased out gradually. The vllm profiling docs could be updated to mention the new variable.

Something like VLLM_JAX_PROFILER_DIR would:

  • Avoid the naming confusion with the PyTorch profiler.
  • Allow tpu-inference to handle the path logic correctly without being affected by the base library's processing.
  • Provide a clear separation of concerns between GPU and TPU profiling.
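As a rough illustration of the path handling that tpu-inference could do on its own, the sketch below leaves any value with a URI scheme untouched and only normalizes plain local paths. The function name resolve_profiler_dir and the scheme regex are hypothetical, and VLLM_JAX_PROFILER_DIR is only the name proposed in this issue, not an existing variable.

```python
import os
import re
from typing import Optional

# Matches a leading URI scheme such as "gs://" or "s3://".
_URI_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def resolve_profiler_dir(env_var: str = "VLLM_JAX_PROFILER_DIR") -> Optional[str]:
    """Hypothetical helper: return the profiler output location, or None if unset."""
    raw = os.getenv(env_var)
    if raw is None:
        return None
    if _URI_SCHEME.match(raw):
        # Remote URI: pass it through unchanged so the scheme survives.
        return raw
    # Plain local path: expand "~" and make it absolute, as envs.py does today.
    return os.path.abspath(os.path.expanduser(raw))
```

With this approach, gs://my-bucket/profiles would be returned as-is, while a value like ~/profiles would still resolve to an absolute local directory.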
