DeviceStatsMonitor lacks documentation #20807

@profPlum

📚 Documentation

I know of only two places in the docs that mention DeviceStatsMonitor(): here and here. Neither explains what any of the metrics it logs mean!

  • For example, "active.all.current": what does "active" mean? Is this memory or compute, and what units is it in?
  • Or "active.large_pool.current": what distinguishes the large pool from the small pool?
  • Furthermore, here it says "ensure that you’re using the full capacity of your accelerator (GPU/TPU/HPU). This can be measured with the DeviceStatsMonitor()". This implies the callback can measure GPU utilization, but it does not say which metric records that.
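
To make the naming question concrete, here is a minimal sketch of how these keys appear to be structured. It assumes (this is not confirmed by the Lightning docs) that on CUDA the values come from `torch.cuda.memory_stats()`, whose keys follow a `stat.pool.metric` pattern, and that the logger may prepend a prefix ending in `/`. The names `StatKey` and `parse_stat_key` are hypothetical helpers written for illustration; they are not part of Lightning or PyTorch.

```python
from typing import NamedTuple, Optional


class StatKey(NamedTuple):
    stat: str    # e.g. "active" or "allocated_bytes"
    pool: str    # e.g. "all", "large_pool", or "small_pool"
    metric: str  # e.g. "current", "peak", "allocated", or "freed"


def parse_stat_key(key: str) -> Optional[StatKey]:
    """Split a 'stat.pool.metric' key into its three parts.

    Assumption: any logger prefix is separated from the stat name by "/".
    Returns None for flat counters that have no pool/metric suffix.
    """
    name = key.rsplit("/", 1)[-1]   # drop an optional "prefix/" part
    parts = name.rsplit(".", 2)     # stat, pool, metric
    if len(parts) != 3:
        return None
    return StatKey(*parts)


if __name__ == "__main__":
    for k in ("active.all.current", "active.large_pool.current", "num_alloc_retries"):
        print(k, "->", parse_stat_key(k))
```

If that assumption holds, the PyTorch documentation for `torch.cuda.memory_stats()` would be the place that defines each stat, including which ones are counts and which are byte sizes. It still would not answer the utilization question in the last bullet, since those stats describe the memory allocator rather than compute.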

cc @lantiga @Borda

Labels: docs, needs triage

DeviceStatsMonitor lacks documentation · Issue #20807 · Lightning-AI/pytorch-lightning