
Rearrange metric enablement, so that model metric reporter can proceed properly #321

Open · wants to merge 1 commit into main from clif/fix_enablement_of_metric_labels

Conversation

@ClifHouck

Rearrange metric enablement, so that model metric reporter can proceed properly.

Addresses triton-inference-server/server#6815

The fix is to enable GPU metrics (assuming they're enabled at compile time and by the user at run time) prior to calling MetricModelReporter::Create. If GPU metrics are not enabled at that point, MetricModelReporter::GetMetricsLabels will not populate the relevant GPU labels.
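For illustration, here is a minimal sketch of the behavior described above, assuming a guard of roughly this shape in the label-population path (the names, signature, and structure below are illustrative stand-ins, not the actual Triton source):

```cpp
// Minimal, self-contained sketch: GPU labels are only added when GPU metrics
// are enabled, so building labels before GPU metrics are turned on yields a
// label set without GPU information (the symptom in server#6815).
#include <iostream>
#include <map>
#include <string>

// Hypothetical stand-in for the runtime GPU-metrics flag.
static bool gpu_metrics_enabled = false;

void GetMetricsLabels(
    std::map<std::string, std::string>* labels, const std::string& model_name,
    const std::string& gpu_uuid)
{
  (*labels)["model"] = model_name;
  // If GPU metrics are disabled at this point, the GPU label is never set.
  if (gpu_metrics_enabled) {
    (*labels)["gpu_uuid"] = gpu_uuid;
  }
}

int main()
{
  std::map<std::string, std::string> labels;
  GetMetricsLabels(&labels, "resnet50", "GPU-1234");
  std::cout << "labels before enabling GPU metrics: " << labels.size() << "\n";  // 1

  gpu_metrics_enabled = true;  // enable first, as the PR proposes
  labels.clear();
  GetMetricsLabels(&labels, "resnet50", "GPU-1234");
  std::cout << "labels after enabling GPU metrics: " << labels.size() << "\n";  // 2
  return 0;
}
```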

@ClifHouck (Author)

To elaborate on why this change fixes GPU metric labels: enabling GPU metrics before the server is initialized (around line 2396, tc::Status status = lserver->Init()) allows the metric labels to be populated with GPU information.
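As a rough sketch of the reordering (stand-in types only, not the actual main.cc or core code), the point is simply that the reporters created during Init() only see GPU metrics as enabled if that happened beforehand:

```cpp
// Simplified sketch of the ordering the PR proposes. The real code paths
// (tc::Metrics, InferenceServer::Init, MetricModelReporter::Create) are more
// involved; this only illustrates why the order matters.
#include <iostream>

struct Metrics {
  static void EnableGPUMetrics() { gpu_enabled = true; }
  static bool gpu_enabled;
};
bool Metrics::gpu_enabled = false;

struct InferenceServer {
  void Init()
  {
    // MetricModelReporter instances are created during Init(); they only
    // pick up GPU labels if GPU metrics are already enabled at this point.
    std::cout << "Init(): GPU labels "
              << (Metrics::gpu_enabled ? "populated" : "missing") << "\n";
  }
};

int main()
{
  InferenceServer lserver;

  // Before the change: lserver.Init(), then enable GPU metrics -> labels missing.
  // After the change:
  Metrics::EnableGPUMetrics();
  lserver.Init();  // prints "GPU labels populated"
  return 0;
}
```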

@ClifHouck force-pushed the clif/fix_enablement_of_metric_labels branch from f93cf3a to b0a970d on January 26, 2024 14:29
@dyastremsky (Contributor)

dyastremsky commented Feb 21, 2024

Thank you for this PR!

These changes look good to me. Adding @rmccorm4 as a reviewer as well, since he is more familiar with these files.

Once Ryan is good with these changes, we'll run them through CI and merge once everything passes.

@dyastremsky self-assigned this Feb 21, 2024
@rmccorm4 (Contributor)

rmccorm4 commented Feb 21, 2024

Hi @ClifHouck, thanks for this contribution!

While you have figured out a way to have the existing logic propagate the GPU labels to the generic per-model inference metrics, I wouldn't exactly say this is a bug at the moment.

Our per-model metrics are currently aggregated per model, even if technically under the hood they are tracked per model instance. Introducing these GPU labels for metrics other than the GPU memory/utilization metrics would start to expose the notion of per-model-instance metrics for KIND_GPU models with multiple model instances.

I think there is some drawback to adding this support as-is, because it will introduce inconsistency in how our metrics are reported and aggregated. With this change, KIND_GPU models will have per-model-instance metrics, but KIND_CPU/KIND_MODEL models will not. Similarly, it raises the question for models using multiple GPUs (currently only supported via KIND_MODEL): why aren't the GPU labels showing the multiple GPUs being used by these model instances?
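As a hypothetical illustration of that inconsistency (the metric and label names below are illustrative, not necessarily exactly what Triton emits), the same per-model counter would fan out by GPU for KIND_GPU models while staying a single aggregated series for everything else:

```
# KIND_GPU model with two instances: the counter splits by gpu_uuid
nv_inference_request_success{model="model_a",version="1",gpu_uuid="GPU-aaaa"} 40
nv_inference_request_success{model="model_a",version="1",gpu_uuid="GPU-bbbb"} 60

# KIND_CPU / KIND_MODEL model: still one aggregated series per model
nv_inference_request_success{model="model_b",version="1"} 100
```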

We have a ticket in our backlog (DLIS-3959) to increase the breakdown to per-model-instance metrics (generically for all model instances, irrespective of device type), but it hasn't been prioritized over other tasks yet. Not exposing the GPU labels for these inference metrics allows the metrics to be aggregated for consistency across all cases.


Can you elaborate more on your use case and needs, and how your proposed changes or our future changes for per-gpu or per-model-instance inference metrics would directly impact you?

Thanks,
Ryan

@rmccorm4 (Contributor) left a comment

Blocking accidental merge while we discuss the above comments.

@ClifHouck (Author)

@rmccorm4 I have to disagree that this is not a bug. Given what you have said, there are at least two bugs here:

  1. It is not possible for certain metric information to be gathered or initialized during server initialization. Clearly MetricModelReporter expected metrics to be decisively enabled or disabled by the time that InferenceServer::Init is called. I think that's a reasonable thing to expect.
  2. If MetricModelReporter shouldn't apply GPU labels to its metrics, then that code should be changed or removed.

I can add a commit to this PR which removes the gathering and application of GPU UUID information for model metrics. That way we solve both issues outlined above.

@rmccorm4 (Contributor)

rmccorm4 commented Feb 22, 2024

(1) Clearly MetricModelReporter expected metrics to be decisively enabled or disabled by the time that InferenceServer::Init is called. I think that's a reasonable thing to expect.

lserver->Init() initializes most components of the server, several of which are the components that are queried to periodically update metrics. For example, tc::Metrics::StartPollingThreadSingleton(); starts a thread that polls metrics from the PinnedMemoryManager, which is initialized along with the server. So swapping these two operations does not currently make sense without greater refactoring.
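A simplified sketch of that ordering dependency (stand-in types, not the actual Triton code): the polling path queries a component that only exists after Init(), so simply moving metrics enablement and polling ahead of Init() would leave it with nothing to query.

```cpp
// Illustration only: the metrics polling thread queries components such as
// the pinned-memory manager, which the server creates during Init(). Starting
// the poller ahead of Init() without further refactoring would leave it with
// nothing to poll.
#include <cstddef>
#include <iostream>
#include <memory>

struct PinnedMemoryManager {
  std::size_t UsedBytes() const { return 0; }
};

static std::unique_ptr<PinnedMemoryManager> pinned_manager;  // created by Init()

void PollMetricsOnce()
{
  if (!pinned_manager) {
    std::cout << "polling before Init(): nothing to query\n";
    return;
  }
  std::cout << "pinned memory used: " << pinned_manager->UsedBytes() << "\n";
}

void ServerInit() { pinned_manager = std::make_unique<PinnedMemoryManager>(); }

int main()
{
  PollMetricsOnce();  // before Init(): manager missing
  ServerInit();
  PollMetricsOnce();  // after Init(): manager available
  return 0;
}
```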

I agree that this flow may be a bit unintuitive currently, since the MetricModelReporters are initialized along with the models and the model_repository_manager. In fact, if you were to use --model-control-mode explicit and dynamically load a model with KIND_GPU after the server has started up, then the GPU labels will actually get populated for these per-model metrics.

I agree this should be resolved one way or the other for consistency, but it's something we should take care in changing, and we need to balance it against our current list of priorities. If this behavior is having a significant impact on some workflow or use case, please do let us know. Otherwise, I think this is something for us to revisit when we have the bandwidth to do so.

(2) If MetricModelReporter shouldn't apply GPU labels to its metrics, then that code should be changed or removed.

I agree that this code should probably be commented out with a note that it could be re-applied if per-model-instance metrics are exposed.
