DJL Serving provides the following types of metrics:
- Server metrics: saved in `$MODELSERVER_HOME/logs/server_metrics.log`
- Model metrics: saved in `$MODELSERVER_HOME/logs/model_metric.log`
These metrics can also be collected as Prometheus metrics. To enable this, set the SERVING_PROMETHEUS
environment variable or the corresponding system property.
Once Prometheus metrics are enabled, you can retrieve them through the metric management REST API.
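As a concrete sketch, enabling the exporter from a shell and scraping the result might look like the following. The `true` value, the default listener on port 8080, and the `/metrics` endpoint path are assumptions here, so verify the exact route against the metric management REST API reference:

```sh
# Enable Prometheus metrics (assumed value) before starting the server
export SERVING_PROMETHEUS=true
djl-serving &

# Fetch metrics through the management REST API
# (the /metrics path and port are assumptions; check the management API docs)
curl http://127.0.0.1:8080/metrics
```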
| Metric name       | Description                                                              | Unit         | Type               |
|-------------------|--------------------------------------------------------------------------|--------------|--------------------|
| DJLServingStart   | Counter for server startup                                               | Count        | Server metric      |
| StartupLatency    | Server startup latency                                                   | Microseconds | Server metric      |
| GPUMemory_*       | GPU memory                                                               | Bytes        | Server metric      |
| Response_2XX      | Number of requests that succeeded with a 2XX response                    | Count        | Server metric      |
| Response_4XX      | Number of requests that failed with a 4XX response                       | Count        | Server metric      |
| Response_5XX      | Number of requests that failed with a 5XX response                       | Count        | Server metric      |
| ServerError       | Number of requests that failed due to an unknown exception               | Count        | Server metric      |
| WlmError          | Number of requests that failed because the job queue limit was exceeded  | Count        | Server metric      |
| CmdError          | Counter for server startup failure due to an invalid command line        | Count        | Server metric      |
| StartupFailed     | Counter for server startup failure due to an unknown exception           | Count        | Server metric      |
| ModelLoadingError | Counter for server startup failure due to a model loading error          | Count        | Server metric      |
| RegisterWorkflow  | Load workflow (model) latency                                            | Microseconds | Per model metric   |
| QueueTime         | Time waiting in the queue                                                | Microseconds | Per model metric   |
| DynamicBatchSize  | Dynamic batch size                                                       | Count/Item   | Per model metric   |
| RequestLatency    | Request latency                                                          | Microseconds | Per model metric   |
| DownloadModel     | Download model latency                                                   | Microseconds | Per model metric   |
| LoadModel         | Load model latency                                                       | Microseconds | Per model metric   |
| LoadAdapter       | Load PEFT adapter latency                                                | Microseconds | Per model metric   |
| ConvertTrtllm     | Convert TensorRT-LLM model latency                                       | Microseconds | Per model metric   |
| RollingBatchSize  | Average rolling batch size                                               | Count/Item   | Per request metric |
| TokenLatency      | Average token latency                                                    | Microseconds | Per request metric |
| TokenThroughput   | Average per token throughput                                             | Count/Second | Per request metric |
| OutputTokens      | Average output tokens per request                                        | Count/Item   | Per request metric |
| Preprocess        | Average pre-processing latency                                           | Microseconds | Per request metric |
| Inference         | Average model inference call latency                                     | Microseconds | Per request metric |
| Postprocess       | Average post-processing latency                                          | Microseconds | Per request metric |
| Prediction        | Average model prediction call latency                                    | Microseconds | Per request metric |
Per request metrics are disabled by default. Set the following option to enable them:

`log_request_metric=true`
By default, per request metrics are aggregated every 1000 requests. You can change the aggregation
interval with the following option:

`metrics_aggregation=100`
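For reference, both options can sit together in a model's serving.properties. This is a minimal sketch where the `engine` and `option.model_id` values are hypothetical placeholders for illustration:

```properties
# serving.properties -- per-model configuration
# (engine and model_id below are placeholder values, not requirements)
engine=Python
option.model_id=my-model

# Enable per request metrics and aggregate them every 100 requests
log_request_metric=true
metrics_aggregation=100
```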